Exploratory Analysis and Preprocessing
Post outline
- Merging datasets
- Filtering data
  - By a single criterion
  - By multiple criteria
- Dealing with null values and outliers
- Dealing with categorical data
- Brief intro to regular expressions
- The big three (scaling, transforming and normalizing)
Post Content
Having worked as a scientist for nearly half a decade, I can tell you that collecting data is never without its challenges. Sensors often miss cycles, return erroneous values, or both. It's important to have a process for dealing with these issues before doing a final analysis or plotting the data.
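As a minimal sketch of handling those sensor problems, the snippet below builds a small hypothetical temperature log (the `readings` frame and `temp_c` column are illustrative, not real data), counts the missing values, flags outliers with the 1.5 × IQR rule, and fills the gaps by linear interpolation. The IQR rule and interpolation are assumed choices for illustration; dropping rows or applying known sensor limits may suit your data better.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor log: one reading per cycle, with a missed cycle (NaN)
# and an obviously erroneous spike (999.0).
readings = pd.DataFrame(
    {
        "cycle": range(8),
        "temp_c": [21.0, 21.4, np.nan, 21.9, 999.0, 22.3, 22.1, 22.6],
    }
)

# 1. Count missing values per column.
missing_counts = readings.isna().sum()

# 2. Flag outliers with the 1.5 * IQR rule (one common convention, assumed here).
q1 = readings["temp_c"].quantile(0.25)
q3 = readings["temp_c"].quantile(0.75)
iqr = q3 - q1
is_outlier = (readings["temp_c"] < q1 - 1.5 * iqr) | (readings["temp_c"] > q3 + 1.5 * iqr)

# 3. Treat flagged values as missing, then fill all gaps by linear interpolation.
cleaned = readings.copy()
cleaned.loc[is_outlier, "temp_c"] = np.nan
cleaned["temp_c"] = cleaned["temp_c"].interpolate()

print(missing_counts)
print(cleaned)
```

Only the 999.0 spike falls outside the IQR fences here, so it and the missed cycle are both replaced by values interpolated from their neighbors.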
Data also often needs to be restructured before it can be passed into a model. In this post I walk through my preprocessing process, along with the methods I've found most useful.
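A minimal sketch of that kind of restructuring, assuming a small hypothetical table (`site`, `temp_c`, and `humidity` are illustrative column names): the categorical column is one-hot encoded with `pd.get_dummies`, and the numeric columns are standardized to zero mean and unit variance. This is one common combination, not the only option.

```python
import pandas as pd

# Hypothetical tabular data: one categorical column and two numeric columns.
df = pd.DataFrame(
    {
        "site": ["north", "south", "north", "east"],
        "temp_c": [20.0, 22.0, 24.0, 26.0],
        "humidity": [0.30, 0.50, 0.40, 0.60],
    }
)

# 1. One-hot encode the categorical column so a model sees numbers, not strings.
#    New columns are appended as site_east, site_north, site_south.
encoded = pd.get_dummies(df, columns=["site"], dtype=int)

# 2. Standardize the numeric columns (z-score using the population std, ddof=0).
numeric_cols = ["temp_c", "humidity"]
numeric = encoded[numeric_cols]
encoded[numeric_cols] = (numeric - numeric.mean()) / numeric.std(ddof=0)

print(encoded)
```

scikit-learn's `StandardScaler` applies the same z-score transform and is usually the better choice inside a modeling pipeline, because it stores the training-set mean and standard deviation so new data can be scaled consistently.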