4 Tools to Speed Up Exploratory Data Analysis (EDA) in Python
Automate EDA with a single, easy-to-use line of code
This article covers four Python packages that can automate your data exploration and analysis. For each one, I will explain what it does and how you can use it.
- DataPrep
- Pandas Profiling
- SweetViz
- AutoViz
DataPrep
DataPrep lets you prepare your data using a single library with a few lines of code.
The DataPrep ecosystem currently consists of three components: connector, EDA, and Clean API.
The Connector simplifies data collection from Web APIs by providing a standard set of operations. The EDA component handles exploratory data analysis, and the Clean API provides functions for quickly and efficiently cleaning and validating data.
For example, using the Philly Parking Violations dataset, we can call plot() to get an overview of the data frame, or plot correlations with a single line of code using plot_correlation().
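As a minimal sketch of those two calls, assuming the dataprep package is installed and the data is already loaded in a pandas DataFrame: plot and plot_correlation are imported from dataprep.eda (the correlation function is named plot_correlation there), while the wrapper name quick_overview is made up for this illustration.

```python
def quick_overview(df):
    """Return DataPrep's overview and correlation plots for a DataFrame.

    quick_overview is a hypothetical helper name, not part of DataPrep.
    """
    # Imported inside the function so the helper can be defined
    # even when dataprep is not installed.
    from dataprep.eda import plot, plot_correlation

    overview = plot(df)                  # per-column distributions and dataset statistics
    correlations = plot_correlation(df)  # correlation plots for the numeric columns
    return overview, correlations
```

In a notebook, you would call overview, correlations = quick_overview(df) and evaluate each returned object in its own cell to display it.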
It is also possible to generate a detailed report with one line of code using DataPrep. Here, the create_report() function is called on a DataFrame.
import pandas as pd
from dataprep.eda import create_report

df = pd.read_csv("parking_violations.csv")
create_report(df)
You get back an extensive, interactive report that covers not only the individual variables but also correlations, interactions, and missing values.
![](https://miro.medium.com/max/700/1*FZXR_hivWHa0NrfU0w2mQQ.gif)
DataPrep reduces the effort you need as a data scientist to explore a dataset. With just one line of code, you get an overview of your dataset, its missing values, its correlations, and a statistical description, as we have seen above.
To install DataPrep, run:
pip install dataprep
For more information, see the DataPrep documentation.
Pandas Profiling
Generates profile reports from a pandas DataFrame.
Pandas Profiling supports the same kind of EDA as the other packages in this article. It covers a wide range of use cases and has more tutorials than any of the others.
With just one line of code, you can generate an EDA report with descriptive statistics, correlations, missing values, text analysis, and more.
Let us call ProfileReport() on the Philly DataFrame to generate an EDA report.
from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="Report")
profile
Pandas Profiling generates a similar report with a sleek User Interface (UI).
![](https://miro.medium.com/max/700/1*B17dmbomjTPXkT2fhmas8w.gif)
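Evaluating profile on its own line only renders the report inside a notebook. A minimal sketch for writing the report to an HTML file instead, assuming pandas-profiling is installed: ProfileReport and its to_file() method come from the library, while the helper name save_profile and the default file name are illustrative choices.

```python
def save_profile(df, output_path="report.html", title="Report"):
    """Write a Pandas Profiling report for df to an HTML file.

    save_profile and the default output_path are hypothetical names
    chosen for this sketch.
    """
    # Imported inside the function so the helper can be defined
    # even when pandas-profiling is not installed.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title=title)
    profile.to_file(output_path)  # renders the report and writes it to disk
    return output_path
```

Calling save_profile(df) from a script leaves a standalone report.html that you can open in a browser or share.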
You can install it with the pip package manager by running:
pip install pandas-profiling[notebook]
Visit the GitHub repository for more tutorials and documentation.
SweetViz
In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code!
SweetViz also provides interactive EDA with just two lines of code. In addition, you can easily compare two datasets, such as the training and test sets of a machine learning project.
To get a report from SweetViz, call analyze() on any data frame and then show_html() on the result; this generates an HTML report.
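A minimal sketch of that workflow, assuming the sweetviz package is installed (pip install sweetviz): analyze(), compare(), and show_html() are SweetViz calls, while the helper names, output file names, and the "Train"/"Test" labels are placeholders chosen for this example.

```python
def sweetviz_report(df, output_path="sweetviz_report.html"):
    """Analyze a single DataFrame and write the HTML report."""
    # Imported inside the function so the helper can be defined
    # even when sweetviz is not installed.
    import sweetviz as sv

    report = sv.analyze(df)
    report.show_html(output_path)  # writes the file; opens a browser tab by default
    return report


def sweetviz_compare(train_df, test_df, output_path="sweetviz_compare.html"):
    """Compare two DataFrames, e.g. a training set and a test set."""
    import sweetviz as sv

    # Each dataset is passed as a [DataFrame, display name] pair.
    report = sv.compare([train_df, "Train"], [test_df, "Test"])
    report.show_html(output_path)
    return report
```

The two-line version mentioned above corresponds to the analyze() and show_html() calls inside sweetviz_report.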
![](https://miro.medium.com/max/700/1*VYxqkDOjF6lU9uVurb-OYg.gif)
AutoViz
Automatically Visualize any dataset, any size with a single line of code
AutoViz provides similar functionality. You can generate much more detailed plots for your dataset with AutoViz using only one line of code. Here is how to generate a report with AutoViz using the Philly Parking dataset.
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()
df_av = AV.AutoViz('parking.csv')
Note that you do not even need Pandas to read the data. AutoViz will load it when you provide the path to the dataset. Here is the report we generate with AutoViz.
![](https://miro.medium.com/max/700/1*Zl9fIOeDwLKu-QbzCqxqlQ.png)
AutoViz gives you many more plots (e.g., violin plots, box plots, and more) as well as statistical and probability values. However, the UI is not as neat as the other reports, and the plots are not interactive.
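If the data is already loaded in a DataFrame, AutoViz can take it directly instead of a file path. A minimal sketch, assuming autoviz is installed: the dfte and depVar parameters belong to AutoViz(), while the helper name autoviz_from_dataframe and its target argument are made up for this illustration.

```python
def autoviz_from_dataframe(df, target=""):
    """Run AutoViz on an in-memory DataFrame.

    autoviz_from_dataframe is a hypothetical helper name; target is the
    optional name of a dependent (target) column in df.
    """
    # Imported inside the function so the helper can be defined
    # even when autoviz is not installed.
    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    # An empty filename tells AutoViz to use the DataFrame passed as dfte.
    return AV.AutoViz("", dfte=df, depVar=target)
```

Leaving target empty produces the general plots; passing a column name asks AutoViz to plot the other variables against that column.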
To install AutoViz, run the following command:
pip install autoviz
Final Thoughts
The four packages offer very similar functionality. Each lets you automate your EDA with one simple, intuitive line of code.
Of the four packages in this article, DataPrep provides the most functionality beyond simple EDA. It can help you ingest data from more sources and offers a speedup on large datasets.