
Python Lib

The document lists 30 Python libraries that can boost productivity for data scientists. It discusses libraries for tasks like visualization, automated machine learning, handling class imbalance, explaining ML models, time series forecasting, feature engineering, testing code, and more. The libraries include YellowBrick, PyCaret, imbalanced-learn, SHAP, Prophet, Featuretools, Streamlit, PandasProfiling, and Icecream. The document encourages connecting on LinkedIn or subscribing to the author's newsletter to learn about Python and data science daily.

Uploaded by ARCHANA R
Copyright © All Rights Reserved

30 Python Libraries to (Hugely) Boost Your Data Science Productivity

Avi Chawla

avichawla.substack.com

Data Science is much more than Pandas, NumPy and Sklearn.

Here are 30 open-source libraries to upgrade your data game.

1. YellowBrick

A suite of visualization and diagnostic tools, built on Matplotlib and Sklearn, for faster model selection.

2. PyCaret

Automate ML workflows with this low-code library.

3. imbalanced-learn

A variety of methods to handle class imbalance.
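One of the simplest methods in this family is random oversampling: keep duplicating randomly chosen minority-class rows until every class is as large as the majority class. Below is a minimal standard-library sketch of that idea. `random_oversample` is a hypothetical helper, not imbalanced-learn's API; the library packages this technique as `RandomOverSampler`, alongside smarter methods such as SMOTE.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until all classes match the majority count.

    Hypothetical helper illustrating random oversampling: X is a list of feature
    rows and y is the list of matching class labels.
    """
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    counts = Counter(y)
    target = max(counts.values())  # size of the largest (majority) class

    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of every row that belongs to this class
        members = [i for i, lab in enumerate(y) if lab == label]
        # Draw rows with replacement until this class reaches the target size
        for _ in range(target - count):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 4 samples
```

Plain duplication can make a model overfit the repeated rows, which is why the library also offers methods that synthesize new minority samples instead.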

4. Modin

Boost Pandas' performance by up to 70x with a one-line change to the import.

5. SHAP

Explain the output of any ML model in a few lines of code.
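SHAP's attributions are based on Shapley values. A feature's score is its marginal contribution to the prediction, averaged over every order in which the features could be added. For a toy model with two features, you can compute this exactly by brute force. The sketch assumes an "absent" feature simply takes a baseline value. `shapley_values` and the toy `model` are hypothetical; the SHAP library itself works from background data and uses much faster estimators.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Hypothetical brute-force helper: a feature that is "absent" from a coalition
    takes its baseline value. Cost grows as n!, so this only suits tiny inputs.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)      # start with every feature absent
        prev_out = model(current)
        for i in order:
            current[i] = x[i]         # add feature i to the coalition
            out = model(current)
            phi[i] += out - prev_out  # marginal contribution of feature i
            prev_out = out
    return [p / factorial(n) for p in phi]  # average over all n! orderings


def model(v):
    """Toy model with an interaction term between features 0 and 1."""
    return 2 * v[0] + 3 * v[1] + v[0] * v[1]


x, baseline = [1.0, 2.0], [0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)  # [3.0, 7.0]
```

The two attributions add up to model(x) - model(baseline) = 10.0. This is the local-accuracy property that SHAP explanations are designed to satisfy.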

6. Missingno

Visualize missing values in your dataset with ease.
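Missingno's matrix plot draws each row as a strip, with gaps wherever values are missing. Here is a text-only sketch of the same idea. It assumes records are dicts and treats a value as missing when its key is absent or holds `None`. `missing_matrix` and `completeness` are hypothetical helpers; the library itself renders proper charts from a pandas DataFrame.

```python
def missing_matrix(rows, columns):
    """Return a text 'nullity matrix': '#' where a value is present, '.' where it is missing.

    Hypothetical helper mimicking the idea behind missingno's matrix plot.
    """
    lines = [" ".join(columns)]
    for row in rows:
        cells = []
        for col in columns:
            mark = "#" if row.get(col) is not None else "."
            cells.append(mark.center(len(col)))  # align the mark under its header
        lines.append(" ".join(cells))
    return "\n".join(lines)


def completeness(rows, columns):
    """Fraction of non-missing values in each column."""
    return {col: sum(row.get(col) is not None for row in rows) / len(rows) for col in columns}


rows = [
    {"age": 31, "income": 52000, "city": "Pune"},
    {"age": None, "income": 48000, "city": "Delhi"},
    {"age": 45, "city": None},
    {"age": 27, "income": None, "city": "Mumbai"},
]
cols = ["age", "income", "city"]
print(missing_matrix(rows, cols))
print(completeness(rows, cols))
```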

7. Prophet

Produce high-quality forecasts on time-series data.

8. Parallel-Pandas

Parallelize Pandas across all CPU cores for faster computation.

9. Featuretools

Automated feature engineering for ML models.
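Much of what Featuretools automates is applying aggregation primitives, such as count, sum, mean and max, from a child table onto its parent table. A typical case is turning a transactions table into per-customer features. Below is a minimal standard-library sketch of that single step, assuming rows are dicts. `aggregate_features` is hypothetical and covers none of the library's relationship handling or feature stacking.

```python
from collections import defaultdict
from statistics import mean

# Aggregation "primitives" applied to each parent's group of child values
PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max}


def aggregate_features(parent_ids, child_rows, key, value):
    """Build COUNT/SUM/MEAN/MAX features of `value` for every parent id.

    Hypothetical helper: `child_rows` is a list of dicts and `key` is the column
    linking each child row to its parent.
    """
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row[value])

    features = {}
    for pid in parent_ids:
        vals = groups.get(pid, [])
        feats = {f"COUNT({value})": len(vals)}
        for name, fn in PRIMITIVES.items():
            # Parents without child rows get None instead of an error from mean()/max()
            feats[f"{name}({value})"] = fn(vals) if vals else None
        features[pid] = feats
    return features


transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]
features = aggregate_features([1, 2, 3], transactions, key="customer_id", value="amount")
for customer, feats in features.items():
    print(customer, feats)
```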

10. Lazy Predict

Train 30 machine learning models in one line of code.

11. mlxtend

A collection of utility functions for data processing, model evaluation and visualization.

12. Vaex

A high-performance package for lazy, out-of-core DataFrames.

13. SweetViz

An in-depth EDA report in two lines of code.

14. Skorch

Leverage the power of PyTorch with the elegance of Sklearn.

15. Faiss

Efficient algorithms for similarity search and clustering of dense vectors.
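The baseline that Faiss accelerates is exact nearest-neighbour search. You compare the query with every stored vector and keep the closest ones by squared L2 distance, which is what its flat L2 index does. Below is a brute-force standard-library sketch. `knn_search` is a hypothetical helper; Faiss adds vectorised and GPU code, plus approximate indexes for large collections.

```python
import heapq


def knn_search(database, query, k):
    """Exact k-nearest-neighbour search by squared Euclidean (L2) distance.

    Hypothetical brute-force helper: returns (distance, index) pairs, closest first.
    """
    def sq_l2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Score every stored vector against the query
    scored = ((sq_l2(vec, query), idx) for idx, vec in enumerate(database))
    return heapq.nsmallest(k, scored)  # keep only the k best candidates


database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = knn_search(database, [1.0, 1.0], k=2)
print(nearest)
```

Cost is one distance computation per stored vector per query. That linear scan is exactly what approximate indexes trade a little accuracy to avoid.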

16. statsmodels

Statistical testing and data exploration at your fingertips.

17. Pandas-Profiling

Generate a high-level EDA report of your data in no time.

18. Streamlit

Create and host data-based Python web apps in a few lines of code.

19. Category-encoders

Over 15 categorical data encoders.
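Besides one-hot and ordinal encoding, the library includes target encoding, which replaces each category with the mean of the target for that category. Below is a minimal standard-library sketch without smoothing. `target_encode` and the data are hypothetical. Practical target encoders usually blend each category's mean with the global mean so that rare categories do not overfit.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Map each category to the mean target over the rows with that category.

    Hypothetical helper: plain (unsmoothed) mean target encoding.
    Returns the encoded column and the category -> mean mapping.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, t in zip(categories, targets):
        sums[cat] += t
        counts[cat] += 1
    mapping = {cat: sums[cat] / counts[cat] for cat in sums}
    return [mapping[cat] for cat in categories], mapping


cities = ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"]
bought = [1, 0, 1, 1, 1, 0]
encoded, mapping = target_encode(cities, bought)
print(mapping)
```

In practice, fit the mapping on training data only and reuse it for validation and test rows. Otherwise the encoding leaks the target.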

20. DuckDB

Run SQL queries on DataFrames.
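DuckDB runs SQL directly against an in-memory DataFrame. DuckDB is not part of the standard library, so the sketch below swaps in `sqlite3` with an in-memory database to show the same workflow. Unlike DuckDB, it must first copy the rows into a table. The table name, columns and data are made up for illustration.

```python
import sqlite3

# In-memory rows standing in for a DataFrame (illustrative data)
rows = [("Pune", 120.0), ("Delhi", 80.0), ("Pune", 40.0), ("Mumbai", 200.0)]

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE sales (city TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Ordinary SQL aggregation over the in-memory table
result = con.execute(
    "SELECT city, SUM(amount) AS total FROM sales GROUP BY city ORDER BY total DESC"
).fetchall()
print(result)
con.close()
```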

21. PandasML

Pandas data wrangling + Sklearn algorithms + Matplotlib visualization.

22. Pytest

An elegant testing framework to test your code.
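Pytest collects files named `test_*.py` and the `test_*` functions inside them, and plain `assert` statements are all you need. Below is a minimal sketch of such a file. `normalize` is a hypothetical function under test. If you save this as `test_stats.py` and run `pytest` in that directory, it should run the three tests.

```python
# test_stats.py: pytest collects files named test_*.py and functions named test_*


def normalize(values):
    """Hypothetical function under test: scale values so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # Plain assert; pytest rewrites it to show both sides when it fails
    assert abs(sum(normalize([1, 2, 3, 4])) - 1.0) < 1e-9


def test_normalize_keeps_proportions():
    assert normalize([1, 3]) == [0.25, 0.75]


def test_normalize_rejects_zero_total():
    # Manual check so this file also runs without pytest installed
    try:
        normalize([0, 0])
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

In a real suite, the last test would normally use `pytest.raises(ValueError)` as a context manager instead of the manual try/except.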

23. Numexpr

Parallelize NumPy expression evaluation across all CPU cores for up to a 20x speedup.

24. CSV-Kit

Explore, query and describe CSV files from the terminal.

25. PivotTableJS

Drag-and-drop tools to group, pivot and plot a DataFrame.
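Behind the drag-and-drop interface, a pivot table groups rows by one field, spreads a second field across columns and aggregates a third. Below is a minimal standard-library sketch of that computation. `pivot` and the sales data are hypothetical. PivotTableJS does this interactively inside a notebook rather than returning a dict.

```python
from collections import defaultdict


def pivot(rows, index, columns, values, agg=sum):
    """Group rows by `index`, spread `columns` into separate keys, aggregate `values`.

    Hypothetical helper showing what a pivot table computes.
    Returns {index value: {column value: aggregate}}.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for row in rows:
        cells[row[index]][row[columns]].append(row[values])
    return {r: {c: agg(vals) for c, vals in cols.items()} for r, cols in cells.items()}


sales = [
    {"region": "North", "quarter": "Q1", "revenue": 100},
    {"region": "North", "quarter": "Q2", "revenue": 150},
    {"region": "South", "quarter": "Q1", "revenue": 80},
    {"region": "North", "quarter": "Q1", "revenue": 20},
]
table = pivot(sales, index="region", columns="quarter", values="revenue")
print(table)  # {'North': {'Q1': 120, 'Q2': 150}, 'South': {'Q1': 80}}
```

Passing a different `agg` (for example `max` or `len`) changes the summary, much like switching the aggregator in the UI.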

26. Faker

Generate fake yet meaningful data in seconds.
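"Fake yet meaningful" data comes from drawing on realistic value pools and keeping fields consistent with each other, such as an email derived from the name. A fixed seed makes the output reproducible. Below is a tiny standard-library sketch. The word lists and `fake_person` are hypothetical. Faker itself ships large, locale-aware providers behind calls such as `Faker().name()`.

```python
import random

# Small hypothetical value pools; Faker ships much larger, locale-aware ones
FIRST = ["Asha", "Ravi", "Maria", "John", "Mei"]
LAST = ["Sharma", "Garcia", "Smith", "Chen", "Iyer"]
DOMAINS = ["example.com", "example.org"]


def fake_person(rng):
    """Return one fake but internally consistent record."""
    first, last = rng.choice(FIRST), rng.choice(LAST)
    return {
        "name": f"{first} {last}",
        # Derive the email from the name so the record looks realistic
        "email": f"{first.lower()}.{last.lower()}@{rng.choice(DOMAINS)}",
        "age": rng.randint(18, 80),
    }


rng = random.Random(42)  # fixed seed gives the same "fake" data on every run
people = [fake_person(rng) for _ in range(3)]
for p in people:
    print(p)
```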

27. Icecream

Don't debug with print(). Use icecream.
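The main convenience of icecream's `ic()` is that it prints an expression together with its value. Since Python 3.8, the standard library covers part of this with the `=` specifier in f-strings. The sketch below uses that specifier, not icecream. `average` is just a hypothetical function being debugged.

```python
def average(xs):
    """Hypothetical function being debugged."""
    total = sum(xs)
    count = len(xs)
    # "=" after an expression (Python 3.8+) prints the expression's source text and its value
    print(f"{total=}, {count=}, {total / count=}")
    return total / count


average([2, 4, 9])  # prints: total=15, count=3, total / count=5.0
```

Icecream goes further. For example, `ic()` returns its argument, so you can wrap an existing expression in place without restructuring the code.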

28. Pyforest

No need to write imports: commonly used packages are imported automatically.
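Pyforest relies on lazy imports. A name such as `pd` is a placeholder that performs the real import the first time you use it. Below is a minimal sketch of that mechanism with `importlib`. The `LazyModule` class is hypothetical and is not pyforest's code.

```python
import importlib


class LazyModule:
    """Hypothetical placeholder that imports `module_name` on first attribute access."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None  # nothing imported yet

    def __getattr__(self, attr):
        # Only called for names not found on the instance, i.e. the module's own names
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attr)


# Declaring the placeholder costs nothing; the real import happens on first use
stats = LazyModule("statistics")
print(stats._module)             # None: nothing imported through the placeholder yet
print(stats.median([3, 1, 2]))  # triggers the import, then prints 2
```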

29. PySnooper

Trace your code line by line: track new variables and their updates.
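PySnooper's `@pysnooper.snoop()` decorator logs each line a function executes and every local variable that changes. You can sketch the variable-tracking half with the standard library's `sys.settrace` hook. `trace_vars` is a hypothetical decorator, not PySnooper's implementation. It records `repr()` snapshots, so in-place changes to lists or dicts are noticed too.

```python
import sys
from functools import wraps


def trace_vars(log):
    """Hypothetical decorator: append (name, repr(value)) to `log` whenever a
    local variable of the decorated function appears or changes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            seen = {}  # last repr observed for each local variable

            def local_tracer(frame, event, arg):
                # "line" fires before each line runs; "return" catches the final state
                if event in ("line", "return"):
                    for name, value in frame.f_locals.items():
                        text = repr(value)
                        if seen.get(name) != text:
                            seen[name] = text
                            log.append((name, text))
                return local_tracer

            def global_tracer(frame, event, arg):
                # Only trace frames running the decorated function's own code
                return local_tracer if frame.f_code is func.__code__ else None

            previous = sys.gettrace()
            sys.settrace(global_tracer)
            try:
                return func(*args, **kwargs)
            finally:
                sys.settrace(previous)  # restore whatever tracer was active before
        return wrapper
    return decorator


log = []


@trace_vars(log)
def running_total(xs):
    s = 0
    for x in xs:
        s += x
    return s


result = running_total([1, 2])
for name, value in log:
    print(f"{name} = {value}")
```

Tracing slows the function down considerably, so like PySnooper it is meant for debugging sessions, not production code.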

30. Sidetable

Supercharge Pandas' value_counts() method.
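`value_counts()` gives raw counts. Sidetable extends it into a frequency table with percentages and running totals. Below is a standard-library sketch of such a table. `freq_table` is a hypothetical helper, and its column names are chosen for illustration.

```python
from collections import Counter


def freq_table(values):
    """Frequency table with count, percent, cumulative count and cumulative percent.

    Hypothetical helper; rows run from most to least frequent value.
    """
    counts = Counter(values)
    total = len(values)
    table, running = [], 0
    for value, count in counts.most_common():
        running += count  # running total used for the cumulative columns
        table.append({
            "value": value,
            "count": count,
            "percent": 100 * count / total,
            "cumulative_count": running,
            "cumulative_percent": 100 * running / total,
        })
    return table


table = freq_table(["a", "b", "a", "c", "a", "b"])
for row in table:
    print(row)
```

The cumulative columns answer questions like "how many categories cover 80% of the rows?", which bare counts do not.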

Hope that helped.

Check out my daily newsletter to learn something new about Python and Data Science every day.

avichawla.substack.com

Connect with me on LinkedIn.

https://www.linkedin.com/in/avi-chawla
