
Data Science Basics Cheatsheet

This document provides a summary of common functionality from Pandas, NumPy, and Scikit-Learn for data science basics. It covers topics such as importing and exploring data, cleaning data, filtering and grouping data, joining data, and writing data out. The full cheatsheet can be found online at elitedatascience.com.

Uploaded by

acutotu
Copyright
© All Rights Reserved

Python Cheatsheet:

Data Science Basics

In this cheat sheet, we summarize common and useful functionality from Pandas, NumPy, and Scikit-Learn. To see
the most up-to-date full version, visit the online cheatsheet at elitedatascience.com.

SETUP

First, make sure you have the following installed on your computer:

• Python 2.7+ or Python 3
• Pandas
• Jupyter Notebook (optional, but recommended)

*Note: We strongly recommend installing the Anaconda Distribution, which comes with all of those packages.

Importing Data

pd.read_csv(filename)
pd.read_table(filename)
pd.read_excel(filename)
pd.read_sql(query, connection_object)
pd.read_json(json_string)
pd.read_html(url)
pd.read_clipboard()
pd.DataFrame(dict)

Data Cleaning

df.columns = ['a','b','c']
pd.isnull(df)
pd.notnull(df)
df.dropna()
df.dropna(axis=1)
df.dropna(axis=1,thresh=n)
df.fillna(x)
s.fillna(s.mean())
s.astype(float)
s.replace(1,'one')
s.replace([1,3],['one','three'])
df.rename(columns=lambda x: x + 1)
df.rename(columns={'old_name': 'new_name'})
df.set_index('column_one')
df.rename(index=lambda x: x + 1)
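
As a rough illustration of how the importing and cleaning calls fit together, the sketch below reads a small CSV and tidies it up. The CSV text, the column names, and the replacement labels are made up for this example, and the data is read from an in-memory string (via io.StringIO) rather than a real file so the snippet runs anywhere.

```python
import io
import pandas as pd

# Hypothetical CSV content, kept in memory so no file on disk is needed.
csv_text = "a,b,c\n1,2.0,x\n3,,y\n,6.0,z\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask of missing values, then the total number of missing cells.
missing = pd.isnull(df)
n_missing = int(missing.sum().sum())

# Keep only the rows that have no missing values.
complete_rows = df.dropna()

# Fill the gaps in column 'b' with that column's mean.
b_filled = df['b'].fillna(df['b'].mean())

# Replace several values at once in a Series.
labels = df['c'].replace(['x', 'y'], ['ex', 'why'])

# Rename one column, then use it as the index.
renamed = df.rename(columns={'c': 'label'}).set_index('label')

print(n_missing)
print(complete_rows)
print(b_filled.tolist())
print(renamed)
```

Note that fillna, replace, rename, and set_index each return a new object here; df itself is left unchanged unless the result is assigned back.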

Exploring Data

df.shape
df.head(n)
df.tail(n)
df.info()
df.describe()
s.value_counts(dropna=False)
df.apply(pd.Series.value_counts)
df.mean()
df.corr()
df.count()
df.max()
df.min()
df.median()
df.std()

Filter, Sort and Group By

df[df[col] > 0.5]
df[(df[col] > 0.5) & (df[col] < 0.7)]
df.sort_values(col1)
df.sort_values(col2,ascending=False)
df.sort_values([col1,col2], ascending=[True,False])
df.groupby(col)
df.groupby([col1,col2])
df.groupby(col1)[col2].mean()
df.pivot_table(index=col1, values=[col2,col3], aggfunc='mean')
df.groupby(col1).agg(np.mean)
df.apply(np.mean)
df.apply(np.max, axis=1)

Joining and Combining

df1.append(df2)  # removed in pandas 2.0; use pd.concat([df1, df2]) there
pd.concat([df1, df2],axis=1)
df1.join(df2,on=col1,how='inner')
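
The filtering, sorting, grouping, and combining calls above can be tried on a small DataFrame. This is only a sketch: the team, score, and city columns and all of their values are invented for illustration, and pd.concat is used for row-wise stacking because it works across pandas versions.

```python
import pandas as pd

# Made-up data: two teams with two scores each.
df = pd.DataFrame({
    'team': ['x', 'x', 'y', 'y'],
    'score': [0.2, 0.6, 0.65, 0.9],
})

# Rows where 0.5 < score < 0.7; each condition needs its own parentheses.
in_range = df[(df['score'] > 0.5) & (df['score'] < 0.7)]

# Sort by score, highest first.
by_score = df.sort_values('score', ascending=False)

# Mean score per team, first with groupby, then as a pivot table.
team_means = df.groupby('team')['score'].mean()
pivot = df.pivot_table(index='team', values=['score'], aggfunc='mean')

# Stack the rows of a second DataFrame under the first.
extra = pd.DataFrame({'team': ['z'], 'score': [0.5]})
stacked = pd.concat([df, extra], ignore_index=True)

# Join on the 'team' column of df against the index of another DataFrame.
cities = pd.DataFrame({'city': ['Oslo', 'Lima']}, index=['x', 'y'])
joined = df.join(cities, on='team', how='inner')

print(in_range)
print(team_means)
print(joined)
```

In df.join, the on argument names a column in the calling DataFrame, and it is matched against the index of the other one, which is why cities is indexed by team name here.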
Selecting

df[col]
df[[col1, col2]]
s.iloc[0]
s.loc[0]
df.iloc[0,:]
df.iloc[0,0]

Writing Data

df.to_csv(filename)
df.to_excel(filename)
df.to_sql(table_name, connection_object)
df.to_json(filename)
df.to_html(filename)
df.to_clipboard()
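
To make the selection and output calls concrete, here is a minimal sketch on an invented three-row DataFrame. The column names and values are assumptions for the example, and the output is produced as strings (by calling to_csv and to_json without a filename) so that nothing is written to disk.

```python
import pandas as pd

# Made-up data with the default integer index 0, 1, 2.
df = pd.DataFrame({'col1': [10, 20, 30], 'col2': ['a', 'b', 'c']})

s = df['col1']               # one column, returned as a Series
sub = df[['col1', 'col2']]   # several columns, returned as a DataFrame

first_by_position = s.iloc[0]  # selection by integer position
first_by_label = s.loc[0]      # selection by index label (0 is a label here)
first_row = df.iloc[0, :]      # first row as a Series
first_cell = df.iloc[0, 0]     # first element of the first column

# With no filename, to_csv and to_json return the output as a string.
csv_text = df.to_csv(index=False)
json_text = df.to_json()

print(first_row)
print(csv_text)
print(json_text)
```

With the default index, s.iloc[0] and s.loc[0] return the same value; they differ once the index labels are no longer 0, 1, 2, for example after set_index or after filtering rows.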

ELITEDATASCIENCE.COM
