
Python For Data Science 1662157639

Sheet Python

Uploaded by

fabrizio lucerof
Copyright
© © All Rights Reserved

Python For Data Science Cheat Sheet
Python Basics
Learn More Python for Data Science Interactively at www.datacamp.com

Variables and Data Types

Variable Assignment
>>> x=5
>>> x
5

Calculations With Variables
>>> x+2                        Sum of two variables
7
>>> x-2                        Subtraction of two variables
3
>>> x*2                        Multiplication of two variables
10
>>> x**2                       Exponentiation of a variable
25
>>> x%2                        Remainder of a variable
1
>>> x/float(2)                 Division of a variable
2.5

Types and Type Conversion
str()      '5', '3.45', 'True'    Variables to strings
int()      5, 3, 1                Variables to integers
float()    5.0, 1.0               Variables to floats
bool()     True, True, True       Variables to booleans

Asking For Help
>>> help(str)

Strings
>>> my_string = 'thisStringIsAwesome'
>>> my_string
'thisStringIsAwesome'

String Operations
>>> my_string * 2
'thisStringIsAwesomethisStringIsAwesome'
>>> my_string + 'Innit'
'thisStringIsAwesomeInnit'
>>> 'm' in my_string
True

String Operations    Index starts at 0
>>> my_string[3]
>>> my_string[4:9]

String Methods
>>> my_string.upper()              String to uppercase
>>> my_string.lower()              String to lowercase
>>> my_string.count('w')           Count String elements
>>> my_string.replace('e', 'i')    Replace String elements
>>> my_string.strip()              Strip whitespaces

Lists    Also see NumPy Arrays
>>> a = 'is'
>>> b = 'nice'
>>> my_list = ['my', 'list', a, b]
>>> my_list2 = [[4,5,6,7], [3,4,5,6]]

Selecting List Elements    Index starts at 0
Subset
>>> my_list[1]                 Select item at index 1
>>> my_list[-3]                Select 3rd last item
Slice
>>> my_list[1:3]               Select items at index 1 and 2
>>> my_list[1:]                Select items after index 0
>>> my_list[:3]                Select items before index 3
>>> my_list[:]                 Copy my_list
Subset Lists of Lists
>>> my_list2[1][0]             my_list[list][itemOfList]
>>> my_list2[1][:2]

List Operations
>>> my_list + my_list
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list * 2
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list2 > 4               Python 2 only; raises TypeError in Python 3
True

List Methods
>>> my_list.index(a)           Get the index of an item
>>> my_list.count(a)           Count an item
>>> my_list.append('!')        Append an item at a time
>>> my_list.remove('!')        Remove an item
>>> del(my_list[0:1])          Remove an item
>>> my_list.reverse()          Reverse the list
>>> my_list.extend('!')        Append an item
>>> my_list.pop(-1)            Remove an item
>>> my_list.insert(0,'!')      Insert an item
>>> my_list.sort()             Sort the list

Libraries
Import libraries
>>> import numpy
>>> import numpy as np
Selective import
>>> from math import pi

[Library logos, captioned: Data analysis, Machine learning, Scientific computing, 2D plotting]

Install Python
[Logos, captioned:]
Leading open data science platform powered by Python
Free IDE that is included with Anaconda
Create and share documents with live code, visualizations, text, ...

Numpy Arrays    Also see Lists
>>> my_list = [1, 2, 3, 4]
>>> my_array = np.array(my_list)
>>> my_2darray = np.array([[1,2,3],[4,5,6]])

Selecting Numpy Array Elements    Index starts at 0
Subset
>>> my_array[1]                Select item at index 1
2
Slice
>>> my_array[0:2]              Select items at index 0 and 1
array([1, 2])
Subset 2D Numpy arrays
>>> my_2darray[:,0]            my_2darray[rows, columns]
array([1, 4])

Numpy Array Operations
>>> my_array > 3
array([False, False, False, True])
>>> my_array * 2
array([2, 4, 6, 8])
>>> my_array + np.array([5, 6, 7, 8])
array([6, 8, 10, 12])

Numpy Array Functions
>>> my_array.shape                      Get the dimensions of the array
>>> np.append(my_array, other_array)    Append items to an array
>>> np.insert(my_array, 1, 5)           Insert items in an array
>>> np.delete(my_array,[1])             Delete items in an array
>>> np.mean(my_array)                   Mean of the array
>>> np.median(my_array)                 Median of the array
>>> np.corrcoef(my_array)               Correlation coefficient
>>> np.std(my_array)                    Standard deviation

DataCamp
Learn Python for Data Science Interactively
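
The list methods and slices on this sheet are easy to mix up, so here is a minimal sketch of three behaviours: `append` versus `extend`, copying with a full slice, and negative indexing. The variable names `letters`, `original` and `copy_of_original` are illustrative and do not come from the sheet.

```python
# append adds its argument as ONE element; extend iterates over its argument.
letters = ['a', 'b']
letters.append('cd')       # adds the single string 'cd'
letters.extend('ef')       # a string is iterable, so 'e' and 'f' are added separately

# A full slice makes a shallow copy: changing the copy leaves the original alone.
original = [1, 2, 3]
copy_of_original = original[:]
copy_of_original[0] = 99

# Negative indices count from the end; slices exclude the stop index.
my_list = ['my', 'list', 'is', 'nice']
third_last = my_list[-3]
middle = my_list[1:3]

print(letters)             # ['a', 'b', 'cd', 'e', 'f']
print(original)            # [1, 2, 3]
print(third_last, middle)  # list ['list', 'is']
```

Note that `my_list.extend('!')` on the sheet behaves like `append('!')` only because the string has a single character.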
Python For Data Science Cheat Sheet
Pandas Basics

Pandas
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Use the following import convention:
>>> import pandas as pd

Pandas Data Structures

Series
A one-dimensional labeled array capable of holding any data type

Index
a     3
b    -5
c     7
d     4

>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame
A two-dimensional labeled data structure with columns of potentially different types

Index  Columns
       Country  Capital    Population
0      Belgium  Brussels   11190846
1      India    New Delhi  1303171035
2      Brazil   Brasília   207847528

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasília'],
            'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
                      columns=['Country', 'Capital', 'Population'])

Asking For Help
>>> help(pd.Series.loc)

Selection    Also see NumPy Arrays

Getting
>>> s['b']                       Get one element
-5
>>> df[1:]                       Get subset of a DataFrame
   Country  Capital    Population
1  India    New Delhi  1303171035
2  Brazil   Brasília   207847528

Selecting, Boolean Indexing & Setting
By Position
>>> df.iloc[0, 0]                Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'
By Label
>>> df.loc[0, 'Country']         Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
By Label/Position    (.ix was removed in pandas 1.0; use .loc or .iloc there)
>>> df.ix[2]                     Select single row of subset of rows
Country       Brazil
Capital       Brasília
Population    207847528
>>> df.ix[:,'Capital']           Select a single column of subset of columns
0    Brussels
1    New Delhi
2    Brasília
>>> df.ix[1,'Capital']           Select rows and columns
'New Delhi'
Boolean Indexing
>>> s[~(s > 1)]                          Series s where value is not >1
>>> s[(s < -1) | (s > 2)]                s where value is <-1 or >2
>>> df[df['Population']>1200000000]      Use filter to adjust DataFrame
Setting
>>> s['a'] = 6                   Set index a of Series s to 6

Dropping
>>> s.drop(['a', 'c'])           Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)   Drop values from columns (axis=1)

Sort & Rank
>>> df.sort_index()                  Sort by labels along an axis
>>> df.sort_values(by='Country')     Sort by the values along an axis
>>> df.rank()                        Assign ranks to entries

Retrieving Series/DataFrame Information
Basic Information
>>> df.shape                     (rows,columns)
>>> df.index                     Describe index
>>> df.columns                   Describe DataFrame columns
>>> df.info()                    Info on DataFrame
>>> df.count()                   Number of non-NA values
Summary
>>> df.sum()                     Sum of values
>>> df.cumsum()                  Cumulative sum of values
>>> df.min()/df.max()            Minimum/maximum values
>>> df.idxmin()/df.idxmax()      Minimum/Maximum index value
>>> df.describe()                Summary statistics
>>> df.mean()                    Mean of values
>>> df.median()                  Median of values

Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)                  Apply function
>>> df.applymap(f)               Apply function element-wise

Data Alignment
Internal Data Alignment
NA values are introduced in the indices that don’t overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
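
As a rough illustration of the alignment rule, the sketch below rebuilds `s` and `s3` from the sheet and compares plain `+` with the `add` method and its `fill_value` argument. The names `plain_sum` and `filled_sum` are made up for this example.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Values are matched by index label, not by position.
plain_sum = s + s3                      # 'b' has no partner in s3, so the result is NaN
filled_sum = s.add(s3, fill_value=0)    # the missing s3['b'] is treated as 0 instead

print(plain_sum)
print(filled_sum)
```

Because NaN is a float, both results come back with a floating-point dtype even though the inputs hold integers.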

Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)

I/O

Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')

Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')

Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query()
>>> df.to_sql('myDf', engine)
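
The SQL round trip can be tried without any database file. This is a minimal sketch that swaps the sheet's SQLAlchemy engine for a standard-library `sqlite3` connection, which pandas also accepts; the shortened DataFrame and the `WHERE` clause are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

conn = sqlite3.connect(':memory:')          # throwaway in-memory database
df.to_sql('my_table', conn, index=False)    # to_sql is a DataFrame method, not pd.to_sql
result = pd.read_sql(
    "SELECT * FROM my_table WHERE Population > 1000000000;", conn)
conn.close()

print(result)
```

Only the row for India exceeds the threshold, so `result` holds a single row.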
Python For Data Science Cheat Sheet
NumPy Basics

NumPy
The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

Use the following import convention:
>>> import numpy as np

NumPy Arrays
[Figure: a 1D array (1 2 3) along axis 0; a 2D array (rows 1.5 2 3 and 4 5 6) with axis 0 and axis 1; a 3D array with axis 0, axis 1 and axis 2]

Creating Arrays
>>> a = np.array([1,2,3])
>>> b = np.array([(1.5,2,3), (4,5,6)], dtype = float)
>>> c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]],
                 dtype = float)

Initial Placeholders
>>> np.zeros((3,4))                    Create an array of zeros
>>> np.ones((2,3,4),dtype=np.int16)    Create an array of ones
>>> d = np.arange(10,25,5)             Create an array of evenly spaced values (step value)
>>> np.linspace(0,2,9)                 Create an array of evenly spaced values (number of samples)
>>> e = np.full((2,2),7)               Create a constant array
>>> f = np.eye(2)                      Create a 2X2 identity matrix
>>> np.random.random((2,2))            Create an array with random values
>>> np.empty((3,2))                    Create an empty array

I/O
Saving & Loading On Disk
>>> np.save('my_array', a)
>>> np.savez('array.npz', a, b)
>>> np.load('my_array.npy')
Saving & Loading Text Files
>>> np.loadtxt("myfile.txt")
>>> np.genfromtxt("my_file.csv", delimiter=',')
>>> np.savetxt("myarray.txt", a, delimiter=" ")

Data Types
>>> np.int64         Signed 64-bit integer types
>>> np.float32       Standard single-precision floating point
>>> np.complex128    Complex numbers represented by two 64-bit floats
>>> np.bool_         Boolean type storing TRUE and FALSE values
>>> np.object_       Python object type
>>> np.bytes_        Fixed-length string type (np.string_ before NumPy 2.0)
>>> np.str_          Fixed-length unicode type (np.unicode_ before NumPy 2.0)

Inspecting Your Array
>>> a.shape          Array dimensions
>>> len(a)           Length of array
>>> b.ndim           Number of array dimensions
>>> e.size           Number of array elements
>>> b.dtype          Data type of array elements
>>> b.dtype.name     Name of data type
>>> b.astype(int)    Convert an array to a different type

Asking For Help
>>> np.info(np.ndarray.dtype)

Array Mathematics
Arithmetic Operations
>>> g = a - b                Subtraction
array([[-0.5,  0. ,  0. ],
       [-3. , -3. , -3. ]])
>>> np.subtract(a,b)         Subtraction
>>> b + a                    Addition
array([[ 2.5,  4. ,  6. ],
       [ 5. ,  7. ,  9. ]])
>>> np.add(b,a)              Addition
>>> a / b                    Division
array([[ 0.66666667,  1.        ,  1.        ],
       [ 0.25      ,  0.4       ,  0.5       ]])
>>> np.divide(a,b)           Division
>>> a * b                    Multiplication
array([[  1.5,   4. ,   9. ],
       [  4. ,  10. ,  18. ]])
>>> np.multiply(a,b)         Multiplication
>>> np.exp(b)                Exponentiation
>>> np.sqrt(b)               Square root
>>> np.sin(a)                Print sines of an array
>>> np.cos(b)                Element-wise cosine
>>> np.log(a)                Element-wise natural logarithm
>>> e.dot(f)                 Dot product
array([[ 7.,  7.],
       [ 7.,  7.]])

Comparison
>>> a == b                   Element-wise comparison
array([[False,  True,  True],
       [False, False, False]])
>>> a < 2                    Element-wise comparison
array([ True, False, False])
>>> np.array_equal(a, b)     Array-wise comparison

Aggregate Functions
>>> a.sum()                  Array-wise sum
>>> a.min()                  Array-wise minimum value
>>> b.max(axis=0)            Maximum value along axis 0 (one value per column)
>>> b.cumsum(axis=1)         Cumulative sum of the elements
>>> a.mean()                 Mean
>>> np.median(b)             Median
>>> np.corrcoef(a)           Correlation coefficient
>>> np.std(b)                Standard deviation

Copying Arrays
>>> h = a.view()             Create a view of the array with the same data
>>> np.copy(a)               Create a copy of the array
>>> h = a.copy()             Create a deep copy of the array

Sorting Arrays
>>> a.sort()                 Sort an array
>>> c.sort(axis=0)           Sort the elements of an array's axis

Subsetting, Slicing, Indexing    Also see Lists
Subsetting
>>> a[2]                     Select the element at the 2nd index
3
>>> b[1,2]                   Select the element at row 1 column 2 (equivalent to b[1][2])
6.0
Slicing
>>> a[0:2]                   Select items at index 0 and 1
array([1, 2])
>>> b[0:2,1]                 Select items at rows 0 and 1 in column 1
array([ 2.,  5.])
>>> b[:1]                    Select all items at row 0 (equivalent to b[0:1, :])
array([[1.5, 2., 3.]])
>>> c[1,...]                 Same as [1,:,:]
array([[ 3.,  2.,  1.],
       [ 4.,  5.,  6.]])
>>> a[ : :-1]                Reversed array a
array([3, 2, 1])
Boolean Indexing
>>> a[a<2]                   Select elements from a less than 2
array([1])
Fancy Indexing
>>> b[[1, 0, 1, 0],[0, 1, 2, 0]]    Select elements (1,0),(0,1),(1,2) and (0,0)
array([ 4. ,  2. ,  6. ,  1.5])
>>> b[[1, 0, 1, 0]][:,[0,1,2,0]]    Select a subset of the matrix’s rows and columns
array([[ 4. ,  5. ,  6. ,  4. ],
       [ 1.5,  2. ,  3. ,  1.5],
       [ 4. ,  5. ,  6. ,  4. ],
       [ 1.5,  2. ,  3. ,  1.5]])

Array Manipulation
Transposing Array
>>> i = np.transpose(b)      Permute array dimensions
>>> i.T                      Permute array dimensions
Changing Array Shape
>>> b.ravel()                Flatten the array
>>> g.reshape(3,-1)          Reshape, but don’t change data
Adding/Removing Elements
>>> h.resize((2,6))          Resize h in place to shape (2,6); returns None
>>> np.append(h,g)           Append items to an array
>>> np.insert(a, 1, 5)       Insert items in an array
>>> np.delete(a,[1])         Delete items from an array
Combining Arrays
>>> np.concatenate((a,d),axis=0)    Concatenate arrays
array([ 1,  2,  3, 10, 15, 20])
>>> np.vstack((a,b))                Stack arrays vertically (row-wise)
array([[ 1. ,  2. ,  3. ],
       [ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])
>>> np.r_[e,f]                      Stack arrays vertically (row-wise)
>>> np.hstack((e,f))                Stack arrays horizontally (column-wise)
array([[ 7.,  7.,  1.,  0.],
       [ 7.,  7.,  0.,  1.]])
>>> np.column_stack((a,d))          Create stacked column-wise arrays
array([[ 1, 10],
       [ 2, 15],
       [ 3, 20]])
>>> np.c_[a,d]                      Create stacked column-wise arrays
Splitting Arrays
>>> np.hsplit(a,3)                  Split the array horizontally into 3 equal parts
[array([1]),array([2]),array([3])]
>>> np.vsplit(c,2)                  Split the array vertically into 2 equal parts
[array([[[ 1.5,  2. ,  3. ],
         [ 4. ,  5. ,  6. ]]]),
 array([[[ 3.,  2.,  1.],
         [ 4.,  5.,  6.]]])]
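
Two ideas on this page benefit from being run once: arithmetic between arrays of different shapes (broadcasting) and the difference between a view and a copy. The sketch below reuses `a` and `b` from the sheet; `view_of_a` and `copy_of_a` are names chosen for this example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

# Broadcasting: a has shape (3,), b has shape (2, 3), so a is reused for each row of b.
g = a - b

# A view shares memory with the original array; a copy owns its own data.
view_of_a = a.view()
copy_of_a = a.copy()
view_of_a[0] = 100       # also changes a[0]
copy_of_a[1] = -1        # leaves a untouched

print(g)
print(a)                 # [100   2   3]
```

This is also why `h.resize(...)` is safer on `h = a.copy()` than on `h = a.view()`: a view does not own its data.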
Python For Data Science Cheat Sheet
Pandas

Reshaping Data

Pivot
>>> df3 = df2.pivot(index='Date',            Spread rows into columns
                    columns='Type',
                    values='Value')

df2:
   Date        Type  Value
0  2016-03-01  a     11.432
1  2016-03-02  b     13.031
2  2016-03-01  c     20.784
3  2016-03-03  a     99.906
4  2016-03-02  a     1.303
5  2016-03-03  c     20.784

df3:
Type        a       b       c
Date
2016-03-01  11.432  NaN     20.784
2016-03-02  1.303   13.031  NaN
2016-03-03  99.906  NaN     20.784

Pivot Table
>>> df4 = pd.pivot_table(df2,                Spread rows into columns
                         values='Value',
                         index='Date',
                         columns='Type')

Stack / Unstack
>>> stacked = df5.stack()                    Pivot a level of column labels
>>> stacked.unstack()                        Pivot a level of index labels

Unstacked:
      0         1
1 5   0.233482  0.390959
2 4   0.184713  0.237102
3 3   0.433522  0.429401

Stacked:
1 5 0   0.233482
    1   0.390959
2 4 0   0.184713
    1   0.237102
3 3 0   0.433522
    1   0.429401

Melt
>>> pd.melt(df2,                             Gather columns into rows
            id_vars=["Date"],
            value_vars=["Type", "Value"],
            value_name="Observations")

    Date        Variable  Observations
0   2016-03-01  Type      a
1   2016-03-02  Type      b
2   2016-03-01  Type      c
3   2016-03-03  Type      a
4   2016-03-02  Type      a
5   2016-03-03  Type      c
6   2016-03-01  Value     11.432
7   2016-03-02  Value     13.031
8   2016-03-01  Value     20.784
9   2016-03-03  Value     99.906
10  2016-03-02  Value     1.303
11  2016-03-03  Value     20.784

Iteration
>>> df.items()                   (Column-index, Series) pairs (iteritems() before pandas 2.0)
>>> df.iterrows()                (Row-index, Series) pairs

Advanced Indexing    Also see NumPy Arrays
Selecting
>>> df3.loc[:,(df3>1).any()]             Select cols with any vals >1
>>> df3.loc[:,(df3>1).all()]             Select cols with vals > 1
>>> df3.loc[:,df3.isnull().any()]        Select cols with NaN
>>> df3.loc[:,df3.notnull().all()]       Select cols without NaN
Indexing With isin
>>> df[(df.Country.isin(df2.Type))]      Find same elements
>>> df3.filter(items=["a","b"])          Filter on values
>>> df.select(lambda x: not x%5)         Select specific elements (removed in pandas 1.0)
Where
>>> s.where(s > 0)                       Subset the data
Query
>>> df6.query('second > first')          Query DataFrame

Setting/Resetting Index
>>> df.set_index('Country')              Set the index
>>> df4 = df.reset_index()               Reset the index
>>> df = df.rename(index=str,            Rename DataFrame
                   columns={"Country":"cntry",
                            "Capital":"cptl",
                            "Population":"ppltn"})

Reindexing
>>> s2 = s.reindex(['a','c','d','e','b'])
Forward Filling
>>> df.reindex(range(4),
               method='ffill')
   Country  Capital    Population
0  Belgium  Brussels   11190846
1  India    New Delhi  1303171035
2  Brazil   Brasília   207847528
3  Brazil   Brasília   207847528
Backward Filling
>>> s3 = s.reindex(range(5),
                   method='bfill')
0    3
1    3
2    3
3    3
4    3

MultiIndexing
>>> arrays = [np.array([1,2,3]),
              np.array([5,4,3])]
>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples,
                                      names=['first', 'second'])
>>> df6 = pd.DataFrame(np.random.rand(3, 2), index=index)
>>> df2.set_index(["Date", "Type"])

Duplicate Data
>>> s3.unique()                                  Return unique values
>>> df2.duplicated('Type')                       Check duplicates
>>> df2.drop_duplicates('Type', keep='last')     Drop duplicates
>>> df.index.duplicated()                        Check index duplicates

Grouping Data
Aggregation
>>> df2.groupby(by=['Date','Type']).mean()
>>> df4.groupby(level=0).sum()
>>> df4.groupby(level=0).agg({'a':lambda x:sum(x)/len(x),
                              'b': np.sum})
Transformation
>>> customSum = lambda x: (x+x%2)
>>> df4.groupby(level=0).transform(customSum)

Missing Data
>>> df.dropna()                  Drop NaN values
>>> df3.fillna(df3.mean())       Fill NaN values with a predetermined value
>>> df2.replace("a", "f")        Replace values with others

Combining Data

data1            data2
X1  X2           X1  X3
a   11.432       a   20.784
b   1.303        b   NaN
c   99.906       d   20.784

Merge
>>> pd.merge(data1,
             data2,
             how='left',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN
c   99.906  NaN

>>> pd.merge(data1,
             data2,
             how='right',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN
d   NaN     20.784

>>> pd.merge(data1,
             data2,
             how='inner',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN

>>> pd.merge(data1,
             data2,
             how='outer',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN
c   99.906  NaN
d   NaN     20.784

Join
>>> data1.join(data2, how='right')

Concatenate
Vertical
>>> pd.concat([s, s2])           (s.append(s2) before pandas 2.0)
Horizontal/Vertical
>>> pd.concat([s,s2],axis=1, keys=['One','Two'])
>>> pd.concat([data1, data2], axis=1, join='inner')

Dates
>>> from datetime import datetime
>>> df2['Date']= pd.to_datetime(df2['Date'])
>>> df2['Date']= pd.date_range('2000-1-1',
                               periods=6,
                               freq='M')
>>> dates = [datetime(2012,5,1), datetime(2012,5,2)]
>>> index = pd.DatetimeIndex(dates)
>>> index = pd.date_range(datetime(2012,2,1), end, freq='BM')

Visualization    Also see Matplotlib
>>> import matplotlib.pyplot as plt
>>> s.plot()
>>> plt.show()
>>> df2.plot()
>>> plt.show()
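
To see how the four `how` options of `pd.merge` differ, the following sketch rebuilds `data1` and `data2` from the tables on this page and records which `X1` keys survive each merge. The dictionary `keys_by_how` is a helper invented for this example.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({'X1': ['a', 'b', 'c'], 'X2': [11.432, 1.303, 99.906]})
data2 = pd.DataFrame({'X1': ['a', 'b', 'd'], 'X3': [20.784, np.nan, 20.784]})

keys_by_how = {}
for how in ['left', 'right', 'inner', 'outer']:
    merged = pd.merge(data1, data2, how=how, on='X1')
    # Keep only the join keys so the effect of `how` is easy to compare.
    keys_by_how[how] = merged['X1'].tolist()
    print(how, keys_by_how[how])
```

A left merge keeps every key of `data1`, a right merge every key of `data2`, an inner merge only the shared keys, and an outer merge the union of both.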
Python For Data Science Cheat Sheet
Importing Data

Importing Data in Python
Most of the time, you’ll use either NumPy or pandas to import your data:
>>> import numpy as np
>>> import pandas as pd

Help
>>> np.info(np.ndarray.dtype)
>>> help(pd.read_csv)

Text Files
Plain Text Files
>>> filename = 'huck_finn.txt'
>>> file = open(filename, mode='r')      Open the file for reading
>>> text = file.read()                   Read a file’s contents
>>> print(file.closed)                   Check whether file is closed
>>> file.close()                         Close file
>>> print(text)
Using the context manager with
>>> with open('huck_finn.txt', 'r') as file:
        print(file.readline())           Read a single line
        print(file.readline())
        print(file.readline())

Table Data: Flat Files
Importing Flat Files with numpy
Files with one data type
>>> filename = 'mnist.txt'
>>> data = np.loadtxt(filename,
                      delimiter=',',     String used to separate values
                      skiprows=2,        Skip the first 2 lines
                      usecols=[0,2],     Read the 1st and 3rd column
                      dtype=str)         The type of the resulting array
Files with mixed data types
>>> filename = 'titanic.csv'
>>> data = np.genfromtxt(filename,
                         delimiter=',',
                         names=True,     Look for column header
                         dtype=None)
>>> data_array = np.recfromcsv(filename)
The default dtype of the np.recfromcsv() function is None.

Importing Flat Files with pandas
>>> filename = 'winequality-red.csv'
>>> data = pd.read_csv(filename,
                       nrows=5,          Number of rows of file to read
                       header=None,      Row number to use as col names
                       sep='\t',         Delimiter to use
                       comment='#',      Character to split comments
                       na_values=[""])   String to recognize as NA/NaN

Excel Spreadsheets
>>> file = 'urbanpop.xlsx'
>>> data = pd.ExcelFile(file)
>>> df_sheet2 = data.parse('1960-1966',
                           skiprows=[0],
                           names=['Country',
                                  'AAM: War(2002)'])
>>> df_sheet1 = data.parse(0,
                           usecols=[0],
                           skiprows=[0],
                           names=['Country'])
To access the sheet names, use the sheet_names attribute:
>>> data.sheet_names

SAS Files
>>> from sas7bdat import SAS7BDAT
>>> with SAS7BDAT('urbanpop.sas7bdat') as file:
        df_sas = file.to_data_frame()

Stata Files
>>> data = pd.read_stata('urbanpop.dta')

Relational Databases
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///Northwind.sqlite')
Use the table_names() method to fetch a list of table names:
>>> table_names = engine.table_names()

Querying Relational Databases
>>> con = engine.connect()
>>> rs = con.execute("SELECT * FROM Orders")
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()
Using the context manager with
>>> with engine.connect() as con:
        rs = con.execute("SELECT OrderID FROM Orders")
        df = pd.DataFrame(rs.fetchmany(size=5))
        df.columns = rs.keys()
Querying relational databases with pandas
>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)

Exploring Your Data
NumPy Arrays
>>> data_array.dtype             Data type of array elements
>>> data_array.shape             Array dimensions
>>> len(data_array)              Length of array
pandas DataFrames
>>> df.head()                    Return first DataFrame rows
>>> df.tail()                    Return last DataFrame rows
>>> df.index                     Describe index
>>> df.columns                   Describe DataFrame columns
>>> df.info()                    Info on DataFrame
>>> data_array = data.values     Convert a DataFrame to a NumPy array

Pickled Files
>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file:
        pickled_data = pickle.load(file)

HDF5 Files
>>> import h5py
>>> filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
>>> data = h5py.File(filename, 'r')

Matlab Files
>>> import scipy.io
>>> filename = 'workspace.mat'
>>> mat = scipy.io.loadmat(filename)

Exploring Dictionaries
Accessing Elements with Functions
>>> print(mat.keys())            Print dictionary keys
>>> for key in data.keys():      Print dictionary keys
        print(key)
meta
quality
strain
>>> pickled_data.values()        Return dictionary values
>>> print(mat.items())           Returns items in list format of (key, value) tuple pairs
Accessing Data Items with Keys
>>> for key in data['meta'].keys():      Explore the HDF5 structure
        print(key)
Description
DescriptionURL
Detector
Duration
GPSstart
Observatory
Type
UTCstart
>>> print(data['meta']['Description'].value)     Retrieve the value for a key

Navigating Your FileSystem
Magic Commands
!ls          List directory contents of files and directories
%cd ..       Change current working directory
%pwd         Return the current working directory path
os Library
>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd()             Store the name of current directory in a string
>>> os.listdir(wd)               Output contents of the directory in a list
>>> os.chdir(path)               Change current working directory
>>> os.rename("test1.txt",       Rename a file
              "test2.txt")
>>> os.remove("test1.txt")       Delete an existing file
>>> os.mkdir("newdir")           Create a new directory
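
The pickle and `os` snippets assume files that already exist on disk. As a self-contained sketch, the code below first writes a small pickle into a temporary directory and then reads it back; the `fruit` dictionary and the use of `tempfile` are assumptions added for illustration, and only the file name `pickled_fruit.pkl` comes from the sheet.

```python
import os
import pickle
import tempfile

fruit = {'apples': 3, 'pears': 5}    # hypothetical data to pickle

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'pickled_fruit.pkl')

    # Pickle files are binary, so open them with 'wb' and 'rb'.
    with open(path, 'wb') as file:
        pickle.dump(fruit, file)
    with open(path, 'rb') as file:
        pickled_data = pickle.load(file)

    listing = os.listdir(tmp_dir)    # contents of the directory as a list

print(pickled_data.keys())
print(listing)
```

Using the context manager here means each file is closed even if reading or writing raises an exception.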
Python For Data Science Cheat Sheet
Jupyter Notebook

Saving/Loading Notebooks
Create new notebook
Open an existing notebook
Make a copy of the current notebook
Rename notebook
Save current notebook and record checkpoint
Revert notebook to a previous checkpoint
Preview of the printed notebook
Download notebook as
- IPython notebook
- Python
- HTML
- Markdown
- reST
- LaTeX
- PDF
Close notebook & stop running any scripts

Writing Code And Text
Code and text are encapsulated by 3 basic cell types: markdown cells, code cells, and raw NBConvert cells.

Edit Cells
Cut currently selected cells to clipboard
Copy cells from clipboard to current cursor position
Paste cells from clipboard above current cell
Paste cells from clipboard below current cell
Paste cells from clipboard on top of current cell
Delete current cells
Revert “Delete Cells” invocation
Split up a cell from current cursor position
Merge current cell with the one above
Merge current cell with the one below
Move current cell up
Move current cell down
Adjust metadata underlying the current notebook
Find and replace in selected cells
Remove cell attachments
Copy attachments of current cell
Paste attachments of current cell
Insert image in selected cells

Insert Cells
Add new cell above the current one
Add new cell below the current one

Executing Cells
Run selected cell(s)
Run current cells down and create a new one below
Run current cells down and create a new one above
Run all cells
Run all cells above the current cell
Run all cells below the current cell
Change the cell type of current cell
toggle, toggle scrolling and clear current outputs
toggle, toggle scrolling and clear all output

View Cells
Toggle display of Jupyter logo and filename
Toggle display of toolbar
Toggle line numbers in cells
Toggle display of cell action icons:
- None
- Edit metadata
- Raw cell format
- Slideshow
- Attachments
- Tags

Working with Different Programming Languages
Kernels provide computation and communication with front-end interfaces like the notebooks. There are three main kernels: IPython, IRkernel and IJulia.
Installing Jupyter Notebook will automatically install the IPython kernel.
Restart kernel
Interrupt kernel
Interrupt kernel & clear all output
Restart kernel & run all cells
Connect back to a remote notebook
Run other installed kernels

Widgets
Notebook widgets provide the ability to visualize and control changes in your data, often as a control like a slider, textbox, etc.
You can use them to build interactive GUIs for your notebooks or to synchronize stateful and stateless information between Python and JavaScript.
Save notebook with interactive widgets
Download serialized state of all widget models in use
Embed current widgets

Command Mode and Edit Mode
[Figure: notebook toolbar in Command Mode and in Edit Mode, with numbered callouts 1 to 15]
1. Save and checkpoint
2. Insert cell below
3. Cut cell
4. Copy cell(s)
5. Paste cell(s) below
6. Move cell up
7. Move cell down
8. Run current cell
9. Interrupt kernel
10. Restart kernel
11. Display characteristics
12. Open command palette
13. Current kernel
14. Kernel status
15. Log out from notebook server

Asking For Help
Walk through a UI tour
List of built-in keyboard shortcuts
Edit the built-in keyboard shortcuts
Notebook help topics
Description of markdown available in notebook
Information on unofficial Jupyter Notebook extensions
Python help topics
IPython help topics
NumPy help topics
SciPy help topics
Matplotlib help topics
SymPy help topics
Pandas help topics
About Jupyter Notebook
