Python For Data Science Cheat Sheet
Python Basics
Learn More Python for Data Science Interactively at www.datacamp.com

Variables and Data Types

Variable Assignment
>>> x=5
>>> x
5

Calculations With Variables
>>> x+2  # Sum of two variables
7
>>> x-2  # Subtraction of two variables
3
>>> x*2  # Multiplication of two variables
10
>>> x**2  # Exponentiation of a variable
25
>>> x%2  # Remainder of a variable
1
>>> x/float(2)  # Division of a variable
2.5

Types and Type Conversion
str()    '5', '3.45', 'True'  # Variables to strings
int()    5, 3, 1  # Variables to integers
float()  5.0, 1.0  # Variables to floats
bool()   True, True, True  # Variables to booleans

Asking For Help
>>> help(str)

Strings
>>> my_string = 'thisStringIsAwesome'
>>> my_string
'thisStringIsAwesome'

String Operations
>>> my_string * 2
'thisStringIsAwesomethisStringIsAwesome'
>>> my_string + 'Innit'
'thisStringIsAwesomeInnit'
>>> 'm' in my_string
True

String Operations (index starts at 0)
>>> my_string[3]
>>> my_string[4:9]

String Methods
>>> my_string.upper()  # String to uppercase
>>> my_string.lower()  # String to lowercase
>>> my_string.count('w')  # Count String elements
>>> my_string.replace('e', 'i')  # Replace String elements
>>> my_string.strip()  # Strip whitespace

Lists (Also see NumPy Arrays)
>>> a = 'is'
>>> b = 'nice'
>>> my_list = ['my', 'list', a, b]
>>> my_list2 = [[4,5,6,7], [3,4,5,6]]

Selecting List Elements (index starts at 0)
Subset
>>> my_list[1]  # Select item at index 1
>>> my_list[-3]  # Select 3rd last item
Slice
>>> my_list[1:3]  # Select items at index 1 and 2
>>> my_list[1:]  # Select items after index 0
>>> my_list[:3]  # Select items before index 3
>>> my_list[:]  # Copy my_list
Subset Lists of Lists
>>> my_list2[1][0]  # my_list[list][itemOfList]
3
>>> my_list2[1][:2]

List Operations
>>> my_list + my_list
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list * 2
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list2 > 4  # Python 2 only; Python 3 raises TypeError when comparing a list with an int
True

List Methods
>>> my_list.index(a)  # Get the index of an item
>>> my_list.count(a)  # Count an item
>>> my_list.append('!')  # Append an item at a time
>>> my_list.remove('!')  # Remove an item
>>> del(my_list[0:1])  # Remove an item
>>> my_list.reverse()  # Reverse the list
>>> my_list.extend('!')  # Append each element of an iterable
>>> my_list.pop(-1)  # Remove and return the last item
>>> my_list.insert(0,'!')  # Insert an item
>>> my_list.sort()  # Sort the list

Libraries
Library categories shown: data analysis, machine learning, scientific computing, 2D plotting.
Import libraries
>>> import numpy
>>> import numpy as np
Selective import
>>> from math import pi

Install Python
Anaconda: leading open data science platform powered by Python.
Free IDE that is included with Anaconda.
Create and share documents with live code, visualizations, text, ...

Numpy Arrays (Also see Lists)
>>> my_list = [1, 2, 3, 4]
>>> my_array = np.array(my_list)
>>> my_2darray = np.array([[1,2,3],[4,5,6]])

Selecting Numpy Array Elements (index starts at 0)
Subset
>>> my_array[1]  # Select item at index 1
2
Slice
>>> my_array[0:2]  # Select items at index 0 and 1
array([1, 2])
Subset 2D Numpy arrays
>>> my_2darray[:,0]  # my_2darray[rows, columns]
array([1, 4])

Numpy Array Operations
>>> my_array > 3
array([False, False, False, True])
>>> my_array * 2
array([2, 4, 6, 8])
>>> my_array + np.array([5, 6, 7, 8])
array([6, 8, 10, 12])

Numpy Array Functions
>>> my_array.shape  # Get the dimensions of the array
>>> np.append(my_array, other_array)  # Append items to an array (other_array is any array-like)
>>> np.insert(my_array, 1, 5)  # Insert items in an array
>>> np.delete(my_array,[1])  # Delete items in an array
>>> np.mean(my_array)  # Mean of the array
>>> np.median(my_array)  # Median of the array
>>> np.corrcoef(my_array)  # Correlation coefficient
>>> np.std(my_array)  # Standard deviation


Pandas Basics

Pandas
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.
Use the following import convention:
>>> import pandas as pd

Pandas Data Structures

Series
A one-dimensional labeled array capable of holding any data type.
Index  Value
a      3
b      -5
c      7
d      4
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame
A two-dimensional labeled data structure with columns of potentially different types.
Index  Country  Capital    Population
0      Belgium  Brussels   11190846
1      India    New Delhi  1303171035
2      Brazil   Brasília   207847528
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasília'],
            'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
                      columns=['Country', 'Capital', 'Population'])

Asking For Help
>>> help(pd.Series.loc)

Selection (Also see NumPy Arrays)

Getting
>>> s['b']  # Get one element
-5
>>> df[1:]  # Get subset of a DataFrame
   Country  Capital    Population
1  India    New Delhi  1303171035
2  Brazil   Brasília   207847528

Selecting, Boolean Indexing & Setting
By Position
>>> df.iloc[0, 0]  # Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'
By Label
>>> df.loc[0, 'Country']  # Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
By Label/Position (the original sheet used df.ix, which was removed in pandas 1.0)
>>> df.iloc[2]  # Select single row of subset of rows
Country       Brazil
Capital       Brasília
Population    207847528
>>> df.loc[:,'Capital']  # Select a single column of subset of columns
0    Brussels
1    New Delhi
2    Brasília
>>> df.loc[1,'Capital']  # Select rows and columns
'New Delhi'
Boolean Indexing
>>> s[~(s > 1)]  # Series s where value is not >1
>>> s[(s < -1) | (s > 2)]  # s where value is <-1 or >2
>>> df[df['Population']>1200000000]  # Use filter to adjust DataFrame
Setting
>>> s['a'] = 6  # Set index a of Series s to 6

Dropping
>>> s.drop(['a', 'c'])  # Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)  # Drop values from columns (axis=1)

Sort & Rank
>>> df.sort_index()  # Sort by labels along an axis
>>> df.sort_values(by='Country')  # Sort by the values along an axis
>>> df.rank()  # Assign ranks to entries

Retrieving Series/DataFrame Information
Basic Information
>>> df.shape  # (rows,columns)
>>> df.index  # Describe index
>>> df.columns  # Describe DataFrame columns
>>> df.info()  # Info on DataFrame
>>> df.count()  # Number of non-NA values
Summary
>>> df.sum()  # Sum of values
>>> df.cumsum()  # Cumulative sum of values
>>> df.min()/df.max()  # Minimum/maximum values
>>> df.idxmin()/df.idxmax()  # Minimum/Maximum index value
>>> df.describe()  # Summary statistics
>>> df.mean()  # Mean of values
>>> df.median()  # Median of values

Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)  # Apply function
>>> df.applymap(f)  # Apply function element-wise (DataFrame.map in pandas 2.1 and later)

Data Alignment
Internal Data Alignment
NA values are introduced in the indices that don't overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b    NaN
c    5.0
d    7.0
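As a quick check of the alignment behaviour described above, the following minimal sketch rebuilds `s` and `s3` exactly as the sheet defines them and compares plain addition with `Series.add` and a fill value. The variable names `aligned` and `filled` are introduced here for illustration only.

```python
import pandas as pd

# Same Series as on the sheet
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Plain addition aligns on index labels; 'b' has no partner in s3,
# so the result at 'b' is NaN.
aligned = s + s3

# With fill_value=0 the missing s3['b'] is treated as 0 before adding.
filled = s.add(s3, fill_value=0)

print(aligned)
print(filled)
```

Both results are floats because NaN forces the integer Series to be upcast during alignment.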
Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c    5.0
d    7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)

I/O

Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')

Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')

Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query().
>>> df.to_sql('myDf', engine)


NumPy Basics

NumPy
The NumPy library is the core library for scientific computing in Python.
(The arrays a, b, c and e used below are the ones defined under Creating Arrays.)

Asking For Help
>>> np.info(np.ndarray.dtype)

Inspecting Your Array
>>> a.shape  # Array dimensions
>>> len(a)  # Length of array
>>> b.ndim  # Number of array dimensions
>>> e.size  # Number of array elements
>>> b.dtype  # Data type of array elements
>>> b.dtype.name  # Name of data type
>>> b.astype(int)  # Convert an array to a different type

Subsetting, Slicing, Indexing (Also see Lists)
Subsetting
>>> a[2]  # Select the element at the 2nd index
3
>>> b[1,2]  # Select the element at row 1 column 2 (equivalent to b[1][2])
6.0
Slicing
>>> a[0:2]  # Select items at index 0 and 1
array([1, 2])
>>> b[0:2,1]  # Select items at rows 0 and 1 in column 1
array([ 2., 5.])
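The inspection attributes and slices listed above can be tried on the same `a` and `b` that the sheet creates. This is a minimal sketch; the names on the left-hand side (`shape_b`, `col1`, and so on) are illustrative and not part of the original sheet.

```python
import numpy as np

# Arrays as defined under "Creating Arrays" on the sheet
a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

shape_b = b.shape          # tuple of dimension sizes: 2 rows, 3 columns
ndim_b = b.ndim            # number of dimensions
size_b = b.size            # total number of elements
name_b = b.dtype.name      # name of the element type
as_int = b.astype(int)     # new array; fractional parts are truncated

element = b[1, 2]          # row 1, column 2
col1 = b[0:2, 1]           # column 1 of rows 0 and 1

print(shape_b, ndim_b, size_b, name_b)
print(as_int)
print(element, col1)
```

Note that `astype` returns a new array and leaves `b` unchanged.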
NumPy provides a high-performance multidimensional array object, and tools for working with these arrays.
Use the following import convention:
>>> import numpy as np

NumPy Arrays
1D array (axis 0), 2D array (axis 0, axis 1), 3D array (axis 0, axis 1, axis 2)

Creating Arrays
>>> a = np.array([1,2,3])
>>> b = np.array([(1.5,2,3), (4,5,6)], dtype = float)
>>> c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]],
                 dtype = float)

Initial Placeholders
>>> np.zeros((3,4))  # Create an array of zeros
>>> np.ones((2,3,4),dtype=np.int16)  # Create an array of ones
>>> d = np.arange(10,25,5)  # Create an array of evenly spaced values (step value)
>>> np.linspace(0,2,9)  # Create an array of evenly spaced values (number of samples)
>>> e = np.full((2,2),7)  # Create a constant array
>>> f = np.eye(2)  # Create a 2X2 identity matrix
>>> np.random.random((2,2))  # Create an array with random values
>>> np.empty((3,2))  # Create an empty array

Slicing (continued)
>>> b[:1]  # Select all items at row 0 (equivalent to b[0:1, :])
array([[1.5, 2., 3.]])
>>> c[1,...]  # Same as [1,:,:]
array([[ 3., 2., 1.],
       [ 4., 5., 6.]])
>>> a[ : :-1]  # Reversed array a
array([3, 2, 1])
Boolean Indexing
>>> a[a<2]  # Select elements from a less than 2
array([1])
Fancy Indexing
>>> b[[1, 0, 1, 0],[0, 1, 2, 0]]  # Select elements (1,0),(0,1),(1,2) and (0,0)
array([ 4. , 2. , 6. , 1.5])
>>> b[[1, 0, 1, 0]][:,[0,1,2,0]]  # Select a subset of the matrix's rows and columns
array([[ 4. , 5. , 6. , 4. ],
       [ 1.5, 2. , 3. , 1.5],
       [ 4. , 5. , 6. , 4. ],
       [ 1.5, 2. , 3. , 1.5]])

Array Mathematics
Arithmetic Operations
>>> g = a - b  # Subtraction
array([[-0.5, 0. , 0. ],
       [-3. , -3. , -3. ]])
>>> np.subtract(a,b)  # Subtraction
>>> b + a  # Addition
array([[ 2.5, 4. , 6. ],
       [ 5. , 7. , 9. ]])
>>> np.add(b,a)  # Addition
>>> a / b  # Division
array([[ 0.66666667, 1. , 1. ],
       [ 0.25 , 0.4 , 0.5 ]])
>>> np.divide(a,b)  # Division
>>> a * b  # Multiplication
array([[ 1.5, 4. , 9. ],
       [ 4. , 10. , 18. ]])
>>> np.multiply(a,b)  # Multiplication
>>> np.exp(b)  # Exponentiation
>>> np.sqrt(b)  # Square root
>>> np.sin(a)  # Element-wise sine
>>> np.cos(b)  # Element-wise cosine
>>> np.log(a)  # Element-wise natural logarithm
>>> e.dot(f)  # Dot product
array([[ 7., 7.],
       [ 7., 7.]])

Comparison
>>> a == b  # Element-wise comparison
array([[False, True, True],
       [False, False, False]])
>>> a < 2  # Element-wise comparison
array([True, False, False])
>>> np.array_equal(a, b)  # Array-wise comparison

Aggregate Functions
>>> a.sum()  # Array-wise sum
>>> a.min()  # Array-wise minimum value
>>> b.max(axis=0)  # Maximum value of each column
>>> b.cumsum(axis=1)  # Cumulative sum of the elements along each row
>>> a.mean()  # Mean
>>> np.median(b)  # Median
>>> np.corrcoef(a)  # Correlation coefficient
>>> np.std(b)  # Standard deviation

Copying Arrays
>>> h = a.view()  # Create a view of the array with the same data
>>> np.copy(a)  # Create a copy of the array
>>> h = a.copy()  # Create a deep copy of the array

Array Manipulation
Transposing Array
>>> i = np.transpose(b)  # Permute array dimensions
>>> i.T  # Permute array dimensions
Changing Array Shape
>>> b.ravel()  # Flatten the array
>>> g.reshape(3,-1)  # Reshape, but don't change data
Adding/Removing Elements
>>> np.resize(h, (2,6))  # Return a new array with shape (2,6)
>>> np.append(h,g)  # Append items to an array
>>> np.insert(a, 1, 5)  # Insert items in an array
>>> np.delete(a,[1])  # Delete items from an array
Combining Arrays
>>> np.concatenate((a,d),axis=0)  # Concatenate arrays
array([ 1, 2, 3, 10, 15, 20])
>>> np.vstack((a,b))  # Stack arrays vertically (row-wise)
array([[ 1. , 2. , 3. ],
       [ 1.5, 2. , 3. ],
       [ 4. , 5. , 6. ]])
>>> np.r_[e,f]  # Stack arrays vertically (row-wise)
>>> np.hstack((e,f))  # Stack arrays horizontally (column-wise)
array([[ 7., 7., 1., 0.],
       [ 7., 7., 0., 1.]])
>>> np.column_stack((a,d))  # Create stacked column-wise arrays
array([[ 1, 10],
       [ 2, 15],
       [ 3, 20]])
>>> np.c_[a,d]  # Create stacked column-wise arrays
Splitting Arrays
>>> np.hsplit(a,3)  # Split the array horizontally into 3 equal parts
[array([1]),array([2]),array([3])]
>>> np.vsplit(c,2)  # Split the array vertically into 2 equal parts
[array([[[ 1.5, 2. , 3. ],
         [ 4. , 5. , 6. ]]]),
 array([[[ 3., 2., 1.],
         [ 4., 5., 6.]]])]

I/O
Saving & Loading On Disk
>>> np.save('my_array', a)
>>> np.savez('array.npz', a, b)
>>> np.load('my_array.npy')
Saving & Loading Text Files
>>> np.loadtxt("myfile.txt")
>>> np.genfromtxt("my_file.csv", delimiter=',')
>>> np.savetxt("myarray.txt", a, delimiter=" ")

Data Types
>>> np.int64  # Signed 64-bit integer types
>>> np.float32  # Standard single-precision floating point
>>> np.complex128  # Complex numbers represented by two 64-bit floats (the np.complex alias was removed in NumPy 1.24)
>>> np.bool_  # Boolean type storing TRUE and FALSE values
>>> np.object_  # Python object type
>>> np.bytes_  # Fixed-length string type (np.string_ before NumPy 2.0)

Sorting Arrays
>>> a.sort()  # Sort an array
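Two points from the NumPy entries above are easy to mix up: a view shares data with its source while a copy does not, and `axis=0` reduces down the rows (one result per column) while `axis=1` works along each row. The sketch below illustrates both with the sheet's `a` and `b`; the names `view`, `copy`, `col_max`, `row_cumsum` and `parts` are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

# A view shares memory with the original array; a copy owns its data.
view = a.view()
copy = a.copy()
a[0] = 99                      # change the original in place
print(view[0], copy[0])        # the view sees the change, the copy does not

# axis=0 collapses the rows, giving one maximum per column.
col_max = b.max(axis=0)

# axis=1 accumulates along each row.
row_cumsum = b.cumsum(axis=1)

# hsplit with an integer splits into that many equal parts.
parts = np.hsplit(np.array([1, 2, 3]), 3)

print(col_max)
print(row_cumsum)
print(parts)
```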
Sorting Arrays and Data Types (continued)
>>> c.sort(axis=0)  # Sort the elements of an array's axis
>>> np.str_  # Fixed-length unicode type (np.unicode_ before NumPy 2.0)


Pandas (Also see NumPy Arrays)

Reshaping Data

Pivot
>>> df3= df2.pivot(index='Date',  # Spread rows into columns
                   columns='Type',
                   values='Value')
df2:
   Date        Type  Value
0  2016-03-01  a     11.432
1  2016-03-02  b     13.031
2  2016-03-01  c     20.784
3  2016-03-03  a     99.906
4  2016-03-02  a     1.303
5  2016-03-03  c     20.784
df3:
Type        a       b       c
Date
2016-03-01  11.432  NaN     20.784
2016-03-02  1.303   13.031  NaN
2016-03-03  99.906  NaN     20.784

Pivot Table
>>> df4 = pd.pivot_table(df2,  # Spread rows into columns
                         values='Value',
                         index='Date',
                         columns='Type')

Stack / Unstack
>>> stacked = df5.stack()  # Pivot a level of column labels
>>> stacked.unstack()  # Pivot a level of index labels
Unstacked:
      0         1
1 5   0.233482  0.390959
2 4   0.184713  0.237102
3 3   0.433522  0.429401
Stacked:
1 5 0   0.233482
    1   0.390959
2 4 0   0.184713
    1   0.237102
3 3 0   0.433522
    1   0.429401

Melt
>>> pd.melt(df2,  # Gather columns into rows
            id_vars=["Date"],
            value_vars=["Type", "Value"],
            value_name="Observations")
    Date        Variable  Observations
0   2016-03-01  Type      a
1   2016-03-02  Type      b
2   2016-03-01  Type      c
3   2016-03-03  Type      a
4   2016-03-02  Type      a
5   2016-03-03  Type      c
6   2016-03-01  Value     11.432
7   2016-03-02  Value     13.031
8   2016-03-01  Value     20.784
9   2016-03-03  Value     99.906
10  2016-03-02  Value     1.303
11  2016-03-03  Value     20.784

Iteration and Missing Data are listed after Visualization.

Advanced Indexing

Selecting
>>> df3.loc[:,(df3>1).any()]  # Select cols with any vals >1
>>> df3.loc[:,(df3>1).all()]  # Select cols with all vals > 1
>>> df3.loc[:,df3.isnull().any()]  # Select cols with NaN
>>> df3.loc[:,df3.notnull().all()]  # Select cols without NaN
Indexing With isin
>>> df[(df.Country.isin(df2.Type))]  # Find same elements
>>> df3.filter(items=["a","b"])  # Filter on labels
>>> df.loc[df.index % 5 == 0]  # Select specific elements (replaces df.select, removed in pandas 1.0)
Where
>>> s.where(s > 0)  # Subset the data
Query
>>> df6.query('second > first')  # Query DataFrame

Setting/Resetting Index
>>> df.set_index('Country')  # Set the index
>>> df4 = df.reset_index()  # Reset the index
>>> df = df.rename(index=str,  # Rename DataFrame
                   columns={"Country":"cntry",
                            "Capital":"cptl",
                            "Population":"ppltn"})

Reindexing
>>> s2 = s.reindex(['a','c','d','e','b'])
Forward Filling
>>> df.reindex(range(4),
               method='ffill')
   Country  Capital    Population
0  Belgium  Brussels   11190846
1  India    New Delhi  1303171035
2  Brazil   Brasília   207847528
3  Brazil   Brasília   207847528
Backward Filling
>>> s3 = s.reindex(range(5),
                   method='bfill')
0  3
1  3
2  3
3  3
4  3

MultiIndexing
>>> arrays = [np.array([1,2,3]),
              np.array([5,4,3])]
>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples,
                                      names=['first', 'second'])
>>> df6 = pd.DataFrame(np.random.rand(3, 2), index=index)
>>> df2.set_index(["Date", "Type"])

Duplicate Data
>>> s3.unique()  # Return unique values
>>> df2.duplicated('Type')  # Check duplicates
>>> df2.drop_duplicates('Type', keep='last')  # Drop duplicates
>>> df.index.duplicated()  # Check index duplicates

Grouping Data
Aggregation
>>> df2.groupby(by=['Date','Type']).mean()
>>> df4.groupby(level=0).sum()
>>> df4.groupby(level=0).agg({'a':lambda x:sum(x)/len(x),
                              'b': np.sum})
Transformation
>>> customSum = lambda x: (x+x%2)
>>> df4.groupby(level=0).transform(customSum)

Combining Data
data1:
X1  X2
a   11.432
b   1.303
c   99.906
data2:
X1  X3
a   20.784
b   NaN
d   20.784

Merge
>>> pd.merge(data1,
             data2,
             how='left',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN
c   99.906  NaN
>>> pd.merge(data1,
             data2,
             how='right',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN
d   NaN     20.784
>>> pd.merge(data1,
             data2,
             how='inner',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN
>>> pd.merge(data1,
             data2,
             how='outer',
             on='X1')
X1  X2      X3
a   11.432  20.784
b   1.303   NaN
c   99.906  NaN
d   NaN     20.784

Join
>>> data1.join(data2, how='right')

Concatenate
Vertical
>>> pd.concat([s, s2])  # Replaces s.append(s2), removed in pandas 2.0
Horizontal/Vertical
>>> pd.concat([s,s2],axis=1, keys=['One','Two'])
>>> pd.concat([data1, data2], axis=1, join='inner')

Dates
>>> from datetime import datetime
>>> df2['Date']= pd.to_datetime(df2['Date'])
>>> df2['Date']= pd.date_range('2000-1-1',
                               periods=6,
                               freq='M')
>>> dates = [datetime(2012,5,1), datetime(2012,5,2)]
>>> index = pd.DatetimeIndex(dates)
>>> index = pd.date_range(datetime(2012,2,1), end, freq='BM')

Visualization (Also see Matplotlib)
>>> import matplotlib.pyplot as plt
>>> s.plot()
>>> plt.show()
>>> df2.plot()
>>> plt.show()
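The Pivot and Merge entries above can be reproduced with the small tables printed on the sheet. This is a sketch under the assumption that `df2`, `data1` and `data2` hold exactly the values shown in those tables; the sheet itself never shows how they were built.

```python
import pandas as pd

# df2 rebuilt from the table shown next to the Pivot entry
df2 = pd.DataFrame({
    'Date': ['2016-03-01', '2016-03-02', '2016-03-01',
             '2016-03-03', '2016-03-02', '2016-03-03'],
    'Type': ['a', 'b', 'c', 'a', 'a', 'c'],
    'Value': [11.432, 13.031, 20.784, 99.906, 1.303, 20.784],
})

# Spread the 'Type' values into columns, one row per date
df3 = df2.pivot(index='Date', columns='Type', values='Value')

# data1 and data2 rebuilt from the Combining Data tables
data1 = pd.DataFrame({'X1': ['a', 'b', 'c'],
                      'X2': [11.432, 1.303, 99.906]})
data2 = pd.DataFrame({'X1': ['a', 'b', 'd'],
                      'X3': [20.784, float('nan'), 20.784]})

# A left merge keeps every key of data1; an outer merge keeps all keys
left = pd.merge(data1, data2, how='left', on='X1')
outer = pd.merge(data1, data2, how='outer', on='X1')

print(df3)
print(left)
print(outer)
```

Date and Type combinations that never occur in `df2` show up as NaN in the pivoted frame, just as unmatched keys do in the merged frames.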
Missing Data
>>> df.dropna()  # Drop NaN values
>>> df3.fillna(df3.mean())  # Fill NaN values with a predetermined value
>>> df2.replace("a", "f")  # Replace values with others

Iteration
>>> df.items()  # (Column-index, Series) pairs (df.iteritems() was removed in pandas 2.0)
>>> df.iterrows()  # (Row-index, Series) pairs


Importing Data

Importing Data in Python
Most of the time, you'll use either NumPy or pandas to import your data:
>>> import numpy as np
>>> import pandas as pd

Help
>>> np.info(np.ndarray.dtype)
>>> help(pd.read_csv)

Excel Spreadsheets
>>> file = 'urbanpop.xlsx'
>>> data = pd.ExcelFile(file)
>>> df_sheet2 = data.parse('1960-1966',
                           skiprows=[0],
                           names=['Country',
                                  'AAM: War(2002)'])
>>> df_sheet1 = data.parse(0,
                           usecols=[0],
                           skiprows=[0],
                           names=['Country'])
To access the sheet names, use the sheet_names attribute:
>>> data.sheet_names

SAS Files
>>> from sas7bdat import SAS7BDAT
>>> with SAS7BDAT('urbanpop.sas7bdat') as file:
        df_sas = file.to_data_frame()

Matlab Files
>>> import scipy.io
>>> filename = 'workspace.mat'
>>> mat = scipy.io.loadmat(filename)

HDF5 Files
>>> import h5py
>>> filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
>>> data = h5py.File(filename, 'r')

Pickled Files
>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file:
        pickled_data = pickle.load(file)
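The Pickled Files entry only shows the loading half. A minimal round-trip sketch, assuming a small hypothetical dictionary `fruit` and a temporary directory so that no real file is left behind, looks like this:

```python
import os
import pickle
import tempfile

# Hypothetical data; the sheet does not say what pickled_fruit.pkl contains
fruit = {'apple': 3, 'pear': 5}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pickled_fruit.pkl')

    # Pickle files are binary, so write with mode 'wb' ...
    with open(path, 'wb') as file:
        pickle.dump(fruit, file)

    # ... and read with mode 'rb', as on the sheet
    with open(path, 'rb') as file:
        pickled_data = pickle.load(file)

print(pickled_data)
```

The loaded object is equal to the original but is a separate object. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.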
Text Files

Plain Text Files
>>> filename = 'huck_finn.txt'
>>> file = open(filename, mode='r')  # Open the file for reading
>>> text = file.read()  # Read a file's contents
>>> print(file.closed)  # Check whether file is closed
>>> file.close()  # Close file
>>> print(text)
Using the context manager with
>>> with open('huck_finn.txt', 'r') as file:
        print(file.readline())  # Read a single line
        print(file.readline())
        print(file.readline())

Table Data: Flat Files

Importing Flat Files with numpy
Files with one data type
>>> filename = 'mnist.txt'
>>> data = np.loadtxt(filename,
                      delimiter=',',  # String used to separate values
                      skiprows=2,  # Skip the first 2 lines
                      usecols=[0,2],  # Read the 1st and 3rd column
                      dtype=str)  # The type of the resulting array
Files with mixed data types
>>> filename = 'titanic.csv'
>>> data = np.genfromtxt(filename,
                         delimiter=',',
                         names=True,  # Look for column header
                         dtype=None)
>>> data_array = np.recfromcsv(filename)
The default dtype of the np.recfromcsv() function is None.

Importing Flat Files with pandas
>>> filename = 'winequality-red.csv'
>>> data = pd.read_csv(filename,
                       nrows=5,  # Number of rows of file to read
                       header=None,  # Row number to use as col names
                       sep='\t',  # Delimiter to use
                       comment='#',  # Character to split comments
                       na_values=[""])  # String to recognize as NA/NaN

Stata Files
>>> data = pd.read_stata('urbanpop.dta')

Relational Databases
>>> from sqlalchemy import create_engine, inspect
>>> engine = create_engine('sqlite:///Northwind.sqlite')
Use the inspector's get_table_names() method to fetch a list of table names (engine.table_names() in SQLAlchemy releases before 2.0):
>>> table_names = inspect(engine).get_table_names()

Querying Relational Databases
(SQLAlchemy 2.0 requires wrapping the SQL string in sqlalchemy.text() before passing it to execute().)
>>> con = engine.connect()
>>> rs = con.execute("SELECT * FROM Orders")
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()
Using the context manager with
>>> with engine.connect() as con:
        rs = con.execute("SELECT OrderID FROM Orders")
        df = pd.DataFrame(rs.fetchmany(size=5))
        df.columns = rs.keys()
Querying relational databases with pandas
>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)

Exploring Dictionaries
Accessing Elements with Functions
>>> print(mat.keys())  # Print dictionary keys
>>> for key in data.keys():  # Print dictionary keys
        print(key)
meta
quality
strain
>>> pickled_data.values()  # Return dictionary values
>>> print(mat.items())  # Returns items in list format of (key, value) tuple pairs
Accessing Data Items with Keys
>>> for key in data['meta'].keys():  # Explore the HDF5 structure
        print(key)
Description
DescriptionURL
Detector
Duration
GPSstart
Observatory
Type
UTCstart
>>> print(data['meta']['Description'][()])  # Retrieve the value for a key (the .value attribute was removed in h5py 3.0)

Exploring Your Data
NumPy Arrays
>>> data_array.dtype  # Data type of array elements
>>> data_array.shape  # Array dimensions
>>> len(data_array)  # Length of array
pandas DataFrames
>>> df.head()  # Return first DataFrame rows
>>> df.tail()  # Return last DataFrame rows
>>> df.index  # Describe index
>>> df.columns  # Describe DataFrame columns
>>> df.info()  # Info on DataFrame
>>> data_array = data.values  # Convert a DataFrame to a NumPy array

Navigating Your FileSystem
Magic Commands
!ls  # List directory contents of files and directories
%cd ..  # Change current working directory
%pwd  # Return the current working directory path
os Library
>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd()  # Store the name of current directory in a string
>>> os.listdir(wd)  # Output contents of the directory in a list
>>> os.chdir(path)  # Change current working directory
>>> os.rename("test1.txt",  # Rename a file
              "test2.txt")
>>> os.remove("test1.txt")  # Delete an existing file
>>> os.mkdir("newdir")  # Create a new directory


Jupyter Notebook

Widgets
Notebook widgets provide the ability to visualize and control changes in your data, often as a control like a slider, textbox, etc.

Working with Different Programming Languages
Kernels provide computation and communication with front-end interfaces like the notebooks. There are three main kernels:
IPython, IRkernel, and IJulia.
Installing Jupyter Notebook will automatically install the IPython kernel.
Kernel menu:
- Interrupt kernel
- Interrupt kernel & clear all output
- Restart kernel
- Restart kernel & run all cells
- Connect back to a remote notebook
- Run other installed kernels

Widgets menu
Widgets can be used to build interactive GUIs for your notebooks or to synchronize stateful and stateless information between Python and JavaScript.
- Save notebook with interactive widgets
- Download serialized state of all widget models in use
- Embed current widgets

Saving/Loading Notebooks
- Create new notebook
- Open an existing notebook
- Make a copy of the current notebook
- Rename notebook
- Save current notebook and record checkpoint
- Revert notebook to a previous checkpoint
- Preview of the printed notebook
- Download notebook as: IPython notebook, Python, HTML, Markdown, reST, LaTeX, PDF
- Close notebook & stop running any scripts

Writing Code And Text
Code and text are encapsulated by 3 basic cell types: markdown cells, code cells, and raw NBConvert cells.

Edit Cells
- Cut currently selected cells to clipboard
- Copy cells from clipboard to current cursor position
- Paste cells from clipboard above current cell
- Paste cells from clipboard below current cell
- Paste cells from clipboard on top of current cell
- Delete current cells
- Revert "Delete Cells" invocation
- Split up a cell from current cursor position
- Merge current cell with the one above
- Merge current cell with the one below
- Move current cell up
- Move current cell down
- Adjust metadata underlying the current notebook
- Find and replace in selected cells
- Remove cell attachments
- Copy attachments of current cell
- Paste attachments of current cell
- Insert image in selected cells

Insert Cells
- Add new cell above the current one
- Add new cell below the current one

Executing Cells
- Run selected cell(s)
- Run current cells down and create a new one below
- Run current cells down and create a new one above
- Run all cells
- Run all cells above the current cell
- Run all cells below the current cell
- Change the cell type of current cell
- toggle, toggle scrolling and clear current outputs
- toggle, toggle scrolling and clear all output

View Cells
- Toggle display of Jupyter logo and filename
- Toggle display of toolbar
- Toggle line numbers in cells
- Toggle display of cell action icons: None, Edit metadata, Raw cell format, Slideshow, Attachments, Tags

Command Mode and Edit Mode toolbar (numbered callouts 1 to 15)
1. Save and checkpoint
2. Insert cell below
3. Cut cell
4. Copy cell(s)
5. Paste cell(s) below
6. Move cell up
7. Move cell down
8. Run current cell
9. Interrupt kernel
10. Restart kernel
11. Display characteristics
12. Open command palette
13. Current kernel
14. Kernel status
15. Log out from notebook server

Asking For Help
- Walk through a UI tour
- List of built-in keyboard shortcuts
- Edit the built-in keyboard shortcuts
- Notebook help topics
- Description of markdown available in notebook
- Information on unofficial Jupyter Notebook extensions
- Python help topics
- IPython help topics
- NumPy help topics
- SciPy help topics
- Matplotlib help topics
- SymPy help topics
- Pandas help topics
- About Jupyter Notebook