Python For Data Science Cheat Sheet 2.0

This cheat sheet provides an overview of core Python concepts including data types, variables, operators, control flow statements, functions and modules. It covers common data structures like lists, dictionaries, strings and how to manipulate them using built-in methods and functions.

Uploaded by

charenbahaeddinehemmem

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

72 views

Python For Data Science Cheat Sheet 2.0

Uploaded by

charenbahaeddinehemmem

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Python

Cheat Sheet

Pandas | Numpy | Sklearn

Matplotlib | Seaborn
BS4 | Selenium | Scrapy

Artificial Corner
Python Basics Variables
Variable assignment:
Creating a new list:
numbers = [4, 3, 10, 7, 1, 2]

Cheat Sheet
message_1 = "I'm learning Python" Sorting a list:
message_2 = "and it's fun!" >>> numbers.sort()
[1, 2, 3, 4, 7, 10]
String concatenation (+ operator):
Here you will find all the Python core concepts you need to
message_1 + ' ' + message_2 >>> numbers.sort(reverse=True)
know before learning any third-party library.
[10, 7, 4, 3, 2, 1]
String concatenation (f-string):
Data Types f'{message_1} {message_2}' Update value on a list:
>>> numbers[0] = 1000
Integers (int): 1 >>> numbers
Float (float): 1.2
List [1000, 7, 4, 3, 2, 1]
String (str): "Hello World" Creating a list:
Copying a list:
Boolean: True/False countries = ['United States', 'India', new_list = countries[:]
'China', 'Brazil'] new_list_2 = countries.copy()
List: [value1, value2]
Dictionary: {key1:value1, key2:value2, ...} Create an empty list:
my_list = [] Built-in Functions
Numeric Operators Comparison Operators Indexing: Print an object:
>>> countries[0] print("Hello World")
+ Addition
== United States
Equal to
Return the length of x:
- Subtraction >>> countries[3] len(x)
!= Different Brazil
* Multiplication Return the minimum value:
> Greater than >>> countries[-1] min(x)
Division Brazil
/ < Less than Return the maximum value:
Slicing: max(x)
** Exponent >>>countries[0:3]
>= Greater than or equal to
['United States', 'India', 'China'] Returns a sequence of numbers:
% Modulus
<= Less than or equal to range(x1,x2,n) # from x1 to x2
>>>countries[1:] (increments by n)
// Floor division ['India', 'China', 'Brazil']
Convert x to a string:
>>>countries[:2] str(x)
['United States', 'India']
String methods Convert x to an integer/float:
Adding elements to a list: int(x)
string.upper(): converts to uppercase countries.append('Canada') float(x)
string.lower(): converts to lowercase countries.insert(0,'Canada')
string.title(): converts to title case Convert x to a list:
Nested list: list(x)
string.count('l'): counts how many times "l" nested_list = [countries, countries_2]
appears
string.find('h'): position of the "h" first Remove element:
countries.remove('United States')
ocurrance countries.pop(0) # removes and returns value
string.replace('o', 'u'): replaces "o" with "u" del countries[0]
Dictionary If Statement Functions
Creating a dictionary: Create a function:
Conditional test:
my_data = {'name':'Frank', 'age':26} def function(<params>):
if <condition>:
<code> <code>
Create an empty dictionary: elif <condition>: return <data>
my_dict = {} <code>
Get value of key "name":
...
else:
Modules
>>> my_data["name"] <code> Import module:
'Frank' import module
Example: module.method()
Get the keys: if age>=18:
>>> my_data.keys() print("You're an adult!") OS module:
dict_keys(['name', 'age']) import os
Conditional test with list: os.getcwd()
Get the values: if <value> in <list>: os.listdir()
>>> my_data.values() <code> os.makedirs(<path>)
dict_values(['Frank', 26])
Get the pair key-value:
>>> my_data.items()
Loops Special Characters
dict_items([('name', 'Frank'), ('age', 26)]) For loop: # Comment
for <variable> in <list>:
Adding/updating items in a dictionary: <code> \n New Line
my_data['height']=1.7
my_data.update({'height':1.8, For loop and enumerate list elements:
'languages':['English', 'Spanish']}) for i, element in enumerate(<list>): Boolean Operators
>>> my_data Boolean Operators
<code> (Pandas)
{'name': 'Frank',
'age': 26, For loop and obtain dictionary elements: and logical AND & logical AND
'height': 1.8, for key, value in my_dict.items():
'languages': ['English', 'Spanish']} <code> or logical OR | logical OR
Remove an item: While loop: not logical NOT ~ logical NOT
my_data.pop('height') while <condition>:
del my_data['languages'] <code>
my_data.clear()
Copying a dictionary: Data Validation
new_dict = my_data.copy()
Try-except: Below are my guides, tutorials and
try: complete Data Science course:
<code>
except <error>: - Medium Guides
<code> - YouTube Tutorials
Loop control statement: - Data Science Course (Udemy)
break: stops loop execution - Make Money Using Your Programming
continue: jumps to next iteration & Data Science Skills
pass: does nothing
Made by Frank Andrade: artificialcorner.com
Pandas
Concatenate multiple data frames horizontally:
Selecting rows and columns df3 = pd.DataFrame([[7],[8],[9]],
index=['A','B','C'],

Cheat Sheet
Select single column: columns=['col3'])
df['col1']
pd.concat([df,df3], axis=1)
Select multiple columns:
Pandas provides data analysis tools for Python. All of the df[['col1', 'col2']] Only merge complete rows (INNER JOIN):
following code examples refer to the dataframe below. df.merge(df3)
Show first n rows:
df.head(2) Left column stays complete (LEFT OUTER JOIN):
axis 1 df.merge(df3, how='left')
col1 col2 Show last n rows:
df.tail(2) Right column stays complete (RIGHT OUTER JOIN):
A 1 4 df.merge(df3, how='right')
Select rows by index values:
axis 0
df = B 2 5 df.loc['A'] df.loc[['A', 'B']] Preserve all values (OUTER JOIN):
df.merge(df3, how='outer')
C 3 6 Select rows by position:
df.iloc[1] df.iloc[1:] Merge rows by index:
df.merge(df3,left_index=True,
right_index=True)
Getting Started Data wrangling
Fill NaN values:
Import pandas: Filter by value: df.fillna(0)
import pandas as pd df[df['col1'] > 1]
Apply your own function:
Sort by one column: def func(x):
Create a series: df.sort_values('col1') return 2**x
s = pd.Series([1, 2, 3], df.apply(func)
Sort by columns:
index=['A', 'B', 'C'], df.sort_values(['col1', 'col2'],
name='col1') ascending=[False, True]) Arithmetics and statistics
Create a dataframe: Add to all values:
Identify duplicate rows:
data = [[1, 4], [2, 5], [3, 6]] df.duplicated() df + 10
index = ['A', 'B', 'C']
Identify unique rows: Sum over columns:
df = pd.DataFrame(data, index=index,
df['col1'].unique() df.sum()
columns=['col1', 'col2'])
Read a csv file with pandas: Swap rows and columns: Cumulative sum over columns:
df = pd.read_csv('filename.csv') df = df.transpose() df.cumsum()
df = df.T
Mean over columns:
Advanced parameters: Drop a column: df.mean()
df = pd.read_csv('filename.csv', sep=',', df = df.drop('col1', axis=1)
Standard deviation over columns:
names=['col1', 'col2'], Clone a data frame: df.std()
index_col=0, clone = df.copy()
Count unique values:
encoding='utf-8',
Concatenate multiple dataframes vertically: df['col1'].value_counts()
nrows=3) df2 = df + 5 # new dataframe
pd.concat([df,df2]) Summarize descriptive statistics:
df.describe()
Hierarchical indexing Data export Visualization
Create hierarchical index: Data as NumPy array: The plots below are made with a dataframe
df.stack() df.values with the shape of df_gdp (pivot() method)
Dissolve hierarchical index: Save data as CSV file:
df.unstack() df.to_csv('output.csv', sep=",") Import matplotlib:
import matplotlib.pyplot as plt
Format a dataframe as tabular string:
Aggregation df.to_string() Start a new diagram:
plt.figure()
Create group object: Convert a dataframe to a dictionary:
g = df.groupby('col1') df.to_dict() Scatter plot:
df.plot(kind='scatter')
Iterate over groups: Save a dataframe as an Excel table:
for i, group in g: df.to_excel('output.xlsx') Bar plot:
print(i, group) df.plot(kind='bar',
xlabel='data1',
Aggregate groups: ylabel='data2')
g.sum() Pivot and Pivot Table
g.prod() Lineplot:
g.mean() Read csv file 1: df.plot(kind='line',
g.std() df_gdp = pd.read_csv('gdp.csv') figsize=(8,4))
g.describe() Boxplot:
The pivot() method:
Select columns from groups: df_gdp.pivot(index="year", df['col1'].plot(kind='box')
g['col2'].sum() columns="country", Histogram over one column:
g[['col2', 'col3']].sum() values="gdppc")
df['col1'].plot(kind='hist',
Transform values: Read csv file 2: bins=3)
import math df_sales=pd.read_excel( Piechart:
g.transform(math.log) 'supermarket_sales.xlsx')
df.plot(kind='pie',
Apply a list function on each group: Make pivot table: y='col1',
def strsum(group): df_sales.pivot_table(index='Gender', title='Population')
return ''.join([str(x) for x in group.value]) aggfunc='sum') Set tick marks:
g['col2'].apply(strsum) Make a pivot tables that says how much male and labels = ['A', 'B', 'C', 'D']
female spend in each category: positions = [1, 2, 3, 4]
plt.xticks(positions, labels)
df_sales.pivot_table(index='Gender', plt.yticks(positions, labels)
Below are my guides, tutorials and columns='Product line', Label diagram and axes:
complete Pandas course: values='Total',
aggfunc='sum') plt.title('Correlation')
- Medium Guides plt.xlabel('Nunstück')
plt.ylabel('Slotermeyer')
- YouTube Tutorials
- Pandas Course (Udemy) Save most recent diagram:
- Make Money Using Your Programming plt.savefig('plot.png')
plt.savefig('plot.png',dpi=300)
& Data Science Skills plt.savefig('plot.svg')
Made by Frank Andrade: artificialcorner.com
NumPy Saving & Loading Text Files Aggregate functions:
np.loadtxt('my_file.txt') a.sum()
np.genfromtxt('my_file.csv', a.min()

Cheat Sheet delimiter=',') b.max(axis= 0)

np.savetxt('myarray.txt', a, b.cumsum(axis= 1) # Cumulative sum
delimiter= ' ') a.mean()
NumPy provides tools for working with arrays. All of the Inspecting Your Array b.median()
a.shape a.corrcoef() # Correlation coefficient
following code examples refer to the arrays below. np.std(b) # Standard deviation
len(a)
NumPy Arrays b.ndim
e.size Copying arrays:
1D Array 2D Array
axis 1 b.dtype # data type h = a.view() # Create a view
b.dtype.name np.copy(a)
1 2 3 1.5 2 3 b.astype(int) # change data type h = a.copy() # Create a deep copy
axis 0
Data Types Sorting arrays:
4 5 6 a.sort() # Sort an array
np.int64
np.float32 c.sort(axis=0)
Getting Started np.complex
np.bool Array Manipulation
Import numpy: np.object
np.string_ Transposing Array:
import numpy as np
np.unicode_ i = np.transpose(b)
i.T
Create arrays: Array Mathematics Changing Array Shape:
a = np.array([1,2,3]) Arithmetic Operations b.ravel()
b = np.array([(1.5,2,3), (4,5,6)], dtype=float) >>> g = a-b g.reshape(3,-2)
c = np.array([[(1.5,2,3), (4,5,6)], array([[-0.5, 0. , 0. ],
[-3. , 3. , 3. ]]) Adding/removing elements:
[(3,2,1), (4,5,6)]],
>>> np.subtract(a,b) h.resize((2,6))
dtype = float) np.append(h,g)
Initial placeholders: >>> b+a np.insert(a, 1, 5)
np.zeros((3,4)) # Create an array of zeros array([[2.5, 4. , 6. ], np.delete(a,[1])
[ 5. , 7. , 9. ]])
np.ones((2,3,4),dtype=np.int16) >>> np.add(b,a) Combining arrays:
d = np.arange(10,25,5) np.concatenate((a,d),axis=0)
np.linspace( 0,2, 9) >>> a/b np.vstack((a,b)) # stack vertically
array([[ 0.66666667, 1. , 1. ], np.hstack((e,f)) # stack horizontally
e = np.full((2,2), 7) [ 0.25 , 0.4 , 0.5 ]])
f = np.eye(2) >>> np.divide(a,b) Splitting arrays:
np.random.random((2,2)) np.hsplit(a,3) # Split horizontally
>>> a*b np.vsplit(c,2) # Split vertically
np.empty((3,2)) array([[ 1.5, 4. , 9. ],
[ 4. , 10. , 18. ]]) Subsetting 1.5 2 3

Saving & Loading On Disk: >>> np.multiply(a,b) b[1,2] 4 5 6

np.save('my_array', a) >>> np.exp(b) Slicing:

np.savez('array.npz', a, b) >>> np.sqrt(b) a[0:2] 1 2 3

np.load('my_array.npy') >>> np.sin(a)

>>> np.log(a) Boolean Indexing:
1 2 3
>>> e.dot(f) a[a<2]
Scikit-Learn Training and Test Data
from sklearn.model_selection import train_test_split

Cheat Sheet
X_train,X_test,y_train,y_test = train_test_split(X,y,
random_state = 0)#Splits data into training and test set

Sklearn is a free machine learning library for Python. It features various

Preprocessing The Data
Standardization
classification, regression and clustering algorithms. Standardizes the features by removing the mean and scaling to unit variance.
from sklearn.preprocessing import StandardScaler
Getting Started scaler = StandardScaler().fit(X_train)
standarized_X = scaler.transform(X_train)
The code below demonstrates the basic steps of using sklearn to create and run a model standarized_X_test = scaler.transform(X_test)
on a set of data.
The steps in the code include loading the data, splitting into train and test sets, scaling Normalization
Each sample (row of the data matrix) with at least one non-zero component is
the sets, creating the model, fitting the model on the data using the trained model to rescaled independently of other samples so that its norm equals one.
make predictions on the test set, and finally evaluating the performance of the model. from sklearn.preprocessing import Normalizer
from sklearn import neighbors,datasets,preprocessing scaler = Normalizer().fit(X_train)
normalized_X = scaler.transform(X_train)
from sklearn.model_selection import train_test_split normalized_X_test = scaler.transform(X_test)
from sklearn.metrics import accuracy_score
Binarization
iris = datasets.load_iris() Binarize data (set feature values to 0 or 1) according to a threshold.
X,y = iris.data[:,:2], iris.target from sklearn.preprocessing import Binarizer
X_train, X_test, y_train, y_test=train_test_split(X,y) binarizer = Binarizer(threshold = 0.0).fit(X)
scaler = preprocessing_StandardScaler().fit(X_train) binary_X = binarizer.transform(X_test)
X_train = scaler.transform(X_train) Encoding Categorical Features
X_test = scaler.transform(X_test) Imputation transformer for completing missing values.
knn = neighbors.KNeighborsClassifier(n_neighbors = 5) from sklearn import preprocessing
le = preprocessing.LabelEncoder()
knn.fit(X_train, y_train) le.fit_transform(X_train)
y_pred = knn.predict(X_test)
Imputing Missing Values
accuracy_score(y_test, y_pred)
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=0, strategy ='mean')
imp.fit_transform(X_train)
Loading the Data
Generating Polynomial Features
The data needs to be numeric and stored as NumPy arrays or SciPy spare matrix
from sklearn.preprocessing import PolynomialFeatures
(numeric arrays, such as Pandas DataFrame’s are also ok) poly = PolynomialFeatures(5)
>>> import numpy as np poly.fit_transform(X)
>>> X = np.random.random((10,5))
array([[0.21,0.33],
[0.23, 0.60],
[0.48, 0.62]])
>>> y = np.array(['A','B','A'])
array(['A', 'B', 'A'])
Create Your Model Evaluate Your Model’s Performance
Supervised Learning Models Classification Metrics
Linear Regression Accuracy Score
from sklearn.linear_model import LinearRegression knn.score(X_test,y_test)
from sklearn.metrics import accuracy_score
lr = LinearRegression(normalize = True) accuracy_score(y_test,y_pred)
Support Vector Machines (SVM)
from sklearn.svm import SVC Classification Report
from sklearn.metrics import classification_report
svc = SVC(kernel = 'linear') print(classification_report(y_test,y_pred))
Naive Bayes
from sklearn.naive_bayes import GaussianNB Confusion Matrix
from sklearn .metrics import confusion_matrix
gnb = GaussianNB() print(confusion_matrix(y_test,y_pred))
KNN
from sklearn import neighbors Regression Metrics
Mean Absolute Error
knn = neighbors.KNeighborsClassifier(n_neighbors = 5) from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test,y_pred)
Unsupervised Learning Models
Mean Squared Error
Principal Component Analysis (PCA) from sklearn.metrics import mean_squared_error
from sklearn.decomposition import PCA mean_squared_error(y_test,y_pred)
pca = PCA(n_components = 0.95)
R² Score
K means from sklearn.metrics import r2_score
from sklearn.cluster import KMeans r2_score(y_test, y_pred)
k_means = KMeans(n_clusters = 3, random_state = 0)
Clustering Metrics
Model Fitting Adjusted Rand Index
from sklearn.metrics import adjusted_rand_score
Fitting supervised and unsupervised learning models onto data. adjusted_rand_score(y_test,y_pred)
Supervised Learning
Homogeneity
lr.fit(X, y) # Fit the model to the data from sklearn.metrics import homogeneity_score
knn.fit(X_train,y_train) homogeneity_score(y_test,y_pred)
svc.fit(X_train,y_train)
V-measure
Unsupervised Learning from sklearn.metrics import v_measure_score
k_means.fit(X_train) # Fit the model to the data v_measure_score(y_test,y_pred)
pca_model = pca.fit_transform(X_train) #Fit to data,then transform
Tune Your Model
Prediction Grid Search
Predict Labels from sklearn.model_selection import GridSearchCV
params = {'n_neighbors':np.arange(1,3),
y_pred = lr.predict(X_test) # Supervised Estimators 'metric':['euclidean','cityblock']}
y_pred = k_means.predict(X_test) # Unsupervised Estimators grid = GridSearchCV(estimator = knn, param_grid = params)
Estimate probability of a label grid.fit(X_train, y_train)
print(grid.best_score_)
y_pred = knn.predict_proba(X_test) print(grid.best_estimator_)
Data Viz Barplot
x = ['USA', 'UK', 'Australia']
Seaborn

Cheat Sheet
y = [40, 50, 33] Workflow
plt.bar(x, y)
plt.show() import seaborn as sns
import matplotlib.pyplot as plt
Matplotlib is a Python 2D plotting library that produces Piechart import pandas as pd
Lineplot
figures in a variety of formats. plt.pie(y, labels=x, autopct='%.0f %%')
plt.show() plt.figure(figsize=(10, 5))
Figure flights = sns.load_dataset("flights")
Y-axis Histogram may_flights=flights.query("month=='May'")
ages = [15, 16, 17, 30, 31, 32, 35] ax = sns.lineplot(data=may_flights,
bins = [15, 20, 25, 30, 35] x="year",
plt.hist(ages, bins, edgecolor='black') y="passengers")
plt.show() ax.set(xlabel='x', ylabel='y',
title='my_title, xticks=[1,2,3])
Boxplots ax.legend(title='my_legend,
ages = [15, 16, 17, 30, 31, 32, 35] title_fontsize=13)
Matplotlib X-axis
plt.boxplot(ages) plt.show()
Workflow plt.show() Barplot
The basic steps to creating plots with matplotlib are Prepare Scatterplot tips = sns.load_dataset("tips")
a = [1, 2, 3, 4, 5, 4, 3 ,2, 5, 6, 7] ax = sns.barplot(x="day",
Data, Plot, Customize Plot, Save Plot and Show Plot. y="total_bill,
b = [7, 2, 3, 5, 5, 7, 3, 2, 6, 3, 2]
import matplotlib.pyplot as plt plt.scatter(a, b) data=tips)
Histogram
Example with lineplot plt.show()
penguins = sns.load_dataset("penguins")
Prepare data sns.histplot(data=penguins,
x = [2017, 2018, 2019, 2020, 2021] Subplots Boxplot
x="flipper_length_mm")
y = [43, 45, 47, 48, 50]
Add the code below to make multple plots with 'n' tips = sns.load_dataset("tips")
Plot & Customize Plot ax = sns.boxplot(x=tips["total_bill"])
number of rows and columns.
plt.plot(x,y,marker='o',linestyle='--',
fig, ax = plt.subplots(nrows=1, Scatterplot
color='g', label='USA') ncols=2, tips = sns.load_dataset("tips")
plt.xlabel('Years') sharey=True, sns.scatterplot(data=tips,
plt.ylabel('Population (M)') figsize=(12, 4)) x="total_bill",
Plot & Customize Each Graph y="tip")
plt.title('Years vs Population') ax[0].plot(x, y, color='g')
plt.legend(loc='lower right') ax[0].legend()
Figure aesthetics
ax[1].plot(a, b, color='r') sns.set_style('darkgrid') # stlyes
plt.yticks([41, 45, 48, 51]) sns.set_palette('husl', 3) # palettes
ax[1].legend()
Save Plot plt.show() sns.color_palette('husl') # colors
plt.savefig('example.png') Fontsize of the axes title, x and y labels, tick labels
Show Plot and legend:
plt.show() plt.rc('axes', titlesize=18)
Markers: '.', 'o', 'v', '<', '>' plt.rc('axes', labelsize=14)
plt.rc('xtick', labelsize=13)
Line Styles: '-', '--', '-.', ':' plt.rc('ytick', labelsize=13)
Colors: 'b', 'g', 'r', 'y' # blue, green, red, yellow plt.rc('legend', fontsize=13)
plt.rc('font', size=13)
Web Scraping “Siblings” are nodes with the same parent.
It’s recommended for beginners to use IDs to find
elements and if there isn't any build an XPath.
XPath

Cheat Sheet
We need to learn XPath to scrape with Selenium or
Scrapy.
Beautiful Soup
Web Scraping is the process of extracting data from a XPath Syntax
website. Before studying Beautiful Soup and Selenium, it's
Workflow An XPath usually contains a tag name, attribute
Importing the libraries name, and attribute value.
good to review some HTML basics first. from bs4 import BeautifulSoup
import requests //tagName[@AttributeName="Value"]
HTML for Web Scraping Fetch the pages
Let's take a look at the HTML element syntax. result=requests.get("www.google.com") Let’s check some examples to locate the article,
result.status_code # get status code title, and transcript elements of the HTML code we
Tag Attribute Attribute result.headers # get the headers
name name value End tag used before.
Page content
content = result.text //article[@class="main-article"]
<h1 class="title"> Titanic (1997) </h1> //h1
Create soup
soup = BeautifulSoup(content,"lxml") //div[@class="full-script"]
Attribute Affected content
HTML in a readable format
HTML Element print(soup.prettify()) XPath Functions and Operators
XPath functions
This is a single HTML element, but the HTML code behind a Find an element
soup.find(id="specific_id") //tag[contains(@AttributeName, "Value")]
website has hundreds of them.
Find elements XPath Operators: and, or
HTML code example
soup.find_all("a")
<article class="main-article"> soup.find_all("a","css_class") //tag[(expression 1) and (expression 2)]
<h1> Titanic (1997) </h1> soup.find_all("a",class_="my_class")
<p class="plot"> 84 years later ... </p> soup.find_all("a",attrs={"class":
"my_class"}) XPath Special Characters
<div class="full-script"> 13 meters. You ... </div> Get inner text
Selects the children from the node set on the
</article> sample = element.get_text() /
sample = element.get_text(strip=True, left side of this character
The HTML code is structured with “nodes”. Each rectangle below separator= ' ') Specifies that the matching node set should
Get specific attributes // be located at any level within the document
represents a node (element, attribute and text nodes) sample = element.get('href')
Specifies the current context should be used
Root Element Parent Node
. (refers to present node)
<article>
- Medium Guides/YouTube Tutorials
..
Here are my guides/tutorials and courses Refers to a parent node
A wildcard character that selects all
Element Attribute Element Element - Web Scraping Course * elements or attributes regardless of names
<h1> class="main-article" <p> <div>
Siblings - Data Science Course @ Select an attribute
- Automation Course () Grouping an XPath expression
Text Attribute Text Attribute Text
Titanic (1997) class="plot" 84 years later ... class="full-script"" 13 meters. You ... - Make Money Using Programming Skills Indicates that a node with index "n" should
[n]
be selected
Made by Frank Andrade: artificialcorner.com
Selenium 4 Scrapy
Note that there are a few changes between Selenium 3.x versions and Scrapy is the most powerful web scraping framework in Python, but it's a bit
Selenium 4. complicated to set up, so check my guide or its documentation to set it up.
Import libraries:
from selenium import webdriver Creating a Project and Spider
from selenium.webdriver.chrome.service import Service To create a new project, run the following command in the terminal.
scrapy startproject my_first_spider
web="www.google.com" To create a new spider, first change the directory.
path='introduce chromedriver path' cd my_first_spider
service = Service(executable_path=path) # selenium 4 Create an spider
driver = webdriver.Chrome(service=service) # selenium 4 scrapy genspider example example.com
driver.get(web)
The Basic Template
Note: When you create a spider, you obtain a template with the following content.
driver = webdriver.Chrome(path) # selenium 3.x
import scrapy
Find an element
driver.find_element(by="id", value="...") # selenium 4 class ExampleSpider(scrapy.Spider):
driver.find_element_by_id("write-id-here") # selenium 3.x name = 'example'
allowed_domains = ['example.com'] Class
Find elements start_urls = ['http://example.com/']
driver.find_elements(by="xpath", value="...") # selenium 4
driver.find_elements_by_xpath("write-xpath-here") # selenium 3.x
def parse(self, response):
Parse method
Quit driver pass
driver.quit()
The class is built with the data we introduced in the previous command, but the
Getting the text parse method needs to be built by us. To build it, use the functions below.
data = element.text
Finding elements
Implicit Waits To find elements in Scrapy, use the response argument from the parse method
import time
time.sleep(2) response.xpath('//tag[@AttributeName="Value"]')
Getting the text
Explicit Waits To obtain the text element we use text() and either .get() or .getall(). For example:
from selenium.webdriver.common.by import By response.xpath(‘//h1/text()’).get()
from selenium.webdriver.support.ui import WebDriverWait response.xpath(‘//tag[@Attribute=”Value”]/text()’).getall()
from selenium.webdriver.support import expected_conditions as EC
Return data extracted
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID, To see the data extracted we have to use the yield keyword
'id_name')))
# Wait 5 seconds until an element is clickable def parse(self, response):
title = response.xpath(‘//h1/text()’).get()
Options: Headless mode, change window size
from selenium.webdriver.chrome.options import Options # Return data extracted
options = Options() yield {'titles': title}
options.headless = True
options.add_argument('window-size=1920x1080') Run the spider and export data to CSV or JSON
driver=webdriver.Chrome(service=service,options=options) scrapy crawl example
scrapy crawl example -o name_of_file.csv
scrapy crawl example -o name_of_file.json

Python Cheat Sheet 2.0
100% (1)
Python Cheat Sheet 2.0
10 pages
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
100% (4)
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
11 pages
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
100% (3)
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
9 pages
Pandas Cheat Sheet PDF
67% (3)
Pandas Cheat Sheet PDF
1 page
Etk Material Curves 20121018
No ratings yet
Etk Material Curves 20121018
16 pages
Python For Data Science Cheat Sheet 2.0
100% (1)
Python For Data Science Cheat Sheet 2.0
11 pages
Python Cheat Sheet For Excel Users
No ratings yet
Python Cheat Sheet For Excel Users
5 pages
Python Cheat Sheet For Excel Users
100% (2)
Python Cheat Sheet For Excel Users
5 pages
Notes For Fintech Assesment, Cheatsheet
No ratings yet
Notes For Fintech Assesment, Cheatsheet
19 pages
Getting Started With Python Cheat Sheet
No ratings yet
Getting Started With Python Cheat Sheet
1 page
Python Cheat Sheet For Beginners
No ratings yet
Python Cheat Sheet For Beginners
1 page
Python BasicsGUIA PYTHON-01
No ratings yet
Python BasicsGUIA PYTHON-01
1 page
01 Introduction to Python
No ratings yet
01 Introduction to Python
36 pages
Cheat Sheet: Python For Data Science
No ratings yet
Cheat Sheet: Python For Data Science
4 pages
Cheat Sheet: Python For Data Science
No ratings yet
Cheat Sheet: Python For Data Science
4 pages
Pythonn SE
No ratings yet
Pythonn SE
18 pages
python interviews
No ratings yet
python interviews
154 pages
DAO Cheatsheet
No ratings yet
DAO Cheatsheet
3 pages
01 Introduction to Python
No ratings yet
01 Introduction to Python
36 pages
Numpy
No ratings yet
Numpy
40 pages
Python_for_DataScience
No ratings yet
Python_for_DataScience
47 pages
Python
No ratings yet
Python
5 pages
Python Notes by Prof T
No ratings yet
Python Notes by Prof T
10 pages
Christian Mayer, Lukas Rieger, Kyrylo Kravets - Coffee Break Pandas - 74 Pandas Puzzles To Build Your Pandas Data Science Superpower-Finxter - Com (2020)
No ratings yet
Christian Mayer, Lukas Rieger, Kyrylo Kravets - Coffee Break Pandas - 74 Pandas Puzzles To Build Your Pandas Data Science Superpower-Finxter - Com (2020)
156 pages
Pandas What Can Pandas Do For You ?: Statsmodels SM Seaborn Sns
No ratings yet
Pandas What Can Pandas Do For You ?: Statsmodels SM Seaborn Sns
9 pages
Python CheatSheet
No ratings yet
Python CheatSheet
2 pages
Pandas
No ratings yet
Pandas
5 pages
Cheat Sheet Template
No ratings yet
Cheat Sheet Template
3 pages
Fundamental - Python
No ratings yet
Fundamental - Python
3 pages
Cheat Sheet: The Pandas Dataframe Object I: Preliminaries Get Your Data Into A Dataframe
No ratings yet
Cheat Sheet: The Pandas Dataframe Object I: Preliminaries Get Your Data Into A Dataframe
12 pages
Python For Data Cheetsheet
No ratings yet
Python For Data Cheetsheet
13 pages
Pandas Cheat Sheet
100% (1)
Pandas Cheat Sheet
2 pages
Usage of NumPy for Numerical Data in Detail
No ratings yet
Usage of NumPy for Numerical Data in Detail
52 pages
Commands SQL, Python (BASICS)
No ratings yet
Commands SQL, Python (BASICS)
7 pages
DS Practical
No ratings yet
DS Practical
30 pages
Week 1: 1 The Python Programming Language: Functions
No ratings yet
Week 1: 1 The Python Programming Language: Functions
9 pages
Data Analysis Tools
No ratings yet
Data Analysis Tools
26 pages
Pandas DataFrame Notes
No ratings yet
Pandas DataFrame Notes
13 pages
Course_ Introduction to Data Science (SD211105)
No ratings yet
Course_ Introduction to Data Science (SD211105)
10 pages
Python For Data Analysis
67% (3)
Python For Data Analysis
39 pages
Pandas Cheat Sheet - Python For Data Science
No ratings yet
Pandas Cheat Sheet - Python For Data Science
5 pages
Introduction to Pandas Programming 2
No ratings yet
Introduction to Pandas Programming 2
3 pages
I.p file
No ratings yet
I.p file
20 pages
Python Cheat Sheet Code Academy
100% (1)
Python Cheat Sheet Code Academy
1 page
Data Science Python Cheat Sheet
No ratings yet
Data Science Python Cheat Sheet
25 pages
Learn Data Analysis With Pandas - Introduction
No ratings yet
Learn Data Analysis With Pandas - Introduction
2 pages
Comprehensive Python CheatSheet 1731972192
No ratings yet
Comprehensive Python CheatSheet 1731972192
10 pages
Imp Details
No ratings yet
Imp Details
6 pages
Pandas Cheat Sheet
100% (2)
Pandas Cheat Sheet
6 pages
Combined Cheatsheet
No ratings yet
Combined Cheatsheet
5 pages
Data Analysis Python Read The Docs Io en Latest
No ratings yet
Data Analysis Python Read The Docs Io en Latest
79 pages
Data Science - Unit II
100% (2)
Data Science - Unit II
173 pages
Manipulating Dataframes - Beginner
No ratings yet
Manipulating Dataframes - Beginner
2 pages
Experiment 678910
No ratings yet
Experiment 678910
12 pages
Pandas
No ratings yet
Pandas
13 pages
Pandas Cheat Sheet
No ratings yet
Pandas Cheat Sheet
1 page
Using Python For Data Analysis - July 2018 - Slides
No ratings yet
Using Python For Data Analysis - July 2018 - Slides
43 pages
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
100% (1)
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
12 pages
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Principles of Digital Electronics
From Everand
Principles of Digital Electronics
Sapana Rane
No ratings yet
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
TM Duct Inverter (18-55K NEW) (ESMA 2019) 29.11.2022
No ratings yet
TM Duct Inverter (18-55K NEW) (ESMA 2019) 29.11.2022
97 pages
Resumen CCNA Data Center 200-155
No ratings yet
Resumen CCNA Data Center 200-155
15 pages
Manupatra Boolean Operators
No ratings yet
Manupatra Boolean Operators
1 page
Ansys Modeling and Meshing Guide
100% (5)
Ansys Modeling and Meshing Guide
276 pages
TTL 1 Course Outline
No ratings yet
TTL 1 Course Outline
4 pages
Dynamics of The Mass Distribution of Liquid Ti
No ratings yet
Dynamics of The Mass Distribution of Liquid Ti
23 pages
Gr 11 Earth Science Reviewer
No ratings yet
Gr 11 Earth Science Reviewer
22 pages
Analytical Solutions of Eddy-Current Problems in A Finite Length Cylinder
No ratings yet
Analytical Solutions of Eddy-Current Problems in A Finite Length Cylinder
11 pages
Rampal Ahirwar Final Thesis
No ratings yet
Rampal Ahirwar Final Thesis
73 pages
CATALOGO ALLEN BRADLEY 150-qs001 - En-P
No ratings yet
CATALOGO ALLEN BRADLEY 150-qs001 - En-P
16 pages
Verilog-Mode Veritedium 20090925
No ratings yet
Verilog-Mode Veritedium 20090925
41 pages
Test Management
No ratings yet
Test Management
57 pages
Seat No.
No ratings yet
Seat No.
1 page
E9 The Synchronous Motor
100% (1)
E9 The Synchronous Motor
10 pages
EN 05SpecSheet C40-55-EU-Kubota Stage 5-Engine
No ratings yet
EN 05SpecSheet C40-55-EU-Kubota Stage 5-Engine
6 pages
4 QEM Process Capability
No ratings yet
4 QEM Process Capability
6 pages
Syllabus LTR+KOP+AKL
No ratings yet
Syllabus LTR+KOP+AKL
3 pages
AP6265 Series (Preliminary) : Features General Description
No ratings yet
AP6265 Series (Preliminary) : Features General Description
16 pages
(Ebook) Biogeochemistry of Inland Waters by Gene E. Likens ISBN 0123819962 instant download
100% (1)
(Ebook) Biogeochemistry of Inland Waters by Gene E. Likens ISBN 0123819962 instant download
49 pages
BSIT
No ratings yet
BSIT
3 pages
C250D5P
No ratings yet
C250D5P
4 pages
Composites and Applications
No ratings yet
Composites and Applications
46 pages
Beam Weld Splice Connection 1
No ratings yet
Beam Weld Splice Connection 1
3 pages
Siddhanta Shiromani CH 14 Translated
No ratings yet
Siddhanta Shiromani CH 14 Translated
3 pages
UniversityPhysicsVolume1-Ch07
No ratings yet
UniversityPhysicsVolume1-Ch07
2 pages
The Method of Animation: 1 A Short Refresher On Projective Coordinates
50% (2)
The Method of Animation: 1 A Short Refresher On Projective Coordinates
9 pages
Mind Map Stoikiometri 1
0% (1)
Mind Map Stoikiometri 1
1 page
Graphic Design and Layout Lab LM1
No ratings yet
Graphic Design and Layout Lab LM1
3 pages
Math 11 Gen Math Q2 Week 7
100% (1)
Math 11 Gen Math Q2 Week 7
18 pages

Python For Data Science Cheat Sheet 2.0

Uploaded by

Python For Data Science Cheat Sheet 2.0

Uploaded by

Python

Pandas | Numpy | Sklearn

Cheat Sheet delimiter=',') b.max(axis= 0)

Saving & Loading On Disk: >>> np.multiply(a,b) b[1,2] 4 5 6

np.save('my_array', a) >>> np.exp(b) Slicing:

np.load('my_array.npy') >>> np.sin(a)

Sklearn is a free machine learning library for Python. It features various

You might also like