
PYTHON MACHINE LEARNING
Why Python?
• A large community
• Tons of machine-learning-specific libraries
• Easy to learn
• Libraries such as TensorFlow make Python the leading language in the data science community.

About Python
• It is case sensitive
• Text indentation is significant: it controls logic flow!
• Use the # symbol to add a code comment
• Use ''' ''' to comment a block of code
Prepare your machine
• Install Python 2.7.xx
• pip install pandas
• pip install matplotlib sklearn statistics scipy seaborn ipython
pip install --no-cache-dir pystan  # force re-download of an existing package in case of errors
• Get a dataset for training from
http://archive.ics.uci.edu/ml/datasets.html
How to Run Python ML application
• Open "IDLE (Python GUI)" → File → New File
• Write your code in the new window, then → Run → Run Module

Alt+3 → comment the selected lines
Alt+4 → remove line comments
Anaconda
• Anaconda is a Python distribution for data scientists.
• It has around 270 packages, including the most important ones for most scientific
applications, data analysis, and machine learning, such as NumPy, SciPy, Pandas,
IPython, matplotlib, and scikit-learn.
• Download it for free from https://www.continuum.io/downloads
Import Statistical libraries
import time
import random
import datetime
import pandas as pd
import matplotlib.pyplot as plt
import statistics
from scipy import stats
import sklearn
import seaborn
from IPython.display import Image
import numpy as np
• A pandas DataFrame is similar to a Python list, but it has an extra index column, and any
operation performed on a DataFrame is applied to every single element inside it.
• Example of converting a Python list to a DataFrame (see the sketch below):
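A minimal sketch of that conversion; the list values and the column name are illustrative:

import pandas as pd

colors = ['blue', 'green', 'red']              # a plain Python list
df = pd.DataFrame(colors, columns=['color'])   # each list element becomes a row, with an automatic integer index
print(df)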

Common Operations
1) Set dataset column names: data.columns = ['blue', 'green', 'red']
2) Get the value of any cell using the row index and column index/name, like this: data.ix[2, 'red']
3) Replace NaN with zero using: data = data.replace(np.nan, 0)
4) Select a subset of columns into a new DataFrame: NewDF = data[['blue', 'red']]
5) Count the rows that have data in a column: data['red'].value_counts().sum()
   (missing values are denoted by NaN)
6) Count the distinct values in a column: data['blue'].nunique()
7) Drop duplicate rows: data = data.drop_duplicates()
(See the short sketch after this list.)
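A minimal sketch tying a few of these operations together on a tiny DataFrame; the column names and values are illustrative:

import numpy as np
import pandas as pd

data = pd.DataFrame({'blue': [1, 2, 2, np.nan], 'red': [4, np.nan, np.nan, 4]})
print(data['red'].value_counts().sum())  # rows that actually have data in 'red' -> 2
print(data['blue'].nunique())            # distinct non-NaN values in 'blue' -> 2
data = data.replace(np.nan, 0)           # replace NaN with zero
data = data.drop_duplicates()            # drop duplicate rows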
Basic commands
Reading the Data from disk into Memory
• data = pd.read_csv('examples/trip.csv')

Reading the Data from HDFS into Memory

• with hd.open("/home/file.csv") as f:    # hd: an HDFS client module, e.g. pydoop.hdfs imported as hd
•     data = pd.read_csv(f)

Printing the Size of the Dataset and the First/Last Few Rows

• print len(data)
• data.head()
• data.tail()

• data.describe()  # summary statistics: count, mean, std, min, max, quartiles

Order the Dataset and get the first and last element of a column

• data = data.sort_values(by='birthyear')
• data = data.reset_index(drop=True)
• print data.ix[0, 'birthyear']            # first element
• print data.ix[len(data)-1, 'birthyear']  # last element
DataFrame cleaning
• If you need to convert a column with negative values to absolute values:
data['AwaitingTime'] = data['AwaitingTime'].apply(lambda x: abs(x))
• Convert string values to integer values (DayOfTheWeek to integers):
Day_mapping = {'Monday': 0, 'Tuesday': 1, 'Wednesday': 2, 'Thursday': 3, 'Friday': 4, 'Saturday': 5, 'Sunday': 6}
data['DayOfTheWeek'] = data['DayOfTheWeek'].map(Day_mapping)
• Remove rows where the Age column is less than zero:
data = data[data['Age'] >= 0]
• Delete a column from the dataset:
del data['from_station_id']
Filter data to remove rows whose features are all NA/0
df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})
df = df[(df.T != 0).any()]   # keep a row if ANY feature contains a value
print df
df = df[(df.T != 0).all()]   # keep a row if ALL features contain values
print df

   Original DataFrame        df[(df.T != 0).any()]       df[(df.T != 0).all()]
      a  b                      a  b                        a  b
   0  0  0                   1  1  0                     3  1  1
   1  1  0                   2  0  1
   2  0  1                   3  1  1
   3  1  1
Drop the rows where any/all elements are NaN
• data.dropna(how='any')   # use axis=1 to drop columns instead of rows

how : {'any', 'all'}
• any : if any NA values are present, drop that row/column
• all : if all values are NA, drop that row/column
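A minimal sketch illustrating the difference on a tiny DataFrame with missing values; the values are illustrative:

import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [1, np.nan, np.nan], 'b': [2, 5, np.nan]})
print(data.dropna(how='any'))   # keeps only the first row (no NaN at all)
print(data.dropna(how='all'))   # drops only the last row (entirely NaN)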
Drop duplicate rows
Consider rows duplicates if the whole row is the same; return the result in a new DataFrame:
• dataframe1 = df.drop_duplicates(subset=None, inplace=False)

Consider rows duplicates if Column1 and Column3 are the same, keep only the last occurrence, and modify the same DataFrame in place:
• df.drop_duplicates(subset=['Column1', 'Column3'], keep='last', inplace=True)

Save the unique data

• df.to_csv(file_name_output)
Split Dataset, Create training, and test dataset
How to split the dataset 70-30?
[70% of the observations fall into the training set and 30% of the observations fall into the test dataset.]

• from sklearn.model_selection import train_test_split
• x_train, x_test, y_train, y_test = train_test_split(data['x'], data['y'], test_size=0.3, random_state=1)
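A minimal end-to-end sketch, assuming a DataFrame with a feature column 'x' and a target column 'y' (the names and values are illustrative):

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame({'x': range(10), 'y': [0, 1] * 5})
x_train, x_test, y_train, y_test = train_test_split(
    data['x'], data['y'], test_size=0.3, random_state=1)
print(len(x_train), len(x_test))   # 7 training rows, 3 test rows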


import matplotlib.pyplot as plt
• plt.bar – creates a bar chart
• plt.scatter – makes a scatter plot
• plt.boxplot – makes a box and whisker plot
• plt.hist – makes a histogram
• plt.plot – creates a line plot
Plot for a Data array
import numpy as np
import matplotlib.pyplot as plt

Data = np.array([ [1,2], [3,2], [4,6], [7,2], [1,4], [9,1] ])

Column1 = 0
Column2 = 1

for i in range(len(Data)):
    plt.plot(Data[i][Column1], Data[i][Column2], 'r.')   # 'r.' : red point marker

plt.show()
Bar plot for a Data array
import numpy as np
import matplotlib.pyplot as plt

Data = np.array([ [1,2], [3,2], [4,6], [7,2], [1,4], [9,1] ])

Column1 = 0
Column2 = 1
BarWidth = 1 / 1.5
for i in range(len(Data)):
    plt.bar(Data[i][Column1], Data[i][Column2], BarWidth, color="blue")
plt.show()
Scatter plot for Data
import numpy as np
import matplotlib.pyplot as plt

Data = np.array([ [1,2], [3,2], [4,6], [7,2], [1,4], [9,1] ])
Column1 = 0
Column2 = 1

plt.scatter(Data[:, Column1], Data[:, Column2], marker="x", s=150, linewidths=5, zorder=1)
plt.show()

# where s is the size of the marker
Calculating a Pair Plot Between All Features
import numpy as np
import pandas
import seaborn
import matplotlib.pyplot as plt

Data = np.array([ [1,2], [3,2], [4,6], [7,2], [1,4], [9,1] ])
df = pandas.DataFrame(Data)

# Calculating a pair plot between all features
seaborn.pairplot(df)
plt.show()
Basic commands
Sort, Filter, group, and Plotting the Data
• data = data.sort_values(by='birthyear')
• data = data[(data['birthyear'] >= 1931) & (data['birthyear'] <= 1999)]
• groupby_birthyear = data.groupby('birthyear').size()
• groupby_birthyear.plot.bar(title='Distribution of birth years', figsize=(15,4))
• plt.show()
Stacked column chart
data1 = data.groupby(['birthyear', 'gender']).size().unstack('gender').fillna(0)
data1.plot.bar(title='Distribution of birth years by Gender', stacked=True, figsize=(15,4))
Convert string to DateTime

Create a new column as date/time

• data['StartTime1'] = data['starttime'].apply(lambda x: datetime.datetime.strptime(x, "%m/%d/%Y %H:%M"))

Extract Year, Month, Day, and Hour

• data['year'] = data['StartTime1'].apply(lambda x: x.year)
• data['month'] = data['StartTime1'].apply(lambda x: x.month)
• data['day'] = data['StartTime1'].apply(lambda x: x.day)
• data['hour'] = data['StartTime1'].apply(lambda x: x.hour)
Split a string column to create new columns for Year-Month-Day

for index, component in enumerate(['year', 'month', 'day']):
    data[component] = data['AppointmentRegistration'].apply(lambda x: int(x.split('T')[0].split('-')[index]))

OR
data['year'] = data['AppointmentRegistration'].apply(lambda x: int(x.split('T')[0].split('-')[0]))
data['month'] = data['AppointmentRegistration'].apply(lambda x: int(x.split('T')[0].split('-')[1]))
data['day'] = data['AppointmentRegistration'].apply(lambda x: int(x.split('T')[0].split('-')[2]))

Notes
'abcde'[0:2] → 'ab'
'abcde'[:2] → 'ab'
'abcde'[:-1] → 'abcd'
Concepts
• Mean: the average.
• Median: the middle data point of the sorted observations. When the number of observations is odd, the middle
data point is returned; when it is even, the average of the two middle points is returned.
• Mode: returns the observation in the dataset with the highest frequency.
• Variance: represents the variability of data points about the mean.
• Standard deviation: just like variance, also captures the spread of data around
the mean. The only difference is that it is the square root of the variance.
• Normal distribution (Gaussian distribution): the mean lies at the center of this
distribution with a spread (i.e., standard deviation) around it. Some 68% of the
observations lie within 1 standard deviation from the mean; 95% of the
observations lie within 2 standard deviations from the mean, whereas 99.7% of
the observations lie within 3 standard deviations from the mean.
• Outliers: values distinct from the majority of the observations. They
occur either naturally, due to equipment failure, or because of entry mistakes.
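A minimal sketch of these measures using the statistics module imported earlier; the sample values are illustrative:

import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(sample))     # 5.0
print(statistics.median(sample))   # 4.5 (average of the two middle points)
print(statistics.mode(sample))     # 4
print(statistics.variance(sample)) # sample variance
print(statistics.stdev(sample))    # square root of the variance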
Plot mean
Draw the mean of 'tripduration' grouped by 'starttime_date':
• data.groupby('starttime_date')['tripduration'].mean() \
    .plot.bar(title='Distribution of Trip duration by date', figsize=(15,4))

Calculate Mean, Standard deviation, Median

• print data['tripduration_mean'].mean()
• print data['tripduration_mean'].std()
• print data['tripduration_mean'].median()
Seasonal pattern vs Cyclic Pattern vs Trend
• Seasonal: a pattern over a fixed period, like a monthly pattern.
• Cycle: a pattern that repeats over non-periodic time cycles, without a fixed period.
• Trend: in the long term, does the continuous variable increase/decrease?

[Example plots: seasonal pattern, cycle pattern, trend]

Correlation directions
Calculate the correlation
• pd.set_option('precision', 3)
• correlations = data[['tripduration','age']].corr()
• print(correlations)

.corr(method='pearson')
method : {'pearson', 'kendall', 'spearman'}
• pearson : standard correlation coefficient
• kendall : Kendall Tau correlation coefficient
• spearman : Spearman rank correlation

For more information: http://www.statisticssolutions.com/correlation-pearson-kendall-spearman/


Rename Dataset columns' names
• Get the data from the following URL and save it as "concrete_data.csv":
http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

• data = pd.read_csv('examples/concrete_data.csv')
• print len(data)
• data.head()

Renaming the Columns

• data.columns = ['cement_component', 'furnace_slag', 'fly_ash', \
    'water_component', 'superplasticizer', 'coarse_aggregate', \
    'fine_aggregate', 'age', 'concrete_strength']
Loop through the dataset
Draw the relation between each feature and concrete_strength:

plt.figure(figsize=(15, 10.5))
plot_count = 1
for feature in list(data.columns)[:-1]:
    plt.subplot(3, 3, plot_count)
    plt.scatter(data[feature], data['concrete_strength'])
    plt.xlabel(feature.replace('_', ' ').title())
    plt.ylabel('Concrete strength')
    plot_count += 1
plt.show()

# Get the correlations between the features
pd.set_option('display.width', 100)
pd.set_option('precision', 3)
correlations = data.corr(method='pearson')
print(correlations)
Dimensionality Reduction
PCA Dimensionality Reduction
When we have a big dataset, we may need to reduce the number of columns for fast
training (remove redundant features).
Example: the dataset (data) contains 20 features and we need to reduce them to 4 features.

from sklearn.decomposition import PCA

NewDataFrame = PCA(n_components=4).fit_transform(data)   # note: the result is a NumPy array, not a DataFrame
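A short follow-up sketch, assuming the same data DataFrame, showing how much variance the 4 retained components explain:

from sklearn.decomposition import PCA

pca = PCA(n_components=4)
reduced = pca.fit_transform(data)           # rows stay the same, columns reduced to 4
print(pca.explained_variance_ratio_)        # fraction of variance kept by each component
print(pca.explained_variance_ratio_.sum())  # total variance retained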


Association Algorithm
Association Rules
Association rules analysis is a technique to uncover how items are
associated with each other.
There are three common ways to measure association (a small worked sketch follows this list):
1) Support: how popular an itemset is, measured by the
proportion of transactions in which the itemset appears.

2) Confidence: how likely item Y is to be purchased when item X is
purchased (not a good measure if one item is very common).

3) Lift: how likely item Y is to be purchased when item X is purchased,
while controlling for how popular item Y is.
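A minimal worked sketch of the three measures on a toy list of transactions; the items and counts are illustrative:

transactions = [
    ['beer', 'nuts'],
    ['beer', 'cheese'],
    ['beer', 'nuts', 'cheese'],
    ['nuts'],
]
n = float(len(transactions))
support_beer_nuts = sum('beer' in t and 'nuts' in t for t in transactions) / n  # 2/4 = 0.5
support_beer = sum('beer' in t for t in transactions) / n                       # 3/4 = 0.75
support_nuts = sum('nuts' in t for t in transactions) / n                       # 3/4 = 0.75
confidence = support_beer_nuts / support_beer   # P(nuts | beer) = 0.667
lift = confidence / support_nuts                # 0.889: < 1, so beer does not boost nuts
print(support_beer_nuts, confidence, lift)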
Association Rules (Apriori Algorithm)
from apyori import apriori

transactions = [
['beer', 'nuts'],
['beer', 'cheese'],
]
results = list(apriori(transactions))
Association Rules (Example)
• pip install apyori

# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import the dataset
dataset = pd.read_csv('apriori_data2.csv', header=None)
records = []
for i in range(0, 11):
    records.append([str(dataset.values[i, j]) for j in range(0, 10)])

# Train the Apriori model
from apyori import apriori
rules = apriori(records, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)

# Visualising the results
results = list(rules)
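A short follow-up sketch for inspecting the rules apyori returns; each result is a record carrying the itemset, its support, and the rule statistics (confidence and lift):

for record in results:
    items = list(record.items)               # the itemset, e.g. ['beer', 'nuts']
    print('Rule items :', items)
    print('Support    :', record.support)
    for stat in record.ordered_statistics:   # one entry per derived rule X -> Y
        print('  %s -> %s  confidence=%.3f  lift=%.3f' % (
            list(stat.items_base), list(stat.items_add), stat.confidence, stat.lift))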
Time Series
Supervised machine learning
Time Series
A time series forecast differs from regression in that time acts as an explanatory variable and should be continuous
along equal intervals.

Algorithms
1) Autoregressive Forecast Model: uses observations at previous time steps to predict observations at future time steps.

2) ARIMA Forecast Model: a linear model that predicts after removing trends and seasonality.

3) Prophet: a Facebook library; it is quick and gives very good results when forecasting time series data.
Related packages: pip install fbprophet
The input to Prophet is always a DataFrame with two columns: ds and y.
The ds (datestamp) column must contain a date or datetime (either is fine).
The y column must be numeric and represents the measurement we wish to forecast.
It is often better to model log(y) rather than the raw y, to reduce trends and noise.
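A minimal Prophet sketch under those conventions; the CSV file name and the 30-day forecast horizon are illustrative choices:

import numpy as np
import pandas as pd
from fbprophet import Prophet

df = pd.read_csv('example_timeseries.csv')   # hypothetical file with columns ds, y
df['y'] = np.log(df['y'])                    # model log(y) as suggested above

m = Prophet()
m.fit(df)

future = m.make_future_dataframe(periods=30)  # extend 30 days past the last date in ds
forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())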
FB Prophet
COMPUTER VISION
OpenCV for computer vision
• To install OpenCV: pip install opencv-python
• Color images have up to 4 values per pixel (Blue, Green, Red, and Alpha)
• It is common to process images in grayscale for faster performance
• A video is just a loop of images, so any code that processes images can process video too

For more information about OpenCV please visit
• https://www.tutorialspoint.com/opencv/index.htm
• https://www.youtube.com/playlist?list=PLQVvvaa0QuDdttJXlLtAJxJetJcqmqlQq
Common OpenCV functions
• image = cv2.imread(ImgPATH)  # read an image
• gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # convert to grayscale
• cv2.imshow('Window Title', image)  # show the image in a window
• cv2.imwrite(ImgPATH, image)  # write the image to disk
• _, image = cv2.threshold(image, threshold, maxval, thresholdType)  # set a pixel to maxval if its value is greater than threshold
• image = cv2.adaptiveThreshold(image, maxValue, adaptiveMethod, thresholdType, blockSize, C)  # the threshold is calculated automatically

[Example images for each thresholdType: Original, THRESH_BINARY, THRESH_BINARY_INV, THRESH_TRUNC, THRESH_TOZERO, THRESH_TOZERO_INV]

adaptiveMethod
• ADAPTIVE_THRESH_MEAN_C : the threshold value is the mean of the neighborhood area.
• ADAPTIVE_THRESH_GAUSSIAN_C : the threshold value is the weighted sum of neighborhood values, where the weights are a Gaussian window.
blockSize : an integer representing the size of the pixel neighborhood used to calculate the threshold value.
C : a double representing the constant used in both methods (subtracted from the mean or weighted mean).
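A minimal sketch of applying these thresholding calls to an image on disk; the file name and parameter values are illustrative:

import cv2

image = cv2.imread('example.jpg')                 # hypothetical image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# fixed threshold: pixels above 127 become 255, the rest become 0
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# adaptive threshold: computed per 11x11 neighborhood, minus the constant 2
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

cv2.imshow('binary', binary)
cv2.imshow('adaptive', adaptive)
cv2.waitKey(0)
cv2.destroyAllWindows()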
OpenCV Video Stream Example
#VideoStream1 = cv2.VideoCapture('c:/boxed-correct.avi')  # read the stream from an avi video file
VideoStream1 = cv2.VideoCapture(0)  # read the stream from the first connected video cam
while True:
    IsReturnFrame1, ImgFrame1 = VideoStream1.read()
    if IsReturnFrame1 == 0: break
    GrayImgFrame1 = cv2.cvtColor(ImgFrame1, cv2.COLOR_BGR2GRAY)
    cv2.imshow('ImageTitle1', ImgFrame1)
    cv2.imshow('ImageTitle2', GrayImgFrame1)
    if cv2.waitKey(1) & 0xFF == ord('q'): break   # press 'q' to stop

VideoStream1.release()
cv2.destroyAllWindows()
Image processing using OpenCV

[Example images: Original, Threshold_Color, Threshold_Gray, Threshold_GAUSSIAN, Threshold_MEAN]

Object Recognition
Search for a template image inside a main image; this only works when the template matches with the exact lighting/scale/angle.

[Example images: main image, template, result]
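A minimal template-matching sketch for that kind of exact-match search, using OpenCV's matchTemplate; the file names are illustrative:

import cv2

main_img = cv2.imread('main_image.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical paths
template = cv2.imread('template.jpg', cv2.IMREAD_GRAYSCALE)
h, w = template.shape

result = cv2.matchTemplate(main_img, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)   # best match location and score

top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
print('match score:', max_val)

# draw the match on a colour copy and show it
display = cv2.imread('main_image.jpg')
cv2.rectangle(display, top_left, bottom_right, (0, 255, 0), 2)
cv2.imshow('Result', display)
cv2.waitKey(0)
cv2.destroyAllWindows()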
NLP

Natural Language Processing


NLP (Natural Language Processing)
Why do we need NLP packages?
• NLP packages handle a wide range of tasks such as named-entity recognition, part-of-speech (POS) tagging, sentiment
analysis, document classification, topic modeling, and much more.

Named-entity recognition: extracting the names of persons, organizations, locations, times, quantities, percentages, etc.
Example: Jim bought 300 shares of Acme Corp. in 2006. → [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
Part-of-speech tagging: used to identify words as nouns, verbs, adjectives, adverbs, etc.

Top 5 Python NLP libraries

• NLTK is good as an education and research tool. Its modularized structure makes it excellent for learning and exploring NLP
concepts, but it is not meant for production.
• TextBlob is built on top of NLTK and is more easily accessible. It is a good library for fast prototyping or building applications that
don't require highly optimized performance. Beginners should start here.
• Stanford's CoreNLP is a Java library with Python wrappers. It is used in many existing production systems due to its speed.
• SpaCy is a newer NLP library that is designed to be fast, streamlined, and production-ready. It is not as widely adopted.
• Gensim is most commonly used for topic modeling and similarity detection. It is not a general-purpose NLP library, but for the
tasks it does handle, it does them well.
NLTK Package for Natural Language Processing
• pip install nltk
• To download all packages using the GUI, run: import nltk; nltk.download()
NLTK concepts
Tokenizing
• Word tokenizers: split text into words
• Sentence tokenizers: split text into sentences

Lexicon
• The words and their actual meanings in the context

Corpora
• Bodies of text grouped by category, e.g. medical journals, presidential speeches

Lemmatizing (stemming)
• Returns the word to its root, e.g. (gone, went, going) → go

WordNet
• A list of different words that have the same meaning as the given word
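A small sketch of lemmatizing and WordNet lookups with NLTK; it assumes the wordnet corpus has already been downloaded via nltk.download():

from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('going', pos='v'))   # -> go
print(lemmatizer.lemmatize('went', pos='v'))    # -> go

for syn in wordnet.synsets('good')[:3]:          # word senses and synonyms for 'good'
    print(syn.name(), syn.lemma_names())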
NLTK Simple example
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords

EXAMPLE_TEXT = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."

SentencesArray = sent_tokenize(EXAMPLE_TEXT)   # split the text into sentences
print(SentencesArray)

ListOfAvailableStopWords = set(stopwords.words("english"))
print(ListOfAvailableStopWords)
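A short follow-up sketch that uses the word tokenizer and the stop-word list loaded above to filter out stop words:

words = word_tokenize(EXAMPLE_TEXT)                                    # split the text into words
filtered = [w for w in words if w.lower() not in ListOfAvailableStopWords]
print(filtered)   # the text without stop words like 'how', 'are', 'you', 'the', 'and'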
textblob
• pip install textblob
• python -m textblob.download_corpora
Sentiment Analysis Example
• We have a sample of 3000 random user comments from the imdb.com, amazon.com, and
yelp.com websites:
http://archive.ics.uci.edu/ml/machine-learning-databases/00331/
• Each comment has a score; the score is either 1 (positive) or 0 (negative)
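A minimal TextBlob sketch for scoring one such comment; the threshold of 0 for calling a comment positive is an illustrative choice:

from textblob import TextBlob

comment = "This movie was surprisingly good, I loved it."
polarity = TextBlob(comment).sentiment.polarity   # between -1 (negative) and +1 (positive)
predicted_score = 1 if polarity > 0 else 0        # map to the dataset's 1/0 labels
print(polarity, predicted_score)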
Python Deep Learning
Common deep learning fields

Main fields:
• Computer vision
• Speech recognition
• Natural language processing
• Audio recognition
• Social network filtering
• Machine translation

Visual examples:
• Colorization of black and white images: https://www.youtube.com/watch?v=_MJU8VK2PI4
• Adding sounds to silent movies: https://www.youtube.com/watch?v=0FW99AQmMc8
• Automatic machine translation
• Object classification in photographs
• Automatic handwriting generation
• Character text generation
• Image caption generation
• Automatic game playing: https://www.youtube.com/watch?v=TmPfTpjtdgg
Common Deep learning development tool (Keras)
• Keras is a free Artificial Neural Networks (ANN) library (a deep learning library).
• It is a high-level neural networks API, written in Python and capable of running on top of
TensorFlow, CNTK, or Theano.
• It was developed with a focus on enabling fast experimentation. Being able to go from
idea to result with the least possible delay is key to doing good research.

[Diagram: Keras sits on top of the TensorFlow, CNTK, and Theano backends]
Deep learning: computer vision (image processing)
3 common ways to detect objects:
• median-based features
• edge-based features
• threshold-based features
How to use Neural Network Models in Keras
Five steps
1. Define Network.
2. Compile Network.
3. Fit Network.
4. Evaluate Network.
5. Make Predictions.
Step 1. Define Network
• Neural networks are defined in Keras as a sequence of layers.
• The first layer in the network must define the number of inputs to expect. For a Multilayer
Perceptron model this is specified by the input_dim attribute.
• Example of a small Multilayer Perceptron model (2 inputs, one hidden layer with 5 neurons, 1 output):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(5, input_dim=2))
model.add(Dense(1))
• Re-written after adding activation functions:
model = Sequential()
model.add(Dense(5, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
Available Activation Functions
Optional; an activation function acts like a filter on a layer's output, chosen to match common predictive modeling problems, and often gives a significant boost in performance.
• Sigmoid: used for binary classification (2 classes), with one neuron in the output layer. Whatever the input, it maps it to a value between 0 and 1.
• Softmax: used for multiclass classification (>2 classes), with one output neuron per class value.
• Linear: used for regression, with the number of neurons matching the number of outputs.
• Tanh: whatever the input, it converts it to a number between -1 and 1.
• Relu: either 0 for a<0 or a for a>0; it removes negative values and passes positive values unchanged.
• LeakyReLU: scales negative values down by a small factor and passes positive values unchanged.
• elu
• selu
• softplus
• softsign
• hard_sigmoid
• PReLU
• ELU
• ThresholdedReLU
Step 2. Compile Network
• Compiling specifies the optimization algorithm used to train the network and the loss function that the optimization
algorithm minimizes to evaluate the network.

model.compile(optimizer='sgd', loss='mse')

Optimizers are tools to minimize the loss between predictions and real values. Commonly used optimization algorithms:
• 'sgd' (Stochastic Gradient Descent) requires the tuning of a learning rate and momentum.
• ADAM requires the tuning of a learning rate.
• RMSprop requires the tuning of a learning rate.

Loss functions:
• Regression: Mean Squared Error, 'mse'.
• Binary classification (2 classes): 'binary_crossentropy'.
• Multiclass classification (>2 classes): 'categorical_crossentropy'.

Finally, you can also specify metrics to collect while fitting the model, in addition to the loss function. Generally, the most useful
additional metric to collect is accuracy.

model.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
Step 3. Fit Network
history = model.fit(X, y, batch_size=10, epochs=100)

The network is trained using the backpropagation algorithm.

• batch_size is the number of samples that will be propagated through the network at a time.
• epochs is the number of complete passes over all of the training examples.

Example: if you have 1000 training examples and your batch size is 500, then it will take 2
iterations to complete 1 epoch.
Step 4. Evaluate Network
• The model evaluates the loss across all of the test patterns, as well as any other metrics
specified when the model was compiled.

• For example, for a model compiled with the accuracy metric, we could evaluate it on
a new dataset as follows:
loss, accuracy = model.evaluate(X, Y)
print("Loss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))
Step 5. Make Predictions
probabilities = model.predict(X)
predictions = [float(round(x)) for x in probabilities]
accuracy = numpy.mean(predictions == Y)  # fraction of correct predictions (count of True divided by the total size)
print("Prediction Accuracy: %.2f%%" % (accuracy*100))
Binary classification using Neural Network in Keras
Diabetes Data Set
Detect Diabetes Disease based on analysis
Dataset attributes:
1. Number of times pregnant
2. Plasma glucose concentration
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
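A minimal end-to-end sketch for this binary classification task, following the five steps above; the CSV file name, layer sizes, and training settings are illustrative choices, not the author's exact model:

import numpy
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split

# load the diabetes dataset: 8 input attributes, 1 class column (assumed file name)
dataset = numpy.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X, Y = dataset[:, 0:8], dataset[:, 8]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

# 1. Define Network
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# 2. Compile Network (binary classification -> binary_crossentropy)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 3. Fit Network
model.fit(X_train, Y_train, batch_size=10, epochs=100, verbose=0)

# 4. Evaluate Network
loss, accuracy = model.evaluate(X_test, Y_test)
print("Loss: %.2f, Accuracy: %.2f%%" % (loss, accuracy * 100))

# 5. Make Predictions
probabilities = model.predict(X_test)
predictions = [float(round(p[0])) for p in probabilities]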
Save prediction model
• After training our model, i.e. model.fit(X_train, Y_train), we can save this training to use later.
• This task can be done with the pickle package (the Python object serialization library), using its dump
and load methods. Pickle can save any object, not just the prediction model.

import pickle
...
model.fit(X_train, Y_train)
# save the model to disk
pickle.dump(model, open("c:/data.dump", 'wb'))  # wb = write bytes

# some time later... load the model from disk
model = pickle.load(open("c:/data.dump", 'rb'))  # rb = read bytes

You might also like