Python for Machine Learning
Why Python?
A large community
tons of machine learning specific libraries
easy to learn
Libraries like TensorFlow make Python the leading language in the data science community.
About Python
It is case sensitive
Indentation is significant: it controls the logic flow!
Use the # symbol to add a code comment
Use ''' ''' to comment a block of code
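A minimal sketch showing comments and indentation (the variable and values here are just for illustration):
# this is a single-line comment
'''
this is a
block comment
'''
x = 5
if x > 0:
    print("positive")      # the indented line is the body of the if-block
else:
    print("not positive")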
Prepare your machine
Install Python 2.7.xx
pip install pandas
pip install matplotlib scikit-learn statistics scipy seaborn ipython
pip install --no-cache-dir pystan # force re-download of an already-installed package in case of error
Get a dataset for training from:
http://archive.ics.uci.edu/ml/datasets.html
How to Run Python ML application
Open "IDLE (Python GUI)", then File → New File
Write your code in the new window, then Run → Run Module
Common Operations
1) Set dataset column names: data.columns = ['blue', 'green', 'red']
2) Get the value of any cell using the row index and column name/index: data.loc[2, 'red'] (the older data.ix accessor is deprecated)
3) Replace NaN with zero: data = data.replace(np.nan, 0)
4) Select a subset of columns into a new DataFrame: NewDF = data[['blue', 'red']]
5) Count rows that have data in a column: data['red'].value_counts().sum()
Missing values are denoted by NaN.
6) Count distinct values in a column: data['blue'].nunique()
7) Drop duplicate rows: data = data.drop_duplicates()
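A short runnable sketch tying these operations together (the toy values are invented for illustration):
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 3], [1, np.nan, 3], [4, 5, 6]])
data.columns = ['blue', 'green', 'red']   # 1) set column names
print(data.loc[2, 'red'])                 # 2) value of row 2, column 'red' -> 6
data = data.replace(np.nan, 0)            # 3) replace NaN with zero
NewDF = data[['blue', 'red']]             # 4) subset of columns
print(data['red'].value_counts().sum())   # 5) rows with data in 'red'
print(data['blue'].nunique())             # 6) distinct values in 'blue'
data = data.drop_duplicates()             # 7) drop duplicate rows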
Basic commands
Reading the Data from disk into Memory
import pandas as pd
data = pd.read_csv('examples/trip.csv')
data.describe() # get a data summary: count, mean, min and max values
Original DataFrame:
   a  b
1  0  0
2  1  0
3  0  1
4  1  1
Keep a row if ANY column contains a non-zero value: df[(df.T != 0).any()] → rows 2, 3, 4
Keep a row if ALL columns contain non-zero values: df[(df.T != 0).all()] → row 4
Drop the columns where any/all elements are NaN (omit axis=1 to drop rows instead)
data.dropna(axis=1, how='any')
Consider rows duplicates if Column1 and Column3 are the same; drop duplicates except the last row; inplace=True puts the result in the same dataframe
df.drop_duplicates(subset=['Column1', 'Column3'], keep='last', inplace=True)
Scatter and bar plots for a Data array
import numpy as np
import matplotlib.pyplot as plt
plt.scatter(Data[:, Column1], Data[:, Column2], marker="x", s=150, linewidths=5, zorder=1)
plt.show()
import pandas as pd
df = pd.DataFrame(Data)
df.plot.bar() # plot the same array as a bar chart
plt.show()
Basic commands
Sort, Filter, group, and Plotting the Data
data = data.sort_values(by='birthyear')
data = data[(data['birthyear'] >= 1931) & (data['birthyear'] <= 1999)]
groupby_birthyear = data.groupby('birthyear').size()
groupby_birthyear.plot.bar(title='Distribution of birth years', figsize=(15,4))
plt.show()
Stacked column chart
data1 = data.groupby(['birthyear', 'gender']).size().unstack('gender').fillna(0)
data1.plot.bar(title ='Distribution of birth years by Gender', stacked=True, figsize = (15,4))
Convert string to DateTime
data['AppointmentRegistration'] = pd.to_datetime(data['AppointmentRegistration'])
OR extract the parts manually:
data['year'] = data['AppointmentRegistration'].apply(lambda x: int(x.split('T')[0].split('-')[0]))
data['month'] = data['AppointmentRegistration'].apply(lambda x: int(x.split('T')[0].split('-')[1]))
data['day'] = data['AppointmentRegistration'].apply(lambda x: int(x.split('T')[0].split('-')[2]))
Notes
'abcde'[0:2] → 'ab'
'abcde'[:2] → 'ab'
'abcde'[:-1] → 'abcd'
Concepts
Mean : Average
Median: the middle data point when the number of observations is odd; the average of the
two middle points when it is even.
Mode: returns the observation in the dataset with the highest frequency.
Variance: represents variability of data points about the mean.
Standard deviation: just like variance, also captures the spread of data along
the mean. The only difference is that it is a square root of the variance.
Normal distribution (Gaussian distribution): the mean lies at the center of this
distribution with a spread (i.e., standard deviation) around it. Some 68% of the
observations lie within 1 standard deviation from the mean; 95% of the
observations lie within 2 standard deviations from the mean, whereas 99.7% of
the observations lie within 3 standard deviations from the mean.
Outliers: refer to values distinct from the majority of the observations. They
occur either naturally, due to equipment failure, or because of entry mistakes.
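A small sketch computing these statistics with NumPy and SciPy (the sample values are invented for illustration):
import numpy as np
from scipy import stats

sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(np.mean(sample))    # mean (average) -> 5.0
print(np.median(sample))  # median -> 4.5
print(stats.mode(sample)) # mode -> 4 (the highest-frequency value)
print(np.var(sample))     # variance -> 4.0
print(np.std(sample))     # standard deviation -> 2.0 (square root of the variance)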
Plot mean
Draw the mean of 'tripduration' grouped by 'starttime_date'
data.groupby('starttime_date')['tripduration'].mean().plot.bar(title='Distribution of Trip duration by date', figsize=(15,4))
data = pd.read_csv('examples/concrete_data.csv')
print(len(data))
data.head()
plt.figure(figsize=(15,10.5))
plot_count = 1
for feature in list(data.columns)[:-1]:
    plt.subplot(3, 3, plot_count)
    plt.scatter(data[feature], data['concrete_strength'])
    plt.xlabel(feature.replace('_', ' ').title())
    plt.ylabel('Concrete strength')
    plot_count += 1
plt.show()
# Get the correlation between the data columns
pd.set_option('display.width', 100)
pd.set_option('precision', 3)
correlations = data.corr(method='pearson')
print(correlations)
Dimensionality Reduction
PCA Dimensionality Reduction
Used when we have a big dataset and need to reduce the number of columns for fast
training (remove redundant features).
Example: the dataset (data) contains 20 features and we need to reduce them to 4 features (see the sketch below).
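A minimal scikit-learn sketch of this reduction (assuming data is a numeric dataset with 20 feature columns; the names are illustrative):
from sklearn.decomposition import PCA

pca = PCA(n_components=4)             # keep 4 components out of the 20 features
reduced = pca.fit_transform(data)     # reduced has shape (n_rows, 4)
print(pca.explained_variance_ratio_)  # how much variance each component retains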
Association Rules (Apriori Algorithm)
1) Support: how frequently the itemset appears in the transactions.
2) Confidence: how often the rule holds among transactions that contain item X.
3) Lift: This says how likely item Y is purchased when item X is purchased (relative to Y's general popularity).
from apyori import apriori
transactions = [
['beer', 'nuts'],
['beer', 'cheese'],
]
results = list(apriori(transactions))
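To inspect what apriori found (a sketch; apyori returns RelationRecord objects carrying items and support fields):
for record in results:
    print(list(record.items), record.support)  # itemset and how often it appears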
Association Rules (Example)
pip install apyori
Algorithms
1) Autoregressive Forecast Model: uses observations at previous time steps to predict observations at future time steps.
2) ARIMA Forecast Model: a linear model that predicts after removing trends and seasonality.
3) Prophet: a Facebook library for forecasting time series data; it is quick and gives very good results.
Related packages: pip install fbprophet
The input to Prophet is always a dataframe with two columns: ds and y.
The ds (datestamp) column must contain a date or datetime (either is fine).
The y column must be numeric and represents the measurement we wish to forecast.
It is good to use log(y) rather than the actual y to remove trends and noisy data.
FB Prophet
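A minimal forecasting sketch (assuming df is a dataframe with the ds and y columns described above; the 30-day horizon is illustrative):
from fbprophet import Prophet

m = Prophet()
m.fit(df)                                     # df must have the columns ds and y
future = m.make_future_dataframe(periods=30)  # extend 30 days past the history
forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())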
COMPUTER VISION
Open CV for computer vision
To install OpenCV pip install opencv-python
Normal images are arrays with 4 values per pixel (Blue, Green, Red, and Alpha)
It is common to process images in grayscale for faster performance
Video is just a loop of images, so any code that processes images can process video too
adaptiveMethod
ADAPTIVE_THRESH_MEAN_C : threshold value is the mean of neighborhood area.
ADAPTIVE_THRESH_GAUSSIAN_C : threshold value is the weighted sum of neighborhood values where weights are a Gaussian window.
blockSize : An integer representing the size of the pixel neighborhood used to calculate the threshold value.
C : A double representing the constant used in both methods (subtracted from the mean or weighted mean).
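A small sketch of the call these parameters belong to, cv2.adaptiveThreshold (the file names are illustrative):
import cv2

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)  # adaptive thresholding works on grayscale
thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)  # blockSize=11, C=2
cv2.imwrite('output.png', thresh)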
OpenCV Video Stream Example
import cv2
#VideoStream1 = cv2.VideoCapture('c:/boxed-correct.avi') # read stream from an avi video file
VideoStream1 = cv2.VideoCapture(0) # read stream from the first connected video cam
while True:
    IsReturnFrame1, ImgFrame1 = VideoStream1.read()
    if not IsReturnFrame1: break
    GrayImgFrame1 = cv2.cvtColor(ImgFrame1, cv2.COLOR_BGR2GRAY)
    cv2.imshow('ImageTitle1', ImgFrame1)
    cv2.imshow('ImageTitle2', GrayImgFrame1)
    if cv2.waitKey(1) != -1: break  # stop on any key press; waitKey(0) would block on every frame
VideoStream1.release()
cv2.destroyAllWindows()
Image processing using OpenCV
[Figure: example images showing Original, Threshold_Color, template, and Result]
NLP
Named-entity recognition: extracts names of persons, organizations, locations, times, quantities, percentages, etc.
Example: "Jim bought 300 shares of Acme Corp. in 2006." → [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
Part-of-speech tagging: identifies words as nouns, verbs, adjectives, adverbs, etc.
Lexicon
Gets words and their actual meanings in the context
Corpora
Bodies of text used for classification, e.g., medical journals, presidential speeches
Lemmatizing (stemming)
Returns the word to its root, e.g., (gone, went, going) → go
WordNet
Lists the different words that have the same meaning as the given word
NLTK Simple example
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
EXAMPLE_TEXT = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."
SentencesArray = sent_tokenize(EXAMPLE_TEXT)  # split the text into sentences
print(SentencesArray)
ListOfAvailableStopWords= set(stopwords.words("english"))
print(ListOfAvailableStopWords)
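The imported word_tokenize can then be used to filter stop words out of the text (a sketch continuing the example above):
words = word_tokenize(EXAMPLE_TEXT)  # split the text into individual words
filtered = [w for w in words if w.lower() not in ListOfAvailableStopWords]
print(filtered)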
textblob
pip install textblob
python -m textblob.download_corpora
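A minimal TextBlob sketch (sentiment.polarity ranges from -1, negative, to 1, positive; the sample text is illustrative):
from textblob import TextBlob

blob = TextBlob("Python is awesome. The weather is great.")
print(blob.sentiment.polarity)  # > 0 means a positive sentiment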
Sentiment Analysis Example
We have a sample of 3000 random user comments from the imdb.com, amazon.com, and
yelp.com websites
http://archive.ics.uci.edu/ml/machine-learning-databases/00331/
Each comment has a score; the score is either 1 (positive) or 0 (negative)
Python Deep Learning
Common Deep learning fields
Main fields:
Computer vision
Speech recognition
Natural language processing
Visual examples:
Colorization of Black and White Images: https://www.youtube.com/watch?v=_MJU8VK2PI4
Adding Sounds To Silent Movies: https://www.youtube.com/watch?v=0FW99AQmMc8
Keras
Keras is a high-level API that runs on top of TensorFlow, CNTK, or Theano.
Deep Learning for Computer Vision (image processing)
3 common ways to detect objects
median based features
edge based features
threshold based features
How to use Neural Network Models in Keras
Five steps
1. Define Network.
2. Compile Network.
3. Fit Network.
4. Evaluate Network.
5. Make Predictions.
Step 1. Define Network
Neural networks are defined in Keras as a sequence of layers.
The first layer in the network must define the number of inputs to expect. For a Multilayer
Perceptron model this is specified by the input_dim attribute.
Example of a small Multilayer Perceptron model (2 inputs, one hidden layer with 5 neurons, 1 output):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(5, input_dim=2))
model.add(Dense(1))
Re-written after adding activation functions:
model = Sequential()
model.add(Dense(5, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
Available Activation Functions
Optional; it acts like a filter on each layer's output, used to solve common predictive modeling problems and to get a significant boost in performance.
Sigmoid: used for Binary Classification (2 class), one neuron in the output layer. Whatever the input, it maps it to a value between 0 and 1.
Softmax: used for Multiclass Classification (>2 class), one output neuron per class value.
Linear: used for Regression, the number of neurons matching the number of outputs.
Tanh: whatever the input, it converts it to a number between -1 and 1.
ReLU: either 0 for a<0 or a for a>0; it removes negative values and passes positive values unchanged.
LeakyReLU: scales down negative values and passes positive values unchanged.
elu
selu
softplus
softsign
hard_sigmoid
PReLU
ELU
ThresholdedReLU
Step 2. Compile Network
Compiling requires specifying the optimization algorithm used to train the network and the loss function that the
optimization algorithm minimizes to evaluate the network.
model.compile(optimizer='sgd', loss='mse')
Optimizers are tools to minimize the loss between predictions and real values. Commonly used optimization algorithms:
'sgd' (Stochastic Gradient Descent): requires tuning of a learning rate and momentum.
ADAM: requires tuning of a learning rate.
RMSprop: requires tuning of a learning rate.
Loss functions:
Regression: Mean Squared Error or 'mse'.
Binary Classification (2 class): 'binary_crossentropy'.
Multiclass Classification (>2 class): 'categorical_crossentropy'.
Finally, you can also specify metrics to collect while fitting the model in addition to the loss function. Generally, the most useful
additional metric to collect is accuracy.
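For instance, a compile call for a binary classifier collecting accuracy (a sketch matching the choices above):
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])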
Step 3. Fit Network
The network is fit (trained) on the training data in batches, over a number of epochs; an epoch is one
pass over all training examples, and the batch size sets how many examples are processed per weight update.
Example: if you have 1000 training examples and your batch size is 500, then it will take 2
iterations to complete 1 epoch.
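A minimal fit call (the epoch and batch-size values here are illustrative):
history = model.fit(X, Y, epochs=100, batch_size=10)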
Step 4. Evaluate Network
The model evaluates the loss across all of the test patterns, as well as any other metrics
specified when the model was compiled.
For example, for a model compiled with the accuracy metric, we could evaluate it on
a new dataset as follows:
loss, accuracy = model.evaluate(X, Y)
print("Loss: %.2f, Accuracy: %.2f% %" % (loss, accuracy*100))
Step 5. Make Predictions
import numpy
probabilities = model.predict(X)
predictions = [float(round(x[0])) for x in probabilities] # each prediction is a 1-element array
accuracy = numpy.mean(predictions == Y) # count the number of True and divide by the total size
print("Prediction Accuracy: %.2f%%" % (accuracy*100))
Binary classification using Neural Network in Keras
Diabetes Data Set
Detect Diabetes Disease based on analysis
Dataset Attributes:
1. Number of times pregnant
2. Plasma glucose concentration
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
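Putting the five steps together, a minimal sketch for this dataset (the CSV file name and layer sizes are assumptions; the file is expected to hold the 9 columns above with the class variable last):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X, Y = dataset[:, 0:8], dataset[:, 8]  # 8 input attributes, class variable last

model = Sequential()                   # 1. define
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # 2. compile
model.fit(X, Y, epochs=150, batch_size=10)  # 3. fit
loss, accuracy = model.evaluate(X, Y)       # 4. evaluate
print("Accuracy: %.2f%%" % (accuracy * 100))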
Save prediction model
After we train our model, i.e., model.fit(X_train, Y_train), we can save this training to use later.
This task can be done with the Pickle package (Python Object Serialization Library), using its dump
and load methods. Pickle can save any object, not just the prediction model.
import pickle
…………….
model.fit(X_train, Y_train)
# save the model to disk
pickle.dump(model, open("c:/data.dump", 'wb')) # wb = write bytes
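Loading the saved model back later (a sketch; the path matches the dump call above and X_test is assumed to exist):
# load the model from disk and reuse it
loaded_model = pickle.load(open("c:/data.dump", 'rb')) # rb = read bytes
predictions = loaded_model.predict(X_test)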