Lesson 2 - Data Preprocessing

This document provides an overview of key concepts in data wrangling and manipulation using Python. It covers loading and exploring data from CSV and Excel files, different data wrangling techniques like slicing and indexing, identifying unique values, and data manipulation methods such as typecasting, merging, concatenation and joins. The learning objectives are to demonstrate data import/exploration in Python, different data wrangling techniques and their significance, and performing data manipulation using coercion, merging, concatenation and joins.


Machine Learning

Lesson 2: Data Wrangling and Manipulation

© Simplilearn. All rights reserved.


Concepts Covered

Data acquisition

Data exploration techniques

Data wrangling techniques

Data manipulation techniques

Typecasting
Learning Objectives

By the end of this lesson, you will be able to:

Demonstrate data import and exploration using Python

Demonstrate different data wrangling techniques and their significance

Perform data manipulation in Python using coercion, merging, concatenation, and joins
Data Preprocessing
Topic 1: Data Exploration
Loading .csv File in Python
Before starting with a dataset, the first step is to load the dataset. Below is the code for the
same:

Code

df = pandas.read_csv("/home/simpy/Datasets/BostonHousing.csv")
Saving Data to .csv File
Below is the code for saving the data frame to an existing csv file:

Code

df.to_csv("/home/simpy/Datasets/BostonHousing.csv")
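As a quick sketch of the round trip (using a small, hypothetical frame rather than the Boston Housing file), writing with index=False and reading back reproduces the original frame; without it, the row index is saved as an extra column:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the Boston Housing data.
df = pd.DataFrame({"crim": [0.006, 0.027], "chas": [0, 0]})

# index=False keeps the row index out of the file; otherwise it comes
# back as an extra "Unnamed: 0" column on the next read_csv.
df.to_csv("housing_copy.csv", index=False)

df2 = pd.read_csv("housing_copy.csv")
print(df.equals(df2))  # True
```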
Loading .xlsx File in Python
Below is the code for loading an xlsx file within python:

Code

df = pandas.read_excel("/home/simpy/Datasets/BostonHousing.xlsx")
Saving Data to .xlsx File
Below is the code for saving program data to an existing xlsx file:

Code

df.to_excel("/home/simpy/Datasets/BostonHousing.xlsx")
Assisted Practice
Data Exploration Duration: 5 mins.

Problem Statement: Extract data from the given SalaryGender CSV file and store the data from
each column in a separate NumPy array.

Objective: Import the dataset (csv) into your Python notebook from your local system.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and
password that are generated. Click on the Launch Lab button. On the page that appears, enter the
username and password in the respective fields, and click Login.
Data Exploration Techniques

Techniques covered: dimensionality check, type of dataset, slicing and indexing, identifying unique elements, value extraction, feature mean, feature median, and feature mode.

Dimensionality Check
The shape attribute returns a two-item tuple (number of rows, number of columns) for a data frame. For a Series, it returns a one-item tuple.

Code

df.shape
Data Exploration Techniques (Contd.)

Type of Dataset
You can use type() in Python to return the type of an object.

Checking the type of a data frame:

Code

type(df)

Checking the type of a column ('chas') within a data frame:

Code

df['chas'].dtype
Data Exploration Techniques (Contd.)

Slicing and Indexing
You can use the : operator, with the start index on its left and the end index on its right, to output the corresponding slice.

Slicing a list: list = [1,2,3,4,5]

Code

list[1:3]

Slicing a data frame (df) using the iloc indexer:

Code

df.iloc[:,1:3]
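To make the slice boundaries concrete, here is a small sketch (the frame is hypothetical): the end index is exclusive in both plain lists and iloc.

```python
import pandas as pd

nums = [1, 2, 3, 4, 5]
# Positions 1 and 2 are returned; the end index 3 is excluded.
print(nums[1:3])  # [2, 3]

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})
sub = df.iloc[:, 1:3]  # all rows, columns at positions 1 and 2
print(list(sub.columns))  # ['b', 'c']
```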
Data Exploration Techniques (Contd.)

Identifying Unique Elements
Using unique() on the column of interest will return a NumPy array with the unique values of the column.

Extracting all unique values out of the 'crim' column:

Code

df['crim'].unique()
Data Exploration Techniques (Contd.)

Value Extraction
The values attribute of the column of interest returns a NumPy array with all the values of the column. Note that values is an attribute, not a method, so it is written without parentheses.

Extracting values out of the 'crim' column:

Code

df['crim'].values
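A short sketch contrasting the two (a toy Series, not the Boston data): unique() drops repeats, while the values attribute returns everything and takes no parentheses.

```python
import pandas as pd

s = pd.Series([0.1, 0.1, 0.2])
print(s.unique())  # [0.1 0.2]  -- repeats removed
print(s.values)    # [0.1 0.1 0.2]  -- attribute, not a method
```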
Data Exploration Techniques (Contd.)

Feature Mean
Using mean() on the data frame will return the mean of the data frame across all the columns.

Code

df.mean()
Data Exploration Techniques (Contd.)

Feature Median
Using median() on the data frame will return the median values of the data frame across all the columns.

Code

df.median()
Data Exploration Techniques (Contd.)

Feature Mode
Using mode() on the data frame will return the mode across all columns with axis=0, or across all rows with axis=1.

Code

df.mode(axis=0)
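The three statistics can be compared on a toy frame (hypothetical values): mean and median return one value per numeric column, while mode() returns a data frame, because a column can have several modes.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 4], "y": [5, 5, 8]})
print(df.mean())        # x 2.0, y 6.0
print(df.median())      # x 1.0, y 5.0
print(df.mode(axis=0))  # one row: the most frequent value per column
```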
Let’s now consider multiple features and understand the effect of one over other with
respect to correlation (using seaborn)



Seaborn is a library for making
attractive and informative statistical
graphics in Python. It is built on top of
matplotlib and integrated with the
PyData Stack, including support for
numpy and pandas data structures, and
statistical routines.



Plotting a Heatmap with Seaborn
Below is the code for plotting a heatmap within Python:

Code

import matplotlib.pyplot as plt


import seaborn as sns
correlations = df.corr()
sns.heatmap(data = correlations,square = True, cmap = "bwr")

plt.yticks(rotation=0)
plt.xticks(rotation=90)

Parameters: data takes a rectangular dataset (a 2D dataset that can be coerced into an ndarray); square=True sets the Axes aspect to "equal" so that each cell is square-shaped; cmap takes a matplotlib colormap name or list of colors.
Plotting a Heatmap with Seaborn (Contd.)
Below is the heatmap obtained, where colors approaching red indicate maximum correlation and colors approaching blue indicate minimum correlation.

[Heatmap: red cells mark maximum correlation, blue cells minimum correlation]
Assisted Practice
Data Exploration Duration: 15 mins.

Problem Statement: Suppose you are a public school administrator. Some schools in your state of Tennessee are performing below average academically. Your superintendent, under pressure from frustrated parents and voters, approached you with the task of understanding why these schools are underperforming. To improve school performance, you need to learn more about these schools and their students, just as a business needs to understand its own strengths and weaknesses and its customers. The data includes various demographic, school faculty, and income variables.

Objective: Perform exploratory data analysis, which includes determining the type of the data and correlation analysis over it. You need to convert the data into useful information:
▪ Read the data in pandas data frame
▪ Describe the data to find more details
▪ Find the correlation between ‘reduced_lunch’ and ‘school_rating’

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that
are generated. Click on the Launch Lab button. On the page that appears, enter the username and password in
the respective fields, and click Login.
Unassisted Practice
Data Exploration Duration: 15 mins.

Problem Statement: Mtcars, an automobile company in Chambersburg, United States, has recorded the production of its cars within a dataset. Based on feedback given by their customers, they are coming up with a new model. As a result, they have to explore the current dataset to derive further insights from it.

Objective: Import the dataset; explore the dimensionality, type, and average value of the horsepower across all the cars. Also, identify a few of the most correlated features, which would help in the modification.

Note: This practice is not graded. It is only intended for you to apply the knowledge you have gained to solve real-
world problems.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login
Data Import
The first step is to import the data as a part of exploration.

Code

df1 = pandas.read_csv("mtcars.csv")
Data Exploration
The shape property is usually used to get the current shape of an array or data frame.

Code

df1.shape

Data Exploration
type() returns the type of the given object.

Code

type(df1)


Data Exploration

The mean() function can be used to calculate the mean (average) of a given list of numbers.

Code

df1['hp'].mean()


Identifying Correlation Using a Heatmap

Heatmap function in seaborn is used to plot the correlation matrix.

Code

import matplotlib.pyplot as plt


import seaborn as sns
correlations = df1.corr()
sns.heatmap(data = correlations,square = True, cmap = "viridis")

plt.yticks(rotation=0)
plt.xticks(rotation=90)
Identifying Correlation Using a Heatmap
A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors.

From the adjacent map, you can clearly see that cylinder (cyl) and displacement (disp) are the most correlated features.
Data Preprocessing
Topic 2: Data Wrangling
Data Wrangling

The process of manually converting or mapping data from one raw format into another format is called data wrangling. This includes munging and data visualization.

The different tasks in data wrangling are discovering, structuring, cleaning, enriching, and validating.
Need of Data Wrangling

Following are the problems that can be avoided with wrangled data:

Missing data, a very common problem

Presence of noisy data (erroneous data and outliers)

Inconsistent data

Wrangling also helps you develop a more accurate model and prevent data leakage.


Missing Values in a Dataset

Consider a random dataset given below, illustrating missing values.
Missing Value Detection

Consider a dataset below, imported as df1 within Python, having some missing values.

Detecting missing values:

Code

df1.isna().any()
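A minimal sketch of detection on a made-up frame: any() flags columns that contain a NaN, and sum() counts them per column.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with one missing cell per column.
df1 = pd.DataFrame({"Assignment": [62.0, np.nan, 70.0],
                    "Tutorial":   [90.0, 91.0, np.nan]})

print(df1.isna().any())  # True for both columns
print(df1.isna().sum())  # one missing value in each
```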
Missing Value Treatment

Mean Imputation: Replace the missing value with the variable's mean. (The older sklearn.preprocessing.Imputer shown in earlier course versions has been removed from scikit-learn; SimpleImputer is its replacement.)

Code

from sklearn.impute import SimpleImputer
import numpy as np
mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputed_df = mean_imputer.fit_transform(df1)
df1 = pd.DataFrame(data=imputed_df, columns=df1.columns)
df1
Missing Value Treatment (Contd.)

Median Imputation: Replace the missing value with the variable's median.

Code

from sklearn.impute import SimpleImputer
import numpy as np
median_imputer = SimpleImputer(missing_values=np.nan, strategy='median')
imputed_df = median_imputer.fit_transform(df1)
df1 = pd.DataFrame(data=imputed_df, columns=df1.columns)
df1

Note: Mean imputation/Median imputation is again model dependent and is valid only on numerical data.
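If scikit-learn is not at hand, the same mean imputation can be sketched with pandas alone via fillna (toy column, hypothetical values):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"score": [60.0, np.nan, 80.0]})

# Replace each NaN with its column's mean: (60 + 80) / 2 = 70.
df1_filled = df1.fillna(df1.mean())
print(df1_filled["score"].tolist())  # [60.0, 70.0, 80.0]
```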
Outlier Values in a Dataset

An outlier is a value that lies outside the usual observation of values.

[Histogram and scatter plot illustrating a point that falls far from the rest of the data]

Note: Outliers skew the data when you are trying to do any type of average.
Dealing with an Outlier

Outlier Detection: Detect any outlier in the first column ('Assignment') of df1.

Code

import seaborn as sns
sns.boxplot(x=df1['Assignment'])

The boxplot shows the outliers: values < 60.
Dealing with an Outlier

Outlier Treatment: Create a filter based on the boxplot obtained and apply the filter to the data frame.

Code

filter = df1['Assignment'].values > 60
df1_outlier_rem = df1[filter]
df1_outlier_rem
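The fixed cut-off at 60 is specific to this dataset; a more general sketch uses the 1.5 x IQR rule (the marks below are made up):

```python
import pandas as pd

marks = pd.Series([62, 65, 70, 71, 68, 12])  # 12 is far below the rest

# Keep only values within 1.5 interquartile ranges of the quartiles.
q1, q3 = marks.quantile(0.25), marks.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

kept = marks[(marks >= lower) & (marks <= upper)]
print(sorted(kept.tolist()))  # [62, 65, 68, 70, 71]
```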
Assisted Practice
Data Wrangling Duration: 15 mins.

Problem Statement: Load the load_diabetes datasets internally from sklearn and check for any missing value or
outlier data in the ‘data’ column. If any irregularities found treat them accordingly.

Objective: Perform missing value and outlier data treatment.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Unassisted Practice
Data Wrangling Duration: 5 mins.

Problem Statement: Mtcars, the automobile company in the United States, has planned to rework the horsepower of its cars, as most of the customer feedback was centered around horsepower. However, while developing an ML model with respect to horsepower, the efficiency of the model was compromised. Irregularities in the data might be one of the causes.

Objective: Check for missing values and outliers within the horsepower column and remove them.

Note: This practice is not graded. It is only intended for you to apply the knowledge you have gained to solve real-
world problems.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Check for Irregularities

Check for missing values:

Code

df1['hp'].isna().any()

Check for outliers:

Code

sns.boxplot(x=df1['hp'])
Outlier Treatment

Data with hp > 250 is the outlier data. Therefore, you can filter it accordingly.

Code

filter = df1['hp'] < 250
df1_out_rem = df1[filter]
sns.boxplot(x=df1_out_rem['hp'])
data
Data Preprocessing
Topic 3: Data Manipulation
Functionalities of Data Object in Python
A data object is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and
columns.

head( )

tail( )

values( )

groupby( )

Concatenation

Merging
Functionalities of Data Object in Python (Contd.)

head() returns the first n rows of the data structure.

Code

import pandas as pd
import numpy as np
df = pd.Series(np.arange(1, 51))
print(df.head(6))
Functionalities of Data Object in Python (Contd.)

tail() returns the last n rows of the data structure.

Code

import pandas as pd
import numpy as np
df = pd.Series(np.arange(1, 51))
print(df.tail(6))
Functionalities of Data Object in Python (Contd.)

The values attribute returns the actual data in the Series as an array.

Code

import pandas as pd
import numpy as np
df = pd.Series(np.arange(1, 51))
print(df.values)
Functionalities of Data Object in Python (Contd.)

The data frame is grouped according to the 'Team' and 'Rank' columns.

Code

import pandas as pd
world_cup = {'Team': ['West Indies', 'West Indies', 'India', 'Australia', 'Pakistan', 'Sri Lanka', 'Australia', 'Australia', 'Australia', 'India', 'Australia'],
             'Rank': [7, 7, 2, 1, 6, 4, 1, 1, 1, 2, 1],
             'Year': [1975, 1979, 1983, 1987, 1992, 1996, 1999, 2003, 2007, 2011, 2015]}
df = pd.DataFrame(world_cup)
print(df.groupby(['Team', 'Rank']).groups)
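.groups only lists the row labels behind each key; an aggregation is usually the next step. A sketch with made-up points:

```python
import pandas as pd

df = pd.DataFrame({"Team": ["Australia", "India", "Australia", "India"],
                   "Points": [787, 874, 800, 850]})

# Sum the points within each group of rows sharing a Team value.
totals = df.groupby("Team")["Points"].sum()
print(totals["Australia"], totals["India"])  # 1587 1724
```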
Functionalities of Data Object in Python (Contd.)

Concatenation combines two or more data structures.

Code

import pandas
world_champions = {'Team': ['India', 'Australia', 'West Indies', 'Pakistan', 'Sri Lanka'],
                   'ICC_rank': [2, 3, 7, 8, 4],
                   'World_champions_Year': [2011, 2015, 1979, 1992, 1996],
                   'Points': [874, 787, 753, 673, 855]}
chokers = {'Team': ['South Africa', 'New Zealand', 'Zimbabwe'],
           'ICC_rank': [1, 5, 9],
           'Points': [895, 764, 656]}
df1 = pandas.DataFrame(world_champions)
df2 = pandas.DataFrame(chokers)
print(pandas.concat([df1, df2], axis=1))
Functionalities of Data Object in Python (Contd.)

The concatenated output:
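The axis argument decides the direction of the combination; a minimal sketch (single-row hypothetical frames) shows the difference:

```python
import pandas as pd

a = pd.DataFrame({"Team": ["India"], "Points": [874]})
b = pd.DataFrame({"Team": ["Zimbabwe"], "Points": [656]})

stacked = pd.concat([a, b], axis=0, ignore_index=True)  # rows appended
side = pd.concat([a, b], axis=1)                        # columns appended
print(stacked.shape, side.shape)  # (2, 2) (1, 4)
```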
Functionalities of Data Object in Python (Contd.)

Merging is the pandas operation that performs database joins on objects.

Code

import pandas
champion_stats = {'Team': ['India', 'Australia', 'West Indies', 'Pakistan', 'Sri Lanka'],
                  'ICC_rank': [2, 3, 7, 8, 4],
                  'World_champions_Year': [2011, 2015, 1979, 1992, 1996],
                  'Points': [874, 787, 753, 673, 855]}
match_stats = {'Team': ['India', 'Australia', 'West Indies', 'Pakistan', 'Sri Lanka'],
               'World_cup_played': [11, 10, 11, 9, 8],
               'ODIs_played': [733, 988, 712, 679, 662]}
df1 = pandas.DataFrame(champion_stats)
df2 = pandas.DataFrame(match_stats)
print(df1)
print(df2)
print(pandas.merge(df1, df2, on='Team'))
Functionalities of Data Object in Python (Contd.)

The merged object contains all the columns of the data frames merged.
Different Types of Joins

Joins are used to combine records from two or more tables in a database. Below
are the four most commonly used joins:

Left Join Right Join Inner Join Full Outer Join


Left Join

A left join returns all rows from the left table, even if there are no matches in the right table.

Code

import pandas
world_champions = {'Team': ['India', 'Australia', 'West Indies', 'Pakistan', 'Sri Lanka'],
                   'ICC_rank': [2, 3, 7, 8, 4],
                   'World_champions_Year': [2011, 2015, 1979, 1992, 1996],
                   'Points': [874, 787, 753, 673, 855]}
chokers = {'Team': ['South Africa', 'New Zealand', 'Zimbabwe'],
           'ICC_rank': [1, 5, 9],
           'Points': [895, 764, 656]}
df1 = pandas.DataFrame(world_champions)
df2 = pandas.DataFrame(chokers)
print(pandas.merge(df1, df2, on='Team', how='left'))
Right Join

A right join preserves all rows from the second (right) table, joining unmatched rows with NaN in the columns of the first (left) table.

Code

import pandas
world_champions = {'Team': ['India', 'Australia', 'West Indies', 'Pakistan', 'Sri Lanka'],
                   'ICC_rank': [2, 3, 7, 8, 4],
                   'World_champions_Year': [2011, 2015, 1979, 1992, 1996],
                   'Points': [874, 787, 753, 673, 855]}
chokers = {'Team': ['South Africa', 'New Zealand', 'Zimbabwe'],
           'ICC_rank': [1, 5, 9],
           'Points': [895, 764, 656]}
df1 = pandas.DataFrame(world_champions)
df2 = pandas.DataFrame(chokers)
print(pandas.merge(df1, df2, on='Team', how='right'))
Inner Join

An inner join selects rows from both participating tables only when there is a match between the join columns.

Code

import pandas
world_champions = {'Team': ['India', 'Australia', 'West Indies', 'Pakistan', 'Sri Lanka'],
                   'ICC_rank': [2, 3, 7, 8, 4],
                   'World_champions_Year': [2011, 2015, 1979, 1992, 1996],
                   'Points': [874, 787, 753, 673, 855]}
chokers = {'Team': ['South Africa', 'New Zealand', 'Zimbabwe'],
           'ICC_rank': [1, 5, 9],
           'Points': [895, 764, 656]}
df1 = pandas.DataFrame(world_champions)
df2 = pandas.DataFrame(chokers)
print(pandas.merge(df1, df2, on='Team', how='inner'))
Full Outer Join

A full outer join returns all records when there is a match in either the left (table1) or the right (table2) table.

Code

import pandas
world_champions = {'Team': ['India', 'Australia', 'West Indies', 'Pakistan', 'Sri Lanka'],
                   'ICC_rank': [2, 3, 7, 8, 4],
                   'World_champions_Year': [2011, 2015, 1979, 1992, 1996],
                   'Points': [874, 787, 753, 673, 855]}
chokers = {'Team': ['South Africa', 'New Zealand', 'Zimbabwe'],
           'ICC_rank': [1, 5, 9],
           'Points': [895, 764, 656]}
df1 = pandas.DataFrame(world_champions)
df2 = pandas.DataFrame(chokers)
print(pandas.merge(df1, df2, on='Team', how='outer'))
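The four joins differ only in the how argument; this sketch (tiny hypothetical tables sharing one team) shows how many rows each keeps:

```python
import pandas as pd

left = pd.DataFrame({"Team": ["India", "Australia"], "Rank": [2, 1]})
right = pd.DataFrame({"Team": ["Australia", "Zimbabwe"], "Points": [787, 656]})

# Only "Australia" appears in both tables, so inner keeps one row
# while outer keeps every team from either side.
counts = {how: len(pd.merge(left, right, on="Team", how=how))
          for how in ["left", "right", "inner", "outer"]}
print(counts)  # {'left': 2, 'right': 2, 'inner': 1, 'outer': 3}
```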
Typecasting
It converts the data type of an object to the required data type.

str()
Returns a string from any numeric object; converts any number to a string.

int()
Returns an integer object from any number or string.

float()
Returns a floating-point number from a number or a string.
Typecasting Using int(), float(), and str()
A few typecast examples:

Code

int(12.32)
int('43')
float(23)
float('21.43')
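Worth noting when typecasting: int() truncates floats toward zero and only parses whole-number strings, so a decimal string must go through float() first.

```python
print(int(12.32))           # 12  -- truncation, not rounding
print(int(-12.32))          # -12 -- truncation is toward zero
print(float("21.43"))       # 21.43
print(int(float("21.43")))  # 21  -- int("21.43") alone raises ValueError
print(str(23))              # 23
```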
Assisted Practice
Data Manipulation Duration: 10 mins.

Problem Statement: As a macroeconomic analyst at the Organization for Economic Cooperation and Development
(OECD), your job is to collect relevant data for analysis. It looks like you have three countries in the north_america data
frame and one country in the south_america data frame. As these are in two separate plots, it's hard to compare the
average labor hours between North America and South America. If all the countries were into the same data frame, it
would be much easier to do this comparison.

Objective: Demonstrate concatenation.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Unassisted Practice
Data Manipulation Duration: 10 mins.

Problem Statement: SFO Public Department (referred to as SFO) has captured all the salary data of its employees from the years 2011-2014. Now, in 2018, the organization is facing a financial crisis. As a first step, HR wants to rationalize employee cost to save the payroll budget. You have to do data manipulation and answer the questions below:
1. How much total salary cost has increased from year 2011 to 2014?
2. Who was the top earning employee across all the years?

Objective: Perform data manipulation and visualization techniques


Note: This practice is not graded. It is only intended for you to apply the knowledge you have gained to solve real-
world problems.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Answer 1

Check the mean salary cost per year and see how it has increased per
year.

Code

salary = pd.read_csv('Salaries.csv')
mean_year = salary.groupby('Year')['TotalPayBenefits'].mean()
print(mean_year)
Answer 2

Group the total salary with respect to employee name:

Code

top_sal = salary.groupby('EmployeeName')['TotalPayBenefits'].sum()
print(top_sal.sort_values(ascending=False))

Sorting in descending order puts the top-earning employee first.
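With a descending sort, the top earner is simply the first label. A sketch on made-up records (only the Salaries.csv column names are kept):

```python
import pandas as pd

# Hypothetical records standing in for Salaries.csv.
salary = pd.DataFrame({"EmployeeName": ["A", "B", "A"],
                       "TotalPayBenefits": [100.0, 250.0, 120.0]})

top_sal = salary.groupby("EmployeeName")["TotalPayBenefits"].sum()
# ascending=False puts the highest total first.
top_earner = top_sal.sort_values(ascending=False).index[0]
print(top_earner)  # B
```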
Key Takeaways

Now, you are able to:

Demonstrate data import and exploration using Python

Demonstrate different data wrangling techniques and their significance

Perform data manipulation in Python using coercion, merging, concatenation, and joins
Knowledge
Check



Knowledge
Check
Which of the following plots can be used to detect an outlier?
1

a. Boxplot

b. Histogram

c. Scatter plot

d. All of the above

The correct answer is d. All of the above

All of the above plots can be used to detect an outlier.
Knowledge Check
What is the output of the below Python code?
2

import numpy as np
percentiles = [98, 76.37, 55.55, 69, 88]
first_subject = np.array(percentiles)
print(first_subject.dtype)

a. float32

b. float

c. int32

d. float64

The correct answer is d. float64


The list contains decimal values, so NumPy stores the array as float64, its default floating-point type. float64 can represent numbers much more accurately than smaller float types and has more storage capacity.
Lesson-End Project Duration: 20 mins.

Problem Statement: From the raw data below create a data frame:
'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'last_name': ['Miller', 'Jacobson', ".", 'Milner', 'Cooze'],
'age': [42, 52, 36, 24, 73], 'preTestScore': [4, 24, 31, ".", "."],'postTestScore': ["25,000", "94,000", 57, 62, 70]

Objective: Perform data processing on raw data:


▪ Save the data frame into a csv file as project.csv
▪ Read the project.csv and print the data frame
▪ Read the project.csv without column heading
▪ Read the project.csv and make the index columns as 'First Name’ and 'Last Name'
▪ Print the data frame in a Boolean form as True or False. True for Null/ NaN values and false for
non-null values
▪ Read the data frame by skipping first 3 rows and print the data frame

Access: Click the Labs tab in the left side panel of the LMS. Copy or note the username and password that are
generated. Click the Launch Lab button. On the page that appears, enter the username and password in the
respective fields and click Login.
Thank You
