Data Science Ppt1 Update
Data exploration is the most important step, and it also consumes the most time.
Around 70 percent of the time is spent on data exploration.
The main ingredient of data science is data, and when we receive data
it is seldom in a correct, structured form.
Removing Noise in data
As data scientists, we usually don’t think about how our data is collected. We focus
on analysis, not measurement.
That can be dangerous if we’re dealing with noisy data.
A dirty dataset can be a bottleneck that reduces the quality of the entire analysis
pipeline.
There is often a lot of noise in the data.
Noise here means unwanted data that is not required. So what do we do in this
step?
This step involves sampling and transformation of data in which we check the
observations (rows) and features (columns) and remove the noise by using statistical
methods.
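One common statistical way to remove noise is to drop observations that fall far outside the interquartile range (IQR). The sketch below is only an illustration: the `ages` values, the helper name `remove_outliers`, and the 1.5 x IQR cutoff are assumptions made for this example, not a method prescribed by these slides.

```python
import statistics

def remove_outliers(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical feature column; 240 is an obviously noisy entry.
ages = [23, 25, 27, 29, 31, 33, 35, 240]
clean_ages = remove_outliers(ages)
print(clean_ages)  # [23, 25, 27, 29, 31, 33, 35]
```

Whether a far-off value is noise or a genuine observation depends on the domain, so a cutoff like this is a starting point rather than a rule.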
We also examine the relationships in the data, that is, whether the features (columns)
are dependent on each other or independent of each other,
and whether there are missing values in the data.
So basically the data is transformed for further use.
Hence this is one of the most time-consuming steps.
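The two checks described in this step, missing values and relationships between features, can be sketched in plain Python. The `rows` data, the column names `hours` and `score`, and the choice of Pearson correlation as the measure of dependence are all assumptions made for illustration.

```python
import math

# Hypothetical observations (rows) with two features (columns).
# None marks a missing value.
rows = [
    {"hours": 1.0, "score": 52.0},
    {"hours": 2.0, "score": 55.0},
    {"hours": None, "score": 61.0},
    {"hours": 4.0, "score": 64.0},
    {"hours": 5.0, "score": 70.0},
]

# 1. Count missing values per feature.
missing = {col: sum(1 for r in rows if r[col] is None) for col in ("hours", "score")}

# 2. Keep only complete rows before measuring the relationship.
complete = [r for r in rows if None not in r.values()]

def pearson(xs, ys):
    """Pearson correlation: values near +1 or -1 indicate a strong linear
    relationship; values near 0 indicate a weak linear relationship."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson([row["hours"] for row in complete], [row["score"] for row in complete])
print(missing, round(r, 3))
```

Dropping incomplete rows is only one option; filling missing values with a mean or median is another, and the right choice depends on the data.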
2. Modeling
This is the second step, where we actually use machine learning algorithms.
Here we fit the model to the data.
The selection of a model depends on the type of data we have and the business
requirement.
3. Testing the Model
The model is tested with test data to check its accuracy and other characteristics,
and the required changes are made to the model to get the desired result.
If we do not get the desired accuracy, we can go back to step 2 (modeling),
select a different model, repeat step 3, and choose the model
which gives the best result as per the business requirement.
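The loop between steps 2 and 3 can be sketched as follows: fit several candidate models on training data, score each on held-out test data, and keep the best one. The data points, the two candidate models (a mean baseline and a least-squares line), and the use of mean absolute error as the score are assumptions chosen to keep the example small.

```python
# Hypothetical data: (feature, target) pairs split into training and test sets.
train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]
test = [(6, 13.0), (7, 15.1)]

def fit_mean(data):
    """Baseline model: always predict the mean target of the training data."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Simple linear regression fitted by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_abs_error(model, data):
    """Average absolute difference between predictions and true targets."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# Step 2: fit each candidate. Step 3: score each on the held-out test data.
candidates = {"mean baseline": fit_mean(train), "linear regression": fit_line(train)}
scores = {name: mean_abs_error(model, test) for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Here the linear model wins because the hypothetical data is close to a straight line; with other data the comparison could go the other way, which is why the test step matters.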
4. Deploying Models
Once we get the desired result by proper testing as per the business requirements,
we finalize the model which gives us the best result as per testing results and
deploy the model in the production environment.
Characteristics of Data Science
4. Government Policies
The government can use data science to prepare policies that better match the needs
and wishes of the people, using data gathered from surveys
and from other official sources.
Advantages and Disadvantages of Data Science
Advantages
Some of the advantages are as follows:
It helps us get insights from historical data with its powerful tools.
It helps to optimize the business, hire the right people and generate more revenue,
because data science helps you make better decisions about the future of the business.
Companies can develop and market their products better because they can select
their target customers more precisely.
Data science also helps consumers search for better goods,
especially on e-commerce sites, through data-driven recommendation systems.
Disadvantages
The disadvantages generally arise when data science is used for customer profiling,
which can infringe on customer privacy,
because customers’ information, such as transactions, purchases, and subscriptions, is visible
to the companies that hold the data.
The information obtained using data science can be used against a certain group,
individual, country or community.
Data science combines domain knowledge, statistics and coding skills
to give the desired results. It involves the
application of machine learning and deep learning, because studying the past to predict
the future, or studying behavioral traits, requires data that could not be analyzed
without data science.
Structured vs Unstructured Data
Data science is primarily the science of uncovering hidden patterns in data. Those hidden patterns can be
used to achieve optimal results in several fields and hence improve people’s lives.
Working with Data Science
Model Selection: Model building is the core activity of a data science project. It
is carried out using either statistical methods or machine learning techniques.
Selecting the right model for a particular problem statement is essential, because no
single model fits every data set perfectly.
Model Selection
This is the stage where we can finally start evaluating our complete data science
system.
The end of modeling is marked by model evaluation, where you measure:
Accuracy: How well does the model perform, i.e. does it describe the data accurately?
Relevance: Does the model answer the original question that you set out to
answer?
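For a classification model, accuracy can be computed as the fraction of predictions that match the true labels. This is a minimal sketch with made-up labels; the function name `accuracy` and the `spam`/`ham` labels are hypothetical. Relevance, by contrast, is a judgment about the original question and cannot be computed this way.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return correct / len(actuals)

# Hypothetical labels for five test observations.
predicted = ["spam", "ham", "spam", "ham", "ham"]
actual = ["spam", "ham", "ham", "ham", "ham"]

score = accuracy(predicted, actual)
print(score)  # 0.8
```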
Deployment: Once the model is built and the business is satisfied with the
findings, the model can be deployed to production and used in the product.
Once people start using it, you get feedback.
The more accurately you capture that feedback, the more effective the
changes you make to your model will be, and the more accurate your final results.
What can you do with Data Science?
Let’s look at some of the uses of data science that have made our lives easier in
recent times.
Example 1
YouTube is a favorite source of entertainment, knowledge and news in our daily
lives. We prefer watching videos to going through slides or long articles. But
how did we become so addicted to YouTube? What has made YouTube so unique
and different?
Well, YouTube uses our data to recommend the videos we would like to see next. It
uses a recommender system algorithm to track our search patterns and, based on those,
shows us videos which are related to the ones
we have already watched.
So basically, it saves us the time and energy of manually looking for videos which might
be helpful to us based on our liking.
Example 2
Similar to YouTube, recommender systems are also used on online
platforms like Netflix and Amazon.
In the case of Netflix, we are shown TV shows or movies which are
related to the ones we have watched, which saves us the time of looking for
more similar videos.
Additionally, Amazon recommends the products based on our buying pattern, and
it displays those products which other buyers have bought along with that product
or what we could buy based on our shopping habits or patterns.
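The "bought along with that product" idea can be sketched as a simple co-occurrence count. This is only an illustration under stated assumptions: the `baskets` data and the function name `bought_together` are hypothetical, and real recommender systems at companies like Amazon are far more elaborate than this.

```python
from collections import Counter

# Hypothetical purchase history: each inner list is one customer's basket.
baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "screen protector"],
    ["laptop", "mouse"],
    ["phone", "case", "screen protector"],
]

def bought_together(product, baskets, top_n=2):
    """Return the top_n items that most often appear in baskets containing product."""
    counts = Counter()
    for basket in baskets:
        if product in basket:
            counts.update(item for item in basket if item != product)
    return [item for item, _ in counts.most_common(top_n)]

recommendations = bought_together("phone", baskets)
print(recommendations)  # ['case', 'screen protector']
```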
Example 3
One of the major breakthroughs in data science is the virtual assistant, such as Amazon’s Alexa or Apple’s
Siri. We often find it tedious to search through our phone for contacts, or feel too lazy to
set alarms or reminders.
Virtual assistants do all of this for us just by listening
to our commands. We tell Alexa or Siri what we want, and the system
converts our voice to text using speech recognition, then applies Natural Language Processing
to extract meaning from that text and solve our problem.
In layman’s terms, these intelligent systems use speech-to-text technology to
save time and solve our problems.
Example 4
Data science has also made life easier for athletes and other people involved in sports.
The enormous amount of data that’s available these days can be used to
analyze an athlete’s physical and mental condition and prepare accordingly for a
game.
Also, the data could be used to make strategies and outplay the opponent even
before the match starts.
Example 5
Data science has made life easier in the healthcare sector as well. Medics and
researchers could use deep learning to analyze cells and help prevent a disease from
developing in the first place.
They could also prescribe suitable medication for a patient based on
predictions from the data.
Python Overview
• Introduction
• Data Types, Expression and Variables
• String
• Conditions and Branching
• Loops
• Functions
• List, Tuples, Dictionaries and Sets
What is Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon
as it is written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way or a functional
way.
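The three styles can be compared on one small task, summing a list. The `numbers` list and the `Accumulator` class are hypothetical names used only for this sketch.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Procedural style: a plain loop that updates a variable step by step.
total_procedural = 0
for n in numbers:
    total_procedural += n

# Object-oriented style: data and behavior bundled together in a class.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value

acc = Accumulator()
for n in numbers:
    acc.add(n)

# Functional style: combine values with a function instead of mutating state.
total_functional = reduce(lambda a, b: a + b, numbers, 0)

print(total_procedural, acc.total, total_functional)  # 10 10 10
```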
Good to know
The most recent major version of Python is Python 3.
It is possible to write Python in an Integrated Development Environment, such as:
Thonny,
PyCharm,
NetBeans or
Eclipse,
which are particularly useful when managing larger collections of Python files.
Python Syntax compared to other programming languages
Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope, such as the scope
of loops, functions and classes. Other programming languages often use curly
brackets for this purpose.
Python Indentation
if 5 > 2:
  print("Five is greater than two!")
if 5 > 2:
        print("Five is greater than two!")
You have to use the same number of spaces in the same block of code, otherwise Python will give you an
error:
Example
Syntax Error:
if 5 > 2:
 print("Five is greater than two!")
        print("Five is greater than two!")
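To see this error without crashing a script, the badly indented source can be passed as a string to the built-in `compile()` function and the exception caught. The variable names `bad_source` and `error_name` are made up for this sketch.

```python
# Source text with mismatched indentation inside one block
# (1 space on the first line of the block, 8 spaces on the second).
bad_source = (
    "if 5 > 2:\n"
    " print('Five is greater than two!')\n"
    "        print('Five is greater than two!')\n"
)

try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as err:
    # IndentationError is a subclass of SyntaxError.
    error_name = type(err).__name__

print(error_name)  # IndentationError
```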
Python Variables
The data stored in memory can be of many types. For example, a person's age is
stored as a numeric value and his or her address is stored as alphanumeric
characters. Python has various standard data types that are used to define the
operations possible on them and the storage method for each of them.
Python has five standard data types −
Numbers
String
List
Tuple
Dictionary
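A short sketch of the five types listed above. The variable names and values are invented for illustration; `type()` reports the type Python has assigned to each value.

```python
age = 30                              # Number (int)
height = 1.75                         # Number (float)
name = "Asha"                         # String
scores = [88, 92, 79]                 # List: ordered and changeable
point = (4, 5)                        # Tuple: ordered and unchangeable
person = {"name": name, "age": age}   # Dictionary: key-value pairs

type_names = [type(v).__name__ for v in (age, height, name, scores, point, person)]
print(type_names)  # ['int', 'float', 'str', 'list', 'tuple', 'dict']
```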
Python Strings
Strings in Python are contiguous sequences of characters enclosed in
quotation marks.
The plus (+) sign is the string concatenation operator and the asterisk (*) is the repetition
operator.
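A minimal illustration of the two string operators, using made-up strings:

```python
greeting = "Hello"
target = "World"

joined = greeting + " " + target   # + joins strings end to end
repeated = greeting * 3            # * repeats a string a given number of times

print(joined)    # Hello World
print(repeated)  # HelloHelloHello
```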
Loops can execute a block of code a number of times until a certain condition is
met. Their usage is fairly common in programming. Unlike some other programming
languages, which offer for, while and do-while loops, Python has only for and while loops.
What is For Loop?
A for loop is used to iterate over the elements of a sequence. It is often used when you
have a piece of code which you want to repeat "n" number of times.
What is While Loop?
A while loop is used to repeat a block of code. Instead of running the code block
once, it executes the code block repeatedly until a certain condition is met.
Syntax for while loop:
while expression:
    statement
Example:
x = 0
while x < 4:
    print(x)
    x = x + 1
How to use "For Loop"
In Python, a "for loop" iterates over the items of an iterable, such as a list, string or range.
Just like a while loop, a "For Loop" is used to repeat a block of code.
But unlike a while loop, which depends on a condition being true or false, a "For Loop"
depends on the elements it has to iterate over.
Example:
for x in range(2, 10):
    print(x)
The For Loop iterates over the numbers declared in the range.
For example,
for x in range(2, 7)
When this code is executed, it will print the numbers from 2 up to, but not including, 7 (2,3,4,5,6).
In this code, the number 7 is not included in the range.
For Loops can also be used to iterate over other things, not just numbers.
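As a small sketch of non-numeric iteration, the loops below walk over a list of strings, the characters of a string, and the key-value pairs of a dictionary. All names and values are made up for illustration.

```python
months = ["Jan", "Feb", "Mar"]
upper_months = []
for month in months:                 # each element of a list
    upper_months.append(month.upper())

letters = []
for letter in "abc":                 # each character of a string
    letters.append(letter)

prices = {"pen": 10, "book": 50}
total_price = 0
for item, price in prices.items():   # each key-value pair of a dictionary
    total_price += price

print(upper_months, letters, total_price)  # ['JAN', 'FEB', 'MAR'] ['a', 'b', 'c'] 60
```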
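n = 10
i = 1        # start counting from 1
total = 0    # named total to avoid shadowing the built-in sum()
while i <= n:
    total = total + i
    i += 1   # Python has no ++ operator
print("The sum is =", total)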
Comments