
Python for Data Science- Week 1

The document provides an introduction to Python for Data Science, covering essential concepts such as data science definitions, data processing, and the advantages of using Python. It discusses Python's evolution, programming paradigms, and various integrated development environments (IDEs) suitable for Python development, including Spyder and Jupyter Notebook. Additionally, it details Python's basic data types, variable naming conventions, and operators used in programming.

Uploaded by

abiwagh

Sunday, 12 January 2025

PYTHON FOR DATA SCIENCE

MODULE 1

Lec 1: Introduction to Python for Data Science

What is Data Science?


• Data Science is the science of analysing raw data using statistics and machine learning
techniques with the purpose of drawing insights from the data
• Data Science is used in many industries to allow them to make better business decisions,
and in the sciences to test models or theories
• This requires a process of inspecting, cleaning, transforming, modelling, analysing and
interpreting raw data.

Data Perspective:
• Read Data
• Data processing and cleaning (Finding and removing errors, missing data)
• Summarizing Data
• Visualization (Looking at data pictorially)
• Deriving insights from data
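
The first steps of this workflow can be sketched with the standard library alone. The records and column names below are invented purely for illustration, and in practice these steps are usually carried out with Pandas.

```python
import csv
import io
import statistics

# Hypothetical raw data: one height value is missing.
raw = io.StringIO("name,height\nAsha,162.5\nRavi,\nMeena,170.0\n")

# Read data
rows = list(csv.DictReader(raw))

# Data processing and cleaning: drop records whose height is missing
clean = [r for r in rows if r["height"] != ""]

# Summarizing data
heights = [float(r["height"]) for r in clean]
mean_height = statistics.mean(heights)

print(len(rows), len(clean), mean_height)
```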

Data Sciences using Python:


• Python libraries provide key feature sets that are essential for data science
• Data manipulation and pre-processing
    • Python’s ‘Pandas’ library offers a variety of functions for data wrangling and
    manipulation.
• Data summary
• Visualization
    • Plotting libraries like ‘Matplotlib’ and ‘seaborn’ aid in condensing statistical
    information and help in identifying trends and relationships
• Machine learning
    • Libraries like ‘scikit-learn’ offer a wide range of machine learning algorithms

Advantages of Python:
• Provides a good ecosystem of libraries that are robust and varied
• Tight-knit integration with big data frameworks like Hadoop, Spark, etc.
• Supports both object-oriented and functional programming paradigms
• Python is reasonably fast to prototype
• Provides support for reading files from local storage, databases and the cloud.

Lec 2: Introduction to Python

Evolution of Python:
• Python was developed by Guido van Rossum in the late eighties at the ‘National
Research Institute for Mathematics and Computer Science’ in the Netherlands

Python as a programming language:
• Supports multiple programming paradigms
    • Functional, structured, object-oriented, etc.
• Dynamic typing
    • Type safety is checked at runtime
• Reference counting
    • Deallocates objects once nothing refers to them any longer
• Late binding
    • Methods are looked up by name during runtime
• Python's design is guided by 20 aphorisms as described in the Zen of Python by Tim Peters
• The standard CPython interpreter is managed by the “Python Software Foundation”
• There are other interpreters, namely Jython (Java), IronPython (C#), Stackless Python (C,
used for concurrency) and PyPy (written in Python itself, with JIT compilation)
• Much of the standard library is written in Python itself
• High standards of readability
• Cross-platform (Windows, Linux, Mac)
• Supported by a large community
• Good error handling
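
Dynamic typing and late binding can be seen in a few lines. This is a minimal sketch; the class name `Greeter` and its method are made up for the example.

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 10
first_type = type(value)      # int
value = "ten"
second_type = type(value)     # str

# Late binding: the method is looked up by name when the call runs,
# so replacing it on the class changes what existing objects do.
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
before = g.greet()
Greeter.greet = lambda self: "hi"   # swap the method at runtime
after = g.greet()

print(first_type, second_type, before, after)
```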

Python vs Java
• Java is statically typed, i.e., type safety is checked during compilation (static compilation)
• Thus in Java, the time required to develop the code is higher
• Python, which is dynamically typed, has no separate compilation step, which saves
development time when compared to Java
• Code that is dynamically typed tends to be less verbose, therefore offering more
readability

Advantages of using Python:


• Python has several features that make it well-suited for data science.
• Open source and community development
    • Developed under an Open Source Initiative approved license, making it free to use and
    distribute, even commercially
• The syntax used is simple to understand for specific data science tasks
• Combines well with the majority of the cloud platform service providers

Coding Environment:
• A software program can be written using a terminal, a command prompt (cmd), a text
editor or through an Integrated Development Environment (IDE)
• The program needs to be saved in a file with an appropriate extension (.py for
Python, .m for MATLAB, etc.) and can be executed in the corresponding environment
(Python, MATLAB, etc.)
• An Integrated Development Environment (IDE) is a software product developed solely to
support software development in various or specific programming languages

Integrated Development Environment (IDE)


• Software application consisting of a cohesive unit of tools required for development
• Designed to simplify software development
• Utilities provided by IDEs include tools for managing, compiling, deploying and debugging
software

Coding Environment- IDE:
• An IDE usually comprises:
    • Source code editor
    • Compiler
    • Debugger
• Additional features include syntax error highlighting and code completion
• Offers support in building and executing the program along with debugging the code from
within the environment
• The best IDEs provide version control features
• Eclipse+PyDev, Sublime Text, Atom, GNU Emacs, Vi/Vim, Visual Studio, and Visual Studio
Code are general-purpose editors and IDEs with Python support.
• Apart from these, some of the Python-specific editors include PyCharm, Jupyter, Spyder and
Thonny

Spyder
• Supported across Linux, Mac OS X and Windows platforms
• Available as open source version
• Can be installed separately or through Anaconda distribution
• Developed for Python and specifically data science
• Features include
    • Code editor with robust syntax and error highlighting
    • Code completion and navigation
    • Debugger
    • Integrated documentation
• Interface similar to MATLAB and RStudio

PyCharm
• Supported across Linux, Mac OS X and Windows platforms
• Available as a community (free open source) and professional (paid) version
• Supports only Python
• Can be installed separately or through Anaconda distribution
• Features include
    • Code editor provides syntax and error highlighting
    • Code completion and navigation
    • Unit testing
    • Debugger
    • Version control

Jupyter Notebook
• Web application that allows the creation and manipulation of documents called
‘notebooks’
• Supported across Linux, Mac OS X and Windows platforms
• Available as open source version
• Bundled with Anaconda distribution or can be installed separately
• Supports Julia, Python, R and Scala
• Consists of an ordered collection of input and output cells that contain code, texts, plots
etc.
• Allows sharing of code and narrative text through output formats like PDF, HTML, etc.
• Lacks most of the features of a good IDE

Lec 3: Introduction to Spyder - Part 1

The entire interface is split into three windows:


1) Script Window: Window for writing all lines of code and commands
2) Files/Variables/Help:
    i) Files: Displays all the existing files in your current working directory
    ii) Variables: Displays all variables and objects used in the code. Shows the variable
    name, type, size (whether it's a single value or array) and the value stored in the variable
    iii) Help: Displays instructions for basic functions of Python
3) Console: Panel to display all printed statements of the code being run. It can also be
used to perform elementary actions and commands, although these would not be saved.

Directory: This is a file system cataloguing structure that contains references to other
computer files and possibly other directories.

Setting Working Directory


There are three ways to set up a working directory
1) Icon
2) Using library ‘os’
3) Using the command ‘cd’
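
The ‘os’ route can be sketched as follows. A temporary folder stands in for a real project path here (an assumption made so the example runs anywhere); replace it with your own directory.

```python
import os
import tempfile

original_dir = os.getcwd()          # current working directory

# Hypothetical target: a temporary folder stands in for a real project path.
target_dir = tempfile.mkdtemp()
os.chdir(target_dir)                # set the working directory
new_dir = os.getcwd()
print(new_dir)

os.chdir(original_dir)              # switch back to where we started
```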

Creating a Script File


There are two ways of creating a script file
1) By clicking the icon ‘New File’ below the menubar.
2) By clicking the ‘File’ menu in the menubar and selecting ‘New File’

Variable
• An identifier containing known information
• Information is referred to as value
• Variable name points to a memory address or a storage location and is used to reference
the stored value

Lec 4: Introduction to Spyder - Part 2

Executing Script files

To run full code
1) Press ‘Run File’ from the icon bar (play button)
2) Press ‘F5’

To run the chosen line, select the line and


1) Press ‘Run Selection’ from the icon bar
2) Press ‘Ctrl+Enter’ or ‘F9’

Commenting lines of code
• Adding comments will help in understanding algorithms used while developing codes
• In practice, commented statements will be added before the code and begin with a ‘#’
• Multiple lines can also be commented
• Commenting multiple lines:
• Select the lines that have to be commented and then press ‘Ctrl + 1’
• Select ‘Edit’ in the menu and select ‘Comment/Uncomment’
• Uses - To add a description, render lines of code inert during testing.
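
A short illustration of both uses (the variable names are chosen only for this example):

```python
# Compute the area of a rectangle (comment placed before the code it describes)
length = 4
width = 3
area = length * width

# The next line is commented out, so it is inert and does not run:
# area = area * 100

print(area)
```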

Clearing an Overpopulated Console


1) Type ‘%clear’ in console
2) Place the cursor on the console and press ‘Ctrl + L’

Removing/Deleting Variable(s)
1) Removing single variable: Using ‘del’ followed by the variable name (Eg: del b)
2) Removing multiple variables: Using ‘del’ followed by variable names separated by a
comma (Eg: del a,b)
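
A minimal sketch of both forms; after deletion, using the name raises a `NameError`, which is caught here only to show the effect.

```python
a, b, c = 1, 2, 3

del c            # remove a single variable
del a, b         # remove several variables at once

deleted = False
try:
    print(a)     # 'a' no longer exists
except NameError:
    deleted = True

print(deleted)
```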

Clearing the environment at once


There are two ways to clear the environment
1) Type ‘%reset’ in the console and type ‘y’ after the prompt (to confirm the command)
2) Click the ‘Erase’ symbol to remove variables in the environment

Basic libraries in Python


1) NumPy - Numerical Python
2) Pandas - Data frames and data manipulation (the name derives from ‘panel data’)
3) Matplotlib - Visualization
4) Sklearn (scikit-learn) - Machine Learning

Lec 5: Variables and Data Types

Naming Variables
• Values are assigned to variables using the assignment operator ‘=’
• Variable names should be short and descriptive
• Avoid using variable names that clash with inbuilt functions
• Names should indicate the intent of the variable to the reader of the code
• One-character variable names are usually used in looping constructs, functions, etc.
• Variables can be named alphanumerically
Eg: age2 = 55
• However, the first character must be a letter (lowercase or uppercase) or an underscore
Eg: The ‘2age’ variable name will show a syntax error
• ‘_’ is the only special character allowed in variable naming. Any other special character
used will throw an error. Beginning or ending a variable name with an underscore is
allowed, but is best avoided
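
These character rules can be checked without triggering a syntax error by using the built-in string method `isidentifier()`. The candidate names below are only examples.

```python
candidates = ["age2", "2age", "age_emp", "age-emp", "_age"]

# isidentifier() reports whether a string obeys the naming rules
results = {name: name.isidentifier() for name in candidates}

for name, valid in results.items():
    print(name, valid)
```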

Naming Conventions
1) Camel (lower and upper)
Eg: ageEmp = 45
2) Snake
Eg: age_emp=45

3) Pascal
Eg: AgeEmp = 45

Basic Data Types

Basic Data Types  Description                                                  Values                   Representation
Boolean           Two values of logic, associated with conditional statements  True and False           bool
Integer           Positive and negative whole numbers                          Set of all integers (Z)  int
Complex           Contains real and imaginary part (a+ib)                      Set of complex numbers   complex
Float             Real numbers                                                 Floating point numbers   float
String            All characters enclosed between single or double quotes      Sequence of characters   str

Identifying object data type


Find data type of object using ‘type()’ function
Syntax : type(object)

Verifying Object data type


To verify if an object is of a certain data type:
Syntax: type(object) is datatype
This will give output as either ‘True’ or ‘False’
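
Both steps can be tried on a few sample values (the values are arbitrary):

```python
height = 186.6
name = "Asha"
is_enrolled = True

# Identify the data type
print(type(height))
print(type(name))
print(type(is_enrolled))

# Verify the data type
height_is_float = type(height) is float
height_is_int = type(height) is int
print(height_is_float, height_is_int)
```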

Coercing object to new data type


Convert the data type of an object to another
Syntax: datatype(object)
Changes can be stored in the same variable or a different variable
Eg: ht1 = 186.6
    ht2 = int(ht1) (ht2 = 186)

Only certain coercions are accepted
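
A short sketch of accepted and rejected coercions. The failing case is an illustrative choice: text that is not numeric cannot be converted to an integer.

```python
ht1 = 186.6
ht2 = int(ht1)          # the fractional part is dropped
ht3 = str(ht1)          # number to string
ht4 = float("186.6")    # numeric text to float

try:
    int("tall")         # non-numeric text cannot be coerced to int
    failed = False
except ValueError:
    failed = True

print(ht2, ht3, ht4, failed)
```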

Lec 6: Operators

Operators and Operand


• Operators are special symbols that help in carrying out an assignment operation or
arithmetic or logical computation
• The value that the operator operates on is called the operand

I] Arithmetic Operators
• Used to perform mathematical operations between two operands
• Eg: Create two variables ‘a’ and ‘b’ with values ’10’ and ‘5’, respectively

Symbol  Operation            Example
+       Addition             In : a+b    Out : 15
-       Subtraction          In : a-b    Out : 5
*       Multiplication       In : a*b    Out : 50
/       Division             In : a/b    Out : 2.0
%       Remainder (Modulus)  In : a%b    Out : 0
**      Exponent             In : a**b   Out : 100000
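
The same operations as a runnable script, using the values a = 10 and b = 5 from the example. Note that ‘/’ returns a float even when the division is exact.

```python
a = 10
b = 5

results = {
    "+": a + b,
    "-": a - b,
    "*": a * b,
    "/": a / b,
    "%": a % b,
    "**": a ** b,
}

for symbol, value in results.items():
    print(symbol, value)
```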

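Hierarchy of Arithmetic Operators

Decreasing order of Precedence       Operation
Parentheses                          ()
Exponent                             **
Multiplication, Division, Remainder  *, /, %
Addition and Subtraction             +, -

Multiplication, division and remainder share the same precedence and are evaluated left to
right, as are addition and subtraction.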
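
A few expressions showing the effect of precedence, reusing a = 10 and b = 5 (the expressions themselves are chosen only for illustration):

```python
a, b = 10, 5

r1 = a + b * 2        # multiplication first: 10 + 10
r2 = (a + b) * 2      # parentheses first: 15 * 2
r3 = 2 * b ** 2       # exponent first: 2 * 25
r4 = a - b + 2        # same level, left to right: (10 - 5) + 2

print(r1, r2, r3, r4)
```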

II] Assignment Operator

Used to assign values to variables


Each example starts from a = 10 and b = 5.

Symbol  Operation                                                                                 Example
=       Assigns the value of the right side operand to the left side operand                      a = 10, b = 5
+=      Adds right operand to left operand and stores result in left operand (a = a+b)            a += b, print(a) [a=15]
-=      Subtracts right operand from left operand and stores result in left operand (a = a-b)     a -= b, print(a) [a=5]
*=      Multiplies left operand by right operand and stores result in left operand (a = a*b)      a *= b, print(a) [a=50]
/=      Divides left operand by right operand and stores result in left operand (a = a/b)         a /= b, print(a) [a=2.0]
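
When the operators are applied one after another without resetting ‘a’, each one works on the result of the previous step. The sketch below records the running value; the list name `history` is introduced only for this example.

```python
a = 10
b = 5
history = []

a += b                # a = a + b
history.append(a)
a -= b                # a = a - b
history.append(a)
a *= b                # a = a * b
history.append(a)
a /= b                # a = a / b (result is a float)
history.append(a)

print(history)
```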

III] Relational or Comparison Operators

• Tests numerical equalities and inequalities between two operands and returns a boolean
value
• All operators have the same precedence
• Create two variables ‘x’ and ‘y’ with values 5 and 7 respectively

Symbol  Operation                 Example
<       Strictly Less than        In : print(x<y)    Out : True
<=      Less than or Equal to     In : print(x<=y)   Out : True
>       Strictly Greater than     In : print(x>y)    Out : False
>=      Greater than or Equal to  In : print(x>=y)   Out : False
==      Equal to                  In : print(x==y)   Out : False
!=      Not equal to              In : print(x!=y)   Out : True
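
The comparisons as a runnable script with x = 5 and y = 7:

```python
x = 5
y = 7

comparisons = {
    "<": x < y,
    "<=": x <= y,
    ">": x > y,
    ">=": x >= y,
    "==": x == y,
    "!=": x != y,
}

for symbol, outcome in comparisons.items():
    print(symbol, outcome)
```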

IV] Logical Operators


• Used when operands are conditional statements and returns a boolean value
• In Python, logical operators are designed to work with scalars or boolean values
• Create two variables ‘x’ and ‘y’ with values 5 and 7 respectively

Symbol  Operation    Example
or      Logical OR   In : print((x>y) or (x<y))    Out : True
and     Logical AND  In : print((x>y) and (x<y))   Out : False
not     Logical NOT  In : print(not (x==y))        Out : True
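
The same checks as a script, again with x = 5 and y = 7:

```python
x = 5
y = 7

either = (x > y) or (x < y)     # at least one condition holds
both = (x > y) and (x < y)      # both conditions must hold
negated = not (x == y)          # inverts the condition

print(either, both, negated)
```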

V] Bitwise Operators

• Used when operands are integers
• Integers are treated as a string of binary digits
• Operates bit by bit
• Can also operate on conditional statements which compare scalar values or arrays
• Bitwise OR ( | ), AND ( & )
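
A minimal sketch on scalars only, reusing the values from the earlier examples. The binary forms are shown in the comments. When combining comparisons, the parentheses are required because ‘|’ and ‘&’ bind more tightly than the comparison operators.

```python
a = 10   # binary 1010
b = 5    # binary 0101

or_result = a | b     # 1010 | 0101 = 1111
and_result = a & b    # 1010 & 0101 = 0000

x, y = 5, 7
cond_or = (x > y) | (x < y)     # False | True
cond_and = (x > y) & (x < y)    # False & True

print(or_result, and_result, cond_or, cond_and)
```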
