Python for Data Science- Week 1
MODULE 1
Data Perspective:
• Read Data
• Data processing and cleaning (Finding and removing errors, missing data)
• Summarizing Data
• Visualization (Looking at data pictorially)
• Deriving insights from data
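As a rough illustration of the first three steps above (read, clean, summarize), here is a minimal sketch that uses only the standard library. The sample records and the column names `name` and `age` are made up for this example; visualization is left out because it needs a plotting library.

```python
import csv
import io
import statistics

# Hypothetical sample data; the empty field in the last row is a missing value.
raw = "name,age\nAsha,31\nRavi,45\nMeena,\n"

# Read data: parse the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(raw)))

# Process and clean: keep only the rows where 'age' is present.
clean = [row for row in rows if row["age"].strip()]

# Summarize: compute the mean age over the cleaned rows.
ages = [int(row["age"]) for row in clean]
mean_age = statistics.mean(ages)

print(len(rows), len(clean), mean_age)
```

In practice, a library such as pandas is typically used for these steps; the standard-library version is shown only to keep the sketch self-contained.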
Advantages of Python:
• Provides a good ecosystem of libraries that are robust and varied
• Tight-knit integration with big data frameworks like Hadoop, Spark, etc.
• Supports both object-oriented and functional programming paradigms
• Python is reasonably fast to prototype
• Provides support for reading files from local storage, databases and the cloud.
Evolution of Python:
• Python was developed by Guido van Rossum in the late 1980s at the National
Research Institute for Mathematics and Computer Science (CWI) in the Netherlands
Python as a programming language:
• Supports multiple programming paradigms
• Functional, structured (procedural), object-oriented, etc.
• Dynamic typing
• Runtime type safety checks
• Reference counting
• Deallocates an object once nothing refers to it any longer
• Late binding
• Methods are looked up by name during runtime
• Python's design is guided by the aphorisms of the Zen of Python by Tim Peters (PEP 20; of
the 20 aphorisms, 19 are written down)
• The standard CPython interpreter is managed by the Python Software Foundation
• There are other implementations, namely Jython (formerly JPython; Java), IronPython (C#),
Stackless Python (C, used for concurrency), PyPy (written in a subset of Python itself, with JIT
compilation)
• Much of the standard library is written in Python itself
• High standards of readability
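The language features listed above (dynamic typing, reference counting, late binding) can be sketched as follows. The class `Greeter` and all variable names are invented for this illustration, and `sys.getrefcount` reports a CPython implementation detail, so treat the reference-count part as an assumption about the standard interpreter.

```python
import sys

# Dynamic typing: a name can be rebound to values of different types,
# and a type error surfaces only when the offending line actually runs.
value = 10
first_type = type(value).__name__
value = "ten"
second_type = type(value).__name__

try:
    value + 1                      # str + int is rejected at runtime
    type_error_raised = False
except TypeError:
    type_error_raised = True

# Reference counting (CPython): binding a second name adds one reference.
data = []
before = sys.getrefcount(data)
alias = data
after = sys.getrefcount(data)

# Late binding: the method is looked up by name when the call executes.
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
old = g.greet()
Greeter.greet = lambda self: "hi"  # replace the method after g was created
new = g.greet()

print(first_type, second_type, type_error_raised, after - before, old, new)
```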
Python vs Java
• Java is statically typed, i.e., type safety is checked during compilation (static compilation)
• Thus, developing code in Java takes more time
• Python, which is dynamically typed, avoids the long compilation step that Java
requires
• Dynamically typed code tends to be less verbose and is therefore more
readable
Coding Environment:
• A software program can be written using a terminal, a command prompt (cmd), a text
editor or through an Integrated Development Environment (IDE)
• The program needs to be saved in a file with an appropriate extension (.py for
Python, .m for MATLAB, etc.) and can be executed in the corresponding environment
(Python, MATLAB, etc.)
• An Integrated Development Environment (IDE) is a software product developed solely to
support software development in various or specific programming languages
Coding Environment- IDE:
• An IDE usually comprises:
• Source code editor
• Compiler
• Debugger
• Additional features include syntax error highlighting, code completion
• Offers support for building and executing the program, along with debugging the code, from
within the environment
• The best IDEs also provide version control features
• Eclipse+PyDev, Sublime Text, Atom, GNU Emacs, Vi/Vim, Visual Studio, and Visual Studio
Code are general-purpose editors and IDEs with Python support.
• Apart from these, some of the Python-specific editors include PyCharm, Jupyter, Spyder, and
Thonny
Spyder
• Supported across Linux, Mac OS X and Windows platforms
• Available as an open-source version
• Can be installed separately or through the Anaconda distribution
• Developed for Python, and specifically for data science
• Features include
• Code editor with robust syntax and error highlighting
• Code completion and navigation
• Debugger
• Integrated documentation
• Interface similar to MATLAB and RStudio
PyCharm
• Supported across Linux, Mac OS X and Windows platforms
• Available as a community (free open source) and professional (paid) version
• Supports only Python
• Can be installed separately or through Anaconda distribution
• Features include
• Code editor provides syntax and error highlighting
• Code completion and navigation
• Unit testing
• Debugger
• Version control
Jupyter Notebook
• Web application that allows the creation and manipulation of documents called
‘notebooks’
• Supported across Linux, Mac OS X and Windows platforms
• Available as an open-source version
• Bundled with Anaconda distribution or can be installed separately
• Supports Julia, Python, R and Scala
• Consists of an ordered collection of input and output cells that contain code, texts, plots
etc.
• Allows sharing of code and narrative text through output formats like PDF, HTML, etc.
• Lacks most of the features of a good IDE
Lec 3: Introduction to Spyder - Part 1
Variable
• An identifier containing known information
• Information is referred to as value
• Variable name points to a memory address or a storage location and is used to reference
the stored value
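A small sketch of the idea that a variable name references a stored value. The names `a` and `b` are illustrative; `id()` returns an object's identity, which in CPython happens to be its memory address.

```python
a = 10
b = a                        # b now refers to the same stored value as a

same_object = a is b         # both names point at one object
same_id = id(a) == id(b)     # id() gives the object's identity

b = 20                       # rebinding b leaves a untouched

print(same_object, same_id, a, b)
```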
Commenting lines of code
• Adding comments helps in understanding the algorithms used while developing code
• In practice, comments are added before the code they describe and begin with a ‘#’
• Multiple lines can also be commented
• Commenting multiple lines:
• Select the lines to be commented and press ‘Ctrl + 1’
• Or select ‘Edit’ in the menu and then select ‘Comment/Uncomment’
• Uses: to add a description, or to render lines of code inert during testing.
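Both uses of comments mentioned above can be seen in this short sketch; the variable names are hypothetical.

```python
# Compute the area of a rectangle (this comment describes the lines below).
length = 4
width = 3
area = length * width

# The next line is commented out, so it is inert and never runs:
# area = area * 2

print(area)
```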
Removing/Deleting Variable(s)
1) Removing single variable: Using ‘del’ followed by the variable name (Eg: del b)
2) Removing multiple variables: Using ‘del’ followed by variable names separated by a
comma (Eg: del a,b)
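A minimal sketch of both forms of `del`. The variable names and values are illustrative; the `try`/`except` at the end only confirms that the name no longer exists.

```python
a = 10
b = 5
c = 1

del c        # remove a single variable
del a, b     # remove several variables at once

# Using a deleted name raises NameError.
try:
    a
    a_exists = True
except NameError:
    a_exists = False

print(a_exists)
```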
Naming Variables
• Values are assigned to variables using the assignment operator ‘=’
• Variable names should be short and descriptive
• Avoid using variable names that clash with built-in functions
• A name should indicate the intent of the variable's use to the reader
• One character variable names are usually used in looping constructs, functions, etc.
Naming Conventions
1) Camel (lower and upper)
Eg: ageEmp = 45
2) Snake
Eg: age_emp=45
3) Pascal
Eg: AgeEmp = 45
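The three conventions, together with the earlier advice about not clashing with built-in functions, can be sketched as below. The function `shadow_demo` is a hypothetical helper written for this illustration.

```python
ageEmp = 45     # lower camel case
age_emp = 45    # snake case
AgeEmp = 45     # Pascal (upper camel) case

def shadow_demo():
    # Naming a variable 'max' hides the built-in max() inside this function.
    max = 3
    try:
        max(1, 2)                 # fails: an int is not callable
    except TypeError:
        return "built-in max is hidden"
    return "no clash"

result = shadow_demo()
print(result)
```

Python treats the three spellings as three different variables, because names are case-sensitive.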
Basic Data Types
Lec 6: Operators
I] Arithmetic Operators
• Used to perform mathematical operations between two operands
• Eg: Create two variables ‘a’ and ‘b’ with values 10 and 5, respectively
Symbol   Operation             Example
+        Addition              In : a+b    Out : 15
-        Subtraction           In : a-b    Out : 5
*        Multiplication        In : a*b    Out : 50
/        Division              In : a/b    Out : 2.0
%        Remainder (Modulus)   In : a%b    Out : 0
**       Exponent              In : a**b   Out : 100000
Exponent **
Division /
Multiplication *
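The arithmetic operators can be tried out as a script with the same values of `a` and `b`. The last line is an added illustration of precedence: the exponent is evaluated before the multiplication, and the addition comes last.

```python
a = 10
b = 5

addition = a + b          # 15
subtraction = a - b       # 5
multiplication = a * b    # 50
division = a / b          # 2.0 (true division always returns a float)
remainder = a % b         # 0
exponent = a ** b         # 100000

# Precedence: 2 ** 2 is evaluated first, then b * 4, then a + 20.
combined = a + b * 2 ** 2  # 30

print(addition, subtraction, multiplication, division, remainder, exponent, combined)
```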
Symbol   Operation                                               Example
-=       Subtracts the right operand from the left operand and   a -= b
         stores the result in the left operand (a = a - b)       print(a) [a=5]
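A short sketch of `-=` in action, starting from `a = 10` and `b = 5`. The `+=` and `*=` lines are added for comparison and follow the same pattern; they are not part of the table above.

```python
a = 10
b = 5

a -= b            # same as a = a - b
after_subtract = a

a += b            # same as a = a + b
after_add = a

a *= b            # same as a = a * b
after_multiply = a

print(after_subtract, after_add, after_multiply)
```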
• Tests numerical equalities and inequalities between two operands and returns a boolean
value
• All operators have the same precedence
• Create two variables ‘x’ and ‘y’ with values 5 and 7, respectively
Symbol   Operation     Example
or       Logical OR    In : print((x>y) or (x<y))   Out : True
not      Logical NOT   In : print(not (x==y))       Out : True
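The comparisons and logical operators above can be combined in one short script using the same `x` and `y`. The `and` line is an added illustration that does not appear in the examples above.

```python
x = 5
y = 7

# Relational operators return a boolean value.
greater = x > y                    # False
less = x < y                       # True
equal = x == y                     # False

# Logical operators combine boolean values.
either = (x > y) or (x < y)        # True
negated = not (x == y)             # True
both = (x > y) and (x < y)         # False

print(greater, less, equal, either, negated, both)
```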
V] Bitwise Operators