Unit 1 Python Basics
4 ANALYSING Analyze data using Python libraries such as Pandas and NumPy.
• Comments are indicated by the hash mark (pound sign) #. Any text preceded
by the sign # is ignored by the Python interpreter. This is often used to add
comments to code. At times you may also want to exclude certain blocks of
code without deleting them.
• Two statements on the same line are separated with a semicolon ";"
• No semicolon at the end of lines
• A long line continues on the next line with a backslash "\" (not always needed, for example inside parentheses or brackets)
• Grouping is obtained through indentation
• One Python script is considered a module that can be run or imported by other
modules
• Assignment uses the equal sign "="
Python Indentations
• Python uses indentation to indicate a block of code
Example
a=5
b=2
if a > b:
    print("a is greater than b!")
• Python automatically recognizes the data type based on the value that is assigned to a variable
Example
x=5 # x is of type int
x = "Hello" # x is now of type str
print(x) # This will print "Hello" as variable x has been assigned a new value
Python Numbers
• There are three numeric types in Python:
• int
• float
• complex
• Variables of numeric types are created automatically as soon as you assign a value to them:
Example
x = 23 # int
y = 5.6 # float
z = 1+2j # complex
• To identify the type of any variable or object in Python, use the type() function
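As a small illustration of type(), the sketch below reuses the values from the numeric example above and prints the type name of each. The loop and the use of __name__ are just one convenient way to display the result.

```python
# Inspect the type of each value with the built-in type() function.
x = 23        # int
y = 5.6       # float
z = 1 + 2j    # complex

for value in (x, y, z):
    print(value, type(value).__name__)

# Rebinding a name to a new value changes the type it refers to.
x = "Hello"
print(type(x).__name__)  # str
```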
Python Operators
• Operators are used to perform different kinds of operations on variables and values
• Following is a list of operators in Python:
• Arithmetic operators (+,-,*,/,//,%,**)
• Assignment operators (=,+=,-= etc)
• Comparison operators (==,!=,>,<,>=,<=)
• Logical operators (and, or, not)
• Membership operators (in, not in)
• Identity operators (is, is not)
• Bitwise operators (&, |, ^, ~, <<, >>)
Operators
Lexicon MILE
The following tokens are operators:
+ - * ** / // % @
<< >> & | ^ ~
< > <= >= == !=
Delimiters
( ) [ ] { }
, : . ; @ = ->
+= -= *= /= //= %= @=
&= |= ^= >>= <<= **=
Keywords
The following identifiers are used as reserved words, or keywords of the language,
and cannot be used as ordinary identifiers. They must be spelled exactly as written here:
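The reserved words vary slightly between Python versions, so one way to see the exact list is to ask the interpreter. A minimal sketch using the standard keyword module (the identifier "total" is only an illustrative non-keyword):

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
for name in keyword.kwlist:
    print(name)

# Check whether a given identifier is reserved.
print(keyword.iskeyword("while"))   # True
print(keyword.iskeyword("total"))   # False
```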
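• Assignment is a statement, not an expression: y = (x = x + 1) is invalid
• As in C, augmented assignment is valid: x += 1
• There are no pre/post increment/decrement operators: x++ and x-- are syntax errors, while ++x and --x parse as repeated unary operators and leave x unchanged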
str String type. ASCII-valued only in Python 2.x and Unicode in Python 3
long Arbitrary precision signed integer (Python 2.x only). Large int values are automatically converted
to long; in Python 3, int itself has arbitrary precision.
Numbers
The primary Python types for numbers are int and float. Python also supports other types of
numbers, such as Decimal and Fraction. In addition, Python has built-in support for complex
numbers, and uses the j or J suffix to indicate the imaginary part (e.g. 3+5j).
The Python interpreter acts as a simple calculator. You can type an expression at it and it will
write the value. Expression syntax is straightforward: the operators +, -, * and / work just like in most
other languages (for example, Pascal or C); parentheses (()) can be used for grouping.
For example:
>>> 2 + 2
4
>>> 50 - 5*6
20
>>> (50 - 5*6) / 4
5.0
>>> 8 / 5  # division always returns a floating point number
1.6
The integer numbers (e.g. 2, 4, 20) have type int, and numbers with a fractional part (e.g. 5.0,
1.6) have type float.
Division (/) always returns a float. To do floor division (division that rounds down to the
nearest integer) and get an integer result (discarding any fractional result), you can use
the // operator; to calculate the remainder you can use %:
>>> 17 / 3 # classic division returns a float
5.666666666666667
>>> 17 // 3 # floor division discards the fractional part
5
>>> 17 % 3 # the % operator returns the remainder of the division
2
>>> 5 * 3 + 2 # result * divisor + remainder
17
With Python, it is possible to use the ** operator to calculate
powers.
>>> 5 ** 2 # 5 squared
25
>>> 2 ** 7 # 2 to the power of 7
128
** has higher precedence than -. Example: -3**2 will be interpreted as -(3**2)
and thus result in -9. To avoid this and get 9, you can use (-3)**2.
The equal sign (=) is used to assign a value to a variable. Afterwards, no result is displayed before
the next interactive prompt:
>>> width = 20
>>> height = 5 * 9
>>> width * height
900
If a variable is not “defined” (assigned a value), trying to use it will give you an error:
>>> n # try to access an undefined variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'n' is not defined
There is full support for floating point; operators with mixed type operands convert the integer
operand to floating point:
>>> 3 * 3.75 / 1.5
7.5
In interactive mode, the last printed expression is assigned to the variable _. This means that
when you are using Python as a desk calculator, it is somewhat easier to continue calculations,
for example:
>>> tax = 12.5 / 100
>>> price = 100.50
>>> price * tax
12.5625
>>> price + _
113.0625
>>> round(_, 2)
113.06
Strings
MBA@IICMR
Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They
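can be enclosed in single quotes ('...') or double quotes ("...") with the same result. \ can be used to
escape quotes, and \n inside a string means newline:
>>> s = 'First line.\nSecond line.'  # \n means newline
>>> s  # without print(), \n is included in the output
'First line.\nSecond line.'
>>> print(s)  # with print(), \n produces a new line
First line.
Second line.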
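String literals can span multiple lines when enclosed in triple quotes ("""...""" or '''...'''). Line
endings are included in the string automatically, but a \ at the end of a line prevents this. The following example:
print("""\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
""")
produces the following output (note that the initial newline is not included):
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to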
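One way to remember how indices work is to think of them as pointing between characters, shown here for the word 'Python':
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1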
The first row of numbers gives the position of the indices 0...6 in the string; the
second row gives the corresponding negative indices.
For non-negative indices, the length of a slice is the difference of the indices, if both are
within bounds. For example, the length of word[1:3] is 2.
Strings - Indexing & Slicing
Attempting to use an index that is too large will result in an error:
>>> word[42]  # the word only has 6 characters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
However, out of range slice indexes are handled gracefully when used for slicing:
>>> word[4:42]
'on'
>>> word[42:]
''
Python strings cannot be changed; they are immutable. Therefore, assigning to an indexed position in
the string results in an error.
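The indexing, slicing, and immutability rules above can be tried together in one short sketch. It assumes word holds the string "Python", as in the index diagram; new_word is an illustrative name.

```python
word = "Python"  # assumed example string, matching the index diagram

# Indexing: positive indices count from the left, negative from the right.
print(word[0], word[5], word[-1], word[-6])

# Slicing: the start index is included, the stop index is excluded.
print(word[0:2], word[2:5])

# Out-of-range slice bounds are clipped instead of raising an error.
print(repr(word[4:42]), repr(word[42:]))

# Strings are immutable, so item assignment raises TypeError.
try:
    word[0] = "J"
except TypeError as exc:
    print("TypeError:", exc)

# Build a new string instead of modifying the old one.
new_word = "J" + word[1:]
print(new_word)
```

When a different string is needed, creating a new one (as in the last step) is the usual approach.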
Strings
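Most objects have a truth value: empty sequences are treated as False in a condition and non-empty ones as True.
In [003]: a = [1, 2, 3]
   .....: if a:
   .....:     print('I found something!')
   .....:
I found something!
In [005]: bool([]), bool([1, 2, 3])
Out[005]: (False, True)
None is a common default value for optional function arguments, and is tested with is or is not:
In [010]: b = 5
In [011]: b is not None
Out[011]: True
def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    if c is not None:
        result = result * c
    return result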
Type casting
The str, bool, int and float types are also functions which can be used to
cast values to those types:
In [012]: s = '3.14159'
In [013]: fval = float(s)
In [014]: type(fval)
Out[014]: float
In [015]: int(fval)
Out[015]: 3
In [016]: bool(fval)
Out[016]: True
In [017]: bool(0)
Out[017]: False
Dates and times
The built-in Python datetime module provides datetime, date, and time types. The
datetime type, as you may imagine, combines the information stored in date and time and is the
most commonly used.
Given a datetime instance, you can extract the equivalent date and time objects
by calling methods on the datetime of the same name:
In [021]: dt.date()
Out[021]: datetime.date(2011, 10, 29)
In [022]: dt.time()
Out[022]: datetime.time(20, 30, 21)
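The outputs above assume a datetime named dt was created earlier. A minimal sketch, with the value inferred from those outputs rather than taken from the slide itself:

```python
from datetime import datetime

# Assumed value, chosen to match the outputs shown above.
dt = datetime(2011, 10, 29, 20, 30, 21)

print(dt.day, dt.minute)   # individual fields are available as attributes
print(dt.date())           # the date part only
print(dt.time())           # the time part only
```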
The strftime method formats a datetime as a string, using the format codes listed in the specification table below.
When aggregating or otherwise grouping time series data, it will occasionally be useful to
replace fields of a series of datetimes, for example replacing the minute and second fields
with zero, producing a new object:
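As a hedged illustration of both strftime and field replacement with the replace method (the dt value is an assumption carried over from the earlier outputs, and the format string is only one possible choice):

```python
from datetime import datetime

dt = datetime(2011, 10, 29, 20, 30, 21)  # assumed example value

# strftime: format the datetime as text using format codes.
formatted = dt.strftime("%m/%d/%Y %H:%M")
print(formatted)

# replace: returns a NEW datetime with the chosen fields overwritten.
truncated = dt.replace(minute=0, second=0)
print(truncated)
print(dt)  # the original object is unchanged
```

Because datetime objects are immutable, replace never alters dt itself.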
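The difference of two datetime objects produces a datetime.timedelta, and adding a timedelta to a
datetime produces a new, shifted datetime:
In [028]: delta
Out[028]: datetime.timedelta(17, 7179)
In [030]: dt
Out[030]: datetime.datetime(2011, 10, 29, 20, 30, 21)
In [031]: dt + delta
Out[031]: datetime.datetime(2011, 11, 15, 22, 30)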
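The statement that created delta is not shown above. One way to obtain the same value, assuming a second timestamp dt2 (a name and value inferred from the outputs, not given in the source):

```python
from datetime import datetime

dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)   # assumed second timestamp

delta = dt2 - dt                 # subtracting datetimes gives a timedelta
print(delta.days, delta.seconds)

print(dt + delta)                # shifting dt by delta gives dt2 back
```

Recent Python versions display the same timedelta as datetime.timedelta(days=17, seconds=7179).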
Datetime format specification
Type Description
%Y 4-digit year
%y 2-digit year
%m 2-digit month [01, 12]
%d 2-digit day [01, 31]
%H Hour (24-hour clock) [00, 23]
%I Hour (12-hour clock) [01, 12]
%M 2-digit minute [00, 59]
%S Second [00, 61] (seconds 60, 61 account for leap seconds)
%w Weekday as integer [0 (Sunday), 6]
%U Week number of the year [00, 53]. Sunday is considered the first day of the week, and days before the
first Sunday of the year are “week 0”.
%W Week number of the year [00, 53]. Monday is considered the first day of the week, and days before the
first Monday of the year are “week 0”.
%z UTC time zone offset as +HHMM or -HHMM, empty if time zone naive
%F Shortcut for %Y-%m-%d, for example 2012-04-18
%D Shortcut for %m/%d/%y, for example 04/18/12
Control Flow
if Statements
The if statement is one of the most well-known control flow statement types. It checks
a condition which, if True, evaluates the code in the block that
follows.
>>> x = int(input("Please enter an integer: "))
Please enter an integer: 42
>>> if x < 0:
...     x = 0
...     print('Negative changed to zero')
... elif x == 0:
...     print('Zero')
... elif x == 1:
...     print('Single')
... else:
...     print('More')
...
More
An if statement can be optionally followed by one or more elif blocks. The keyword 'elif' is short for
'else if', and is useful to avoid excessive indentation. An if ... elif ... elif ... sequence is a
substitute for the switch or case statements found in other languages.
Control Flow
for Statements
Python’s for statement iterates over the items of any sequence (a list or a string), in the
order that they appear in the sequence. For example
>>> # Measure some strings:
... words = ['cat', 'window', 'defenestrate']
>>> for w in words:
...     print(w, len(w))
...
cat 3
window 6
defenestrate 12
If you need to modify the sequence you are iterating over while inside the loop (for
example to duplicate selected items), it is recommended that you first make a copy.
Iterating over a sequence does not implicitly make a copy. The slice notation makes this
especially convenient:
>>> for w in words[:]: # Loop over a slice copy of the entire list.
...     if len(w) > 6:
...         words.insert(0, w)
...
>>> words
['defenestrate', 'cat', 'window', 'defenestrate']
Control Flow
while loop
A while loop specifies a condition and a block of code that is to be executed until the
condition evaluates to False or the loop is explicitly ended with break:
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2

>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print(b)
...     a, b = b, a+b
...
1
1
2
3
5
8
Control Flow
break and continue Statements, and else Clauses on Loops
The break statement, like in C, breaks out of the smallest enclosing for or while loop.
Loop statements may have an else clause; it is executed when the loop terminates
through exhaustion of the list (with for) or when the condition becomes false (with while),
but not when the loop is terminated by a break statement. This is exemplified by the
following loop, which searches for prime numbers:
>>> for n in range(2, 10):
...     for x in range(2, n):
...         if n % x == 0:
...             print(n, 'equals', x, '*', n//x)
...             break
...     else:  # the else clause belongs to the for loop, not the if statement
...         # loop fell through without finding a factor
...         print(n, 'is a prime number')
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
When used with a loop, the else clause has more in common with the else clause
of a try statement than it does that of if statements: a try statement’s else
clause runs when no exception occurs, and a loop’s else clause runs when no break
occurs.
The continue statement, also borrowed from C, continues with the next iteration of
the loop:
>>> for num in range(2, 10):
...     if num % 2 == 0:
...         print("Found an even number", num)
...         continue
...     print("Found a number", num)
...
Found an even number 2
Found a number 3
Found an even number 4
Found a number 5
Found an even number 6
Found a number 7
Found an even number 8
Found a number 9
Control Flow
pass Statements
The pass statement does nothing. It can be used when a statement is
required syntactically but the program requires no action. For example:
if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')
The range function produces a sequence of integers up to but not including the endpoint. A
common use of range is for iterating through sequences by index:
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    val = seq[i]
Control Flow
The xrange() Function
In Python 2, it's recommended to use xrange for very long ranges. It takes the same
arguments as range but returns an iterator that generates integers one by one rather than
generating all of them up-front and storing them in a (potentially very large) list. This
snippet sums all numbers from 0 to 99 that are multiples of 3 or 5:
sum = 0
for i in xrange(100):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        sum += i
In Python 3, range itself returns a lazy range object that produces integers on demand, so the
xrange function no longer exists and is not needed.
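For comparison, here is a sketch of the same computation in Python 3, where range is already lazy. The accumulator is named total here (an illustrative choice) so that the built-in sum() stays available.

```python
# Python 3: range is lazy, so no list of 100 integers is built.
total = 0  # named total to avoid shadowing the built-in sum()
for i in range(100):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i

print(total)  # 2318

# The same result with the built-in sum() and a generator expression.
print(sum(i for i in range(100) if i % 3 == 0 or i % 5 == 0))
```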
Data Structures and Sequences
Tuple
A tuple is a one-dimensional, fixed-length, immutable sequence of Python objects. The
easiest way to create one is with a comma-separated sequence of values:
In [038]: tup = 4, 5, 6
In [039]: tup
Out[039]: (4, 5, 6)
When defining tuples in more complicated expressions, it’s often necessary to enclose the
values in parentheses, as in this example of creating a tuple of tuples:
Elements can be accessed with square brackets [] as with most other sequence types. Like
C, C++, Java, and many other languages, sequences are 0-indexed in Python:
In [045]: tup[0]
Out[045]: 's'
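The two tuple definitions referred to above are not shown. A minimal sketch of what they might look like, where nested_tup and its values are illustrative and tuple("string") is assumed from the 's' output:

```python
# Parentheses make nested tuples explicit.
nested_tup = (4, 5, 6), (7, 8)
print(nested_tup)          # ((4, 5, 6), (7, 8))

# Any sequence or iterator can be converted with tuple().
tup = tuple("string")
print(tup)                 # ('s', 't', 'r', 'i', 'n', 'g')
print(tup[0])              # 's', because indexing starts at 0
```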
While the objects stored in a tuple may be mutable themselves, once created it’s not
possible to modify which object is stored in each slot:
In [046]: tup = tuple(['foo', [1, 2], True])
In [047]: tup[2] = False
--------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-365-c7308343b841> in <module>()
----> 1 tup[2] = False
TypeError: 'tuple' object does not support item assignment
# however
In [048]: tup[1].append(3)
In [049]: tup
Out[049]: ('foo', [1, 2, 3], True)
Tuples can be concatenated using the + operator to produce longer tuples:
In [050]: (4, None, 'foo') + (6, 0) + ('bar',)
Out[050]: (4, None, 'foo', 6, 0, 'bar')
Multiplying a tuple by an integer, as with lists, has the effect of concatenating together
that many copies of the tuple.
In [051]: ('foo', 'bar') * 4
Out[051]: ('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')
Note that the objects themselves are not copied, only the references to them.
Unpacking tuples
If you try to assign to a tuple-like expression of variables, Python will attempt to unpack
the value on the right-hand side of the equals sign:
In [052]: tup = (4, 5, 6)
In [053]: a, b, c = tup
In [054]: b
Out[054]: 5
Even sequences with nested tuples can be unpacked:
In [055]: tup = 4, 5, (6, 7)
In [056]: a, b, (c, d) = tup
In [057]: d
Out[057]: 7
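Using this functionality it’s easy to swap variable names, a task which in many languages might
look like:
tmp = a
a = b
b = tmp
In Python, the swap can be written in one line:
b, a = a, b
One of the most common uses of variable unpacking is iterating over sequences of tuples or lists.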
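Both uses of unpacking can be sketched briefly. The values and the list named seq are illustrative, not taken from the slides.

```python
# Swap two values without a temporary variable.
a, b = 1, 2
b, a = a, b
print(a, b)   # 2 1

# Unpack each inner tuple directly in the for statement.
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # illustrative data
for x, y, z in seq:
    print(x, y, z)
```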
Data Structures and Sequences
Tuple methods
Since the size and contents of a tuple cannot be modified, it is very light on instance methods. One
particularly useful one (also available on lists) is count, which counts the number of occurrences of a
value:
In [058]: a = (1, 2, 2, 2, 3, 4, 2)
In [059]: a.count(2)
Out[059]: 4
Data Structures and Sequences
List
Python has a number of compound data types, used to group together other values. The most versatile is
the list, which can be written as a list of comma-separated values (items) between square brackets.
Lists might contain items of different types, but usually the items all have the same type. They can be
defined using square brackets [] or using the list type function.
List
Adding and removing elements
Elements can be appended to the end of the list with the append method:
In [064]: b_list.append('dwarf')
In [065]: b_list
Out[065]: ['foo', 'peekaboo', 'baz', 'dwarf']
Using insert you can insert an element at a specific location in the list:
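A short sketch of defining lists and then adding and removing elements. The starting contents of b_list are assumed from the output above; a_list, "red", and removed are illustrative.

```python
# Two ways to define a list: a literal, or list() applied to any iterable.
a_list = [2, 3, 7, None]
b_list = list(("foo", "peekaboo", "baz"))  # assumed starting contents

b_list.append("dwarf")       # add to the end
print(b_list)                # ['foo', 'peekaboo', 'baz', 'dwarf']

b_list.insert(1, "red")      # insert before index 1
print(b_list)                # ['foo', 'red', 'peekaboo', 'baz', 'dwarf']

removed = b_list.pop(2)      # inverse of insert: remove and return by index
print(removed, b_list)

b_list.remove("foo")         # remove the first matching value
print(b_list)                # ['red', 'baz', 'dwarf']
```

Note that insert has to shift later elements, so it is generally more expensive than append on long lists.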
List
Sorting
A list can be sorted in-place (without creating a new object) by calling its sort method:
In [078]: a = [7, 2, 5, 1, 3]
In [079]: a.sort()
In [080]: a
Out[080]: [1, 2, 3, 5, 7]
sort has a few options that will occasionally come in handy. One is the ability to pass a secondary
sort key, i.e. a function that produces a value to use to sort the objects. For example, we could sort a
collection of strings by their lengths:
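A minimal sketch of sorting by a key function, using illustrative strings and the built-in len as the key:

```python
b = ["saw", "small", "He", "foxes", "six"]  # illustrative data
b.sort(key=len)   # sort in place by string length
print(b)          # ['He', 'saw', 'six', 'small', 'foxes']
```

The sort is stable, so strings of equal length keep their original relative order.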
The built-in bisect module implements binary-search and insertion into a sorted list.
bisect.bisect finds the location where an element should be inserted to keep it sorted, while
bisect.insort actually inserts the element into that location:
In [087]: bisect.insort(c, 6)
In [088]: c
Out[088]: [1, 2, 2, 2, 3, 4, 6, 7]
Note: The bisect module functions do not check whether the list is sorted as doing so would be
computationally expensive. Thus, using them with an unsorted list will succeed without error but may lead
to incorrect results.
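The bisect calls can be sketched as follows, assuming a starting list c chosen to be consistent with the output shown above:

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]    # assumed contents; must already be sorted

print(bisect.bisect(c, 2))   # 4: insertion point after the existing 2s
print(bisect.bisect(c, 5))   # 6: between 4 and 7

bisect.insort(c, 6)          # insert while keeping c sorted
print(c)                     # [1, 2, 2, 2, 3, 4, 6, 7]
```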
Data Structures and Sequences
List
Slicing
You can select sections of list-like types (arrays, tuples, NumPy arrays) by using slice
notation, which in its basic form consists of start:stop passed to the indexing
operator []:
While the element at the start index is included, the stop index is not included, so that the number of elements in
the result is stop - start. Either the start or stop can be omitted in which case they default to the
start of the sequence and the end of the sequence, respectively:
In [092]: seq[:5]
Out[092]: [7, 2, 3, 6, 3]
In [093]: seq[3:]
Out[093]: [6, 3, 5, 6, 0, 1]
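The slicing rules can be checked with a short sketch. The contents of seq are an assumption, chosen to be consistent with the two outputs above.

```python
seq = [7, 2, 3, 6, 3, 5, 6, 0, 1]  # assumed contents

print(seq[1:5])    # start included, stop excluded: 5 - 1 = 4 elements
print(seq[:5])     # omitted start defaults to the beginning
print(seq[3:])     # omitted stop defaults to the end
print(seq[-4:])    # negative indices count from the end
```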
List
sorted
The sorted function returns a new sorted list from the elements of any sequence.
A common pattern for getting a sorted list of the unique elements in a sequence is to
combine sorted with set:
In [101]: sorted(set('this is just some string'))
Out[101]: [' ', 'e', 'g', 'h', 'i', 'j', 'm', 'n', 'o', 'r', 's', 't',
'u']
Data Structures and Sequences
List
zip
zip “pairs” up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns
an iterator of tuples, so wrap it in list() to see them all at once:
In [102]: seq1 = ['foo', 'bar', 'baz']
In [103]: seq2 = ['one', 'two', 'three']
In [104]: list(zip(seq1, seq2))
Out[104]: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
zip can take an arbitrary number of sequences, and the number of elements it
produces is determined by the shortest sequence:
In [105]: seq3 = [False, True]
In [106]: list(zip(seq1, seq2, seq3))
Out[106]: [('foo', 'one', False), ('bar', 'two', True)]
List Methods
The list data type has many methods. Here are the most commonly used methods of list
objects:
List Methods Description
list.append(x) Add an item to the end of the list. Equivalent to a[len(a):] = [x].
list.extend(L) Extend the list by appending all the items in the given list. Equivalent to
a[len(a):] = L.
list.insert(i, x) Insert an item at a given position. The first argument is the index of the
element before which to insert, so a.insert(0, x) inserts at the front
of the list, and a.insert(len(a), x) is equivalent to a.append(x).
list.remove(x) Remove the first item from the list whose value is x. It is an error if there
is no such item.
list.pop([i]) Remove the item at the given position in the list, and return it. If no index is
specified, a.pop() removes and returns the last item in the list. (The
square brackets around the i in the method signature denote that the
parameter is optional, not that you should type square brackets at that
position.)
list.clear() Remove all items from the list. Equivalent to del a[:].
list.index(x) Return the index in the list of the first item whose value is x. It is an
error if there is no such item.
list.sort(key=None, reverse=False) Sort the items of the list in place (the arguments can be used for sort
customization, see sorted() for their explanation).
list.reverse() Reverse the elements of the list in place.
Example continued…
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
>>> a.pop()
1234.5
>>> a
[-1, 1, 66.25, 333, 333]
Note: Methods like insert, remove or sort that only modify the list have no return value printed;
they return the default None.
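The earlier part of this example is not included here. The following sketch runs the whole sequence from an assumed starting list, chosen so that the results match the outputs shown above:

```python
a = [66.25, 333, 333, 1, 1234.5]   # assumed starting list

a.insert(2, -1)
a.append(333)
print(a)              # [66.25, 333, -1, 333, 1, 1234.5, 333]
print(a.index(333))   # 1: position of the first 333
a.remove(333)         # drops only the first 333
a.reverse()
print(a)              # [333, 1234.5, 1, 333, -1, 66.25]
a.sort()
print(a)              # [-1, 1, 66.25, 333, 333, 1234.5]
print(a.pop())        # 1234.5
print(a.sort())       # None: in-place methods return nothing useful
```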
The Del Statement
There is a way to remove an item from a list given its index instead of its value: the del statement. This
differs from the pop() method which returns a value. The del statement can also be used to remove
slices from a list or clear the entire list (which we did earlier by assignment of an empty list to the slice).
del can also be used to delete an entire variable, as in del a. Referencing the name a hereafter is an
error (at least until another value is assigned to it).
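A minimal sketch of the del forms described above, using an illustrative list. The name_deleted flag is only there to record that the NameError occurred.

```python
a = [-1, 1, 66.25, 333, 333, 1234.5]   # illustrative list

del a[0]        # remove one item by index
print(a)        # [1, 66.25, 333, 333, 1234.5]
del a[2:4]      # remove a slice
print(a)        # [1, 66.25, 1234.5]
del a[:]        # clear the list but keep the name
print(a)        # []

del a           # remove the name itself
name_deleted = False
try:
    a
except NameError as exc:
    name_deleted = True
    print("NameError:", exc)
```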
Tuple
Tuples are used to store multiple items in a single variable.
Tuple is one of 4 built-in data types in Python used to store collections of data; the other three are list, set, and dictionary.
thistuple = ("apple", "banana", "cherry")
print(thistuple)
Dictionaries
Another useful data type built into Python is the dictionary. Dictionaries are indexed
by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be
used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object
either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be
modified in place using index assignments, slice assignments, or methods like append() and
extend().
It is best to think of a dictionary as a set of key: value pairs, with the requirement that
the keys are unique (within one dictionary). Since Python 3.7, dictionaries preserve the order in which
keys were inserted. A pair of braces creates an empty dictionary: {}.
Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the
dictionary; this is also the way dictionaries are written on output.
The main operations on a dictionary are storing a value with some key and extracting the value
given the key. It is also possible to delete a key:value pair with del. If you store using a key that is
already in use, the old value associated with that key is forgotten. It is an error to extract a value
using a non-existent key.
Performing list(d.keys()) on a dictionary returns a list of all the keys used in the
dictionary, in insertion order (if you want it sorted, just use sorted(d.keys()) instead).
To check whether a single key is in the dictionary, use the in keyword. Here is a small example
using a dictionary:
thisdict = {
"brand": "Ford",
"electric": False,
"year": 1964,
"colors": ["red", "white", "blue"]
}
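The main dictionary operations described above can be sketched on this dictionary. The "owner" and "model" keys and their values are hypothetical additions for illustration.

```python
thisdict = {
    "brand": "Ford",
    "electric": False,
    "year": 1964,
    "colors": ["red", "white", "blue"],
}

print(thisdict["brand"])          # extract a value by key
thisdict["year"] = 2020           # storing with an existing key overwrites
thisdict["owner"] = "Asha"        # hypothetical new key
del thisdict["electric"]          # delete a key:value pair

print(list(thisdict.keys()))      # keys in insertion order
print(sorted(thisdict.keys()))    # keys in sorted order
print("brand" in thisdict, "electric" in thisdict)

# Extracting a missing key raises KeyError; get() returns a default instead.
try:
    thisdict["model"]
except KeyError as exc:
    print("KeyError:", exc)
print(thisdict.get("model", "unknown"))
```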
Looping Techniques
When looping through dictionaries, the key and corresponding value can be retrieved at the
same time using the items() method.
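A short sketch of looping with items(), using an illustrative dictionary named knights:

```python
knights = {"gallahad": "the pure", "robin": "the brave"}  # illustrative data

# items() yields (key, value) pairs, which unpack directly in the loop.
for name, title in knights.items():
    print(name, title)

pairs = list(knights.items())
print(pairs)
```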