A Project Report On Python Project: Degree of Bachelor of Technology Branch:-Computer Science and Engineering

This document is a project report on a Python project submitted by students Shubham Digra, Prakash Singh, and Sushant Panditaka to Global Group of Institutes, Amritsar for their Bachelor of Technology degree. It discusses ThinkNEXT Technologies, the company that will provide industrial training for the students. ThinkNEXT is an ISO-certified company that provides various IT and electronics services and solutions. The report provides details on ThinkNEXT's profile, activities, training programs, clients, and the benefits of choosing ThinkNEXT for industrial training. It also introduces Python programming language, describing its main features and versions.


A PROJECT REPORT ON

PYTHON PROJECT
Submitted in partial fulfilment of the Requirements for the award of

DEGREE OF BACHELOR OF TECHNOLOGY


BRANCH: COMPUTER SCIENCE AND ENGINEERING

SUBMITTED BY:

SHUBHAM DIGRA
PRAKASH SINGH
SUSHANT PANDITA

University Roll No. : 1904751


1904749

AY-2022-2023

Global Group Of Institutes, Amritsar

DEPARTMENT OF COMPUTER APPLICATIONS
GLOBAL GROUP OF INSTITUTES, AMRITSAR

MAY, 2021

CHAPTER 1

1.1 COMPANY PROFILE

ThinkNEXT Technologies Private Limited, Mohali (Chandigarh) is an ISO 9001:2008 certified company which deals in software development, electronics systems development, and CAD/CAM consultancy. It is approved by the Ministry of Corporate Affairs and registered under the Companies Act, 1956. ThinkNEXT deals in University/College/School ERP
Software, University Conferences and Journals Management
(www.ptuconferences.ac.in, www.ptujournals.ac.in, www.somme.in),
Embedded Products, PLC/SCADA Consultancy, GPS based Vehicle
Tracking, TechSmart Classes, Android/iPhone Apps development, Web
designing, Web development, Discount Deals, Shopping sites, Project Kart,
Bulk SMS, Voice SMS (sms5.thinknext.co.in), Bulk Email, Biometric Time
Attendance, Access Control, SEO/SMO (Digital Marketing), Database
Solutions, Payment Gateway Integration, E-Mail Integration, Industrial
Training, Corporate Training, Placements etc. ThinkNEXT Technologies
provides IT/Electronics solutions using the latest technologies, e.g. Smart Card
(Contact Type, Contactless), NFC, Biometrics, GPS, Barcode, RFID, SMS,
Auto SMS (Shortcode), Android, iPhone, Cloud Computing, Web, Windows
and Mobile based technologies.
ThinkNEXT is Google Partner for Google Adwords, Bing Ads Accredited,
Hubspot Inbound and Email Marketing, Facebook Blueprint Certified,
Microsoft Bing Ads Accredited Company.
ThinkNEXT has wide expertise in .NET, Crystal Reports, Java, PHP,
Android, iPhone, Databases (Oracle and SQL Server), Web Designing,
Networking, IIS, Apache, WAMP Web Server configurations, various
RAID Levels etc. ThinkNEXT has its sister concerns/associate partners Zeta
Apponomics Private Limited and RBH Solutions (Embedded System
Products and Industrial Automation using PLC SCADA).

ThinkNEXT has its own multiple Smart Card printing, smart card encoding
and barcode label printing machines to provide better and effective customer
support solutions.
ThinkNEXT has also set up its own placement consultancy and has numerous placement-partner companies to provide the best possible placements in the IT/Electronics industry. ThinkNEXT has numerous clients across the globe, and also has offices in the USA, Canada, New Delhi, Shimla and Bathinda.
ThinkNEXT Technologies has developed its own cloud computing based
Cloud Campus 4.0 to facilitate knowledge and placement centric services. It
is a unique concept for effective and collaborative learning. ThinkNEXT
Cloud Campus is a step towards not only 100% placements, but also better
job offers even after placements.

1.2 ThinkNEXT Value Added Activities:

 Listing of ThinkNEXT in the Ministry of Corporate Affairs, Government of India
Corporate Identity No.: U72200PB2011PTC035677
Status: Approved
 Listing of company in Excise and Taxation, Punjab
TIN No. : 03362166544
 Listing of company in Central Board of Excise and Customs, Ministry of Finance
Service Tax No.: AAECT1486GSD003
 Listing of Company for ESIC
ESIC No. : 12000621820000911
 Technological Collaborations :-
o Sys build Technologies Pvt. Ltd., Bangalore
o FoxBase Technologies Private Limited, Bangalore
o Enterprise Software Solutions Lab, Bangalore
o Lipidata Systems Limited, Mumbai
o Interworld Commnet, Mohali
o Intersoft Professional, Chandigarh
o Urgent Engineering, Chandigarh
o Ess Dee Engineers, Mohali
o Primary Estates, Mohali
o Bajwa Developers Limited, Mohali
o Authorized dealer for security systems with ADI, ESSL and Base Systems
Pvt. Ltd.
 Sister Concerns/Associate Partners
o ThinkNEXT Smart School, Maur Mandi (Bathinda)
o Brilliant ITI, Mansa
o Zeta Apponomics Private Limited, Mohali
o RBH Solutions, Patiala, Noida

1.2.1 ThinkNEXT Industrial Training Programs under Digital India
Scheme (ESDM) and PMKVY 2.0
As ThinkNEXT is also an accredited training partner for the Digital India Government Scheme (ESDM) and PMKVY 2.0, it offers free 6-month industrial training in the following programs:
1. Telecom Technician – PC Hardware and Networking
2. Embedded Systems
3. PLC/SCADA (Advanced)
4. Computer Hardware
5. Junior Software Developer
6. Computer Networking and Storage
Under this scheme, dual certification will be provided to students, i.e. from both ThinkNEXT and the Government of India. Students will also be provided National Skill Certificates approved by 5 Government bodies.

1.2.2 ThinkNEXT Industrial Training Programs:
CSE/IT/MCA:
1. SAP (ABAP)
2. PHP
3. Android
4. Java
5. SAP (ABAP, MM, PP, SD, HR)
6. .Net
7. Web Designing
8. Professional Hardware, Networking, CCNA, CCNP (With Routers and Managed Switches)
9. Software Testing
10. Digital Marketing
Electronics/Electrical:
1. SAP (ABAP, MM, PP)
2. Embedded Systems
3. PLC/SCADA (Industrial Automation)
4. Professional Hardware, Networking, CCNA (With Routers and Managed Switches)
5. Android
Mechanical:
1. SAP (MM, PP)
2. AutoCAD
3. Solidworks
4. CNC Programming
5. Solidcam/Delcam/Mastercam
6. CATIA
7. CREO
8. ANSYS
9. NX Unigraphics
Mechanical Industry Tie-ups:
1. Ess Dee Engineers
2. Urgent Engineering
3. 3D Technologies Private Limited
Civil/Architecture:
1. SAP (MM)
2. AutoCAD
3. STAADPro
4. 3DS Max
5. Revit
6. Primavera
Civil Companies Tie-ups:
1. Bajwa Developers Limited
2. TDI Group
3. JLPL Group

1.3 Why ThinkNEXT?

1. National Icon Award Winner for "Best Web Development and Industrial Training Company"
2. Google Partner, Microsoft Accredited Professional, Facebook Blueprint and Hubspot Certified Company
3. Received the award for "Excellence in Industrial Training" at Corporate Summit 2017
4. An ISO 9001:2008 certified private limited company
5. National Skill Development Corporation Partner Company (NSDC Partner)
6. Accredited Training Partner of the National Institute of Electronics and Information Technology, Department of Electronics and Information Technology, Ministry of Communications and Information Technology
7. Approved by the Ministry of Corporate Affairs, Govt. of India. Corporate Identity No. U72200PB2011PTC035677
8. Affiliated to the Indian Testing Board
9. Accredited Training Partner of ISTQB (International Software Testing Qualifications Board)
10. Approved by the Department of IT (DoIT), Punjab
11. Approved by the Board of Apprenticeship Training, Ministry of HRD, Govt. of India
12. Member of CII (Confederation of Indian Industry), Membership No. N5238P
13. Approved by the Ministry of Skill Development and Entrepreneurship
14. Accredited Training Partner for PMKVY 2.0, Skill Development in Electronics Systems Design and Manufacturing for Digital India
15. Accredited Training Partner for PSDM (Punjab Skill Development Mission)

1.4 Clients:
Some of our prestigious software clients for various ThinkNEXT
products/services are:
1. Coromandel International Limited, Secunderabad
2. Medzel, USA
3. Nature9 Inc., USA
4. Punjab Technical University, Jalandhar
5. Maharaja Ranjit Singh Punjab Technical University, Bathinda
6. Guru Kashi University, Talwandi Sabo
7. Rayat Group of Institutions, Ropar
8. Aryans Group of Institutions, Rajpura
9. Punjabi University Patiala
10. Bhai Gurdas Group of Institutions
11. Baba Farid Group of Institutions, Bathinda
12. SUS Group of Institutions, Tangori
13. Asra Group of Institutions, Sangrur
14. Yadavindra College of Engineering and Technology, Talwandi Sabo
15. Guru Nanak Dev Dental College Sunam
16. Shiva International School, Bilaspur
17. Akal Degree College, Sangrur
18. St. Xavier School, Mansa
19. DAV School, Mansa
20. Eternal University, Baru Sahib
21. Shiva Group of Institutions, Bilaspur
22. Swami Devi Dyal Group of Institutions, Barwala
23. SRM Global Group of Institutions, Narayangarh
And many others…..

CHAPTER 2
PYTHON

2.1 INTRODUCTION TO PYTHON:

Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.
There are two major Python versions, namely:
1. Python2
2. Python3
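Which of these two major versions a given interpreter provides can be checked at run time via the standard sys module. A minimal sketch (works on both versions, since print with a single argument behaves the same):

```python
import sys

# sys.version_info holds (major, minor, micro, releaselevel, serial)
major = sys.version_info.major
print("Running Python", major)
```

This kind of check is commonly used by scripts that need to branch between Python 2 and Python 3 behaviour.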

2.2 Features of Python :

1. Easy-to-learn − Python has few keywords, a simple structure, and a clearly defined syntax. This allows a student to pick up the language quickly.
2. Easy-to-read − Python code is more clearly defined and visible to the eyes.
3. Easy-to-maintain − Python's source code is fairly easy-to-maintain.
4. A broad standard library − the bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
5. Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
6. Portable − Python can run on a wide variety of hardware platforms and has
the same interface on all platforms.
7. Extendable − You can add low-level modules to the Python interpreter.
These modules enable programmers to add to or customize their tools to be
more efficient.
8. Databases − Python provides interfaces to all major commercial databases.
9. GUI Programming − Python supports GUI applications that can be created
and ported to many system calls, libraries and windows systems, such as
Windows MFC, Macintosh, and the X Window system of Unix.
10. Scalable − Python provides a better structure and support for large programs
than shell scripting.

2.3 Python is used for:

1. Web development
2. Software development
3. Mathematics
4. System scripting

2.4 MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE USING PYTHON: WHY PYTHON?

There are many good reasons to choose Python as your primary programming language. First of all, Python is an easy-to-learn, powerful programming language. Furthermore, it has efficient high-level data structures, which allow you to write complex operations in fewer statements than in C, C++ or Java.
Object-oriented programming is a lot easier than in languages like Java.
Python has become one of the most popular programming languages among
developers and programmers. They praise it for its clean syntax and code
readability. Python is a general-purpose high-level programming language.
Python is both object oriented and imperative and it can be even used in a
functional style as well. Python programs are portable, i.e. they can be
ported to other operating systems like Windows, Linux, Unix and Mac OS
X, and they can be run on Java and .NET virtual machines.
David Beazley says in his foreword to the book "How to Think like a
Computer Scientist Learning with Python" by Jeffrey Elkner, Allen B.
Downey, and Chris Meyers: Despite Python's appeal to many different
communities, you may still wonder "why Python?" or "why teach
programming with Python?" Answering these questions is no simple task - especially when popular opinion is on the side of more masochistic
alternatives such as C++ and Java. However, I think the most direct answer
is that programming in Python is simply a lot of fun and more productive.
Guido van Rossum, the author of Python, began work on Python at the
National Research Institute for Mathematics and Computer Science in the
Netherlands (Centrum voor Wiskunde en Informatica, CWI).
When asked, what features of Python he is most pleased with, Guido van
Rossum said in an interview with Linux Journal: "The feel of the whole
system suits my style of programming well, for obvious reasons. The ability
to run the interpreter interactively and the ability to write code from the
bottom up and test it piecemeal combine to let me write code quickly. Other
people find that it makes them more productive, too." (LJ, no 55)
Python is reasonably fast: source code is compiled into bytecode, so executing the same script again is faster. The bytecode is an "intermediate language" which runs on a virtual machine that executes the machine operations corresponding to each bytecode instruction.
Comparing Python with Java, Perl and other Programming Languages
Prof. Lutz Prechelt from the University of Karlsruhe compared Python with
other programming languages. He summarises his results: "80
implementations of the same set of requirements are compared for several
properties, such as run time, memory consumption, source text length,
comment density, program structure, reliability, and the amount of effort
required for writing them. The results indicate that, for the given
programming problem, which regards string manipulation and search in a

dictionary, 'scripting languages' (Perl, Python, Rexx, Tcl) are more
productive than 'conventional languages' (C, C++, Java). In terms of run
time and memory consumption, they often turn out better than Java and not
much worse than C or C++. In general, the differences between languages
tend to be smaller than the typical differences due to different programmers
within the same language. (See Lutz Prechelt, An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl, IEEE Computer, Vol. 33, No. 10, pp. 23-29, Oct 2000.)
Other Advantages of Python:
It's surprisingly easy to embed Python, or better the Python interpreter into
C programs. By doing this you can add features from Python that could take
months to code in C. Vice versa, it's possible to extend the Python
interpreter by adding a module written in C. One reason to do this is if a C
library exists that does something which Python doesn't. Another good
reason is if you need something to run faster than you can manage in
Python.
The Python Standard Library contains an enormous number of useful
modules and is part of every standard Python installation. After having
learned the essentials of Python, it is necessary to become familiar with the
Python Standard Library because many problems can be solved quickly and
easily if you are acquainted with the possibilities that these libraries offer.

2.5 DATA TYPES AND VARIABLES

2.5.1 VARIABLES
As the name implies, a variable is something which can change. A variable
is a way of referring to a memory location used by a computer program. A
variable is a symbolic name for this physical location. This memory location
contains values, like numbers, text or more complicated types.
A variable can be seen as a container (or some say a pigeonhole) to store
certain values. While the program is running, variables are accessed and
sometimes changed, i.e. a new value will be assigned to the variable.
One of the main differences between Python and statically typed languages like C, C++ or Java is the way Python deals with types. In statically typed languages, every variable must have a fixed data type, e.g. if a variable is of type integer, only integers can be saved in it. In Java or C, every variable has to be declared before it can be used. Declaring a variable means binding it to a data type.
Declaration of variables is not required in Python. If you need a variable, you simply think of a name and start using it.
Another remarkable aspect of Python: not only may the value of a variable change during program execution, but its type may change as well. You can assign an integer value to a variable, use it as an integer for a while, and then assign a string to the same variable.
In the following line of code, we assign the value 42 to a variable:

i = 42
The equal "=" sign in the assignment shouldn't be seen as "is equal to". It
should be "read" or interpreted as "is set to", meaning in our example "the
variable i is set to 42". Now we will increase the value of this variable by 1:
>>> i = i + 1
>>> print i
43
>>>

2.5.2 Variables vs. Identifiers


Variables and identifiers are very often mistaken as synonyms. In simple
terms: The name of a variable is an identifier, but a variable is "more than a
name". A variable has a name, in most cases a type, a scope, and above all a
value. Besides this, an identifier is not only used for variables. An identifier
can denote various entities like variables, types, labels, subroutines or
functions, packages and so on.
Naming Identifiers of Variables
Every language has rules for naming identifiers. The rules in Python are the
following:
A valid identifier is a non-empty sequence of characters of any length with:
1. The start character can be the underscore "_" or a capital or lower case letter.
2. The letters following the start character can be anything which is permitted
as a start character plus the digits.
3. Just a warning for Windows-spoilt users: Identifiers are case-sensitive!
4. Python keywords are not allowed as identifier names!
Python Keywords
No identifier can have the same name as one of the Python keywords:
and, as, assert, break, class, continue, def, del, elif, else, except, exec,
finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise,
return, try, while, with, yield
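The current interpreter's keyword list can also be queried programmatically through the standard keyword module, which is handy because the exact list varies between Python versions:

```python
import keyword

# keyword.iskeyword() tells whether a name is reserved
print(keyword.iskeyword("lambda"))   # True: reserved, cannot be an identifier
print(keyword.iskeyword("spam"))     # False: free for use as a variable name

# keyword.kwlist is the full list of keywords of the running interpreter
print(len(keyword.kwlist))
```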

2.6 Changing Data Types and Storage Locations


As we have said above, the type of a variable can change during the
execution of the script. We illustrate this in our following example:
i = 42 # data type is implicitly set to integer
i = 42 + 0.11 # data type is changed to float
i = "forty" # and now it will be a string
Python automatically takes care of the physical representation of the different data types, i.e. an integer value will be stored in a different memory location than a float or a string.
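The type change can be observed directly with the built-in type() function. A short sketch, written in Python 3 (where print is a function, unlike the interactive sessions above):

```python
# The same name is rebound to objects of different types;
# type() reports the type of the object currently referenced.
i = 42
print(type(i).__name__)   # int
i = 42 + 0.11
print(type(i).__name__)   # float
i = "forty"
print(type(i).__name__)   # str
```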
Numbers: Python's built-in core data types are in some cases also called object types. There are four built-in data types for numbers:
2.6.1 Integer
 Normal integers
e.g. 4321
 Octal literals (base 8)
A number prefixed by a 0 (zero) will be interpreted as an octal number
example:
>>> a = 010
>>> print a
8
Alternatively, an octal number can be defined with "0o" as a prefix:
>>> a = 0o10
>>> print a
8
 Hexadecimal literals (base 16)
Hexadecimal literals have to be prefixed either by "0x" or "0X".
example:
>>> hex_number = 0xA0F
>>> print hex_number
2575
2.6.2 Long integers
These numbers are of unlimited size.
e.g. 42000000000000000000L
2.6.3 Floating-point numbers
for example: 42.11, 3.1415e-10
2.6.4 Complex numbers
Complex numbers are written as <real part> + <imaginary part>j
examples:
>>> x = 3 + 4j
>>> y = 2 - 3j
>>> z = x + y
>>> print z
(5+1j)
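Complex numbers also expose their real and imaginary parts as attributes, and abs() yields their magnitude. A small Python 3 sketch:

```python
z = 3 + 4j
print(z.real)   # 3.0 - the real part, stored as a float
print(z.imag)   # 4.0 - the imaginary part
print(abs(z))   # 5.0 - the magnitude, sqrt(3**2 + 4**2)
```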
Strings
Another important data type besides numbers is the string.
Strings are marked by quotes:
 Wrapped with the single-quote ( ' ) character:
'This is a string with single quotes'
 Wrapped with the double-quote ( " ) character:
"Obama's dog is called Bo"
 Wrapped with three characters, using either single-quote or double-quote:
'''A String in triple quotes can extend
over multiple lines like this one, and can contain
'single' and "double" quotes.'''
A string in Python consists of a sequence of characters - letters, numbers, and special characters. Strings can be indexed (often synonymously called subscripted).
Some operators and functions for strings:
Concatenation
Strings can be glued together (concatenated) with the + operator:
"Hello" + "World" will result in "HelloWorld"

Repetition
String can be repeated or repeatedly concatenated with the asterisk operator
"*":
"*-*" * 3 -> "*-**-**-*"
Indexing
"Python"[0] will result in "P"
Slicing
Substrings can be created with the slice or slicing notation, i.e. two indices
in square brackets separated by a colon:
"Python"[2:4] will result in "th"

String Slicing
Size
len("Python") will result in 6
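The string operations above can be verified in one self-contained Python 3 snippet:

```python
s = "Python"
assert "Hello" + "World" == "HelloWorld"   # concatenation
assert "*-*" * 3 == "*-**-**-*"            # repetition
assert s[0] == "P"                         # indexing
assert s[2:4] == "th"                      # slicing: characters at indices 2 and 3
assert len(s) == 6                         # length
print("all string operations verified")
```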

2.7 CONDITIONAL STATEMENTS

Under certain conditions, some decisions in normal life are inevitable. It's the same for every program which has to solve a useful problem: there is hardly a way to program without having branches in the flow of code.
In programming and scripting languages, conditional statements or
conditional constructs are used to perform different computations or actions
depending on whether a condition evaluates to true or false. (Please note that
true and false are always written as True and False in Python.)
The condition usually uses comparisons and arithmetic expressions with
variables. These expressions are evaluated to the Boolean values True or
False. The statements for the decision taking are called conditional
statements, alternatively they are also known as conditional expressions or
conditional constructs.
I. The if-then construct (sometimes called if-then-else) is common across
many programming languages, but the syntax varies from language to
language.
II. The if Statement
The general form of the if statement in Python looks like this:
if condition_1:
    statement_block_1
elif condition_2:
    statement_block_2
else:
    statement_block_3
If the condition "condition_1" is True, the statements in statement_block_1 will be executed. If not, condition_2 will be evaluated. If condition_2 evaluates to True, statement_block_2 will be executed; if condition_2 is False, the statements in statement_block_3 will be executed.
III. True or False
Unfortunately it is not as easy in real life as it is in Python to differentiate
between true and false:
The following objects are evaluated by Python as False:
numerical zero values (0, 0L, 0.0, 0.0+0.0j),
the Boolean value False,
empty strings,
empty lists and empty tuples,
empty dictionaries,
and the special value None.
All other values are considered to be True.
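These truthiness rules can be checked with bool(). A Python 3 sketch (the Python 2 long-zero literal 0L no longer exists there, so it is omitted):

```python
# Every object in this list evaluates to False in a boolean context
falsy = [0, 0.0, 0 + 0j, "", [], (), {}, None]
for obj in falsy:
    assert not bool(obj)

# Anything else counts as True - even a list containing only a zero
assert bool(42) and bool("text") and bool([0])
print("truthiness rules confirmed")
```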
IV Abbreviated IF statement
C programmers usually know the following abbreviated notation for the if construct:
max = (a > b) ? a : b;
This is an abbreviation for the following C code:
if (a > b)
    max = a;
else
    max = b;
C programmers have to get used to a different notation in Python, the conditional expression:
max = a if (a > b) else b
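Both forms can be tried out in one runnable Python 3 snippet (the names a, b and maximum are illustrative):

```python
a, b = 7, 3

# Classic if/else
if a > b:
    maximum = a
else:
    maximum = b
print(maximum)            # 7

# Python's conditional expression, equivalent to C's (a > b) ? a : b
print(a if a > b else b)  # 7
```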
V Print statement
There are hardly any computer programs and of course hardly any Python
programs, which don't communicate with the outside world. Above all a
program has to deliver its result in some way. One form of output goes to
the standard output by using the print statement in Python.
>>> print "Hello User"
Hello User
>>> answer = 42
>>> print "The answer is: " + str(answer)
The answer is: 42
>>>
It's possible to put the arguments inside of parentheses:
>>> print("Hallo")
Hallo
>>> print("Hallo","Python")
('Hallo', 'Python')
>>> print "Hallo","Python"
Hallo Python
>>>

2.8 SEQUENTIAL DATA TYPE

A String can be seen as a sequence of characters, which can be expressed in several ways:
1. single quotes (')
'This is a string in single quotes'
2. double quotes (")
"Miller's dog bites"
3. triple quotes (''') or (""")
'''She said: "I don't mind, if Miller's dog bites"'''
Indexing strings
Let's look at the string "Hello World". The characters of a string are enumerated from left to right, starting with 0; if you start from the right side, the enumeration starts with -1. Every character of a string can be accessed by putting the index after the string name in square brackets, as can be seen in the following example:

>>> txt = "Hello World"
>>> txt[0]
'H'
>>> txt[4]
'o'
Negative indices can be used as well. In this case we start counting from
right, starting with -1:
>>> txt[-1]
'd'
>>> txt[-5]
'W'

2.9 Python Lists

The list is the most versatile data type in Python. It can be written as a list of
comma-separated items (values) between square brackets. Lists are related
to arrays of programming languages like C, C++ or Java, but Python lists are
by far more flexible than "classical" arrays. For example, items in a list need
not all have the same type. Furthermore lists can grow in a program run,
while in C the size of an array has to be fixed at compile time.

An example of a list:
languages = ["Python", "C", "C++", "Java", "Perl"]
There are different ways of accessing the elements of a list. For C programmers, the easiest way is probably through indices, i.e. the elements of the list are enumerated starting with 0:
>>> languages = ["Python", "C", "C++", "Java", "Perl"]
>>> languages[0]
'Python'
>>> languages[1]
'C'
>>> languages[2]
'C++'
>>> languages[3]
'Java'
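Unlike a C array, a list can also grow and change at run time, as claimed above. A small Python 3 sketch:

```python
languages = ["Python", "C"]
languages.append("Java")    # lists grow at run time
languages[1] = "C++"        # elements can be replaced in place
languages += ["Perl"]       # concatenation extends the list further
print(languages)            # ['Python', 'C++', 'Java', 'Perl']
```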

2.10 Sublists

Lists can have sublists as elements. These sublists may contain sublists as
well, i.e. lists can be recursively constructed by sublist structures.
>>> person = [["Marc","Mayer"],["17, Oxford Str",
"12345","London"],"07876-7876"]
>>> name = person[0]
>>> print name
['Marc', 'Mayer']
>>> first_name = person[0][0]
>>> print first_name
Marc
>>> last_name = person[0][1]
>>> print last_name
Mayer
>>> address = person[1]
>>> street = person[1][0]
>>> print street
17, Oxford Str

2.11 Tuples

A tuple is an immutable list, i.e. a tuple cannot be changed in any way once
it has been created. A tuple is defined analogously to lists, except that the set
of elements is enclosed in parentheses instead of square brackets. The rules
for indices are the same as for lists. Once a tuple has been created, you can't
add elements to a tuple or remove elements from a tuple.

What is the benefit of tuples?

Tuples are faster than lists.
If you know that some data doesn't have to be changed, you should use
tuples instead of lists, because this protects your data against accidental
changes.
Tuples can be used as keys in dictionaries, while lists can't.
The following example shows how to define a tuple and how to access a
tuple. Furthermore we can see that we raise an error, if we try to assign a
new value to an element of a tuple:
>>> t = ("tuples", "are", "immutable")
>>> t[0]
'tuples'
>>> t[0]="assignments to elements are not possible"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
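The claim that tuples can serve as dictionary keys while lists cannot is easy to demonstrate (the coordinate data below is made up for illustration):

```python
# A tuple is hashable, so it can be used as a dictionary key ...
coordinates = {(48.78, 9.18): "Stuttgart", (52.52, 13.41): "Berlin"}
print(coordinates[(48.78, 9.18)])    # Stuttgart

# ... while a (mutable) list cannot
try:
    {[48.78, 9.18]: "Stuttgart"}
except TypeError:
    print("lists are unhashable and cannot be dictionary keys")
```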
Generalization
Lists and strings have many common properties, e.g. the elements of a list or the characters of a string appear in a defined order and can be accessed through indices. There are other data types with similar properties like tuple, buffer and xrange. In Python these data types are called "sequence data types" or "sequential data types".

Operators and methods are the same for "sequence data types", as we will
see in the following text.

2.12 SLICING

In many programming languages it can be quite tough to slice a part of a string, and even tougher if you want to address a "subarray". Python makes it very easy with its slice operator. Slicing is often better known as substring or substr.

When you want to extract part of a string, or some part of a list, you use Python's slice operator. The syntax is simple: it looks a little like accessing a single element with an index, but instead of just one number we have two, separated by a colon ":". We have a start and an end index; one or both of them may be missing. It's best to study the mode of operation of slicing by having a look at examples:
>>> str = "Python is great"
>>> first_six = str[0:6]
>>> first_six
'Python'
>>> starting_at_five = str[5:]
>>> starting_at_five
'n is great'
>>> a_copy = str[:]
>>> without_last_five = str[0:-5]
>>> without_last_five
'Python is '
>>>
Length:
The length of a sequence, i.e. a list, a string or a tuple, can be determined with the function len(). For strings it counts the number of characters, and for lists or tuples the number of elements, whereby a sublist counts as one element.
>>> txt = "Hello World"
>>> len(txt)
11
>>> a = ["Swen", 45, 3.54, "Basel"]
>>> len(a)
4

2.13 SET OPERATIONS:

1. add(element)
A method which adds an element, which has to be immutable, to a set.
>>> colours = {"red","green"}
>>> colours.add("yellow")
>>> colours
set(['green', 'yellow', 'red'])
>>> colours.add(["black","white"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>>
Of course, an element will only be added, if it is not already contained in the
set. If it is already contained, the method call has no effect.
2. clear()
All elements will be removed from the set.
>>> cities = {"Stuttgart", "Konstanz", "Freiburg"}
>>> cities.clear()
>>> cities
set([])
>>>
3. copy()
Creates a shallow copy of the set, which is returned.
>>> more_cities = {"Winterthur","Schaffhausen","St. Gallen"}
>>> cities_backup = more_cities.copy()
>>> more_cities.clear()
>>> cities_backup
set(['St. Gallen', 'Winterthur', 'Schaffhausen'])
>>>
4. difference()
This method returns the difference of two or more sets as a new set.
>>> x = {"a","b","c","d","e"}
>>> y = {"b","c"}
>>> z = {"c","d"}
>>> x.difference(y)
set(['a', 'e', 'd'])
>>> x.difference(y).difference(z)
set(['a', 'e'])
>>>
5. difference_update()
The method difference_update removes all elements of another set from this
set. x.difference_update(y) is the same as "x = x - y"
>>> x = {"a","b","c","d","e"}
>>> y = {"b","c"}
>>> x.difference_update(y)
>>>
>>> x = {"a","b","c","d","e"}
>>> y = {"b","c"}
>>> x = x - y
>>> x
set(['a', 'e', 'd'])
>>>
6. discard(el)
An element el will be removed from the set, if it is contained in the set. If el
is not a member of the set, nothing will be done.
>>> x = {"a","b","c","d","e"}
>>> x.discard("a")
>>> x
set(['c', 'b', 'e', 'd'])
>>> x.discard("z")
>>> x
set(['c', 'b', 'e', 'd'])
>>>
7. remove(el)
works like discard(), but if el is not a member of the set, a KeyError will be
raised.
>>> x = {"a","b","c","d","e"}
>>> x.remove("a")
>>> x
set(['c', 'b', 'e', 'd'])
8. intersection(s)
Returns the intersection of the instance set and the set s as a new set. In other
words: A set with all the elements which are contained in both sets is
returned.
>>> x = {"a","b","c","d","e"}
>>> y = {"c","d","e","f","g"}
>>> x.intersection(y)
set(['c', 'e', 'd'])
>>>
This can be abbreviated with the ampersand operator "&":
>>> x = {"a","b","c","d","e"}
>>> y = {"c","d","e","f","g"}
>>> x & y
set(['c', 'e', 'd'])
>>>
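Besides the ampersand, sets support further operators that mirror their methods. A small sketch (written for Python 3, where sets print as {...} rather than set([...])):

```python
x = {"a", "b", "c", "d", "e"}
y = {"c", "d", "e", "f", "g"}

union = x | y        # same as x.union(y)
difference = x - y   # same as x.difference(y)
symmetric = x ^ y    # elements contained in exactly one of the two sets

print(union == {"a", "b", "c", "d", "e", "f", "g"})  # True
```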

2.14 LAMBDA, MAP, REDUCE, FILTER:

1. Lambda Operator:
The lambda operator or lambda function is a way to create small anonymous
functions, i.e. functions without a name. These functions are throw-away
functions, i.e. they are just needed where they have been created. Lambda
functions are mainly used in combination with the functions filter(), map()
and reduce(). The lambda feature was added to Python due to the demand
from Lisp programmers.
The general syntax of a lambda function is quite simple:
lambda argument_list: expression
The argument list consists of a comma separated list of arguments and the
expression is an arithmetic expression using these arguments. You can
assign the function to a variable to give it a name.
The following example of a lambda function returns the sum of its two
arguments:
>>> f = lambda x, y : x + y
>>> f(1,1)
2

2. The map() Function:

The advantage of the lambda operator can be seen when it is used in
combination with the map() function.
map() can be applied to more than one list. The lists have to have the same
length. map() will apply its lambda function to the elements of the argument
lists, i.e. it first applies to the elements with the 0th index, then to the
elements with the 1st index until the n-th index is reached:
>>> a = [1,2,3,4]
>>> b = [17,12,11,10]
>>> c = [-1,-4,5,9]
>>> map(lambda x,y:x+y, a,b)
[18, 14, 14, 14]
>>> map(lambda x,y,z:x+y+z, a,b,c)
[17, 10, 19, 23]
>>> map(lambda x,y,z:x+y-z, a,b,c)
[19, 18, 9, 5]
We can see in the example above that the parameter x gets its values from
the list a, while y gets its values from b and z from list c.
3. Filtering:
The function filter(function, list) offers an elegant way to filter out all the
elements of a list, for which the function function returns True.
The function filter(f,l) needs a function f as its first argument. f returns a
Boolean value, i.e. either True or False. This function will be applied to
every element of the list l. Only if f returns True will the element of the list
be included in the result list.
>>> fib = [0,1,1,2,3,5,8,13,21,34,55]
>>> result = filter(lambda x: x % 2, fib)
>>> print result
[1, 1, 3, 5, 13, 21, 55]
>>> result = filter(lambda x: x % 2 == 0, fib)
>>> print result
[0, 2, 8, 34]
>>>
4. Reducing a List:
The function reduce(func, seq) continually applies the function func() to the
sequence seq. It returns a single value.
Examples of reduce()
Determining the maximum of a list of numerical values by using reduce:
>>> f = lambda a,b: a if (a > b) else b
>>> reduce(f, [47,11,42,102,13])
102
>>>
Calculating the sum of the numbers from 1 to 100:
>>> reduce(lambda x, y: x+y, range(1,101))
5050
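The interactive sessions above use Python 2, where map() and filter() return lists and reduce() is a built-in function. In Python 3, map() and filter() return iterators and reduce() has been moved to the functools module; a sketch of the Python 3 equivalents:

```python
from functools import reduce  # reduce() is no longer a built-in in Python 3

fib = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

# map() and filter() return iterators, so we wrap them in list() to see the values
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))    # [2, 4, 6]
odd_fib = list(filter(lambda x: x % 2, fib))       # [1, 1, 3, 5, 13, 21, 55]

total = reduce(lambda x, y: x + y, range(1, 101))  # 5050
```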

CHAPTER 3
A. ADVANCED TOPICS

3A.1 Information on the Python Interpreter

sys module and system programming
Like all the other modules, the sys module has to be imported with the import statement, i.e.
import sys
The sys module provides information about constants, functions and
methods of the Python interpreter. dir(sys) gives a summary of the
available constants, functions and methods. Another possibility is the help()
function. Using help(sys) provides valuable detailed information.

The module sys informs e.g. about the maximal recursion depth
(sys.getrecursionlimit()) and provides the possibility to change it
(sys.setrecursionlimit()).
The current version number of Python can be accessed as well:
>>> import sys
>>> sys.version
'2.6.5 (r265:79063, Apr 16 2010, 13:57:41) \n[GCC 4.4.3]'
>>> sys.version_info
(2, 6, 5, 'final', 0)
>>>
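The recursion limit functions mentioned above can be used as follows (the default limit depends on the Python build, so we only work relative to it):

```python
import sys

old_limit = sys.getrecursionlimit()      # query the current maximal recursion depth
sys.setrecursionlimit(old_limit + 500)   # raise it, e.g. for a deeply recursive function
print(sys.getrecursionlimit() == old_limit + 500)  # True
sys.setrecursionlimit(old_limit)         # restore the original value
```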

3A.2 Graphs in Python

Introduction into Graph Theory Using Python


Simple Graph with an isolated node

Before we start our treatise on possible Python representations of graphs, we
want to present some general definitions of graphs and their components.

A "graph" in mathematics and computer science consists of "nodes", also
known as "vertices". Nodes may or may not be connected with one another.
In our illustration - which is a pictorial representation of a graph - the node
"a" is connected with the node "c", but "a" is not connected with "b". The
connecting line between two nodes is called an edge. If the edges between
the nodes are undirected, the graph is called an undirected graph. If an edge
is directed from one vertex (node) to another, the graph is called a directed
graph. A directed edge is called an arc.
Though graphs may look very theoretical, many practical problems can be
represented by graphs. They are often used to model problems or situations
in physics, biology, psychology and above all in computer science. In
computer science, graphs are used to represent networks of communication,
data organization, computational devices, the flow of computation, and so
on. In the latter case, they are used to represent data organisation, like the
file system of an operating system, or communication networks. The link
structure of websites can be seen as a graph as well, i.e. a directed graph,
because a link is a directed edge (an arc).
Python has no built-in data type or class for graphs, but it is easy to
implement them in Python. One data type is ideal for representing graphs in
Python, i.e. dictionaries. The graph in our illustration can be implemented in
the following way:
graph = { "a" : ["c"],
"b" : ["c", "e"],
"c" : ["a", "b", "d", "e"],
"d" : ["c"],
"e" : ["c", "b"],
"f" : []
}
The keys of the dictionary above are the nodes of our graph. The
corresponding values are lists with the nodes, which are connected by an
edge. There is hardly a simpler or more elegant way to represent a graph.
An edge can be seen as a 2-tuple with nodes as elements, i.e. ("a","b")
Function to generate the list of all edges:
def generate_edges(graph):
    edges = []
    for node in graph:
        for neighbour in graph[node]:
            edges.append((node, neighbour))
    return edges

print(generate_edges(graph))
This code generates the following output, if combined with the previously
defined graph dictionary:
$ python3 graph_simple.py
[('a', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'd'), ('c', 'e'), ('b', 'c'), ('b', 'e'), ('e', 'c'), ('e', 'b'),
('d', 'c')]

Paths in Graphs
We want to find now the shortest path from one node to another node.
Before we come to the Python code for this problem, we will have to present
some formal definitions.
Adjacent vertices:
Two vertices are adjacent when they are both incident to a common edge.
Path in an undirected Graph:
A path in an undirected graph is a sequence of vertices P = (v1, v2, ..., vn)
∈ V x V x ... x V such that vi is adjacent to v(i+1) for 1 ≤ i < n. Such a path
P is called a path of length n from v1 to vn.
Simple Path:
A path with no repeated vertices is called a simple path.
Example:
(a, c, e) is a simple path in our graph, as well as (a,c,e,b). (a,c,e,b,c,d) is a
path but not a simple path, because the node c appears twice.
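As a preview of such a search, here is a minimal sketch of a shortest-path function over the dictionary representation from above; it uses a breadth-first traversal, and the name find_shortest_path is our own choice, not a library function:

```python
from collections import deque

graph = {"a": ["c"],
         "b": ["c", "e"],
         "c": ["a", "b", "d", "e"],
         "d": ["c"],
         "e": ["c", "b"],
         "f": []}

def find_shortest_path(graph, start, end):
    # breadth-first search: the first time we reach 'end', the path is shortest
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no path exists

print(find_shortest_path(graph, "a", "b"))  # ['a', 'c', 'b']
```

Node "f" is isolated, so a search for a path ending there returns None.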

3A.3 Tree / Forest

A tree is an undirected graph which contains no cycles. This means that any
two vertices of the graph are connected by exactly one simple path.
A forest is a disjoint union of trees. Contrary to forests in nature, a forest in
graph theory can consist of a single tree!
A graph with one vertex and no edge is a tree (and a forest).
An example of a tree:

Example of a Graph which is a tree


While the previous example depicts a graph which is a tree and forest, the
following picture shows a graph which consists of two trees, i.e. the graph is
a forest but not a tree:

Example of a Graph which is a forest but not a tree
Overview of forests:

Forests with one vertex

Forests with two vertices

Forests with three vertices

B. NUMERICAL PROGRAMMING WITH PYTHON

3B.1 Numerical Programming Definition

The term "Numerical Computing" - a.k.a. scientific computing - can be
misleading. One can think about it as "having to do with
numbers" as opposed to algorithms dealing with texts for example. If you
think of Google and the way it provides links to websites for your search
inquiries, you may think about the underlying algorithm as a text based one.
Yet, the core of the Google search engine is numerical. To perform the
PageRank algorithm Google executes the world's largest matrix
computation.

Numerical Computing defines an area of computer science and mathematics
dealing with algorithms for numerical approximations of problems from
mathematical or numerical analysis, in other words: Algorithms solving
problems involving continuous variables. Numerical analysis is used to
solve science and engineering problems.

3B.2 Data Science and Data Analysis

Data science is an interdisciplinary subject which includes for example
statistics and computer science, especially programming and problem
solving skills. Data Science includes everything which is necessary to create
and prepare data, to manipulate, filter and cleanse data and to analyse data.
Data can be both structured and unstructured. We could also say Data
Science includes all the techniques needed to extract and gain information
and insight from data.
Data Science is an umbrella term which incorporates data analysis,
statistics, machine learning and other related scientific fields in order to
understand and analyze data.
Another term occurring quite often in this context is "Big Data". Big Data is
for sure one of the most often used buzzwords in the software-related
marketing world. Marketing managers have found out that using this term
can boost the sales of their products, regardless of whether they are really
dealing with big data or not. The term is often used in fuzzy ways.
Big data is data which is too large and complex for data-processing
application software to deal with. The problems include capturing and
collecting data, data storage, searching the data, visualization of the data,
querying, and so on.
The following concepts are associated with big data:
1. volume:
the sheer amount of data, whether it will be giga-, tera-, peta- or exabytes
2. velocity:
the speed of arrival and processing of data

3. veracity:
uncertainty or imprecision of data
4. variety:
the many sources and types of data both structured and unstructured

The big question is how useful Python is for these purposes. If we used
plain Python without any special modules, it would perform poorly on the
previously mentioned tasks. We will describe the necessary tools in the
following chapter.

3B.3 NumPy & SciPy

3B.3.1 Introduction: NumPy is a module for Python. The name is an
acronym for "Numeric Python" or "Numerical Python". It is pronounced
/ˈnʌmpaɪ/ (NUM-py) or less often /ˈnʌmpi/ (NUM-pee). It is an extension
module for Python, mostly written in C. This makes sure that the
precompiled mathematical and numerical functions and functionalities of
Numpy guarantee great execution speed.
Furthermore, NumPy enriches the programming language Python with
powerful data structures, implementing multi-dimensional arrays and
matrices. These data structures guarantee efficient calculations with matrices
and arrays. The implementation is even aiming at huge matrices and arrays,
better known under the heading of "big data". Besides that, the module
supplies a large library of high-level mathematical functions to operate on
these matrices and arrays.
3B.3.2 SciPy (Scientific Python) is often mentioned in the same breath with
NumPy. SciPy needs Numpy, as it is based on the data structures of Numpy
and furthermore its basic creation and manipulation functions. It extends the
capabilities of NumPy with further useful functions for minimization,
regression, Fourier-transformation and many others.
Both NumPy and SciPy are not part of a basic Python installation. They
have to be installed after the Python installation. NumPy has to be installed
before installing SciPy.
NumPy is based on two earlier Python modules dealing with arrays. One of
these is Numeric. Numeric is like NumPy a Python module for high-
performance, numeric computing, but it is obsolete nowadays. Another
predecessor of NumPy is Numarray, which is a complete rewrite of Numeric
but is deprecated as well. NumPy is a merger of those two, i.e. it is built on
the code of Numeric and the features of Numarray.
3B.3.3 Comparison between Core Python and Numpy
When we say "Core Python", we mean Python without any special modules,
i.e. especially without NumPy.
The advantages of Core Python:
1. high-level number objects: integers, floating point
2. containers: lists with cheap insertion and append methods, dictionaries
with fast lookup

3B.3.4 Advantages of using Numpy with Python:
1. array oriented computing
2. efficiently implemented multi-dimensional arrays
3. designed for scientific computation
A Simple Numpy Example:
Before we can use NumPy we will have to import it. It has to be imported
like any other module:
import numpy
But you will hardly ever see this. Numpy is usually renamed to np:
import numpy as np
Our first simple Numpy example deals with temperatures. Given is a list
with values, e.g. temperatures in Celsius:
cvalues = [20.1, 20.8, 21.9, 22.5, 22.7, 22.3, 21.8, 21.2, 20.9, 20.1]
We will turn our list "cvalues" into a one-dimensional numpy array:
C = np.array(cvalues)
print(C)
[ 20.1 20.8 21.9 22.5 22.7 22.3 21.8 21.2 20.9 20.1]
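The payoff of having a NumPy array: arithmetic operators act element-wise, so all values can be converted to degrees Fahrenheit in a single expression, without any explicit loop. A sketch continuing the temperature example:

```python
import numpy as np

cvalues = [20.1, 20.8, 21.9, 22.5, 22.7, 22.3, 21.8, 21.2, 20.9, 20.1]
C = np.array(cvalues)

# one expression converts the whole array element-wise
F = C * 9 / 5 + 32
print(round(F[0], 2))  # 68.18
```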

3B.4 Matrix Arithmetics under NumPy and Python

Python with the module NumPy provides all the basic matrix arithmetic operations like
1. Matrix addition
2. Matrix subtraction
3. Matrix multiplication
4. Scalar product
5. Cross product
6. and lots of other operations on matrices
The standard arithmetic operators
 +
 -
 *
 /
 **
 %
are applied element-wise; this means that the arrays have to have the same
size.
>>> x = np.array([1,5,2])
>>> y = np.array([7,4,1])
>>> x + y
array([8, 9, 3])
>>> x * y
array([ 7, 20, 2])
>>> x - y
array([-6, 1, 1])
>>> x / y
array([0, 1, 2])

>>> x % y
array([1, 1, 0])

3B.5 Vector Addition and Subtraction

Graphical Example of Vector Addition

Many people know vector addition and subtraction from physics, to be
exact from the parallelogram of forces. It
is a method for solving (or visualizing) the results of applying two forces to
an object.
The addition of two vectors, in our example (see picture) x and y, may be
represented graphically by placing the start of the arrow y at the tip of the
arrow x, and then drawing an arrow from the start (tail) of x to the tip (head)
of y. The new arrow drawn represents the vector x + y
>>> x = np.array([3,2])
>>> y = np.array([5,1])
>>> z = x + y
>>> z
array([8, 3])
>>>

Graphical Example of Vector Subtraction

Subtracting a vector is the same as adding its negative. So, the difference of
the vectors x and y is equal to the sum of x and -y:
x - y = x + (-y)
Subtraction of two vectors can be geometrically defined as follows: to
subtract y from x, we place the end points of x and y at the same point, and
then draw an arrow from the tip of y to the tip of x. That arrow represents
the vector x - y, see picture on the right side.
Mathematically, we subtract the corresponding components of vector y from
the vector x.
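In NumPy this is again plain element-wise arithmetic; a short sketch with the vectors from the addition example:

```python
import numpy as np

x = np.array([3, 2])
y = np.array([5, 1])

# subtract the corresponding components
z = x - y
print(z)  # [-2  1]
```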

3B.6 Matrix Class

The matrix objects are a subclass of the numpy arrays (ndarray). The matrix
objects inherit all the attributes and methods of ndarray. A difference is
that numpy matrices are strictly 2-dimensional, while numpy arrays can be
of any dimension, i.e. they are n-dimensional.
The most important advantage of matrices is that they provide convenient
notations for matrix multiplication. If X and Y are two matrices, then X *
Y defines the matrix multiplication. While on the other hand, if X and Y are
ndarrays, X * Y defines an element-by-element multiplication.
>>> x = np.array( ((2,3), (3, 5)) )
>>> y = np.array( ((1,2), (5, -1)) )
>>> x * y
array([[ 2, 6],
[15, -5]])
>>> x = np.matrix( ((2,3), (3, 5)) )
>>> y = np.matrix( ((1,2), (5, -1)) )
>>> x * y
matrix([[17, 1],
[28, 1]])
Matrix Product
The matrix product of two matrices can be calculated if the number of
columns of the left matrix is equal to the number of rows of the second or
right matrix.
The product of an (l x m)-matrix A = (a_ij), i=1...l, j=1...m, and an (m x n)-
matrix B = (b_jk), j=1...m, k=1...n, is an (l x n)-matrix C = (c_ik), whose
entries are calculated like this:

c_ik = a_i1 * b_1k + a_i2 * b_2k + ... + a_im * b_mk

If we want to perform matrix multiplication with two numpy arrays
(ndarray), we have to use the dot product:
>>> x = np.array( ((2,3), (3, 5)) )
>>> y = np.matrix( ((1,2), (5, -1)) )
>>> np.dot(x,y)
matrix([[17, 1],
[28, 1]])
Alternatively, we can cast them into matrix objects and use the "*" operator:
>>> np.mat(x) * np.mat(y)
matrix([[17, 1],
[28, 1]])

3B.7 Matplotlib

Introduction: Matplotlib is a plotting library like GNUplot. The main
advantage over GNUplot is the fact that Matplotlib is a Python module.
Due to the growing interest in python the popularity of matplotlib is
continually rising as well.
Another reason for the attractiveness of Matplotlib lies in the fact that it is
widely considered to be a perfect alternative to MATLAB, if it is used in
combination with Numpy and Scipy. Whereas MATLAB is expensive and
closed source, Matplotlib is free and open source code. It is also object-
oriented and can be used in an object oriented way. Furthermore it can be
used with general-purpose GUI toolkits like wxPython, Qt, and GTK+.
There is also a procedural "pylab" interface, which is designed to closely
resemble that of MATLAB. This can make it extremely easy for MATLAB
users to migrate to matplotlib.
Matplotlib can be used to create publication quality figures in a variety of
hardcopy formats and interactive environments across platforms. Another
characteristic of matplotlib is its steep learning curve, which here means that
users usually make rapid progress after having started. The official website
has to say the following about this: "matplotlib tries to make easy things
easy and hard things possible. You can generate plots, histograms, power
spectra, bar charts, errorcharts, scatterplots, etc, with just a few lines of
code."
A First Example:
We will start with a simple graph, which is as simple as simple can be. A
graph in matplotlib is a two- or three-dimensional drawing showing a
relationship by means of points, a curve, or amongst others a series of bars.
We have two axes: the horizontal X-axis represents the independent
values and the vertical Y-axis corresponds to the dependent values.
We will use the pyplot submodule of matplotlib. pyplot provides a
procedural interface to the object-oriented plotting library of matplotlib.
Its plotting commands are chosen in a way that they are similar to Matlab
both in naming and with the arguments.

It is common practice to rename matplotlib.pyplot to plt. We will use the
plot function of pyplot in our first example. We will pass a list of values to
the plot function. Plot takes these as Y values. The indices of the list are
automatically taken as the X values. The command %matplotlib inline
only makes sense if you work with IPython Notebook. It makes sure that
the graphs will be depicted inside of the document and not as independent
windows:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([-1, -4.5, 16, 23])
plt.show()

What we see is a continuous graph, even though we provided discrete data
for the Y values. By adding a format string to the function call of plot, we
can create a graph with discrete values, in our case blue circle markers. The
format string defines the way how the discrete points have to be rendered.
import matplotlib.pyplot as plt
plt.plot([-1, -4.5, 16, 23], "ob")
plt.show()
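Real plots usually also need axis labels, a title and explicit X values. A minimal sketch (the Agg backend and the file name plot_demo.png are our own choices, so the example runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt

days = [1, 2, 3, 4]
temperatures = [-1, -4.5, 16, 23]

plt.plot(days, temperatures, "ob")  # blue circle markers, as in the example above
plt.xlabel("Day")
plt.ylabel("Temperature")
plt.title("A labelled plot")
plt.savefig("plot_demo.png")        # write the figure to a file instead of showing it
```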

3B.8 Introduction into Pandas

The pandas we are writing about in this chapter have nothing to do with the
cute panda bears, and they are not what our visitors are expecting in a
Python tutorial either. Pandas is a Python module, which rounds up the
capabilities of Numpy, Scipy and Matplotlib. The word pandas is an
acronym which is derived from "Python and data analysis" and "panel data".
There is often some confusion about whether Pandas is an alternative to
Numpy, SciPy and Matplotlib. The truth is that it is built on top of Numpy.
This means that Numpy is required by pandas. Scipy and Matplotlib on the
other hand are not required by pandas but they are extremely useful. That's
why the Pandas project lists them as "optional dependencies".
Pandas is a software library written for the Python programming language. It
is used for data manipulation and analysis. It provides special data structures
and operations for the manipulation of numerical tables and time series.
Pandas is free software released under the three-clause BSD license.
3B.8.1 Data Structures:
We will start with the following two important data structures of Pandas:
1. Series and
2. DataFrame
3B.8.1.1 Series
A Series is a one-dimensional labelled array-like object. It is capable of
holding any data type, e.g. integers, floats, strings, Python objects, and so
on. It can be seen as a data structure with two arrays: one functioning as the
index, i.e. the labels, and the other one containing the actual data.
We define a simple Series object in the following example by instantiating a
Pandas Series object with a list. We will later see that we can use other data
objects for example Numpy arrays and dictionaries as well to instantiate a
Series object.
import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
S
The above Python code returned the following result:
0 11
1 28
2 72
3 3
4 5
5 8
dtype: int64
We haven't defined an index in our example, but we see two columns in our
output: The right column contains our data, whereas the left column contains
the index. Pandas created a default index starting with 0 going to 5, which is
the length of the data minus 1.
We can directly access the index and the values of our Series S:
print(S.index)

print(S.values)
RangeIndex(start=0, stop=6, step=1)
[11 28 72 3 5 8]
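We can also supply our own index labels when creating a Series; access then works via these labels. A small sketch with made-up data:

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]

# the fruits list becomes the index, i.e. the labels
S = pd.Series(quantities, index=fruits)
print(S['cherries'])  # 52
```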
3B.8.1.2 DataFrame
The underlying idea of a DataFrame is based on spreadsheets. We can see
the data structure of a DataFrame as tabular and spreadsheet-like. A
DataFrame logically corresponds to a "sheet" of an Excel document. A
DataFrame has both a row and a column index.
Like a spreadsheet or Excel sheet, a DataFrame object contains an ordered
collection of columns. Each column consists of a unique data type, but
different columns can have different types, e.g. the first column may consist
of integers, while the second one consists of boolean values and so on.
There is a close connection between the DataFrames and the Series of
Pandas. A DataFrame can be seen as a concatenation of Series, each Series
having the same index, i.e. the index of the DataFrame.
A DataFrame has a row and column index; it's like a dict of Series with a
common index.
cities = {"name": ["London", "Berlin", "Madrid", "Rome",
"Paris", "Vienna", "Bucharest", "Hamburg",
"Budapest", "Warsaw", "Barcelona",
"Munich", "Milan"],
"population": [8615246, 3562166, 3165235, 2874038,
2273305, 1805681, 1803425, 1760433,
1754000, 1740119, 1602386, 1493900,
1350680],
"country": ["England", "Germany", "Spain", "Italy",
"France", "Austria", "Romania",
"Germany", "Hungary", "Poland", "Spain",
"Germany", "Italy"]}
city_frame = pd.DataFrame(cities)
city_frame
The above Python code returned the following:

       country       name  population
0      England     London     8615246
1      Germany     Berlin     3562166
2        Spain     Madrid     3165235
3        Italy       Rome     2874038
4       France      Paris     2273305
5      Austria     Vienna     1805681
6      Romania  Bucharest     1803425
7      Germany    Hamburg     1760433
8      Hungary   Budapest     1754000
9       Poland     Warsaw     1740119
10       Spain  Barcelona     1602386
11     Germany     Munich     1493900
12       Italy      Milan     1350680
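A single column of a DataFrame can be selected with the column name, and the result is a Series. A sketch using a shortened version of the city data from above:

```python
import pandas as pd

cities = {"name": ["London", "Berlin", "Madrid"],
          "population": [8615246, 3562166, 3165235],
          "country": ["England", "Germany", "Spain"]}
city_frame = pd.DataFrame(cities)

# selecting one column yields a Series with the DataFrame's row index
populations = city_frame["population"]
print(populations.max())  # 8615246
```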

3B.9 Connections between Python, Numpy, Matplotlib, Scipy and Pandas

Python is a general-purpose language and as such it can be and is widely
used by system administrators for operating system administration, by web
developers as a tool to create dynamic websites and by linguists for natural
language processing tasks. Being a truly general-purpose language, Python
can of course - without using any special numerical modules - be used to
solve numerical problems as well. So far so good, but the crux of the matter
is the execution speed. Pure Python without any numerical modules couldn't
be used for the numerical tasks Matlab, R and other languages are designed for.
If it comes to computational problem solving, it is of greatest importance to
consider the performance of algorithms, both concerning speed and data
usage.
If we use Python in combination with its modules NumPy, SciPy, Matplotlib
and Pandas, it belongs to the top numerical programming languages. It is as
efficient as - if not more efficient than - Matlab or R.

Numpy is a module which provides the basic data structures, implementing
multi-dimensional arrays and matrices. Besides that the module supplies the
necessary functionalities to create and manipulate these data structures.
SciPy is based on top of Numpy, i.e. it uses the data structures provided by
NumPy. It extends the capabilities of NumPy with further useful functions
for minimization, regression, Fourier-transformation and many others.

Matplotlib is a plotting library for the Python programming language and
the numerically oriented modules like NumPy and SciPy.

The youngest child in this family of modules is Pandas. Pandas is using all
of the previously mentioned modules. It's built on top of them to provide a
module for the Python language, which is also capable of data manipulation
and analysis. The special focus of Pandas consists in offering data structures
and operations for manipulating numerical tables and time series. The name
is derived from the term "panel data". Pandas is well suited for working with
tabular data as it is known from spreadsheet programs like Excel.

3B.10 Image Processing

Introduction: Charlie Chaplin, changed with Python, Numpy and Matplotlib

It has never been as easy as it is nowadays to take a picture. All it usually
needs is a mobile phone. These are the bare essentials to shoot and to view
an image.
Taking a photograph is free, if we don't take the costs for the mobile phone
into consideration. Just a generation ago, hobby artists and real artists
needed special and often expensive equipment, and the costs per picture
were far from being free.

We take pictures to preserve great moments in time. Pickled memories ready
to be "opened" in the future at will.
Similar to pickling things, we have to pay attention to the right
preservatives. Of course, mobile phones also provide us with a range of
image processing software, but as soon as we need to manipulate a huge
quantity of photographs we need other tools. This is when programming and
Python comes into play. Python and its modules like Numpy, Scipy,
Matplotlib and other special modules provide the optimal functionality to be
able to cope with the flood of pictures.
To provide you with the necessary knowledge this chapter of our Python
tutorial deals with basic image processing and manipulation. For this
purpose we use the modules NumPy, Matplotlib and SciPy.
We start with the scipy package misc. The helpfile says that scipy.misc
contains "various utilities that don't have another home".
# the following line is only necessary in Python notebook:
%matplotlib inline
from scipy import misc
ascent = misc.ascent()
import matplotlib.pyplot as plt
plt.gray()
plt.imshow(ascent)
plt.show()

Additionally to the image, we can see the axes with the ticks. This may be
very interesting, if you need some orientation about the size and the pixel
positions, but in most cases, you want to see the image without this
information. We can get rid of the ticks and the axes by adding the command
plt.axis("off"):
from scipy import misc
ascent = misc.ascent()
import matplotlib.pyplot as plt
plt.axis("off") # removes the axis and the ticks
plt.gray()
plt.imshow(ascent)
plt.show()

We can see that the type of this image is an integer array:
ascent.dtype
The previous code returned the following result:
dtype('int64')
We can also check the size of the image:
ascent.shape
The previous Python code returned the following:
(512, 512)
Image Processing Techniques

1. Tiling an Image
The function imag_tile, which we are going to design, can be best explained
with the following diagram:

The function imag_tile


imag_tile(img, n, m)
creates a tiled image by appending an image "img" n times in horizontal
direction. After this we append the strip image consisting of n img images m
times in vertical direction.
In the following code, we use a picture of painting decorators as the tile
image:

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
def imag_tile(img, n, m=1):
    """
    The image "img" will be repeated n times in
    horizontal and m times in vertical direction.
    """
    if n == 1:
        tiled_img = img
    else:
        lst_imgs = []
        for i in range(n):
            lst_imgs.append(img)
        tiled_img = np.concatenate(lst_imgs, axis=1)
    if m > 1:
        lst_imgs = []
        for i in range(m):
            lst_imgs.append(tiled_img)
        tiled_img = np.concatenate(lst_imgs, axis=0)

    return tiled_img
basic_pattern = mpimg.imread('decorators_b2.png')
decorators_img = imag_tile(basic_pattern, 3, 3)
plt.axis("off")
plt.imshow(decorators_img)
This gets us the following output:
<matplotlib.image.AxesImage at 0x7f29cf529a20>

An image is a 3-dimensional numpy ndarray.
type(basic_pattern)
The above code returned the following result:
numpy.ndarray
The first three rows of our image basic_pattern look like this:
basic_pattern[:3]
This gets us the following:
array([[[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
...,
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]],
[[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
...,
[ 1., 1., 1.],

[ 1., 1., 1.],
[ 1., 1., 1.]],
[[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
...,
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]]], dtype=float32)

CHAPTER 4

4.1 MACHINE LEARNING

Machine learning is the kind of programming which gives computers the
capability to automatically learn from data without being explicitly
programmed. This means in other words that these programs change their
behaviour by learning from data.

Machine learning can be roughly separated into three categories:


1. Supervised learning
The machine learning program is given both the input data and the
corresponding labelling. This means that the training data has to be labelled
by a human being beforehand.
2. Unsupervised learning
No labels are provided to the learning algorithm. The algorithm has to figure
out a clustering of the input data.
3. Reinforcement learning
A computer program dynamically interacts with its environment. This
means that the program receives positive and/or negative feedback to
improve its performance.

4.2 Confusion Matrix:

A confusion matrix, also called a contingency table or error matrix, is used
to visualize the performance of a classifier.
The columns of the matrix represent the instances of the predicted classes
and the rows represent the instances of the actual class. (Note: It can be the
other way around as well.)
In the case of binary classification the table has 2 rows and 2 columns.
Example:
Confusion Matrix           Predicted classes
                           male     female
Actual classes    male     42       8
                  female   18       32
This means that the classifier correctly predicted a male person in 42 cases
and it wrongly predicted 8 male instances as female. It correctly predicted
32 instances as female. 18 cases had been wrongly predicted as male instead
of female.
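From these four numbers the usual quality measures of a classifier can be derived. A short sketch, treating "male" as the positive class:

```python
# the four cells of the confusion matrix above ("male" taken as the positive class)
tp = 42  # males correctly predicted as male
fn = 8   # males wrongly predicted as female
fp = 18  # females wrongly predicted as male
tn = 32  # females correctly predicted as female

accuracy = (tp + tn) / (tp + tn + fp + fn)  # fraction of all correct predictions
precision = tp / (tp + fp)                  # how many predicted males really are male
recall = tp / (tp + fn)                     # how many actual males were found

print(accuracy)  # 0.74
```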

4.3 k-Nearest-Neighbor Classifier

The principle behind nearest neighbor classification consists in finding a
predefined number - the 'k' - of training samples closest in distance to a
new sample, which has to be classified. The label of the new sample will be
defined from these neighbors. k-nearest neighbor classifiers have a fixed
user defined constant for the number of neighbors which have to be
determined. There are also radius-based neighbor learning algorithms, which
have a varying number of neighbors based on the local density of points, all
the samples inside of a fixed radius. The distance can, in general, be any
metric measure: standard Euclidean distance is the most common choice.
Neighbors-based methods are known as non-generalizing machine learning
methods, since they simply "remember" all of its training data.
Classification can be computed by a majority vote of the nearest neighbors
of the unknown sample.
The k-NN algorithm is among the simplest of all machine learning
algorithms, but despite its simplicity, it has been quite successful in a large
number of classification and regression problems, for example character
recognition or image analysis.
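The majority-vote principle can be sketched in a few lines. The data here is made up for illustration; the iris set is introduced below:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, sample, k=3):
    # Euclidean distance from the unknown sample to every training point
    dists = np.linalg.norm(train_X - sample, axis=1)
    nearest = np.argsort(dists)[:k]           # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]         # majority vote of the neighbors

X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y = np.array([0, 0, 1, 1])
print(knn_classify(X, y, np.array([5, 6]), k=3))
```

The sample (5, 6) lies closest to the two class-1 points, so two of its three neighbors vote for class 1.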

Before we actually start with writing a nearest neighbor classifier, we need
to think about the data, i.e. the learnset. We will use the "iris" dataset
provided by the datasets of the sklearn module.
The data set consists of 50 samples from each of three species of Iris
1) Iris setosa,
2) Iris virginica and
3) Iris versicolor.
Four features were measured from each sample: the length and the width of
the sepals and petals, in centimetres.
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target
print(iris_data[0], iris_data[79], iris_data[100])
print(iris_labels[0], iris_labels[79], iris_labels[100])
[5.1 3.5 1.4 0.2] [5.7 2.6 3.5 1. ] [6.3 3.3 6. 2.5]
0 1 2
The following code is only necessary to visualize the data of our learnset.
Our data consists of four values per iris item, so we will reduce the data to
three values by summing up the third and fourth value. This way, we are
capable of depicting the data in 3-dimensional space:
# following line is only necessary, if you use ipython notebook!!!
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

colours = ("r", "g", "y")
X = []
for iclass in range(3):
    X.append([[], [], []])
    for i in range(len(iris_data)):
        if iris_labels[i] == iclass:
            X[iclass][0].append(iris_data[i][0])
            X[iclass][1].append(iris_data[i][1])
            X[iclass][2].append(sum(iris_data[i][2:]))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for iclass in range(3):
    ax.scatter(X[iclass][0], X[iclass][1], X[iclass][2], c=colours[iclass])
plt.show()

4.4 Neural Networks

Introduction: When we say "Neural Networks", we mean artificial Neural


Networks (ANN). The idea of ANN is based on biological neural networks
like the brain.
The basic structure of a neural network is the neuron. A neuron in biology
consists of three major parts: the soma (cell body), the dendrites, and the
axon.
The dendrites branch off from the soma in a tree-like way, getting thinner
with every branch. They receive signals (impulses) from other neurons at
synapses. The axon - there is always only one - also leaves the soma and
usually extends for longer distances than the dendrites. The axon is
used for sending the output of the neuron to other neurons, or more
precisely to the synapses of other neurons.
The following image by Quasar Jarosz, courtesy of Wikipedia, illustrates
this:

Image of a Neuron in biology

Even though the above image is already an abstraction for a biologist, we
can further abstract

it:

Abstract View of a Neuron


A perceptron of artificial neural networks is simulating a biological neuron.

Image of a Perceptron of a Neural Network


What goes on inside the body of a perceptron or neuron is amazingly
simple. The input signals get multiplied by weight values, i.e. each input has
its corresponding weight. This way the input can be adjusted individually for
every xi. We can see all the inputs as an input vector and the corresponding
weights as the weights vector.
When a signal comes in, it gets multiplied by the weight value that is assigned
to this particular input. That is, if a neuron has three inputs, then it has three
weights that can be adjusted individually. The weights usually get adjusted
during the learning phase.
After this the modified input signals are summed up. It is also possible to
add a so-called bias b to this sum. The bias is a value which can
also be adjusted during the learning phase.
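The weighted sum with bias amounts to a dot product. A minimal sketch, with made-up input and weight values:

```python
import numpy as np

# Sketch of what happens inside a single neuron: inputs are weighted,
# summed, and a bias is added before the activation step.
x = np.array([1.0, 0.5, 0.2])   # input vector (illustrative values)
w = np.array([0.4, 0.3, 0.1])   # one weight per input
b = 0.1                         # bias, also adjusted during learning

# 0.4*1.0 + 0.3*0.5 + 0.1*0.2 + 0.1 = 0.67
weighted_sum = np.dot(w, x) + b
print(weighted_sum)
```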
A Simple Neural Network
The following image shows the general building principle of a simple
artificial neural network:

Building Principle of a Simple Artificial Neural Network
We will write a very simple Neural Network implementing the logical
"And" and "Or" functions.
Let's start with the "And" function. It is defined for two inputs:
Input1 Input2 Output
0 0 0
0 1 0
1 0 0
1 1 1
import numpy as np

class Perceptron:
    def __init__(self, input_length, weights=None):
        if weights is None:
            self.weights = np.ones(input_length) * 0.5
        else:
            self.weights = weights

    @staticmethod
    def unit_step_function(x):
        if x > 0.5:
            return 1
        return 0

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)

p = Perceptron(2, np.array([0.5, 0.5]))
for x in [np.array([0, 0]), np.array([0, 1]),
          np.array([1, 0]), np.array([1, 1])]:
    y = p(np.array(x))
    print(x, y)

[0 0] 0
[0 1] 0
[1 0] 0
[1 1] 1
Line Separation:
In the following program, we train a neural network to classify two clusters
in a 2-dimensional space. We show this in the following diagram with the
two classes class1 and class2. We will create those points randomly with the
help of a line, the points of class2 will be above the line and the points of
class1 will be below the line.

Two clusters of 2-dimensional points


We will see that the neural network will find a line that separates the two
classes. This line should not be mistaken for the line which we used to
create the points.

This line is called a decision boundary.
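The experiment can be sketched as follows. The clusters, the margin around the line, the learning rate and the number of epochs are illustrative choices, not taken from the original program:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical clusters: class 1 lies above the line y = x, class 0
# below it; points too close to the line are discarded for a clear margin.
pts = rng.uniform(0, 10, size=(400, 2))
pts = pts[np.abs(pts[:, 1] - pts[:, 0]) > 2.0]
labels = (pts[:, 1] > pts[:, 0]).astype(int)

# Perceptron learning rule: nudge the weights whenever a point is
# misclassified. A constant 1 is appended to each point for the bias.
X = np.hstack([pts, np.ones((len(pts), 1))])
w = np.zeros(3)
for _ in range(100):                  # training epochs
    for x, t in zip(X, labels):
        y = int(np.dot(w, x) > 0)     # current prediction
        w += 0.1 * (t - y) * x        # update only on errors

pred = (X @ w > 0).astype(int)
print("decision boundary weights:", w)
print("training accuracy:", (pred == labels).mean())
```

Since the two classes are linearly separable, the perceptron convergence theorem guarantees that the learned line eventually classifies every training point correctly.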

CHAPTER 5

5.1 PROJECT WORK

FACE DETECTION USING HAAR CASCADES & OPEN CV IN


PYTHON:
GOAL:
In this session,
• We will see the basics of face detection using Haar Feature-based
Cascade Classifiers
• We will extend the same for eye detection etc.
5.1.1 INTRODUCTION:
Face detection is a computer technology used in a variety of
applications that identifies human faces in digital images. Face detection also
refers to the psychological process by which humans locate and attend to
faces in a visual scene.
Face detection can be regarded as a specific case of object-class detection. In
object-class detection, the task is to find the locations and sizes of all objects
in an image that belong to a given class. Examples include upper torsos,
pedestrians, and cars.
Face-detection algorithms focus on the detection of frontal human faces. This
is analogous to image detection, in which the image of a person is matched bit
by bit against images stored in a database. Any facial feature
changes in the database will invalidate the matching process. In this OpenCV
with Python session we're going to discuss object detection with Haar Cascades.
We'll do face and eye detection to start. In order to do object
recognition/detection with cascade files, you first need cascade files. For
extremely popular tasks these already exist; detecting things like faces,
cars, smiles, eyes, and license plates, for example, is pretty prevalent.
First, I will show you how to use these cascade files, then I will show you
how to embark on creating your very own cascades.
A reliable face-detection approach based on the genetic algorithm and the
eigenface technique works as follows:
First, the possible human eye regions are detected by testing all the valley
regions in the gray-level image. Then the genetic algorithm is used to
generate all the possible face regions, which include the eyebrows, the iris,
the nostrils and the mouth corners. Each possible face candidate is
normalized to reduce both the lighting effect, which is caused by uneven
illumination, and the shearing effect, which is due to head movement. The
fitness value of each candidate is measured based on its projection on the
eigenfaces. After a number of iterations, all the face candidates with a high
fitness value are selected for further verification. At this stage, the face
symmetry is measured and the existence of the different facial features is
verified for each face candidate.

5.2 HISTORY

Face Detection has been one of the hottest topics of computer vision for the
past few years.
Face detection is the process of finding and locating human faces in digital
visual data (images/videos). In the 1960s, government agencies in the U.S.A.
contracted Woodrow W. Bledsoe of Panoramic Research Inc. for the
development of the first semi-automatic face recognition system. The
detection of faces was still manual, as this system relied solely on the
administrator to locate features such as eyes, ears, nose and mouth on the
photographs. It calculated distances and ratios to a common reference point
that was compared to the reference data. For a large set of visual data this
process becomes humanly impossible, unreliable and extremely difficult,
which led to the need for a system that can detect human faces with more
accuracy and speed. Face detection is not a straightforward problem, as it
involves various challenges such as face definition, pose and scale variation,
image orientation, facial expressions, facial deformities, illumination
conditions, occlusions and background noise. Face detection techniques can
be classified into four main categories: knowledge-based methods,
feature-invariant approaches, appearance-based methods and template
matching methods. Knowledge-based methods use face knowledge to
encode rules based on face structure and the symmetrical positions of
different parts of the face, like eyes, nose and mouth. The challenge in this
approach is that it is difficult to translate human knowledge into a
well-defined rule set. Feature-invariant approaches use features such as
edges, geometric shapes and facial features such as eyes, nose, ears, mouth
and hairline to build a statistical model which describes their relationships.
The main challenges in this approach are face deformities, illumination
conditions, pose variations, facial expressions and occlusions.
Appearance-based methods use features based upon appearance, such as
eigenfaces (PCA), neural networks, SVMs and AdaBoost. Here the
challenges are illumination conditions, facial deformities, and the speed and
accuracy of operation. In template-matching methods pre-defined templates
have to be stored, and correlation values with the standard patterns are
computed, e.g. for the face contour, eyes, nose and mouth independently.
The limitation so far is that they cannot effectively deal with variations in
scale, pose and shape. The challenges are how to represent the template,
how to model deformations, and efficient matching algorithms.

5.3 FRONT END TECHNOLOGIES

Haar-cascade Detection in OpenCV.


OpenCV comes with a trainer as well as a detector. If you want to train your
own classifier for any object like cars, planes etc., you can use OpenCV to
create one. The full details are given here: Cascade Classifier Training.

Here we will deal with detection. OpenCV already contains many
pre-trained classifiers for faces, eyes, smiles etc. Those XML files are stored
in the opencv/data/haarcascades/ folder. Let's create a face and eye detector
with OpenCV.
Each file starts with the name of the classifier it belongs to. For example:
face_cascade=cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
Haar features:
OpenCV's algorithm is currently using the following Haar-like features
which are the input to the basic classifiers.

Picture source: How Face Detection Works


Cascade of Classifiers:
As explained here, a 3x3 kernel moves across the image and does
matrix multiplication with every 3x3 part of the image, emphasizing some
features and smoothing others.

Haar features are good at detecting edges and lines. This makes them
especially effective in face detection. For example, in a small image of
Beyonce, this Haar feature would be able to detect her eye (an area that is
dark on top and brighter underneath).

However, because Haar features have to be determined manually, there is a
certain limit to the types of things they can detect. If you give a classifier (a
network, or any algorithm that detects faces) only edge and line features, then it
will only be able to detect objects with clear edges and lines.

Picture source: How Face Detection Works


OpenCV's face detection
Let's load the required XML classifiers.
face_cascade =
cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
Then, we need to load input image in grayscale mode:
img = cv2.imread('xfiles4.jpg')

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
We use cv2.CascadeClassifier.detectMultiScale() to find faces or eyes, and it
is defined like this:
cv2.CascadeClassifier.detectMultiScale(image[, scaleFactor[,
minNeighbors[, flags[, minSize[, maxSize]]]]])
Where the parameters are:
1. image : Matrix of the type CV_8U containing an image where objects
are detected.
2. scaleFactor : Parameter specifying how much the image size is
reduced at each image scale.

Picture source: Viola-Jones Face Detection

3. This scale factor is used to create the scale pyramid shown in the
picture. Suppose the scale factor is 1.03; this means we're using a small step
for resizing, i.e. reducing the size by 3% each time. This increases the
chance that a size matching the model for detection is found, but it is
computationally expensive.
4. minNeighbors : Parameter specifying how many neighbors each
candidate rectangle should have to retain it. This parameter will affect the
quality of the detected faces: higher value results in less detections but with
higher quality. We're using 5 in the code.
5. flags : Parameter with the same meaning for an old cascade as in the
function cvHaarDetectObjects. It is not used for a new cascade.
6. minSize : Minimum possible object size. Objects smaller than that are
ignored.
7. maxSize : Maximum possible object size. Objects larger than that are
ignored.
If faces are found, it returns the positions of detected faces as Rect(x,y,w,h).
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
Once we get these locations, we can create a ROI for the face and apply eye
detection on this ROI.
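Putting the snippets of this section together, a complete detection script might look as follows. It assumes the two cascade XML files and the input image 'xfiles4.jpg' are present in the working directory:

```python
import cv2

# Load the pre-trained Haar cascade classifiers (assumed to be copied
# from opencv/data/haarcascades/ into the working directory).
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

img = cv2.imread('xfiles4.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces; each result is a rectangle (x, y, w, h).
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    # Region of interest: search for eyes only inside the detected face.
    roi_gray = gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]
    for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi_gray):
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

Restricting eye detection to the face ROI both speeds up the search and suppresses false eye detections elsewhere in the image.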

5.4 BACK END TECHNOLOGIES:

There are two technologies used in background processing:


1. OpenCV:OpenCV (Open Source Computer Vision Library) is an open
source computer vision and machine learning software library. OpenCV
was built to provide a common infrastructure for computer vision
applications and to accelerate the use of machine perception in the
commercial products. Being a BSD-licensed product, OpenCV makes it
easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which includes a
comprehensive set of both classic and state-of-the-art computer vision
and machine learning algorithms. These algorithms can be used to detect
and recognize faces, identify objects, classify human actions in videos,
track camera movements, track moving objects, extract 3D models of
objects, produce 3D point clouds from stereo cameras, stitch images
together to produce a high resolution image of an entire scene, find
similar images from an image database, remove red eyes from images
taken using flash, follow eye movements, recognize scenery and
establish markers to overlay it with augmented reality, etc. OpenCV has
a user community of more than 47 thousand people and an estimated number
of downloads exceeding 14 million. The library is used extensively in
companies, research groups and by governmental bodies.
Along with well-established companies like Google, Yahoo, Microsoft,
Intel, IBM, Sony, Honda, Toyota that employ the library, there are many
startups such as Applied Minds, VideoSurf, and Zeitera, that make
extensive use of OpenCV. OpenCV’s deployed uses span the range from
stitching streetview images together, detecting intrusions in surveillance
video in Israel, monitoring mine equipment in China, helping robots
navigate and pick up objects at Willow Garage, detection of swimming
pool drowning accidents in Europe, running interactive art in Spain and
New York, checking runways for debris in Turkey, inspecting labels on
products in factories around the world on to rapid face detection in
Japan.
It has C++, Python, Java and MATLAB interfaces and supports
Windows, Linux, Android and Mac OS. OpenCV leans mostly towards
real-time vision applications and takes advantage of MMX and SSE
instructions when available. Full-featured CUDA and OpenCL
interfaces are being actively developed right now. There are over 500
algorithms and about 10 times as many functions that compose or
support those algorithms. OpenCV is written natively in C++ and has a
templated interface that works seamlessly with STL containers.
2. OpenCV-Python: Python is a general purpose programming language
started by Guido van Rossum, which became very popular in short time
mainly because of its simplicity and code readability. It enables the
programmer to express ideas in fewer lines of code without any loss
of readability.

Compared to other languages like C/C++, Python is slower. But another
important feature of Python is that it can be easily extended with C/C++.
This feature helps us to write computationally intensive codes in C/C++
and create a Python wrapper for it so that we can use these wrappers as
Python modules. This gives us two advantages: first, our code is as fast
as original C/C++ code (since it is the actual C++ code working in
background) and second, it is very easy to code in Python. This is how
OpenCV-Python works, it is a Python wrapper around original C++
implementation.
The support of Numpy makes the task even easier. Numpy is a
highly optimized library for numerical operations. It gives a MATLAB-style
syntax. All the OpenCV array structures are converted to and from
Numpy arrays. So whatever operations you can do in Numpy, you can
combine with OpenCV, which increases the number of weapons in your
arsenal. Besides that, several other libraries like SciPy and Matplotlib, which
support Numpy, can be used with this.
So OpenCV-Python is an appropriate tool for fast prototyping of
computer vision problems.
5.4.1 Key Features
• Optimized for real time image processing & computer vision
applications
• Primary interface of OpenCV is in C++
• There are also C, Python and JAVA full interfaces
• OpenCV applications run on Windows, Android, Linux, Mac and iOS
• Optimized for Intel processors

5.4.2 USES
What it can do :
1. Read and write images.
2. Detection of faces and their features.
3. Detection of shapes like circles, rectangles etc. in an image, e.g.
detection of coins in images.
4. Text recognition in images, e.g. reading number plates.
5. Modifying image quality and colors, e.g. Instagram, CamScanner.
6. Developing augmented reality apps.
and many more...
5.4.3 Which Language it supports :
1. C++
2. Android SDK
3. Java
4. Python
5. C (Not recommended)
5.4.4 Some Advantages of using OpenCV :
1. Simple to learn, lots of tutorials available.
2. Works with almost all the famous languages.

3. Free to use.

5.5 NUMPY

NumPy (pronounced /ˈnʌmpaɪ/ (NUM-py) or sometimes /ˈnʌmpi/
(NUM-pee)) is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a large
collection of high-level mathematical functions to operate on these arrays.
The ancestor of NumPy, Numeric, was originally created by Jim Hugunin
with contributions from several other developers. In 2005, Travis Oliphant
created NumPy by incorporating features of the competing Numarray into
Numeric, with extensive modifications. NumPy is open-source software and
has many contributors.
HISTORY
The Python programming language was not initially designed for numerical
computing, but attracted the attention of the scientific and engineering
community early on, so that a special interest group called matrix-sig was
founded in 1995 with the aim of defining an array computing package.
Among its members was Python designer and maintainer Guido van
Rossum, who implemented extensions to Python's syntax (in particular the
indexing syntax) to make array computing easier.
An implementation of a matrix package was completed by Jim Fulton, then
generalized by Jim Hugunin to become Numeric, also variously called
Numerical Python extensions or NumPy. Hugunin, a graduate student at
Massachusetts Institute of Technology (MIT), joined the Corporation for
National Research Initiatives (CNRI) to work on JPython in 1997, leaving
Paul Dubois of Lawrence Livermore National Laboratory (LLNL) to take
over as maintainer. Other early contributors include David Ascher, Konrad
Hinsen and Travis Oliphant.
A new package called Numarray was written as a more flexible replacement
for Numeric. Like Numeric, it is now deprecated. Numarray had faster
operations for large arrays, but was slower than Numeric on small ones, so
for a time both packages were used for different use cases. The last version
of Numeric v24.2 was released on 11 November 2005 and numarray v1.5.2
was released on 24 August 2006.
There was a desire to get Numeric into the Python standard library, but
Guido van Rossum decided that the code was not maintainable in its state
then.
In early 2005, NumPy developer Travis Oliphant wanted to unify the
community around a single array package and ported Numarray's features to
Numeric, releasing the result as NumPy 1.0 in 2006. This new project was
part of SciPy. To avoid installing the large SciPy package just to get an
array object, this new package was separated and called NumPy. Support for
Python 3 was added in 2011 with NumPy version 1.5.0.

In 2011, PyPy started development on an implementation of the NumPy API
for PyPy. It is not yet fully compatible with NumPy.

5.5.1 TRAITS
NumPy targets the CPython reference implementation of Python, which is a
non-optimizing bytecode interpreter. Mathematical algorithms written for
this version of Python often run much slower than compiled equivalents.
NumPy addresses the slowness problem partly by providing
multidimensional arrays and functions and operators that operate efficiently
on arrays; this requires rewriting some code, mostly inner loops, using NumPy.
Using NumPy in Python gives functionality comparable to MATLAB since
they are both interpreted, and they both allow the user to write fast programs
as long as most operations work on arrays or matrices instead of scalars. In
comparison, MATLAB boasts a large number of additional toolboxes,
notably Simulink, whereas NumPy is intrinsically integrated with Python, a
more modern and complete programming language. Moreover,
complementary Python packages are available; SciPy is a library that adds
more MATLAB-like functionality and Matplotlib is a plotting package that
provides MATLAB-like plotting functionality. Internally, both MATLAB
and NumPy rely on BLAS and LAPACK for efficient linear algebra
computations.
Python bindings of the widely used computer vision library OpenCV utilize
NumPy arrays to store and operate on data. Since images with multiple
channels are simply represented as three-dimensional arrays, indexing,
slicing or masking with other arrays are very efficient ways to access
specific pixels of an image. The NumPy array as the universal data structure
in OpenCV for images, extracted feature points, filter kernels and many
more vastly simplifies the programming workflow and debugging.
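Since an OpenCV image is just a NumPy array, pixel access reduces to indexing and slicing. A small fabricated image illustrates this (no file on disk is needed):

```python
import numpy as np

# A colour image in OpenCV is a 3-D NumPy array:
# height x width x channels, with channels ordered B, G, R.
img = np.zeros((4, 4, 3), dtype=np.uint8)

img[0, 0] = [255, 0, 0]          # set one pixel to pure blue (BGR order)
img[:, :, 2] = 100               # set the red channel of every pixel
top_left = img[:2, :2]           # slicing yields a view, not a copy
print(img[0, 0])                 # -> [255   0 100]
```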

5.5.2 The ndarray data structure


The core functionality of NumPy is its "ndarray", for n-dimensional array,
data structure. These arrays are strided views on memory. In contrast to
Python's built-in list data structure (which, despite the name, is a dynamic
array), these arrays are homogeneously typed: all elements of a single array
must be of the same type.
Such arrays can also be views into memory buffers allocated by C/C++,
Cython, and Fortran extensions to the CPython interpreter without the need
to copy data around, giving a degree of compatibility with existing
numerical libraries. This functionality is exploited by the SciPy package,
which wraps a number of such libraries (notably BLAS and LAPACK).
NumPy has built-in support for memory-mapped ndarrays.
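A small sketch of the strided-view semantics described above:

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)
col = a[:, 1]          # a strided view on the same memory, not a copy
col[0] = 99            # writing through the view changes the array
print(a[0, 1])         # -> 99
print(a.strides)       # (16, 4): bytes to step per row / per element
```

Because `col` shares memory with `a`, no data is copied; the view simply reads every fourth int32 of the underlying buffer.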

5.5.3 Limitations
Inserting or appending entries to an array is not as trivially possible as it is
with Python's lists. The np.pad(...) routine to extend arrays actually creates a
new array of the desired shape and padding values, copies the given array
into the new one and returns it. NumPy's np.concatenate([a1,a2]) operation
does not actually link the two arrays but returns a new one, filled with the
entries from both given arrays in sequence. Reshaping the dimensionality of
an array with np.reshape(...) is only possible as long as the number of
elements in the array does not change. These circumstances originate from
the fact that NumPy's arrays must be views on contiguous memory buffers.
A replacement package called Blaze attempts to overcome this limitation.
Algorithms that are not expressible as a vectorized operation will typically
run slowly because they must be implemented in "pure Python", while
vectorization may increase memory complexity of some operations from
constant to linear, because temporary arrays must be created that are as large
as the inputs. Runtime compilation of numerical code has been implemented
by several groups to avoid these problems; open source solutions that
interoperate with NumPy include scipy.weave, numexpr and Numba.
Cython and Pythran are static-compiling alternatives to these.
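The copying behaviour described above can be observed directly:

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array([4, 5])
joined = np.concatenate([a1, a2])   # a brand-new array; a1 and a2 untouched
print(joined)                       # [1 2 3 4 5]

b = np.arange(6)
print(b.reshape(2, 3))              # fine: still 6 elements
# b.reshape(2, 4) would raise a ValueError, because reshape cannot
# change the total number of elements.
```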

5.6 FUTURE SCOPE:

The future is here, all we have to do is face it. At least that is what the latest
face recognition and detection developers think. That really comes as no
surprise, I mean, how many people do you think have used Snapchat to send
a selfie with a crazy filter today? And how many have browsed through
potential photos of themselves on Facebook to make sure they were tagged?
(Or weren’t, in cases of truly embarrassing photos like the ones a soon to be
ex-friend thought the world needed to see.)
The truth is that facial recognition and detection software is popping up
everywhere. And this can be a really great thing. Let’s say someone is trying
to steal your identity or use photos of you under a different name online.
With facial recognition technology, ideally, you could search the whole of
the internet to see where each and every photo of your face is posted. And
apart from our lives on social media, facial recognition software can also
offer protection from and prevention of other threats. From using facial
recognition in smart security cameras to its uses in digital medical
applications, facial recognition software might help us in creating a safer,
healthier future.
1. Identifiable online daters. An important part of online dating is, of
course, anonymity. You make up a screen name because you want an
element of surprise when you meet someone — and because you don’t want
creepers showing up at your office uninvited. In 2010, Acquisti published
the study, “Privacy in the Age of Augmented Reality.” He and his fellow
researchers analyzed 6,000 online profiles on a dating site in the same US
city. Using four cloud computing cores and the facial recognition software
PittPatt, they were able to identify 1 in 10 of these anonymous daters. And
remember, this technology has improved three-fold since then.
2. Better tools for law enforcement. After the Boston Marathon
bombing, the Boston police commissioner said that facial recognition

software had not helped them identify Dzhokhar and Tamerlan Tsarnaev,
despite the fact that the two were in public records databases—and
photographed at the scene. Only, those images were taken from far away,
the brothers were wearing sunglasses and caps, and many shots of them
were in profile — all things that make facial recognition difficult. Experts
say that technology can overcome these difficulties. In an interview with
Salon.com, Acquisti said that the increasing resolution of photos will help
(hello, gigapixel!), as will the improved computational capabilities of
computers and the ever-expanding mountain of data available from social
networks. In a fascinating article via Yahoo, Paul Schuepp of the company
Animetrics shares a more specific advance: software that turns 2D images
into a simulated 3D model of a person’s face. In a single second, it can turn
an unidentifiable partial snapshot into a very identifiable headshot. He
claims the software can boost identification rates from 35 percent to 85
percent.
3. Full body recognition? Allyson Rice of the University of Texas at
Dallas has an idea for how facial recognition software could become even
more accurate for law enforcement purposes — by becoming body
recognition software. In a study published this month in Psychological
Science, Rice and her fellow researchers asked college students to discern
whether two photos — which had stumped facial recognition software —
were indeed of the same person. They used eye-tracking equipment to
discern how the participants were making the call. In the end, they found
that students were far more accurate in their answers when the face and body
of the subject was shown. And while participants reported judging based on
facial features, their eyes were spending more time examining body build,
stance, and other body features. “Psychologists and computer scientists have
concentrated almost exclusively on the role of the face in person
recognition,” Rice tells The Telegraph. “But our results show that the body
can also provide important and useful identity information for person
recognition.”
4. A face scan for your phone. “Face Unlock” is a feature that allows you
to unlock Android smartphones using your “faceprint,” i.e. a map of the
unique structure of your face. This is just the beginning of face-as-security
measure. In June, according to eWeek.com, Google patented a technology
that would turn goofy facial expressions — a wink, a scrunched nose, a
smile, a stuck-out tongue — into a code to unlock devices. The hope: that
this would be harder to spoof than a faceprint. Turns out, apps such as
FastAccess Anywhere, which uses your face as a password, can reportedly
be fooled with a simple photo, says USA Today.
5. Facial recognition as advertising. Could facial recognition technology
be used to influence what we buy? Very likely. In 2012, an interactive ad for
Choice for Girls was launched at bus stops in London. These billboards
were able to scan passersby, judge their gender and show them appropriate
content. Girls and women got a video, while boys and men got statistics on a
subject. This ad was for a good cause, but this technology will no doubt

expand — and could allow corporations and organizations to tap into our
personal lives in unpredictable ways. Personalized ads as we walk down the
street, a la the classic scene in Minority Report, yes. But as Acquisti notes in
his talk, there’s a potentially more subtle application of this technology too:
ads that can identify us and our two favorite friends on Facebook. From
there, it’s a snap to create a composite image of a person who’ll star in an ad
targeted just to us. For more in what’s coming in the facial recognition
advertising realm, check out Leslie Stahl’s 60 Minutes segment “A Face in
the Crowd: Say goodbye to anonymity.” Among other fascinating tidbits, it
introduces us to FaceDeals, which notes when you’ve walked into an
establishment, mines your Facebook likes and texts you a deal created
just for you.
6. Shattered Glass. As Acquisti notes in his talk, the fact that someone’s
face can be used to find out private information is especially disconcerting
given Google Glass’ emergence on the scene. In June, US lawmakers
questioned Google about the privacy implications of the device and, in
response, Google stressed that they “won’t be approving any facial
recognition Glassware at this time.” But of course, it’s not completely up to
them. In July, Stephen Balaban announced to NPR and the world that he had
hacked Glass in order to give it facial recognition powers. “Essentially what
I am building is an alternative operating system that runs on Glass but is not
controlled by Google,” he said. On a similar note, one Michael DiGiovanni
created a program called Winky for Glass that lets the wearer take a photo
with a wink, rather than using the voice command.
7. Your face as currency. In July, a Finnish company called Uniqul
released a video of a project in the works, a pay-by-face authentication
system. The idea? At a store, rather than paying with cash or a credit card,
you give a “meaningful nod” to a scanner to make a purchase. A Huffington
Post article describes this new tech, and also gives a peek at the Millennial
ATM, which uses facial recognition as its primary security method.

Facial detection is evolving rapidly. What here sounds cool and useful to
you, and what sounds like a trip to Scarytown? For me, I may well be
investing in these custom t-shirts, which claim to trip up facial detection.

5.7 CONCLUSION:

There is no doubt that a lot of research has been done in the area of face
detection, but the goal is still far from achieved: to mimic the human visual
system’s ability to detect and identify human faces. A lot of work therefore
remains to be done in this area. Based on the literature survey, the following
directions for future work are proposed:

1. The training of Haar features in the seminal Viola-Jones face detector takes
a long time, which may be a couple of days with serial processing. There is
scope to apply parallel computing to speed up feature training; to date, little
work has addressed this performance issue.

2. Comparison of various software platforms, such as MATLAB, GPU computing in a
C/C++ environment, and GPU computing within MATLAB. There is scope for
optimization work to address the speed of feature training.

3. In the use of volumetric features, there are open research areas in: a)
integrating the descriptor with the scanning strategy; b) setting criteria for
selecting the optimal number of frames to encode in the descriptor; c)
investigating the use of the same feature space for both face detection and
recognition.

4. Use of holistic features to perform the various tasks in face extraction
from video (face detection, face quality estimation, face quality enhancement
and face recognition) instead of a separate feature for each task.

5. Using motion information when creating face logs from video.
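The first direction above concerns the cost of evaluating and training Haar features. As a rough illustration (not part of the original report, and with function names of our own choosing), the sketch below shows the core trick that makes Viola-Jones detection fast: any rectangular pixel sum, and hence any two-rectangle Haar feature, can be evaluated in constant time once an integral image (summed-area table) has been precomputed.

```python
# Illustrative sketch of Viola-Jones-style Haar feature evaluation
# using an integral image. Pure Python, for clarity rather than speed.

def integral_image(img):
    """Summed-area table of a 2-D list of pixel values.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    with an extra zero row/column so no boundary checks are needed.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y) and size w x h,
    computed from just four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_haar(ii, x, y, w, h):
    """Two-rectangle Haar feature: left-half sum minus right-half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Example: a 4x4 "image" whose left half is bright (9) and right half dark (1)
img = [[9, 9, 1, 1]] * 4
ii = integral_image(img)
print(two_rect_haar(ii, 0, 0, 4, 4))  # 72 - 8 -> 64, a strong edge response
```

Because each feature costs only a handful of lookups regardless of its size, the expensive part of Viola-Jones is not applying the detector but training it, i.e. evaluating hundreds of thousands of candidate features over many windows, which is exactly where the parallel computing proposed above would help.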

5.8 REFERENCES:

1. Python for Data Science and Machine Learning
2. Deep Learning with Python
3. Hands-On Machine Learning

In making this project we drew on, firstly, our teacher Mr. Sunil Kumar;
secondly, the internet and websites such as Google, along with the books listed
above and YouTube video tutorials; and lastly, lots of practice.
