Module4 DataAnalyticsLanguages

Uploaded by

Bhumika Kukade
Module 4: Data Analytics Languages - Python

31/07/2024 Slide 1
History

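• Python was created by Guido van Rossum in the Netherlands; work began in late 1989 and the first release appeared in 1991
• Popular programming language
• Widely used in industry and academia
• Simple, intuitive syntax
• Rich set of libraries
• Two major versions exist, Python 2 and Python 3; Python 2 reached end of life in January 2020, so new code should use Python 3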
eLahe Technologies 2020
31/07/2024 2
www.elahetech.com
Interpreted Language
• Python is an interpreted language, as opposed to a compiled one
• An interpreter reads a high-level program and executes it directly
• A compiler first translates the program into executable object code, which is subsequently executed

NumPy

• NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities
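
A minimal sketch of three of the features listed above (the N-dimensional array, broadcasting, and the linear algebra and random number tools). The array values and variable names are made up for illustration and are not from the slides.

```python
import numpy as np

# A 2-D array: the N-dimensional array object.
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Broadcasting: the 1-D row of offsets is applied to every row of grid.
offsets = np.array([10.0, 20.0, 30.0])
shifted = grid + offsets

# Linear algebra: solve the 2x2 system A @ x = b.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random numbers from a seeded generator, so repeated runs give the same draws.
rng = np.random.default_rng(seed=0)
samples = rng.random(3)

print(shifted)
print(x)
print(samples.shape)
```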

Matplotlib

• Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

pandas

• pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python

Python Regex

Regular Expressions

In computing, a regular expression, also referred to as "regex" or "regexp", provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor.

http://en.wikipedia.org/wiki/Regular_expression

Python Regular Expressions
^          Matches the beginning of a line
$          Matches the end of the line
.          Matches any character (except a newline, by default)
\s         Matches whitespace
\S         Matches any non-whitespace character
*          Repeats the preceding character zero or more times
*?         Repeats the preceding character zero or more times (non-greedy)
+          Repeats the preceding character one or more times
+?         Repeats the preceding character one or more times (non-greedy)
[aeiou]    Matches a single character in the listed set
[^XYZ]     Matches a single character not in the listed set
[a-z0-9]   The set of characters can include a range
(          Indicates where string extraction is to start
)          Indicates where string extraction is to end
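
The table above can be tried out with a short sketch like the following. The sample line is the X-DSPAM-Confidence header used elsewhere in this deck; the patterns and variable names are illustrative choices, not taken from the slides.

```python
import re

line = "X-DSPAM-Confidence: 0.8475"

# ^ anchors at the start of the line; \S+ is one or more non-whitespace characters.
starts_with_header = re.search(r"^X-\S+:", line) is not None

# $ anchors at the end of the line.
ends_with_5 = re.search(r"5$", line) is not None

# [0-9.]+ is one or more characters drawn from the digits and the dot.
numbers = re.findall(r"[0-9.]+", line)

# Parentheses mark the part of the match that findall() should return.
captured = re.findall(r"^X-\S+: ([0-9.]+)", line)

# [aeiou] matches a single character from the listed set.
vowels = re.findall(r"[aeiou]", "regular expression")

print(starts_with_header, ends_with_5)  # True True
print(numbers)                          # ['0.8475']
print(captured)                         # ['0.8475']
print(vowels)                           # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
```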

The Regular Expression Module
• Before you can use regular expressions in your program, you must import the library using "import re"
• You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings
• You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
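
A minimal sketch of the two comparisons made above: re.search() next to find(), and re.findall() next to find() plus slicing. The sample mail line and the pattern \S+@\S+ are assumptions chosen for illustration.

```python
import re

# Hypothetical sample line, not from the slides.
text = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

# str.find() returns the index of a literal substring, or -1 if it is absent.
found_with_find = text.find("From") != -1

# re.search() returns a match object for a pattern, or None if nothing matches.
found_with_search = re.search(r"^From", text) is not None

# find() plus slicing: locate the two spaces around the address by position.
start = text.find(" ") + 1
end = text.find(" ", start)
address_by_slice = text[start:end]

# re.findall() locates and extracts in one step and returns a list of matches.
address_by_regex = re.findall(r"\S+@\S+", text)

print(found_with_find, found_with_search)  # True True
print(address_by_slice)                    # stephen.marquard@uct.ac.za
print(address_by_regex)                    # ['stephen.marquard@uct.ac.za']
```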

Wild-Card Characters

• The dot character matches any character
• If you add the asterisk character, the preceding character is matched "any number of times"

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain

Pattern: ^X.*:
^    Match the start of the line
.    Match any character
*    Many times

Wild-Card Characters

• Depending on how "clean" your data is and the purpose of your application, you may want to narrow your match down a bit
• The pattern ^X.*: accepts any line that starts with X and contains a colon somewhere after it, whether or not the line is really a header
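
One way to narrow the match, sketched under assumptions: the stricter pattern ^X-\S+: and the last, non-header sample line are illustrative additions, not part of the slides. The stricter pattern requires a dash after the X and allows no whitespace before the colon.

```python
import re

lines = [
    "X-Sieve: CMU Sieve 2.3",
    "X-DSPAM-Result: Innocent",
    "X-DSPAM-Confidence: 0.8475",
    "X-Content-Type-Message-Body: text/plain",
    "X-Plane is behind schedule: two weeks",  # hypothetical line that is not a header
]

# Broad: start of line, an X, any characters, then a colon.
broad = [ln for ln in lines if re.search(r"^X.*:", ln)]

# Narrow: start of line, "X-", one or more non-whitespace characters, then a colon.
narrow = [ln for ln in lines if re.search(r"^X-\S+:", ln)]

print(len(broad))   # 5
print(len(narrow))  # 4
```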

Greedy Matching

• The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']

Pattern: ^F.+:
^F   First character in the match is an F
.+   One or more characters
:    Last character in the match is a :

Why not 'From:'? Because .+ is greedy, it runs on to the last colon in the line.

Non-Greedy Matching

• Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit...

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']

Pattern: ^F.+?:
^F    First character in the match is an F
.+?   One or more characters, but not greedily
:     Last character in the match is a :
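
The two behaviours can be checked side by side with a short sketch that reuses the string from the slides; only the variable names greedy and non_greedy are new.

```python
import re

x = 'From: Using the : character'

# Greedy: .+ extends as far as it can, so the match ends at the last colon.
greedy = re.findall(r'^F.+:', x)

# Non-greedy: .+? stops as soon as the rest of the pattern can match.
non_greedy = re.findall(r'^F.+?:', x)

print(greedy)      # ['From: Using the :']
print(non_greedy)  # ['From:']
```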

Python Slicing

String Slices
>>> fruit = "apple"
>>> fruit[1:3]
'pp'
>>> fruit[1:]
'pple'
>>> fruit[:4]
'appl'
>>> fruit[:]
'apple'
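
The same slices as a small script, with two extra checks added for illustration (they are not on the slide): a stop index past the end of the string is clipped, and the two halves around any index rebuild the original string.

```python
fruit = "apple"

middle = fruit[1:3]   # characters at index 1 and 2
tail = fruit[1:]      # from index 1 to the end
head = fruit[:4]      # from the start up to, not including, index 4
whole = fruit[:]      # the full string

# A stop index past the end is clipped instead of raising an error.
clipped = fruit[2:100]

# Splitting at any index and joining the parts gives back the original.
rebuilt = fruit[:2] + fruit[2:]

print(middle, tail, head, whole)  # pp pple appl apple
print(clipped)                    # ple
print(rebuilt == fruit)           # True
```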

List Slices
>>> b = [3, 4, 5, 6]
>>> b[0:3]
[3, 4, 5]
>>> b[:2]
[3, 4]

• For this four-element list, b[0:j] with j > 3 and b[0:] give the same result

List Slices
>>> b[2:2]
[]

• b[i:j:k] is a subset of b[i:j] with elements picked in steps of k

>>> b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b[0:10:3]
[1, 4, 7, 10]
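
A short sketch of the step form, using the list from the slide. The last part, showing that a list slice is an independent copy, is an added illustration and the variable names are arbitrary.

```python
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Indices 0, 3, 6 and 9 are picked, so the last element is included.
every_third = b[0:10:3]

# The step form selects from the plain slice b[0:10].
same_result = b[0:10][::3]

# Equal start and stop give an empty list.
empty = b[2:2]

# A list slice is a new list: changing it leaves b untouched.
copied = b[:]
copied[0] = 99

print(every_third)  # [1, 4, 7, 10]
print(empty)        # []
print(b[0])         # 1
```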

NumPy array slicing
• 1-D array slicing and indexing is similar to Python lists
• Assigning a single value to a slice sets every element in that slice

>>> import numpy as np
>>> arr1 = np.array([1, 2, 5, 6, 4, 3])
>>> arr1[2:4] = 99
>>> arr1
array([ 1,  2, 99, 99,  4,  3])
NumPy array slicing

• Slicing in ndarrays is different from Python lists in that data is not copied
• Slices are views on the original array!

>>> arr2 = arr1[2:4]
>>> arr2[0] = 88
>>> arr1
array([ 1,  2, 88, 99,  4,  3])
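
A minimal sketch contrasting a view with an explicit copy. It starts from a fresh array rather than the modified arr1 above; the names view and independent are illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 5, 6, 4, 3])

# A slice is a view: it shares memory with arr1.
view = arr1[2:4]
view[0] = 88  # this write shows up in arr1

# .copy() gives an independent array.
independent = arr1[2:4].copy()
independent[1] = -1  # arr1 is not affected

print(arr1.tolist())                          # [1, 2, 88, 6, 4, 3]
print(np.shares_memory(arr1, view))           # True
print(np.shares_memory(arr1, independent))    # False
```

When the original data must stay unchanged, taking a copy before modifying the slice is the usual choice.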

Sets

in and not in
>>> setA = {1, 3, 5, 7}
>>> 3 in setA
True
>>> 3 not in setA
False
>>> 4 not in setA
True

Subset
>>> setA = {1, 3, 5, 7}
>>> setB = {1, 3, 5, 7, 9}
>>> setC = {1, 3, 5, 9, 10}
>>> setA.issubset(setB)
True
>>> setA.issubset(setC)
False

Superset
>>> setA = {1, 3, 5, 7}
>>> setB = {1, 3, 5, 7, 9}
>>> setC = {1, 3, 5, 9, 10}
>>> setA.issuperset(setB)
False
>>> setB.issuperset(setA)
True
>>> setC.issuperset(setA)
False
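
The membership, subset and superset checks can be collected in one short sketch. The sets are the ones from the slides; the operator forms <= and >= are standard equivalents of issubset() and issuperset(), shown here as an addition.

```python
setA = {1, 3, 5, 7}
setB = {1, 3, 5, 7, 9}
setC = {1, 3, 5, 9, 10}

# Membership tests.
has_three = 3 in setA
lacks_four = 4 not in setA

# Subset and superset as methods.
a_sub_b = setA.issubset(setB)
a_sub_c = setA.issubset(setC)    # False: 7 is missing from setC
b_super_a = setB.issuperset(setA)

# The same checks written with comparison operators.
a_le_b = setA <= setB
b_ge_a = setB >= setA

print(has_three, lacks_four)        # True True
print(a_sub_b, a_sub_c, b_super_a)  # True False True
print(a_le_b, b_ge_a)               # True True
```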

Set Union

>>> setA = {1, 3, 5, 7}
>>> setB = {7, 5, 9}
>>> setA.union(setB)
{1, 3, 5, 7, 9}
>>> setA | setB
{1, 3, 5, 7, 9}

Set Intersection

>>> setA = {1, 3, 5, 7}
>>> setB = {7, 5, 9}
>>> setA.intersection(setB)
{5, 7}
>>> setA & setB
{5, 7}
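
A short sketch confirming that the method and operator forms agree and that neither changes the original sets. The sets are the ones from the slides; sorted() is used only to print the elements in a fixed order.

```python
setA = {1, 3, 5, 7}
setB = {7, 5, 9}

# Union: elements that are in either set.
union_method = setA.union(setB)
union_operator = setA | setB

# Intersection: elements that are in both sets.
common_method = setA.intersection(setB)
common_operator = setA & setB

print(sorted(union_operator))   # [1, 3, 5, 7, 9]
print(sorted(common_operator))  # [5, 7]
print(sorted(setA))             # [1, 3, 5, 7]
```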

Dictionaries

Dictionaries

• Lists index their entries based on the position in the list
• Dictionaries are like bags - entries are not looked up by position (since Python 3.7 a dictionary does remember insertion order)
• So we index the things we put in the dictionary with a "lookup tag"

>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print(purse)
{'money': 12, 'candy': 3, 'tissues': 75}
>>> print(purse['candy'])
3
>>> purse['candy'] = purse['candy'] + 2
>>> print(purse)
{'money': 12, 'candy': 5, 'tissues': 75}

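
A minimal sketch of looking things up by their tag, built on the purse example. The key 'coins' and the use of get() with a default are illustrative additions, not from the slides.

```python
purse = dict()
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75

# Update a value through its key.
purse['candy'] = purse['candy'] + 2

# "in" tests the keys, not the values.
has_candy = 'candy' in purse
has_75 = 75 in purse

# get() returns a default instead of raising KeyError for a missing key.
coins = purse.get('coins', 0)

print(purse['candy'])     # 5
print(has_candy, has_75)  # True False
print(coins)              # 0
```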
Comparing Lists and Dictionaries

Dictionaries are like lists except that they use keys instead of numbers to look up values

>>> lst = list()            >>> ddd = dict()
>>> lst.append(21)          >>> ddd['age'] = 21
>>> lst.append(183)         >>> ddd['course'] = 182
>>> print(lst)              >>> print(ddd)
[21, 183]                   {'age': 21, 'course': 182}
>>> lst[0] = 23             >>> ddd['age'] = 23
>>> print(lst)              >>> print(ddd)
[23, 183]                   {'age': 23, 'course': 182}
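
The side-by-side comparison as a runnable sketch. The values are the ones from the slide; the final check on key order assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
# List: values are stored and updated by position.
lst = list()
lst.append(21)
lst.append(183)
lst[0] = 23

# Dictionary: values are stored and updated by key.
ddd = dict()
ddd['age'] = 21
ddd['course'] = 182
ddd['age'] = 23

print(lst)        # [23, 183]
print(ddd)        # {'age': 23, 'course': 182}

# Updating an existing key does not move it (Python 3.7+).
print(list(ddd))  # ['age', 'course']
```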
