2 - Python Strings

This document provides an overview of strings and regular expressions in Python. It discusses how to create, manipulate, and format strings using slicing, indexing, concatenation and other string methods. The document also introduces regular expressions, including how to match literal characters, character classes, quantifiers, grouping, and other regex patterns. Examples are provided throughout to demonstrate each concept.

Uploaded by

pavan Kumar
Copyright
© All Rights Reserved

2 – STRINGS AND REGULAR EXPRESSIONS

Parham Kazemi
University of Isfahan, Computer Eng. Faculty
ACM Student Chapter
pkazemi3@gmail.com
CREATING STRINGS

• Create strings by enclosing characters in quotes. Python treats single quotes the
same as double quotes.
• Python does not have a separate character type; a single character is simply a string of
length one.

>>> s1 = 'this is a string'
>>> s2 = "this is a string"

• Use triple quotes to create a string that spans several lines.

>>> s3 = '''this is
... a string'''
>>> s3
'this is\na string'
SLICING AND INDEXING

• ACCESSING STRINGS
• Use square brackets with a single index to get one character, or with start:stop indices
to get a substring (a slice). Indices start at 0, the stop index is excluded, and negative
indices count from the end of the string.

>>> s = 'hello world'
>>> s[3]
'l'
>>> s[3:6]
'lo '
>>> s[3:]
'lo world'
>>> s[:6]
'hello '
>>> s[3:-3]
'lo wo'
>>> s[-3]
'r'
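The rule behind these results can be sketched as follows. This is only an illustration; the variable names are made up for the example.

```python
s = 'hello world'

# s[start:stop] includes start and excludes stop,
# so the slice has stop - start characters.
middle = s[3:6]

# A negative index i refers to position len(s) + i.
third_from_end = s[-3]
same_char = s[len(s) - 3]

# Omitted bounds default to the start and the end of the string,
# so the two halves always add up to the original.
whole = s[:6] + s[6:]

print(repr(middle), third_from_end == same_char, whole == s)
```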

STRING FORMATTING

• FORMATTING STRINGS
• Use the .format(args) method of a string containing {arg} placeholders:
>>> s = 'Hello, my name is {0} and I live in {1}.'
>>> s = s.format('Parham', 'Isfahan')
>>> print(s)
Hello, my name is Parham and I live in Isfahan.

• Formatting using the % operator is also supported:

>>> s = "Hello, my name is %s and I'm %d years old"
>>> s = s % ('Parham', 20)
>>> print(s)
Hello, my name is Parham and I'm 20 years old
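As a small sketch, the two styles can be compared side by side. The named placeholders ({name}, {city}) are an addition for illustration; the slide itself only shows positional ones.

```python
name, city, age = 'Parham', 'Isfahan', 20

# Positional placeholders are filled in argument order.
positional = 'My name is {0} and I live in {1}.'.format(name, city)

# Named placeholders are filled from keyword arguments.
named = 'My name is {name} and I live in {city}.'.format(name=name, city=city)

# The % operator takes a tuple; %s formats any object, %d an integer.
percent = "My name is %s and I'm %d years old" % (name, age)

print(positional)
print(named)
print(percent)
```

Double quotes are used for the last string because it contains an apostrophe.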
USEFUL FUNCTIONS AND METHODS

• The + operator is used for string concatenation:

>>> 'Hello ' + 'World!'
'Hello World!'

• The * operator is used for string repetition:

>>> 'Hodor ' * 5
'Hodor Hodor Hodor Hodor Hodor '

• Use the len(string) function to get the length of a string.

>>> s = 'Python'
>>> len(s)
6

• Calling a string's .count(s) method returns the number of non-overlapping occurrences of s
in the original string.
>>> 'Hello World!'.count('l')
3

• The .replace(old, new) method returns a new string with every occurrence of old
replaced by new. The original string is not changed:
>>> s = 'Boy, that escalated quickly'
>>> print(s.replace('Boy', 'Well'))
Well, that escalated quickly
>>> print(s)
Boy, that escalated quickly
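Two details of these methods are easy to miss, so here is a short sketch of them. The sample strings are made up for the example.

```python
s = 'Boy, that escalated quickly'

# count() counts non-overlapping occurrences, scanning left to right,
# so 'aa' is found twice in 'aaaa', not three times.
overlapping = 'aaaa'.count('aa')

# replace() builds a new string; s itself is never modified.
t = s.replace('Boy', 'Well')

# An optional third argument limits the number of replacements.
first_only = 'a-b-c'.replace('-', '+', 1)

print(overlapping, t, s, first_only, sep='\n')
```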

• Use the .find(s) method to find the index where s first occurs. (Returns -1 if not found.)
>>> s = 'To be or not to be'
>>> s.find('and')
-1
>>> s.find('be')
3

• The .strip(chars) method returns a copy of the string with leading and trailing whitespace
removed. If chars is given, any of the characters in chars are removed instead.
>>> s = '\t This is a string. \t\t'
>>> print(s.strip())
This is a string.
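A minimal sketch of how these two methods are typically used. The sample strings are invented for the example.

```python
s = 'To be or not to be'

# find() returns -1 when the substring is absent, so test the result
# before using it as an index.
pos = s.find('not')
tail = s[pos:] if pos != -1 else ''

# The argument to strip() is a set of characters, not a prefix or suffix:
# every leading and trailing 'x' or 'y' is removed, in any order.
cleaned = 'xyxhelloyyx'.strip('xy')

# lstrip() and rstrip() trim only one side.
left_trimmed = '  padded  '.lstrip()

print(pos, repr(tail), cleaned, repr(left_trimmed))
```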

• The .split(sep) method splits the string at each occurrence of the delimiter sep (at runs of
whitespace if sep is not provided) and returns a list of substrings.
>>> s = 'Now this, is a string.'
>>> s.split()
['Now', 'this,', 'is', 'a', 'string.']
>>> s.split(',')
['Now this', ' is a string.']

• .isdecimal() returns True if the string is non-empty and contains only decimal digits.

>>> '9436'.isdecimal()
True
>>> 'abcd'.isdecimal()
False
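The difference between the default split and an explicit delimiter, and the limits of .isdecimal(), can be sketched like this. The inputs are illustrative.

```python
line = 'a  b\tc'

# With no argument, split() treats any run of whitespace as one delimiter.
words = line.split()

# With an explicit delimiter, every occurrence splits, so adjacent
# delimiters produce empty strings.
pieces = 'a  b'.split(' ')

# isdecimal() is False for signs, decimal points and the empty string,
# so it only accepts non-negative integers written in digits.
checks = [text.isdecimal() for text in ('9436', '-5', '3.14', '')]

print(words, pieces, checks)
```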

• The method str.join(seq) returns a single string: the concatenation of the strings in the
sequence seq, with str placed as a separator between consecutive elements.
• Every element of seq must itself be a string.

>>> s = '-'
>>> l = ['a', 'b', 'c']
>>> s.join(l)
'a-b-c'
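A short sketch of common join patterns. The sample data is made up for the example.

```python
numbers = [1, 2, 3]

# join() accepts only strings, so convert other values first.
csv_row = ','.join(str(n) for n in numbers)

# join() undoes split() when both use the same separator.
round_trip = '-'.join('a-b-c'.split('-'))

# An empty separator concatenates the pieces directly.
word = ''.join(['Py', 'th', 'on'])

print(csv_row, round_trip, word)
```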

CODEFORCES 746B - DECODING

Polycarp is mad about coding, that is why he writes Sveta encoded messages. He calls
the median letter in a word the letter which is in the middle of the word. If the word's length is
even, the median letter is the left of the two middle letters. For example, the median letter
of contest is t and the median letter of info is n. If the word consists of a single letter, then
according to the above definition this letter is the median letter.
Polycarp encodes each word in the following way: he writes down the median letter of the
word, then deletes it and repeats the process until there are no letters left. For example, he
encodes the word volga as logva.
You are given an encoding s of some word, your task is to decode it.
Input
The first line contains a positive integer n (1 ≤ n ≤ 2000) — the length of the encoded word.
The second line contains the string s of length n consisting of lowercase English letters — the
encoding.
Output
Print the word that Polycarp encoded.
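One possible approach, given here as a sketch and not as the official solution, is to undo the encoding starting from its last letter. The function names encode and decode are hypothetical helpers introduced for this example.

```python
def encode(word):
    """Encode a word by repeatedly removing its median letter."""
    letters = list(word)
    out = []
    while letters:
        # Median index: the middle letter, or the left of the two middle ones.
        out.append(letters.pop((len(letters) - 1) // 2))
    return ''.join(out)


def decode(s):
    """Rebuild the original word from its encoding."""
    letters = []
    # The last encoded letter was removed from a one-letter word, the one
    # before it from a two-letter word, and so on. Reinserting the letters
    # in reverse order at the median position of the grown word undoes
    # each removal.
    for ch in reversed(s):
        letters.insert(len(letters) // 2, ch)
    return ''.join(letters)


print(encode('volga'))   # logva
print(decode('logva'))   # volga
```

For the actual task, the first input line (n) can be read and ignored, and decode can be applied to the second line. List insertion makes this quadratic in the worst case, which should be acceptable for n up to 2000.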

REGULAR EXPRESSIONS

• Regular Expression (Regex): A sequence of symbols and characters expressing a string
or pattern to be searched for within a longer piece of text.

• REGEX USES:
• File Renaming
• Text Search
• Web directives
• Database queries

• Regex testing online app: regex101.com

LITERAL CHARACTERS

• The most basic regular expression consists of a single literal character, such as a.
• Twelve characters (metacharacters) have special meanings in regular expressions:

\ ^ $ . | ? * + ( ) { [

• If you want to use any of these characters as a literal in a regex, you need to escape
them with a backslash: 2\+2=4

• Anchors do not match any characters. They match a position.

• ^ matches at the start of the string (in Python, with the re.M flag, also after each line break)
• $ matches at the end of the string, or just before a line break at the very end of the
string (with the re.M flag, also before each line break)

• The dot matches any single character except a line break (unless the re.S flag is set).
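These rules can be tried out with Python's re module, which the later slides introduce. The sample strings are made up; this is a sketch of the default behaviour without any flags.

```python
import re

# An escaped metacharacter matches itself: \+ is a literal plus sign.
literal = re.search(r'2\+2=4', 'We know that 2+2=4.')

# Without flags, ^ matches only at the start of the whole string.
text = 'first line\nsecond line'
starts = re.findall(r'^\w+', text)

# $ matches at the very end, or just before a final newline.
ends = re.findall(r'\w+$', 'ends here\n')

# The dot does not match a newline by default, so .+ stops at each line end.
dots = re.findall(r'.+', text)

print(literal.group(), starts, ends, dots)
```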
CHARACTER CLASSES OR CHARACTER SETS

• A "character class" matches only one out of several characters.


• To match an a or an e, use [ae]
• You could use this in gr[ae]y to match either gray or grey.
• The order of the characters inside a character class does not matter.
• You can use a hyphen inside a character class to specify a range of characters:
• [0-9] matches a single digit, [A-Z] matches a single uppercase letter
• Typing a caret after the opening square bracket negates the character class. The result is
that the character class matches any character that is not in the character class:
• [^a-z] matches any single character that is not a lowercase letter
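A small sketch of these character classes using re.findall, which is covered on a later slide. The sample sentence is invented for the example.

```python
import re

text = 'The gray cat and the grey dog met in room B12.'

spellings = re.findall(r'gr[ae]y', text)    # either spelling
digits = re.findall(r'[0-9]', text)         # one digit per match
capitals = re.findall(r'[A-Z]', text)       # one uppercase letter per match
not_lower = re.findall(r'[^a-z ]', text)    # neither lowercase nor a space

print(spellings, digits, capitals, not_lower)
```

Note that a negated class still matches exactly one character; it does not mean "match nothing".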

SHORTHAND CHARACTER CLASSES

• \d matches a single character that is a digit
• Same as [0-9] for ASCII text
• \w matches a "word character" (alphanumeric characters plus underscore)
• Same as [A-Za-z0-9_] for ASCII text
• \s matches a whitespace character (includes tabs and line breaks).
• Same as [ \t\n\r\f\v] for ASCII text
• In Python 3, these shorthands also match non-ASCII digits, letters and whitespace in str
patterns unless the re.ASCII flag is used.

• The above three shorthands also have negated versions.

• \D is the same as [^\d]
• \W is short for [^\w]
• \S is the equivalent of [^\s]
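The following sketch shows the shorthands in Python 3, including the difference the re.ASCII flag makes. The sample strings are made up for the example.

```python
import re

text = 'room 42, floor \u0663'   # \u0663 is the Arabic-Indic digit three

# In a str pattern, \d matches any Unicode decimal digit by default.
unicode_digits = re.findall(r'\d', text)

# With re.ASCII, \d is restricted to 0-9.
ascii_digits = re.findall(r'\d', text, re.ASCII)

words = re.findall(r'\w+', 'snake_case and CamelCase2')
gaps = re.findall(r'\s+', 'a \t\n b')
non_digits = re.findall(r'\D+', 'abc123def')

print(len(unicode_digits), ascii_digits, words, gaps, non_digits)
```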

ALTERNATION

• Alternation is the regular expression equivalent of "or":


• python|java matches “python” or “java”
• Alternation has the lowest precedence of all regex operators.
• cat|dog food matches “cat” or “dog food”.
• To create a regex that matches “cat food” or “dog food”, you need to group the
alternatives: (cat|dog) food
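The precedence rule can be checked with a short sketch. It uses re.fullmatch, which succeeds only when the whole string matches the pattern.

```python
import re

ungrouped = re.compile(r'cat|dog food')
grouped = re.compile(r'(cat|dog) food')

# For each text: (matches ungrouped pattern, matches grouped pattern).
results = {
    text: (bool(ungrouped.fullmatch(text)), bool(grouped.fullmatch(text)))
    for text in ('cat', 'dog food', 'cat food')
}

for text, outcome in results.items():
    print(text, outcome)
```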

QUANTIFIERS

• The question mark makes the preceding token in the regular expression optional.
• colou?r matches “colour” or “color”.
• The asterisk or star (*) tells the engine to attempt to match the preceding token zero
or more times.
• The plus sign (+) tells the engine to attempt to match the preceding token once or
more.
• Use curly braces to specify a specific amount of repetition.
• 1{3} matches “111”
• 1{2,4} matches “11”, “111”, “1111”
• 1{5,} matches “11111”, “111111”, …
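As a sketch, each quantifier can be tried on a few sample strings (invented for this example):

```python
import re

# ? makes the u optional: zero or one occurrence.
spellings = re.findall(r'colou?r', 'color colour colouur')

# * allows zero or more b's, + requires at least one.
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')

# {2,4} accepts between two and four repetitions, inclusive.
# The list covers strings of one to five 1s.
exact = [bool(re.fullmatch(r'1{2,4}', '1' * n)) for n in range(1, 6)]

print(spellings, star, plus, exact)
```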


• The repetition operators or quantifiers are greedy. They expand the match as far as they
can, and only give back if they must to satisfy the remainder of the regex.
• The regex <.+> matches <EM>first</EM> in This is a <EM>first</EM> test.

• Place a question mark after the quantifier to make it lazy.


• <.+?> matches <EM> in the above string.
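The greedy and lazy results above can be reproduced with a few lines of Python:

```python
import re

text = 'This is a <EM>first</EM> test.'

# Greedy: .+ runs to the end of the line, then backs off to the last >.
greedy = re.search(r'<.+>', text).group()

# Lazy: .+? stops at the first > it can reach.
lazy = re.search(r'<.+?>', text).group()

# With the lazy form, findall returns each tag separately.
all_tags = re.findall(r'<.+?>', text)

print(greedy, lazy, all_tags)
```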

GROUPING

• Place parentheses around multiple tokens to group them together.


• You can then apply a quantifier to the group.
• Set(Value)? matches Set or SetValue.
• Within the regular expression, you can use the backreference \1 to match the same
text that was matched by the capturing group.
• ([abc])=\1 matches a=a, b=b, and c=c.
• If your regex has multiple capturing groups, they are numbered counting their opening
parentheses from left to right.
• Make your regexes easier to read by naming your groups.
• In Python: (?P<name>[abc])=(?P=name) is identical to ([abc])=\1
• Naming: (?P< + the name of the group + > + a regex + )
• Referring: (?P= + the name of the group + )
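A brief sketch of numbered and named groups in Python. The group name letter is an arbitrary choice for this example.

```python
import re

numbered = re.compile(r'([abc])=\1')
named = re.compile(r'(?P<letter>[abc])=(?P=letter)')

samples = ['a=a', 'b=b', 'a=b']
numbered_hits = [bool(numbered.fullmatch(s)) for s in samples]
named_hits = [bool(named.fullmatch(s)) for s in samples]

# A quantifier after a group applies to the whole group.
optional = [bool(re.fullmatch(r'Set(Value)?', s))
            for s in ('Set', 'SetValue', 'SetVal')]

# Named groups can be read back by name from the match object.
letter = named.fullmatch('c=c').group('letter')

print(numbered_hits, named_hits, optional, letter)
```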
EXAMPLES

• Write a regex that matches:


A. A correctly cased word (Python, Hello, Strings, …)
B. A valid Python identifier (no need to exclude keywords)
C. An email address (almost):
• local@domain
• Local and domain parts can include any uppercase or lowercase letters or
digits
• Local can contain: $ , * , + , - , _
• Local can have a dot in it, but the dot must not be at the beginning or the end of the
local part
• Domain can contain a hyphen
• Domain must have a dot
• The length of each label (the parts before and after the dot in the domain) must be at most 63
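One possible set of answers is sketched below. These patterns are only one reading of the requirements, not the author's reference solutions. The names WORD, IDENT and EMAIL are hypothetical, the identifier pattern is limited to ASCII, and the email pattern assumes that several non-adjacent dots are allowed in the local part.

```python
import re

# A. One uppercase letter followed by lowercase letters only.
WORD = re.compile(r'[A-Z][a-z]*')

# B. A letter or underscore, then letters, digits or underscores (ASCII only).
IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

# C. Adjacent string literals are joined into one pattern.
EMAIL = re.compile(
    r'[A-Za-z0-9$*+_-]+(\.[A-Za-z0-9$*+_-]+)*'    # local part, dots only inside
    r'@'
    r'[A-Za-z0-9-]{1,63}(\.[A-Za-z0-9-]{1,63})+'  # labels of 1 to 63 characters
)

for candidate in ('pkazemi3@gmail.com', '.a@b.com', 'a.@b.com', 'a@b'):
    print(candidate, bool(EMAIL.fullmatch(candidate)))
```

All three patterns are meant to be used with fullmatch, so that the whole input has to fit the pattern.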
REGULAR EXPRESSIONS IN PYTHON

• RAW STRINGS
• Regex patterns use many backslashes, which Python also treats as escape characters in
ordinary string literals. To avoid confusion, write patterns as raw strings: r'expression'.
>>> s1 = '\n'
>>> s2 = r'\n'
>>> s1, s2
('\n', '\\n')

• THE re MODULE
• >>> import re
• Python offers two different primitive operations based on regular expressions:
• re.match checks for a match only at the beginning of the string,
• re.search checks for a match anywhere in the string
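The difference between the two operations, and the reason for raw strings, can be sketched as follows. The sample text is made up for the example.

```python
import re

text = 'learn python'

# match() only looks at the start of the string, so this is None.
at_start = re.match(r'python', text)

# search() scans the whole string and returns a match object.
anywhere = re.search(r'python', text)

# A raw string keeps the backslash, so the regex engine receives \b.
# In an ordinary string literal '\b' is a single backspace character.
raw_len = len(r'\b')
plain_len = len('\b')

# Both functions return None on failure, so test before calling .group().
found = anywhere.group() if anywhere else None
span = anywhere.span() if anywhere else None

print(at_start, found, span, raw_len, plain_len)
```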

re.findall(pattern, string, flags=0)


• Returns a list containing all matched strings (flags are discussed on the next page)
• If the regex contains one group, the list holds the text matched by that group; if it
contains several groups, each item is a tuple with one entry per group
>>> re.findall('([A-Z]+)', 'THIS is a STRING')
['THIS', 'STRING']
>>> re.findall('([A-Z]+)([0-9]*)', 'THIS is a STRING123')
[('THIS', ''), ('STRING', '123')]

re.sub(pattern, repl, string)

• Replaces all occurrences of the RE pattern in string with repl, and returns the modified string.
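A few typical calls are sketched below. The input strings are invented for the example.

```python
import re

text = 'THIS is a STRING123'

# No groups in the pattern: the list holds the whole matches.
whole = re.findall(r'[A-Z]+[0-9]*', text)

# repl may refer to groups: \1 is the first group, \2 the second.
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')

# Collapse every run of whitespace into a single space.
tidy = re.sub(r'\s+', ' ', 'too   many\t\tspaces')

# The optional count argument limits how many replacements are made.
first = re.sub(r'[0-9]', '#', 'a1b2c3', count=1)

print(whole, swapped, tidy, first)
```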

OPTION FLAGS

Modifier  Description
re.I      Performs case-insensitive matching.
re.L      Interprets words according to the current locale.
re.M      Makes $ match the end of a line (not just the end of the string) and
          makes ^ match the start of any line (not just the start of the string).
re.S      Makes a period (dot) match any character, including a newline.
re.U      Interprets letters according to the Unicode character set. This flag
          affects the behavior of \w, \W, \b, \B.
re.X      Permits "cuter" regular expression syntax. It ignores whitespace
          (except inside a set [] or when escaped by a backslash) and treats
          unescaped # as a comment marker.

• USAGE
>>> re.findall(pattern, string, re.M | re.S)
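The effect of the most common flags can be seen in a short sketch. The sample text and the number pattern are made up for the example.

```python
import re

text = 'Python\njava\nPYTHON'

# re.M lets ^ and $ match at every line; re.I ignores case.
lines = re.findall(r'^python$', text, re.M | re.I)

# Without re.S the dot stops at a newline; with it the match continues.
default_dot = re.search(r'P.*', text).group()
dotall = re.search(r'P.*', text, re.S).group()

# re.X allows whitespace and comments inside the pattern.
number = re.compile(r'''
    \d+        # integer part
    (\.\d+)?   # optional fractional part
''', re.X)
price = number.search('costs 12.50 today').group()

print(lines, repr(default_dot), repr(dotall), price)
```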

PYTHON REGEX EXAMPLE

• Google uses the following pattern to show search results:

<h3 class="r"><a href="https://www.python.org/" ...
• Write a Python script that extracts all search result links from a Google search page
• Input: HTML source code
• Output: All links to search results (for example: https://www.python.org/)
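A minimal sketch of such a script is shown below. It assumes the markup looks exactly like the pattern on the slide, which may not reflect what a real search page contains, and the html string is a hypothetical sample standing in for the real input.

```python
import re

# Hypothetical sample input in the format shown on the slide.
html = (
    '<h3 class="r"><a href="https://www.python.org/" target="_blank">Python</a></h3>'
    '<div>ad</div>'
    '<h3 class="r"><a href="https://docs.python.org/3/">Docs</a></h3>'
)

# Capture everything between the quotes of href, right after the h3 tag.
# [^"]* stops at the closing quote, so no lazy quantifier is needed.
LINK = re.compile(r'<h3 class="r"><a href="([^"]*)"')

links = LINK.findall(html)
for link in links:
    print(link)
```

Because the pattern has exactly one group, findall returns the captured URLs directly. For real input, the HTML source could be read from a file or standard input in place of the sample string.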
