This article is part of a series.
Published: Tuesday 5th June 2012
Last Updated: Wednesday 29th December 2021

In the first part of this series, we looked at the basic syntax of regular expressions and some simple examples. In this part, we'll take a look at some more advanced syntax and a few of the other features Python has to offer.

Regular Expression Captured Groups

So far, we've searched within a string using a regular expression and used the returned MatchObject to extract the entire sub-string that was matched. Now we'll look at how we can extract parts within the sub-string that was matched.

This regular expression:

[python]
\d{2}-\d{2}-\d{4}
[/python]

Will match a date with the following format:

  • A 2-digit day.
  • A hyphen.
  • A 2-digit month.
  • A hyphen.
  • A 4-digit year.

For example:

[python]
>>> import re
>>> s = 'Today is 31-05-2012'
>>> mo = re.search(r'\d{2}-\d{2}-\d{4}', s)
>>> print(mo.group())
31-05-2012
[/python]

We can capture various parts of this regular expression by putting them in parentheses:

[python]
(\d{2})-(\d{2})-(\d{4})
[/python]

If Python matches this regular expression, we can then retrieve each captured group separately.

[python]
>>> mo = re.search(r'(\d{2})-(\d{2})-(\d{4})', s)
>>> # Note: The entire matched string is still available
>>> print(mo.group())
31-05-2012
>>> # The first captured group is the day
>>> print(mo.group(1))
31
>>> # And this is its start/end position in the string
>>> print('%s %s' % (mo.start(1), mo.end(1)))
9 11
>>> # The second captured group is the month
>>> print(mo.group(2))
05
>>> # The third captured group is the year
>>> print(mo.group(3))
2012
[/python]
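
As a small sketch building on the date example above, the match object's groups() method returns every captured group at once as a tuple, which is convenient for unpacking. The names DATE_RE and parse_date are hypothetical, chosen only for this illustration.

```python
import re
from datetime import date

# Same pattern as above: day, month and year each in their own group
DATE_RE = re.compile(r'(\d{2})-(\d{2})-(\d{4})')

def parse_date(text):
    """Return the first DD-MM-YYYY date found in text, or None."""
    mo = DATE_RE.search(text)
    if mo is None:
        return None
    # groups() returns all captured groups as a tuple of strings
    day, month, year = mo.groups()
    return date(int(year), int(month), int(day))

print(parse_date('Today is 31-05-2012'))
```

The pattern only checks the shape of the text, so a string such as 99-99-9999 would still match and date() would then raise a ValueError.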

When you start writing more complex regular expressions, with lots of captured groups, it can be useful to refer to them by a meaningful name rather than a number. The syntax is (?P<name>...), where ... is the regular expression to be captured, and name is the name you want to give to the group.

[python]
>>> s = "Joe's ID: abc123"
>>> # A normal captured group
>>> mo = re.search(r'ID: (.+)', s)
>>> print(mo.group(1))
abc123
>>> # A named captured group
>>> mo = re.search(r'ID: (?P<id>.+)', s)
>>> print(mo.group('id'))
abc123
[/python]
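
To illustrate how named groups might be used with the earlier date pattern, here is a minimal sketch. The group names day, month and year are our own choice; groupdict() is the match object method that returns all named groups as a dictionary.

```python
import re

s = 'Today is 31-05-2012'
mo = re.search(r'(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})', s)

# A named group can still be retrieved by its number
print(mo.group('year'), mo.group(3))

# groupdict() maps each group name to the text it captured
parts = mo.groupdict()
print(parts['day'], parts['month'], parts['year'])
```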

Re-using Captured Groups with Regular Expressions

We can also take captured groups and re-use them later in the regular expression! (?P=name) means match whatever was previously matched in the named group. For example:

[python]
>>> s = 'abc 123 def 456 def 789'
>>> # Capture 'def' in a group
>>> mo = re.search(r'(?P<foo>def) \d+', s)
>>> print(mo.group())
def 456
>>> print(mo.group('foo'))
def
>>> # Re-use the captured group later in the pattern
>>> mo = re.search(r'(?P<foo>def) \d+ (?P=foo)', s)
>>> print(mo.group())
def 456 def
>>> print(mo.group('foo'))
def
[/python]
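
Unnamed groups can be re-used in the same way by number: \1 refers to whatever the first group captured. The sketch below uses that to spot an accidentally repeated word; the sample sentence is made up for illustration.

```python
import re

s = 'This is is a test'

# (\w+) captures a word and \1 requires the same word again;
# \b stops the match from starting inside 'This'
mo = re.search(r'\b(\w+) \1\b', s)

print(mo.group())   # the repeated pair
print(mo.group(1))  # the word that was repeated
```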

Python Regular Expression Assertions

Sometimes we want to match something only if it is followed by something else, which means that Python needs to peek ahead as it is searching the string. This is called a look-ahead assertion and the syntax is (?=...), where ... is a regular expression for what needs to follow.

In the example below, the regular expression ham(?= and eggs) means match 'ham' but only if it is followed by ' and eggs'.

[python]
>>> s = 'John likes ham and eggs.'
>>> mo = re.search(r'ham(?= and eggs)', s)
>>> print(mo.group())
ham
[/python]

Note that the matched sub-string is only 'ham', and not 'ham and eggs'. The ' and eggs' part is simply a requirement for the 'ham' part to be matched. Let's see what happens if this requirement is not met.

[python]
>>> s = 'John likes ham and mushrooms.'
>>> mo = re.search(r'ham(?= and eggs)', s)
>>> print(mo)
None
[/python]

[python]
>>> s = 'John likes ham, eggs and mushrooms.'
>>> mo = re.search(r'ham(?= and eggs)', s)
>>> print(mo)
None
[/python]

Unfortunately, Python only does simple character matching: it will match 'ham' only when the exact text ' and eggs' follows it immediately. In the second string, 'ham' is followed by ', eggs', so the assertion fails. Artificial intelligence and semantic analysis is a whole 'nother article. 🙂
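
If a looser requirement is what you want, the look-ahead itself can contain a more permissive pattern. The following is only a sketch of one option: it assumes that "eggs appears anywhere later in the string" is an acceptable rule.

```python
import re

# 'ham' must be followed, somewhere later on, by the whole word 'eggs'
pattern = re.compile(r'ham(?=.*\beggs\b)')

sentences = [
    'John likes ham and eggs.',
    'John likes ham, eggs and mushrooms.',
    'John likes ham and mushrooms.',
]

results = []
for s in sentences:
    mo = pattern.search(s)
    # The look-ahead is not part of the match, so group() is just 'ham'
    results.append(mo.group() if mo else None)

print(results)
```

Only the last sentence fails to match, because it does not contain 'eggs' at all.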

We can also do negative look-ahead assertions, that is, an element matches only if it is not followed by something else. The syntax is (?!...), where ... is a regular expression for what must not follow.

[python]
>>> s = 'My name is John Doe.'
>>> # Syntax is (?!...)
>>> mo = re.search(r'John(?! Doe)', s)
>>> print(mo)
None
[/python]

[python]
>>> s = 'My name is John Jones.'
>>> mo = re.search(r'John(?! Doe)', s)
>>> print(mo.group())
John
[/python]
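
A negative look-ahead can also be combined with a captured group. As a rough sketch with a made-up sample string, this collects every surname that follows 'John' except 'Doe':

```python
import re

s = 'John Doe, John Jones and John Smith'

# After 'John ', refuse to continue if the next word is exactly 'Doe';
# otherwise capture the surname
surnames = re.findall(r'John (?!Doe\b)(\w+)', s)

print(surnames)
```

When the pattern contains a single group, re.findall() returns a list of what that group captured, so the result here holds only the surnames.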
