Xpath Cheat Sheet: Ahmed Rafik - Modern Web Scraping With Python Using Scrapy, Splash & Selenium (Udemy) 2 Edition

The document is an XPath cheat sheet that provides examples of using XPath to locate elements in HTML. It covers basics like selecting elements by tag name, class, ID, and attributes. It also discusses using contains, starts-with, and other functions to match text. The cheat sheet explains how to select elements by position and use logical operators. Finally, it covers XPath axes for navigating up and down the HTML structure to select parent, ancestor, preceding, and child elements.

XPath

Cheat Sheet

Ahmed Rafik – Modern Web Scraping With Python Using Scrapy, Splash & Selenium (Udemy)
2nd edition
About the author
Hi! I'm Ahmed, nice to meet you. My students like to call me the web
scraping Ninja, and at the time of preparing this Cheat Sheet I have
taught more than 2000 students around the world how to do web
scraping. I do web scraping on a daily basis, whether for fun,
for personal projects or as a freelancer, and guess what? I even have a
master's degree in computer science.

You can find me at:

Twitter: https://twitter.com/ahmedrafik__

YouTube: https://www.youtube.com/channel/UCKHo1-b7ZH_CKvc7rtqWEvA

About this Cheat Sheet
* This Cheat Sheet covers all the examples I included in my Web
Scraping course with Python using Requests, LXML and Splash.
* This Cheat Sheet covers only the basics of how to use XPath to
locate elements in the HTML markup. If you want to master Web
Scraping then I recommend checking my course on Udemy. You
can follow this link in case you're interested in learning web scraping:
(LINK)
* All the XPath expressions I'm going to cover in this Cheat Sheet are
applied to the HTML markup I've added after this page.

HTML web page
<!DOCTYPE html>
<html lang="en">
<head>
<title>XPath and CSS Selectors</title>
</head>
<body>
<h1>XPath expressions simplified</h1>
<div class="intro">
<p>
I'm paragraph within a div with a class set to
intro
<span id="location">I'm a span with ID set to
location and i'm within a paragraph</span>
</p>
<p id="outside">I'm a paragraph with ID set to
outside and i'm within a div with a class set to intro</p>
</div>
<p>Hi i'm placed immediately after a div with a class
set to intro
</p>
<span class='intro'>Div with a class attribute set to
intro
</span>
<ul id="items">
<li data-identifier="7">Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
<li>Item 4</li>
</ul>

<a href="https://www.google.com">Google</a>
<a href="http://www.google.fr">Google France</a>
<p class='bold italic'>Hi, I have two classes</p>
<p class='bold'>Hi i'm bold</p>
</body>
</html>
BASICS
An element is a tag in the HTML markup.
Example:
The “p” tag, also known as a paragraph, is an element.
To select any element from an HTML web page we simply use the
following syntax:
//elementName
Example:
To select all “p” elements we can use the following XPath selector:

//p


Although this approach works perfectly fine, it's not recommended,
because it matches every “p” element on the page. If, for example, we
only want the “p” elements that are inside the div with a class
attribute equal to “intro”, //p is too broad. This is why we
always prefer to target elements by their class attribute, id or
position, so we can limit the scope of the XPath expression.
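
To make the difference in scope concrete, here is a minimal sketch using lxml (a third-party library, installed with pip). The MARKUP string is a trimmed copy of the HTML page above, and the variable names are only illustrative.

```python
from lxml import html  # third-party: pip install lxml

# Trimmed copy of the cheat sheet's HTML page (illustrative only).
MARKUP = """
<html><body>
  <div class="intro">
    <p>Inside the intro div <span id="location">a span</span></p>
    <p id="outside">Also inside the intro div</p>
  </div>
  <p>After the div</p>
  <p class="bold">Bold paragraph</p>
</body></html>
"""

tree = html.fromstring(MARKUP)

# //p matches every paragraph in the document.
all_paragraphs = tree.xpath("//p")

# Scoping by the parent's class keeps only the paragraphs inside the div.
scoped_paragraphs = tree.xpath("//div[@class='intro']/p")

print(len(all_paragraphs), len(scoped_paragraphs))
```

With this trimmed markup the broad selector returns all four paragraphs, while the scoped one returns only the two inside the div.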

CLASS & ID
To select any element by an attribute value, such as its class, we use
the following syntax:
//elementName[@attributeName='value']
Example:
Let's say we want to select the “p” elements that are inside the “div”
with a class attribute equal to “intro”. In this case we use the
following XPath expression:
//div[@class='intro']/p
If we want to select the “p” element with “id” equal to “outside” we
can use the following XPath expression:
//p[@id='outside']
REMEMBER:

Please note, the same class attribute value can be
assigned to more than one element; however, an id should be
assigned to one and only one element.
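
The two expressions above can be tried out with a short lxml sketch. The MARKUP string is a trimmed copy of the HTML page, and the `//*` wildcard query is an extra illustration (not one of the course examples) of a class value being shared by several elements.

```python
from lxml import html  # third-party: pip install lxml

# Trimmed copy of the cheat sheet's HTML page (illustrative only).
MARKUP = """
<html><body>
  <div class="intro">
    <p>First paragraph <span id="location">a span</span></p>
    <p id="outside">Second paragraph</p>
  </div>
  <span class="intro">A span that shares the intro class</span>
</body></html>
"""

tree = html.fromstring(MARKUP)

# Select by class: the "p" children of the div with class "intro".
intro_paragraphs = tree.xpath("//div[@class='intro']/p")

# Select by id: an id should identify a single element.
outside = tree.xpath("//p[@id='outside']")

# The same class value can appear on more than one element.
intro_tags = [el.tag for el in tree.xpath("//*[@class='intro']")]

print(len(intro_paragraphs), outside[0].text, intro_tags)
```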

Sometimes we also want to select elements based on a custom
attribute, such as an HTML5 “data-*” attribute. For
example, to select the “li” element with the attribute
“data-identifier” equal to 7 we use the following XPath
expression:

//li[@data-identifier="7"]

Sometimes the element we want to select has two classes. For
example, to select the “p” element with a class attribute equal to
“bold italic” we use the following XPath expression:
//p[@class='bold italic']
Note that this compares the whole attribute value, so the classes
must appear in exactly this order.
Alternatively, even though the element has two classes, we can
search for a substring within the class attribute value by using the
contains function:
//p[contains(@class, 'italic')]
REMEMBER:

The contains function takes two arguments:

* The first one is where to search, whether in the class
attribute value, the id or anything else.
* The second argument is the value you're looking for.
* The value you search for is case sensitive, so be
careful!
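
The attribute and contains examples can be checked with a minimal lxml sketch. The MARKUP string is a trimmed copy of the HTML page, and the last query with a capital "B" is an extra illustration of case sensitivity rather than a course example.

```python
from lxml import html  # third-party: pip install lxml

# Trimmed copy of the cheat sheet's HTML page (illustrative only).
MARKUP = """
<html><body>
  <ul id="items">
    <li data-identifier="7">Item 1</li>
    <li>Item 2</li>
  </ul>
  <p class="bold italic">Hi, I have two classes</p>
  <p class="bold">Hi i'm bold</p>
</body></html>
"""

tree = html.fromstring(MARKUP)

# Custom data-* attribute.
by_data = tree.xpath('//li[@data-identifier="7"]')

# Exact match on the whole class attribute value.
exact = tree.xpath("//p[@class='bold italic']")

# Substring match on the class attribute value.
has_italic = tree.xpath("//p[contains(@class, 'italic')]")
has_bold = tree.xpath("//p[contains(@class, 'bold')]")

# contains is case sensitive, so "Bold" matches nothing here.
wrong_case = tree.xpath("//p[contains(@class, 'Bold')]")

print(by_data[0].text, len(exact), len(has_italic), len(has_bold), len(wrong_case))
```

Keep in mind that contains does a plain substring search, so 'bold' would also match a class value such as 'bolder'.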

Value lookup
Let's say you want to select all the “a” elements in which the “href”
attribute value starts with “https” and not “http”. In this case we can
use the following XPath expression:
//a[starts-with(@href, 'https')]
So to search for text at the beginning of a value we use the
“starts-with” function, which takes the same arguments as the contains
function.
Now if you want to search for a value at the end there is the
“ends-with” function; however, this function is not supported in XPath
version 1.0, which is the version used by the majority of browsers
and by LXML.
Finally, if we want to search for a particular value anywhere in
between, we use the contains function as explained before.
If you want to get the text of a particular element you can use
text(). For example, to get the text of the “p”
element with id equal to “outside” we use the following XPath
expression:
//p[@id="outside"]/text()
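
Here is a minimal lxml sketch of the value lookups above, run against a trimmed copy of the HTML page. The last query is not from the course: it is a common XPath 1.0 workaround for the missing ends-with function, built from substring and string-length, and is included here only as an assumption about how you might fill that gap.

```python
from lxml import html  # third-party: pip install lxml

# Trimmed copy of the cheat sheet's HTML page (illustrative only).
MARKUP = """
<html><body>
  <p id="outside">I'm a paragraph with ID set to outside</p>
  <a href="https://www.google.com">Google</a>
  <a href="http://www.google.fr">Google France</a>
</body></html>
"""

tree = html.fromstring(MARKUP)

# Links whose href starts with "https".
secure_links = tree.xpath("//a[starts-with(@href, 'https')]")

# text() returns the text nodes as a list of strings.
outside_text = tree.xpath('//p[@id="outside"]/text()')

# XPath 1.0 has no ends-with; comparing the tail of the value is one workaround.
ends_with_fr = tree.xpath(
    "//a[substring(@href, string-length(@href) - string-length('.fr') + 1) = '.fr']"
)

print([a.text for a in secure_links])
print(outside_text)
print([a.text for a in ends_with_fr])
```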

The position
Okay, let's say you want to get the second “li” element from the “ul”
element with “id” equal to “items”. In this case you can use the
following XPath expression:
//ul[@id="items"]/li[2]
Note that positions in XPath start at 1, not 0.
However, if you want to select the second list item but you also want
to make sure that its text is “Item 2”, you can use the
following XPath expression:
//ul[@id="items"]/li[position() = 2 and text() = "Item 2"]
Notice that in this case I used the position() function, text()
and the “and” logical operator.
Besides the “and” logical operator we also have the “or” logical
operator, which matches when at least one of the conditions is true.
REMEMBER:

In XPath everything we write within [] is known as a
predicate.
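
The position examples can be sketched with lxml as follows. The MARKUP string is the list from the HTML page, and the “or” query is an extra illustration of the operator, not a course example.

```python
from lxml import html  # third-party: pip install lxml

# The list from the cheat sheet's HTML page.
MARKUP = """
<html><body>
  <ul id="items">
    <li data-identifier="7">Item 1</li>
    <li>Item 2</li>
    <li>Item 3</li>
    <li>Item 4</li>
  </ul>
</body></html>
"""

tree = html.fromstring(MARKUP)

# Second list item (positions start at 1).
second = tree.xpath('//ul[@id="items"]/li[2]')

# Position and text must both match.
checked = tree.xpath('//ul[@id="items"]/li[position() = 2 and text() = "Item 2"]')

# If the text does not match, the predicate rejects the element.
mismatch = tree.xpath('//ul[@id="items"]/li[position() = 2 and text() = "Item 3"]')

# With "or", one matching condition is enough.
either = tree.xpath('//ul[@id="items"]/li[position() = 1 or text() = "Item 4"]')

print(second[0].text, len(checked), len(mismatch), [li.text for li in either])
```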

XPath axes
In XPath an axis is used to search for an element based on its
relationship with another element. We have some axes which we can
use to navigate up and down in the HTML markup.
All axes in XPath use the following syntax:
axisName::nodeTest

XPath axes (GOING UP)

* The parent
  o The parent axis is used to get the parent of a specific
  element. For example, to get the parent of the “p”
  element with id equal to “outside” we use the following
  XPath expression:
  //p[@id="outside"]/parent::node()
  o node() in XPath matches any node, no matter what its
  type is (element, text, comment, ...).

* The ancestor
  o The ancestor axis can be used to get all the ancestors of a
  specific element. For example, to get the ancestors (parent,
  grandparent, ...) of the “p” element with id equal to
  “outside” we use the following XPath expression:
  //p[@id="outside"]/ancestor::node()

* The preceding
  o In XPath the preceding axis will get all the elements that
  precede an element in the document, excluding its ancestors.


* Preceding sibling
  o In XPath the preceding-sibling axis will get the siblings that
  precede an element, in other words it will return the
  brothers that sit above a specific element and share its parent.
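
The four upward axes can be compared side by side with a minimal lxml sketch. The MARKUP string is a trimmed copy of the HTML page and the tags helper is a hypothetical convenience function. The sketch uses `*` instead of node() on most axes so that only elements are returned, which is a choice made for readability here.

```python
from lxml import html  # third-party: pip install lxml

# Trimmed copy of the cheat sheet's HTML page (illustrative only).
MARKUP = """
<html><body>
  <h1>XPath expressions simplified</h1>
  <div class="intro">
    <p>First paragraph <span id="location">a span</span></p>
    <p id="outside">Second paragraph</p>
  </div>
</body></html>
"""

tree = html.fromstring(MARKUP)


def tags(expression):
    """Return the tag names of the elements matched by an XPath expression."""
    return [node.tag for node in tree.xpath(expression)]


# The parent of the paragraph is the div.
parent = tags('//p[@id="outside"]/parent::node()')

# Every element above the paragraph: div, body and html.
ancestors = tags('//p[@id="outside"]/ancestor::*')

# Every element that comes before the paragraph, ancestors excluded.
preceding = tags('//p[@id="outside"]/preceding::*')

# Only the earlier elements that share the same parent.
preceding_siblings = tags('//p[@id="outside"]/preceding-sibling::*')

print(parent, ancestors, preceding, preceding_siblings)
```

Here preceding picks up the h1, the first paragraph and its span, while preceding-sibling returns only the first paragraph.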

XPath axes (GOING DOWN)

To go down in the HTML markup we also have four axes, which are:
  o The child axis, which will get the children of a specific
  element.
  o The following axis, which will return all the elements that
  come after the closing tag of a specific element.
  o The following-sibling axis, which will return all the
  elements that come after the closing tag of an element and
  share the same parent with it.
  o The descendant axis, which will get the descendants (children,
  grandchildren, ...) of a particular element.
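
The cheat sheet gives no expressions for the downward axes, so the ones below are only a sketch of how they could be written, run with lxml against a trimmed copy of the HTML page. The tags helper is a hypothetical convenience function.

```python
from lxml import html  # third-party: pip install lxml

# Trimmed copy of the cheat sheet's HTML page (illustrative only).
MARKUP = """
<html><body>
  <div class="intro">
    <p>First paragraph <span id="location">a span</span></p>
    <p id="outside">Second paragraph</p>
  </div>
  <p>After the div</p>
  <ul id="items"><li>Item 1</li><li>Item 2</li></ul>
</body></html>
"""

tree = html.fromstring(MARKUP)


def tags(expression):
    """Return the tag names of the elements matched by an XPath expression."""
    return [node.tag for node in tree.xpath(expression)]


# Direct children of the div: the two paragraphs.
children = tags('//div[@class="intro"]/child::*')

# Everything nested inside the div, at any depth.
descendants = tags('//div[@class="intro"]/descendant::*')

# Everything after the div's closing tag, at any depth.
following = tags('//div[@class="intro"]/following::*')

# Only the later elements that share the div's parent.
following_siblings = tags('//div[@class="intro"]/following-sibling::*')

print(children, descendants, following, following_siblings)
```

Note how following also returns the two list items nested inside the ul, while following-sibling stops at the paragraph and the ul themselves.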

