Web Scraping Using Python [Step by Step Tutorial] – Pythonista Planet

This tutorial provides a step-by-step guide on web scraping using Python's Beautiful Soup library, explaining the process of extracting data from websites. It covers the legality of web scraping, the advantages of using Python, and the necessary tools and packages to get started. The tutorial also includes practical examples of how to scrape website data, including finding links and extracting specific HTML elements.

Uploaded by

Karthikeya Bathula

Web Scraping Using Python [Step by Step Tutorial]
Written by Ashwin Joy ● in Python

In this tutorial, we are going to do web scraping step by step using Python’s Beautiful Soup
library. Python 3 makes web scraping quick and easy, and it provides a beautiful library for the
job called Beautiful Soup (the beauty is in the name itself).

Table of Contents 
What is Web Scraping?
Is Web Scraping Legally Allowed?
Why Python?
Web Scraping using Python’s Beautiful Soup
Finding all the Links from a Website

What is Web Scraping?

When you want to extract some important data from a website, you can use web scraping.

According to Wikipedia’s definition, web scraping, web harvesting, or web data extraction is
data scraping used for extracting data from websites.

Usually, the recommended way of picking up data from a website is through its API.
But sometimes, when no API is available, we go for web scraping.

Is Web Scraping Legally Allowed?

Web scraping is a bit of a grey area. Many websites do not permit it, so you have to check with
the website owner or read the policies of the website (such as its terms of service).

So, make sure you are completely aware of what you are doing, and do web scraping only on
legally allowed websites.

You can scrape your own website for sure. But you shouldn’t scrape or crawl someone else’s website
without obtaining their permission.
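One machine-readable place where a site states its crawling policy is its robots.txt file. As a rough sketch (robots.txt is not a legal document, and it does not replace the site’s terms or the owner’s permission), Python’s standard urllib.robotparser module can tell you whether a path is disallowed. The sample rules and the helper name is_allowed below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at https://<site>/robots.txt
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network needed
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "https://mywebsite.com/blog/"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://mywebsite.com/private/page"))
```

For a live site, you could instead call set_url() with the robots.txt address and then read() on the parser before calling can_fetch().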

Why Python?
Python 3 is a great programming language for web scraping. It is quick to write and easy to
learn. Also, many of the web scraping tools that ship with Kali Linux are written in Python.

Enough theory. Let’s start scraping the web using the Beautiful Soup library.

Web Scraping using Python’s Beautiful Soup
The first thing you want to do when you are going to do web scraping is to go to the website that
you want to scrape and analyze it. Web scraping is all about how well you understand the website:
its data structures, how the page is laid out, and so on.

The next thing you need to do is to get all the necessary tools and packages. I’m using Python
IDLE to do the scraping, so you should have that ready on your system.

You can also write the code in your shell if you prefer. After that, we need to install the
necessary packages: ‘bs4’ (Beautiful Soup), ‘requests’, and ‘lxml’.

So go to your command line (CMD) and install them one by one, if you don’t have them already. If
you are on a Mac or Linux, use pip3 instead of pip in the following commands.

pip install bs4

pip install lxml

‘requests’ is a third-party package and does not ship with Python itself, although some
environments come with it preinstalled. If you don’t have it on your system, install it too.

pip install requests
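If you are not sure which of the three packages are already installed, a quick check like the one below can help. This is only a sketch: the helper name missing_packages is made up for illustration, and it uses the standard importlib module to see whether each package can be found.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["requests", "bs4", "lxml"])
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All packages are ready.")
```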

Now, all your packages are ready. Go to your Python IDLE or Python Shell and let’s write some
code.

First of all, we need to import all three packages. So, let’s do that.

import requests
import bs4
import lxml

Next, you have to make a request to the website that you want to scrape. Let’s create a variable
‘res’ to hold the response.

res = requests.get('https://mywebsite.com')

You can type in your URL instead of mywebsite.com which I randomly typed for an example.

This ‘res’ variable now holds the server’s response for that page. If you just type in ‘res.text’
and hit enter, you can see the full HTML of the page.
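In practice, a request can fail or hang, so it may be worth wrapping it in a small helper. The sketch below is one possible way to do that, not part of the original steps: the name fetch_html and the 10-second timeout are assumptions chosen for illustration, while timeout= and raise_for_status() are standard parts of the requests library.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML, raising an error on a bad response."""
    res = requests.get(url, timeout=timeout)  # give up if the server does not answer in time
    res.raise_for_status()                    # raise requests.HTTPError for 4xx/5xx responses
    return res.text


# Example usage (needs network access, so it is left commented out):
# html = fetch_html('https://mywebsite.com')
# print(html[:200])
```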

We need to extract information from this variable. Here comes the use of the beautiful soup
library.

We are going to create an object called ‘soup’. For that, we use bs4 and its class called
‘BeautifulSoup’. Its constructor takes two arguments here: the first is ‘res.text’ (the HTML to
parse) and the second is the name of the parser to use. In this case, we are using lxml.

soup = bs4.BeautifulSoup(res.text,'lxml')
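To try this step without making a real request, you can feed BeautifulSoup a small HTML string. In the sketch below, SAMPLE_HTML is a made-up page standing in for ‘res.text’, and the built-in 'html.parser' is swapped in for 'lxml' only so the example runs even if lxml is not installed.

```python
import bs4

# Hypothetical page content, standing in for res.text
SAMPLE_HTML = (
    "<html><head><title>My Website</title></head>"
    "<body><p>Hello</p></body></html>"
)

# 'html.parser' ships with Python; pass 'lxml' instead if you installed it
soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

page_title = soup.title.string       # text inside the <title> tag
first_paragraph = soup.p.get_text()  # text inside the first <p> tag

print(page_title)
print(first_paragraph)
```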

For example, let’s say we want to extract the information about the title tag of that website. So,
let’s create a new variable.

title = soup.select('title')

You can pass any HTML tag you want instead of ‘title’. Now, let’s check what is inside this ‘title’
variable.

print(title)

Then, you will see a list containing the title tag of the website as the output (‘select’ always
returns a list of matching elements). You have just scraped the title of that website using Python.
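If you want just the text of the title rather than a list of tags, one option is ‘select_one’ together with ‘get_text’. The sketch below uses a made-up HTML string (SAMPLE_HTML) and the built-in 'html.parser' so that it runs on its own; with a real page you would pass ‘res.text’ and could keep 'lxml'.

```python
import bs4

# Hypothetical page content, standing in for res.text
SAMPLE_HTML = "<html><head><title>My Website</title></head><body></body></html>"
soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

matches = soup.select("title")        # list of every matching tag
title_tag = soup.select_one("title")  # first matching tag, or None if there is none

# Guard against pages that have no <title> at all
title_text = title_tag.get_text(strip=True) if title_tag else ""

print(matches)
print(title_text)
```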

You can also scrape data based on a CSS class or id using ‘.classname’ or ‘#idname’
respectively. Let’s see an example.

title = soup.select('.classname')
#or
title = soup.select('#idname')

Enter the name of the class or id you want to scrape in place of ‘classname’ and ‘idname’.
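Here is a small self-contained sketch of both selectors. The HTML, the class name ‘intro’, and the id ‘main’ are all made up for illustration, and 'html.parser' is used so the example does not depend on lxml.

```python
import bs4

# Hypothetical page fragment with one id and one repeated class
SAMPLE_HTML = """
<div id="main">
  <p class="intro">Welcome</p>
  <p class="intro">Second intro</p>
  <p>Other</p>
</div>
"""
soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# '.intro' matches every element whose class is 'intro'
intro_texts = [tag.get_text() for tag in soup.select(".intro")]

# '#main' matches the element whose id is 'main'
main_div = soup.select_one("#main")

print(intro_texts)
print(main_div.name)
```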

Finding all the Links from a Website
If you want to find all the links that are there on a website, you can do that too. For that, we
use a ‘for’ loop and a method called ‘find_all’. Passing href=True keeps only the <a> tags that
actually have an href attribute.

for link in soup.find_all('a', href=True):
    print(link['href'])

Then, you can see all the links listed on your IDLE or shell as output.
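Many pages use relative links such as ‘/about’, which are not usable on their own. As a possible refinement, the standard urllib.parse.urljoin function can turn them into full URLs. In the sketch below, the helper name extract_links, the base URL, and the sample HTML are all assumptions made for illustration.

```python
from urllib.parse import urljoin

import bs4

# Hypothetical address of the page the HTML came from
BASE_URL = "https://mywebsite.com/blog/"

# Hypothetical page fragment, standing in for res.text
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="post-1.html">Post 1</a>
<a href="https://example.org/">External</a>
<a name="top">No href here</a>
"""


def extract_links(html, base_url):
    """Return every link on the page as an absolute URL."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags without an href; urljoin resolves relative paths
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


links = extract_links(SAMPLE_HTML, BASE_URL)
for link in links:
    print(link)
```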

That covers the basics of web scraping using Python. If you have any doubts or queries, feel
free to let me know in the comments section down below.

If you enjoyed this article, share it with your friends.

Happy learning!

Ashwin Joy
I'm the face behind Pythonista Planet. I learned my first programming language back in 2015. Ever
since then, I've been learning programming and immersing myself in technology. On this site, I
share everything that I've learned about computer programming.

3 thoughts on “Web Scraping Using Python [Step by Step Tutorial]”
Pachu says:
December 13, 2019 at 8:32 PM
Request.get not working in kali linux

Ashwin Joy says:

December 13, 2019 at 9:55 PM

You might not be having the requests library in your system. Download it using pip and try
again. Hopefully, it’ll work.

zaid kamil says:

January 13, 2020 at 11:13 PM

its requests.get(…)

ABOUT ME

Hi, I’m Ashwin Joy. I’m a Computer Science and Engineering graduate who is passionate about
programming and technology. Pythonista Planet is the place where I nerd out about computer
programming. On this blog, I share all the things I learn about programming as I go.

LEGAL INFORMATION

This site is owned and operated by Ashwin Joy. PythonistaPlanet.com is a participant in the
Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a
means for sites to earn advertising fees by advertising and linking to Amazon.com. This site also
participates in affiliate programs of Udemy, Treehouse, Coursera, and Udacity, and is
compensated for referring traffic and business to these companies.

© 2020 Copyright Pythonista Planet
