Extract JSON from HTML using BeautifulSoup in Python

Last Updated : 16 Dec, 2021

In this article, we are going to extract data from an HTML page using BeautifulSoup in Python and save it as a JSON file.

Modules needed

- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python. To install it, type the command below in the terminal.

    pip install bs4

- requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built in with Python. To install it, type the command below in the terminal.

    pip install requests

Approach:

- Import all the required modules.
- Pass the URL to the requests.get() function so that it sends a GET request to the URL and returns a response.
  Syntax: requests.get(url, args)
- Parse the HTML content using bs4.
  Syntax: BeautifulSoup(page.content, 'html.parser')
  Parameters:
    page.content : the raw HTML content of the response, as bytes.
    html.parser : the HTML parser we want to use.
- Get the required data with the find() and find_all() functions. Find the list of books through the li, a and p tags that carry a unique class or id. You can open the webpage in the browser and inspect the relevant element by right-clicking it, as shown in the figure.
- Create a JSON file and use the json.dump() method to convert the Python objects into the corresponding JSON.

Below is the full implementation:

Python3

    # Import the required modules
    import json
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup


    # Function will return a list of dictionaries,
    # each containing the information of one book.
    def json_from_html_using_bs4(base_url):

        # requests.get(url) returns a response that is saved
        # in a response object called page.
        page = requests.get(base_url)

        # page.content gives us the raw bytes of the page. We pass
        # them to BeautifulSoup along with html.parser, which creates
        # a parsed tree in soup. Passing bytes lets BeautifulSoup
        # detect the page encoding itself.
        soup = BeautifulSoup(page.content, "html.parser")

        # soup.find_all finds all the <li> tags having the class
        # "col-xs-6 col-sm-4 col-md-3 col-lg-3" and stores them
        # in books.
        books = soup.find_all(
            'li', attrs={'class': 'col-xs-6 col-sm-4 col-md-3 col-lg-3'})

        # Initialise the required variables
        star = ['One', 'Two', 'Three', 'Four', 'Five']
        res, book_no = [], 1

        # Iterate over the books and check the given tags
        # to get the information of each book.
        for book in books:

            # Title of book in <img> tag with "alt" key.
            title = book.find('img')['alt']

            # Link of book in <a> tag with "href" key. The href is
            # relative, so resolve it against the page URL.
            link = urljoin(base_url, book.find('a')['href'])

            # Rating of book from <p> tag. stars stays None
            # if no star-rating class matches.
            stars = None
            for index in range(5):
                find_stars = book.find(
                    'p', attrs={'class': 'star-rating ' + star[index]})

                # Check which star-rating class is not
                # returning None and then break the loop
                if find_stars is not None:
                    stars = star[index] + " out of 5"
                    break

            # Price of book from <p> tag in price_color class
            price = book.find('p', attrs={'class': 'price_color'}).text

            # Stock status of book from <p> tag in
            # instock availability class.
            instock = book.find(
                'p', attrs={'class': 'instock availability'}).text.strip()

            # Create a dictionary with the above book information
            data = {'book no': str(book_no), 'title': title,
                    'rating': stars, 'price': price,
                    'link': link, 'stock': instock}

            # Append the dictionary to the list
            res.append(data)
            book_no += 1

        return res


    # Main function
    if __name__ == "__main__":

        # Enter the URL of the website
        base_url = "https://books.toscrape.com/catalogue/page-1.html"

        # Function will return a list of dictionaries
        res = json_from_html_using_bs4(base_url)

        # Convert the Python objects into JSON and export
        # them to the books.json file as UTF-8.
        with open('books.json', 'w', encoding='utf-8') as f:
            json.dump(res, f, indent=8, ensure_ascii=False)

        print("Created Json File")

Output:

    Created Json File

The resulting books.json file holds one JSON object per book, with the keys book no, title, rating, price, link and stock.

Author: anilabhadatta
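The json.dump() step can be tried on its own, without any network access. The sketch below is a minimal illustration, not part of the original implementation: the helper name save_and_reload and the sample record are hypothetical placeholders shaped like the dictionaries built above. It writes the records as UTF-8 with ensure_ascii=False, so a character such as the pound sign is stored as-is rather than as a \u escape, and then reads the file back to confirm that nothing was lost.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample record shaped like the dictionaries built above.
# The values are placeholders, not scraped data.
sample_books = [
    {
        "book no": "1",
        "title": "Example Title",
        "rating": "Three out of 5",
        "price": "£10.00",
        "link": "https://example.com/book-1",
        "stock": "In stock",
    },
]


def save_and_reload(records, path):
    """Write records to path as UTF-8 JSON, then read them back."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(records, f, indent=4, ensure_ascii=False)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "books.json"
    reloaded = save_and_reload(sample_books, out_file)
    raw_text = out_file.read_text(encoding="utf-8")
```

Opening the file with the same encoding for writing and reading is what keeps the round trip lossless. With ensure_ascii left at its default of True, the file would contain the escape \u00a3 instead of the pound sign, which still loads correctly but is harder to read by eye.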