Python level 2
Introduction
Poll (single choice)
How long have you been programming?
• Less than a week
• Less than a month
• Less than a year
• 1 - 3 years
• 3 - 10 years
• 10+ years
Poll (multi choice)
What are you looking forward to learning?
• Dictionaries
• Exception handling
• Reading and writing to files
• Using external libraries
• Making HTTP requests
• Writing simple web scrapers
• Writing complex web scrapers
• Other (say in chat)
Introduction
Installation
Set up
• Download the PDF of these slides and the Reference document (Resources widget)
• Go to https://github.com/ariannedee/python-level-2 and follow the installation instructions in the Readme
More Concepts
CSV text:
Name, Age
Shehin, 23
Freddy, 85
Bob, 5
Gabriella, 62

Table:
Name       Age
Shehin     23
Freddy     85
Bob        5
Gabriella  62
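The comma-separated text and the table above hold the same data. As a minimal sketch of that conversion, Python's standard `csv` module can turn the text into rows. The sample data is copied from the slide; the variable names are illustrative only, and in practice the text would usually come from a file rather than a string.

```python
import csv
import io

# Sample data from the slide, as it would appear in a .csv file.
csv_text = """Name, Age
Shehin, 23
Freddy, 85
Bob, 5
Gabriella, 62
"""

# skipinitialspace=True drops the space that follows each comma.
reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)

# Each row is a dictionary keyed by the header line; ages are converted to int.
people = [{"Name": row["Name"], "Age": int(row["Age"])} for row in reader]

# Print the rows as aligned columns, like the table on the slide.
for person in people:
    print(f"{person['Name']:<10} {person['Age']:>3}")
```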
Next Level Python LiveLessons
Lesson 2 – Work with Files
• https://learning.oreilly.com/videos/next-level-python/9780136904083/9780136904083-NLP1_01_02_00/
Scraper Foundations
• Uninstall package
• $ pip uninstall
Scraper Foundations
Introduction to HTML
Poll
Do you know HTML?
• Not at all
• A bit, and I would like to review it
• A bit, but I don’t want to review it
• Yes
HTML page structure
HTML elements
[Figure: an HTML element with its "Element" and "Content" parts labelled]
• table
• th - table header
• tr - table row
• td - table data
Nesting
<div>
  <p>This is a paragraph inside a div</p>
</div>
Nesting
<table>
  <tr>
    <th>Title</th>
    <th>Author</th>
  </tr>
  <tr>
    <td>Animal Farm</td>
    <td>George Orwell</td>
  </tr>
  <tr>
    <td>Pride and Prejudice</td>
    <td>Jane Austen</td>
  </tr>
</table>
Nesting
<table>
<tr><th>Title</th><th>Author</th>
</tr><tr>
<td>Animal Farm</td><td>George Orwell</td>
</tr><tr>
<td>Pride and Prejudice</td><td>Jane Austen</td>
</tr></table>
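The nesting of rows inside a table and cells inside rows can be walked with Python's standard-library `html.parser`. This is only a sketch to show the structure; `TableParser` is a hypothetical helper written for this example, not part of any library, and it assumes a single table with no nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each th/td cell, grouped by tr row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell texts per <tr>
        self.in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # a new row starts
        elif tag in ("th", "td"):
            self.in_cell = True
            self.rows[-1].append("")  # a new, empty cell in the current row

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        # Only keep text found inside a cell; whitespace between tags is ignored.
        if self.in_cell:
            self.rows[-1][-1] += data


html = """<table>
<tr><th>Title</th><th>Author</th>
</tr><tr>
<td>Animal Farm</td><td>George Orwell</td>
</tr><tr>
<td>Pride and Prejudice</td><td>Jane Austen</td>
</tr></table>"""

parser = TableParser()
parser.feed(html)
for row in parser.rows:
    print(row)
```

The compact and the indented versions of the table produce the same rows, because line breaks between tags do not change the nesting.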
HTML page structure
Attributes
[Figure: an HTML tag with its "Attribute" and "Value" labelled]
Attributes are listed after the tag name, each followed by an “=”.
The value of an attribute goes in quotes after the “=”.
Multiple attributes are separated by spaces.
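To see those three rules on a concrete tag, the sketch below feeds one tag to Python's standard-library `html.parser` and prints the attribute names and values it finds. The tag and its URL are made up for illustration, and `AttributeRecorder` is a hypothetical helper, not a library class.

```python
from html.parser import HTMLParser


class AttributeRecorder(HTMLParser):
    """Remember every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.seen.append((tag, dict(attrs)))


# Tag name first, then name="value" pairs separated by spaces.
tag_text = '<a href="https://example.com/about" class="active">About</a>'

recorder = AttributeRecorder()
recorder.feed(tag_text)

tag, attributes = recorder.seen[0]
print(tag)
for name, value in attributes.items():
    print(f"  {name} = {value}")
```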
Common attributes
• id
  • A unique identifier for an element
• class
  • Often used to determine the styling of an element
    • E.g. a menu link has the class “active” so that the link to the current page is styled differently
• href
  • The URL for a link (hypertext reference)
  • Required for links ("a" tag)
• src
  • The source location of an image
  • Required for images ("img" tag)
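For a scraper, `href` and `src` are often the values worth collecting. The following sketch gathers them with the standard-library `html.parser`. The HTML snippet, its paths, and the `LinkAndImageCollector` class are all invented for this example.

```python
from html.parser import HTMLParser


class LinkAndImageCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])


# Hypothetical page fragment using the four common attributes.
sample = """
<div id="menu">
  <a class="active" href="/home">Home</a>
  <a href="/about">About</a>
  <img src="/static/logo.png">
</div>
"""

collector = LinkAndImageCollector()
collector.feed(sample)
print(collector.links)
print(collector.images)
```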
Sample HTML
Next Level Python LiveLessons
Lesson 7.2 – Review web pages and HTML
• https://learning.oreilly.com/videos/next-level-python/9780136904083/9780136904083-NLP1_01_07_02/
Scraping websites
Scraping data
Python scraper options
• Beautiful Soup - simple
• lxml - more technical, supports XML
• Scrapy - advanced features, full scraper capability
• Selenium - handles JavaScript and user events, slow
• Requests-HTML - simple, but not production-ready
Beautiful Soup 4
You didn't write that awful page. You're just trying … scraping projects.
Install BeautifulSoup
Practise: find the buttons
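The page used for this exercise is not reproduced in the slides, so the sketch below uses a made-up HTML fragment. It assumes the `beautifulsoup4` package is installed and shows the two Beautiful Soup calls most likely to be useful here: `find` for the first match and `find_all` for every match.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; the real exercise page is not shown in the slides.
html = """
<div>
  <button id="save">Save</button>
  <button id="cancel" class="secondary">Cancel</button>
  <a href="/help">Help</a>
</div>
"""

# "html.parser" is Python's built-in parser, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")

first_button = soup.find("button")     # first matching element, or None
all_buttons = soup.find_all("button")  # list-like result with every match

print(first_button.text)
button_ids = [button["id"] for button in all_buttons]
print(button_ids)
```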
Next Level Python LiveLessons
Lesson 7.3 – Parse HTML documents with Beautiful Soup
• https://learning.oreilly.com/videos/next-level-python/9780136904083/9780136904083-NLP1_01_07_03/
Questions and break
Q&A widget
Project data
https://en.wikipedia.org/wiki/Member_states_of_the_United_Nations
Next Level Python LiveLessons
Lesson 8 – Create a Web Scraping Application
• https://learning.oreilly.com/videos/next-level-python/9780136904083/9780136904083-NLP1_01_08_00/
Ethics
• Let the website know who you are and how to contact you
• Engaging in automated uses of the site that are abusive or disruptive of the services and have not been …
• Disrupting the services by placing an undue burden on a Project website or the networks or servers …
• Disrupting the services by inundating any of the Project websites with communications or other traffic that suggests no serious intent to use the Project website for its stated purpose;
• …
https://foundation.wikimedia.org/wiki/Terms_of_Use/en
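One common way to let a website know who you are is to send a descriptive `User-Agent` header, and one way to avoid an undue burden is to pause between requests. The sketch below shows both using only the standard library. The scraper name, contact address, and one-second delay are placeholders chosen for this example, not values required by any site.

```python
import time
import urllib.request

# Placeholder identity: replace with your own project name and contact details.
USER_AGENT = "Example Scraper/0.1 (contact: you@example.com)"


def build_request(url):
    """Create a request that identifies the scraper and its owner."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, delay_seconds=1.0):
    """Download one page, then pause so the server is not flooded."""
    with urllib.request.urlopen(build_request(url)) as response:
        body = response.read()
    time.sleep(delay_seconds)
    return body


# Building the request does not contact the server; only fetch() does.
request = build_request(
    "https://en.wikipedia.org/wiki/Member_states_of_the_United_Nations"
)
print(request.get_header("User-agent"))
```

`fetch` is defined but not called here, so running the example sends no traffic.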
Let’s code!
Project data
Wrapping up
• Can return multiple document types, but JSON is the most common
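The slide does not say what returns these documents; assuming the bullet refers to responses from a web service, a JSON body can be turned into Python dictionaries and lists with the standard `json` module. The response text below is invented for illustration.

```python
import json

# Hypothetical response body; a real one would come from an HTTP request.
response_body = '{"name": "Shehin", "age": 23, "languages": ["Python"]}'

# json.loads converts JSON text into dictionaries, lists, strings and numbers.
data = json.loads(response_body)

print(data["name"], data["age"])
print(data["languages"])
```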
Questions?
Email me at arianne.dee.studios@gmail.com