
CSE-3001 Software Engineering

PROJECT FINAL REPORT

Automation and Scraping toolbox

SUBMITTED TO: PROF. ARUN KUMAR S

TEAM MEMBERS:

DALIA JOSE - 19BCE2024
RAHUL GARG - 19BCE0991
SHRUTI GARG - 19BCE0994
ASHMITHA - 19BCE0995
ABSTRACT

Like many other methodological innovations, automation has grown over the years with
technological advancements; however, very few people take advantage of the libraries and
methods available to automate their daily tasks. This paper explores the avenue of automation
along with scraping data from the internet.

Work for most people nowadays involves using the internet on a daily basis; however, people
sometimes need offline copies of web pages and images. They may need to perform trivial tasks
repeatedly, such as opening multiple links or downloading multiple images, and they may not have
the time to perform these searches and operations over the internet manually.

We intend to provide a toolset that can help such people, as well as students who need to scrape
data off the internet and automate their work. This can improve their efficiency while
lessening their burden. It can also be used by tech enthusiasts to explore the growing field of
automation through programming.

AIM
Our project aims to encourage and explore the world of automation through the use of two major
tools – the Python programming language and the Selenium browser-automation framework. It explores
the most popular Python libraries for automation and file manipulation in recent years,
which include the os, xml, urlparse and getpass libraries. It intends to create a tool set driven
by automation and web scraping.

We intend to implement and provide users with a tool set for saving multiple web pages and
documents, giving them a means to access files offline without the need to go and save each
required page manually. The tool set also includes scripts that can be used to scrape weather
data and the top daily news, as well as open multiple links through a simple script. This
documentation will also cover our workflow methodologies, explore previous work in the field,
and provide a Software Requirement Specification along with the design of the proposed system.

OBJECTIVES
The objectives of the implementation are as follows:
• Implementation of a multiple-link opener that automates opening links one by one in a browser.

• Scraping the top news headlines from the Google News web page.

• Scraping weather information from OpenWeatherMap.

• Automating form fill-ups and logins through social networking site login automation.

• Automatically downloading the desired number of images corresponding to the keyword and the number of images entered by the user.

• Providing a working interface to the user based on menu selection, with properly sanitized user inputs.

PROPOSED METHODOLOGY

Automation replaces human work in repetitive, tedious tasks and minimizes the number of
errors. With the right automation tools, it is possible to perform all of these actions automatically. Web
automation is the process of automatically performing operations in a web browser, in order to
achieve speed and efficiency levels that would not be possible with human intervention alone.

These types of tasks can include:

• Filling out forms and fields
• Scraping content from a web page
• Extracting and transferring data between applications
• Clicking buttons and elements

Data scraping is a term used to describe the extraction of data from an electronic file using a
computer program. Web scraping describes the use of a program to extract data from HTML files
on the internet. Typically, this data is in the form of patterned data, particularly lists or tables.
Programs that interact with web pages and extract data use sets of commands known as
application programming interfaces (APIs). These APIs can be ‘taught’ to extract patterned data
from single web pages or from all similar pages across an entire web site. Alternatively,
automated interactions with websites can be built into APIs, such that links within a page can be
‘clicked’ and data extracted from subsequent pages. This is particularly useful for extracting data
from multiple pages of search results.

Furthermore, this interactivity allows users to automate the use of websites’ search facilities,
extracting data from multiple pages of search results and only requiring users to input search
terms rather than having to navigate to and search each web site first.
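
As a minimal illustration of extracting patterned data and walking through several pages of search results, the sketch below uses requests and Beautiful Soup. The URL, query parameters and CSS selectors are placeholders for this report, not anything taken from our implementation.

import requests
from bs4 import BeautifulSoup

def scrape_results(term, pages=3):
    results = []
    for page in range(1, pages + 1):
        # Hypothetical search endpoint that paginates with a ?page= parameter.
        resp = requests.get("https://example.com/search",
                            params={"q": term, "page": page}, timeout=10)
        soup = BeautifulSoup(resp.text, "html.parser")
        # Each result is assumed to sit inside an <li class="result"> element.
        for item in soup.select("li.result a"):
            results.append((item.get_text(strip=True), item.get("href")))
    return results

if __name__ == "__main__":
    for title, link in scrape_results("automation"):
        print(title, "->", link)

Only the search term changes between runs, so the user never has to navigate or search the site by hand.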

One can utilize data collected from websites such as e-commerce portals, job portals and social
media channels to understand customers' buying patterns, employee attrition behaviour,
customer sentiment and much more.
The most popular libraries and frameworks used in Python for web scraping are Beautiful
Soup, Scrapy and Selenium.

Scrapy is an open-source framework that has built-in support for extracting data from HTML
web pages. It is written in Python and can work on large data sets. The biggest advantage of
Scrapy is that it is built on Twisted, an asynchronous networking library, which
allows it to move on to another task before the earlier task is completed. This makes the
performance more efficient.
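
A minimal Scrapy spider in this style is sketched below; the start URL and selectors are placeholders rather than part of our toolbox.

import scrapy

class HeadlineSpider(scrapy.Spider):
    name = "headlines"
    # Placeholder start URL; a real spider would point at the site being scraped.
    start_urls = ["https://example.com/news"]

    def parse(self, response):
        # Each headline is assumed to live inside an <li class="headline"> element.
        for item in response.css("li.headline"):
            yield {
                "title": item.css("a::text").get(),
                "link": item.css("a::attr(href)").get(),
            }

Such a spider could be run with a command like "scrapy runspider headline_spider.py -o headlines.json" to collect the yielded items into a JSON file.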

Beautiful Soup is a module for extracting information from an HTML page (and is much better
for this purpose than regular expressions). The Beautiful Soup module's name is bs4. It is easy to
install and work with. Typically the web page is first downloaded (for example with requests), after
which Beautiful Soup extracts the data, which can then be saved locally and analysed according to your
requirements. Even a simple HTML file involves many different tags and attributes, and matters quickly
get confusing with complex websites. Thankfully, Beautiful Soup makes working with HTML much easier.
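
The short, self-contained sketch below shows this kind of tag and attribute navigation with bs4; the HTML snippet is made up purely for illustration.

from bs4 import BeautifulSoup

# A made-up HTML snippet used only to illustrate navigating tags and attributes.
html = """
<html><body>
  <h1>Top stories</h1>
  <ul id="stories">
    <li><a href="/a">Story A</a></li>
    <li><a href="/b">Story B</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.h1.get_text())                 # heading text
for link in soup.find_all("a"):           # every anchor tag in the document
    print(link.get_text(), link["href"])  # link text and its href attribute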

Selenium is an open-source web-based automation tool, originally built to automate web
browser interaction; it can be driven from Python through its language bindings. It is composed of a
number of tools such as WebDriver, IDE and Grid. Its minimalist design makes it easy to include
as a component in large-scale applications, and it works well with JavaScript-heavy pages.
It is widely used for browser-based testing to keep up with possible changes. The setup of this
tool is fairly simple, and scraped data can be saved in a CSV file, which makes the data easier to understand.

The Selenium tool suite consists of Selenium WebDriver, an automation tool built for
automating in the browser, Selenium IDE, a record and playback tool, and Selenium Grid, a
parallel testing tool. All components of the Selenium suite are open-source tools for
automation of websites and web applications. WebDriver is an API that can automate any
action that takes place in a web browser. It drives browser actions natively, meaning that it
interacts with browser elements, like clicking on buttons, typing text in fields, etc., like a real
user would.

Selenium WebDriver is operated by writing code in a programming language, such as C#, JavaScript, PHP or Python.
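
The sketch below shows WebDriver driven from Python in the Selenium 3 style pinned later in this report; the target page and element names are placeholders, and geckodriver is assumed to be on the PATH.

from selenium import webdriver

# Assumes geckodriver is on PATH and Selenium 3; the page and field names
# below are placeholders for illustration only.
driver = webdriver.Firefox()
driver.get("https://example.com/login")
driver.find_element_by_name("username").send_keys("demo_user")
driver.find_element_by_name("password").send_keys("demo_pass")
driver.find_element_by_name("submit").click()   # click the login button like a real user
print(driver.title)                             # confirm which page was reached
driver.quit()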

EXPECTED OUTCOME

The outcome of our project should be a website with the following features:

• Store the required number of images in a folder named after the given keyword.

• Provide weather information as per user input.

• Display the top news from the Google News feed.

• Automate user login to Facebook.

• Open multiple links provided by the user (separated by spaces) in a new browser.
PROCESS MODEL
Waterfall model
In our project, the Automation and Scraping Toolbox, we have used the waterfall model as our
process model. Here the system development process cascades from one phase to another. It
consists of major phases such as system planning, problem analysis, system implementation,
system testing and maintenance. The whole process of software development is divided into
separate phases.

The sequential phases in Waterfall model are –

● Requirement Gathering and Analysis − All possible requirements of the system to be
developed are captured in this phase and documented in a requirement specification
document.

● System Design − The requirement specifications from the first phase are studied in this
phase and the system design is prepared. This system design helps in specifying
hardware and system requirements and helps in defining the overall system
architecture.

● Implementation − With inputs from the system design, the system is first developed in
small programs called units, which are integrated in the next phase. Each unit is
developed and tested for its functionality, which is referred to as Unit Testing.

● Integration and Testing − All the units developed in the implementation phase are
integrated into a system after testing of each unit. Post integration the entire system is
tested for any faults and failures.
● Deployment of system − Once the functional and non-functional testing is done; the
product is deployed in the customer environment or released into the market.

● Maintenance − There are some issues which come up in the client environment. To fix
those issues, patches are released. Also, to enhance the product some better versions
are released. Maintenance is done to deliver these changes in the customer
environment.

Justification (for opting for the waterfall model):


● In this Waterfall model, typically, the outcome of one phase acts as the input for the
next phase sequentially.

● The main purpose of using this model is that we have already established the
requirements and objectives of our project.

● Various related products and websites are already present in the market and our aim is
to provide a better and optimized version which can be useful to users. Thus, the
waterfall model becomes our prime choice.

● Moreover, this model also makes us follow all the phases systematically preventing us
from missing anything.

Why not choose other process models:


Prototyping model:

● This model is usually preferred when there may be too much variation in requirements
(which is not the case for us), increasing the complexity of the system.
● There may be sub-optimal solutions because of a hurry to build prototypes to provide to
the client.
● There may be incomplete or inadequate problem analysis.

Rapid application development:

● If we are not committed to delivering software on time, RAD projects can fail.
● Progress and problems are hard to track, as there is no documentation
to demonstrate what has been done.
● Requires highly skilled designers or developers and we are making this project for
learning purposes.
● Not all applications are compatible with RAD.

Spiral model:
● It is not suitable for small projects like ours as it is expensive.
● It is much more complex than other SDLC models. Process is complex.
● Spiral may go on indefinitely.
● End of the project may not be known early.
● It is not suitable for low-risk projects like this.

WORK BREAKDOWN STRUCTURE


SOFTWARE REQUIREMENT SPECIFICATION (SRS)

FUNCTIONAL REQUIREMENTS:
The functional requirements for our project automation and scraping toolbox are defined as
follows:

• Proper menu-based interface and proper tool-selection inputs.

• News scraper that extracts and presents the top 5 headlines, separated by user inputs

• Image scraper that creates a folder and stores images based on the keyword and number
of images provided by the user.

• Weather scraper that extracts the weather conditions of the location specified by the user
and presents it in a readable format.

• Multiple link opener that automates a user browser to open several links (separated
by space) provided by the user.

• Automated Facebook login that opens a browser, logs in with the given credentials
and presents the home page.

1. INTRODUCTION
1.1 Purpose
The purpose of this document is to describe a set of scraping tools that automate user tasks
such as filling forms, opening multiple tabs based on the user's search input, and downloading
multiple images from a website.

1.2 Document Conventions


The document uses the following convention:
Sel – Selenium Module
SCB– Selenium Controlled Browser
1.3 Intended Audience and Reading suggestions
This set of tools can be used by any user to automate daily tasks that require multiple-tab
access and downloading or extracting data from the internet for professional or personal needs.
It is useful for web programmers as well as graphic designers.

1.4 Project Scope


The programs can be used by any user to automate daily tasks that require multiple-tab access
and downloading or extracting data from the internet on a daily basis for professional or
personal needs. The programs are written in Python or driven through Sel, which is accessed
through the Firefox browser. The main code can be changed to accommodate any form of data or
website the user needs to access. The toolbox provides a much faster way to do tasks that would
otherwise take huge amounts of time manually, and it can also perform tasks that would be too
hectic to carry out by hand.

1.5 References
http://automatetheboringstuff.com/
https://hackernoon.com/building-a-web-scraper-from-start-to-finish-bb6b9538818

2 . OVERALL DESCRIPTION

2.1 Product Perspective and features.

The system after running the program would perform the following features:
• Store required number of images in a folder named after the keyword
required.
• Provide weather information as per user input
• Display the top news on google news feed
• Automate user login in Facebook
• Open multiple links provided by the user (separated by space) on a new
browser.
Filling out forms automatically.
The browser can automatically locate login fields and then login as a user as soon as the user
loads the respective page.

Automated clicking and submitting input


The browser will automatically perform keystrokes depending on the code of the user. It will be
achieved through Selenium.

2.2 User class and characteristics.


The main program will consist of only one major user class which would be the person who runs the
tools for automation. They have the ability to access the main code and change the URLs as well as the
given values for adjusting the changes they need to be made for their next scraping process.

The customer should be able to do the following functions:

Python functions
1. Crawl websites, extract data and store it offline if required.
2. Use weather APIs to scrape weather information
3. Provide a user-controlled menu interface

SCB functions
1. Automate text box and form fill-ups in websites
2. Open a new instance of a browser (or new tab if pre-opened)
3. Mimic user keystrokes and interactions
4. Navigate through the browser interface (tabs)

2.3 Operating Environment


Frontend: Command line interface (CLI)

Operating system: Windows

Browser: Firefox

External modules: Selenium

Scripting language: Python


2.4 Design and implementation constraints
The design is constrained by the response procedure for downloading data such as images and
plain text, their storage location, the format of the saved files and their grouping.

The browser-based scripting limits the use of an external front-end and back-end system, as
most of the functionality involves opening other websites and loading browser modules,
which would be ineffective to merge with a full-stack system.

2.5 Assumptions and Dependencies


We make the following assumptions about the websites the tool is used on:

• The websites accessed for scraping should be general ones that do not consist of
millions of subsections and subdomains, which would result in a huge overhead.

• Scraping is done on websites that allow such a procedure; many websites do not
permit scraping through their guidelines and rules.

• Automation is done on websites that do not run bot checkers and crawlers, which
would otherwise often restrict the login functionality through captcha and
"I'm a robot" checks.

Gantt Chart:
Pert Chart:

Timeline chart:
3. SYSTEM FEATURES

3.1 Description and Priority


The program's highest priority is properly downloading images and other data in high
resolution without skipping anything. This requires the code to be very
efficient; otherwise it will experience congestion under heavy traffic.

3.2 Stimulus/Response Sequences


• Open web pages according to the data entered in the search fields.
• Download images from complete sites or by crawling through websites.
• Correctly input data into login fields and automate keystrokes.

4. EXTERNAL INTERFACE REQUIREMENTS

4.1 User Interfaces


• Python GUI for code execution

4.2 Hardware Interfaces


• Windows
• Browser which supports Sel like Firefox

4.3 Software Interfaces

Software: Windows
Description: The operating system we use, as it is the most user-friendly operating system with the largest amount of compatible software.

Software: BeautifulSoup (BS4)
Description: A parsing library that can use different parsers. The advantage of BS4 is its ability to automatically detect encodings and navigate parsed documents to extract the needed data.

Software: Selenium module – Gecko
Description: The most popular tool for automating browsers, primarily for testing. It is one of the easiest testing and automation tools to use with Python.

Software: Scrapy
Description: An open-source and collaborative framework for extracting the data you need from websites; we are using it because it is a fast, simple, yet extensible way to scrape data.

Software: Firefox
Description: Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. Firefox uses the Gecko layout engine to render web pages.

Software: urlparse, urllib, requests, os and re
Description: A set of Python libraries that manipulate HTTP requests and the system directories; re is used to identify patterns.
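
As a small illustration of how these standard-library pieces fit together, the sketch below parses a URL with urlparse, pulls a pattern out of it with re and creates a directory with os; the URL itself is only an example, not one used by the toolbox.

from urllib.parse import urlparse
import os
import re

url = "https://news.example.com/articles/2020/index.html?id=42"
parts = urlparse(url)                       # split the URL into components
print(parts.netloc, parts.path)

# re identifies simple patterns, e.g. pulling the year out of the path.
match = re.search(r"/(\d{4})/", parts.path)
if match:
    # os handles directory work, e.g. a folder per year for saved pages.
    os.makedirs(match.group(1), exist_ok=True)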

Specific versions used for the project:


Python: Version 3.7.7
Firefox: Firefox 77

Selenium: Version 3.141.59

4.4 Communication Interfaces


The SCB supports Firefox browser integration, but the web scraping
functionality and the combined Electron-based front end and back end
support all operating systems and browsers.
5. NON-FUNCTIONAL REQUIREMENTS

5.1 Performance Requirements


The program should not become clogged or experience bottlenecks during
the automation process, as this could result in a deadlock and render the
whole automated process useless.

5.2 Safety Requirements


The programs should in no way act as a suspicious bot on the web, which
can lead to IP blocks. They should also not tamper with sensitive website
data, as that would violate webpage policies.

5.3 Security Requirements


The data extracted from the website should not be visible to anyone else.
There should be no form of personal information leakage or data leakage during
the automated scraping process.

5.4 Software Quality Attributes

• Availability: The tools should run on most websites and be compatible with popular OS distributions.

• Correctness: The tools should correctly scrape data relevant to the website.

• Maintainability: The scraper should use the latest and most optimized Python libraries for minimum execution time.

• Usability: The tools should display proper logs in case of an error and should also sanitize user input.
6. CONTEXT DIAGRAM

7. DFD (DATA FLOW DIAGRAM)


8. STATE CHART DIAGRAM
SOFTWARE DESIGN SPECIFICATION (SDS)

1.INTRODUCTION

1.1.Purpose
This document defines the design of our project, the Automation and Scraping Toolbox. It
contains specific information about the expected input, output, classes and overall functions.
The interactions between the classes needed to meet the desired requirements are outlined in
detailed figures at the end of the document.

1.2.System Overview
In this project, we intend to implement and provide a tool set to users with the purpose of
saving multiple web pages, documents and means to access files offline without the need to
individually go and save each and every required page manually. The tool set also includes
scripts that can be used to scrape weather data, top daily news as well as open multiple links
through a simple script.

1.3.Design Map
1.4.Definition and Acronyms

Automation: Application of technology, programs, robotics or processes to achieve outcomes
with minimal human input.

Scraping: Extraction of data from a website.

Main program and subroutine architecture: This form of architecture decomposes the
main program structure into a number of subprograms.

BS4: Beautiful Soup (module for parsing HTML)

2.DESIGN CONSIDERATIONS

2.1.Assumptions
● All the tools used for designing purposes provide good results and are free to use.

● Project will follow waterfall methodology throughout execution.

● The solution will utilize call and return architecture.

2.2.Constraints
● Complete the design within the deadline. We have a lot of ideas but cannot implement
them all due to time constraints.

● Limited free tools and resources are available for implementation and use.

● Make a user-friendly design interface with fewer screens (or pages) and more usage.

2.3.System Environment

● UML diagrams:
For these diagrams we have used Edraw Max software.

● Frontend:

The front-end is designed using HTML, CSS and JavaScript. We plan to use Electron (text
Editor) software (tool) for this purpose.

● Script:

The main script is written using python language in a python compiler.

2.4.Design Methodology
Many software development projects have been known to incur extensive and costly design
errors. The most expensive errors are often introduced early in the development process. This
underscores the need for better requirement definition and software design methodology.
Software design is an important activity as it determines how the whole software development
task would proceed including the system maintenance. A methodology can be defined as the
underlying principles and rules that govern a system. A method can be defined as a systematic
procedure for a set of activities. Thus, from these definitions, a methodology will encompass the
methods used within the methodology. This can be implemented by following four components:

1. a conceptual model of constructs essential to the problem,

2. a set of procedures suggesting the direction and order in which to proceed,

3. a series of guidelines identifying things to be avoided, and

4. a collection of evaluation criteria for assessing the quality of the product.

A software design methodology can be structured as consisting of the software design process
component and the software design representation or diagrammatic component. The evolution
of each software design needs to be meticulously recorded or diagrammed, including the basis
for choices made, for future walk-throughs and maintenance.

2.5.Risks and Volatile Areas


The current tool-set has certain limitations as follows:
• It requires a new code for every different website scraped.

• It cannot perform automation and scraping together.

• The program may freeze or bottleneck when the input is too large.

• Multiple websites cannot be scraped together.

• It is heavily reliant on simple website architectures (light use of JavaScript and its
frameworks)

3.ARCHITECTURE

3.1.Overview
This section elaborates the architecture used by the project. We will use the Call and Return
architecture for our project. Our program involves a variety of functional calls and returns and
we require a robust architecture that is easy to modify.

3.2.Subsystem, Component, or Module

This architecture style allows us to achieve a program structure which is easy to modify.

The following sub-styles exist in this category:

1. Main program or subprogram architecture

● The program is divided into smaller pieces hierarchically.

● The main program invokes a number of program components in the hierarchy, and these
components are in turn divided into subprograms.

2. Remote procedure call architecture

● The main program or subprogram components are distributed across a network of multiple
computers.

● The main aim is to increase performance.


Our project focuses on Main program and subroutine calls. The architecture diagram is shown
below:

3.3.Strategy
The strategies or working of our project architecture with respect to the call and return
architecture is as follows:

1. The user executes the python application which is the main program

2. According to user input, a sub-program is executed. This order may differ from one
user to another; the architecture is robust to differences in the order of routine calls.

3. These sub-programs then access their specific libraries and invoke calls among
themselves.

4. This process takes place continuously until the user decides to exit the program; a minimal sketch of this dispatch loop follows.
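
The sketch below illustrates the call-and-return dispatch described above, assuming a simple console menu. The option numbers and function bodies are placeholders rather than the project's actual subprograms.

# A minimal, illustrative dispatch loop; the functions below stand in for the
# toolbox subprograms and are not the actual implementation.
def link_opener():
    print("...open the supplied links here...")

def weather_report():
    print("...fetch and print the weather here...")

def main():
    menu = {"1": link_opener, "2": weather_report}
    while True:
        choice = input("1) Link opener  2) Weather report  q) Quit: ").strip()
        if choice == "q":
            break
        subroutine = menu.get(choice)
        if subroutine:
            # the main program calls the chosen subprogram ...
            subroutine()
        # ... and control returns to the menu until the user exits

if __name__ == "__main__":
    main()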

4.DATABASE SCHEMA

4.1.Tables, Fields and Relationships


This section defines all the tables and their respective fields being implemented in the
project.
This figure describes all the relationships between different entities(modules) of our
project.

4.1.1. Databases
For the implementation of automation and scraping, data from existing websites is
required.

4.1.2. New Tables


The required tables for automation and scraping toolbox are:
● Facebook Login
● News Scraper
● Image Scraper
● Link Opener
● Weather Report

4.1.3. New Fields


The required fields of the tables listed above are:

➔ Facebook Login
◆ Username
◆ Password

➔ News Scraper
◆ Article title
◆ Article link
◆ Publishing date

➔ Image Scraper
◆ Keyword
◆ Image URL
◆ No. of images

➔ Link Opener
◆ Prefix
◆ URL

➔ Weather Report
◆ Location
◆ Temperature
◆ Humidity
◆ Wind
◆ Status

4.1.4. Fields Changes

In the Image Scraper table, another field can be added to store the location or path on the
user's device where the images are to be downloaded and stored.

4.1.5. All other changes

All the above listed tables and fields are sufficient for efficient working of the project
and thus no additional changes are made.

4.2.Data Migration
● Premigration planning. Evaluate the data being moved for stability.

● Project initiation. Identify and brief key stakeholders.


● Landscape analysis. Establish a robust data quality rules management process and
brief the business on the goals of the project, including shutting down legacy
systems.

● Solution design. Determine what data to move, and the quality of that data before
and after the move.

● Build & test. Code the migration logic and test the migration with a mirror of the
production environment.

● Execute & validate. Demonstrate that the migration has complied with requirements and
that the data moved is viable for business use.

● Decommission & monitor. Shut down and dispose of old systems.

5.HIGH LEVEL DESIGN


This section consists of the high-level design. It briefly describes all platforms, systems,
products, services and processes that the project depends on, and includes any important changes that
need to be made to them. This is illustrated by the following UML diagrams.
5.1.Use-case Diagram
5.2.Class Diagram
5.3.Sequence Diagram

5.4.Collaboration Diagram
5.5.Activity Diagram

5.6.Component Diagram
5.7.Deployment Diagram

6.LOW LEVEL DESIGN

This section contains the low-level design of our project, which refers to the component-level
design process. It describes each and every module, meaning it includes the actual logic for every
system component and goes deep into each module's specification. It is also known as micro-level
or detailed design. It converts the above high-level solution into a detailed solution.

6.1.Selenium Automation

Selenium is a free (open-source) automated testing framework used to validate web
applications across different browsers and platforms. We plan to use Python to create Selenium
test scripts.

Selenium Grid is a tool used together with Selenium RC to run parallel tests across different
machines and different browsers all at the same time. Parallel execution means running
multiple tests at once.

Features:

● Enables simultaneous running of tests in multiple browsers and environments.


● Saves time enormously.
● Utilizes the hub-and-nodes concept. The hub acts as a central source of Selenium
commands to each node connected to it.

6.2.Scraping

Web scraping, also called "crawling" or "spidering", is the technique of gathering data
automatically from an online source, usually a website. While web scraping is an easy
way to get a large volume of data in a relatively short time frame, it adds stress to the server
where the source is hosted.

This is also one of the main reasons why many websites do not allow scraping everywhere on their
site. However, as long as it does not disrupt the primary function of the online source, it is
fairly acceptable.

Web scraping can help us extract an enormous amount of data about customers, products,
people, stock markets, etc. We are using web scraping in this project to perform the following
functions:

● Image scraping for searching and downloading images.

● News scraping for the top 4-5 headlines.

6.3.Weather API

Weather impacts nearly every area of our lives – Weather conditions such as fog, rain, and
snow impact commutes to work and travels to places in other parts of the world.

Weather APIs are Application Programming Interfaces that allow you to connect to large
databases of weather forecast and historical information.
In this project we will use OpenWeatherMap. The OpenWeatherMap API currently provides a
wide variety of weather data including (but not limited to) current weather, forecasts,
historical data, weather stations, and weather alerts.
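
A minimal sketch of querying OpenWeatherMap is given below; it assumes pyowm 2.7.1 (the version pinned later in this report) and uses the same calls as our implementation, with a placeholder API key and location.

import pyowm

# Assumes pyowm 2.7.1 and a valid OpenWeatherMap API key; the key and
# location below are placeholders.
owm = pyowm.OWM('YOUR_API_KEY')
observation = owm.weather_at_place('Delhi,IN')
weather = observation.get_weather()
print('Temperature:', weather.get_temperature('celsius')['temp'], 'C')
print('Humidity:', weather.get_humidity(), '%')
print('Wind speed:', weather.get_wind()['speed'], 'm/s')
print('Status:', weather.get_detailed_status())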

7.USER INTERFACE DESIGN


This section contains the interfaces of our project which focuses on looks and styles. We aim to
create interfaces which users find easy to use and pleasurable. The detailed description of each
interface screen is provided below.

7.1. Application Controls


All the controls which are common to each screen are listed and described here.

7.1.1. Navigation Bar

A navigation bar (or navigation system) is a section of a graphical user interface


intended to aid visitors in accessing information. This bar includes components as
follows:

· Title: The Title or logo of our website.

· Home button: A button which when clicked takes the user to the home page
of our website.

· About us: A button which takes us to the about us page which consists of the
information about our site and explains its features.

· Tasks: A drop down menu which lists the 5 tasks of our project and each of
these takes the user to the respective task pages on our site.

7.1.2. Footer

The website footer is the section of content at the very bottom of a web page. It
contains the following components:
· Logo: The logo of our website.

· Contact Information: It contains contact details like a business email, phone


number, or mailing address.
·Social Media Icons: It Includes social profile links.

7.2.Home Page

The home page is the default page which opens up when the user opens the website. It can
also be opened by clicking on the home button present in the navigation bar at top when the
user is on another page.

It contains the list of main 5 tasks:

· Multiple Link Opener

· Weather Report

· Image Scraper

· News Scraper

· Facebook Login

All these tasks are present as buttons which link the user to separate respective task pages for
providing the respective output.

7.3.Multiple Link Opener


This page performs the first task that is opening the links input by the user. It will contain a
form with following components:

· Options: To choose whether the link to be entered contains a prefix or not.


· Link: Takes the input of link or multiple links from users separated by spaces.

· Submit: Button to submit the form.


A new screen will appear displaying the output that the links will be opening and the
links will open up in new tabs in Firefox browser.

7.4.Weather Report

It will contain a form with a single input that is location. It will take the input of location from
the user.

A submit button is present which when clicked will read the input and display the weather
report consisting of the temperature, humidity, wind, etc.

7.5.Image Scraper

This page performs the third task, i.e., scraping images from Google. It will contain a form with
following components:

· Keyword: Input of the keyword on which images are to be searched.

· Number of images: Input of an integer specifying the number of images to be


downloaded.

· Submit: Button to submit the form.

After submitting a new screen will appear showing us the directory where the images are
downloaded. By opening the respective directory we can view the downloaded images.

7.6.News Scraper
This page performs the fourth task, i.e., scraping news from Google.

It will contain the top news headlines and links to read the full article. The headlines would be
of the present date when the user clicks on the task.
7.7.Facebook Login Automation

This page performs the final task i.e., automating Facebook login. It will contain a form with
following components:

· Email: Input the Facebook email or username from the user.

· Password: The password for Facebook login.

· Submit: Button to submit the form.

After submitting, a new tab with Facebook site will open with the login credentials already
filled in.

8.GUI PROTOTYPING
Prototyping is the process of creating an interactive model of the software in order to show
future users how the program interface runs. One of its advantages is the possibility of getting
immediate feedback from the users. Prototyping makes the dialogue between all participants
of the development process more concrete and makes their cooperation more effective.

The goals of prototyping are not limited to solving communication problems. One of the most important
tasks is improving user interface quality. Graphical user interfaces (GUIs) should be user-friendly. When
using prototyping technologies, the user transforms from a passive consumer into an active participant in the
software development process. Before system development (at the design step), users can easily visualize
all their wishes and ideas.

Home Page
Multiple Link Opener

Weather Report

Image Scrapper
News Scrapper

Facebook Login Automation

FULL PROTOTYPING:

Click on the link below to view full prototyping:

https://xd.adobe.com/view/61c42d72-e323-4d02-a266-e593c4d5648b-3497
/?fullscreen&hints=off
IMPLEMENTATION
CODE:
from flask import Flask, redirect, request, url_for, render_template
import pyowm
import requests
from selenium import webdriver
from getpass import getpass
import xml.etree.ElementTree
from urllib.parse import urlparse
from bs4 import BeautifulSoup
from simple_image_download import simple_image_download as simp
import urllib3

app = Flask(__name__)

# Note: this request and soup object are not used by the routes below.
http = urllib3.PoolManager()
url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')

# Holds the most recent headlines fetched by the news scraper (task 4).
news = {i: {'title': '', 'link': '', 'pubDate': ''} for i in range(1, 6)}
links = {i: news[i]['link'] for i in range(1, 5)}


@app.route("/")
def home():
    return render_template("index.html")


@app.route('/task2')
def task2():
    return render_template('task2.html')


@app.route("/task2/result", methods=['POST'])
def task2_results():
    # Weather report: look up the user's location through the OpenWeatherMap API.
    owm = pyowm.OWM('3282e54688306ae3571e778e51429885')
    s = request.form['location']
    observation = owm.weather_at_place(s)
    weather = observation.get_weather()
    temperature = weather.get_temperature('celsius')['temp']
    wind = weather.get_wind()['speed']
    humid = weather.get_humidity()
    status = weather.get_detailed_status()
    return render_template("task2_result.html", location=s, temp=str(temperature),
                           wind=str(wind), humid=str(humid), status=str(status))


@app.route('/task1')
def task1():
    return render_template('task1.html')


@app.route('/task1', methods=['POST'])
def task1_option():
    # Multiple link opener: open every space-separated link in a new tab.
    option = request.form['option']
    link = request.form['link']
    l = link.split()
    if option == "2":
        # Option 2: the user entered bare domains, so add the https://www. prefix.
        elements = ['https://www.' + e for e in l]
    else:
        # Option 1: the links already carry their prefix.
        elements = l
    driver = webdriver.Firefox()
    driver.get("https://www.google.com")
    page_number = 1
    for page in elements:
        driver.execute_script("window.open('');")
        driver.switch_to.window(driver.window_handles[page_number])
        driver.get(page)
        page_number += 1
    return render_template('task1_result.html')


@app.route('/task5')
def task5():
    return render_template('task5.html')


@app.route("/task5", methods=['POST'])
def task5_result():
    # Facebook login automation: fill the credentials into the login form.
    username = request.form['username']
    password = request.form['password']
    driver = webdriver.Firefox()
    driver.get('https://facebook.com')
    driver.find_element_by_id('email').send_keys(username)
    driver.find_element_by_id('pass').send_keys(password)
    return render_template('task5_result.html')


@app.route("/task3")
def task3():
    return render_template('task3.html')


@app.route("/task3", methods=['POST'])
def task3_result():
    # Image scraper: download the requested number of images for the keyword.
    key = request.form['key']
    num = int(request.form['num'])
    downloader = simp.simple_image_download
    imageurl1 = downloader().urls(key, num)
    try:
        downloader().download(key, num)
        imageurl1 = downloader().urls(key, num)
    except Exception:
        pass
    return render_template('task3_result.html', imageurl=imageurl1)


@app.route('/task4')
def task4():
    # News scraper: read the top four items from the Google News RSS feed.
    news_url = "https://news.google.com/news/rss"
    xml_page = requests.get(news_url).content
    e = xml.etree.ElementTree.fromstring(xml_page)
    p = 1
    for it in e.iter('item'):
        if p > 4:
            break
        news[p]['title'] = it.find('title').text
        news[p]['link'] = it.find('link').text
        news[p]['pubDate'] = it.find('pubDate').text
        p = p + 1
    # Refresh the link map now that the headlines have been fetched.
    for i in range(1, 5):
        links[i] = news[i]['link']
    return render_template('task4.html', news=news, links=links)


if __name__ == "__main__":
    app.run(debug=True)

1) Home Page

2) Multiple Link Opener

3) Weather Report

4) Image Scraper

5) News Scraper

6) Facebook Login Automation
TEST REPORT
TEST CASE SPECIFICATION IDENTIFIER
We have used test cases from Test Case ID TC_01 to Test Case ID TC_11.

TEST ITEMS
We have tested the following items for the various types of inputs specified in the table:

● Multiple Link Opener: Multiple link opener that automates a user browser to open
several links (separated by space) provided by the user.

● Program Termination from menu: The program exits and returns to the Main Menu

● Weather Report: The current temperature, Wind speeds, Humidity and status of the
input location

● Image Scraper: Image scraper that creates a folder and stores images based on the
keyword and number of images provided by the user.

● Facebook login automation: Automated Facebook login that opens a browser, logs in
with the given credentials and presents the home page.

INPUT SPECIFICATIONS
Specified in table below.

OUTPUT SPECIFICATIONS
Expected output and actual output both are specified in the table below with their
status i.e., pass or fail.

ENVIRONMENTAL NEEDS
Hardware
For Python:
● Processors: Intel Atom® processor or Intel® Core™ i3 processor.
● Disk space: 1 GB.
● Operating systems: Windows* 7 or later, macOS, and Linux.

For Firefox (For selenium testing):


● Pentium 4 or higher processor with SSE2 support
● 512MB of RAM
● 200MB of available hard drive space.

Software
● BeautifulSoup (BS4):

It is a parsing library that can use different parsers. Advantage of BS4 is its
ability to automatically detect encodings and navigate parsed documents
to extract needed data.

● Selenium module – Gecko:


It’s the most popular tool for automating browsers primarily for testing.

● Scrapy:
An open source and collaborative framework for extracting the data you
need from websites.

● Firefox:
Mozilla Firefox, or simply Firefox, is a free and open-source web browser
developed by the Mozilla Foundation and its subsidiary, Mozilla Corporation.

Other:
● Python GUI for code execution

● Stable Internet Connection

● Access to APIs
● Login credentials of Facebook

SPECIAL PROCEDURAL REQUIREMENTS


Before running the code, some prerequisite libraries and packages need to be installed for
Python.

These libraries can be downloaded by running the following commands in the terminal:

● pip install selenium

● pip install urllib3

● pip install contextlib2

● pip install beautifulsoup4

● pip install pyowm==2.7.1

● pip install simple-image-download

● pip install Flask

● python index.py

INTER-CASE DEPENDENCIES

Each of the test cases stands on its own; they are not dependent on any other test cases.
Thus, in order to carry out a test case we do not need to perform any prior test case, and there
are no inter-case dependencies present in this test report. We only need to satisfy the
pre-requisite requirements as stated in the table below.

TEST REPORT TABLE

Test Case ID: TC_01
Objective: Multiple Link Opener
Pre-requisite: Connection to Internet
Input Data: Pressing button 1, 1, youtube.com
Expected Output: A Firefox browser with the first tab as a Google search, followed by the user input links
Actual Output: We get the error message that this URL doesn't exist
Status: Fail

Test Case ID: TC_02
Objective: Multiple Link Opener
Pre-requisite: Connection to Internet
Input Data: Pressing button 1, 2, youtuuuuube.com
Expected Output: A Firefox browser with the first tab as a Google search, followed by the user input links
Actual Output: We get the error message that this URL doesn't exist
Status: Fail

Test Case ID: TC_03
Objective: Program Termination from menu
Pre-requisite: Connection to Internet
Input Data: Using Navbar to exit
Expected Output: The program exits and returns to the Main Menu
Actual Output: The program exits and returns to the Main Menu
Status: Pass

Test Case ID: TC_04
Objective: Multiple Link Opener
Pre-requisite: Connection to Internet
Input Data: Pressing Button 1, 1, https://www.youtube.com https://www.coursera.com https://www.gmail.com
Expected Output: A browser with 4 tabs, the first corresponding to a Google search followed by the links given as input
Actual Output: We get a Firefox browser (with a robot icon to denote the browser is automated) with all the correct links opened. We then get the prompt to exit and return to the main menu
Status: Pass

Test Case ID: TC_05
Objective: Multiple Link Opener
Pre-requisite: Connection to Internet
Input Data: Pressing Button 1, 2, youtube.com gmail.com
Expected Output: A Firefox browser with the first tab as a Google search, followed by the user input links
Actual Output: We get the browser (with the robot icon denoting automation) along with the desired links
Status: Pass

Test Case ID: TC_06
Objective: Weather Report
Pre-requisite: Connection to Internet and access to weather API
Input Data: Pressing Button 2, Delhi
Expected Output: The current temperature, wind speed, humidity and status of the input location
Actual Output: It gives us the relevant weather information pertaining to the user input
Status: Pass

Test Case ID: TC_07
Objective: Weather Report
Pre-requisite: Connection to Internet and access to weather API
Input Data: Pressing Button 2, Dveuwgeuvhw
Expected Output: The current temperature, wind speed, humidity and status of the input location
Actual Output: We get the error message along with a prompt to press any key to continue
Status: Fail

Test Case ID: TC_08
Objective: Image Scraper
Pre-requisite: Files should be available
Input Data: Pressing Button 3, dog, 5
Expected Output: The CLI should output the image links of the keyword and a folder should be created with 5 dog images
Actual Output: The CLI outputs the image links of the keyword and a folder is created with 5 dog images
Status: Pass

Test Case ID: TC_09
Objective: Image Scraper
Pre-requisite: Files should be available
Input Data: Pressing Button 3, ddfwe, 5
Expected Output: The CLI should output the image links of the keyword and a folder should be created with 5 images of the keyword
Actual Output: The CLI outputs the image links of the keyword and a folder is created with 5 images related to the keyword
Status: Pass

Test Case ID: TC_10
Objective: News Scraper
Pre-requisite: Connection to Internet
Input Data: Pressing Button 4
Expected Output: News headlines, up to 5, and then a redirect to the main menu
Actual Output: We get our articles along with the date and time of their publishing
Status: Pass

Test Case ID: TC_11
Objective: Facebook login automation
Pre-requisite: Connection to Internet, Facebook ID and Password
Input Data: Pressing Button 5, (personal login credentials)
Expected Output: Home screen of my logged-in Facebook
Actual Output: It was able to open a browser and log in through my credentials
Status: Pass

CONCLUSION
Given the right tools, automating computer operations can be surprisingly easy
and can reap major benefits. Understanding these benefits—and some obstacles—
can help one develop support for a project or task. Automation can lead to cost
reduction and improved productivity, availability, reliability, and performance.

Automation technology, if used wisely and effectively, can yield substantial


opportunities for the future. There is an opportunity for future automation
technologies to provide a growing social and economic environment in which
humans can enjoy a higher standard of living and a better way of life.

Through this project, we were successfully able to implement automation tools for
common tasks that users normally perform manually. We used two popular
tools, Python and Selenium, to bring the concept of automation into daily life.
These tools drastically decrease the time required to do tasks when compared to
their manual implementations.

LIMITATIONS:
The current tool-set has certain limitations as follows:

• It requires a new code for every different website scraped.

• It cannot perform automation and scraping together.

• The program may freeze or bottleneck when the input is too large.

• Multiple websites cannot be scraped together.

• It is heavily reliant on simple website architectures (light use of JavaScript and its frameworks)

Scope for future work:

With rapid development in artificial intelligence (AI) and robotics technology,


automation is at a tipping point. Today, robots can perform a slew of functions
without considerable human intervention. Automated technologies are not only
executing iterative tasks, but also augmenting workforce capabilities significantly. In
fact, automated machines are expected to replace almost half of the global
workforce. Multiple industries, from manufacturing to banking, are adopting
automation to drive productivity, safety, profitability, and quality.

There are already automated bots that interact with countless websites in a
human-like manner to discover any bugs or vulnerabilities.

There may soon be automated programmers that will just scrape data online and
build programs from scratch as per the desired input. Automated programs are
rapidly becoming more and more intelligent and are now able to perform a variety
of complex tasks thanks to advancements in deep learning and neural networks.

Automation will bolster connectivity and reliability in a hyper-competitive
ecosystem. The future of automation looks promising, where everything will be made
accessible and easily available.

