Web Technology, Unit 1
MCA-124
Rajkumar
Research Scholar,
I.T.C.A.
MMMUT, Gorakhpur
WWW is a collection of websites connected to the internet so that people can search
and share information. Now, let us understand how it works!
1. The Web follows the internet’s basic client-server model, as shown in the
following image.
2. Servers store web pages and transfer them to users’ computers over the
network when the users request them.
3. A web server is a software program that serves the web pages requested by web
users through a browser.
4. The computer of a user who requests documents from a server is known as the
client.
Syntax
XML (Extensible Markup Language) provides a common syntax for the Semantic Web.
Data Interchange
The Resource Description Framework (RDF) defines the core representation of data
for the Web. RDF represents data about resources in graph form, as sets of
subject-predicate-object triples.
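To make the graph form concrete, the sketch below models a tiny RDF-style graph as a set of subject-predicate-object triples in plain Python. The resource and property names (`ex:Alice`, `ex:knows`, etc.) are invented placeholders for the example, not real vocabulary terms.

```python
# Minimal sketch: an RDF-style graph as a set of
# (subject, predicate, object) triples. The names below are
# invented placeholders, not a real RDF vocabulary.
graph = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Bob",   "rdf:type", "ex:Person"),
}

# Each statement about a resource is one edge in the graph:
for s, p, o in sorted(graph):
    print(s, "--", p, "->", o)
```

Each triple is one edge of the graph, so adding a new fact about a resource is simply adding one more triple to the set.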
Taxonomies
RDF Schema (RDFS) allows a more standardized description of taxonomies and other
ontological constructs.
Ontologies
The Web Ontology Language (OWL) offers more constructs than RDFS. It comes in
the following three versions:
1. OWL Lite for taxonomies and simple constraints.
2. OWL DL for full description-logic support.
3. OWL Full for maximum syntactic freedom of RDF.
Rules
RIF and SWRL offer rules beyond the constructs available from RDFS and OWL.
SPARQL (SPARQL Protocol and RDF Query Language) is an SQL-like language
used for querying RDF data and OWL ontologies.
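To give a flavour of what a SPARQL query does, the sketch below implements basic triple-pattern matching in plain Python. It mimics the idea behind SPARQL’s graph patterns only; it is not real SPARQL syntax or a real query engine, and the data and names are invented for the example.

```python
# Sketch of SPARQL-style basic graph-pattern matching in plain
# Python. Terms starting with "?" are variables; everything else
# must match exactly. Data and vocabulary are invented.
graph = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Bob",   "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
]

def match(pattern, graph):
    """Return one variable-binding dict per triple matching the pattern."""
    results = []
    for triple in graph:
        binding = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val      # bind the variable
            elif pat != val:
                break                   # constant term does not match
        else:
            results.append(binding)
    return results

# Analogue of: SELECT ?who WHERE { ?who rdf:type ex:Person }
people = match(("?who", "rdf:type", "ex:Person"), graph)
print([b["?who"] for b in people])  # ['ex:Alice', 'ex:Bob']
```

A real SPARQL engine additionally joins multiple patterns, handles filters, and works over large stores, but the core idea is this pattern-against-triples matching.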
Proof
The semantics and rules executed at the layers below the Proof layer, together
with their results, are used to prove deductions.
User Interface and Applications
On top of the stack, the User Interface and Applications layer is built for
user interaction.
How it works
The WWW works on a client-server approach. The following steps explain how the
Web works:
1. The user enters the URL (say, http://www.xyz.com) of the web page in the
address bar of the web browser.
2. The browser asks the Domain Name Server (DNS) for the IP address
corresponding to www.xyz.com.
3. After receiving the IP address, the browser sends a request for the web page
to the web server using the HTTP protocol, which specifies how the browser
and the web server communicate.
4. The web server receives the request via HTTP and searches for the requested
web page. If found, it returns the page to the web browser and closes the
HTTP connection.
5. The web browser receives the web page, interprets it, and displays the
contents of the web page in the browser’s window.
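Step 3 above can be made concrete: after the DNS lookup, the browser’s request is just a small block of text sent over the connection. The helper below builds such a minimal raw HTTP/1.1 GET request (a real browser sends many more headers; the host name is the example one from the steps).

```python
# Sketch: the raw HTTP/1.1 GET request a browser sends after
# resolving the host's IP address (step 3 above). Real browsers
# add many more headers; this is only the minimal shape.
def build_get_request(host: str, path: str = "/") -> str:
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Connection: close\r\n"   # ask the server to close after replying (step 4)
        f"\r\n"                    # blank line ends the header block
    )

print(build_get_request("www.xyz.com"))
```

The server’s reply has the same text-based shape: a status line (e.g. `HTTP/1.1 200 OK`), headers, a blank line, and then the page content that the browser renders in step 5.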
Search engines are answer machines. They exist to discover, understand, and organize
the internet’s content in order to offer the most relevant results.
A key motivation for designing Web crawlers has been to retrieve Web pages and add
their representations to a local repository.
I What is “Web Crawling”?
I What are the uses of Web Crawling?
I How are APIs used?
A Web crawler (also known as a Web spider, Web robot, or, especially in the
FOAF community, a Web scutter) is a program or automated script that browses
the World Wide Web in a methodical, automated manner. Other, less frequently
used names for Web crawlers are ants, automatic indexers, bots, and worms.
Crawlers are computer programs that roam the Web with the goal of automating
specific tasks related to the Web. The role of crawlers is to collect Web content.
Architecture of a web crawler
The figure shows the generalized architecture of a web crawler. It has three
main components: a frontier, which stores the list of URLs to visit; a page
downloader, which downloads pages from the WWW; and a web repository, which
receives web pages from the crawler and stores them in a database.
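The three components above can be sketched in a few lines of Python. To keep the sketch self-contained, the downloader is an injected `fetch` function rather than a real network client, and the page contents and link format are invented for the example.

```python
import re
from collections import deque

def crawl(seed, fetch, max_pages=10):
    """Minimal crawler sketch: frontier -> page downloader -> repository."""
    frontier = deque([seed])   # frontier: URLs still to visit
    repository = {}            # repository: url -> page content
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        if url in repository:
            continue           # skip pages already downloaded
        page = fetch(url)      # page downloader
        repository[url] = page
        # extract links and add new URLs to the frontier
        for link in re.findall(r'href="([^"]+)"', page):
            if link not in repository:
                frontier.append(link)
    return repository

# Stand-in for a real downloader (invented pages):
pages = {
    "http://a.example/": '<a href="http://b.example/">b</a>',
    "http://b.example/": "no links here",
}
repo = crawl("http://a.example/", pages.get)
print(sorted(repo))  # ['http://a.example/', 'http://b.example/']
```

A production crawler would replace `fetch` with an HTTP client, respect robots.txt, and use proper HTML parsing instead of a regex, but the frontier/downloader/repository loop is the same.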
Web Mining is the application of Data Mining techniques to automatically
discover and extract information from Web documents and services.
The main purpose of web mining is to discover useful information from the
World Wide Web and its usage patterns.
Web mining can be broadly divided into three different types of techniques of mining:
1. Web Content Mining
2. Web Structure Mining
3. Web Usage Mining
These are explained as following below.
I Web content mining is the extraction of useful information from the content
of web documents. Web content consists of several types of data –
text, images, audio, video, etc.
I The content data is the set of facts a web page is designed to convey. It can
provide effective and interesting patterns about user needs.
I Mining text documents draws on text mining, machine learning, and natural
language processing; this form of mining is also known as text mining.
I This type of mining scans and mines the text, images, and groups
of web pages according to the content of the input.
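As a small, concrete instance of text-oriented content mining, the sketch below strips the markup from a web page and counts the remaining words, which is roughly the first scanning step described above. The page content is invented for the example.

```python
import re
from collections import Counter

def mine_text(html):
    """Strip HTML tags, lowercase the text, and count word frequencies."""
    text = re.sub(r"<[^>]+>", " ", html)         # remove HTML tags
    words = re.findall(r"[a-z]+", text.lower())  # tokenize into words
    return Counter(words)

# Invented page content for the example:
page = "<h1>Web Mining</h1><p>Web mining mines web content.</p>"
counts = mine_text(page)
print(counts.most_common(2))  # [('web', 3), ('mining', 2)]
```

Real content-mining systems go much further (stemming, stop-word removal, topic models, image features), but frequency patterns like these are already a simple signal of what a page is about.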