
MST Unit-1


UNIT-1

Introduction to the Web:


The Web consists of billions of clients and servers connected through
wired and wireless networks. A web client makes a request to a web
server. The server receives the request, finds the resource, and
returns a response to the client. When a server answers a request,
it usually sends some type of content to the client. The client uses
a web browser to send requests to the server. The server often sends
its response to the browser as a set of instructions written in
HTML (HyperText Markup Language). All browsers know how to
display an HTML page to the user.

Web Application

A website is a collection of static files (web pages) such as HTML
pages, images, graphics, etc. A web application is a website with
dynamic functionality on the server. Google, Facebook, and Twitter
are examples of web applications.
HTTP (Hypertext Transfer Protocol):

• HTTP is a protocol that clients and servers use on the web to
communicate.

• It is similar to other internet protocols such as SMTP (Simple
Mail Transfer Protocol) and FTP (File Transfer Protocol), but
there is one fundamental difference.

• HTTP is a stateless protocol, i.e. the server keeps no memory of
earlier requests, so each request is handled independently. In
the original protocol (HTTP/1.0) this went together with one
request per connection: the client connects to the server, sends
one request, receives the response, and then disconnects. This
mechanism allows more users to connect to a given server over a
period of time.

• The client sends an HTTP request and the server answers with
an HTTP response, often carrying an HTML page.

HTTP Methods:

An HTTP request can be made using a variety of methods, but the ones
you will use most often are GET and POST. The method name tells
the server the kind of request that is being made, and how the rest
of the message will be formatted.
HTTP Methods and Descriptions:

Method Name   Description

OPTIONS       Request for the communication options that are available
              on the request/response chain.

GET           Request to retrieve information from the server using a
              given URI.

HEAD          Identical to GET except that the response has no message
              body, only the status line and headers.

POST          Request for the server to accept the entity enclosed in
              the body of the request.

DELETE        Request for the server to delete the resource.

CONNECT       Reserved for use with a proxy that can switch to being a
              tunnel.

PUT           Similar to POST, but while POST is used to create, PUT
              can be used to create as well as update. It replaces all
              current representations of the target resource with the
              uploaded content.

Difference between GET and POST requests:

GET Request                                 POST Request

Data is sent in the URL (query string).     Data is sent in the request body.

Only a limited amount of data can be        A large amount of data can be sent.
sent, because URL length is limited
in practice.

Data is exposed in the URL, so it can       Data is not exposed in the URL, but it
end up in browser history and server        is only protected in transit when
logs.                                       HTTPS is used.

Can be bookmarked and cached.               Cannot be bookmarked.
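The difference in where the data travels can be sketched with Python's standard library. This is only an illustration: the URL https://example.com/search and the parameter names are assumptions, and the requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(base_url, params):
    """GET: the parameters are appended to the URL as a query string."""
    return Request(f"{base_url}?{urlencode(params)}", method="GET")


def build_post(base_url, params):
    """POST: the same parameters travel in the request body instead."""
    body = urlencode(params).encode("ascii")
    return Request(
        base_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical form data, used only for illustration.
params = {"q": "http methods", "page": 2}
get_req = build_get("https://example.com/search", params)
post_req = build_post("https://example.com/search", params)
```

Here get_req.full_url carries the data in the URL itself, while post_req.full_url stays clean and the data sits in post_req.data.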

General Difference between PUT and POST methods:

Following are some basic differences between the PUT and the
POST methods:

• POST to a URL creates a child resource at a server-defined URL,
while PUT to a URL creates or replaces the resource in its
entirety at the client-defined URL.

• POST creates a child resource, so a POST to /books will create a
resource that lives under the /books resource, e.g. /books/1.
Sending the same POST request twice will create two resources.

• PUT is for creating or replacing a resource at a URL known by
the client.

• PUT must be used for creation when the client already knows
the URL before the resource is created.

• PUT replaces the resource at the known URL if it already exists,
so sending the same request twice leaves the server in the same
state as sending it once. In other words, calls to PUT are
idempotent.
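The points above can be sketched with a small in-memory model. BookStore is a hypothetical stand-in for a server, not a real HTTP implementation; it only shows who chooses the URL and what happens when the same request is repeated.

```python
class BookStore:
    """Hypothetical in-memory stand-in for a server that exposes /books."""

    def __init__(self):
        self.resources = {}  # URL -> stored representation
        self.next_id = 1

    def post(self, collection_url, data):
        # The server picks the URL of the new child resource.
        url = f"{collection_url}/{self.next_id}"
        self.next_id += 1
        self.resources[url] = data
        return url

    def put(self, url, data):
        # The client names the URL; any existing resource is replaced whole.
        self.resources[url] = data
        return url


store = BookStore()
first = store.post("/books", {"title": "HTTP Basics"})
second = store.post("/books", {"title": "HTTP Basics"})  # same request again

store.put("/books/42", {"title": "DNS Basics"})
store.put("/books/42", {"title": "DNS Basics"})  # same request again
```

The two POST calls leave two resources behind (/books/1 and /books/2), while the two PUT calls leave a single resource at /books/42.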

What is the Internet?

The Internet is a global network of billions of computers and
other electronic devices. With the Internet, it's possible to
access almost any information, communicate with anyone else
in the world, and do much more.

You can do all of this by connecting a computer to the Internet,
which is also called going online. When someone says a
computer is online, it's just another way of saying it's
connected to the Internet.
What is the Web?

• The World Wide Web, usually called the Web for short, is
a collection of different websites you can access through
the Internet. A website is made up of related text, images,
and other resources. Websites can resemble other forms
of media, like newspaper articles or television programs,
or they can be interactive in a way that's unique to
computers.
• The purpose of a website can be almost anything: a news
platform, an advertisement, an online library, a forum for
sharing images, or an educational site.
• Once you are connected to the Internet, you can access
and view websites using a type of application called a web
browser. Just keep in mind that the web browser itself is
not the Internet; it only displays websites that are stored
on the Internet.

How does the Internet work?

• At this point you may be wondering, how does the Internet
work? The exact answer is pretty complicated and would
take a while to explain. Instead, let's look at some of the
most important things you should know.
• It's important to realize that the Internet is a global
network of physical cables, which can include copper
telephone wires, TV cables, and fiber optic cables. Even
wireless connections like Wi-Fi and 3G/4G rely on these
physical cables to access the Internet.
• When you visit a website, your computer sends a request
over these wires to a server. A server is where websites are
stored, and it works a lot like your computer's hard drive.
Once the request arrives, the server retrieves the website
and sends the correct data back to your computer. What's
amazing is that this all happens in just a few seconds!

Other things you can do on the Internet:


One of the best features of the Internet is the ability to
communicate almost instantly with anyone in the
world. Email is one of the oldest and most universal ways to
communicate and share information on the Internet, and
billions of people use it. Social media allows people to connect in
a variety of ways and build communities online.
There are many other things you can do on the Internet. There
are thousands of ways to keep up with news or shop for
anything online. You can pay your bills, manage your bank
accounts, meet new people, watch TV, or learn new skills. You
can learn or do almost anything online.
World Wide Web:
The World Wide Web (WWW), commonly known as the Web, is
an information system where documents and other web
resources are identified by Uniform Resource Locators (URLs,
such as https://example.com/), which may be interlinked
by hyperlinks, and are accessible over the Internet.[1][2] The
resources of the Web are transferred via the Hypertext Transfer
Protocol (HTTP), may be accessed by users by a software
application called a web browser, and are published by a software
application called a web server. The World Wide Web is not
synonymous with the Internet, which pre-dated the Web in some
form by over two decades and upon the technologies of which the
Web is built.

[Figure: A web page displayed in a web browser]

[Figure: A global map of the Web Index for countries in 2014]


English scientist Tim Berners-Lee invented the World Wide Web in
1989. He wrote the first web browser in 1990 while employed
at CERN near Geneva, Switzerland. The browser was released
outside CERN to other research institutions starting in January
1991, and then to the general public in August 1991. The Web
began to enter everyday use in 1993–4, when websites for general
use started to become available. The World Wide Web has been
central to the development of the Information Age, and is the
primary tool billions of people use to interact on the Internet.
Web resources may be any type of downloaded media, but web
pages are hypertext documents formatted in Hypertext Markup
Language (HTML). Special HTML syntax displays
embedded hyperlinks with URLs which permits users to navigate to
other web resources. In addition to text, web pages may contain
references to images, video, audio, and software components
which are either displayed or internally executed in the user's web
browser to render pages or streams of multimedia content.
Multiple web resources with a common theme and usually a
common domain name make up a website. Websites are stored in
computers that are running a web server, which is a program that
responds to requests made over the Internet from web browsers
running on a user's computer. Website content can be provided by
a publisher, or interactively from user-generated content. Websites
are provided for a myriad of informative, entertainment,
commercial, and governmental reasons.
Function:
The terms Internet and World Wide Web are often used without
much distinction. However, the two terms do not mean the same
thing. The Internet is a global system of interconnected computer
networks. In contrast, the World Wide Web is a global collection of
documents and other resources, linked by hyperlinks and URIs.
Web resources are accessed using HTTP or HTTPS, which are
application-level Internet protocols that use the Internet's transport
protocols.

The World Wide Web functions as an application layer protocol that
is run "on top of" (figuratively) the Internet, helping to make it more
functional. The advent of the Mosaic web browser helped to make
the web much more usable, to include the display of images and
moving images (GIFs).
1. HTML
2. Linking
3. WWW prefix
4. Scheme specifiers
5. Pages
6. Static page
7. Dynamic pages

Domain Name Service/System(DNS):

The Domain Name System (DNS) is the phonebook of the Internet.
Humans access information online through domain names, like
nytimes.com or espn.com. Web browsers interact through Internet
Protocol (IP) addresses. DNS translates domain names to IP
addresses so browsers can load Internet resources.
Each device connected to the Internet has a unique IP address
which other machines use to find the device. DNS servers eliminate
the need for humans to memorize IP addresses such as 192.168.1.1
(in IPv4), or more complex newer alphanumeric IP addresses such
as 2400:cb00:2048:1::c629:d7a2 (in IPv6).
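As a small illustration of the two address formats, the sketch below parses the example addresses from the text with Python's standard ipaddress module. The helper name describe is an assumption made for this example. Note that 192.168.1.1 belongs to a private range, so it serves here only as a sample IPv4 address.

```python
import ipaddress


def describe(address_text):
    """Parse an IP address string and report a few basic properties."""
    address = ipaddress.ip_address(address_text)
    return {
        "version": address.version,        # 4 or 6
        "bits": address.max_prefixlen,     # 32 for IPv4, 128 for IPv6
        "is_private": address.is_private,  # True for non-public ranges
        "expanded": address.exploded,      # full form without "::" shortening
    }


ipv4_info = describe("192.168.1.1")
ipv6_info = describe("2400:cb00:2048:1::c629:d7a2")
```

The "expanded" entry shows what the "::" shorthand in an IPv6 address stands for: the omitted groups of zeros.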

How does DNS work?

The process of DNS resolution involves converting a hostname
(such as www.example.com) into a computer-friendly IP address
(such as 192.168.1.1). An IP address is given to each device on the
Internet, and that address is necessary to find the appropriate
Internet device - like a street address is used to find a particular
home. When a user wants to load a webpage, a translation must
occur between what a user types into their web browser
(example.com) and the machine-friendly address necessary to
locate the example.com webpage.

In order to understand the process behind the DNS resolution, it’s
important to learn about the different hardware components a DNS
query must pass between. For the web browser, the DNS lookup
occurs "behind the scenes" and requires no interaction from the
user’s computer apart from the initial request.

There are 4 DNS servers involved in loading a webpage:

• DNS recursor - The recursor can be thought of as a
librarian who is asked to go find a particular book
somewhere in a library. The DNS recursor is a server
designed to receive queries from client machines through
applications such as web browsers. Typically the recursor
is then responsible for making additional requests in order
to satisfy the client’s DNS query.

• Root nameserver - The root server is the first step in
translating (resolving) human readable host names into IP
addresses. It can be thought of like an index in a library
that points to different racks of books - typically it serves
as a reference to other more specific locations.

• TLD nameserver - The top level domain server (TLD) can be
thought of as a specific rack of books in a library. This
nameserver is the next step in the search for a specific IP
address, and it hosts the last portion of a hostname (in
example.com, the TLD server is “com”).

• Authoritative nameserver - This final nameserver can be
thought of as a dictionary on a rack of books, in which a
specific name can be translated into its definition. The
authoritative nameserver is the last stop in the nameserver
query. If the authoritative name server has access to the
requested record, it will return the IP address for the
requested hostname back to the DNS recursor (the
librarian) that made the initial request.

What's the difference between an authoritative DNS server and a
recursive DNS resolver?

Both concepts refer to servers (groups of servers) that are integral
to the DNS infrastructure, but each performs a different role and
lives in different locations inside the pipeline of a DNS query. One
way to think about the difference is the recursive resolver is at the
beginning of the DNS query and the authoritative nameserver is at
the end.

Recursive DNS resolver:

The recursive resolver is the computer that responds to a recursive
request from a client and takes the time to track down the DNS
record. It does this by making a series of requests until it reaches
the authoritative DNS nameserver for the requested record (or
times out or returns an error if no record is found). Luckily,
recursive DNS resolvers do not always need to make multiple
requests in order to track down the records needed to respond to a
client; caching is a data persistence process that helps
short-circuit the necessary requests by serving the requested
resource record earlier in the DNS lookup.
Authoritative DNS server:

Put simply, an authoritative DNS server is a server that actually holds, and is
responsible for, DNS resource records. This is the server at the bottom of the DNS
lookup chain that will respond with the queried resource record, ultimately allowing
the web browser making the request to reach the IP address needed to access a website
or other web resources. An authoritative nameserver can satisfy queries from its own
data without needing to query another source, as it is the final source of truth for
certain DNS records.

It’s worth mentioning that in instances where the query is for a subdomain such as
foo.example.com or blog.cloudflare.com, an additional nameserver will be added to
the sequence after the authoritative nameserver, which is responsible for storing the
subdomain’s CNAME record.
There is a key difference between many DNS services and the one that Cloudflare
provides. Different DNS recursive resolvers such as Google DNS, OpenDNS, and
providers like Comcast all maintain data center installations of DNS recursive
resolvers. These resolvers allow for quick and easy queries through optimized
clusters of DNS-optimized computer systems, but they are fundamentally different
from the nameservers hosted by Cloudflare.

Cloudflare maintains infrastructure-level nameservers that are integral to the
functioning of the Internet. One key example is the F-root server network, which
Cloudflare is partially responsible for hosting. The F-root is one of the root-level DNS
nameserver infrastructure components responsible for billions of Internet
requests per day. Cloudflare's Anycast network puts it in a unique position to handle
large volumes of DNS traffic without service interruption.
What are the steps in a DNS lookup?
For most situations, DNS is concerned with a domain name being translated into the
appropriate IP address. To learn how this process works, it helps to follow the path of
a DNS lookup as it travels from a web browser, through the DNS lookup process, and
back again. Let's take a look at the steps.

Note: Often DNS lookup information will be cached either locally inside the querying
computer or remotely in the DNS infrastructure. There are typically 8 steps in a DNS
lookup. When DNS information is cached, steps are skipped from the DNS lookup
process which makes it quicker. The example below outlines all 8 steps when nothing
is cached.

The 8 steps in a DNS lookup:


1. A user types ‘example.com’ into a web browser and the query travels into the
Internet and is received by a DNS recursive resolver.

2. The resolver then queries a DNS root nameserver (.).

3. The root server then responds to the resolver with the address of a Top Level
Domain (TLD) DNS server (such as .com or .net), which stores the information for
its domains. When searching for example.com, our request is pointed toward the
.com TLD.

4. The resolver then makes a request to the .com TLD.

5. The TLD server then responds with the IP address of the domain’s nameserver,
example.com.

6. Lastly, the recursive resolver sends a query to the domain’s nameserver.

7. The IP address for example.com is then returned to the resolver from the
nameserver.

8. The DNS resolver then responds to the web browser with the IP address of the
domain requested initially.
Once the 8 steps of the DNS lookup have returned the IP address for
example.com, the browser is able to make the request for the web page:

9. The browser makes an HTTP request to the IP address.

10. The server at that IP returns the webpage to be rendered in the browser.
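The uncached lookup described in steps 1 to 8 can be sketched as a toy simulation. Everything here is a hypothetical stand-in: the three dictionaries replace real root, TLD, and authoritative servers, the server labels are made up, and no network traffic is involved.

```python
# Hypothetical, hard-coded stand-ins for the real server hierarchy.
ROOT_SERVER = {"com": "tld-com"}                          # root knows the TLD servers
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}  # TLD knows domain nameservers
AUTHORITATIVE = {"ns-example": {"example.com": "192.168.1.1"}}


def resolve(hostname):
    """Walk root -> TLD -> authoritative, as an uncached recursive resolver would."""
    trace = []
    tld = hostname.rsplit(".", 1)[-1]

    trace.append("resolver -> root")
    tld_server = ROOT_SERVER[tld]                     # steps 2-3: referral to the TLD server

    trace.append(f"resolver -> {tld_server}")
    nameserver = TLD_SERVERS[tld_server][hostname]    # steps 4-5: referral to the nameserver

    trace.append(f"resolver -> {nameserver}")
    ip_address = AUTHORITATIVE[nameserver][hostname]  # steps 6-7: the final answer

    return ip_address, trace                          # step 8: reply to the browser


ip_address, trace = resolve("example.com")
```

The trace list records one entry per query the resolver sends, which makes the three hops (root, TLD, authoritative) visible.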

What is a DNS resolver?


The DNS resolver is the first stop in the DNS lookup, and it is responsible for dealing
with the client that made the initial request. The resolver starts the sequence of
queries that ultimately leads to a hostname being translated into the necessary IP address.
Note: A typical uncached DNS lookup will involve both recursive and iterative
queries.

It's important to differentiate between a recursive DNS query and a recursive DNS
resolver. The query refers to the request made to a DNS resolver requiring the
resolution of the query. A DNS recursive resolver is the computer that accepts a
recursive query and processes the response by making the necessary requests.

What are the types of DNS queries?


In a typical DNS lookup three types of queries occur. By using a combination of these
queries, an optimized process for DNS resolution can result in a reduction of distance
traveled. In an ideal situation cached record data will be available, allowing a DNS
name server to return a non-recursive query.

3 types of DNS queries:


1. Recursive query - In a recursive query, a DNS client requires that a DNS server
(typically a DNS recursive resolver) will respond to the client with either the
requested resource record or an error message if the resolver can't find the
record.

2. Iterative query - In this situation the DNS client will allow a DNS server to
return the best answer it can. If the queried DNS server does not have a match
for the query name, it will return a referral to a DNS server authoritative for a
lower level of the domain namespace. The DNS client will then make a query to
the referral address. This process continues with additional DNS servers down
the query chain until an answer is returned or an error or timeout occurs.

3. Non-recursive query - typically this will occur when a DNS resolver client
queries a DNS server for a record that it has access to either because it's
authoritative for the record or the record exists inside of its cache. Typically, a
DNS server will cache DNS records to prevent additional bandwidth
consumption and load on upstream servers.

What is DNS caching? Where does DNS caching occur?

The purpose of caching is to temporarily store data in a location that results in
improvements in performance and reliability for data requests. DNS caching involves
storing data closer to the requesting client so that the DNS query can be resolved
earlier and additional queries further down the DNS lookup chain can be avoided,
thereby improving load times and reducing bandwidth/CPU consumption. DNS data
can be cached in a variety of locations, each of which will store DNS records for a set
amount of time determined by a time-to-live (TTL).
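A minimal sketch of a TTL-based cache is shown below. The class name DnsCache and its methods are assumptions made for this example, not part of any real resolver; the clock is passed in so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Minimal TTL cache sketch; `clock` is injectable so expiry can be tested."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # hostname -> (ip_address, expires_at)

    def put(self, hostname, ip_address, ttl_seconds):
        self._entries[hostname] = (ip_address, self._clock() + ttl_seconds)

    def get(self, hostname):
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        ip_address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[hostname]  # the record outlived its TTL
            return None
        return ip_address
```

A caller would try cache.get(hostname) first and perform the full lookup only when it returns None, then store the result with the TTL supplied by the answering server.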

Browser DNS caching:


Modern web browsers are designed by default to cache DNS records for a set
amount of time. The purpose here is obvious; the closer the DNS caching occurs to
the web browser, the fewer processing steps must be taken in order to check the
cache and make the correct requests to an IP address. When a request is made for a
DNS record, the browser cache is the first location checked for the requested record.

In Chrome, you can see the status of your DNS cache by going to
chrome://net-internals/#dns.

Operating system (OS) level DNS caching:


The operating system level DNS resolver is the second and last local stop before a
DNS query leaves your machine. The process inside your operating system that is
designed to handle this query is commonly called a “stub resolver” or DNS client.
When a stub resolver gets a request from an application, it first checks its own cache
to see if it has the record. If it does not, it then sends a DNS query (with a recursive
flag set), outside the local network to a DNS recursive resolver inside the Internet
service provider (ISP).

When the recursive resolver inside the ISP receives a DNS query, like all previous
steps, it will also check to see if the requested host-to-IP-address translation is
already stored inside its local persistence layer.

The recursive resolver also has additional functionality depending on the types of
records it has in its cache:

1. If the resolver does not have the A records, but does have the NS records for the
authoritative nameservers, it will query those name servers directly, bypassing
several steps in the DNS query. This shortcut prevents lookups from the root and
.com nameservers (in our search for example.com) and helps the resolution of
the DNS query occur more quickly.

2. If the resolver does not have the NS records, it will send a query to the TLD
servers (.com in our case), skipping the root server.

3. In the unlikely event that the resolver does not have records pointing to the TLD
servers, it will then query the root servers. This event typically occurs after a DNS
cache has been purged.
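The three cases above amount to a simple decision about where the resolver starts its search. The sketch below assumes a hypothetical cache layout (a dictionary with "A", "NS", and "TLD_NS" sections); real resolvers organize their caches differently.

```python
def next_server_to_query(cache, hostname):
    """Pick where a recursive resolver starts, based on what it has cached.

    `cache` is a hypothetical dict with optional sections:
      "A"      -> {hostname: ip_address}
      "NS"     -> {hostname: authoritative nameserver}
      "TLD_NS" -> {tld: TLD nameserver}
    """
    if hostname in cache.get("A", {}):
        return "answer from cache"
    if hostname in cache.get("NS", {}):
        return "authoritative nameserver"  # case 1: skip root and TLD servers
    tld = hostname.rsplit(".", 1)[-1]
    if tld in cache.get("TLD_NS", {}):
        return "TLD nameserver"            # case 2: skip the root server
    return "root nameserver"               # case 3: e.g. after a cache purge
```

Each cached layer removes one or more round trips from the lookup, which is why a warm cache resolves names so much faster than an empty one.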

Protocols:

In networking, a protocol is a set of rules for formatting and processing data.


Network protocols are like a common language for computers. The computers within
a network may use vastly different software and hardware; however, the use of
protocols enables them to communicate with each other regardless.

Standardized protocols are like a common language that computers can use, similar
to how two people from different parts of the world may not understand each
other's native languages, but they can communicate using a shared third language. If
one computer uses the Internet Protocol (IP) and a second computer does as well,
they will be able to communicate — just as the United Nations relies on its 6 official
languages to communicate amongst representatives from all over the globe. But if
one computer uses IP and the other does not know this protocol, they will be unable
to communicate.
On the Internet, there are different protocols for different types of processes.
Protocols are often discussed in terms of which OSI model layer they belong to.

Types of Protocols:
• Transmission Control Protocol (TCP)
• Internet Protocol (IP)
• User Datagram Protocol (UDP)
• Post Office Protocol (POP)
• Simple Mail Transfer Protocol (SMTP)
• File Transfer Protocol (FTP)
• Hypertext Transfer Protocol (HTTP)
• Hypertext Transfer Protocol Secure (HTTPS)
An overview of HTTP:

HTTP is a protocol which allows the fetching of resources, such as HTML documents.
It is the foundation of any data exchange on the Web and it is a client-server
protocol, which means requests are initiated by the recipient, usually the Web
browser. A complete document is reconstructed from the different sub-documents
fetched, for instance text, layout description, images, videos, scripts, and more.

[Figure: A Web document is the composition of different resources]

Clients and servers communicate by exchanging individual messages (as opposed to
a stream of data). The messages sent by the client, usually a Web browser, are called
requests and the messages sent by the server as an answer are called responses.

[Figure: HTTP as an application layer protocol, on top of TCP (transport layer) and
IP (network layer) and below the presentation layer]

Designed in the early 1990s, HTTP is an
extensible protocol which has evolved over time. It is an application layer protocol
that is sent over TCP, or over a TLS-encrypted TCP connection, though any reliable
transport protocol could theoretically be used. Due to its extensibility, it is used to
not only fetch hypertext documents, but also images and videos or to post content
to servers, like with HTML form results. HTTP can also be used to fetch parts of
documents to update Web pages on demand.

Components of HTTP-based systems:

HTTP is a client-server protocol: requests are sent by one entity, the user-agent (or a
proxy on behalf of it). Most of the time the user-agent is a Web browser, but it can
be anything, for example a robot that crawls the Web to populate and maintain a
search engine index.

Each individual request is sent to a server, which handles it and provides an answer,
called the response. Between the client and the server there are numerous entities,
collectively called proxies, which perform different operations and act as gateways or
caches, for example.

Client server chain:

In reality, there are more computers between a browser and the server handling the
request: there are routers, modems, and more. Thanks to the layered design of the
Web, these are hidden in the network and transport layers. HTTP is on top, at the
application layer. Although important to diagnose network problems, the underlying
layers are mostly irrelevant to the description of HTTP.

Client: the user-agent:

The user-agent is any tool that acts on the behalf of the user. This role is primarily
performed by the Web browser; other possibilities are programs used by engineers
and Web developers to debug their applications.

The browser is always the entity initiating the request. It is never the server (though
some mechanisms have been added over the years to simulate server-initiated
messages).

To present a Web page, the browser sends an original request to fetch the HTML
document that represents the page. It then parses this file, making additional
requests corresponding to execution scripts, layout information (CSS) to display, and
sub-resources contained within the page (usually images and videos). The Web
browser then mixes these resources to present to the user a complete document, the
Web page. Scripts executed by the browser can fetch more resources in later phases
and the browser updates the Web page accordingly.

A Web page is a hypertext document. This means some parts of displayed text are
links, which can be activated (usually by a click of the mouse) to fetch a new Web
page, allowing the user to direct their user-agent and navigate through the Web. The
browser translates these directions in HTTP requests, and further interprets the HTTP
responses to present the user with a clear response.
The Web server:

On the opposite side of the communication channel is the server, which serves the
document as requested by the client. A server appears as only a single machine
virtually, but it may actually be a collection of servers sharing the load
(load balancing), or a complex piece of software interrogating other computers (like
a cache, a DB server, or e-commerce servers), totally or partially generating the
document on demand.

Conversely, a server is not necessarily a machine of its own: several server software
instances can be hosted on the same machine. With HTTP/1.1 and the Host header, they
may even share the same IP address.
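How several sites can share one IP address is easiest to see in a sketch. The site names and page contents below are hypothetical, and the parser is deliberately minimal: it assumes a well-formed request and a Host value without a port number.

```python
# Hypothetical sites sharing one IP address, keyed by the Host header value.
SITES = {
    "blog.example.com": "<h1>Blog</h1>",
    "shop.example.com": "<h1>Shop</h1>",
}


def parse_host(raw_request):
    """Return the Host header value from a raw HTTP/1.1 request, or None."""
    # Headers end at the first blank line; the first line is the request line.
    header_lines = raw_request.split("\r\n\r\n", 1)[0].split("\r\n")[1:]
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower()
    return None


def route(raw_request):
    """Pick the site to serve based only on the Host header."""
    host = parse_host(raw_request)
    if host in SITES:
        return 200, SITES[host]
    return 404, "unknown host"
```

Two requests that arrive at the same IP address and ask for the same path "/" still receive different pages, because the Host header tells the server which site was meant.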

Proxies:

Between the Web browser and the server, numerous computers and machines relay
the HTTP messages. Due to the layered structure of the Web stack, most of these
operate at the transport, network or physical levels, becoming transparent at the
HTTP layer and potentially making a significant impact on performance. Those
operating at the application layers are generally called proxies. These can be
transparent, forwarding on the requests they receive without altering them in any
way, or non-transparent, in which case they will change the request in some way
before passing it along to the server. Proxies may perform numerous functions:

• caching (the cache can be public or private, like the browser cache)
• filtering (like an antivirus scan or parental controls)
• load balancing (to allow multiple servers to serve the different requests)
• authentication (to control access to different resources)
• logging (allowing the storage of historical information)

Basic aspects of HTTP:

HTTP is simple:

HTTP is generally designed to be simple and human readable, even with the added
complexity introduced in HTTP/2 by encapsulating HTTP messages into frames. HTTP
messages can be read and understood by humans, providing easier testing for
developers, and reduced complexity for newcomers.

HTTP is extensible:
Introduced in HTTP/1.0, HTTP headers make this protocol easy to extend and
experiment with. New functionality can even be introduced by a simple agreement
between a client and a server about a new header's semantics.

HTTP is stateless, but not sessionless:

HTTP is stateless: there is no link between two requests being successively carried
out on the same connection. This immediately has the prospect of being problematic
for users attempting to interact with certain pages coherently, for example, using
e-commerce shopping baskets. But while the core of HTTP itself is stateless, HTTP
cookies allow the use of stateful sessions. Using header extensibility, HTTP Cookies
are added to the workflow, allowing session creation on each HTTP request to share
the same context, or the same state.
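A rough sketch of how a cookie turns independent requests into a session follows. The cookie name session_id, the SESSIONS store, and the function handle_add_to_basket are all assumptions made for this example; only the cookie parsing uses a real library class (http.cookies.SimpleCookie).

```python
from http.cookies import SimpleCookie
import secrets

SESSIONS = {}  # session id -> per-user state kept on the server


def handle_add_to_basket(cookie_header, item):
    """Handle one stateless request; the cookie links it to earlier ones.

    Returns (value for a Set-Cookie header or None, basket contents).
    """
    cookie = SimpleCookie(cookie_header or "")
    session_id = cookie["session_id"].value if "session_id" in cookie else None

    set_cookie = None
    if session_id not in SESSIONS:
        session_id = secrets.token_hex(16)       # new visitor: start a session
        SESSIONS[session_id] = []
        set_cookie = f"session_id={session_id}"  # sent back once to the browser

    SESSIONS[session_id].append(item)
    return set_cookie, list(SESSIONS[session_id])
```

The first request arrives without a cookie and receives one; when the browser sends that cookie back on the next request, the server finds the same basket again even though the two requests are otherwise unrelated.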

HTTP and connections:

A connection is controlled at the transport layer, and therefore fundamentally out of
scope for HTTP. HTTP doesn't require the underlying transport protocol to
be connection-based; it only requires it to be reliable, that is, not to lose messages
(or at minimum to present an error when it does). Among the two most common
transport protocols on the Internet, TCP is reliable and UDP isn't. HTTP therefore
relies on the TCP standard, which is connection-based.

Before a client and server can exchange an HTTP request/response pair, they must
establish a TCP connection, a process which requires several round-trips. The default
behavior of HTTP/1.0 is to open a separate TCP connection for each HTTP
request/response pair. This is less efficient than sharing a single TCP connection
when multiple requests are sent in close succession.

In order to mitigate this flaw, HTTP/1.1 introduced pipelining (which proved difficult
to implement) and persistent connections: the underlying TCP connection can be
partially controlled using the Connection header. HTTP/2 went a step further by
multiplexing messages over a single connection, helping keep the connection warm
and more efficient.
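Persistent connections can be observed with a small local experiment. The sketch below is only an illustration: it starts a throwaway server on 127.0.0.1 using Python's standard http.server, sends several requests through one http.client.HTTPConnection, and records the client port the server saw for each request. The handler and function names are assumptions made for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        # Remember which client port this request arrived from.
        self.server.seen_ports.append(self.client_address[1])
        body = f"you asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_over_one_connection(paths):
    """Send several GET requests over a single TCP connection to a local server."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: any free port
    server.seen_ports = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        results = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            results.append((response.status, response.read().decode("utf-8")))
        return results, list(server.seen_ports)
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

If every entry in the returned port list is the same, all requests travelled over the same TCP connection, so the connection setup cost was paid only once.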

Work has also gone into designing a transport protocol better suited to
HTTP. For example, QUIC, which began as an experiment at Google, builds on UDP to
provide a more reliable and efficient transport protocol, and it is the transport
used by HTTP/3.

What can be controlled by HTTP:


This extensible nature of HTTP has, over time, allowed for more control and
functionality of the Web. Cache or authentication methods were functions handled
early in HTTP history. The ability to relax the origin constraint, by contrast, has only
been added in the 2010s.

Here is a list of common features controllable with HTTP:

Caching: How documents are cached can be controlled by HTTP. The server can
instruct proxies and clients about what to cache and for how long. The client can
instruct intermediate cache proxies to ignore the stored document.

Relaxing the origin constraint: To prevent snooping and other privacy invasions, Web
browsers enforce strict separation between Web sites. Only pages from the same
origin can access all the information of a Web page. Though such a constraint is a
burden to the server, HTTP headers can relax this strict separation on the server side,
allowing a document to become a patchwork of information sourced from different
domains; there could even be security-related reasons to do so.

Authentication: Some pages may be protected so that only specific users can access
them. Basic authentication may be provided by HTTP, either using the
WWW-Authenticate and similar headers, or by setting a specific session using HTTP cookies.

Proxy and tunneling: Servers or clients are often located on intranets and hide their
true IP address from other computers. HTTP requests then go through proxies to
cross this network barrier. Not all proxies are HTTP proxies. The SOCKS protocol, for
example, operates at a lower level. Other protocols, like FTP, can be handled by these
proxies.

Sessions: Using HTTP cookies allows you to link requests with the state of the server.
This creates sessions, despite basic HTTP being a stateless protocol. This is useful
not only for e-commerce shopping baskets, but also for any site allowing user
configuration of the output.
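The session idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cookie name session_id, the in-memory SESSIONS dictionary, and the handle_request function are hypothetical names chosen for the example, and a real server would also set cookie attributes and expire old sessions.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

# Hypothetical in-memory session store: session id -> per-user state.
SESSIONS: Dict[str, Dict[str, int]] = {}


def handle_request(cookie_header: Optional[str]) -> Tuple[str, int]:
    """Link a request to server-side state through a cookie.

    cookie_header is the value of the request's Cookie header (or None).
    Returns the cookie pair to send back in Set-Cookie and the visit count.
    """
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    session_id = morsel.value if morsel is not None else None

    # Unknown or missing id: start a new session with a random identifier.
    if session_id not in SESSIONS:
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"visits": 0}

    SESSIONS[session_id]["visits"] += 1
    return f"session_id={session_id}", SESSIONS[session_id]["visits"]


if __name__ == "__main__":
    first_cookie, first_visits = handle_request(None)
    _, second_visits = handle_request(first_cookie)
    print(first_visits, second_visits)
```

Each HTTP request is still independent; the only thing that ties two requests together is the identifier the client sends back in its Cookie header.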

HTTP flow:

When a client wants to communicate with a server, either the final server or an
intermediate proxy, it performs the following steps:
Open a TCP connection: The TCP connection is used to send a request, or several,
and receive an answer. The client may open a new connection, reuse an existing
connection, or open several TCP connections to the servers.

Send an HTTP message: HTTP messages (before HTTP/2) are human-readable. With
HTTP/2, these simple messages are encapsulated in frames, making them impossible
to read directly, but the principle remains the same.

For example:

GET / HTTP/1.1
Host: developer.mozilla.org
Accept-Language: fr

Read the response sent by the server, such as:

HTTP/1.1 200 OK
Date: Sat, 09 Oct 2010 14:28:02 GMT
Server: Apache
Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
ETag: "51142bc1-7449-479b075b2891b"
Accept-Ranges: bytes
Content-Length: 29769
Content-Type: text/html
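To make the message format above concrete, here is a minimal Python sketch that builds the text of an HTTP/1.1 GET request and splits a response head into its status line and headers. The function names build_request and parse_response_head are hypothetical, chosen for this example, and the sketch deliberately opens no network connection; it only shows how the human-readable messages are laid out.

```python
from typing import Dict, Optional, Tuple


def build_request(host: str, path: str = "/",
                  headers: Optional[Dict[str, str]] = None) -> bytes:
    """Return the bytes of a simple HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF; an empty line marks the end of the head.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response_head(raw: bytes) -> Tuple[str, int, str, Dict[str, str]]:
    """Split a raw response into (version, status code, reason, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


if __name__ == "__main__":
    print(build_request("developer.mozilla.org", "/", {"Accept-Language": "fr"}))
    sample = (b"HTTP/1.1 200 OK\r\n"
              b"Date: Sat, 09 Oct 2010 14:28:02 GMT\r\n"
              b"Content-Type: text/html\r\n\r\n")
    print(parse_response_head(sample))
```

Header names are lower-cased in the parsed result because HTTP header names are case-insensitive.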

FTP:
o FTP stands for File Transfer Protocol.
o FTP is a standard internet protocol in the TCP/IP suite, used for transmitting files
from one host to another.
o It is mainly used for transferring web page files from their creator to the
computer that acts as a server for other computers on the internet.
o It is also used for downloading files to a computer from other servers.

Objectives of FTP
o It provides the sharing of files.
o It is used to encourage the use of remote computers.
o It transfers the data more reliably and efficiently.

Why FTP?
Although transferring files from one system to another looks simple and
straightforward, it can sometimes cause problems. For example, two systems may
have different file conventions, different ways to represent
text and data, or different directory structures. FTP
overcomes these problems by establishing two connections between the hosts: one
connection is used for data transfer, and the other is the control
connection.

Mechanism of FTP
The above figure shows the basic model of the FTP. The FTP client has three
components: the user interface, control process, and data transfer process. The server
has two components: the server control process and the server data transfer process.

There are two types of connections in FTP:

o Control Connection: The control connection uses very simple rules for
communication. Through the control connection, we can transfer one line of command or
one line of response at a time. The control connection is made between the control
processes. The control connection remains connected during the entire interactive
FTP session.
o Data Connection: The Data Connection uses very complex rules as data types may
vary. The data connection is made between data transfer processes. The data
connection opens when a command comes for transferring the files and closes when
the file is transferred.
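The split between the two connections can be illustrated with a small Python sketch. It assumes FTP passive mode, which the text above does not cover: in that mode the server answers on the control connection with a one-line reply of the form "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)", and the client then opens the data connection to that address. The function name parse_pasv_reply and the sample address are hypothetical.

```python
import re
from typing import Tuple


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract the data-connection host and port from a 227 reply line.

    The six numbers are the four bytes of the IPv4 address followed by
    the high and low bytes of the TCP port.
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    numbers = [int(group) for group in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port


if __name__ == "__main__":
    # One line of response received on the control connection (sample data).
    line = "227 Entering Passive Mode (192,168,1,10,19,137)"
    print(parse_pasv_reply(line))
```

The reply travels as a single line on the control connection, while the file contents later flow over the separate data connection to the returned host and port.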

FTP Clients
o An FTP client is a program that implements the File Transfer Protocol and allows you to
transfer files between two hosts on the internet.
o It allows a user to connect to a remote host and upload or download files.
o It has a set of commands that we can use to connect to a host, transfer files
between our machine and the host, and close the connection.
o An FTP client has also been available as a built-in component of some Web browsers. Such a
GUI-based FTP client makes file transfer easy and does not require the user to
remember the FTP commands.

Advantages of FTP:
o Speed: One of the biggest advantages of FTP is speed. FTP is one of the fastest
ways to transfer files from one computer to another.
o Efficient: It is efficient, as we do not need to complete all the operations to get
the entire file.
o Security: To access an FTP server, we need to log in with a username and
password. This gives basic access control, although plain FTP does not encrypt these credentials.
o Back & forth movement: FTP allows us to transfer the files back and forth. Suppose
you are a manager of the company, you send some information to all the employees,
and they all send information back on the same server.

Disadvantages of FTP:
o Industry practice requires that all FTP transmissions be
encrypted. However, not all FTP providers
offer encryption, so we have to look for providers that do.
o FTP serves two operations, i.e., sending and receiving large files on a network. However,
some implementations limit the size of a file that can be sent to 2 GB. FTP also doesn't allow you to run
simultaneous transfers to multiple receivers.
o Passwords and file contents are sent in clear text, which allows
eavesdropping. Attackers can also carry out a brute-force
attack by trying to guess the FTP password.
o It is not compatible with every system.

SMTP:
o SMTP stands for Simple Mail Transfer Protocol.
o SMTP is a set of communication guidelines that allow software to transmit
electronic mail over the internet.
o It is used for sending messages to other computer users based on e-mail
addresses.
o It provides a mail exchange between users on the same or different computers, and it
also supports the following:
o It can send a single message to one or more recipients.
o The message can include text, voice, video or graphics.
o It can also send messages on networks outside the internet.
o The main purpose of SMTP is to set up communication rules between servers.
The servers have a way of identifying themselves and announcing what kind of
communication they are trying to perform. They also have a way of handling
errors such as an incorrect email address. For example, if the recipient address is wrong,
then the receiving server replies with an error message of some kind.

Components of SMTP

o First, we will break the SMTP client and SMTP server into two components such as
user agent (UA) and mail transfer agent (MTA). The user agent (UA) prepares the
message, creates the envelope and then puts the message in the envelope. The mail
transfer agent (MTA) transfers this mail across the internet.
o SMTP allows a more complex system by adding a relaying system. Instead of just
having one MTA at sending side and one at receiving side, more MTAs can be added,
acting either as a client or server to relay the email.

o The relaying system without TCP/IP protocol can also be used to send the emails to
users, and this is achieved by the use of the mail gateway. The mail gateway is a relay
MTA that can be used to receive an email.
Working of SMTP:
1. Composition of Mail: A user sends an e-mail by composing an electronic mail
message using a Mail User Agent (MUA). Mail User Agent is a program which is used
to send and receive mail. The message contains two parts: body and header. The
body is the main part of the message while the header includes information such as
the sender and recipient address. The header also includes descriptive information
such as the subject of the message. In this case, the message body is like a letter and
header is like an envelope that contains the recipient's address.
2. Submission of Mail: After composing an email, the mail client submits the
completed e-mail to the SMTP server, which acts as the Mail Submission Agent (MSA), by
using SMTP on TCP port 25.
3. Delivery of Mail: E-mail addresses contain two parts: the username of the recipient and
the domain name. For example, in vivek@gmail.com, "vivek" is the username of the
recipient and "gmail.com" is the domain name.
If the domain name of the recipient's email address is different from the sender's
domain name, then the MSA sends the mail to the Mail Transfer Agent (MTA). To relay
the email, the MTA must find the mail server of the target domain. It looks up the MX record
in the Domain Name System, which names the mail exchange server for the recipient's
domain. Once the record is located, the MTA connects to
the exchange server to relay the message.
4. Receipt and Processing of Mail: Once the incoming message is received, the
exchange server delivers it to the incoming server (Mail Delivery Agent) which stores
the e-mail where it waits for the user to retrieve it.
5. Access and Retrieval of Mail: The stored email in MDA can be retrieved by using
MUA (Mail User Agent). MUA can be accessed by using login and password.
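The composition step (header plus body) and the username/domain split described above can be sketched with Python's standard email package. This is a sketch under assumptions: the helper names compose and split_address and the sender address are hypothetical, and the actual submission to an SMTP server is left out because it needs a live server.

```python
from email.message import EmailMessage
from typing import Tuple


def compose(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a message with a header (addresses, subject) and a body."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def split_address(address: str) -> Tuple[str, str]:
    """Split an e-mail address into (username, domain name)."""
    username, _, domain = address.rpartition("@")
    if not username or not domain:
        raise ValueError(f"not a valid e-mail address: {address!r}")
    return username, domain


if __name__ == "__main__":
    # sender@example.com is a placeholder address for illustration only.
    mail = compose("sender@example.com", "vivek@gmail.com", "Hello", "Hi Vivek")
    print(mail["To"], mail["Subject"])
    print(split_address(mail["To"]))
```

The domain part returned by split_address is what a mail transfer agent would use for its MX lookup.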

HTML5:
This section covers HTML5 and its newer elements, including audio, video,
header, footer, data, datalist and article. It is intended for beginners
and professionals.

HTML5 is the next version of HTML. It brings new features that
make HTML much easier to write. These features make the structure of a website
clearer to both website designers and users. Elements such as
<header>, <footer>, <nav> and <article> define the layout of a website.
Why use HTML5:
It is enriched with advanced features which make it easy and interactive for
designers, developers and users.

It allows you to play video and audio files.

It allows you to draw on a canvas.

It helps you design better forms and build web applications that work offline.

It provides advanced features for which you would normally have to write
JavaScript.

The most important reason to use HTML5 is that it is not going away. It
will remain in use for a long time, according to the W3C recommendation.

HTML 5 Example:
Let's see a simple example of HTML5.

<!DOCTYPE html>
<html>
<body>
<h1>Write Your First Heading</h1>
<p>Write Your First Paragraph.</p>
</body>
</html>
Supporting browsers:

1.Chrome

2.IE

3.Firefox

4.Opera

5.Safari
List of the Advantages and Disadvantages of HTML5:

ADVANTAGES:

1. HTML5 isn’t a proprietary code.

You are not required to pay royalties if you decide to use HTML5 for your
website. It’s cross-platform, which means you can use it on virtually any
device. It works the same whether you access a website through a
desktop, a laptop, a smartphone, or even your television. As long as the
browser you are using supports HTML5, then there is a good chance that it
will work as it should.

2. It provides audio and video support.


Through the use of the <audio>, <video>, and <canvas> elements, you are able to run a lot of
different components through your website that used to require an
embedded application or installed software on the user side. That means
HTML5 allows you to generate dynamic graphics, incorporate online
games, and use interactive video. There are even offline games and video
that are now possible thanks to what HTML5 provides.

3. The coding with HTML5 is clear and consistent.


If you grew up in the 1990s and learned coding then, you will appreciate
the cleanliness of HTML5’s coding profile. It is simple, straight-forward, and
very easy to read. You can quickly separate the content from the style,
making it easier to compose code that is descriptive and clear. It doesn’t
take long for new coders to learn the language either with this structure,
which means anyone with a passion in this area can follow it.

4. There is more consistency with websites because of HTML5.


You’ll still find various iterations of the different HTML versions sprinkled
throughout the internet. As more websites come over to HTML5, however,
you’ll see from a user standpoint that there is much more consistency with
the internet experience from the user perspective. Many websites are even
using similar code to accomplish very different goals, which quickens the
loading experience without duplicating it for users. This also makes it much
easier for developers to understand one another from a structural
standpoint.

5. There are more page layout elements available for your content.
If you’ve grown familiar with the older versions of HTML, then you know
what your options are already: div, heading, paragraph, and span. With
HTML5, you’ve got a lot more elements to play around with when designing
your page layouts. Headers, footers, areas, and sections are all available to
you. That makes it possible to develop a page with representative
markup that guides users through the purpose of the content they are
encountering.

6. It offers search engine optimization benefits.


As late as 2010, it was possible to generate some solid organic results from
search engines by stuffing a ton of keywords into your content. If you
assigned the right design elements to distinguish yourself from the
competition, you could almost guarantee a top ranking for clicks. Today’s
SEO is more about value than anything else, which HTML5 complements
nicely. Because you can construct semantically with this version, you’re
able to maintain your coding with higher levels of reliability. That means
real content, not repetitive content, pushes you higher in the rankings,
creating the potential for higher conversions.

7. HTML5 requires less maintenance than other options.


HTML5 utilizes an open-source programming language that is almost
universally known. That means you can find the support you need for
troubleshooting online on your own. It also means that you’ll be going
through fewer maintenance issues over time because updates to the
coding can be updated in real-time. If you have an app that is live on the
app store, you don’t need to resubmit your product. Just update the code
and it will populate itself to those who are using your product.

8. The storage options with HTML5 are more reliable.


With HTML5, you have the ability to store user-side data temporarily within
a SQL database. That moves you away from the need to incorporate
cookies, which is a definite advantage thanks to changes in privacy laws in
Europe. You’ll also find that many users prefer being able to use a website
that offers an offline application cache, since they can reload previous
websites they have visited – even if they happen to be offline at that time.

9. It eliminates the need for multiple developments.


From a business perspective, HTML5 is all about saving you time and
money. Because it is able to be deployed across multiple platforms, you
are no longer forced into a world where multiple code variations are
required to make your business available to customers. You can develop
once, using the same code, while being able to approach multiple markets.
That means your lifetime costs for development can be much less
compared to how previous structures were implemented.
10. All compatible browsers collect and use data.
When you’re using HTML5 from a mobile perspective, you still have the
ability to collect useful data, collate it, and then use it to reach your metrics
and goals. That means you can have multiple people using multiple
devices and different browsers while knowing that your results are going to
be the same. The user experience may be slightly different with each
browser, though the HTML5 experience is virtually the same for anyone on
any compatible device and browser.

11. It performs well with excellent consistency.


With HTML5, you’re eliminating the need to have plugins downloaded to
play games or interact with your website. Remember when you’d need to
click on that “update Flash” link on a website? That issue goes away.
Although not every browser will support every possible feature available
within the language of HTML5, you’ll find that users are willing to avoid
small hiccups in functionality because of the ease of access that is
provided with this efficient coding language.

12. It offers a modern user experience.


If you were to directly compare HTML5 with WebGL or platform-native
development, you might find that the performance is not as strong.
Even so, the frame rates for graphics are
where they need to be. The animation is crisp and pure, eliminating the
latency sometimes seen in previous versions. Video and sound are good
as well. It may not be a complete replacement for all platforms, but it is an
excellent all-around alternative to consider.

List of the Disadvantages of HTML5:


1. There are different video supports for HTML5.
No one could really agree on what the standard video support should be
within HTML5. That means there is a hodge-podge of different video
supports out there today that are based on the browser you prefer to use.
There are three primary video formats currently used: Ogg Theora, H.264,
and VP8/WebM. The first is supported by everything except Internet
Explorer. The second is supported by everything except Firefox. As for the
third, it is fully supported by everything, though it may require a manual
installation.

2. It requires modern browsers to access it.


If you have users trying to access your website through an older browser,
then you’re not going to be able to reach them. There is a definite lack of
compatibility with Internet Explorer which must be addressed. From a
business perspective, if your website visitors are not able to access a fully
functional website, that creates a problem. They’re not going to blame their
older browser or IE. They’re going to blame you.

3. There are media licensing issues which must be considered.


Your rich media is offered in compressed, multiple formats because of the
wide range of browser compatibility you might encounter. That means there
are media licensing issues which you must take into consideration. If you’re
using multiple formats for your media and paying for your licenses, you’ll
need to pay for multiple audio and video licenses to ensure all your needs
are covered. That also means you’ve got more coding work to do.

4. Multiple device responsiveness can be a headache.


The goal of creating a modern website is to have it look the same, no
matter what device is being used or what browser is preferred by the user.
Many templates allow for automatic responsiveness, which reduces the
need for HTML5 coding knowledge, though it does cause many websites to
look the same. If you’re developing a website, you must view your content
on all device types and browsers to ensure it looks the same because there
is always a chance that it won’t render as it should.

5. The language of HTML5 is always a work in progress.


Although some may see this as an advantage, the constant development of
the actual language contained in HTML5 requires you to be on your toes.
The language itself is not entirely stable, which means you may find yourself with
unexpected changes in your coding that render your website useless until
you get them fixed. In theory, anything could change at any time. In reality,
this is more of a threat than a true disadvantage at the moment, though it
must be taken into consideration.

6. Gaming struggles with JavaScript under HTML5.


JavaScript is the only scripting language of HTML5. It is a very capable
language, ideal for numerous applications. From a gaming perspective,
however, there is a lack of features which are necessary for a strong
gaming experience. Custom name spaces, member access, interfaces, and
inheritance all struggle under JavaScript. There are plenty of work-arounds
available which are suitable to get your work done. It is not, however, a
first-choice language option from a purely gaming standpoint.

7. There are zero good IDEs available in HTML5.


Although this disadvantage may change in the future, the integrated
development environments available with HTML5 are average at best. If
you know what you’re doing, then just do your thing and development
testing will be fine. For beginners or coders who haven’t been in the game
for a while, you’ll find that there aren’t many good processes available to
you for asset integration. It’s a bit of a seat-of-your-pants experience, even
though it was initially released in 2014.

If you’re getting into website development or looking for ways to update
your older site, then HTML5 is going to be the most efficient option
available to you. Although it may not be perfect for every possible solution,
you’ll find that the advantages and disadvantages of HTML5 limit the
negatives, accentuate the positives, and give you a simple platform that
highlights all of your strengths.

CSS3:

Cascading Style Sheets (CSS) is a style sheet language used for describing the
look and formatting of a document written in a markup language. CSS3 is the latest
standard of CSS, succeeding CSS2. The main additions in CSS3 over CSS2
are as follows −

 Media Queries
 Namespaces
 Selectors Level 3
 Color

CSS3 is used with HTML to create and format content structure. It is responsible
for colours, font properties, text alignments, background images, graphics, tables,
etc. It provides the positioning of various elements with the values being fixed,
absolute, and relative.

CSS3 modules:
CSS3 combines the CSS2 specifications with new specifications, and each of these
parts is called a module. Some of the modules are shown below −

 Selectors
 Box Model
 Backgrounds
 Image Values and Replaced Content
 Text Effects
 2D Transformations
 3D Transformations
 Animations
 Multiple Column Layout
 User Interface

CSS3 rounded corners are added to the box of an element, such as the body or a
block of text, by using the border-radius property. A simple example of rounded
corners is as follows −

#rcorners7 {
  border-radius: 60px/15px;
  background: #FF0000;
  padding: 20px;
  width: 200px;
  height: 150px;
}

Features of CSS3:
The features of the CSS3 are as follows:

1. Selectors

Selectors allow the designer to select elements of the web page more precisely.
CSS3 adds structural pseudo-classes and attribute selectors that perform partial
matches on attribute values. The :target pseudo-class styles the element targeted
by the URL fragment, and the :checked pseudo-class styles checked elements such as
checkboxes and radio buttons.

2. Text Effects and Layout


With CSS3, we can change the justification of text, whitespace

adjustment of the document, and style the hyphenation of words.


3. First-Letter and First-Line Pseudo-Classes
CSS 3 includes properties that help with kerning (adjusting the spacing

between characters to achieve a visually pleasing effect) and positioning

drop-caps (large decorative capital letter at the starting of a paragraph).

4. Paged Media and Generated Content


CSS3 has additional choices in Paged Media, such as page numbers and

running headers and footers. There are additional properties for printing

Generated Content as well, like properties for cross-references and

footnotes.

5. Multi-Column Layout
This feature includes properties to allow designers to present their content

in multiple columns with options like the column-count, column-gap, and

column-width.

Advantages of CSS3:
 CSS3 provides a consistent and precise positioning of navigable

elements.

 It is easy to customize a web page as it can be done by merely

altering a modular file.


 Graphics are easier in CSS3, thus making it easy to make the site

appealing.

 It permits online videos to be seen without using third-party plug-ins.

 CSS3 is economical, time-saving, and most browsers support it.

Use and Need of CSS3:


CSS3 is used with HTML to create and format content structure. It is
responsible for colours, font properties, text alignments, background
images, graphics, tables, etc. It provides the positioning of various elements
with the values being fixed, absolute, and relative.

CSS3 is highly recommended for building highly interactive online pages, as
it provides wider options for designing. When advertising products and
services, the website is the first thing a customer sees, so it should be appealing
and attractive, and this can be achieved with the help of CSS3.

CSS3 allows the designer to create websites that are rich in content and low in
code. This technology brings some exciting features that make the page
look good, simple for the user to navigate, and function flawlessly.

Some designs like drop shadows, rounded corners, and gradients find use
in just about every web page. These design enhancements can make the
site look appealing when used appropriately. Formerly, to use these
techniques, we had to resort to many complicated methods with lots of
coding and HTML elements. We tolerated these workarounds, as there was
no other way of achieving these techniques. But now, CSS3 allows us to
include these designs directly, leading to simpler, cleaner, and faster
pages.

Anatomy of a Web Page:

To make it easier for the beginner, below is an image of the anatomy of a
web page.

Starting at the top of the web page, let's go through the anatomy of a
web page:

Page Title

The page title is set using the <title> </title> pair of tags in the
head section of the HTML code. This is the only web page
element within the head section of the web page that the visitor will
see.

URL (Domain Name)

The URL is the domain name of the website. If the visitor just
typed www.domainname.com they would be taken to the home
page of the website.
File Name:

File name is the web page file name. It should not contain any
spaces! The file name can be written as one long name (e.g.
basichtmlarticles.htm), with hyphens (e.g.
basic-html-articles.htm, as shown in the image above) or with underscores
(e.g. basic_html_articles.htm).

When you create a web page you have to give it a name. The
file name has what is called an extension at the end of it.

The extension at the end of the file name tells the browser
what kind of file it is. An HTML document would have an
extension of .htm or .html. If your web page uses a certain
programming language it would have the appropriate
extension, e.g. .php is for the PHP programming language and .asp
is for ASP.

Note: Servers and some browsers will not render (show) your
page if you refer to it differently in your links than the way it is
actually named. Basic-Html-Articles.htm is different from
basic-html-articles.htm to some servers and browsers. To avoid
this problem always name your files with lower case letters.
This way you don't have to remember how you capitalized a file
name.
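The naming advice above (no spaces, lower case letters) can be applied automatically. The following Python sketch is one possible helper, not a standard tool; the function name normalize_file_name and the choice of hyphens as the replacement for spaces are assumptions made for the example.

```python
import re


def normalize_file_name(name: str) -> str:
    """Lower-case a web page file name and replace spaces with hyphens."""
    name = name.strip()
    stem, dot, extension = name.rpartition(".")
    if not dot:
        # No extension present: treat the whole name as the stem.
        stem, extension = name, ""
    # Collapse each run of whitespace into a single hyphen.
    stem = re.sub(r"\s+", "-", stem.lower())
    return f"{stem}.{extension.lower()}" if extension else stem


if __name__ == "__main__":
    print(normalize_file_name("Basic Html Articles.htm"))
    print(normalize_file_name("Basic-Html-Articles.HTM"))
```

Running every file name through one such helper before upload means links can always be written in lower case.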

Scroll Bars:

Scroll bars are on the right side and bottom of the browser
window. If there is a scroll bar at the bottom (horizontal scroll
bar) your web page content is too wide for the browser
window.
A web page layout should be designed so there is no horizontal
scroll bar. You need to test your web page at different
resolutions and on different operating systems to see if the way
the page is laid out will result in horizontal scroll bars when
viewed at smaller resolutions or by different operating systems.

One way to avoid this problem is to use a flexible (fluid)
design. A flexible design will adjust to the browser window size.
As long as all your elements add up to less than the browser
width there will not be a horizontal scroll bar.

Next, we will look at the web page content portion of the
anatomy of a web page.

Header:

The header is at the very top of the web page. It usually contains a logo
for the website.

Navigation:

A website can use a left navigation system, a right navigation system or
a navigation system that spans horizontally right under the header or
above the header.

The navigation system of a website has to be consistent throughout the
website so the visitor will learn your navigation system. Changing the
navigation system from page to page is confusing to the visitor and they
will get frustrated and leave!
Web Page Content:

Web page content includes everything between the <body> and </body>
tags. We have already looked at some of the web page content, the
header and navigation system. Also considered web page content is the
web page footer (we will discuss this next) and the center section of this
page that you are looking at now.

Footer:

The footer is the bottom section of the web page.

This section is where you usually put your copyright notice, link to your
privacy policy and your website contact information.


Header:

Header is the upper (top) part of the webpage. Being the area
people see before scrolling the page in their first seconds on the
website, the header is an element of strategic importance. It is
expected from the header to provide the core navigation around the
website so that users could scan it in split seconds and jump to the
main pages that can help them. Headers are also referred to as site
menus and positioned as an element of primary navigation in the
website layout.

Headers may include a bunch of meaningful layout elements, for
example:

 basic elements of brand identity, usually a logo
 call-to-action button
 links to basic categories of website content
 links to the social networks
 basic contact information (telephone number, e-mail address, etc.)
 switcher of the languages in case of the multilingual interface
 search field
 subscription field or button
 links to interaction with the product such as trial version, downloading
from the AppStore, etc.
It doesn’t mean that all the mentioned elements should be included in
one web page header: in this case, the header section would be
overloaded with information. The more objects attract the user’s
attention, the harder it is to concentrate on the vital ones. On the basis of
design tasks, designers, sometimes together with marketing specialists,
decide on the strategically important options and pick them up from the
list or add the others.

Example: a shipping company website uses the header zone effectively. It
includes a company logo in the left corner and a prominent,
contrasting call-to-action button in the right corner, with the
links to core navigation placed in between. The header zone is clearly
separated from the rest of the page by a horizontal line used as
a visual divider.
What makes a header a vital element of web usability is the fact that
it is placed in the most scannable zone of a web page. Whatever
scanning pattern users follow on a website, it starts from the top
part of the page, which is scanned from left to right in languages
that are read and written in that direction. This means that what is
placed in the header won't be missed, especially the elements placed
in its left and right corners. That's why you will often find the main
CTA button in one of them. What's more, the power of habit and the
idea of external consistency of user experience should also be taken
into account here. Visitors have been used to finding core navigation
in headers for years, so the main question is usually what to put
into the header rather than whether to use one at all.

XML:

Extensible Markup Language (XML) is a markup language that defines a
set of rules for encoding documents in a format that is both
human-readable and machine-readable. XML is defined by the World Wide
Web Consortium's XML 1.0 Specification of 1998 and several other
related specifications, all of them free open standards.

 XML is a data format, not an Application Programming Interface
(API); programs read and write XML through APIs such as DOM and SAX.

 A markup language is a set of codes, or tags, that describes the
text in a digital document.

 XML tags identify, store, and organize the data. Unlike HTML tags,
they do not specify how the data is displayed.

 The tags provide the structure to the data.

 It is used to structure data for storage and transport.

 The XML standard is a flexible way to create information formats
and electronically share structured data via the public Internet, as
well as via corporate networks.

 Example of an XML document. XML documents use a simple,
self-describing syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
</note>

 It is a textual data format with strong support via Unicode for
different human languages.

 Although the design of XML focuses on documents, the language is
widely used for the representation of arbitrary data structures[7]
such as those used in web services.

Key terminology:

1. Character
2. Processor and application
3. Markup and content
4. Tag
5. Element
6. Attribute

 An XML document is a string of characters. Almost every legal
Unicode character may appear in an XML document.

 The processor analyzes the markup and passes structured information
to an application.

 The characters making up an XML document are divided into markup
and content, which may be distinguished by the application of simple
syntactic rules.

A tag is a markup construct that begins with < and ends with >. Tags
come in three flavors:

 start-tag, such as <section>;
 end-tag, such as </section>;
 empty-element tag, such as <line-break />.

 An element is a logical document component that either begins with
a start-tag and ends with a matching end-tag or consists only of an
empty-element tag.

 An attribute is a markup construct consisting of a name–value pair
that exists within a start-tag or empty-element tag. An example is
<img src="madonna.jpg" alt="Madonna" />.

 XML documents may begin with an XML declaration that describes some
information about themselves. An example is
<?xml version="1.0" encoding="UTF-8"?>.
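
The terms above can be seen in a small program. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module; the sample document `DOC` and the helper `describe` are made up for illustration and only reuse the tag names mentioned in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: an XML declaration, a start-tag/end-tag pair,
# two empty-element tags, and attributes (name-value pairs).
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<section>
  <img src="madonna.jpg" alt="Madonna" />
  <line-break />
  <title>Hello</title>
</section>"""

# The processor (parser) analyzes the markup and hands structured
# information to the application.
root = ET.fromstring(DOC)


def describe(element):
    """Return (tag name, attributes, text content) for one element."""
    text = (element.text or "").strip()
    return element.tag, dict(element.attrib), text


summary = [describe(child) for child in root]
for tag, attrs, text in summary:
    print(tag, attrs, repr(text))
```

Each child of `<section>` is reported as an element with its tag name, its attributes, and its content; the two empty-element tags have no content.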

Advantages of XML:

Using XML to exchange information offers many benefits, including the
following:

 XML uses human, not computer, language. XML is readable and
understandable, even by novices, and no more difficult to code than
HTML.
 XML is completely compatible with Java™ and 100% portable. Any
application that can process XML can use your information, regardless
of platform.
 XML is extendable. Create your own tags, or use tags created by
others, that use the natural language of your domain, that have the
attributes you need, and that make sense to you and your users.

XML Schema:

An XML schema language is used for expressing constraints on XML
documents. Several schema languages are in use nowadays, for example
RELAX NG and XSD (XML Schema Definition). An XML schema is used to
define the structure of an XML document.
How do XML schemas work?

The purpose of an XML Schema is to define the legal building blocks
of an XML document:
1. the elements and attributes that can appear in a document.
2. the number of (and order of) child elements.
3. data types for elements and attributes.
4. default and fixed values for elements and attributes.

An XML Schema Definition (XSD) is a framework document that defines
the rules and constraints for XML documents. An XSD formally
describes the elements in an XML document and can be used to
validate the contents of the XML document to make sure that it
adheres to the rules of the XSD.

The first element of an XML document is called the root element. Even
the simplest XML document contains an opening tag and a closing tag.
XML tags are case sensitive, i.e. <root> and <Root> are different
tags. XML tags are used to define the scope of elements in an XML
document.

Elements of XML Schemas:

XML Schemas define the elements of your XML files. A simple element
is an XML element that contains only text. It cannot contain any other
elements or attributes.

Defining a Simple Element:

A simple element is declared with a name and a data type. The most
common built-in data types are:

 xs:string
 xs:decimal
 xs:integer
 xs:boolean
 xs:date
 xs:time

An XML schema is used to define the structure of an XML document. It
is like a DTD but provides more control over the XML structure.
Checking Validation

An XML document is called "well-formed" if its syntax is correct. A
"valid" XML document is one that is well-formed and has also been
validated against a schema.
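
Well-formedness can be checked with any XML parser, because a parser rejects text whose syntax is incorrect. The following is a minimal sketch in Python; the function name `is_well_formed` is chosen here for illustration. Note that it checks syntax only; validating against a schema needs a schema-aware tool, which is not shown.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, i.e. its syntax is correct."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Matching start-tag and end-tag: well-formed.
print(is_well_formed("<root><item>ok</item></root>"))
# Tags are case sensitive, so <item> is not closed by </Item>.
print(is_well_formed("<root><item>ok</Item></root>"))
# Missing end-tag for <item>.
print(is_well_formed("<root><item>ok</root>"))
```

Only the first document is accepted; the other two fail because of a mismatched or missing end-tag.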

Visit http://www.xmlvalidation.com to validate the XML file against a
schema or DTD.

Description of XML Schema

<xs:element name="employee"> : It defines the element named
'employee'.

<xs:complexType> : It defines that the element 'employee' is of a
complex type.

<xs:sequence> : It defines that the complex type is a sequence of
elements.

<xs:element name="firstname" type="xs:string"/> : It defines that the
element 'firstname' is of string/text type.

<xs:element name="lastname" type="xs:string"/> : It defines that the
element 'lastname' is of string/text type.

<xs:element name="email" type="xs:string"/> : It defines that the
element 'email' is of string/text type.
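
The schema that these lines describe is not listed in the notes. The following Python sketch assembles one from the descriptions above (the exact layout, the `xs:schema` wrapper and the helper `matches_sequence` are assumptions) and reads it with the standard `xml.etree.ElementTree` module. The helper is only a rough order check written for illustration, not a real XSD validator.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Schema reconstructed from the line-by-line description (an assumption).
EMPLOYEE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(EMPLOYEE_XSD)

# Walk employee -> complexType -> sequence and list the declared children.
sequence = schema.find(f"{{{XS}}}element/{{{XS}}}complexType/{{{XS}}}sequence")
declared = [(e.get("name"), e.get("type")) for e in sequence]
print(declared)


def matches_sequence(xml_text):
    """Rough check: root is <employee> and children follow the declared order."""
    root = ET.fromstring(xml_text)
    expected = [name for name, _ in declared]
    return root.tag == "employee" and [c.tag for c in root] == expected
```

For example, `matches_sequence("<employee><firstname>A</firstname><lastname>B</lastname><email>a@b.c</email></employee>")` is true, while swapping two child elements makes it false, because `xs:sequence` fixes their order.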

XML Schema Data types

There are two types of data types in XML schema.

1. simpleType
2. complexType

simpleType

The simpleType allows you to have text-based elements. An element of
a simple type holds only a text value; it cannot have attributes or
child elements.

complexType

The complexType allows you to hold multiple attributes and elements.
It can contain additional sub-elements and can be left empty.

XML Document Type Definition (DTD):

A document type definition (DTD) is a set of markup declarations that
define a document type for an SGML-family markup language (GML,
SGML, XML, HTML). A DTD defines the valid building blocks of an XML
document. It defines the document structure with a list of valid
elements and attributes.
How do I create an XML document?
Rules for the XML declaration:
1. If the XML declaration is present, it must be the first line of
the XML document.
2. If the XML declaration is included, it must contain the version
attribute.
3. The parameter names and values are case-sensitive.
4. The parameter names are always in lower case.
5. The order of the parameters is important. The correct order is:
version, encoding, and standalone.
6. Either single or double quotes may be used around the values.
7. The XML declaration has no closing tag, i.e. there is no </?xml>.
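
Several of these rules can be observed by feeding small documents to a parser. The following is a minimal sketch in Python; the helper `parses` and the sample declarations in `CASES` are made up for illustration, and the behaviour shown is that of Python's standard parser.

```python
import xml.etree.ElementTree as ET


def parses(xml_bytes: bytes) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# Hypothetical test documents, one per rule.
CASES = {
    "correct order, double quotes":
        b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>',
    "single quotes": b"<?xml version='1.0' encoding='UTF-8'?><a/>",
    "declaration not on the first line": b'\n<?xml version="1.0"?><a/>',
    "version missing": b'<?xml encoding="UTF-8"?><a/>',
    "encoding before version": b'<?xml encoding="UTF-8" version="1.0"?><a/>',
    "upper-case parameter name": b'<?xml VERSION="1.0"?><a/>',
}

results = {label: parses(doc) for label, doc in CASES.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

The first two documents are accepted; the remaining four break rules 1, 2, 5 and 4 respectively and are rejected.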
Document Object Model in XML:
The Document Object Model (DOM) is a programming API for HTML
and XML documents. It defines the logical structure of documents and
the way a document is accessed and manipulated.
The Document Object Model can be used with any programming
language. XML DOM defines a standard way to access and manipulate
XML documents.
The Document Object Model identifies:
 the interfaces and objects used to represent and manipulate a
document.
 the semantics of these interfaces and objects, including both
behavior and attributes.
 the relationships and collaborations among these interfaces and
objects.
What does the XML DOM do?

The XML DOM makes a tree-structure view for an XML document.

We can access all elements through the DOM tree.

We can modify or delete their content and also create new elements.
The elements, their content (text and attributes) are all known as nodes.
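
These operations can be sketched with Python's standard `xml.dom.minidom` module, which implements the DOM interfaces. The sample document and its values are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical document used only for this illustration.
doc = minidom.parseString("<note><to>Tove</to><body>Hello</body></note>")
root = doc.documentElement

# Access: elements are reached through the DOM tree.
to_element = doc.getElementsByTagName("to")[0]

# Modify: the text inside <to> is itself a node (the first child of <to>).
to_element.firstChild.data = "Jani"

# Create: build a new element with a text node and attach it to the root.
heading = doc.createElement("heading")
heading.appendChild(doc.createTextNode("Reminder"))
root.appendChild(heading)

# Delete: remove the <body> element from its parent.
body = doc.getElementsByTagName("body")[0]
root.removeChild(body)

result = root.toxml()
print(result)
```

After these steps the tree contains the changed `<to>` element and the new `<heading>` element, and `<body>` is gone.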

For example, consider this table, taken from an HTML document:

<TABLE>
<ROWS>
<TR>
<TD>A</TD>
<TD>B</TD>
</TR>
<TR>
<TD>C</TD>
<TD>D</TD>
</TR>
</ROWS>
</TABLE>

The Document Object Model represents this table like this:
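
The figure is not reproduced here. As a rough substitute, the tree can be printed with a short Python sketch based on the standard `xml.dom.minidom` module; the helper `tree_lines` is written for illustration, and the table is written on one line so that no whitespace-only text nodes appear.

```python
from xml.dom import minidom

TABLE = ("<TABLE><ROWS>"
         "<TR><TD>A</TD><TD>B</TD></TR>"
         "<TR><TD>C</TD><TD>D</TD></TR>"
         "</ROWS></TABLE>")


def tree_lines(node, depth=0):
    """Return one indented line per node: element names and text values."""
    indent = "  " * depth
    if node.nodeType == node.TEXT_NODE:
        return [indent + repr(node.data)]
    lines = [indent + f"<{node.tagName}>"]
    for child in node.childNodes:
        lines.extend(tree_lines(child, depth + 1))
    return lines


lines = tree_lines(minidom.parseString(TABLE).documentElement)
print("\n".join(lines))
```

Each level of indentation is one level of the tree: TABLE contains ROWS, ROWS contains two TR nodes, each TR contains two TD nodes, and each TD contains one text node.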


XML DOM Example : Load XML File
This example shows how an XML document ("note.xml") is parsed into
an XML DOM object and how information is extracted from it with
JavaScript.

Let's see the XML file that contains the message.

note.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>sonoojaiswal@javatpoint.com</to>
  <from>vimal@javatpoint.com</from>
  <body>Hello XML DOM</body>
</note>
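
The notes name JavaScript for this step but do not list the script. As an illustration of the same idea, the following sketch uses Python's standard `xml.dom.minidom` module instead; the file content is embedded as a byte string, and the helper `text_of` is made up for this example.

```python
from xml.dom import minidom

# Content of note.xml, embedded so the sketch runs without a file.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>sonoojaiswal@javatpoint.com</to>
  <from>vimal@javatpoint.com</from>
  <body>Hello XML DOM</body>
</note>"""

# Parse the document into a DOM object.
doc = minidom.parseString(NOTE_XML)


def text_of(tag_name):
    """Return the text of the first element with the given tag name."""
    element = doc.getElementsByTagName(tag_name)[0]
    return element.firstChild.data


message = {tag: text_of(tag) for tag in ("to", "from", "body")}
for tag, value in message.items():
    print(f"{tag}: {value}")
```

To read from a real file, `minidom.parse("note.xml")` can be used in place of `parseString`.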
Properties of DOM: Let's see the objects and properties that can be
accessed and modified through the document object.

1. Window Object: The Window object is always at the top of the
hierarchy.
2. Document Object: When an HTML document is loaded into a
window, it becomes a document object.
3. Form Object: It is represented by form tags.
4. Link Object: It is represented by link tags.
5. Anchor Object: It is represented by <a href> tags.
6. Form Control Elements: A form can have many control elements,
such as text fields, buttons, radio buttons, and checkboxes.
Methods of Document Object:
1. write("string"): writes the given string on the document.
2. getElementById(): returns the element having the given id value.
3. getElementsByName(): returns all the elements having the given
name value.
4. getElementsByTagName(): returns all the elements having the
given tag name.
5. getElementsByClassName(): returns all the elements having the
given class name.

XSLT:
XSLT (Extensible Stylesheet Language Transformations) is a language
for transforming XML documents into other XML documents, or
other formats such as HTML for web pages, plain text or XSL Formatting
Objects, which may subsequently be converted to other formats, such as
PDF, PostScript and PNG.
How XSLT Works

The XSLT stylesheet is written in XML format. It defines the
transformation rules to be applied to the target XML document. The
XSLT processor takes the XSLT stylesheet, applies the transformation
rules to the target XML document, and generates a formatted document
in XML, HTML, or text format. Finally, the XSLT formatter uses this
document to generate the actual output, which is displayed to the
end user.

Advantages of XSLT

A list of advantages of using XSLT:

o XSLT provides an easy way to merge XML data into a presentation,
because it applies user-defined transformations to an XML document
and the output can be HTML, XML, or any other structured document.
o XSLT provides XPath to locate elements and attributes within an
XML document. This is a more convenient way to traverse an XML
document than the traditional way of using a scripting language.
o XSLT is template based, so it is more resilient to changes in
documents than low-level DOM and SAX.
o By using XML and XSLT, the application UI script will look clean
and will be easier to maintain.
o XSLT templates are based on XPath patterns, which are very
powerful in terms of performance when processing the XML document.
o XSLT can be used as a validation language, as it uses a
tree-pattern-matching approach.
o You can change the output simply by modifying the transformations
in the XSL files.

SAX Approach:

SAX (Simple API for XML) is an event-driven online algorithm for
parsing XML documents, with an API developed by the XML-DEV
mailing list. SAX provides a mechanism for reading data from an
XML document that is an alternative to that provided by the
Document Object Model (DOM).

Unlike DOM, there is no formal specification for SAX. The Java
implementation of SAX is considered to be normative.[2]

 SAX processes documents state-independently, in contrast to
DOM, which is used for state-dependent processing of XML
documents.

Benefits:
A SAX parser only needs to report each parsing event as it
happens, and normally discards almost all of that information once
reported (it does, however, keep some things, for example a list of
all elements that have not been closed yet, in order to catch later
errors such as end-tags in the wrong order). Thus, the minimum
memory required for a SAX parser is proportional to the maximum
depth of the XML file (i.e., of the XML tree) and the maximum data
involved in a single XML event (such as the name and attributes of
a single start-tag, or the content of a processing instruction, etc.).
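
The memory behaviour described above can be sketched with Python's standard `xml.sax` module. The handler below (the class name `DepthHandler` and the sample document are made up for illustration) keeps only the list of elements that are still open, so what it stores grows with the depth of the document rather than with its size.

```python
import xml.sax


class DepthHandler(xml.sax.ContentHandler):
    """Keeps only the stack of currently open elements, not the document."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # elements that have not been closed yet
        self.max_depth = 0
        self.element_count = 0

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.element_count += 1
        self.max_depth = max(self.max_depth, len(self.open_elements))

    def endElement(self, name):
        # The element is finished; nothing about it is kept.
        self.open_elements.pop()


handler = DepthHandler()
xml.sax.parseString(b"<a><b><c/></b><b/><b/></a>", handler)
print(handler.element_count, handler.max_depth, handler.open_elements)
```

The document has five elements, but the stack never holds more than three names at once and is empty again when parsing ends.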

Drawbacks:

The event-driven model of SAX is useful for XML parsing, but it does
have certain drawbacks.
Virtually any kind of XML validation requires access to the document in
full. The most trivial example is that an attribute declared in the DTD
to be of type IDREF requires that there be only one element in the
document that uses the same value for an ID attribute. To validate this
in a SAX parser, one must keep track of all ID attributes (any one of
them might end up being referenced by an IDREF attribute at the very
end), as well as every IDREF attribute until it is resolved. Similarly,
to validate that each element has an acceptable sequence of child
elements, information about what child elements have been seen for
each parent must be kept until the parent closes.
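
The bookkeeping this requires can be sketched in Python with the standard `xml.sax` module. The sketch assumes, purely for illustration, that ID attributes are named `id` and IDREF attributes are named `ref`; in a real document these types come from the DTD, which is not consulted here.

```python
import xml.sax


class IdRefChecker(xml.sax.ContentHandler):
    """Tracks every ID and IDREF seen, because SAX keeps nothing for us."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.duplicate_ids = set()
        self.refs = set()

    def startElement(self, name, attrs):
        # Assumption: "id" plays the role of an ID attribute.
        if "id" in attrs:
            value = attrs["id"]
            if value in self.ids:
                self.duplicate_ids.add(value)
            self.ids.add(value)
        # Assumption: "ref" plays the role of an IDREF attribute.
        if "ref" in attrs:
            self.refs.add(attrs["ref"])

    def unresolved(self):
        """References that no ID matched; only known after the whole parse."""
        return self.refs - self.ids


# Hypothetical document: "b" is referenced before it is defined, "z" never is.
SAMPLE = b'<doc><item ref="b"/><item id="a"/><item id="b"/><item ref="z"/></doc>'

checker = IdRefChecker()
xml.sax.parseString(SAMPLE, checker)
print(checker.unresolved(), checker.duplicate_ids)
```

The reference to `b` can only be resolved after a later element has been read, which is why the handler has to remember all values until the end of the document.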

XML processing with SAX:


A parser that implements SAX (i.e., a SAX Parser) functions as a stream
parser, with an event-driven API.[1] The user defines a number
of callback methods that will be called when events occur during
parsing. The SAX events include (among others):

 XML Text nodes


 XML Element Starts and Ends
 XML Processing Instructions
 XML Comments
Consider a sample XML document whose root element is
DocumentElement. When passed through a SAX parser, it will generate a
sequence of events like the following:

 XML Element start, named DocumentElement, with an
attribute param equal to "value"
 XML Element start, named FirstElement
 XML Text node, with data equal to "&#xb6; Some Text" (note: certain
white spaces can be changed)
 XML Element end, named FirstElement
 Processing Instruction event, with the target some_pi and
data some_attr="some_value" (the content after the target is just text;
however, it is very common to imitate the syntax of XML attributes, as
in this example)
 XML Element start, named SecondElement, with an
attribute param2 equal to "something"
 XML Text node, with data equal to "Pre-Text"
 XML Element start, named Inline
 XML Text node, with data equal to "Inlined text"
 XML Element end, named Inline
 XML Text node, with data equal to "Post-text."
 XML Element end, named SecondElement
 XML Element end, named DocumentElement

The SAX specification deliberately states that a given section of
text may be reported as multiple sequential text events. Many parsers,
for example, return separate text events for numeric character
references. Thus, for the same document, a SAX parser may generate a
different series of events, part of which might include:

 XML Element start, named FirstElement
 XML Text node, with data equal to "&#xb6;" (the Unicode character
U+00b6)
 XML Text node, with data equal to " Some Text"
 XML Element end, named FirstElement
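
The events can be observed with Python's standard `xml.sax` module. In the sketch below, the document is reconstructed from the event lists in this section, so its exact whitespace is an assumption; the class name `EventLogger` is made up for illustration. Whitespace-only text events between elements are skipped to keep the log short.

```python
import xml.sax

# Reconstructed from the event list; the layout is an assumption.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<DocumentElement param="value">
  <FirstElement>&#xb6; Some Text</FirstElement>
  <?some_pi some_attr="some_value"?>
  <SecondElement param2="something">Pre-Text <Inline>Inlined text</Inline> Post-text.</SecondElement>
</DocumentElement>"""


class EventLogger(xml.sax.ContentHandler):
    """Callback methods: the parser calls them as each event occurs."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # One run of text may arrive as several calls to this method.
        if content.strip():
            self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


logger = EventLogger()
xml.sax.parseString(DOCUMENT, logger)
for event in logger.events:
    print(event)
```

How the text inside FirstElement is split into text events depends on the parser, which is exactly the point made above; the element and processing-instruction events arrive in document order.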
