
Chapter 04


Multimedia

Principles

Chapter 4

Text

TMH Chapter - 4 1
Types of Text
• Unformatted text, also known as plaintext, consists of fixed-size
characters from a limited character set.

• The most widely used such character set is ASCII, short for
American Standard Code for Information Interchange.

• ASCII is a table in which each character is represented by a
unique 7-bit binary code, giving 128 codes in all.

• In addition to the alphabetic, numeric and punctuation
characters, collectively called printable characters, the ASCII
character set also includes a number of control characters.
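
The 7-bit coding described above can be illustrated with a minimal Python sketch. The helper name `ascii_info` is hypothetical, chosen only for this illustration; it relies on the standard ASCII layout in which codes 0-31 and 127 are control characters and codes 32-126 are printable.

```python
def ascii_info(ch: str) -> tuple[int, str, str]:
    """Return (code, 7-bit binary string, kind) for one ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    # Codes 0-31 and 127 are control characters; 32-126 are printable.
    kind = "printable" if 32 <= code <= 126 else "control"
    return code, format(code, "07b"), kind


# Print a few rows of the table: character, decimal code, 7-bit code, kind.
for ch in ["A", "a", "7", "?", "\n", "\t"]:
    code, bits, kind = ascii_info(ch)
    print(f"{ch!r:>6}  {code:3d}  {bits}  {kind}")
```

Every code fits in seven binary digits, which is why the standard table holds exactly 128 entries.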

• Later, as requirements grew, an extended version of the ASCII
table was introduced, known as the extended ASCII character
set, while the original table came to be known as the
standard ASCII set.


• Formatted text is text in which, apart from the actual
alphanumeric characters, control characters are used to
change the appearance of the characters, e.g. bold, underline,
italics, and varying shapes, sizes and colors.

• In addition, a variety of document formatting options are
supported to enable an author to structure a document into
chapters, sections and paragraphs, with tables and graphics
inserted at appropriate points.

• Hypertext can be used to link multiple documents in such a way
that the user can navigate non-sequentially from one document
to another for cross-references.

• These links are called hyperlinks. Hyperlinks form one of the
core structures in multimedia presentations, because multimedia
also emphasizes a non-linear mode of presentation.

• To create such documents, the author uses commands of a
markup language such as HTML or SGML to specify the links.

• The underlined text string on which the user clicks the mouse is
called an anchor, and the document which opens as a result of
clicking is called the target document.

• On the Web, target documents are specified by web site
addresses, technically known as uniform resource locators
(URLs).

• Other than on the Internet, hypertext can also be used in
applications like MS-Word, MS-PowerPoint, Adobe Acrobat and
multimedia presentations.
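
The relationship between an anchor and its target URL can be sketched with Python's standard `html.parser` and `urllib.parse` modules. The class name `AnchorCollector`, the page fragment and the `example.com` address are hypothetical placeholders for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect (anchor text, target URL) pairs from <a href=...> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # target of the anchor currently being read
        self._text = []     # pieces of the anchor's visible text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical page fragment; the URL is a placeholder.
page = '<p>See the <a href="https://example.com/docs/ch4.html#fonts">font notes</a>.</p>'
collector = AnchorCollector()
collector.feed(page)

anchor_text, target = collector.links[0]
parts = urlparse(target)
print(anchor_text, "->", target)
print(parts.scheme, parts.netloc, parts.path, parts.fragment)
```

Here the anchor is the clickable text "font notes", and the URL splits into a scheme, a host, a path to the target document and a fragment naming a position inside it.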

• The architecture of a hypertext system can be divided into three
layers.

• At the uppermost layer, the presentation layer, all functions
connected to the user interface are embedded. This layer
determines which data are presented and how they are
presented.

• The hypertext abstract machine (HAM) is placed between the
presentation and storage layers. The HAM knows the structure of
the document; it has knowledge of the pointers and their
attributes.
• The storage layer, also called the database layer, is the lowest
layer. All functions connected with the storage of data belong to
this layer.
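
The division of responsibilities among the three layers can be sketched as three small Python classes. This is only an illustrative toy under assumed names (`StorageLayer`, `HypertextAbstractMachine`, `PresentationLayer` and all their methods are hypothetical), not a description of any real hypertext system.

```python
class StorageLayer:
    """Lowest layer: stores and retrieves raw node content."""

    def __init__(self):
        self._nodes = {}

    def save(self, node_id, text):
        self._nodes[node_id] = text

    def load(self, node_id):
        return self._nodes[node_id]


class HypertextAbstractMachine:
    """Middle layer: knows the document structure (pointers and attributes)."""

    def __init__(self, storage):
        self.storage = storage
        self._links = {}  # source node id -> list of (target id, attributes)

    def add_link(self, source, target, **attributes):
        self._links.setdefault(source, []).append((target, attributes))

    def links_from(self, node_id):
        return list(self._links.get(node_id, []))

    def content(self, node_id):
        return self.storage.load(node_id)


class PresentationLayer:
    """Top layer: decides which data are shown and how they are shown."""

    def __init__(self, ham):
        self.ham = ham

    def render(self, node_id):
        lines = [self.ham.content(node_id)]
        for target, attrs in self.ham.links_from(node_id):
            lines.append(f"  -> {attrs.get('label', target)} [{target}]")
        return "\n".join(lines)


storage = StorageLayer()
storage.save("intro", "Text is a basic multimedia element.")
storage.save("fonts", "Fonts determine character appearance.")

ham = HypertextAbstractMachine(storage)
ham.add_link("intro", "fonts", label="About fonts")

view = PresentationLayer(ham)
print(view.render("intro"))
```

In this sketch the presentation layer never touches storage directly; it asks the HAM, which in turn is the only component that knows about links.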

Unicode
• The Unicode standard is a universal character coding scheme
for written characters and text. It defines a consistent
way of encoding multilingual text which enables textual data to
be exchanged universally.

• Multilingual support is provided for European, Middle Eastern
and Asian languages. The Unicode Consortium was
incorporated in 1991 to promote the Unicode standard.

• Several methods have been suggested to implement Unicode,
which differ in storage space and compatibility. The
mapping methods are called Unicode Transformation Formats
(UTF) and Universal Character Set (UCS) encodings.

• The UTF-32 scheme uses 32 bits for each character. This is the
simplest scheme as it is a fixed-length encoding.

• But it is not efficient with regard to storage space and memory
usage, and is therefore rarely used.

• UTF-16 is a 16-bit encoding format. In its native format it can
encode code points up to FFFF (hexadecimal).

• For code points beyond this, the original number is expressed as
a combination of two 16-bit numbers, called a surrogate pair.
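
The way UTF-16 splits a large code point into two 16-bit numbers can be shown with a short Python sketch. The helper `utf16_units` is a hypothetical name for this illustration; it applies the standard surrogate arithmetic and, as a simplification, does not validate its input.

```python
def utf16_units(code_point: int) -> list[int]:
    """Return the 16-bit code units that UTF-16 uses for one code point.

    Simplified sketch: no check for invalid or reserved code points.
    """
    if code_point <= 0xFFFF:
        return [code_point]            # fits in a single 16-bit unit
    offset = code_point - 0x10000      # remaining 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return [high, low]


# Compare storage cost of the same characters in UTF-16 and UTF-32.
for ch in ["A", "\u20ac", "\U0001F600"]:
    units = utf16_units(ord(ch))
    print(
        f"U+{ord(ch):04X}",
        "UTF-16 units:", [f"{u:04X}" for u in units],
        "UTF-16 bytes:", len(ch.encode("utf-16-be")),
        "UTF-32 bytes:", len(ch.encode("utf-32-be")),
    )
```

UTF-32 always spends four bytes per character, while UTF-16 spends two bytes for code points up to FFFF and four bytes otherwise.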

Font
• The appearance of each character in formatted text is
determined by specifying a font name.

• Font names refer to font files which contain the actual description
of the character appearance.

• These files are usually in vector format, meaning that character
descriptions are stored mathematically.

• This is useful because characters may need to be scaled to
various heights, and mathematical descriptions can easily handle
such variations without degrading the appearance of the characters.
• An alternative form of font file is the bitmap format, where each
character is described as a collection of pixels.

• The disadvantage of this is that it can lead to distortion when the
characters are scaled to various sizes.

• Some of the standard fonts included with the Windows OS
are Times New Roman, Arial, Century Gothic and Verdana.

• Other than these, there are thousands of fonts made by
various organizations, and many of them are freely downloadable
over the Internet.
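
The distortion that arises when a bitmap character is scaled can be demonstrated with a small Python sketch. The 5x5 glyph and the function `scale_bitmap` are hypothetical, and nearest-neighbour resampling is assumed here as one simple scaling method; real font renderers may use other techniques.

```python
GLYPH = [  # hypothetical 5x5 bitmap of the letter "T"; "#" marks a set pixel
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def scale_bitmap(rows, new_w, new_h):
    """Nearest-neighbour resampling of a bitmap given as equal-length strings."""
    old_h, old_w = len(rows), len(rows[0])
    return [
        "".join(rows[y * old_h // new_h][x * old_w // new_w] for x in range(new_w))
        for y in range(new_h)
    ]


# Scaling 5x5 to 8x8 is not a whole-number factor, so pixels repeat unevenly.
for row in scale_bitmap(GLYPH, 8, 8):
    print(row)
```

At 8x8 the top bar becomes two pixels thick while the stem stays one pixel wide and sits off-centre, which is the kind of distortion a mathematically described outline avoids.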

• Font characters may have a number of sizes. Size is usually
specified in a unit called the point (pt), where 1 point equals
1/72 of an inch. Sometimes the size may also be specified in pixels.

• Specific fonts can be displayed in a variety of styles. Some
of the common styles are bold, italics, underline,
superscript and subscript.

• Some application packages allow changing the horizontal gap
between characters, called kerning, and the vertical gap
between two lines of text, called leading.
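
The relation between point sizes and pixel sizes follows directly from 1 point = 1/72 inch. A minimal Python sketch, assuming a display resolution of 96 pixels per inch as the default (an assumption for illustration; actual resolutions vary by device) and a hypothetical function name:

```python
def points_to_pixels(points: float, dpi: float = 96.0) -> float:
    """Convert a font size in points to pixels.

    1 point = 1/72 inch, so pixels = points / 72 * dpi.
    The 96 dpi default is an assumed resolution, not a universal value.
    """
    return points * dpi / 72.0


for size in (8, 12, 24, 72):
    print(f"{size} pt = {points_to_pixels(size)} px at 96 dpi")
```

At 72 pixels per inch one point equals exactly one pixel, which is why the two units are sometimes used interchangeably.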

Insertion
• The most common way of inserting text into a digital document is by
typing it using an input device like the keyboard.

• Another way of inserting text into a document is by copying text from a
pre-existing digital document using the Copy and Paste commands.

• A third way of inserting text into a digital document is by scanning it from
a paper document.

• The scanned file will, however, be an image file in which the text is
present as part of an image and is not editable in a text processor.

• To be able to edit the text, it needs to be converted from the image
format into an editable text format using Optical Character
Recognition (OCR) software.
Compression
• In the Huffman coding technique, instead of using fixed-length
codewords, an optimum set of variable-length codewords is
derived such that the shortest codewords are used to represent
the most frequently occurring characters.

• In the second approach, followed by the Lempel-Ziv (LZ) method,
as each word occurs in the text, instead of representing it
as ASCII characters, the encoder stores only the index of where
the word is stored in a table, called a dictionary.

• A variation of the above algorithm, called the Lempel-Ziv-Welch
(LZW) method, allows the dictionary to be built up dynamically by
the encoder and decoder.
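
Both approaches can be sketched in a few lines of Python. The function names below are hypothetical, and the code is a teaching sketch rather than a production compressor: the Huffman part returns codewords as strings of "0" and "1", and the LZW part assumes every input character has a code below 256 and emits dictionary indices as plain integers.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix code in which frequent characters get shorter codewords."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap items: (weight, unique tie-breaker, {char: codeword so far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def lzw_compress(text):
    """LZW: emit dictionary indices; the dictionary grows as the text is read."""
    dictionary = {chr(i): i for i in range(256)}   # start with single characters
    current, output = "", []
    for ch in text:
        if current + ch in dictionary:
            current += ch
        else:
            output.append(dictionary[current])
            dictionary[current + ch] = len(dictionary)
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes):
    """Rebuild the same dictionary on the decoder side from the codes alone."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):        # code defined by the step in progress
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        result.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(result)


sample = "abracadabra abracadabra"
codes = huffman_codes(sample)
print(sorted(codes.items(), key=lambda item: len(item[1])))
packed = lzw_compress(sample)
print(len(sample), "characters ->", len(packed), "LZW codes")
print(lzw_decompress(packed) == sample)
```

Note that the LZW decoder is never sent the dictionary; it reconstructs it step by step from the codes, which is what building the dictionary dynamically means in practice.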

File Formats
• TXT - Unformatted text document created by an editor like
Notepad on the Windows platform.

• DOC - Developed by Microsoft as the native format for storing
documents created by the MS-Word package.

• RTF - Developed by Microsoft for cross-platform exchange.
RTF codes are human-readable, similar to HTML.

• PDF - Developed by Adobe Systems for cross-platform
exchange of documents. In addition to text, the format also
supports images and graphics.
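
To illustrate how RTF codes are human-readable, the following Python sketch assembles a very small RTF document as a plain string. The helper names `rtf_escape` and `rtf_document` are hypothetical, only a handful of basic control words are used, and the sketch assumes ASCII-only body text.

```python
def rtf_escape(text: str) -> str:
    """Escape the three characters that have special meaning in RTF."""
    return text.replace("\\", r"\\").replace("{", r"\{").replace("}", r"\}")


def rtf_document(body: str) -> str:
    """Wrap an already-escaped RTF body in a minimal document.

    \\fonttbl declares font 0, \\f0 selects it, and \\fs24 sets the size
    in half-points (24 half-points = 12 pt).
    """
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}"
    return header + r"\f0\fs24 " + body + "}"


body = (
    rtf_escape("Plain text, ")
    + r"{\b bold text}"          # \b switches bold on inside the group
    + rtf_escape(", and ")
    + r"{\i italic text}"        # \i switches italics on inside the group
    + rtf_escape(".")
)
doc = rtf_document(body)
print(doc)
```

The printed result is ordinary text that can be read and edited in any plain-text editor, much like HTML source.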