1 Data Representation

Uploaded by

arishatabba
Copyright
© © All Rights Reserved

DATA REPRESENTATION

Syllabus 2023-2025

Data Representation
Data Representation refers to the methods used internally to represent information stored in
a computer.

Computers store lots of different types of information:

Data and instructions cannot be entered and processed directly into computers using human
language. Any type of data be it numbers, letters, special symbols, sound or pictures must first
be converted into machine-readable form i.e. binary form. Due to this reason, it is important to
understand how a computer together with its peripheral devices handles data in its electronic
circuits, on magnetic media and in optical devices.

Number System:

Number System defines a set of values used to represent ‘quantity’. There are different number
systems, such as decimal, binary and hexadecimal. Each system is characterized by its base or radix,
always given in decimal, and the set of permissible digits.

Binary Number System:
The binary system is based on the number 2 and is made up of 1s and 0s. Thus, only the two ‘values’ 0
and 1 can be used in this system to represent each digit. The computer has switches to represent
data, and switches have only two states: ON and OFF. Binary is a natural fit to the two states of a switch
(0 = OFF, 1 = ON).

One binary digit (0 or 1) is referred to as a bit, which is short for binary digit.

Decimal Number System:


Denary, also known as "decimal", is the standard number system used around the world.
The base of the decimal number system is 10. Its digits are 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
The first two letters in denary ("de") are an abbreviated version of "dec", which is a Latin prefix
meaning "ten". Therefore, the denary system contains ten digits.
Using the denary system, 2532 reads as two thousand, five hundred and thirty-two. One way
to break it down is as:
• Two thousands
• Five hundreds
• Three tens
• Two ones

Each digit has a place value which can be put into columns. Each column is a power of
ten in the base 10 system:

10^3 10^2 10^1 10^0

1000 100 10 1

2 5 3 2

Converting Binary to Denary
To find the denary value of a binary number like 10101000, place it in columns of base 2 place
values. Then add the place values of all the columns that contain a 1.

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

128 64 32 16 8 4 2 1

1 0 1 0 1 0 0 0

128 + 32 + 8 = 168

(10101000)2 = (168)10

To find the denary value of a binary number like 10111010, place it in columns of base 2 place
values. Then add the place values of all the columns that contain a 1.

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

128 64 32 16 8 4 2 1

1 0 1 1 1 0 1 0

128 + 32 + 16 + 8 + 2 = 186

(10111010)2 = (186)10
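
The column method can be sketched in a few lines of Python. This is only an illustration; the function name binary_to_denary is an assumption and not part of the notes.

```python
def binary_to_denary(bits: str) -> int:
    """Add up the place value of every column that holds a 1."""
    total = 0
    place_value = 1  # the right-most column is 2^0 = 1
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        place_value *= 2  # each column to the left is worth twice as much
    return total


print(binary_to_denary("10101000"))  # 168
print(binary_to_denary("10111010"))  # 186
```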

Converting Denary to Binary


Method 1:
To convert a denary number like 84 into binary, set up the columns of base 2 place values and
put a 1 under each value needed to make up the number

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

128 64 32 16 8 4 2 1

0 1 0 1 0 1 0 0

84 = 64 + 16 + 4

Conversion of large denary number into 16-bit number

To convert a denary number like 33111 into binary, set up the columns of base 2 place values

2^15 2^14 2^13 2^12 2^11 2^10 2^9 2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

32768 16384 8192 4096 2048 1024 512 256 128 64 32 16 8 4 2 1

1 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1

33111 = 32768 + 256 + 64 + 16 + 4 + 2 + 1

Method 2:
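To convert a denary number into binary, divide it successively by 2 and write the remainder
beside each division. Stop when the quotient reaches 1, then read the final quotient and the
remainders from BOTTOM to TOP.

(73)10 → (?)2

2 73 1
2 36 0
2 18 0
2 9 1
2 4 0
2 2 0
1
(73)10 → (1001001)2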

Conversion of large denary number into 16-bit number

(34989)10 → (?)2

2 34989 1
2 17494 0
2 8747 1
2 4373 1
2 2186 0
2 1093 1
2 546 0
2 273 1
2 136 0
2 68 0
2 34 0
2 17 1
2 8 0
2 4 0
2 2 0
1
(34989)10 → (1000 1000 1010 1101)2
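
As a rough sketch, successive division by 2 can be written in Python as follows. The function name denary_to_binary is an assumption, and the loop carries on until the quotient is 0 (one step further than the hand method), which gives the same digits.

```python
def denary_to_binary(number: int) -> str:
    """Divide by 2 repeatedly and read the remainders from bottom to top."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, 2)
        remainders.append(str(remainder))
    # The last remainder found is the left-most binary digit.
    return "".join(reversed(remainders))


print(denary_to_binary(34989))         # 1000100010101101
print(denary_to_binary(84).zfill(8))   # 01010100 (padded to 8 bits)
```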

Difference between Bits and Bytes


A bit is a smaller unit of information than a byte. It reflects the basic logical state of a
transistor: a single unit of information holding a zero (0) or a one (1). There are eight bits in
one byte of information. Bits and bytes both measure amounts of data. However, they are
typically used in two different contexts.

Bits per second, kilobits per second (kbps) and megabits per second (Mbps) are most often
used to measure data transfer speeds. This may refer to how fast you are downloading a file,
or how fast your internet connection is.

Bytes are used to measure data storage. For example, a CD holds 700 MB (megabytes) of
data and a hard drive may hold 250 GB (gigabytes).

Measurement of the Size of Computer Memories
Made up of        Name          Size
0 or 1            1 bit         1 bit
4 bits            1 nibble      4 bits
8 bits            1 byte        8 bits
1024 bytes        1 kilobyte    2^10 bytes
1024 kilobytes    1 megabyte    2^20 bytes
1024 megabytes    1 gigabyte    2^30 bytes
1024 gigabytes    1 terabyte    2^40 bytes
1024 terabytes    1 petabyte    2^50 bytes

It should be pointed out here that there is some confusion in the naming of memory sizes.
Manufacturers of storage devices often use the denary system to measure storage size.

For example,

1 kilobyte = 1000 bytes
1 megabyte = 1000000 bytes
1 gigabyte = 1000000000 bytes
1 terabyte = 1000000000000 bytes and so on.

The IEC convention, which is based on powers of 2, is now adopted by some organizations.
The IEC convention for computer internal memories (including RAM) becomes:
1 kibibyte (1 KiB) = 1024 bytes
1 mebibyte (1 MiB) = 1048576 bytes
1 gibibyte (1 GiB) = 1073741824 bytes
1 tebibyte (1 TiB) = 1099511627776 bytes and so on.
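
To see how far apart the two conventions are, here is a small illustrative calculation in Python. The helper name gb_to_gib and the 250 GB figure (borrowed from the hard drive example earlier) are only assumptions for the sketch.

```python
SI_GB = 1000 ** 3    # 1 gigabyte in the denary (manufacturer) convention
IEC_GIB = 1024 ** 3  # 1 gibibyte in the IEC convention


def gb_to_gib(gigabytes: float) -> float:
    """Express a denary gigabyte figure in gibibytes."""
    return gigabytes * SI_GB / IEC_GIB


print(round(gb_to_gib(250), 2))  # 232.83
print(64 * IEC_GIB)              # 68719476736 bytes in 64 GiB
```

So a drive sold as 250 GB holds roughly 232.83 GiB when measured the IEC way.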

Use of Binary Number Systems

Computers use binary - the digits 0 and 1 - to store data. A binary digit, or bit, is the smallest
unit of data in computing. It is represented by a 0 or a 1.

Computer programs are sets of instructions. Each instruction is translated into machine code:
simple binary codes that activate the CPU. Programmers write computer code and this is
converted by a translator into binary instructions that the processor can execute.

All software, music, documents, and any other information that is processed by a computer,
is also stored using binary.

When computers (or microprocessors) are used to control devices (such as robots), registers
are used as part of the control system. The following example describes how registers can be
used in controlling a simple device.

The hexadecimal system

The HEXADECIMAL SYSTEM is very closely related to the binary system. Hexadecimal
(sometimes referred to as simply ‘hex’) is a base 16 system and therefore needs to use 16
different ‘values’ to represent each digit.

Because it is a system based on 16 different digits, the numbers 0 to 9 and the letters A to F
are used to represent each hexadecimal (hex) digit. (A = 10, B = 11, C = 12, D = 13, E = 14
and F = 15.) Using the same method as denary and binary, this gives the headings of 16^0,
16^1, 16^2, 16^3 and so on. The typical headings for a hexadecimal number with five digits would
be:

16^4 16^3 16^2 16^1 16^0

65536 4096 256 16 1

Since 16 = 2^4, this means that FOUR binary digits are equivalent to each hexadecimal digit.
The following table summarizes the link between binary, hexadecimal and denary.

Binary    Hexadecimal    Denary
0000      0              0
0001      1              1
0010      2              2
0011      3              3
0100      4              4
0101      5              5
0110      6              6
0111      7              7
1000      8              8
1001      9              9
1010      A              10
1011      B              11
1100      C              12
1101      D              13
1110      E              14
1111      F              15

Converting from binary to hexadecimal and from hexadecimal to binary

Converting from binary to hexadecimal is a fairly easy process. Starting from the right and
moving left, split the binary number into groups of 4 bits. If the last group has fewer than 4 bits,
then simply fill in with 0s from the left. Take each group of 4 bits and convert it into the
equivalent hexadecimal digit using the table above. Look at the following two examples to see
how this works.

Example 1

101111100001

First split this up into groups of 4 bits:

1011 1110 0001

Then, using above table, find the equivalent hexadecimal digits:

B E 1

Example 2

10000111111101

First split this up into groups of 4 bits:

10 0001 1111 1101

The left group only contains 2 bits, so add in two 0s:

0010 0001 1111 1101

Now use the table above to find the equivalent hexadecimal digits:

2 1 F D

Converting from hexadecimal to binary is also very straightforward. Using the data in the
above table, simply take each hexadecimal digit and write down the 4-bit code which
corresponds to the digit.

Example 3

4 5 A

Using above table, find the 4-bit code for each digit:

0100 0101 1010

Put the groups together to form the binary number:

010001011010

Example 4

B F 0 8

Again, just use above table:

1011 1111 0000 1000

Then put all the digits together:

1011111100001000
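
The grouping method in Examples 1 to 4 can be sketched in Python as below. The names HEX_DIGITS, binary_to_hex and hex_to_binary are assumptions made for this illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Split into groups of 4 bits from the right and convert each group."""
    padded_length = -(-len(bits) // 4) * 4  # round up to a multiple of 4
    bits = bits.zfill(padded_length)        # fill in with 0s from the left
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_digits: str) -> str:
    """Write down the 4-bit code for each hexadecimal digit."""
    return "".join(format(HEX_DIGITS.index(d), "04b") for d in hex_digits.upper())


print(binary_to_hex("101111100001"))    # BE1
print(binary_to_hex("10000111111101"))  # 21FD
print(hex_to_binary("45A"))             # 010001011010
print(hex_to_binary("BF08"))            # 1011111100001000
```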

Converting from hexadecimal to denary and from denary to hexadecimal

Hexadecimal number to denary

Converting a hexadecimal number to denary is fairly straightforward. Take each hexadecimal
digit and multiply it by its heading value, that is 4096, 256, 16 and 1 (reading from left to
right for a four-digit number). Add the totals together to obtain the denary value.

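Example 1

4 5 A

First multiply each digit by its value:

4 × 256 = 1024
5 × 16 = 80
10 × 1 = 10 (A = 10)

Add the totals together:

denary number = 1024 + 80 + 10 = 1114

Example 2

C 8 F

First multiply each digit by its value:

12 × 256 = 3072 (C = 12)
8 × 16 = 128
15 × 1 = 15 (F = 15)

Add the totals together:

denary number = 3072 + 128 + 15 = 3215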
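
The same multiply-and-add steps, written as a short Python sketch (the function name hex_to_denary is an assumption):

```python
def hex_to_denary(hex_digits: str) -> int:
    """Multiply each hex digit by its heading value and add the totals."""
    total = 0
    place_value = 1  # the right-most heading is 16^0 = 1
    for digit in reversed(hex_digits.upper()):
        total += "0123456789ABCDEF".index(digit) * place_value
        place_value *= 16
    return total


print(hex_to_denary("45A"))  # 1114
print(hex_to_denary("C8F"))  # 3215
```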

Denary to hexadecimal

Converting from denary to hexadecimal is a little more difficult. As with the conversion from
denary to binary, there are two very similar methods that can be used. Again, the first method
is ‘trial and error’ and the second method is more methodical and involves repetitive division.

Example 1

Consider the conversion of the denary number, 2004, into hexadecimal. This method involves
placing hexadecimal digits in the appropriate position so that the total equates to 2004:

256 16 1

7 D 4

A quick check shows that: (7 × 256) + (13 × 16) + (4 × 1) gives 2004.

Example 2

This method involves successive division by 16. The remainders are then read from BOTTOM
to TOP to give the hexadecimal value. Again using 2004, we get:

2004 ÷ 16 = 125 remainder 4
125 ÷ 16 = 7 remainder 13 (D)
7 ÷ 16 = 0 remainder 7

Reading the remainders from bottom to top gives 7 D 4.
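
A minimal Python sketch of successive division by 16 is shown here; the function name denary_to_hex is an assumption made for illustration.

```python
def denary_to_hex(number: int) -> str:
    """Divide by 16 repeatedly and read the remainders from bottom to top."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, 16)
        digits.append("0123456789ABCDEF"[remainder])
    # The last remainder found is the left-most hexadecimal digit.
    return "".join(reversed(digits))


print(denary_to_hex(2004))  # 7D4
```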

Use of the hexadecimal system

The information in this chapter gives the reader sufficient grounding in each topic at this
level. Further material can be found by searching the internet, but be careful that you don’t
go off at a tangent.

This section reviews four uses of the hexadecimal system:

• Error codes
• MAC addresses
• IPv6
• HTML colour codes

Error Codes

Error codes are often shown as hexadecimal values. These numbers refer to the memory
location of the error and are usually automatically generated by the computer. The
programmer needs to know how to interpret the hexadecimal error codes. Examples of
error codes from a Windows system are shown below.

Media Access Control (MAC)

A MEDIA ACCESS CONTROL (MAC) ADDRESS refers to a number which uniquely identifies
a device on a network. The MAC address refers to the network interface card (NIC) which is
part of the device. The MAC address is rarely changed so that a particular device can always
be identified no matter where it is.

A MAC address is usually made up of 48 bits which are shown as six groups of two hexadecimal
digits (although 64-bit addresses are also known):

NN – NN – NN – DD – DD – DD

or

NN:NN:NN:DD:DD:DD

where the first half (NN – NN – NN) is the identity number of the manufacturer of the device
and the second half (DD – DD – DD) is the serial number of the device.

For example: 00 – 1C – B3 – 4F – 25 – FE is the MAC address of a device produced by the
Apple Corporation (code: 001CB3) with a serial number of 4F25FE. Sometimes lower case
hexadecimal letters are used in the MAC address: 00-1c-b3-4f-25-fe.

Other manufacturer identity numbers include:

• 00 – 14 – 22 which identifies devices made by Dell

• 00 – 40 – 96 which identifies devices made by Cisco

• 00 – A0 – C9 which identifies devices made by Intel, and so on.
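
As an illustration of the two halves of a MAC address, the following Python sketch splits an address into its manufacturer code and serial number. The function name split_mac is an assumption, and only hyphen, colon and space separators are handled.

```python
def split_mac(mac: str) -> tuple[str, str]:
    """Return (manufacturer code, serial number) from a 48-bit MAC address."""
    digits = mac.replace("-", "").replace(":", "").replace(" ", "").upper()
    if len(digits) != 12 or any(d not in "0123456789ABCDEF" for d in digits):
        raise ValueError("expected 12 hexadecimal digits")
    return digits[:6], digits[6:]  # first half, second half


print(split_mac("00-1c-b3-4f-25-fe"))  # ('001CB3', '4F25FE')
print(split_mac("00:14:22:01:23:45"))  # ('001422', '012345')
```

The second address is made up purely for the example; only its first half (00 – 14 – 22) comes from the list above.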

Types of MAC address

It should be pointed out that there are two types of MAC address: the UNIVERSALLY
ADMINISTERED MAC ADDRESS (UAA) and the LOCALLY ADMINISTERED MAC
ADDRESS (LAA).

IP (Internet Protocol) Addresses

Each device connected to a network is given an address known as the Internet Protocol (IP)
address. An IPv4 address is a 32-bit number written in denary or hexadecimal form,
e.g. 109.108.158.1 (6D.6C.9E.01 in hex). IPv4 has been improved upon by the adoption of
IPv6, which uses a 128-bit number broken down into 16-bit chunks, each represented by a
hexadecimal number. For example:

A8FB:7A88:FFF0:0FFF:3D21:2085:66FB:F0FA

In IPv6 a colon (:) is used as a separator.
In IPv4 a dot (.) is used as a separator.
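
The denary and hexadecimal forms of an IPv4 address carry the same four 8-bit values. A short Python sketch of the conversion follows; the function name ipv4_to_hex is an assumption.

```python
def ipv4_to_hex(address: str) -> str:
    """Rewrite each denary part of an IPv4 address as two hex digits."""
    parts = [int(part) for part in address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError("expected four numbers between 0 and 255")
    return ".".join(format(part, "02X") for part in parts)


print(ipv4_to_hex("109.108.158.1"))  # 6D.6C.9E.01
```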

HyperText Mark-up Language (HTML)

HYPERTEXT MARK-UP LANGUAGE (HTML) is used when writing and developing web
pages. HTML isn’t a programming language but is simply a mark-up language. A mark-up
language is used in the processing, definition and presentation of text (for example, specifying
the colour of the text).

HTML uses <tags> which are used to bracket a piece of code; for example, <td> starts a
standard cell in an HTML table, and </td> ends it. Whatever is between the two tags has been
defined. Here is a short section of HTML code:

HTML code is often used to represent the colours of text on the computer screen. The values
change to represent different colours. The intensity of each of the three primary colours (red,
green and blue) is determined by its hexadecimal value. For example:

• # FF 00 00 represents primary colour red
• # 00 FF 00 represents primary colour green
• # 00 00 FF represents primary colour blue
• # FF 00 FF represents fuchsia
• # FF 80 00 represents orange
• # B1 89 04 represents tan

and so on, producing almost any colour the user wants. There are many websites available
that allow a user to find the HTML code for the colour needed.
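
Each pair of hexadecimal digits in a colour code is an intensity from 0 to 255. The Python sketch below pulls out the three intensities; the function name parse_colour is an assumption.

```python
def parse_colour(code: str) -> tuple[int, int, int]:
    """Return the (red, green, blue) intensities of an HTML colour code."""
    digits = code.replace("#", "").replace(" ", "")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits")
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue


print(parse_colour("#FF8000"))     # (255, 128, 0)
print(parse_colour("# B1 89 04"))  # (177, 137, 4)
```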

Binary Addition

Two Digits Binary Addition

Binary Addition Carry Sum


0+0 0 0
1+0 0 1
0+1 0 1
1+1 1 0

Three Digits Binary Addition

Binary Addition Carry Sum


0+0+0 0 0
0+0+1 0 1
0+1+0 0 1
0+1+1 1 0
1+0+0 0 1
1+0+1 1 0
1+1+0 1 0
1+1+1 1 1

Example 1

(00100111)2 + (01001010)2

Carry  1 1 1
0 0 1 0 0 1 1 1
+ 0 1 0 0 1 0 1 0
0 1 1 1 0 0 0 1

Overflow

When the addition of two 8-bit numbers produces a 9-bit result, the 9th bit is known as the overflow bit: the result is too large to be stored in 8 bits.

Example

(10101010)2 + (11010101)2

Carry
1 0 1 0 1 0 1 0
+ 1 1 0 1 0 1 0 1
Overflow → 1 0 1 1 1 1 1 1 1
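
The carry and sum tables can be applied column by column, working from the right. The following Python sketch does this for two 8-bit numbers; the function name add_binary is an assumption.

```python
def add_binary(a: str, b: str, width: int = 8) -> tuple[str, bool]:
    """Add two binary strings; return the width-bit sum and an overflow flag."""
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a.zfill(width)), reversed(b.zfill(width))):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))  # the sum bit for this column
        carry = total // 2             # the carry into the next column
    # A carry out of the left-most column is the overflow bit.
    return "".join(reversed(result)), carry == 1


print(add_binary("00100111", "01001010"))  # ('01110001', False)
print(add_binary("10101010", "11010101"))  # ('01111111', True)
```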

Logical Binary Shifts

Computers can carry out a logical shift on a sequence of binary numbers. The logical shift
means moving the binary number to the left or to the right. Each shift left is equivalent to
multiplying the binary number by 2 and each shift right is equivalent to dividing the binary
number by 2.

Left Shift

Example

The denary number 21 is 00010101 in binary. We put this into an 8-bit register:

128 64 32 16 8 4 2 1
0 0 0 1 0 1 0 1

If we now shift the bits in this register one place to the left, we obtain

128 64 32 16 8 4 2 1
0 0 1 0 1 0 1 0

The value of the binary bits is now 21 × 2^1 = 42. We can see this is correct if we calculate the
denary value of the new binary number 101010 (i.e. 32 + 8 + 2 = 42).

Suppose now we shift the original number two places left

128 64 32 16 8 4 2 1
0 1 0 1 0 1 0 0

The value of the binary bits is now 21 × 2^2 = 84 (i.e. 64 + 16 + 4 = 84).

Shifting the original number three places left gives

128 64 32 16 8 4 2 1
1 0 1 0 1 0 0 0

The value of the binary bits is now 21 × 2^3 = 168 (i.e. 128 + 32 + 8 = 168).

Now, let’s see what happens if we shift the original number four places left.

128 64 32 16 8 4 2 1
0 1 0 1 0 0 0 0

Losing a 1-bit following a shift operation causes an error.

The left-most 1-bit has been lost. The correct result of 21 × 2^4 is 336, but our 8-bit register
now holds 80, which is clearly incorrect. This error is because we have exceeded the maximum
number of left shifts possible using this register.

Right Shift

Example

The denary number 200 is 11001000 in binary. We put this into an 8-bit register:

128 64 32 16 8 4 2 1
1 1 0 0 1 0 0 0

If we now shift the bits in this register one place to the right, we obtain

128 64 32 16 8 4 2 1
0 1 1 0 0 1 0 0

The value of the binary bits is now 200 ÷ 2^1 = 100. We can see this is correct if we calculate
the denary value of the new binary number 1100100 (i.e. 64 + 32 + 4 = 100).

Suppose now we shift the original number two places right

128 64 32 16 8 4 2 1
0 0 1 1 0 0 1 0

The value of the binary bits is now 200 ÷ 2^2 = 50 (i.e. 32 + 16 + 2 = 50).

Shifting the original number three places right gives

128 64 32 16 8 4 2 1
0 0 0 1 1 0 0 1

The value of the binary bits is now 200 ÷ 2^3 = 25 (i.e. 16 + 8 + 1 = 25).

Now, let’s see what happens if we shift the original number four places right.

128 64 32 16 8 4 2 1
0 0 0 0 1 1 0 0

Losing a 1-bit following a shift operation causes an error.

The right-most 1-bit has been lost. The correct result of 200 ÷ 2^4 is 12.5, but our 8-bit register
now holds 12, which is clearly incorrect. This error is because we have exceeded the maximum
number of right shifts possible using this register.
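
The behaviour of the 8-bit register can be imitated in Python with the shift operators and a mask that throws away any bits that fall off the left-hand end. The function names below are assumptions for this sketch.

```python
def shift_left(value: int, places: int, width: int = 8) -> int:
    """Logical left shift inside a register of the given width."""
    mask = (1 << width) - 1          # 11111111 for an 8-bit register
    return (value << places) & mask  # bits shifted past the left end are lost


def shift_right(value: int, places: int) -> int:
    """Logical right shift; bits shifted past the right end are lost."""
    return value >> places


print([shift_left(21, n) for n in (1, 2, 3, 4)])    # [42, 84, 168, 80]
print([shift_right(200, n) for n in (1, 2, 3, 4)])  # [100, 50, 25, 12]
```

The last value in each list shows the error caused by losing a 1-bit: 80 instead of 336, and 12 instead of 12.5.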

Two’s Complement

To represent negative binary integers, we use 2’s complement.

-128 64 32 16 8 4 2 1

In 2’s complement the left-most bit is given a negative value. For instance, for an 8-bit
number, the value of +128 is changed to -128, but all other values remain the same.

2’s complement can be applied to binary numbers of any length; a number does not have to have 8 bits.

Method 1

To write a negative denary number in binary, express it as -128 plus a positive number, then
set the bits that make up that positive number. For example:

-78 is the same as -128 + 50

50 = 32 + 16 + 2

-128 64 32 16 8 4 2 1
1 0 1 1 0 0 1 0

Method 2

To convert a positive binary number into a negative binary number:

First write the number as a positive value

(48)10 ⟶ (0011 0000)2

We then invert each binary digit

0011 0000 ⟶ 1100 1111

Then add 1 to the number

1100 1111
+ 1
1101 0000 ⟶ This gives us the binary value of -48

Example

The following 6 bits represent 31

32 16 8 4 2 1
0 1 1 1 1 1

The following 6 bits represent -31

-32 16 8 4 2 1
1 0 0 0 0 1
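
The invert-and-add-1 method, and the reading of a 2’s complement pattern using a negative left-most column, can be sketched in Python as follows. The function names negate and twos_complement_value are assumptions.

```python
def negate(pattern: str) -> str:
    """Invert every bit, then add 1, keeping the same number of bits."""
    width = len(pattern)
    inverted = "".join("1" if bit == "0" else "0" for bit in pattern)
    return format((int(inverted, 2) + 1) % (1 << width), f"0{width}b")


def twos_complement_value(pattern: str) -> int:
    """Read a pattern whose left-most column has a negative place value."""
    value = int(pattern, 2)
    if pattern[0] == "1":
        value -= 1 << len(pattern)  # e.g. the 128 column counts as -128
    return value


print(negate("011111"))                   # 100001
print(twos_complement_value("100001"))    # -31
print(twos_complement_value("11010000"))  # -48
```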

Text, Sound & Images
Character Sets – ASCII Code & Unicode
The ASCII code system (American Standard Code for Information Interchange) was set up in 1963
for use in communication systems and computer systems. A newer version of this code was
published in 1986. The standard ASCII code character set consists of 7-bit codes (0 to 127 in
denary or 00 to 7F in hexadecimal) that represent the letters, numbers and characters found
on a standard keyboard, together with 32 control codes (that use codes 0 to 31 (denary) or 00
to 1F (hexadecimal)).

The following is part of the ASCII code table (with the control codes removed):

Consider the uppercase and lowercase codes in binary of characters.

For example:

a 1 1 0 0 0 0 1 Hex 61 [lower case]
A 1 0 0 0 0 0 1 Hex 41 [upper case]
y 1 1 1 1 0 0 1 Hex 79 [lower case]
Y 1 0 1 1 0 0 1 Hex 59 [upper case]

As shown in the above example, the 6th bit (counting from the right, place value 32) changes
from 1 to 0 when comparing a lowercase character with its upper case character. This makes
conversion between lowercase and uppercase easier. It is also noticeable that the character
sets (e.g. a to z, 0 to 9, etc.) are grouped together in sequence, which makes them easier to use.
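
Because only one bit differs between the two cases, a letter can be switched between them by flipping that bit. A small Python sketch follows; the names CASE_BIT and toggle_case are assumptions, and the function is only meaningful for the letters A to Z and a to z.

```python
CASE_BIT = 0b0100000  # the 6th bit from the right, place value 32


def toggle_case(letter: str) -> str:
    """Flip the case bit of a single ASCII letter."""
    return chr(ord(letter) ^ CASE_BIT)


print(format(ord("a"), "07b"), format(ord("A"), "07b"))  # 1100001 1000001
print(toggle_case("a"), toggle_case("Y"))                # A y
```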

Extended ASCII uses 8-bit codes (0 to 255 in denary or 00 to FF in hexadecimal). This gives
another 128 codes to allow for characters in Non-English alphabets and for some graphical
characters to be included:

The disadvantages of ASCII code are that it is not suitable for most languages and that there
are multiple different versions of extended ASCII, as shown in the table above. Because of this,
different coding methods have been developed, one of which is Unicode.

Unicode can represent all languages of the world and is supported by many operating systems,
search engines and internet browsers around the globe. There is overlap with standard ASCII
code, since the first 128 characters are the same, but Unicode can support thousands of
different characters. ASCII uses one byte to represent a character while Unicode can use up to
4 bytes to represent a character.

The five goals of the Unicode Consortium for Unicode 1.0 (1991) were as follows:

• Create a universal standard that covers all languages and all writing systems.
• Produce a more efficient coding system than ASCII.
• Adopt uniform encoding where each character is encoded as a 16-bit or 32-bit code.
• Create unambiguous encoding where each 16-bit and 32-bit value always represents the
same character.
• Reserve part of the code for private use to enable a user to assign codes for their own
characters and symbols (useful for Chinese and Japanese character sets, for example).

The Unicode table below represents multiple characters from multiple different languages like
Russian, Romanian and Croatian:

Representation of Sound
Sound waves have a certain frequency, wavelength and amplitude. Amplitude specifies the
loudness of the sound wave.

Sound waves vary continuously, which means sound is analogue. Since computers cannot work
with analogue data, sound waves need to be sampled (measuring the amplitude at set points in
time) in order to be stored in a computer. This is done using an ADC (Analogue to Digital
Converter). For conversion, sound waves are sampled at regular intervals.

At time interval 1, the approximate amplitude is 10; at time interval 2, the approximate amplitude
is 4; and so on for all time intervals. Because the amplitude ranges from 0 to 10 (as shown in
the figure above), 4 binary bits can be used to represent the amplitude; for example, 10 would
be represented by the binary value 1010. As the number of possible values (sound amplitudes)
increases, the accuracy of the sampled sound also increases (for example, using a range of 0 to 127
gives a more accurate result than 0 to 10). The number of bits per sample is the Sampling Resolution
(bit depth). For the above example the bit depth is 4 bits.

Sampling Rate (Hertz) is equal to the number of sound samples taken per second.

How Sampling is Used to Record Sound

• The amplitude of the sound wave is first determined at set time intervals (the Sampling Rate).
• This gives an approximate representation of the sound wave.
• Each sample of the sound wave is then encoded as a series of binary digits.

A higher sampling rate or a larger sampling resolution will result in a more faithful
representation of the original sound. File size is directly proportional to both the sampling
rate and the sampling resolution.

CDs have a 16-bit sampling resolution and a 44.1 kHz sample rate, that is 44100 samples
every second. This gives high quality sound reproduction.
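
The link between the amplitude range and the bit depth can be checked with a short Python sketch. The function names bits_needed and encode_samples are assumptions, and the two sample values are the amplitudes quoted for time intervals 1 and 2.

```python
def bits_needed(max_amplitude: int) -> int:
    """Smallest number of bits that can hold every value from 0 to max_amplitude."""
    return max(1, max_amplitude.bit_length())


def encode_samples(samples: list[int], bit_depth: int) -> list[str]:
    """Encode each sampled amplitude as a fixed-width binary number."""
    return [format(sample, f"0{bit_depth}b") for sample in samples]


depth = bits_needed(10)
print(depth)                           # 4
print(bits_needed(127))                # 7
print(encode_samples([10, 4], depth))  # ['1010', '0100']
print(44100 * 16)                      # 705600 bits per second for one CD channel
```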

Bitmap Images Representation

Bitmap images are made up of pixels (picture elements); an image is made up of a
two-dimensional matrix of pixels. Pixels can take different shapes such as:

Each pixel can be represented as a binary number, and so a bitmap image is stored in a computer
as a series of binary numbers, so that:

• A black and white image only requires 1 bit per pixel. This means that each pixel can be
one of two colors, represented by 1 or 0.
• If each pixel is represented by 2 bits, then each pixel can be one of four colors (2^2 = 4),
representing the FOUR combinations of 0 & 1, i.e. 00, 01, 10, 11.
• If each pixel is represented by 3 bits, then each pixel can be one of eight colors (2^3 = 8),
representing the EIGHT combinations of 0 & 1, i.e. 000, 001, 010, 011, 100, 101, 110, 111.

An 8-bit color depth means that each pixel can be one of 256 colors (2^8 = 256). Modern
computers have a 24-bit color depth, which means over 16 million different colors can be
represented.

Image resolution refers to the number of pixels that make up an image; for example, an image
could contain 4096 × 3072 pixels (12 582 912 pixels in total).

Image ‘A’ has the highest resolution and ‘E’ has the lowest resolution. ‘E’ has become
pixelated (fuzzy). This is because there are too few pixels in ‘E’ to represent the image.

The drawback of using high resolution images is the increase in file size. As the number of pixels
used to represent the image is increased, the size of the file will also increase. This impacts
on how many images can be stored on, for example, a hard drive. It also impacts on the time
to download an image from the internet or the time to transfer images from device to device.
A certain amount of reduction in resolution of an image is possible before the loss of quality
becomes noticeable.

Data Storage & File Compression


Measurement of Data Storage
A bit is the basic unit of computing storage and is represented by either 1 or 0. The name comes
from binary digit. The byte is the smallest unit of memory that a computer normally addresses.

1 Byte = 8 bits

0.5 Byte = 4 bits = 1 Nibble

One byte of memory is very small, so memory size is measured in multiples such as the kilobyte,
megabyte, gigabyte and terabyte, where each multiple is 1000 times the previous one.

This system of numbering now only refers to some storage devices and is technically
inaccurate for memory. It is based on the SI (base 10) system of units, where 1 kilo is equal to 1000.

Because memory size is actually measured in powers of 2, another system, based on the binary
system, has been adopted by the IEC (International Electrotechnical Commission). In it, each
multiple (kibibyte, mebibyte, gibibyte, tebibyte) is 1024 times the previous one.

This system is more accurate. Internal memories (such as RAM and ROM) should be
measured using the IEC system. A 64 GiB RAM could store 64 × 2^30 bytes of data, i.e.
68 719 476 736 bytes.

Calculation of File Size

File Size of an image is calculated as:

Image Resolution (In Pixels) × Color Depth (In Bits)

Size of a mono sound file is calculated as:

Sample Rate (In Hz) × Sample Resolution (In bits) × Length of Sample (In seconds)

For a stereo file, you would multiply the result by two.

Example 1

A photograph is 1024 × 1080 pixels and uses a color depth of 32 bits. How many photographs
of this size would fit onto a memory stick of 64 GB?

Step 1: Multiply number of pixels in vertical and horizontal directions to find total number of
pixels = [1024 × 1080] = 1 105 920 pixels.

Step 2: Multiply number of pixels by color depth, then divide by 8 to give the number of bytes
= 1 105 920 × 32 = 35 389 440 bits; 35 389 440/8 = 4 423 680 bytes.

Step 3: 64 GB = 64 × 1024 × 1024 × 1024 = 68 719 476 736 bytes.

Step 4: Divide the memory stick size by the file size = 68 719 476 736/4 423 680 = 15 534
photos (rounded down to a whole number).

Example 2

A photograph is 2048 × 2048 pixels and uses a color depth of 16 bits. Find the size of an
image taken by this camera in MB.

Step 1: Multiply number of pixels in vertical and horizontal directions to find total number of
pixels = [2048 × 2048] = 4 194 304 pixels.

Step 2: Multiply number of pixels by color depth (16 bits) to give the number of bits
= 4 194 304 × 16 = 67 108 864 bits.

Step 3: Divide number of bits by 8 to find number of bytes in the file = 67 108 864/8 =
8 388 608 bytes

Step 4: Divide by 1024 × 1024 to convert to MB = 8 388 608/1 048 576 = 8 MB.

Example 3

An audio CD has a sample rate of 44 100 Hz and a sample resolution of 16 bits. The music being
sampled uses two channels to allow for stereo recording. Calculate the file size for a 60 minute
recording.

Step 1: Size of File = 44 100 × 16 × [60 × 60] = 2 540 160 000 bits

Step 2: Multiply by 2 since there are two channels being used = 5 080 320 000 bits

Step 3: Divide by 8 to find number of bytes = 5 080 320 000/8 = 635 040 000 bytes

Step 4: Divide by 1024 × 1024 to convert to MB = 635 040 000/1 048 576 ≈ 605.6 MB.
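
The two file size formulas, and the three worked examples, can be checked with a short Python sketch. The function names image_size_bytes and sound_size_bytes are assumptions, and the memory stick is taken to hold 64 × 2^30 bytes as in Example 1.

```python
def image_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Image resolution (in pixels) x colour depth (in bits), converted to bytes."""
    return width * height * colour_depth_bits // 8


def sound_size_bytes(sample_rate_hz: int, resolution_bits: int,
                     seconds: int, channels: int = 1) -> int:
    """Sample rate x sample resolution x length, times the number of channels."""
    return sample_rate_hz * resolution_bits * seconds * channels // 8


photo = image_size_bytes(1024, 1080, 32)
print(photo)                                      # 4423680
print(64 * 2**30 // photo)                        # 15534 photos
print(image_size_bytes(2048, 2048, 16) / 2**20)   # 8.0 MB
cd = sound_size_bytes(44100, 16, 60 * 60, channels=2)
print(cd, round(cd / 2**20, 1))                   # 635040000 605.6
```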

Data Compression
Calculations show that sound and image files can be very large; therefore, it is often necessary
to reduce (compress) file size. Compressing files helps to:

• Save storage space on devices such as a hard disk drive or solid state drive.
• Reduce the time taken to stream a music or video file.
• Reduce the time taken to upload, download or transfer a file across a network.
• Use less network bandwidth. Bandwidth is the maximum rate of transfer of data across a
network, measured in bits per second, and it is used up whenever a file is uploaded or
downloaded, for example, from a server. Compressed files contain fewer bits of data
than uncompressed files and therefore use less bandwidth, which results in a faster
transfer.
• Reduce costs. For example, when using cloud storage, the cost is
based on the size of the files stored. Also, an internet service provider (ISP) may charge a
user based on the amount of data downloaded.

Lossy & Lossless File Compression

File compression is of two types: lossy or lossless.

Lossy File Compression

Lossy compression refers to discarding irrelevant information. Generally, this means
compressing images, video, or audio by discarding data that the human perceptual system
cannot see or hear. The state of the art is to apply lossy compression only at a very low level
of human sensory modeling, where the model is well understood.

For example, when applying a Lossy File Compression Algorithm to:

• An image, it may reduce the resolution and/or the bit/color depth.
• A sound file, it may reduce the sampling rate and/or the resolution.

Common Lossy File Compression algorithms are:

• MP3 (MPEG Audio Layer III) and MP4 (MPEG-4 Part 14)
• JPEG

MP3

Filename Extension: .mp3

Format Type: Lossy Compressed

When Internet file-sharing boomed into popularity with Napster and the iPod, the MP3
cornered the market for one reason: it had a small footprint. Without broadband connections,
it was impractical at the time to share files larger than the 2 – 3 megabytes typical of an MP3.
That preference has stuck for some time now, even though MP3 does not have nearly the
same quality as WAV or AIFF files. Despite a growing base of people using
higher quality formats, there are still those who prefer the MP3. So, if you have a slower internet
connection or limited hard drive space, MP3 could be your file format of choice. If you’re
worried about quality loss, don’t fret too much about it. While, yes, there is a noticeable drop
off in sound quality, MP3 files fall squarely under the “good enough” umbrella.

MP4

MP4 is an abbreviated term for MPEG-4 Part 14. It is often confused with MPEG-4 AVC
(Advanced Video Coding), which is a video codec commonly stored inside MP4 files rather
than the file format itself. MP4 is a format for working with video files and was first
introduced in 1998. MPEG refers to the Moving Picture Experts Group, which is responsible
for setting the industry standards regarding digital audio and video.
MP4 is a container format, allowing a combination of audio, video, subtitles and still
images to be held in one single file. It also allows for advanced content such as 3D
graphics, menus and user interactivity.

JPEG

JPG files, also known as JPEG files, are a common file format for digital photos and other
digital graphics. When JPG files are saved, they use "lossy" compression, meaning image
quality is lost as file size decreases. JPEG stands for Joint Photographic Experts Group, the
committee that created the file type. JPG files have the file extension .jpg or .jpeg.

They are the most common file type for images taken with digital cameras, and widely used
for photos and other graphics used on websites.

If file sizes get very low, JPG images will become "muddy". When saving photos and other
images as JPG files for the web, email and other uses, you must decide on a compromise
between quality and file size.

Lossless File Compression

Lossless data compression is a class of data compression algorithms that allows the original
data to be perfectly reconstructed from the compressed data. Lossless data compression is
used in many applications. For example, it is used in the ZIP file format.

Lossless compression is used in cases where it is important that the original and the
decompressed data be identical. Typical examples are executable programs, text documents,
and source code.

Run Length Encoding (RLE) can be used for lossless compression of a number of different
file formats:

Using RLE on Text Data

Consider the following text string: ‘aaaaabbbbccddddd’. Assuming each character requires 1
byte then this string needs 16 bytes. If we assume ASCII code is being used, then the string
can be coded as follows:

a a a a a b b b b c c d d d d d
05 97 04 98 02 99 05 100

This means we have five characters with ASCII code 97, four characters with ASCII code 98,
two characters with ASCII code 99 and five characters with ASCII code 100. Assuming each
number in the second row requires 1 byte of memory, the RLE code will need 8 bytes. This is
half the original file size.

Issues arise with a string such as ‘cdcdcdcdcd’, where RLE compression isn’t very effective. To
cope with this, we use a flag. A flag preceding data indicates that what follows is the number
of repeating units (for example, 255 05 97 where 255 is the flag and the other two numbers
indicate that there are five items with ASCII code 97). When a flag is not used, the next byte(s)
are taken with their face value and a run of 1 (for example, 01 99 means one character with
ASCII code 99 follows).

Example

String aaaaaaaa bbbbbbbbbb c d c d c d eeeeeeee


Code 08 97 10 98 01 99 01 100 01 99 01 100 01 99 01 100 08 101

Original string contains 32 characters and would occupy 32 bytes of storage.

Coded version contains 18 values and would require 18 bytes of storage.

Introducing a flag 255 in this case:

255 08 97 255 10 98 99 100 99 100 99 100 255 08 101

This has 15 values and therefore requires 15 bytes of storage. This is a reduction in file size of
about 53% when compared to the original string.
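
Both forms of RLE used in these examples can be sketched in Python. The function names are assumptions, and the sketch is deliberately simplified: it assumes no run is longer than 255 and that no character in the text has code 255 (the flag value).

```python
from itertools import groupby


def rle_encode(text: str) -> list[int]:
    """Plain RLE: a (run length, ASCII code) pair for every run."""
    values = []
    for char, run in groupby(text):
        values += [len(list(run)), ord(char)]
    return values


def rle_encode_with_flag(text: str, flag: int = 255) -> list[int]:
    """Flagged RLE: repeated runs become flag, length, code; single characters are stored as-is."""
    values = []
    for char, run in groupby(text):
        count = len(list(run))
        if count > 1:
            values += [flag, count, ord(char)]
        else:
            values.append(ord(char))
    return values


text = "a" * 8 + "b" * 10 + "cdcdcd" + "e" * 8
print(rle_encode("aaaaabbbbccddddd"))   # [5, 97, 4, 98, 2, 99, 5, 100]
print(len(text), len(rle_encode(text)), len(rle_encode_with_flag(text)))  # 32 18 15
```

Note that with a flag, a run of only two characters costs three bytes, so a practical encoder might leave very short runs uncompressed.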

RLE with Images

Example 1: Black & White Images

This figure shows the letter F in a grid where each square requires 1 byte of storage. A white
square has a value 1 and a black square has a value 0.

The 8 × 8 grid would need 64 bytes; the compressed RLE format has 30 values, and therefore
needs only 30 bytes to store the image.

Example 2: Colored Images

Figure shows an object in four colors. Each color is made up of red, green and blue [RGB]
according to the code on the right.

The original image (8 × 8) square would need 3 bytes per square (to include all three RGB
value). Therefore, the uncompressed file for this image is 8 × 8 × 3 = 192 bytes.

The RLE code has 92 values, which means the compressed file will be 92 bytes in size. This
gives a file reduction of about 52%. It should be noted that the file reductions in reality will not
be as large as this due to other data which needs to be stored with the compressed file, e.g. a
file header.
