1 Data Representation
Syllabus 2023-2025
Data Representation
Data Representation refers to the methods used internally to represent information stored in
a computer.
Data and instructions cannot be entered and processed directly into computers using human
language. Any type of data be it numbers, letters, special symbols, sound or pictures must first
be converted into machine-readable form i.e. binary form. Due to this reason, it is important to
understand how a computer together with its peripheral devices handles data in its electronic
circuits, on magnetic media and in optical devices.
Number System:
Number System defines a set of values used to represent ‘quantity’. There are different number
systems, such as decimal (denary), binary and hexadecimal. Each system is characterized by its
base or radix, always given in decimal, and the set of permissible digits.
Binary Number System:
The binary system is based on the number 2 and is made up of 1s and 0s. Thus, only the two
‘values’ 0 and 1 can be used in this system to represent each digit. The computer has switches
to represent data, and switches have only two states: ON and OFF. Binary is a natural fit for
the two states of a switch (0 = OFF, 1 = ON).
One binary digit (0 or 1) is referred to as a bit, which is short for binary digit.
Each digit in a number has a place value, which can be set out in columns. In the base 10
system each column is a power of ten (1, 10, 100 and so on); in the binary system each column
is a power of two (1, 2, 4, 8 and so on).
Converting Binary to Denary
To convert a binary number such as 10101000 to denary, place it in columns of base 2 place
values. Then add the place values of all the columns that contain a 1.
2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
128  64   32   16   8    4    2    1
1    0    1    0    1    0    0    0
128 + 32 + 8 = 168
(10101000)2 = (168)10
To convert a binary number such as 10111010 to denary, place it in columns of base 2 place
values. Then add the place values of all the columns that contain a 1.
2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
128  64   32   16   8    4    2    1
1    0    1    1    1    0    1    0
128 + 32 + 16 + 8 + 2 = 186
(10111010)2 = (186)10
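The column method above can be sketched in a few lines of Python. This is only an illustration: the function name `binary_to_denary` is an assumption made for this example and is not part of the notes.

```python
def binary_to_denary(bits: str) -> int:
    """Add up the place value (a power of 2) of every column that holds a 1."""
    total = 0
    # reversed() lets position 0 be the right-most column (place value 2^0 = 1)
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_denary("10101000"))  # 168
print(binary_to_denary("10111010"))  # 186
```

Python's built-in `int("10101000", 2)` gives the same answer; the loop is written out only to mirror the columns.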
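The same columns can be used in reverse to convert a denary number to binary: place a 1 under
each place value needed to make up the number. For example, 84 = 64 + 16 + 4:
2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
128  64   32   16   8    4    2    1
0    1    0    1    0    1    0    0
(84)10 = (01010100)2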
Conversion of large denary number into 16-bit number
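To convert a denary number such as 33111, set up the 16 columns of base 2 place values, from
32768 (2^15) down to 1 (2^0), and place a 1 under each value needed to make up the number:
33111 = 32768 + 256 + 64 + 16 + 4 + 2 + 1
32768 16384 8192 4096 2048 1024  512  256  128   64   32   16    8    4    2    1
    1     0    0    0    0    0    0    1    0    1    0    1    0    1    1    1
(33111)10 = (1000 0001 0101 0111)2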
Method 2:
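This method involves successive division of the denary number by 2, recording the remainder
at each step until the quotient reaches 1. In each row below, the columns are the divisor, the
number being divided and the remainder. The binary number is then read from BOTTOM to
TOP: the final quotient first, followed by the remainders.
(73)10 → (?)2
2 73 1
2 36 0
2 18 0
2 9 1
2 4 0
2 2 0
1
(73)10 → (1001001)2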
Conversion of large denary number into 16-bit number
(34989)10 → (?)2
2 34989 1
2 17494 0
2 8747 1
2 4373 1
2 2186 0
2 1093 1
2 546 0
2 273 1
2 136 0
2 68 0
2 34 0
2 17 1
2 8 0
2 4 0
2 2 0
1
(34989)10 → (1000 1000 1010 1101)2
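The successive-division method can be sketched as follows. It is a minimal illustration, and the function name `denary_to_binary` is an assumption made for this example.

```python
def denary_to_binary(number: int) -> str:
    """Divide by 2 repeatedly and read the remainders from bottom to top."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        remainders.append(str(number % 2))  # remainder of this division
        number //= 2                        # quotient carried to the next row
    # The last remainder found is the left-most bit, so reverse the list
    return "".join(reversed(remainders))


print(denary_to_binary(73))     # 1001001
print(denary_to_binary(34989))  # 1000100010101101
```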
Bits per second, kilobits per second (kbps) and megabits per second (Mbps) are most often
used to measure data transfer speeds. This may refer to how fast you are downloading a file,
or how fast your internet connection is.
Bytes are used to measure data storage. For example, a CD holds 700 MB (megabytes) of
data and a hard drive may hold 250 GB (gigabytes).
Measurement of the Size of Computer Memories
Made up of        Name of memory size   Size
0 or 1            1 bit                 1 bit
4 bits            1 nibble              4 bits
8 bits            1 byte                8 bits
1024 bytes        1 kilobyte            2^10 bytes
1024 kilobytes    1 megabyte            2^20 bytes
1024 megabytes    1 gigabyte            2^30 bytes
1024 gigabytes    1 terabyte            2^40 bytes
1024 terabytes    1 petabyte            2^50 bytes
It should be pointed out here that there is some confusion in the naming of memory sizes. The
IEC convention is now adopted by some organizations. Manufacturers of storage devices
often use the denary system to measure storage size.
For example,
Computers use binary - the digits 0 and 1 - to store data. A binary digit, or bit, is the smallest
unit of data in computing. It is represented by a 0 or a 1.
Computer programs are sets of instructions. Each instruction is translated into machine code:
simple binary codes that activate the CPU. Programmers write computer code and this is
converted by a translator into binary instructions that the processor can execute.
All software, music, documents, and any other information that is processed by a computer,
is also stored using binary.
When computers (or microprocessors) are used to control devices (such as robots), registers
are used as part of the control system. The following example describes how registers can be
used in controlling a simple device.
The HEXADECIMAL SYSTEM is very closely related to the binary system. Hexadecimal
(sometimes referred to as simply ‘hex’) is a base 16 system and therefore needs to use 16
different ‘values’ to represent each digit.
Because it is a system based on 16 different digits, the numbers 0 to 9 and the letters A to F
are used to represent each hexadecimal (hex) digit (A = 10, B = 11, C = 12, D = 13, E = 14
and F = 15). Using the same method as denary and binary, this gives the headings of 16^0,
16^1, 16^2, 16^3 and so on. The typical headings for a hexadecimal number with five digits
would be:
16^4    16^3   16^2   16^1   16^0
65536   4096   256    16     1
Since 16 = 2^4, FOUR binary digits are equivalent to each hexadecimal digit.
The following table summarizes the link between binary, hexadecimal and denary.
Binary   Hexadecimal   Denary
0000     0             0
0001     1             1
0010     2             2
0011     3             3
0100     4             4
0101     5             5
0110     6             6
0111     7             7
1000     8             8
1001     9             9
1010     A             10
1011     B             11
1100     C             12
1101     D             13
1110     E             14
1111     F             15
Converting from binary to hexadecimal and from hexadecimal to binary
Converting from binary to hexadecimal is a fairly easy process. Starting from the right and
moving left, split the binary number into groups of 4 bits. If the last (left-most) group has
fewer than 4 bits, simply fill it in with 0s on the left. Take each group of 4 bits and convert it
into the equivalent hexadecimal digit using the table above. Look at the following two
examples to see how this works.
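Example 1
101111100001
Split into groups of 4 bits:  1011 1110 0001
Hexadecimal digits:           B    E    1
Example 2
10000111111101
Split into groups of 4 bits (two 0s added on the left):  0010 0001 1111 1101
Hexadecimal digits:                                       2    1    F    D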
Converting from hexadecimal to binary is also very straightforward. Using the data in the
above table, simply take each hexadecimal digit and write down the 4-bit code which
corresponds to the digit.
Example 3
4 5 A
Using the above table, find the 4-bit code for each digit:
0100 0101 1010
Example 4
B F 0 8
1011 1111 0000 1000
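The grouping rule in both directions can be sketched as below. This is an illustration only; `HEX_DIGITS`, `binary_to_hex` and `hex_to_binary` are names assumed for this example.

```python
HEX_DIGITS = "0123456789ABCDEF"  # position in this string = value of the digit


def binary_to_hex(bits: str) -> str:
    """Split into 4-bit groups from the right and look each group up."""
    padding = (-len(bits)) % 4        # 0s needed to fill the left-most group
    bits = "0" * padding + bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_string: str) -> str:
    """Write down the 4-bit code for each hexadecimal digit."""
    return "".join(format(HEX_DIGITS.index(digit), "04b")
                   for digit in hex_string.upper())


print(binary_to_hex("101111100001"))    # BE1
print(binary_to_hex("10000111111101"))  # 21FD
print(hex_to_binary("45A"))             # 010001011010
print(hex_to_binary("BF08"))            # 1011111100001000
```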
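Hexadecimal to denary
To convert a hexadecimal number to denary, multiply each digit by its place value (a power
of 16) and add the results.
Example 1
4 5 A
(4 × 256) + (5 × 16) + (10 × 1) = 1024 + 80 + 10
denary number = 1114
Example 2
C 8 F
(12 × 256) + (8 × 16) + (15 × 1) = 3072 + 128 + 15
denary number = 3215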
Denary to hexadecimal
To convert from denary to hexadecimal is a little more difficult. As with the conversion from
denary to binary, there are two very similar methods that can be used. Again, the first method
is ‘trial and error’ and the second method is more methodical and involves repetitive division.
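Example 1
Consider the conversion of the denary number, 2004, into hexadecimal. This method involves
placing hexadecimal digits in the appropriate position so that the total equates to 2004:
256   16   1
7     D    4
(7 × 256) + (13 × 16) + (4 × 1) = 1792 + 208 + 4 = 2004, so (2004)10 = (7D4)16
Example 2
This method involves successive division by 16. The remainders are then read from BOTTOM
to TOP to give the hexadecimal value. Again using 2004, we get:
2004 ÷ 16 = 125 remainder 4
125 ÷ 16 = 7 remainder 13 (D)
7 ÷ 16 = 0 remainder 7
Reading the remainders from bottom to top gives (7D4)16.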
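Both directions between hexadecimal and denary can be sketched in Python. The names `hex_to_denary` and `denary_to_hex` are assumptions made for this example; the second function follows the successive-division method.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_denary(hex_string: str) -> int:
    """Multiply the running total by 16 and add the value of each digit."""
    total = 0
    for digit in hex_string.upper():
        total = total * 16 + HEX_DIGITS.index(digit)
    return total


def denary_to_hex(number: int) -> str:
    """Divide by 16 repeatedly and read the remainders from bottom to top."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        remainders.append(HEX_DIGITS[number % 16])
        number //= 16
    return "".join(reversed(remainders))


print(hex_to_denary("45A"))  # 1114
print(hex_to_denary("C8F"))  # 3215
print(denary_to_hex(2004))   # 7D4
```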
This section reviews four uses of the hexadecimal system. The information in this chapter
gives the reader sufficient grounding in each topic at this level. Further material can be found
by searching the internet, but be careful that you don’t go off at a tangent.
- Error codes
- MAC addresses
- IPv6
- HTML colour codes
Error Codes
MAC Addresses
A MEDIA ACCESS CONTROL (MAC) ADDRESS refers to a number which uniquely identifies
a device on a network. The MAC address refers to the network interface card (NIC) which is
part of the device. The MAC address is rarely changed so that a particular device can always
be identified no matter where it is.
A MAC address is usually made up of 48 bits which are shown as six groups of two
hexadecimal digits (although 64-bit addresses are also known):
NN – NN – NN – DD – DD – DD
or
NN:NN:NN:DD:DD:DD
where the first half (NN – NN – NN) is the identity number of the manufacturer of the device
and the second half (DD – DD – DD) is the serial number of the device.
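As a rough sketch, the two halves of a 48-bit MAC address can be separated like this. The function name `split_mac_address` and the sample address are made up for illustration; they do not come from the notes.

```python
def split_mac_address(mac: str) -> tuple:
    """Return (manufacturer part, serial number part) of a 48-bit MAC address."""
    groups = mac.replace("-", ":").split(":")   # accept either separator
    if len(groups) != 6 or any(len(group) != 2 for group in groups):
        raise ValueError("expected six groups of two hexadecimal digits")
    int("".join(groups), 16)                    # raises ValueError if not hex
    manufacturer = ":".join(groups[:3]).upper() # NN:NN:NN
    serial = ":".join(groups[3:]).upper()       # DD:DD:DD
    return manufacturer, serial


# Hypothetical address used only as sample input
print(split_mac_address("00-1C-B3-4F-25-FE"))  # ('00:1C:B3', '4F:25:FE')
```

Six groups of two hexadecimal digits give 6 × 2 × 4 = 48 bits, which matches the length stated above.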
It should be pointed out that there are two types of MAC address: the UNIVERSALLY
ADMINISTERED MAC ADDRESS (UAA) and the LOCALLY ADMINISTERED MAC
ADDRESS (LAA).
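IPv6
An IPv6 address is 128 bits long and is written as eight groups of four hexadecimal digits
separated by colons, for example:
A8FB:7A88:FFF0:0FFF:3D21:2085:66FB:F0FA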
HyperText Mark-up Language (HTML)
HYPERTEXT MARK-UP LANGUAGE (HTML) is used when writing and developing web
pages. HTML isn’t a programming language but is simply a mark-up language. A mark-up
language is used in the processing, definition and presentation of text (for example, specifying
the colour of the text).
HTML uses <tags> which are used to bracket a piece of code; for example, <td> starts a
standard cell in an HTML table, and </td> ends it. Whatever is between the two tags has been
defined. Here is a short section of HTML code:
HTML code is often used to represent the colours of text on the computer screen. A colour is
written as # followed by three pairs of hexadecimal digits, which set the intensity (00 to FF)
of the three primary colours red, green and blue. For example:
#FF0000 represents red
#00FF00 represents green
#0000FF represents blue
and so on, producing almost any colour the user wants. There are many websites available
that allow a user to find the HTML code for the colour needed.
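A short sketch of how an HTML colour code maps to red, green and blue intensities. The function names are assumptions made for this example.

```python
def hex_colour_to_rgb(code: str) -> tuple:
    """Split a code such as '#FF8000' into (red, green, blue), each 0 to 255."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Each pair of hex digits is one 8-bit intensity
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex_colour(red: int, green: int, blue: int) -> str:
    """Build the HTML colour code from three 0 to 255 intensities."""
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


print(hex_colour_to_rgb("#FF0000"))  # (255, 0, 0)
print(hex_colour_to_rgb("#FF8000"))  # (255, 128, 0)
print(rgb_to_hex_colour(0, 0, 255))  # #0000FF
```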
Binary Addition
Example 1
(00100111)2 + (01001010)2
Carry         1 1 1
        0 0 1 0 0 1 1 1
      + 0 1 0 0 1 0 1 0
        0 1 1 1 0 0 0 1
In denary this is 39 + 74 = 113.
Overflow
When the addition of two 8-bit numbers produces a 9-bit result, the 9th bit is known as the
overflow bit: the result is too large to be stored in 8 bits.
Example
(10101010)2 + (11010101)2
Carry        1
               1 0 1 0 1 0 1 0
             + 1 1 0 1 0 1 0 1
Overflow →   1 0 1 1 1 1 1 1 1
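Column-by-column addition with a carry, including the overflow check, might look like this. It is a sketch only; `add_binary` and its `width` parameter are assumptions made for this example.

```python
def add_binary(a: str, b: str, width: int = 8) -> tuple:
    """Add two binary strings column by column, right to left.

    Returns (result limited to `width` bits, True if an overflow occurred).
    """
    carry = 0
    result = []
    columns = zip(reversed(a.zfill(width)), reversed(b.zfill(width)))
    for bit_a, bit_b in columns:
        column_total = int(bit_a) + int(bit_b) + carry
        result.append(str(column_total % 2))  # bit written under the column
        carry = column_total // 2             # carried to the next column
    # A carry left over after the last column does not fit in the register
    return "".join(reversed(result)), carry == 1


print(add_binary("00100111", "01001010"))  # ('01110001', False)
print(add_binary("10101010", "11010101"))  # ('01111111', True)
```

In the second call the 8-bit register keeps only 01111111; the ninth bit is the overflow.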
Logical Binary Shifts
Computers can carry out a logical shift on a sequence of binary numbers. The logical shift
means moving the binary number to the left or to the right. Each shift left is equivalent to
multiplying the binary number by 2 and each shift right is equivalent to dividing the binary
number by 2.
Left Shift
Example
The denary number 21 is 00010101 in binary. Suppose we put this into an 8-bit register:
128  64  32  16   8   4   2   1
  0   0   0   1   0   1   0   1
If we now shift the bits in this register one place to the left, we obtain
128  64  32  16   8   4   2   1
  0   0   1   0   1   0   1   0
The value of the binary bits is now 21 × 2^1 = 42. We can see this is correct if we calculate the
denary value of the new binary number 101010 (i.e. 32 + 8 + 2 = 42).
Shifting the original number two places to the left gives
128  64  32  16   8   4   2   1
  0   1   0   1   0   1   0   0
The value of the binary bits is now 21 × 2^2 = 84 (i.e. 64 + 16 + 4 = 84).
Shifting the original number three places to the left gives
128  64  32  16   8   4   2   1
  1   0   1   0   1   0   0   0
The value of the binary bits is now 21 × 2^3 = 168 (i.e. 128 + 32 + 8 = 168).
Now, let’s see what happens if we shift the original number four places to the left.
128  64  32  16   8   4   2   1
  0   1   0   1   0   0   0   0
The left-most 1-bit has been lost. The correct result of 21 × 2^4 is 336, but our 8-bit register
now holds 80, which is clearly incorrect. This error occurs because we have exceeded the
maximum number of left shifts possible using this register.
Right Shift
Example
The denary number 200 is 11001000 in binary. Suppose we put this into an 8-bit register:
128  64  32  16   8   4   2   1
  1   1   0   0   1   0   0   0
If we now shift the bits in this register one place to the right, we obtain
128  64  32  16   8   4   2   1
  0   1   1   0   0   1   0   0
The value of the binary bits is now 200 ÷ 2^1 = 100. We can see this is correct if we calculate
the denary value of the new binary number 1100100 (i.e. 64 + 32 + 4 = 100).
Shifting the original number two places to the right gives
128  64  32  16   8   4   2   1
  0   0   1   1   0   0   1   0
The value of the binary bits is now 200 ÷ 2^2 = 50 (i.e. 32 + 16 + 2 = 50).
Shifting the original number three places to the right gives
128  64  32  16   8   4   2   1
  0   0   0   1   1   0   0   1
The value of the binary bits is now 200 ÷ 2^3 = 25 (i.e. 16 + 8 + 1 = 25).
Now, let’s see what happens if we shift the original number four places to the right.
128  64  32  16   8   4   2   1
  0   0   0   0   1   1   0   0
The right-most 1-bit has been lost. The correct result of 200 ÷ 2^4 is 12.5, but our 8-bit register
now holds 12, which is clearly incorrect. This error occurs because we have exceeded the
maximum number of right shifts possible using this register.
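The behaviour of an 8-bit register under logical shifts can be imitated with Python's shift operators. This is a sketch under the assumption of an unsigned 8-bit register; the function names are made up for the example.

```python
def shift_left(value: int, places: int, width: int = 8) -> int:
    """Logical left shift; bits pushed past the left end of the register are lost."""
    mask = (1 << width) - 1          # 11111111 for an 8-bit register
    return (value << places) & mask


def shift_right(value: int, places: int) -> int:
    """Logical right shift; bits pushed past the right end are lost."""
    return value >> places


print(shift_left(21, 1), shift_left(21, 3))     # 42 168
print(shift_left(21, 4))                        # 80 (not 336: a 1-bit was lost)
print(shift_right(200, 1), shift_right(200, 3)) # 100 25
print(shift_right(200, 4))                      # 12 (not 12.5: a 1-bit was lost)
```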
Two’s Complement
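-128  64  32  16   8   4   2   1
In 2’s complement the left-most bit is changed to a negative value. For instance, for an 8-bit
number, the value of +128 is changed to -128, but all other values remain the same.
Method 1
Write the negative number as -128 plus a positive number, then fill in the columns. For
example, -78 = -128 + 50 and 50 = 32 + 16 + 2:
-128  64  32  16   8   4   2   1
   1   0   1   1   0   0   1   0
Method 2
Write the positive number in binary, invert every bit, then add 1. For example, 51 is
0011 0011, which inverts to 1100 1100:
  1100 1100
+         1
  1100 1101 ⟶ This gives us the binary value of -51
For applying 2’s complement, it isn’t always necessary for a binary number to have 8 bits.
Example
With 6 bits, +31 is
 32  16   8   4   2   1
  0   1   1   1   1   1
and -31 is
-32  16   8   4   2   1
  1   0   0   0   0   1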
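A sketch of two's complement conversion in both directions for any register width. The function names and the `width` parameter are assumptions made for this example.

```python
def to_twos_complement(number: int, width: int = 8) -> str:
    """Return the `width`-bit two's complement pattern for `number`."""
    lowest = -(1 << (width - 1))       # e.g. -128 for 8 bits
    highest = (1 << (width - 1)) - 1   # e.g. +127 for 8 bits
    if not lowest <= number <= highest:
        raise ValueError("number does not fit in the register")
    # Masking a negative int gives the same bits as 'invert and add 1'
    return format(number & ((1 << width) - 1), "0{}b".format(width))


def from_twos_complement(bits: str) -> int:
    """Read a bit pattern whose left-most column has a negative place value."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


print(to_twos_complement(-78))           # 10110010
print(from_twos_complement("11001101"))  # -51
print(to_twos_complement(-31, width=6))  # 100001
```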
Text, Sound & Images
Character Sets – ASCII Code & Unicode
ASCII Code System (American Standard Code for Information Interchange) was set up in 1963
for use in communication systems and computer systems. A newer version of this code was
published in 1986. The standard ASCII code character set consists of 7-bit codes (0 to 127 in
denary or 00 to 7F in hexadecimal) that represent the letters, numbers and characters found
on a standard keyboard, together with 32 control codes (that use codes 0 to 31 (denary) or 00
to 1F (hexadecimal)).
The following is part of the ASCII code table (with the control codes removed):
For example:
a 1 1 0 0 0 0 1 Hex 61 [lower case]
A 1 0 0 0 0 0 1 Hex 41 [upper case]
y 1 1 1 1 0 0 1 Hex 79 [lower case]
Y 1 0 1 1 0 0 1 Hex 59 [upper case]
As shown in the above example, the sixth bit from the right (place value 32) changes from 1 to
0 when comparing a lower-case character with its upper-case equivalent. This makes
conversion between lower case and upper case easier. It is also noticeable that the character
sets (e.g. a to z, 0 to 9, etc.) are grouped together in sequence, which speeds up usability.
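The single-bit difference between upper and lower case can be demonstrated with an exclusive OR. This is a small illustration; `toggle_case` is a name assumed for this example and it is only meaningful for the letters a to z and A to Z.

```python
CASE_BIT = 0b0100000  # the sixth bit from the right, place value 32


def toggle_case(letter: str) -> str:
    """Flip the case of a single ASCII letter by flipping one bit."""
    return chr(ord(letter) ^ CASE_BIT)


for character in "aAyY":
    code = ord(character)
    print(character, format(code, "07b"), format(code, "X"))
# a 1100001 61
# A 1000001 41
# y 1111001 79
# Y 1011001 59

print(toggle_case("a"), toggle_case("Y"))  # A y
```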
Extended ASCII uses 8-bit codes (0 to 255 in denary or 00 to FF in hexadecimal). This gives
another 128 codes to allow for characters in Non-English alphabets and for some graphical
characters to be included:
Disadvantages of ASCII code are that it is not suitable for most languages and that there
are multiple different versions of ASCII, as shown in the table above. Because of this, different
coding methods have been developed, one of which is Unicode.
Unicode can represent all languages of the world and is supported by many operating systems,
search engines and internet browsers around the globe. There is overlap with standard ASCII
code, since the first 128 characters are the same, but Unicode can support thousands of
different characters. ASCII uses one byte to represent a character, while Unicode can use up to
4 bytes to represent a character.
The main goals of Unicode were to:
- Create a universal standard that covered all languages and all writing systems.
- Produce a more efficient coding system than ASCII.
- Adopt uniform encoding where each character is encoded as 16-bit or 32-bit code.
- Create unambiguous encoding where each 16-bit and 32-bit value always represents the
  same character.
- Reserve part of the code for private use to enable a user to assign codes for their own
  characters and symbols (useful for Chinese and Japanese character sets, for example).
The Unicode table below shows characters from several different languages, such as
Russian, Romanian and Croatian:
Representation of Sound
Sound waves have a certain frequency, wavelength and amplitude. Amplitude specifies the
loudness of the sound.
Sound waves vary continuously, which means sound is analogue. Since computers cannot
work with analogue data, sound waves need to be sampled (the amplitude is measured as
precisely as possible) in order to be stored in a computer. This is done using an ADC
(analogue-to-digital converter). For conversion, sound waves are sampled at regular
intervals.
At time interval 1, the approximate amplitude is 10; at time interval 2, the approximate
amplitude is 4; and so on for all time intervals. Because the amplitude ranges from 0 to 10 (as
shown in the figure above), 4 binary bits can be used to represent the amplitude; for example,
10 would be represented by the binary value 1010. As the number of possible values (sound
amplitude) increases, the accuracy of the sampled sound also increases (for example, using a
range of 0 to 127 gives a more accurate result than 0 to 10). The number of bits per sample is
the Sampling Resolution (bit depth). For the above example the bit depth is 4 bits.
Sampling Rate (Hertz) is equal to the number of sound samples taken per second.
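The link between the amplitude range and the sampling resolution can be sketched as follows. The function names are assumptions made for this example, and the sample list only reuses the two amplitudes quoted above.

```python
def bits_needed(max_amplitude: int) -> int:
    """Smallest number of bits that can hold every value from 0 to max_amplitude."""
    return max(1, max_amplitude.bit_length())


def encode_samples(samples: list, bit_depth: int) -> list:
    """Encode each sampled amplitude as a fixed-width binary string."""
    return [format(sample, "0{}b".format(bit_depth)) for sample in samples]


print(bits_needed(10))              # 4
print(bits_needed(127))             # 7
print(encode_samples([10, 4], 4))   # ['1010', '0100']
```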
How Sampling is Used to Record Sound?
- The amplitude of the sound wave is first determined at set time intervals (the sampling rate).
- This gives an approximate representation of the sound wave.
- Each sample of the sound wave is then encoded as a series of binary digits.
A higher sampling rate or larger resolution will result in a more faithful representation of the
original sound. File size is directly proportional to both the sampling rate and the sampling
resolution.
CDs have a 16-bit sampling resolution and a 44.1 kHz sample rate, that is 44 100 samples
every second. This gives high quality sound reproduction.
Bitmap images are made up of pixels (picture elements); an image is made up of a
two-dimensional matrix of pixels. Pixels can take different shapes such as:
Each pixel can be represented as a binary number, and so a bitmap image is stored in a
computer as a series of binary numbers, so that:
- A black and white image only requires 1 bit per pixel; this means that each pixel can be
  one of two colours, represented by 1 or 0.
- If each pixel is represented by 2 bits, then each pixel can be one of four colours (2^2 = 4),
  representing the FOUR combinations of 0 and 1, i.e. 00, 01, 10, 11.
- If each pixel is represented by 3 bits, then each pixel can be one of eight colours (2^3 = 8),
  representing the EIGHT combinations of 0 and 1, i.e. 000, 001, 010, 011, 100, 101, 110, 111.
An 8-bit colour depth means that each pixel can be one of 256 colours (2^8 = 256). Modern
computers have a 24-bit colour depth, which means over 16 million different colours can be
represented.
Image resolution refers to the number of pixels that make up an image; for example, an image
could contain 4096 × 3072 pixels (12 582 912 pixels in total).
Image ‘A’ has the highest resolution and ‘E’ has the lowest resolution. ‘E’ has become
pixelated (fuzzy). This is because there are too few pixels in ‘E’ to represent the image.
A drawback of using high resolution images is the increase in file size. As the number of pixels
used to represent the image is increased, the size of the file will also increase. This impacts
on how many images can be stored on, for example, a hard drive. It also impacts on the time
to download an image from the internet or the time to transfer images from device to device.
A certain amount of reduction in resolution of an image is possible before the loss of quality
becomes noticeable.
1 Byte = 8 bits
1 byte of memory is very small, so memory size is measured in multiples, as shown below:
The above system of numbering now only refers to some storage devices, but is technically
inaccurate. It is based on the SI (base 10) system of units, where 1 kilo is equal to 1000.
Because memory size is actually measured in terms of powers of 2, another system has been
adopted by the IEC (International Electrotechnical Commission) that is based on the binary
system.
This system is more accurate. Internal memories (such as RAM and ROM) should be
measured using the IEC system. A 64 GiB RAM could store 64 × 2^30 bytes of data i.e.
68 719 476 736 bytes.
Size of a sound file (in bits) = Sample Rate (in Hz) × Sample Resolution (in bits) × Length of
Sample (in seconds)
Example 1
A photograph is 1024 × 1080 pixels and uses a color depth of 32 bits. How many photographs
of this size would fit onto a memory stick of 64 GB?
Step 1: Multiply number of pixels in vertical and horizontal directions to find total number of
pixels = [1024 × 1080] = 1 105 920 pixels.
Step 2: Multiply number of pixels by color depth, then divide by 8 to give the number of bytes
= 1 105 920 × 32 = 35 389 440 bits; 35 389 440/8 = 4 423 680 bytes.
Step 3: Divide the memory stick size by the file size = 68 719 476 736/4 423 680 = 15 534
photos (rounded down to a whole number of photos).
Example 2
A photograph is 2048 × 2048 pixels and uses a color depth of 16 bits. Find the size of this
image in MB.
Step 1: Multiply number of pixels in vertical and horizontal directions to find total number of
pixels = [2048 × 2048] = 4 194 304 pixels.
Step 2: Multiply number of pixels by color depth to give the number of bits
= 4 194 304 × 16 = 67 108 864 bits.
Step 3: Divide number of bits by 8 to find number of bytes in the file = 67 108 864/8 =
8 388 608 bytes
Step 4: Divide by 1024 × 1024 to convert to MB = 8 388 608/1 048 576 = 8 MB.
Example 3
An audio CD has a sample rate of 44 100 Hz and a sample resolution of 16 bits. The music
being sampled uses two channels to allow for stereo recording. Calculate the file size for a
60 minute recording.
Step 1: Size of file = 44 100 × 16 × [60 × 60] = 2 540 160 000 bits
Step 2: Multiply by 2 since there are two channels being used = 5 080 320 000 bits
Step 3: Divide by 8 to find number of bytes = 5 080 320 000/8 = 635 040 000 bytes
Step 4: Divide by 1024 × 1024 to convert to MB = 635 040 000/1 048 576 ≈ 605.6 MB.
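The three worked examples can be checked with a short script. This is a sketch: the function names and the constants `MIB` and `GIB` are assumptions made for this example, and it treats 1 MB as 2^20 bytes and 64 GB as 64 × 2^30 bytes, as the worked steps above do.

```python
MIB = 2 ** 20  # bytes in one megabyte, using powers of 2
GIB = 2 ** 30  # bytes in one gigabyte, using powers of 2


def image_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Pixels x colour depth gives bits; divide by 8 for bytes."""
    return width * height * colour_depth_bits // 8


def sound_size_bytes(sample_rate_hz: int, resolution_bits: int,
                     seconds: int, channels: int = 1) -> int:
    """Sample rate x resolution x length x channels gives bits; divide by 8."""
    return sample_rate_hz * resolution_bits * seconds * channels // 8


photo = image_size_bytes(1024, 1080, 32)
print(photo)                                    # 4423680
print((64 * GIB) // photo)                      # 15534 whole photos
print(image_size_bytes(2048, 2048, 16) // MIB)  # 8
cd = sound_size_bytes(44_100, 16, 60 * 60, channels=2)
print(cd, round(cd / MIB, 1))                   # 635040000 605.6
```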
Data Compression
Calculations show that sound and image files can be very large; therefore, it is often
necessary to reduce (compress) the file size in order to:
- Save storage space on devices such as a hard disk drive/solid state drive.
- Reduce the time taken to stream a music or video file.
- Reduce the time taken to upload, download or transfer a file across a network.
- Use less network bandwidth. Bandwidth is the maximum rate of transfer of data across a
  network, measured in bits per second, and it is used up whenever a file is downloaded,
  for example, from a server. Compressed files contain fewer bits of data than
  uncompressed files and therefore use less bandwidth, which results in a faster data
  transfer.
- Reduce costs. For example, when using cloud storage, the cost is based on the size of
  the files stored. Also, an internet service provider (ISP) may charge a user based on the
  amount of data downloaded.
For example, when a lossy compression algorithm is applied to:
- an image, it may reduce the resolution and/or the bit/color depth.
- a sound file, it may reduce the sampling rate and/or the resolution.
MP3
When Internet file-sharing boomed in popularity with Napster and the iPod, the MP3
cornered the market for one reason: it had a small footprint. Without broadband connections,
it was impractical at the time to share files larger than the typical MP3 size of 2 – 3 megabytes.
That preference has stuck for some time now, even though MP3 does not have nearly the
same quality as WAV or AIFF files. Despite a growing base of people using higher quality
formats, there are still those who prefer MP3. So, if you have a slower internet
connection or limited hard drive space, MP3 could be your file format of choice. If you’re
worried about quality loss, don’t fret too much about it. While there is a noticeable drop-off
in sound quality, MP3 files fall squarely under the “good enough” umbrella.
MP4
MP4 is an abbreviated term for MPEG-4 Part 14. It should not be confused with MPEG-4 AVC
(Advanced Video Coding), which is MPEG-4 Part 10, a video compression standard that is
often stored inside MP4 files. As the name suggests, MP4 is a format for working with video
files; it belongs to the MPEG-4 family of standards, first introduced in 1998. MPEG refers to
the Moving Picture Experts Group, which is responsible for setting the industry standards
regarding digital audio and video.
The MP4 is a container format, allowing a combination of audio, video, subtitles and still
images to be held in one single file. It also allows for advanced content such as 3D
graphics, menus and user interactivity.
JPEG
JPG files, also known as JPEG files, are a common file format for digital photos and other
digital graphics. When JPG files are saved, they use "lossy" compression, meaning image
quality is lost as file size decreases. JPEG stands for Joint Photographic Experts Group, the
committee that created the file type. JPG files have the file extension .jpg or .jpeg.
They are the most common file type for images taken with digital cameras, and widely used
for photos and other graphics used on websites.
If file sizes get very low, JPG images will become "muddy." When saving photos and other
images as JPG files for the web, email and other uses, you must decide on the compromise
between quality and file size.
Lossless File Compression
Lossless data compression is a class of data compression algorithms that allows the original
data to be perfectly reconstructed from the compressed data. Lossless data compression is
used in many applications. For example, it is used in the ZIP file format.
Lossless compression is used in cases where it is important that the original and the
decompressed data be identical. Typical examples are executable programs, text documents,
and source code.
Run Length Encoding (RLE) can be used for lossless compression of a number of different
file formats:
Consider the following text string: ‘aaaaabbbbccddddd’. Assuming each character requires 1
byte then this string needs 16 bytes. If we assume ASCII code is being used, then the string
can be coded as follows:
a a a a a b b b b c c d d d d d
05 97 04 98 02 99 05 100
This means we have five characters with ASCII code 97, four characters with ASCII code 98,
two characters with ASCII code 99 and five characters with ASCII code 100. Assuming each
number in the second row requires 1 byte of memory, the RLE code will need 8 bytes. This is
half the original file size.
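The count-then-code scheme described above can be sketched with `itertools.groupby`. The function names are assumptions made for this example, and the sketch ignores runs longer than 255, which would not fit in one byte.

```python
from itertools import groupby


def rle_encode(text: str) -> list:
    """Encode each run of identical characters as [run length, ASCII code]."""
    encoded = []
    for character, run in groupby(text):
        encoded += [len(list(run)), ord(character)]
    return encoded


def rle_decode(encoded: list) -> str:
    """Rebuild the original string from (run length, ASCII code) pairs."""
    pairs = zip(encoded[0::2], encoded[1::2])
    return "".join(chr(code) * count for count, code in pairs)


codes = rle_encode("aaaaabbbbccddddd")
print(codes)              # [5, 97, 4, 98, 2, 99, 5, 100]
print(len(codes))         # 8 values instead of 16 characters
print(rle_decode(codes))  # aaaaabbbbccddddd
```

Because decoding returns exactly the original string, this is a lossless method.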
Issues arise with a string such as ‘cdcdcdcdcd’, where RLE compression isn’t very effective. To
cope with this, we use a flag. A flag preceding data indicates that what follows is the number
of repeating units (for example, 255 05 97 where 255 is the flag and the other two numbers
indicate that there are five items with ASCII code 97). When a flag is not used, the next byte(s)
are taken with their face value and a run of 1 (for example, 01 99 means one character with
ASCII code 99 follows).
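One possible reading of the flag scheme is sketched below: runs are written as flag, count, code, and everything else is written as its plain ASCII code. The threshold `min_run`, the function name and the sample string are all assumptions made for this illustration, and the sketch assumes the value 255 never appears as ordinary data.

```python
FLAG = 255  # marks that a (count, code) pair follows


def rle_encode_with_flag(text: str, min_run: int = 3) -> list:
    """Flag only the runs worth compressing; copy other characters as they are."""
    encoded = []
    i = 0
    while i < len(text):
        run = 1
        # Count how many times the current character repeats (capped at 254)
        while i + run < len(text) and text[i + run] == text[i] and run < 254:
            run += 1
        code = ord(text[i])
        if run >= min_run:
            encoded += [FLAG, run, code]   # three values for the whole run
        else:
            encoded += [code] * run        # face value, no flag
        i += run
    return encoded


# Made-up 32-character string used only as sample input
sample = "a" * 8 + "b" * 10 + "cdcdcd" + "e" * 8
codes = rle_encode_with_flag(sample)
print(codes)
print(len(sample), len(codes))  # 32 15
```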
Example
This has 15 values therefore it requires 15 bytes of storage. This is a reduction in file size of
about 53% when compared to the original string.
This figure shows the letter F in a grid where each square requires 1 byte of storage. A white
square has a value 1 and a black square has a value 0.
The 8 × 8 grid would need 64 bytes; the compressed RLE format has 30 values, and therefore
needs only 30 bytes to store the image.
Figure shows an object in four colors. Each color is made up of red, green and blue [RGB]
according to the code on the right.
The original image (8 × 8) square would need 3 bytes per square (to include all three RGB
value). Therefore, the uncompressed file for this image is 8 × 8 × 3 = 192 bytes.
The RLE code has 92 values, which means the compressed file will be 92 bytes in size. This
gives a file reduction of about 52% (from 192 bytes to 92 bytes). It should be noted that the
file reductions in reality will not be as large as this due to other data which needs to be stored
with the compressed file, e.g. a file header.