A Tutorial On Data Representation - Integers, Floating-Point Numbers, and Characters
1. Number Systems
1.1 Decimal (Base 10) Number System
1.2 Binary (Base 2) Number System
1.3 Hexadecimal (Base 16) Number System
1.4 Conversion from Hexadecimal to Binary
1. Number Systems
Human beings use decimal (base 10) and duodecimal (base 12) number systems
for counting and measurements (probably because we have 10 fingers and two
big toes). Computers use the binary (base 2) number system, as they are made from
binary digital components (known as transistors) operating in two states: on
and off. In computing, we also use hexadecimal (base 16) or octal (base 8)
number systems, as a compact form for representing binary numbers.
9/10/2015
because 8 = 2^3.
We shall denote a hexadecimal number (in short, hex) with a suffix H. Some programming languages denote hex numbers
with prefix 0x (e.g., 0x1A3C5F), or prefix x with the hex digits quoted (e.g., x'C3A4D98B').
Each hexadecimal digit is also called a hex digit. Most programming languages accept lowercase 'a' to 'f' as well as
uppercase 'A' to 'F'.
Computers use the binary system in their internal operations, as they are built from binary digital electronic components.
However, writing or reading a long sequence of binary bits is cumbersome and error-prone. The hexadecimal system is used as
a compact form or shorthand for binary bits. Each hex digit is equivalent to 4 binary bits, i.e., shorthand for 4 bits, as
follows:
Hex   Binary   Decimal
0H    0000B    0D
1H    0001B    1D
2H    0010B    2D
3H    0011B    3D
4H    0100B    4D
5H    0101B    5D
6H    0110B    6D
7H    0111B    7D
8H    1000B    8D
9H    1001B    9D
AH    1010B    10D
BH    1011B    11D
CH    1100B    12D
DH    1101B    13D
EH    1110B    14D
FH    1111B    15D
It is important to note that a hexadecimal number provides a compact form or shorthand for representing binary bits.
For example,
A1C2H = 10×16^3 + 1×16^2 + 12×16^1 + 2 = 41410 (base 10)
10110B = 1×2^4 + 1×2^2 + 1×2^1 = 22 (base 10)
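The positional expansion used in the two examples can be checked with a few lines of Java. This is only a minimal sketch: the class name PositionalValue and the method toDecimal are made up for illustration, and Integer.parseInt is the standard library call that does the same job.

```java
class PositionalValue {
    // Expand a digit string in the given radix (2 to 36) into its value:
    // d[n-1]*radix^(n-1) + ... + d[1]*radix^1 + d[0]*radix^0, evaluated left to right.
    static int toDecimal(String digits, int radix) {
        int value = 0;
        for (char ch : digits.toCharArray()) {
            int d = Character.digit(ch, radix);   // -1 if ch is not a digit in this radix
            if (d < 0) {
                throw new IllegalArgumentException("bad digit: " + ch);
            }
            value = value * radix + d;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("A1C2", 16));         // 41410
        System.out.println(toDecimal("10110", 2));         // 22
        System.out.println(Integer.parseInt("A1C2", 16));  // 41410, the library equivalent
    }
}
```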
https://www3.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html
The above procedure is actually applicable to conversion between any 2 base systems. For example,
To convert 1023 (base 4) to base 3:
1023 (base 4) / 3 => quotient = 25D, remainder = 0
25D / 3 => quotient = 8D, remainder = 1
8D / 3 => quotient = 2D, remainder = 2
2D / 3 => quotient = 0, remainder = 2 (quotient = 0, stop)
Hence, reading the remainders from last to first, 1023 (base 4) = 2210 (base 3)
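The repeated-division steps above can be sketched in Java as follows. The class BaseConverter and its method toRadix are hypothetical names chosen for this illustration; Integer.toString(value, radix) is the library call that performs the same conversion.

```java
class BaseConverter {
    // Convert a non-negative value to the target radix (2 to 36) by repeated division.
    static String toRadix(int value, int radix) {
        if (value == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append(Character.forDigit(value % radix, radix)); // remainder is the next digit
            value /= radix;                                      // carry on with the quotient
        }
        return sb.reverse().toString();  // remainders come out least-significant digit first
    }

    public static void main(String[] args) {
        int v = Integer.parseInt("1023", 4);         // 75D
        System.out.println(toRadix(v, 3));           // 2210
        System.out.println(Integer.toString(v, 3));  // 2210
    }
}
```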
Example 1:
Convert 18.6875D to binary
Integral Part = 18D
18/2 => quotient = 9, remainder = 0
9/2 => quotient = 4, remainder = 1
4/2 => quotient = 2, remainder = 0
2/2 => quotient = 1, remainder = 0
1/2 => quotient = 0, remainder = 1 (quotient = 0, stop)
Hence, 18D = 10010B
Fractional Part = .6875D
.6875*2 = 1.375 => whole number is 1
.375*2 = 0.75 => whole number is 0
.75*2 = 1.5 => whole number is 1
.5*2 = 1.0 => whole number is 1
Hence .6875D = .1011B
Therefore, 18.6875D = 10010.1011B
Example 2:
Convert 18.6875D to hexadecimal
Integral Part = 18D
18/16 => quotient = 1, remainder = 2
1/16 => quotient = 0, remainder = 1 (quotient = 0, stop)
Hence, 18D = 12H
Fractional Part = .6875D
.6875*16 = 11.0 => whole number is 11D (BH)
Hence .6875D = .BH
Therefore, 18.6875D = 12.BH
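The fractional-part procedure in the two examples (repeated multiplication by the target radix) might be sketched as below. FractionConverter, fractionToRadix and the maxDigits cut-off are assumptions made for this illustration; the cut-off is needed because most decimal fractions do not terminate in binary.

```java
class FractionConverter {
    // Convert a fraction in [0, 1) to at most maxDigits digits in the given radix
    // by repeated multiplication; the whole part of each product is the next digit.
    static String fractionToRadix(double fraction, int radix, int maxDigits) {
        StringBuilder sb = new StringBuilder(".");
        for (int i = 0; i < maxDigits && fraction > 0; i++) {
            fraction *= radix;
            int digit = (int) fraction;   // whole-number part
            sb.append(Character.toUpperCase(Character.forDigit(digit, radix)));
            fraction -= digit;            // keep only the fractional part
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fractionToRadix(0.6875, 2, 16));   // .1011
        System.out.println(fractionToRadix(0.6875, 16, 16));  // .B
    }
}
```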
1. Convert the following decimal numbers into binary and hexadecimal numbers:
a. 108
b. 4848
c. 9000
2. Convert the following binary numbers into hexadecimal and decimal numbers:
a. 1000011000
b. 10000000
c. 101010101010
3. Convert the following hexadecimal numbers into binary and decimal numbers:
a. ABCDE
b. 1234
c. 80F
4. Convert the following decimal numbers into binary equivalent:
a. 19.25D
b. 123.456D
Answers: You could use the Windows Calculator (calc.exe) to carry out number system conversion, by setting it to the
scientific mode. (Run "calc" => Select "View" menu => Choose "Programmer" or "Scientific" mode.)
1. 1101100B, 1001011110000B, 10001100101000B, 6CH, 12F0H, 2328H.
2. 218H, 80H, AAAH, 536D, 128D, 2730D.
3. 10101011110011011110B, 1001000110100B, 100000001111B, 703710D, 4660D, 2063D.
4. ??
3. Integer Representation
Integers are whole numbers or fixed-point numbers with the radix point fixed after the least-significant bit. They are in contrast
to real numbers or floating-point numbers, where the position of the radix point varies. It is important to take note that
integers and floating-point numbers are treated differently in computers. They have different representations and are
processed differently (e.g., floating-point numbers are processed in a so-called floating-point processor). Floating-point
numbers will be discussed later.
Computers use a fixed number of bits to represent an integer. The commonly-used bit-lengths for integers are 8-bit, 16-bit,
32-bit or 64-bit. Besides bit-lengths, there are two representation schemes for integers:
1. Unsigned Integers: can represent zero and positive integers.
2. Signed Integers: can represent zero, positive and negative integers. Three representation schemes have been proposed
for signed integers:
a. Sign-Magnitude representation
b. 1's Complement representation
c. 2's Complement representation
You, as the programmer, need to decide on the bit-length and representation scheme for your integers, depending on
your application's requirements. Suppose that you need a counter for counting a small quantity from 0 up to 200; you
might choose the 8-bit unsigned integer scheme, as there are no negative numbers involved.
Example 1: Suppose that n=8 and the binary pattern is 0100 0001B, the value of this unsigned integer is 1×2^0 +
1×2^6 = 65D.
Example 2: Suppose that n=16 and the binary pattern is 0001 0000 0000 1000B, the value of this unsigned integer is
1×2^3 + 1×2^12 = 4104D.
Example 3: Suppose that n=16 and the binary pattern is 0000 0000 0000 0000B, the value of this unsigned integer is 0.
An n-bit pattern can represent 2^n distinct integers. An n-bit unsigned integer can represent integers from 0 to (2^n)-1,
as tabulated below:
n    Minimum   Maximum
8    0         (2^8)-1  (=255)
16   0         (2^16)-1 (=65,535)
32   0         (2^32)-1 (=4,294,967,295) (9+ digits)
64   0         (2^64)-1 (=18,446,744,073,709,551,615) (19+ digits)
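The maxima in the table can be reproduced with a short Java sketch. Note that Java's own int and long types are signed, so the sketch uses BigInteger for the exact values and the standard toUnsignedString helpers to read an all-ones pattern as unsigned. The class name UnsignedRange and the method maxUnsigned are illustrative only.

```java
import java.math.BigInteger;

class UnsignedRange {
    // Largest n-bit unsigned integer, (2^n)-1, computed exactly for any n >= 1.
    static BigInteger maxUnsigned(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 16, 32, 64}) {
            System.out.println(n + "-bit: 0 to " + maxUnsigned(n));
        }
        // An all-ones bit pattern read as an unsigned value:
        System.out.println(Integer.toUnsignedString(-1));     // 4294967295
        System.out.println(Long.toUnsignedString(-1L));       // 18446744073709551615
        // The 8-bit pattern 0100 0001B read as an unsigned integer:
        System.out.println(Integer.parseInt("01000001", 2));  // 65
    }
}
```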
3.2 Signed Integers
Signed integers can represent zero, positive integers, as well as negative integers. Three representation schemes are
available for signed integers:
1. Sign-Magnitude representation
2. 1's Complement representation
3. 2's Complement representation
In all the above three schemes, the most-significant bit (msb) is called the sign bit. The sign bit is used to represent the sign
of the integer, with 0 for positive integers and 1 for negative integers. The magnitude of the integer, however, is
interpreted differently in different schemes.
Example 2: Subtraction is treated as Addition of a Positive and a Negative Integer: Suppose that
n=8, 65D - 5D = 65D + (-5D) = 60D
 65D =>  0100 0001B
 -5D =>  1111 1011B (+
         0011 1100B => 60D (discard carry - OK)
Example 3: Addition of Two Negative Integers: Suppose that n=8, -65D - 5D = (-65D) + (-5D) = -70D
-65D =>  1011 1111B
 -5D =>  1111 1011B (+
         1011 1010B => -70D (discard carry - OK)
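Java's byte type is an 8-bit 2's complement integer, so the two additions above can be replayed directly. In this sketch the helper names bits8 and add8 are invented for illustration; the narrowing cast to byte is what discards the carry out of the top bit.

```java
class TwosComplementAdd {
    // The 8-bit pattern of a value, as a zero-padded binary string.
    static String bits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);  // keep the low 8 bits only
        return "00000000".substring(s.length()) + s;
    }

    // Add two 8-bit values and discard any carry out of bit 7.
    static byte add8(int a, int b) {
        return (byte) (a + b);   // the narrowing cast keeps only the low 8 bits
    }

    public static void main(String[] args) {
        byte r1 = add8(65, -5);
        System.out.println(bits8(65) + " + " + bits8(-5) + " = " + bits8(r1) + " (" + r1 + ")");
        // 01000001 + 11111011 = 00111100 (60)
        byte r2 = add8(-65, -5);
        System.out.println(bits8(-65) + " + " + bits8(-5) + " = " + bits8(r2) + " (" + r2 + ")");
        // 10111111 + 11111011 = 10111010 (-70)
    }
}
```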
Because of the fixed precision (i.e., fixed number of bits), an n-bit 2's complement signed integer has a certain range. For
example, for n=8, the range of 2's complement signed integers is -128 to +127. During addition and subtraction, it is
important to check whether the result exceeds this range, in other words, whether overflow or underflow has occurred.
2's complement works by re-arranging the number line: values from -128 to +127
are represented contiguously by ignoring the carry bit.
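A result that leaves the range wraps around silently unless it is checked. The following is a minimal sketch of such a check for the 8-bit case; checkedAdd8 is a hypothetical helper, while Math.addExact is the standard library method that performs the same check for 32-bit int.

```java
class OverflowCheck {
    // Add two 8-bit signed values; report overflow if the true sum leaves [-128, +127].
    static byte checkedAdd8(byte a, byte b) {
        int sum = a + b;   // computed as a 32-bit int, so this sum itself cannot overflow
        if (sum < Byte.MIN_VALUE || sum > Byte.MAX_VALUE) {
            throw new ArithmeticException("8-bit overflow: " + sum);
        }
        return (byte) sum;
    }

    public static void main(String[] args) {
        System.out.println((byte) (127 + 1));                    // -128: silent wrap-around
        System.out.println(checkedAdd8((byte) 65, (byte) -5));   // 60
        try {
            checkedAdd8((byte) 127, (byte) 1);
        } catch (ArithmeticException e) {
            System.out.println("detected: " + e.getMessage());
        }
        try {
            Math.addExact(Integer.MAX_VALUE, 1);                 // same idea for 32-bit int
        } catch (ArithmeticException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```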
n    minimum                                   maximum
8    -(2^7)  (= -128)                          +(2^7)-1  (= +127)
16   -(2^15) (= -32,768)                       +(2^15)-1 (= +32,767)
32   -(2^31) (= -2,147,483,648)                +(2^31)-1 (= +2,147,483,647) (9+ digits)
64   -(2^63) (= -9,223,372,036,854,775,808)    +(2^63)-1 (= +9,223,372,036,854,775,807) (18+ digits)
Answers
1. The range of unsigned n-bit integers is [0, 2^n - 1]. The range of n-bit 2's complement signed integers is
[-2^(n-1), +2^(n-1) - 1].
2. 88 (0101 1000), 0 (0000 0000), 1 (0000 0001), 127 (0111 1111), 255 (1111 1111).
3. +88 (0101 1000), -88 (1010 1000), -1 (1111 1111), 0 (0000 0000), +1 (0000 0001), -128 (1000 0000),
+127 (0111 1111).
4. +88 (0101 1000), -88 (1101 1000), -1 (1000 0001), 0 (0000 0000 or 1000 0000), +1 (0000 0001), -127
(1111 1111), +127 (0111 1111).
5. +88 (0101 1000), -88 (1010 0111), -1 (1111 1110), 0 (0000 0000 or 1111 1111), +1 (0000 0001), -127
(1000 0000), +127 (0111 1111).
A floating-point number is typically expressed in the scientific notation, with a fraction (F), and an exponent (E) of a certain
radix (r), in the form of F×r^E. Decimal numbers use radix of 10 (F×10^E); while binary numbers use radix of 2 (F×2^E).
Representation of a floating-point number is not unique. For example, the number 55.66 can be represented as
5.566×10^1, 0.5566×10^2, 0.05566×10^3, and so on. The fractional part can be normalized. In the normalized form, there
is only a single non-zero digit before the radix point. For example, decimal number 123.4567 can be normalized as
1.234567×10^2; binary number 1010.1011B can be normalized as 1.0101011B×2^3.
It is important to note that floating-point numbers suffer from loss of precision when represented with a fixed number of
bits (e.g., 32-bit or 64-bit). This is because there are infinitely many real numbers, even within a small range of, say, 0.0
to 0.1. On the other hand, an n-bit binary pattern can represent only a finite 2^n distinct numbers. Hence, not all the real
numbers can be represented. The nearest approximation will be used instead, resulting in loss of accuracy.
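The loss of precision is easy to observe. The sketch below (the class name PrecisionLoss is arbitrary) prints the exact values that are actually stored for 0.1, and shows two well-known consequences of rounding to the nearest representable number.

```java
import java.math.BigDecimal;

class PrecisionLoss {
    public static void main(String[] args) {
        // 0.1 has no finite binary expansion, so the nearest representable value is stored.
        System.out.println(new BigDecimal(0.1f));    // exact value held by the 32-bit float
        System.out.println(new BigDecimal(0.1));     // exact value held by the 64-bit double
        // Rounding errors surface in arithmetic:
        System.out.println(0.1 + 0.2);               // 0.30000000000000004
        System.out.println(0.1 + 0.2 == 0.3);        // false
        // A float has only 24 significant bits, so neighbouring large integers collide:
        System.out.println(16777216f == 16777217f);  // true
    }
}
```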
It is also important to note that floating-point arithmetic is very much less efficient than integer arithmetic. It can be
sped up with a so-called dedicated floating-point co-processor. Hence, use integers if your application does not require
floating-point numbers.
In computers, floating-point numbers are represented in scientific notation of fraction (F) and exponent (E) with a radix of 2,
in the form of F×2^E. Both E and F can be positive as well as negative. Modern computers adopt the IEEE 754 standard for
representing floating-point numbers. There are two representation schemes: 32-bit single-precision and 64-bit
double-precision.
Normalized Form
Let's illustrate with an example. Suppose that the 32-bit pattern is 1 1000 0001 011 0000 0000 0000 0000 0000, with:
S = 1
E = 1000 0001
F = 011 0000 0000 0000 0000 0000
In the normalized form, the actual fraction is normalized with an implicit leading 1 in the form of 1.F. In this example, the
actual fraction is 1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D.
The sign bit represents the sign of the number, with S=0 for positive and S=1 for negative numbers. In this example with
S=1, this is a negative number, i.e., -1.375D.
In normalized form, the actual exponent is E-127 (so-called excess-127 or bias-127). This is because we need to represent
both positive and negative exponents. With an 8-bit E, ranging from 0 to 255, the excess-127 scheme could provide actual
exponents of -127 to 128. In this example, E-127 = 129-127 = 2D.
Hence, the number represented is -1.375×2^2 = -5.5D.
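The worked example can be replayed in Java by pulling the three fields out of the bit pattern (C0B00000H in hex) and comparing against the standard Float.intBitsToFloat. This is a sketch only: DecodeFloatExample and decodeNormalized are made-up names, and the method handles the normalized case alone.

```java
class DecodeFloatExample {
    // Decode a 32-bit pattern, assuming it is in normalized form (1 <= E <= 254).
    static double decodeNormalized(int bits) {
        int s = bits >>> 31;               // sign bit
        int e = (bits >>> 23) & 0xFF;      // 8-bit biased exponent
        int f = bits & 0x7FFFFF;           // 23-bit fraction field
        double fraction = 1.0 + f / (double) (1 << 23);   // implicit leading 1: 1.F
        return (s == 0 ? 1 : -1) * fraction * Math.pow(2, e - 127);
    }

    public static void main(String[] args) {
        int bits = 0xC0B00000;   // 1 10000001 01100000000000000000000
        System.out.println("S=" + (bits >>> 31) + " E=" + ((bits >>> 23) & 0xFF));  // S=1 E=129
        System.out.println(decodeNormalized(bits));        // -5.5
        System.out.println(Float.intBitsToFloat(bits));    // -5.5
    }
}
```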
De-Normalized Form
Normalized form has a serious problem: with an implicit leading 1 for the fraction, it cannot represent the number zero!
Convince yourself of this!
De-normalized form was devised to represent zero and other numbers.
For E=0, the numbers are in the de-normalized form. An implicit leading 0 (instead of 1) is used for the fraction; and the
actual exponent is always -126. Hence, the number zero can be represented with E=0 and F=0 (because 0.0×2^-126 = 0).
We can also represent very small positive and negative numbers in de-normalized form with E=0. For example, if S=1, E=0,
and F=011 0000 0000 0000 0000 0000, the actual fraction is 0.011 = 1×2^-2 + 1×2^-3 = 0.375D. Since S=1, it is a negative
number. With E=0, the actual exponent is -126. Hence the number is -0.375×2^-126 = -4.4×10^-39, which is an
extremely small negative number (close to zero).
Summary
In summary, the value (N) is calculated as follows:
For 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127). These numbers are in the so-called normalized form. The sign
bit represents the sign of the number. The fractional part (1.F) is normalized with an implicit leading 1. The exponent is
biased (or in excess) by 127, so as to represent both positive and negative exponents. The range of the actual exponent is -126 to
+127.
For E = 0, N = (-1)^S × 0.F × 2^(-126). These numbers are in the so-called de-normalized form. The exponent
of 2^-126 evaluates to a very small number. De-normalized form is needed to represent zero (with F=0 and E=0). It can
also represent very small positive and negative numbers close to zero.
For E = 255, it represents special values, such as ±INF (positive and negative infinity) and NaN (not a number). This is
beyond the scope of this article.
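The three cases in the summary translate almost line by line into code. The following sketch assumes nothing beyond the formulas above; the class Ieee754Single and the method decode are illustrative names, and the result is cross-checked against Float.intBitsToFloat from the standard library.

```java
class Ieee754Single {
    // Value of a 32-bit IEEE 754 pattern, following the three cases summarised above.
    static double decode(int bits) {
        int s = bits >>> 31;
        int e = (bits >>> 23) & 0xFF;
        int f = bits & 0x7FFFFF;
        double sign = (s == 0) ? 1.0 : -1.0;
        double frac = f / (double) (1 << 23);             // 0.F as a number in [0, 1)
        if (e == 0) {
            return sign * frac * Math.pow(2, -126);       // de-normalized: 0.F x 2^-126
        }
        if (e == 255) {                                   // special values
            return (f == 0) ? sign * Double.POSITIVE_INFINITY : Double.NaN;
        }
        return sign * (1.0 + frac) * Math.pow(2, e - 127); // normalized: 1.F x 2^(E-127)
    }

    public static void main(String[] args) {
        int[] patterns = {0xC0B00000, 0x80300000, 0x00000000, 0x7F800000};
        for (int bits : patterns) {
            System.out.println(String.format("%08X", bits) + " -> " + decode(bits)
                    + " (library: " + Float.intBitsToFloat(bits) + ")");
        }
    }
}
```

The second pattern, 80300000H, is the de-normalized example with S=1, E=0 and F=011 0000 0000 0000 0000 0000.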
E = 0111 1110B = 126D (in normalized form)
Fraction is 1.1B (with an implicit leading 1) = 1 + 2^-1 = 1.5D
The magnitude of the number is 1.5×2^(126-127) = 0.75D (its sign is given by the sign bit S)
Example 4 (De-Normalized Form): Suppose that the IEEE-754 32-bit floating-point representation pattern is
1 00000000 000 0000 0000 0000 0000 0001.
Sign bit S = 1 => negative number
E = 0 (in de-normalized form)
Fraction is 0.000 0000 0000 0000 0000 0001B (with an implicit leading 0) = 1×2^-23
The number is -2^-23 × 2^(-126) = -2^(-149) ≈ -1.4×10^-45
Hints:
1. Largest positive number: S=0, E=1111 1110 (254), F=111 1111 1111 1111 1111 1111.
Smallest positive number: S=0, E=0000 0001 (1), F=000 0000 0000 0000 0000 0000.
2. Same as above, but S=1.
3. Largest positive number: S=0, E=0, F=111 1111 1111 1111 1111 1111.
Smallest positive number: S=0, E=0, F=000 0000 0000 0000 0000 0001.
4. Same as above, but S=1.
Precision: Single
Normalized N(min): 0080 0000H = 0 00000001 00000000000000000000000B (E=1, F=0)
    N(min) = 1.0B × 2^-126 (≈1.17549435×10^-38)
Normalized N(max): 7F7F FFFFH = 0 11111110 11111111111111111111111B (E=254, F=all 1's)
    N(max) = 1.1...1B × 2^127 = (2 - 2^-23) × 2^127 (≈3.4028235×10^38)
Precision: Double
Normalized N(min): 0010 0000 0000 0000H
    N(min) = 1.0B × 2^-1022 (≈2.2250738585072014×10^-308)
Normalized N(max): 7FEF FFFF FFFF FFFFH
    N(max) = 1.1...1B × 2^1023 = (2 - 2^-52) × 2^1023 (≈1.7976931348623157×10^308)
Precision: Single
De-normalized D(min): 0000 0001H = 0 00000000 00000000000000000000001B (E=0, F=00000000000000000000001B)
    D(min) = 0.0...1 × 2^-126 = 1×2^-23 × 2^-126 = 2^-149 (≈1.4×10^-45)
De-normalized D(max): 007F FFFFH = 0 00000000 11111111111111111111111B (E=0, F=11111111111111111111111B)
    D(max) = 0.1...1 × 2^-126 = (1 - 2^-23) × 2^-126 (≈1.1754942×10^-38)
Precision: Double
De-normalized D(min): 0000 0000 0000 0001H
    D(min) = 0.0...1 × 2^-1022 = 1×2^-52 × 2^-1022 = 2^-1074 (≈4.9×10^-324)
De-normalized D(max): 000F FFFF FFFF FFFFH
    D(max) = 0.1...1 × 2^-1022 = (1 - 2^-52) × 2^-1022 (≈2.2250738585072009×10^-308)
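These limits are available as constants in Java, so the bit patterns can be inspected directly. The sketch below uses only standard library members (Float.MIN_NORMAL, Float.MAX_VALUE, Float.MIN_VALUE, Math.nextDown and their double counterparts); the class FloatLimits and the helpers hex32 and hex64 are names invented for this illustration. The largest de-normalized number is obtained as the representable value just below the smallest normalized one.

```java
class FloatLimits {
    static String hex32(float v) {
        return String.format("%08X", Float.floatToIntBits(v));
    }

    static String hex64(double v) {
        return String.format("%016X", Double.doubleToLongBits(v));
    }

    public static void main(String[] args) {
        System.out.println("N(min) single: " + hex32(Float.MIN_NORMAL));                 // 00800000
        System.out.println("N(max) single: " + hex32(Float.MAX_VALUE));                  // 7F7FFFFF
        System.out.println("D(min) single: " + hex32(Float.MIN_VALUE));                  // 00000001
        System.out.println("D(max) single: " + hex32(Math.nextDown(Float.MIN_NORMAL)));  // 007FFFFF
        System.out.println("N(min) double: " + hex64(Double.MIN_NORMAL));   // 0010000000000000
        System.out.println("N(max) double: " + hex64(Double.MAX_VALUE));    // 7FEFFFFFFFFFFFFF
        System.out.println("D(min) double: " + hex64(Double.MIN_VALUE));    // 0000000000000001
        System.out.println("D(max) double: " + hex64(Math.nextDown(Double.MIN_NORMAL)));
    }
}
```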
Special Values
Zero: Zero cannot be represented in the normalized form, and must be represented in de-normalized form with E=0 and
F=0. There are two representations for zero: +0 with S=0 and -0 with S=1.
Infinity: The values of +infinity (e.g., 1/0) and -infinity (e.g., -1/0) are represented with an exponent of all 1's (E=255 for
single-precision and E=2047 for double-precision), F=0, and S=0 for +INF and S=1 for -INF.
Not a Number (NaN): NaN denotes a value that cannot be represented as a real number (e.g., 0/0). NaN is represented with
an exponent of all 1's (E=255 for single-precision and E=2047 for double-precision) and any non-zero fraction.
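The special values can be produced and inspected in Java with floating-point division. A small sketch follows; SpecialValues and hex32 are illustrative names, and Float.floatToIntBits is the standard call that returns the bit pattern (it reports every NaN as the single canonical pattern 7FC00000H).

```java
class SpecialValues {
    static String hex32(float v) {
        return String.format("%08X", Float.floatToIntBits(v));
    }

    public static void main(String[] args) {
        float posInf = 1.0f / 0.0f;
        float negInf = -1.0f / 0.0f;
        float nan = 0.0f / 0.0f;
        System.out.println(posInf + " " + hex32(posInf));    // Infinity 7F800000
        System.out.println(negInf + " " + hex32(negInf));    // -Infinity FF800000
        System.out.println(nan + " " + hex32(nan));          // NaN 7FC00000
        System.out.println(hex32(0.0f) + " " + hex32(-0.0f)); // 00000000 80000000
        System.out.println(0.0f == -0.0f);                   // true: the two zeros compare equal
        System.out.println(Float.isNaN(nan));                // true
    }
}
```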
5. Character Encoding
In computer memory, characters are "encoded" (or "represented") using a chosen "character encoding scheme" (aka
"character set", "charset", "character map", or "code page").
For example, in ASCII (as well as Latin-1, Unicode, and many other character sets):
code numbers 65D (41H) to 90D (5AH) represent 'A' to 'Z', respectively.
code numbers 97D (61H) to 122D (7AH) represent 'a' to 'z', respectively.
code numbers 48D (30H) to 57D (39H) represent '0' to '9', respectively.
It is important to note that the representation scheme must be known before a binary pattern can be interpreted. E.g., the
8-bit pattern "0100 0010B" could represent anything under the sun, known only to the person who encoded it.
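In Java, a char is a 16-bit code number, so the ranges listed above can be printed with a cast. The sketch below also reads the pattern 0100 0010B twice, once as an integer and once as a character, to show that the interpretation is supplied by the program and not by the bits. CharCodes and describe are names made up for this illustration.

```java
class CharCodes {
    // Format a character with its decimal and hexadecimal code numbers.
    static String describe(char c) {
        return String.format("'%c' = %dD (%02XH)", c, (int) c, (int) c);
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'Z', 'a', 'z', '0', '9'}) {
            System.out.println(describe(c));
        }
        int pattern = 0b01000010;              // the same 8 bits, two interpretations
        System.out.println(pattern);           // 66
        System.out.println((char) pattern);    // B
    }
}
```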
The most commonly-used character encoding schemes are: 7-bit ASCII (ISO/IEC 646) and 8-bit Latin-x (ISO/IEC 8859-x) for
Western European characters, and Unicode (ISO/IEC 10646) for internationalization (i18n).
A 7-bit encoding scheme (such as ASCII) can represent 128 characters and symbols. An 8-bit character encoding scheme
(such as Latin-x) can represent 256 characters and symbols; whereas a 16-bit encoding scheme (such as Unicode UCS-2)
can represent 65,536 characters and symbols.
In programming languages such as C/C++/Java, line-feed (0AH) is denoted as '\n', carriage-return (0DH) as '\r',
tab (09H) as '\t'.
DEC  HEX  Code  Meaning
0    00   NUL   Null
1    01   SOH   Start of Heading
2    02   STX   Start of Text
3    03   ETX   End of Text
4    04   EOT   End of Transmission
5    05   ENQ   Enquiry
6    06   ACK   Acknowledgment
7    07   BEL   Bell
8    08   BS    Back Space '\b'
9    09   HT    Horizontal Tab '\t'
10   0A   LF    Line Feed '\n'
11   0B   VT    Vertical Tab
12   0C   FF    Form Feed '\f'
13   0D   CR    Carriage Return '\r'
14   0E   SO    Shift Out
15   0F   SI    Shift In
16   10   DLE   Datalink Escape
17   11   DC1   Device Control 1
18   12   DC2   Device Control 2
19   13   DC3   Device Control 3
20   14   DC4   Device Control 4
21   15   NAK   Negative Ack.
22   16   SYN   Sync. Idle
23   17   ETB   End of Transmission Block
24   18   CAN   Cancel
25   19   EM    End of Medium
26   1A   SUB   Substitute
27   1B   ESC   Escape
28   1C   IS4   File Separator
29   1D   IS3   Group Separator
30   1E   IS2   Record Separator
31   1F   IS1   Unit Separator
127  7F   DEL   Delete
ISO/IEC 8859 has 16 parts. Besides the most commonly-used Part 1, Part 2 is meant for Central European (Polish, Czech,
Hungarian, etc), Part 3 for South European (Turkish, etc), Part 4 for North European (Estonian, Latvian, etc), Part 5 for
Cyrillic, Part 6 for Arabic, Part 7 for Greek, Part 8 for Hebrew, Part 9 for Turkish, Part 10 for Nordic, Part 11 for Thai, Part 12
was abandoned, Part 13 for Baltic Rim, Part 14 for Celtic, Part 15 for French, Finnish, etc, and Part 16 for South-Eastern European.
EBCDIC (Extended Binary Coded Decimal Interchange Code): Used in the early IBM computers.
UCS-4 (Universal Character Set - 4 Byte): Uses 4 bytes (32 bits), covering BMP and the supplementary characters.
Bits  Unicode                      UTF-8 Code                            Bytes
7     00000000 0xxxxxxx            0xxxxxxx                              1 (ASCII)
11    00000yyy yyxxxxxx            110yyyyy 10xxxxxx                     2
16    zzzzyyyy yyxxxxxx            1110zzzz 10yyyyyy 10xxxxxx            3
21    000uuuuu zzzzyyyy yyxxxxxx   11110uuu 10uuzzzz 10yyyyyy 10xxxxxx   4
In UTF-8, Unicode numbers corresponding to the 7-bit ASCII characters are padded with a leading zero; thus they have the same
value as ASCII. Hence, UTF-8 can be used with all software using ASCII. Unicode numbers of 128 and above, which are less
frequently used, are encoded using more bytes (2-4 bytes). UTF-8 generally requires less storage and is compatible with
ASCII. The drawback of UTF-8 is that more processing power is needed to unpack the code due to its variable length. UTF-8 is the
most popular format for Unicode.
Notes:
UTF-8 uses 1-3 bytes for the characters in BMP (16-bit), and 4 bytes for supplementary characters outside BMP (21-bit).
The 128 ASCII characters (basic Latin letters, digits, and punctuation signs) use one byte. Most European and Middle
East characters use a 2-byte sequence, which includes extended Latin letters (with tilde, macron, acute, grave and other
accents), Greek, Armenian, Hebrew, Arabic, and others. Chinese, Japanese and Korean (CJK) use three-byte sequences.
All the bytes, except the 128 ASCII characters, have a leading '1' bit. In other words, the ASCII bytes, with a leading
'0' bit, can be identified and decoded easily.
Example: 您好 (Unicode: 60A8H 597DH)
Unicode (UCS-2) is 60A8H = 0110 0000 1010 1000B
=> UTF-8 is 11100110 10000010 10101000B = E6 82 A8H
Unicode (UCS-2) is 597DH = 0101 1001 0111 1101B
=> UTF-8 is 11100101 10100101 10111101B = E5 A5 BDH
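The bit layout of UTF-8 can be applied by hand with shifts and masks. The following is a minimal sketch under that layout only: Utf8Encoder, encode and hex are hypothetical names, surrogate code points are not rejected, and the last line of main cross-checks the two characters of the example against the standard String.getBytes with StandardCharsets.UTF_8.

```java
import java.nio.charset.StandardCharsets;

class Utf8Encoder {
    // Encode one Unicode code number (0 to 10FFFFH) into 1 to 4 UTF-8 bytes.
    static byte[] encode(int cp) {
        if (cp < 0x80) {            // up to 7 bits: 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {           // up to 11 bits: 110yyyyy 10xxxxxx
            return new byte[] {(byte) (0xC0 | (cp >> 6)), (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {         // up to 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
            return new byte[] {(byte) (0xE0 | (cp >> 12)),
                               (byte) (0x80 | ((cp >> 6) & 0x3F)),
                               (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {(byte) (0xF0 | (cp >> 18)),      // up to 21 bits: 4 bytes
                           (byte) (0x80 | ((cp >> 12) & 0x3F)),
                           (byte) (0x80 | ((cp >> 6) & 0x3F)),
                           (byte) (0x80 | (cp & 0x3F))};
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0x60A8)));   // E682A8
        System.out.println(hex(encode(0x597D)));   // E5A5BD
        System.out.println(hex("\u60A8\u597D".getBytes(StandardCharsets.UTF_8)));  // E682A8E5A5BD
    }
}
```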
Unicode                                    UTF-16 Code                                              Bytes
xxxxxxxx xxxxxxxx                          Same as UCS-2 - no encoding                              2
000uuuuu zzzzyyyy yyxxxxxx (uuuuu ≠ 0)     110110ww wwzzzzyy 110111yy yyxxxxxx (wwww = uuuuu - 1)   4
Take note that for the 65536 characters in BMP, UTF-16 is the same as UCS-2 (2 bytes). However, 4 bytes are used for
the supplementary characters outside the BMP: each such character requires a pair of 16-bit
values, the first from the high-surrogates range (\uD800-\uDBFF), the second from the low-surrogates range
(\uDC00-\uDFFF).
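The split into a surrogate pair is a subtraction followed by two 10-bit slices. A minimal sketch is given below; SurrogatePair and toSurrogates are illustrative names, the code number 1F600H is merely an arbitrary supplementary character picked for the demonstration, and Character.toChars is the standard library method that performs the same split.

```java
class SurrogatePair {
    // Split a supplementary code number (10000H to 10FFFFH) into its UTF-16 surrogate pair.
    static char[] toSurrogates(int cp) {
        int v = cp - 0x10000;                        // 20 bits remain after the subtraction
        char high = (char) (0xD800 | (v >> 10));     // 110110 followed by the top 10 bits
        char low = (char) (0xDC00 | (v & 0x3FF));    // 110111 followed by the low 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        int cp = 0x1F600;                            // example value, chosen arbitrarily
        char[] pair = toSurrogates(cp);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);   // D83D DE00
        char[] lib = Character.toChars(cp);          // library equivalent
        System.out.printf("%04X %04X%n", (int) lib[0], (int) lib[1]);     // D83D DE00
    }
}
```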
BOM (Byte Order Mark): BOM is a special Unicode character having code number of FEFFH, which is used to
differentiate big-endian and little-endian. For big-endian, BOM appears as FE FFH in the storage. For little-endian, BOM
appears as FF FEH. Unicode reserves these two code numbers to prevent it from clashing with another character.
Unicode text files could take on these formats:
Big Endian: UCS-2BE, UTF-16BE, UTF-32BE.
Little Endian: UCS-2LE, UTF-16LE, UTF-32LE.
UTF-16 with BOM. The first character of the file is a BOM character, which specifies the endianness. For big-endian, BOM
appears as FE FFH in the storage. For little-endian, BOM appears as FF FEH.
A UTF-8 file is a sequence of single bytes in a fixed order, so byte order is not an issue and the BOM plays no part in it. However, in some systems (in particular Windows), a BOM is
added as the first character in the UTF-8 file as the signature to identify the file as UTF-8 encoded. The BOM character
(FEFFH) is encoded in UTF-8 as EF BB BF. Adding a BOM as the first character of the file is not recommended, as it may be
incorrectly interpreted in other systems. You can have a UTF-8 file without BOM.
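The byte sequences of the BOM under each encoding can be printed with the standard charsets. This is a small sketch; BomDemo and hex are made-up names. When encoding, Java's "UTF-16" charset writes a big-endian BOM in front of the data, whereas UTF-16BE, UTF-16LE and UTF-8 write none unless the string itself contains the BOM character.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF";
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16BE)));  // FEFF
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16LE)));  // FFFE
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_8)));     // EFBBBF
        // The "UTF-16" charset adds a big-endian BOM before the encoded text:
        System.out.println(hex("Hi".getBytes(StandardCharsets.UTF_16)));   // FEFF00480069
    }
}
```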
             Standard   Characters   Codes
Simplified   GB2312     和谐         BACD D0B3
             UCS-2      和谐         548C 8C10
             UTF-8      和谐         E5928C E8B090
Traditional  BIG5       和諧         A94D BFD3
             UCS-2      和諧         548C 8AE7
             UTF-8      和諧         E5928C E8ABA7
Notes for Windows' CMD Users: To display Chinese characters correctly in the CMD shell, you need to choose the
correct codepage, e.g., 65001 for UTF-8, 936 for GB2312/GBK, 950 for Big5, 1201 for UCS-2BE, 1200 for UCS-2LE, 437 for the
original DOS. You can use the command "chcp" to display the current code page and the command "chcp codepage_number" to
change the codepage. You also have to choose a font that can display the characters (e.g., Courier New, Consolas or Lucida
Console, NOT Raster font).
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class TestCharsetEncodeDecode {
   public static void main(String[] args) {
      // Try these charsets for encoding
      String[] charsetNames = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16",
                               "UTF-16BE", "UTF-16LE", "GBK", "BIG5"};
      String message = "Hi,您好!";  // message with non-ASCII characters
      // Print UCS-2 in hex codes
      System.out.printf("%10s: ", "UCS-2");
      for (int i = 0; i < message.length(); i++) {
         System.out.printf("%04X ", (int)message.charAt(i));
      }
      System.out.println();
      for (String charsetName : charsetNames) {
         // Get a Charset instance given the charset name string
         Charset charset = Charset.forName(charsetName);
         System.out.printf("%10s: ", charset.name());
         // Encode the Unicode UCS-2 characters into a byte sequence in this charset.
         ByteBuffer bb = charset.encode(message);
         while (bb.hasRemaining()) {
            System.out.printf("%02X ", bb.get());  // Print hex code
         }
         System.out.println();
         bb.rewind();
      }
   }
}
     UCS-2: 0048 0069 002C 60A8 597D 0021                [16-bit fixed-length]
            H    i    ,    您   好   !
  US-ASCII: 48 69 2C 3F 3F 21                            [8-bit fixed-length]
            H  i  ,  ?  ?  !
ISO-8859-1: 48 69 2C 3F 3F 21                            [8-bit fixed-length]
            H  i  ,  ?  ?  !
     UTF-8: 48 69 2C E6 82 A8 E5 A5 BD 21                [1-4 bytes variable-length]
            H  i  ,  您       好       !
    UTF-16: FE FF 00 48 00 69 00 2C 60 A8 59 7D 00 21    [2-4 bytes variable-length]
            BOM   H     i     ,     您    好    !        [Byte-Order-Mark indicates Big-Endian]
  UTF-16BE: 00 48 00 69 00 2C 60 A8 59 7D 00 21          [2-4 bytes variable-length]
            H     i     ,     您    好    !
  UTF-16LE: 48 00 69 00 2C 00 A8 60 7D 59 21 00          [2-4 bytes variable-length]
            H     i     ,     您    好    !
       GBK: 48 69 2C C4 FA BA C3 21                      [1-2 bytes variable-length]
            H  i  ,  您    好    !
      Big5: 48 69 2C B1 7A A6 6E 21                      [1-2 bytes variable-length]
            H  i  ,  您    好    !
public class PrintHexCode {
   public static void main(String[] args) {
      int i = 12345;
      System.out.println("Decimal is " + i);                         // 12345
      System.out.println("Hex is " + Integer.toHexString(i));        // 3039
      System.out.println("Binary is " + Integer.toBinaryString(i));  // 11000000111001
      System.out.println("Octal is " + Integer.toOctalString(i));    // 30071
      System.out.printf("Hex is %x\n", i);     // 3039
      System.out.printf("Octal is %o\n", i);   // 30071
      char c = 'a';
      System.out.println("Character is " + c);          // a
      System.out.printf("Character is %c\n", c);        // a
      System.out.printf("Hex is %x\n", (short)c);       // 61
      System.out.printf("Decimal is %d\n", (short)c);   // 97
      float f = 3.5f;
      System.out.println("Decimal is " + f);      // 3.5
      System.out.println(Float.toHexString(f));   // 0x1.cp1 (Fraction=1.c, Exponent=1)
      f = 0.75f;
      System.out.println("Decimal is " + f);      // 0.75
      System.out.println(Float.toHexString(f));   // 0x1.8p-1 (F=1.8, E=-1)
      double d = 11.22;
      System.out.println("Decimal is " + d);      // 11.22
      System.out.println(Double.toHexString(d));  // 0x1.670a3d70a3d71p3 (F=1.670a3d70a3d71, E=3)
   }
}
In Eclipse, you can view the hex code for integer primitive Java variables in debug mode as follows: In debug perspective,
"Variable" panel => Select the "menu" (inverted triangle) => Java => Java Preferences... => Primitive Display Options => Check
"Display hexadecimal values (byte, short, char, int, long)".
Feedback, comments, corrections, and errata can be sent to Chua Hock-Chuan (ehchua@ntu.edu.sg)