J02a-JavaCharsStrings

Java Characters and Strings

Object Oriented Programming

Version 1.2.0
© Marco Torchiano, 2023
Primitive types
Type     Size     Encoding
boolean  1 bit    -
char     16 bits  Unicode UTF-16
byte     8 bits   Signed integer, two's complement
short    16 bits  Signed integer, two's complement
int      32 bits  Signed integer, two's complement
long     64 bits  Signed integer, two's complement
float    32 bits  IEEE 754 single precision
double   64 bits  IEEE 754 double precision
void     -
Literals
§ Literal values for chars and strings
follow the C syntax
¨ Non-printable characters are introduced by
a '\' backslash
¨ Chars
'a' '%' '\n'
¨ Strings
"prova" "prova\n"

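As a small illustration of the literal syntax above, the following sketch declares a few char and string literals and prints some of their properties. The class and constant names (`LiteralsDemo`, `PLAIN`, `WITH_NEWLINE`) are chosen here for the example and are not part of the slides.

```java
class LiteralsDemo {
    // The two string literals shown on the slide
    static final String PLAIN = "prova";
    static final String WITH_NEWLINE = "prova\n";

    public static void main(String[] args) {
        char letter = 'a';       // plain character literal
        char percent = '%';
        char newline = '\n';     // escape sequence for line feed
        char quote = '\'';       // a single quote must itself be escaped
        char eGrave = '\u00E8';  // Unicode escape, here the letter e with grave accent

        System.out.println(letter + " " + percent + " " + quote + " " + eGrave);
        System.out.println((int) newline);          // prints 10
        System.out.println(PLAIN.length());         // prints 5
        System.out.println(WITH_NEWLINE.length());  // prints 6: "\n" is one character
    }
}
```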
Characters and Strings
§ Characters
¨ Primitive
type char
¨ Wrapper class Character

§ String
¨ No primitive representation!
¨ Class String
¨ Class StringBuffer
- and class StringBuilder
CHARACTERS
Wrapper Character
§ Encapsulates a single character
¨ Immutable (like all wrapper classes)
§ Utility methods for the char category
¨ isLetter()
¨ isDigit()
¨ isSpaceChar()
§ Utility methods for conversions
¨ toUpperCase()
¨ toLowerCase()
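A minimal sketch of how these utilities might be used. Note that in `java.lang.Character` they are static methods taking the char as an argument; the helper `describe` and the class name are assumptions introduced only for this example.

```java
class CharacterDemo {
    // Classifies a char using the static utility methods of Character.
    static String describe(char c) {
        if (Character.isLetter(c)) return "letter";
        if (Character.isDigit(c)) return "digit";
        if (Character.isSpaceChar(c)) return "space";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe('a'));               // letter
        System.out.println(describe('7'));               // digit
        System.out.println(describe(' '));               // space
        System.out.println(describe('%'));               // other
        System.out.println(Character.toUpperCase('a'));  // A
        System.out.println(Character.toLowerCase('Q'));  // q

        Character boxed = 'x';   // autoboxing into the immutable wrapper
        char unboxed = boxed;    // unboxing back to the primitive
        System.out.println(unboxed);
    }
}
```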

Unicode
§ Standard that assigns a unique code
to every character in any language
¨ Core specification gives the general
principles
¨ Code charts show representative glyphs
for all the Unicode characters.
¨ Annexes supply detailed normative
information
¨ Character Database normative and
informative data for implementers
http://www.unicode.org/versions/latest/
Characters and Glyphs
§ Character: the abstract concept
¨ e.g. LATIN SMALL LETTER I
§ Glyph: the graphical representation of
a character

¨ e.g. the letter i drawn in different typefaces
§ Font: a collection of glyphs
Unicode Codepoint
§ Codepoint: the numeric
representation of a character
¨ Included in the range 0 to 10FFFF (hexadecimal),
i.e. 21 bits
¨ Represented with U+ followed by the
hexadecimal code
¨ e.g. U+0069 for 'i'
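The notation above can be reproduced in Java with a short sketch like the following. The helper `notation` is a name made up for this example; the other calls are standard `String` and `Character` methods.

```java
class CodePointDemo {
    // Formats a code point in the usual U+XXXX notation.
    static String notation(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        int cp = "i".codePointAt(0);
        System.out.println(notation(cp));            // U+0069
        System.out.println(Character.getName(cp));   // LATIN SMALL LETTER I

        // The valid range ends at 10FFFF (hexadecimal)
        System.out.println(Character.isValidCodePoint(0x10FFFF));  // true
        System.out.println(Character.isValidCodePoint(0x110000));  // false
    }
}
```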
Unicode Encoding
Mapping between byte sequences and
code points.
§ UTF-32 fixed width, 32 bits per char
¨ at most 21 bits used: wasteful memory occupation
§ UTF-16 variable width, represents
¨ codepoints from `U+0000` to `U+D7FF` and from
`U+E000` to `U+FFFF` on 16 bits (2 bytes)
¨ codepoints from `U+10000` to
`U+10FFFF` on 32 bits (4 bytes)
Unicode Encoding
§ UTF-8 variable width,
¨ codepoints `U+0000` to `U+007F` are
mapped directly to bytes,
- i.e. ASCII transparent
¨ High bit (0x80) marks multi-byte sequences
¨ Most non-ideographic codepoints are
represented on 1 or 2 bytes
- e.g. `U+00E8` representing character ‘è’ is
mapped to two bytes: `0xC3` `0xA8`.
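To make the variable widths concrete, here is a small sketch that prints the UTF-8 and UTF-16 (big-endian, no byte-order mark) bytes of three sample strings. The `hex` helper and the choice of U+1F600 as a code point above U+FFFF are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

class EncodingWidthDemo {
    // Renders bytes as space-separated uppercase hex, e.g. "C3 A8".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = "i";                                     // U+0069
        String eGrave = "\u00E8";                               // U+00E8
        String emoji = new String(Character.toChars(0x1F600));  // above U+FFFF

        for (String s : new String[] {ascii, eGrave, emoji}) {
            System.out.println(
                hex(s.getBytes(StandardCharsets.UTF_8)) + " | "
                + hex(s.getBytes(StandardCharsets.UTF_16BE)));
        }
        // A code point above U+FFFF occupies two char values (a surrogate pair)
        System.out.println(emoji.length());                           // prints 2
        System.out.println(emoji.codePointCount(0, emoji.length()));  // prints 1
    }
}
```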
Character set
§ Class Charset allows handling
different charsets
§ A few static methods
¨ defaultCharset()
¨ forName(..)
- Returns the corresponding charset
¨ availableCharsets()
- Returns a map of all charsets by name
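A brief sketch of the three static methods listed above. The helper `canonicalName` is introduced only for this example; the output of `defaultCharset()` depends on the platform and JVM settings, so no value is shown for it.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

class CharsetDemo {
    // Looks up a charset by name (case-insensitive) and returns its canonical name.
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("default: " + def.name());     // platform dependent

        System.out.println(canonicalName("iso-8859-1"));  // ISO-8859-1

        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.size() + " charsets available");
        System.out.println(all.containsKey("UTF-8"));     // true
    }
}
```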
Predefined charsets
§ US-ASCII
¨ 7-bit ASCII, a.k.a. ISO646-US
§ ISO-8859-1
¨ 8-bit single byte ISO Latin No. 1, a.k.a. ISO-LATIN-1
§ UTF-8
¨ 8-bit multi byte UCS Transformation Format
§ UTF-16BE
¨ 16-bit UCS Transformation Fmt., big-endian
§ UTF-16LE
¨ 16-bit UCS Transformation Fmt., little-endian
§ UTF-16
¨ 16-bit UCS Transformation Fmt., w/byte-order mark
Encoding and Decoding
§ Convenience methods
¨ CharBuffer decode(ByteBuffer)
¨ ByteBuffer encode(CharBuffer)

§ Generation of codecs
¨ newDecoder()
¨ newEncoder()
- Warning: decoder and encoder have an internal state
- e.g. awaiting next byte of a multi-byte representation
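The convenience methods can be exercised with a round trip such as the one sketched below. `roundTrip` is a hypothetical helper written for this example; it relies only on `Charset.encode(CharBuffer)` and `Charset.decode(ByteBuffer)`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDecodeDemo {
    // Encodes text to bytes and decodes it back with the same charset.
    static String roundTrip(String text, Charset cs) {
        ByteBuffer bytes = cs.encode(CharBuffer.wrap(text));
        CharBuffer chars = cs.decode(bytes);
        return chars.toString();
    }

    public static void main(String[] args) {
        String text = "perch\u00E8";  // ends with the letter e with grave accent

        ByteBuffer utf8 = StandardCharsets.UTF_8.encode(CharBuffer.wrap(text));
        System.out.println(utf8.remaining());  // prints 7: five ASCII bytes plus two

        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));  // true
        // US-ASCII cannot represent the last character: it is replaced by '?'
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));            // perch?
    }
}
```

The convenience methods silently replace unmappable or malformed input; when errors must be detected, an explicit encoder or decoder obtained from the charset is the usual alternative.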
Encoding mismatch
§ Using an encoding scheme to decode a
string encoded with a different scheme
§ E.g.
¨ Character ‘è’ has Unicode codepoint
`U+00E8` which is mapped in UTF-8 to two
bytes: `0xC3` `0xA8`, while ISO-8859-1
decoding interprets the above sequence as
the two characters ‘Ã¨’
¨ Vice versa, ‘è’ in ISO-8859-1 is represented
as 0xE8 which, on its own, is an invalid byte sequence in
UTF-8 (usually rendered as �)
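Both directions of the mismatch can be reproduced with the sketch below, which decodes bytes with the wrong charset on purpose. The helper name `decodeAs` is an assumption of this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // Decodes raw bytes using the given charset.
    static String decodeAs(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String eGrave = "\u00E8";

        // UTF-8 bytes (C3 A8) read as ISO-8859-1: two unrelated characters
        byte[] utf8 = eGrave.getBytes(StandardCharsets.UTF_8);
        String wrong = decodeAs(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());  // prints 2

        // ISO-8859-1 byte (E8) read as UTF-8: malformed input, replaced by U+FFFD
        byte[] latin1 = eGrave.getBytes(StandardCharsets.ISO_8859_1);
        String broken = decodeAs(latin1, StandardCharsets.UTF_8);
        System.out.println(broken.contains("\uFFFD"));  // true
    }
}
```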
STRINGS
String and StringBuffer
§ Class String
¨ Not modifiable / Immutable
§ Class StringBuffer / StringBuilder
¨ Modifiable / Mutable

String s = new String("literal");

StringBuffer sb = new StringBuffer("lit");

Operator +
§ It is used to concatenate 2 strings
"This is " + "a concatenation"
¨ Remember: strings are immutable,
therefore + creates a new string object
with the concatenation
§ Works also with other types
¨ Everything is automatically converted to a
string representation and concatenated
System.out.println("pi = " + 3.14);
System.out.println("x = " + x);
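A short sketch of concatenation with mixed types. The variable `x` and the helper `label` are hypothetical, introduced here because the slide does not define them; the last two lines show that + is evaluated left to right.

```java
class ConcatDemo {
    // Builds "name = value"; the value is converted to its string representation.
    static String label(String name, Object value) {
        return name + " = " + value;
    }

    public static void main(String[] args) {
        int x = 42;  // hypothetical value for the slide's x

        String a = "This is ";
        String b = a + "a concatenation";  // a new String; a is left unchanged
        System.out.println(a);             // This is
        System.out.println(b);             // This is a concatenation

        System.out.println(label("pi", 3.14));  // pi = 3.14
        System.out.println(label("x", x));      // x = 42

        System.out.println("sum = " + 1 + 2);    // sum = 12
        System.out.println("sum = " + (1 + 2));  // sum = 3
    }
}
```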

String methods
§ int length()
¨ returns string length
§ boolean equals(Object o)
¨ compares the contents of two strings
String h = "Hello";
String w = "World";
String hw = "Hello World";
String h_w = h + " " + w;
hw.equals(h_w) // -> true
hw == h_w // -> false

String methods
§ String toUpperCase()
¨ Converts string to upper case
§ String toLowerCase()
¨ Converts string to lower case
§ String concat(String str)
¨ Creates a concatenation with the given string
§ int compareTo(String str)
¨ Compares lexicographically to another string, returning
- < 0 : if this string precedes the other
- == 0 : if this string equals the other
- > 0 : if this string follows the other
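The methods above can be tried with a sketch like the following. The `order` helper is made up for this example and reduces the result of `compareTo` to its sign, since only the sign is meaningful.

```java
class CompareDemo {
    // Returns -1, 0 or 1 according to the sign of a.compareTo(b).
    static int order(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        System.out.println("Hello".toUpperCase());       // HELLO
        System.out.println("Hello".toLowerCase());       // hello
        System.out.println("Hello".concat(" World"));    // Hello World

        System.out.println(order("apple", "banana"));    // -1: precedes
        System.out.println(order("apple", "apple"));     // 0: equal
        System.out.println(order("banana", "apple"));    // 1: follows

        // Comparison is based on character codes: upper case comes before lower case
        System.out.println(order("Zebra", "apple"));     // -1
        System.out.println("Zebra".compareToIgnoreCase("apple") > 0);  // true
    }
}
```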

Method substring
§ String substring(int startIndex)
"Human".substring(2) → "man"
§ String substring(int start, int end)
¨ Char start included, end excluded
"Greatest".substring(0,5) → "Great"
§ int indexOf(String str)
¨ Returns the index of the first occurrence of str
§ int lastIndexOf(String str)
¨ Returns the index of the last occurrence of str
(the search starts from the end)
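A possible combination of these methods is sketched below. The `extension` helper and the sample file name are assumptions of this example, not part of the slides.

```java
class SubstringDemo {
    // Returns the text after the last dot, or "" if there is no dot.
    static String extension(String filename) {
        int dot = filename.lastIndexOf(".");
        return dot < 0 ? "" : filename.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println("Human".substring(2));        // man
        System.out.println("Greatest".substring(0, 5));  // Great

        System.out.println("banana".indexOf("an"));      // 1
        System.out.println("banana".lastIndexOf("an"));  // 3
        System.out.println("banana".indexOf("xyz"));     // -1: not found

        System.out.println(extension("archive.tar.gz")); // gz
    }
}
```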

String (static methods)
§ String valueOf(..)
¨ Converts any primitive type into a String
¨ Overloads defined for all primitive types

§ String format(String fmt, …)

¨ Builds a string using the format string
¨ Similar format as C printf()

Format essentials

%[arg_index$][flags][width][.prec]conversion

arg_index$  Argument position, starts at 1
width       Min width
.prec       Max width, or decimal digits for floats

Flag  Result
-     left justified
+     include sign
0     zero padding
(     negative in parentheses

Conversion  Type
b           boolean
s           string
d           integer
f           decimal
e           scientific
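The format elements above can be tried out with the sketch below. The helper `f` is introduced for this example; it fixes the locale to `Locale.ROOT` so that the decimal separator does not depend on the machine settings.

```java
import java.util.Locale;

class FormatDemo {
    // Formats with a fixed locale so the decimal separator is always '.'.
    static String f(String fmt, Object... args) {
        return String.format(Locale.ROOT, fmt, args);
    }

    public static void main(String[] args) {
        System.out.println(f("%-6s|", "ab"));       // "ab    |"  left justified, min width 6
        System.out.println(f("%+d", 5));            // "+5"       include sign
        System.out.println(f("%05d", 42));          // "00042"    zero padding
        System.out.println(f("%(d", -7));           // "(7)"      negative in parentheses
        System.out.println(f("%8.3f", 3.14159));    // "   3.142" width 8, 3 decimal digits
        System.out.println(f("%.2e", 12345.678));   // "1.23e+04" scientific
        System.out.println(f("%.3s", "Greatest"));  // "Gre"      max width for strings
        System.out.println(f("%b", true));          // "true"
        System.out.println(f("%2$s %1$s", "World", "Hello"));  // "Hello World"
    }
}
```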

StringBuffer
§ Represents a string of characters
§ It is mutable and allows operations that
modify the content
§ Can be converted to the
corresponding String using the
method toString()
StringBuffer
§ append(String str)
¨ Inserts str at the end of string
§ insert(int offset, String str)
¨ Inserts str starting from offset position
§ delete(int start, int end)
¨ Deletes characters from start (included) to end (excluded)
§ reverse()
¨ Reverses the sequence of characters

They all return a StringBuffer enabling chaining
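Chaining could look like the sketch below, where each call modifies the same buffer and returns it. The class and method names are chosen for this example; the comments show the buffer content after each step.

```java
class ChainingDemo {
    // Applies the four operations in a single chained expression.
    static String build() {
        StringBuffer sb = new StringBuffer("lit");
        sb.append("eral")      // "literal"
          .insert(0, "a ")     // "a literal"
          .delete(0, 2)        // "literal"
          .reverse();          // "laretil"
        return sb.toString();  // conversion to an immutable String
    }

    public static void main(String[] args) {
        System.out.println(build());  // laretil
    }
}
```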

Class StringBuilder
§ Method-level compatible with
StringBuffer
§ Not thread-safe and not reentrant
§ More efficient: ~30% faster
Performance issues
Appending N = 100k integers:

String s = "";
for(int i=0;i<N;++i){
    s += i;
}
→ 2 sec

StringBuffer sb = new StringBuffer();
for(int i=0;i<N;++i){
    sb.append(i);
}
→ 2.9 ms

StringBuilder sb = new StringBuilder();
for(int i=0;i<N;++i){
    sb.append(i);
}
→ 2.2 ms

Three orders of magnitude difference!
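A rough way to observe the gap is sketched below. This is not a rigorous benchmark (no warm-up, single run), the method names are made up for the example, and N is reduced to 20,000 as an assumption to keep the run short; absolute timings will differ from the figures above depending on JVM and hardware.

```java
class ConcatBenchmark {
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; ++i) {
            s += i;        // allocates a new String at every iteration
        }
        return s;
    }

    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; ++i) {
            sb.append(i);  // appends in place
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;  // assumption: smaller than the 100k used on the slide
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();

        System.out.println("String +=     : " + (t1 - t0) / 1_000_000.0 + " ms");
        System.out.println("StringBuilder : " + (t2 - t1) / 1_000_000.0 + " ms");
        System.out.println(a.equals(b));  // both approaches build the same content
    }
}
```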

String pooling
§ Class String maintains a private static
pool of distinct strings
§ Method intern()
¨ Checks if any string in the pool equals()
¨ If not, adds the string to the pool
¨ Returns the string in the pool
§ Each string literal is automatically
interned, as if by calling intern(), to keep
a single copy of the string
String internalization
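public static void main(String[] args){
    char[] chars = {'H','i'};
    String s1 = new String(chars);  // two distinct objects,
    String s2 = new String(chars);  // both containing "Hi"
    String i1 = s1.intern();        // the pooled "Hi"
    String i2 = s2.intern();        // the same pooled object as i1
}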

String internalization: step by step
§ new String(chars) creates two distinct String objects,
s1 and s2, both containing "Hi"
§ i1 = s1.intern()
¨ adds the string to the pool, since none equal already exists
§ i2 = s2.intern()
¨ returns the already existing equal string
Internalizing literals
String ss1 = "Hi";
¨ Behaves as if written:
String ss1 = (new String(
new char[]{'H', 'i'})
).intern();
¨ On the first occurrence of a literal
- the string is created and
- added to the pool
¨ Upon later occurrences of the same literal
- the reference to the string already in the pool is returned
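The effect of pooling on reference comparison can be checked with the sketch below. The helper `samePooled` is hypothetical; only comparisons whose outcome is guaranteed by the behaviour of intern() are printed.

```java
class InternDemo {
    // True when both strings map to the same object in the pool.
    static boolean samePooled(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        char[] chars = {'H', 'i'};
        String s1 = new String(chars);
        String s2 = new String(chars);
        String i1 = s1.intern();
        String i2 = s2.intern();
        String literal = "Hi";  // literals are interned automatically

        System.out.println(s1 == s2);       // false: two distinct objects
        System.out.println(s1.equals(s2));  // true: same content
        System.out.println(i1 == i2);       // true: same pooled object
        System.out.println(i1 == literal);  // true: the literal is the pooled object
    }
}
```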
Wrap-up
§ Java characters are stored as 16-bit Unicode (UTF-16)
§ Conversion to/from streams of bytes is
managed by Charset objects
§ String is an immutable representation of
strings
§ StringBuffer and StringBuilder are mutable
¨ Significantly more efficient for string
manipulation

References
§ Unicode specification
¨ http://www.unicode.org/versions/latest/

§ Standard ECMA-94 “8-Bit Single Byte
Coded Graphic Character Sets - Latin
Alphabets No. 1 to No. 4”
¨ https://www.ecma-international.org/publications/standards/Ecma-094.htm
