Preparation Material - CS - Java
Table of contents
Data Structures
Binary Tree
Red Black Tree
B Tree
B+ Tree
Questions
Java Collections
Lists
ArrayList
LinkedList
Summary
Map
HashMap
TreeMap
Final Thoughts
Thread Safety
Graphs
Algorithms
Dijkstra
Kruskal
Prim
DFS (Depth First Search)
BFS (Breadth First Search)
Different techniques for smart List iteration
Find a cycle in a Singly Linked List
Find the nth-before-last element in a Singly Linked List
Find the element in the middle in a Singly Linked List
Sorting algorithms
QuickSort
MergeSort
Concurrency
Processes and Threads
Processes
Threads
Defining and starting a Thread
Synchronization
Intrinsic Locks
High Level Concurrency Objects
Non Blocking Algorithm
Immutable Objects
Garbage Collector
Generalities
Data Structures
Binary Tree
A binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. Trees are typically sorted (for every node, children on the left are smaller or equal, and children on the right are bigger). This is called a Sorted Binary Tree or Binary Search Tree.

When a Tree is balanced (leaf nodes have the minimum possible depth), the order of searching a node is O(log₂ n). However, if the Tree is not balanced, it can be O(n). It's possible to keep the tree balanced by performing at most 2 rotations when inserting a node (Self-balancing Binary Search Tree).

There are multiple Tree implementations. The simplest way is to store Nodes as instances, but it's possible to build it as an Array, for better memory usage.
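To make the node-based representation concrete, here is a minimal sketch of an unbalanced Binary Search Tree (the class and method names are illustrative, not from any standard library):

public class BinarySearchTree {

    private static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    public void insert(int value) {
        root = insert(root, value);
    }

    private Node insert(Node node, int value) {
        if (node == null) return new Node(value);
        if (value <= node.value) node.left = insert(node.left, value);  // smaller or equal: left
        else node.right = insert(node.right, value);                    // bigger: right
        return node;
    }

    public boolean contains(int value) {
        Node current = root;
        while (current != null) {
            if (value == current.value) return true;
            current = (value < current.value) ? current.left : current.right;
        }
        return false; // O(log n) when balanced, O(n) worst case
    }
}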
B-Tree
A B-Tree is a variation of the Binary Tree where internal (non-leaf) nodes can have a variable number of child nodes within some predefined range, to reduce tree depth. It's typically used by file systems.
It provides 2 main advantages over binary trees:
● Binary search requires multiple comparisons (e.g., performing a binary search over 1000 elements will require on average 9 comparisons to locate an element). By grouping elements together, B-Trees allow faster read operations from disk.
● B-Trees don't require such frequent balancing operations.
B+ Tree
A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves. The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context; in particular, filesystems.
Graph
A graph is a representation of a set of objects where some pairs of objects are connected by links. Links might be directional (Directed Graph or Digraph).

For storing Directed Graphs, there's a tricky implementation which allows having a list of the links going in and out of each node without duplicating information. Roughly, it uses 2 linked lists in each node (links going in and links going out). Each link is referenced by 2 parents and can have 2 children (the next in and out elements). You can iterate through the in or out linked lists to traverse all the links going in and out of each node.
Questions
● What's the order of inserting an element in a sorted array?
● What is a Binary Search Tree?
● What's a B Tree?
● What's the order of complexity of inserting an item in a Binary Tree?
● What's the difference between a Binary Tree and a B Tree?
Java Collections
The following image summarizes the most important interfaces (there are more) in the java.util package:
Note: Thread safe interfaces (named Blocking___ or Concurrent___) are part of the java.util.concurrent package.
A Deque is a linear collection that supports element insertion and removal at both ends. The name "deque" is short for "double ended queue" and is usually pronounced "deck".
The basic interfaces have an abstract implementation:
Lists
There are several implementations of the List interface. The most important ones are: ArrayList, LinkedList, Vector, and Stack.
Notice that Stack extends Vector. This has been highly questioned; most people agree these are different concepts. Stack should have been an interface, similar to List, Set, or Map. It's recommended to use Deque instead.
ArrayList
ArrayList offers constant-time positional access and is just plain fast. It does not have to allocate a node object for each element in the List, and it can take advantage of System.arraycopy when it has to move multiple elements at the same time. Think of ArrayList as Vector without the synchronization overhead.

ArrayList has one tuning parameter: the initial capacity, which refers to the number of elements the ArrayList can hold before it has to grow.
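As a quick illustration, pre-sizing avoids the intermediate grow-and-copy steps when the final size is known in advance (a sketch; the exact growth factor is an implementation detail):

import java.util.ArrayList;
import java.util.List;

public class CapacityDemo {
    public static void main(String[] args) {
        // Known size up front: a single backing-array allocation.
        List<Integer> sized = new ArrayList<>(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            sized.add(i);
        }

        // Default capacity: the backing array fills up repeatedly and is
        // grown and copied (via System.arraycopy) each time.
        List<Integer> unsized = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            unsized.add(i);
        }
    }
}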
LinkedList
If you frequently add elements to the beginning of the List or iterate over the List to delete elements from its interior, you should consider using LinkedList. These operations require constant-time in a LinkedList and linear-time in an ArrayList.

But you pay a big price in performance. Positional access requires linear-time in a LinkedList and constant-time in an ArrayList. Furthermore, the constant factor for LinkedList is much worse. If you think you want to use a LinkedList, measure the performance of your application with both LinkedList and ArrayList before making your choice; ArrayList is usually faster.
Summary
ArrayList: Resizable-array implementation of the List interface.

LinkedList: Doubly-linked list implementation of the List interface. Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.

Vector: Roughly equivalent to ArrayList, but synchronized.
Map
HashMap 1
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
1 There's good information on HashMap at: http://javahungry.blogspot.com/2013/08/hashinghowhashmapworksinjavaor.html
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created (DEFAULT_INITIAL_CAPACITY is 16 in the current implementation). The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space
costs. Higher values decrease the space overhead but increase the lookup cost (reflected in
most of the operations of the HashMap class, including get and put). The expected number of
entries in the map and its load factor should be taken into account when setting its initial
capacity, so as to minimize the number of rehash operations. If the initial capacity is greater
than the maximum number of entries divided by the load factor, no rehash operations will ever
occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large
capacity will allow the mappings to be stored more efficiently than letting it perform automatic
rehashing as needed to grow the table.
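For example, applying the rule above to pre-size a map (a sketch; the figures are arbitrary):

import java.util.HashMap;
import java.util.Map;

public class PreSizedMap {
    public static void main(String[] args) {
        int expectedEntries = 10_000;
        float loadFactor = 0.75f;
        // Initial capacity greater than expectedEntries / loadFactor
        // means no rehash will ever occur for that many mappings.
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
        Map<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);
    }
}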
Note that using many keys with the same hashCode() is a sure way to slow down
performance of any hash table. To ameliorate impact, when keys are Comparable, this class
may use comparison order among keys to help break ties.
Note that this implementation is not synchronized.
In Java 8, a great optimization has been added (JEP 180). Basically, when a bucket becomes too big (currently: TREEIFY_THRESHOLD = 8), HashMap dynamically replaces it with an ad-hoc tree implementation. This way, rather than having a pessimistic O(n) we get a much better O(log n). How does it work? Previously, entries with conflicting keys were simply appended to a linked list, which later had to be traversed. Now HashMap promotes the list into a binary tree, using the hash code as a branching variable. If two hashes are different but ended up in the same bucket, one is considered bigger and goes to the right. If hashes are equal (as in our case), HashMap hopes that the keys are Comparable, so that it can establish some order. This is not a requirement of HashMap keys, but apparently a good practice. If keys are not comparable, don't expect any performance improvements in case of heavy hash collisions.
TreeMap
A Red-Black tree based NavigableMap implementation. The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the containsKey, get, put and
remove operations.
Note that the ordering maintained by a tree map, like any sorted map, and whether or not an
explicit comparator is provided, must be consistent with equals if this sorted map is to
correctly implement the Map interface. This is so because the Map interface is defined in
terms of the equals operation, but a sorted map performs all key comparisons using its
compareTo (or compare) method, so two keys that are deemed equal by this method are,
from the standpoint of the sorted map, equal.
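A short usage sketch of both constructor variants, natural ordering versus an explicit Comparator:

import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        // Sorted by the natural ordering of the String keys.
        TreeMap<String, Integer> byNaturalOrder = new TreeMap<>();
        byNaturalOrder.put("banana", 2);
        byNaturalOrder.put("apple", 1);
        System.out.println(byNaturalOrder.firstKey()); // "apple"

        // Sorted by a Comparator provided at map creation time.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        TreeMap<String, Integer> shortestFirst = new TreeMap<>(byLength);
        shortestFirst.put("banana", 2);
        shortestFirst.put("kiwi", 3);
        System.out.println(shortestFirst.firstKey()); // "kiwi"
    }
}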
Summary
Hashtable: Similar to HashMap, but synchronized.

HashMap: Computes the element's hash to determine the bucket where elements are stored.

TreeMap: Stores items using a Binary Search Tree. Items are sorted in the natural order of their keys.
Final Thoughts
1. If you need SortedMap operations or key-ordered Collection-view iteration: use TreeMap
2. If you want maximum speed and don't care about iteration order: use HashMap
3. If you want near-HashMap performance and insertion-order iteration: use LinkedHashMap
Thread Safety
The above Map implementations are NOT thread safe. You can easily make them thread safe by calling the java.util.Collections.synchronized___ methods. For instance:
// create map
Map<String, String> map = new HashMap<String, String>();
map.put("hello", "world");
// wrap it in a synchronized view
Map<String, String> syncMap = Collections.synchronizedMap(map);

Warning: only the syncMap methods should be used in order for the Map to be thread safe.
synchronizedMap simply wraps the specified Map instance into a Map implementation which synchronizes all methods. This allows making thread safe operations on a standard Map, but performance is bad, as every operation blocks the Map completely.

Java provides a thread safe implementation with better performance: ConcurrentHashMap. To optimize performance, the Map is divided into different partitions depending on the concurrency level, and ConcurrentHashMap can synchronize or lock only a certain portion of the Map, so that we do not need to synchronize the whole Map object.
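A brief sketch contrasting the two approaches:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    public static void main(String[] args) {
        // Wrapper: every method synchronizes on the whole map.
        Map<String, Integer> syncMap =
                Collections.synchronizedMap(new HashMap<>());
        syncMap.put("a", 1);

        // ConcurrentHashMap: finer-grained locking, better under contention,
        // plus atomic compound operations such as merge().
        Map<String, Integer> concurrentMap = new ConcurrentHashMap<>();
        concurrentMap.put("a", 1);
        concurrentMap.merge("a", 1, Integer::sum); // atomic read-modify-write
    }
}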
java.util.Collections provides methods to make thread safe all standard collections:
● synchronizedCollection(Collection<T> c)
● synchronizedList(List<T> list)
● synchronizedMap(Map<K,V> m)
● synchronizedSet(Set<T> s)
● synchronizedSortedMap(SortedMap<K,V> m)
● synchronizedSortedSet(SortedSet<T> s)
For more recommended reading on the different implementations of data structures in Java, please visit the "Java Collections Trail" documentation from Oracle.
Graphs
Java comes with neither a prepackaged Graph interface nor an implementation. In my humble opinion, the reason for this is that even though graphs are a versatile tool for solving problems, there are a lot of caveats in the implementation of Graphs. Regardless, having in mind what a graph is, and how it could be implemented, is very important.
A graph is comprised of a set of vertices and a set of edges. Each edge represents a
connection between two vertices.
Shortest paths in graphs: Given two vertices in a graph, a path is a sequence of edges
connecting them. A shortest path is one with minimal length over all such paths (there
typically are multiple shortest paths).
A graph can be implemented with a 2-dimensional matrix. The first dimension is the set of vertices, and the second dimension is the set of adjacent nodes for each vertex.
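A minimal adjacency-list sketch along those lines (illustrative, not a standard API):

import java.util.ArrayList;
import java.util.List;

public class SimpleDigraph {
    private final List<List<Integer>> adj; // adj.get(v) = vertices reachable from v

    public SimpleDigraph(int vertexCount) {
        adj = new ArrayList<>(vertexCount);
        for (int v = 0; v < vertexCount; v++) {
            adj.add(new ArrayList<>());
        }
    }

    public void addEdge(int from, int to) {
        adj.get(from).add(to); // directed: only one direction is stored
    }

    public List<Integer> neighbors(int v) {
        return adj.get(v);
    }
}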
Algorithms
For most of the algorithms described below the reader can find an implementation in
Princeton’s Java Algorithms web site.
Dijkstra
This is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge path costs, producing a shortest path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms.

For a given source vertex (node) in the graph, the algorithm finds the path with lowest cost (i.e. the shortest path) between that vertex and every other vertex. It can also be used for finding costs of shortest paths from a single vertex to a single destination vertex by stopping the algorithm once the shortest path to the destination vertex has been determined. For example, if the vertices of the graph represent cities and edge path costs represent driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other cities. As a result, shortest path first is widely used in network routing protocols, most notably IS-IS and OSPF (Open Shortest Path First).
public class Dijkstra {

    // Dijkstra's algorithm to find the shortest path from s to all other nodes.
    // WeightedGraph is assumed to provide size(), neighbors(), getWeight() and getLabel().
    public static int[] dijkstra(WeightedGraph G, int s) {
        final int[] dist = new int[G.size()];            // shortest known distance from "s"
        final int[] pred = new int[G.size()];            // preceding node in path
        final boolean[] visited = new boolean[G.size()]; // all false initially

        for (int i = 0; i < dist.length; i++) {
            dist[i] = Integer.MAX_VALUE;
        }
        dist[s] = 0;

        for (int i = 0; i < dist.length; i++) {
            final int next = minVertex(dist, visited);
            visited[next] = true;
            // The shortest path to next is dist[next] and via pred[next].
            final int[] n = G.neighbors(next);
            for (int j = 0; j < n.length; j++) {
                final int v = n[j];
                final int d = dist[next] + G.getWeight(next, v);
                if (dist[v] > d) {
                    dist[v] = d;
                    pred[v] = next;
                }
            }
        }
        return pred; // (ignore pred[s]==0!)
    }

    private static int minVertex(int[] dist, boolean[] v) {
        int x = Integer.MAX_VALUE;
        int y = -1; // graph not connected, or no unvisited vertices
        for (int i = 0; i < dist.length; i++) {
            if (!v[i] && dist[i] < x) { y = i; x = dist[i]; }
        }
        return y;
    }

    public static void printPath(WeightedGraph G, int[] pred, int s, int e) {
        final java.util.ArrayList path = new java.util.ArrayList();
        int x = e;
        while (x != s) {
            path.add(0, G.getLabel(x));
            x = pred[x];
        }
        path.add(0, G.getLabel(s));
        System.out.println(path);
    }
}
Kruskal
This algorithm aims to find the minimum spanning tree in a graph structure. It is a greedy algorithm, so it's based on finding the best available solution given the current context and the remaining possibilities.

First of all, we need to take every edge present in the graph, put them in a set (so we can discard any existing multipath between two nodes), and order them based on their weight. Then, after ordering, we iterate over each of them, adding an edge to the solution if and only if we aren't introducing a cycle in the graph represented by the solution set. When every node in the initial graph gets covered or visited by the algorithm we can finish the iteration and return the solution set as the minimum spanning tree.

This simple algorithm provides a solution in O(E log V), where E is the number of edges and V is the number of vertices present in the initial graph. It's worth noting that this algorithm will find the spanning forest in a disconnected graph without changes or the need to run it with a different starting point on each subgraph.
import java.util.*;

/**
 * Extracted from http://stackoverflow.com/a/14660751
 */
public class Kruskal {

    static class Edge implements Comparable<Edge> {
        int v1, v2, wt;

        Edge(int v1, int v2, int wt) {
            this.v1 = v1;
            this.v2 = v2;
            this.wt = wt;
        }

        @Override
        public int compareTo(Edge o) {
            if (o.wt == this.wt) {
                return 0;
            }
            return o.wt < this.wt ? 1 : -1; // sort ascending by weight
        }

        @Override
        public String toString() {
            return String.format("V%d \t V%d \t Cost:%d\n", v1, v2, wt);
        }
    }

    private static List<Edge> kruskal(List<Edge> edges,
            HashMap<Integer, Set<Integer>> forest, List<Integer> vertices) {
        // first sort the edges by weight
        Collections.sort(edges);
        ArrayList<Edge> minSpanTree = new ArrayList<>();
        while (true) {
            // so, while you haven't visited all the vertices at least once
            if (edges.isEmpty()) {
                // if we don't have any more edges then we need to stop
                break;
            }
            Edge check = edges.remove(0); // we take the edge with the min cost available
            Set<Integer> visited1 = forest.get(check.v1);
            Set<Integer> visited2 = forest.get(check.v2);
            if (visited1.equals(visited2)) {
                // both endpoints already belong to the same tree, so this edge
                // would introduce a cycle; we continue with the next one
                continue;
            }
            minSpanTree.add(check); // we add the edge to the solution
            visited1.addAll(visited2); // we merge the two trees
            visited1.stream().forEach((i) -> {
                forest.put(i, visited1);
            });
            if (visited1.size() == vertices.size()) {
                // if we visited all vertices then finish
                break;
            }
        }
        return minSpanTree;
    }

    public static void main(String[] args) {
        // we create a list of vertices
        List<Integer> vertices = Arrays.asList(1, 2, 3, 4, 5, 6);
        // then we create our forest structure
        HashMap<Integer, Set<Integer>> forest = new HashMap<>();
        vertices.stream().forEach((vertex) -> {
            // Each set stores the known vertices reachable from this vertex,
            // initialized with the vertex itself.
            Set<Integer> vs = new HashSet<>();
            vs.add(vertex);
            forest.put(vertex, vs);
        });
        // and our edges, which define our graph
        List<Edge> graphEdges = new LinkedList<>();
        graphEdges.add(new Edge(1, 2, 1));
        graphEdges.add(new Edge(2, 3, 1));
        graphEdges.add(new Edge(1, 3, 5));
        graphEdges.add(new Edge(3, 4, 2));
        graphEdges.add(new Edge(1, 4, 5));
        graphEdges.add(new Edge(4, 5, 1));
        graphEdges.add(new Edge(4, 2, 4));
        graphEdges.add(new Edge(4, 6, 3));
        graphEdges.add(new Edge(5, 6, 3));
        System.out.println("initial graph:");
        System.out.println(graphEdges);
        long start = new Date().getTime();
        // let the algorithm run
        List<Edge> result = kruskal(graphEdges, forest, vertices);
        long end = new Date().getTime();
        System.out.println("minimum spanning tree:");
        System.out.println(result);
        System.out.println(String.format("total weight %s.", result.stream().map(e -> {
            return e.wt;
        }).reduce((a, b) -> {
            return a + b;
        }).get()));
        System.out.println(String.format("execution time %d millis.", (end - start)));
    }
}
Prim
This is another greedy algorithm that finds a spanning tree in a graph of connected vertices. If the graph has non-connected subgraphs, the algorithm should be run using as initial vertex one on each connected subgraph.

Its main idea is to consider a set of visited vertices and an ordered set of edges. Initially we include any vertex in the visited set, and then we take the minimum edge that connects a visited vertex to a non-visited vertex. Once we find this minimum edge, we add it to the spanning tree set, mark its end vertex as visited, and continue the iteration. The iteration ends when all vertices are visited or there are no edges left to be used; if there are any vertices left to be visited we need to restart the algorithm from one of those vertices, since the graph is disconnected.

The time complexity of this algorithm strongly depends on the structures chosen to hold the remaining edges and the visited vertices set; for example, changes in one of those will affect the procedure that finds the minimum edge at each step of the main iteration. For the sake of simplicity, the next implementation uses simple filtering over the edge list, so it can surely be optimized if desired.
import java.util.*;

public class Prim {

    static class Edge implements Comparable<Edge> {
        int v1, v2, wt;

        Edge(int v1, int v2, int wt) {
            this.v1 = v1;
            this.v2 = v2;
            this.wt = wt;
        }

        @Override
        public int compareTo(Edge o) {
            if (o.wt == this.wt) {
                return 0;
            }
            return o.wt < this.wt ? 1 : -1; // sort ascending by weight
        }

        @Override
        public String toString() {
            return String.format("V%d \t V%d \t Cost:%d\n", v1, v2, wt);
        }
    }

    // Returns the minimum-weight edge going from a visited vertex to a
    // non-visited one, or null if no such edge exists (disconnected graph).
    private static Edge findMinEdgeFromVisited(List<Edge> edges, Set<Integer> visited) {
        // edges are sorted by weight, so the first match is the minimum
        Optional<Edge> optEdg = edges.stream().filter(e -> {
            return visited.contains(e.v1) && !visited.contains(e.v2);
        }).findFirst();
        return optEdg.orElse(null);
    }

    private static List<Edge> prim(List<Edge> edges, List<Integer> vertices) {
        // first sort the edges by weight
        Collections.sort(edges);
        ArrayList<Edge> minSpanTree = new ArrayList<>();
        if (vertices.isEmpty()) {
            return null;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(vertices.get(0));
        // while you haven't visited all the vertices at least once
        while (visited.size() < vertices.size()) {
            // we find the minimum edge among the remaining edges with origin in a
            // visited vertex and destination in a non-visited vertex
            Edge minWeight = findMinEdgeFromVisited(edges, visited);
            if (minWeight == null) {
                // no edge reaches a new vertex: the graph is disconnected
                break;
            }
            minSpanTree.add(minWeight); // we add the edge to the solution
            visited.add(minWeight.v2);  // we mark v2 as visited
            edges.remove(minWeight);    // the edge is no longer a candidate
        }
        return minSpanTree;
    }

    public static void main(String[] args) {
        // we create a list of vertices
        List<Integer> vertices = Arrays.asList(1, 2, 3, 4, 5, 6);
        // and our edges, which define our graph
        List<Edge> graphEdges = new LinkedList<>();
        graphEdges.add(new Edge(1, 2, 1));
        graphEdges.add(new Edge(2, 3, 1));
        graphEdges.add(new Edge(1, 3, 5));
        graphEdges.add(new Edge(3, 4, 2));
        graphEdges.add(new Edge(1, 4, 5));
        graphEdges.add(new Edge(4, 5, 1));
        graphEdges.add(new Edge(4, 2, 4));
        graphEdges.add(new Edge(4, 6, 3));
        graphEdges.add(new Edge(5, 6, 3));
        System.out.println("initial graph:");
        System.out.println(graphEdges);
        long start = new Date().getTime();
        // let the algorithm run
        List<Edge> result = prim(graphEdges, vertices);
        long end = new Date().getTime();
        System.out.println("minimum spanning tree:");
        System.out.println(result);
        System.out.println(String.format("total weight %s.", result.stream().map(e -> {
            return e.wt;
        }).reduce((a, b) -> {
            return a + b;
        }).get()));
        System.out.println(String.format("execution time %d millis.", (end - start)));
    }
}
DFS (Depth First Search)

Depth First Search traverses a graph by recursively visiting, from the current vertex, every adjacent vertex that has not been marked yet. The core of the algorithm looks like this (count and marked[] track the visited vertices):

private void dfs(Digraph G, int v) {
    count++;
    marked[v] = true;
    for (int w : G.adj(v)) {
        if (!marked[w]) dfs(G, w);
    }
}

Different techniques for smart List iteration
Find a cycle in a Singly Linked List
One alternative is using an auxiliary data structure (a Set) that holds the visited nodes. With one iterator, you start traversing the List. Every node you visit, you add it to the visited auxiliary data structure. The iteration ends when you reach the end of the list or you find a cycle. Pseudo code:
cycle = false;
foreach(node in inputList) {
if node has not been visited
mark node as visited (add to the data structure)
else break (since there is a loop: cycle = true)
}
return cycle
The best implementation is the "Tortoise & Hare" (Knuth 2). This algorithm uses 2 pointers moving at different speeds. Basically, it defines 2 pointers, the first one (tortoise) moving 1 node at a time and the second (hare) moving 2 nodes at a time. If there's a loop, by the time the tortoise completes 1 loop the hare will have completed 2 loops, so the pointers must have met at some point. Here's a pseudocode implementation:

tortoise := firstNode
hare := firstNode
forever:
    if hare == end
        return 'No Loop Found'
    hare := hare.next
    if hare == end
        return 'No Loop Found'
    hare := hare.next
    tortoise := tortoise.next
    if hare == tortoise
        return 'Loop Found'
Complexity: O(n).
There are some variants of this algorithm to find the first repeating node / value, and the cycle
length. Try writing them. Solutions can be found at:
http://en.wikipedia.org/wiki/Cycle_detection
2 The Art of Computer Programming, Donald Knuth.
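A Java version of the pseudocode above, assuming a minimal Node class with a next reference:

public class CycleDetector {

    static class Node {
        Node next;
    }

    // Floyd's "Tortoise & Hare": O(n) time, O(1) extra space.
    public static boolean hasCycle(Node first) {
        Node tortoise = first;
        Node hare = first;
        while (hare != null && hare.next != null) {
            hare = hare.next.next;    // hare moves 2 nodes at a time
            tortoise = tortoise.next; // tortoise moves 1 node at a time
            if (hare == tortoise) {
                return true; // loop found
            }
        }
        return false; // hare reached the end of the list: no loop
    }
}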
Find the nth-before-last element in a Singly Linked List
The most common approach is to first iterate over the list to calculate its size, and then iterate "size - n" times. This works but is not optimal. A simple trick is to use two pointers. The first one will be used to traverse the whole list. The second one will be used to find the desired element, by starting from the beginning of the list at the nth iteration.
This way, when the first pointer reaches the end of the list, the element referenced by the second pointer will be the nth-before-last element in the list.
Pseudo code:
Node driver = list;
Node nthBeforeLast = null;
for (1 to n) {
check for the end of the list (driver == null)
driver = driver.next();
}
nthBeforeLast = list;
while (driver != null) {
driver = driver.next();
nthBeforeLast = nthBeforeLast.next();
}
return nthBeforeLast
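The same two-pointer technique in Java, assuming the same minimal Node class used in the cycle example:

// Returns the nth-before-last node, or null if the list is too short.
public static Node nthBeforeLast(Node head, int n) {
    Node driver = head;
    // advance the driver pointer n nodes ahead of the start
    for (int i = 0; i < n; i++) {
        if (driver == null) return null; // fewer than n nodes
        driver = driver.next;
    }
    if (driver == null) return null; // list has exactly n nodes
    Node trailer = head;
    // move both pointers until the driver reaches the last node
    while (driver.next != null) {
        driver = driver.next;
        trailer = trailer.next;
    }
    return trailer; // n = 0 yields the last element
}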
Sorting algorithms
This section only covers two sorting algorithms. The reader is expected to be familiar with
other sorting algorithms like: BubbleSort and InsertionSort.
QuickSort
Quicksort 3 is popular because it is not difficult to implement, works well for a variety of different kinds of input data, and is substantially faster than any other sorting method in typical applications. It is in-place (uses only a small auxiliary stack), requires time proportional to O(N*log N) on average to sort N items, and has an extremely short inner loop.
The basic algorithm
Quicksort is a divide-and-conquer method for sorting. It works by partitioning an array into two parts, then sorting the parts independently.

The key of the method is the partitioning process, which rearranges the array to make the following three conditions hold:

1. The entry a[j] is in its final place in the array, for some j.
2. No entry in a[lo] through a[j-1] is greater than a[j].
3. No entry in a[j+1] through a[hi] is less than a[j].

We achieve a complete sort by partitioning, then recursively applying the method to the subarrays. We use the following general strategy: First, we arbitrarily choose a[lo] to be the partitioning item, the one that will go into its final position. Next, we scan from the left end of the array until we find an entry that is greater than (or equal to) the partitioning item, and we scan from the right end of the array until we find an entry less than (or equal to) the partitioning item.
public class Quick {

    // This class should not be instantiated.

    public static void sort(Comparable[] a) {
        StdRandom.shuffle(a);
        sort(a, 0, a.length - 1);
    }

    // quicksort the subarray from a[lo] to a[hi]
    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // partition the subarray a[lo..hi] so that a[lo..j-1] <= a[j] <= a[j+1..hi]
    // and return the index j.
    private static int partition(Comparable[] a, int lo, int hi) {
        int i = lo;
        int j = hi + 1;
        Comparable v = a[lo];
        while (true) {
            // find item on lo to swap
            while (less(a[++i], v))
                if (i == hi) break;
            // find item on hi to swap
            while (less(v, a[--j]))
                if (j == lo) break; // redundant since a[lo] acts as sentinel
            // check if pointers cross
            if (i >= j) break;
            exch(a, i, j);
        }
        // put partitioning item v at a[j]
        exch(a, lo, j);
        // now, a[lo .. j-1] <= a[j] <= a[j+1 .. hi]
        return j;
    }

    // is v < w ?
    private static boolean less(Comparable v, Comparable w) {
        return (v.compareTo(w) < 0);
    }

    // exchange a[i] and a[j]
    private static void exch(Object[] a, int i, int j) {
        Object swap = a[i];
        a[i] = a[j];
        a[j] = swap;
    }
}

3 Interesting video of the algorithm: https://www.youtube.com/watch?v=ywWBy6J5gz8
MergeSort
The algorithm 4 is based on a simple operation known as merging: combining two ordered arrays to make one larger ordered array. To sort an array, divide it into two halves, sort the two halves (recursively), and then merge the results.

Mergesort guarantees to sort an array of N items in time proportional to O(N*log N), no matter what the input. Its prime disadvantage is that it uses extra space proportional to N.
public class Merge {

    // This class should not be instantiated.
    private Merge() { }

    // stably merge a[lo .. mid] with a[mid+1 .. hi] using aux[lo .. hi];
    // less() and isSorted() are helpers as in the Quick class above
    private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) {
        // precondition: a[lo .. mid] and a[mid+1 .. hi] are sorted subarrays
        assert isSorted(a, lo, mid);
        assert isSorted(a, mid + 1, hi);

        // copy to aux[]
        for (int k = lo; k <= hi; k++) {
            aux[k] = a[k];
        }

        // merge back to a[]
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)              a[k] = aux[j++]; // this copying is unnecessary
            else if (j > hi)               a[k] = aux[i++];
            else if (less(aux[j], aux[i])) a[k] = aux[j++];
            else                           a[k] = aux[i++];
        }

        // postcondition: a[lo .. hi] is sorted
        assert isSorted(a, lo, hi);
    }

    // mergesort a[lo..hi] using auxiliary array aux[lo..hi]
    private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    /**
     * Rearranges the array in ascending order, using the natural order.
     * @param a the array to be sorted
     */
    public static void sort(Comparable[] a) {
        Comparable[] aux = new Comparable[a.length];
        sort(a, aux, 0, a.length - 1);
        assert isSorted(a);
    }
}

4 Interesting video of the algorithm: https://www.youtube.com/watch?v=dENca26N6V4
Concurrency
Processes and Threads
In concurrent programming, there are two basic units of execution: processes and threads. In
the Java programming language, concurrent programming is mostly concerned with threads.
However, processes are also important.
A computer system normally has many active processes and threads. This is true even in
systems that only have a single execution core, and thus only have one thread actually
executing at any given moment. Processing time for a single core is shared among processes
and threads through an OS feature called time slicing.
It's becoming more and more common for computer systems to have multiple processors or
processors with multiple execution cores. This greatly enhances a system's capacity for
concurrent execution of processes and threads — but concurrency is possible even on simple
systems, without multiple processors or execution cores.
Processes
A process has a selfcontained execution environment. A process generally has a complete,
private set of basic runtime resources; in particular, each process has its own memory space.
Processes are often seen as synonymous with programs or applications. However, what the
user sees as a single application may in fact be a set of cooperating processes. To facilitate
communication between processes, most operating systems support Inter Process
Communication (IPC) resources, such as pipes and sockets. IPC is used not just for
communication between processes on the same system, but processes on different systems.
Most implementations of the Java virtual machine run as a single process. A Java application
can create additional processes using a ProcessBuilder object. Multiprocess applications are
not covered here.
Threads
Threads are sometimes called lightweight processes. Both processes and threads provide an
execution environment, but creating a new thread requires fewer resources than creating a
new process.
Threads exist within a process — every process has at least one. Threads share the
process's resources, including memory and open files. This makes for efficient, but potentially
problematic, communication.
Multithreaded execution is an essential feature of the Java platform. Every application has at
least one thread — or several, if you count "system" threads that do things like memory
management and signal handling. But from the application programmer's point of view, you
start with just one thread, called the main thread. This thread has the ability to create
additional threads.
Defining and starting a Thread
An application that creates an instance of Thread must provide the code that will run in that
thread. There are two ways to do this:
● Provide a Runnable object. The Runnable interface defines a single method, run,
meant to contain the code executed in the thread. The Runnable object is passed to
the Thread constructor
● Subclass Thread. The Thread class itself implements Runnable, though its run method
does nothing. An application can subclass Thread, providing its own implementation of
the run method
Which of these idioms should you use? The first idiom, which employs a Runnable object, is
more general, because the Runnable object can subclass a class other than Thread. The
second idiom is easier to use in simple applications, but is limited by the fact that your task
class must be a descendant of Thread. The first approach, which separates the Runnable
task from the Thread object that executes the task, not only is more flexible, but it is
applicable to the high-level thread management APIs.
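Both idioms in code:

public class ThreadStart {
    public static void main(String[] args) {
        // Idiom 1: provide a Runnable object (more general and flexible).
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("Hello from a Runnable!");
            }
        });
        t1.start();

        // Idiom 2: subclass Thread and override run.
        Thread t2 = new Thread() {
            @Override
            public void run() {
                System.out.println("Hello from a Thread subclass!");
            }
        };
        t2.start();
    }
}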
Synchronization
Defining thread safety is surprisingly tricky. The more formal attempts are so complicated as
to offer little practical guidance or intuitive understanding, and the rest are informal
descriptions that can seem downright circular.
A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code. Thread-safe classes encapsulate any needed synchronization so that clients need not provide their own.
Threads can communicate by sharing objects, but this comes with a price, since thread
interference and memory consistency errors may occur. The tool needed to prevent these
errors is synchronization.
However, synchronization can introduce thread contention, which occurs when two or more
threads try to access the same resource simultaneously and cause the Java runtime to
execute one or more threads more slowly, or even suspend their execution. Starvation and
livelock are forms of thread contention.
Interference happens when two operations, running in different threads, but acting on the
same data, interleave. This means that the two operations consist of multiple steps, and the
sequences of steps overlap.
Memory consistency errors occur when different threads have inconsistent views of what
should be the same data.
The key to avoiding memory consistency errors is understanding the happens-before relationship. This relationship is simply a guarantee that memory writes by one specific statement are visible to another specific statement. To see this, consider the following example. Suppose a simple int field is defined and initialized:
int counter = 0;
The counter field is shared between two threads, A and B. Suppose thread A increments
counter:
counter++;
Then, shortly afterwards, thread B prints out counter:
System.out.println(counter);
If the two statements had been executed in the same thread, it would be safe to assume that the value printed out would be "1". But if the two statements are executed in separate threads, the value printed out might well be "0", because there's no guarantee that thread A's change to counter will be visible to thread B, unless the programmer has established a happens-before relationship between these two statements.
Intrinsic Locks
Synchronization is built around an internal entity known as the intrinsic lock or monitor lock. (The API specification often refers to this entity simply as a "monitor.") Intrinsic locks play a role in both aspects of synchronization: enforcing exclusive access to an object's state and establishing happens-before relationships that are essential to visibility.
Every object has an intrinsic lock associated with it. By convention, a thread that needs
exclusive and consistent access to an object's fields has to acquire the object's intrinsic lock
before accessing them, and then release the intrinsic lock when it's done with them. A thread
is said to own the intrinsic lock between the time it has acquired the lock and released the
lock. As long as a thread owns an intrinsic lock, no other thread can acquire the same lock.
The other thread will block when it attempts to acquire the lock.
When a thread releases an intrinsic lock, a happens-before relationship is established between that action and any subsequent acquisition of the same lock.
When a thread invokes a synchronized method, it automatically acquires the intrinsic lock for
that method's object and releases it when the method returns. The lock release occurs even if
the return was caused by an uncaught exception.
You might wonder what happens when a static synchronized method is invoked, since a static
method is associated with a class, not an object. In this case, the thread acquires the intrinsic
lock for the Class object associated with the class. Thus access to class's static fields is
controlled by a lock that's distinct from the lock for any instance of the class.
Another way to create synchronized code is with synchronized statements. Unlike synchronized methods, synchronized statements must specify the object that provides the intrinsic lock.

Synchronized statements are also useful for improving concurrency with fine-grained synchronization. Suppose, for example, class MsLunch has two instance fields, c1 and c2, that are never used together. All updates of these fields must be synchronized, but there's no reason to prevent an update of c1 from being interleaved with an update of c2; doing so would only reduce concurrency by creating unnecessary blocking. Instead of using synchronized methods or otherwise using the lock associated with this, we create two objects solely to provide locks, as in the sketch below.
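A sketch of the MsLunch idea, mirroring the example from Oracle's concurrency tutorial:

public class MsLunch {
    private long c1 = 0, c2 = 0;
    private final Object lock1 = new Object();
    private final Object lock2 = new Object();

    public void inc1() {
        synchronized (lock1) { // guards c1 only
            c1++;
        }
    }

    public void inc2() {
        synchronized (lock2) { // guards c2 only; may interleave with inc1
            c2++;
        }
    }
}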
Reentrant Synchronization
Recall that a thread cannot acquire a lock owned by another thread. But a thread can acquire
a lock that it already owns. Allowing a thread to acquire the same lock more than once
enables reentrant synchronization. This describes a situation where synchronized code,
directly or indirectly, invokes a method that also contains synchronized code, and both sets of
code use the same lock. Without reentrant synchronization, synchronized code would have to
take many additional precautions to avoid having a thread cause itself to block.
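A small illustration of reentrancy: both methods synchronize on the same instance, yet the nested call does not block:

public class Reentrant {
    public synchronized void outer() {
        System.out.println("outer: lock acquired");
        inner(); // re-acquires the lock this thread already owns; no deadlock
    }

    public synchronized void inner() {
        System.out.println("inner: same lock, re-entered");
    }
}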
Non Blocking Algorithm

Suspending and resuming a thread has a lot of overhead and generally entails a lengthy interruption. For lock-based classes with fine-grained operations (such as the synchronized collections classes, where most methods contain only a few operations), the ratio of scheduling overhead to useful work can be quite high when the lock is frequently contended.
Volatile variables are a lighter-weight synchronization mechanism than locking because they do not involve context switches or thread scheduling. However, volatile variables have some limitations compared to locking: while they provide similar visibility guarantees, they cannot be used to construct atomic compound actions. This means that volatile variables cannot be used when one variable depends on another, or when the new value of a variable depends on its old value. This limits when volatile variables are appropriate, since they cannot be used to reliably implement common tools such as counters or mutexes.
Locking has a few other disadvantages. When a thread is waiting for a lock, it cannot do anything else. Locking is simply a heavyweight mechanism for fine-grained operations such as incrementing a counter.

For fine-grained operations, there is an alternate approach that is often more efficient: the optimistic approach, whereby you proceed with an update, hopeful that you can complete it without interference. This approach relies on collision detection to determine if there has been interference from other parties during the update, in which case the operation fails and can be retried (or not). The optimistic approach is like the old saying, "It is easier to obtain forgiveness than permission", where "easier" here means "more efficient".
Processors designed for multiprocessor operation provide special instructions for managing concurrent access to shared variables. Today, nearly every modern processor has some form of atomic read-modify-write instruction, such as compare-and-swap (CAS) or load-linked/store-conditional. Operating systems and JVMs use these instructions to implement locks and concurrent data structures, but until Java 5.0 they had not been available directly to Java classes.

An algorithm is called nonblocking if failure or suspension of any thread cannot cause failure or suspension of another thread; an algorithm is called lock-free if, at each step, some thread can make progress. Algorithms that use CAS exclusively for coordination between threads can be both nonblocking and lock-free. An uncontended CAS always succeeds, and if multiple threads contend for a CAS, one always wins and therefore makes progress. Nonblocking algorithms are also immune to deadlock or priority inversion.
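The java.util.concurrent.atomic classes expose this CAS support directly. A sketch of a nonblocking counter built on compareAndSet:

import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // CAS succeeds only if no other thread changed the value meanwhile;
            // on failure we simply re-read and retry instead of blocking.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}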
Immutable Objects
Immutable objects are those whose internal state cannot be mutated (changed) after the object has been created. The internal state of an object includes all of its members (even the references to other objects and their internal state).
The following rules define a simple strategy for creating immutable objects.
1. Don't provide "setter" methods — methods that modify fields or objects referred to by
fields.
2. Make all fields final and private.
3. Don't allow subclasses to override methods. The simplest way to do this is to declare
the class as final. A more sophisticated approach is to make the constructor private
and construct instances in factory methods.
4. If the instance fields include references to mutable objects, don't allow those objects to
be changed:
○ Don't provide methods that modify the mutable objects.
○ Don't share references to the mutable objects. Never store references to
external, mutable objects passed to the constructor; if necessary, create
copies, and store references to the copies. Similarly, create copies of your
internal mutable objects when necessary to avoid returning the originals in your
methods.
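A minimal class following these rules (illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ImmutablePoint { // final: no subclass can override behavior
    private final int x;
    private final int y;
    private final List<String> tags;

    public ImmutablePoint(int x, int y, List<String> tags) {
        this.x = x;
        this.y = y;
        // never store the caller's mutable list directly: copy it, then
        // expose only an unmodifiable view
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }

    public int getX() { return x; }
    public int getY() { return y; }
    public List<String> getTags() { return tags; } // safe to share
}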
Garbage Collector 5
Generalities
Garbage collection is the process of looking at heap memory, identifying which objects are in
use and which are not, and deleting the unused objects. An in use object, or a referenced
object, means that some part of your program still maintains a pointer to that object. An
unused object, or unreferenced object, is no longer referenced by any part of your program.
So the memory used by an unreferenced object can be reclaimed.
In Java, the process of deallocating memory is handled automatically by the garbage collector. The basic process can be described as follows.
Step 1: Marking
The first step in the process is called marking. This is where the garbage collector identifies
which pieces of memory are in use and which are not.
All objects are scanned in the marking phase to make this determination. This can be a very
time consuming process if all objects in a system must be scanned.
Step 2: Normal Deletion
Normal deletion removes unreferenced objects leaving referenced objects and pointers to free
space.
5 Visit the Oracle documentation for more detail: http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
Step 2a: Deletion with compacting
To further improve performance, in addition to deleting unreferenced objects, you can also compact the remaining referenced objects. By moving referenced objects together, this makes new memory allocation much easier and faster.
Smarter garbage collection: JVM Generations
As stated earlier, having to mark and compact all the objects in a JVM is inefficient. As more
and more objects are allocated, the list of objects grows and grows leading to longer and
longer garbage collection time. However, empirical analysis of applications has shown that
most objects are short lived.
The information learned from the object allocation behavior can be used to enhance the
performance of the JVM. Therefore, the heap is broken up into smaller parts or generations.
The heap parts are: Young Generation, Old or Tenured Generation, and Permanent Generation.
The Young Generation is where all new objects are allocated and aged. When the young generation fills up, this causes a minor garbage collection. Minor collections can be optimized assuming a high object mortality rate. A young generation full of dead objects is collected very quickly. Some surviving objects are aged and eventually move to the old generation.
Stop the World Event: All minor garbage collections are "Stop the World" events. This means that all application threads are stopped until the operation completes. Minor garbage collections are always Stop the World events.
The Old Generation is used to store long surviving objects. Typically, a threshold is set for young generation objects, and when that age is met, the object gets moved to the old generation. Eventually the old generation needs to be collected. This event is called a major garbage collection.

Major garbage collections are also Stop the World events. Often a major collection is much slower because it involves all live objects, so for responsive applications, major garbage collections should be minimized. Also note that the length of the Stop the World event for a major garbage collection is affected by the kind of garbage collector that is used for the old generation space.
The Permanent generation contains metadata required by the JVM to describe the classes
and methods used in the application. The permanent generation is populated by the JVM at
runtime based on classes in use by the application. In addition, Java SE library classes and
methods may be stored here.
Classes may get collected (unloaded) if the JVM finds they are no longer needed and space
may be needed for other classes. The permanent generation is included in a full garbage
collection.
The GC process
Now that you understand why the heap is separated into different generations, it is time to look at how exactly these spaces interact. The following steps walk through the object allocation and aging process in the JVM.
1. First, any new objects are allocated to the eden space. Both survivor spaces start out empty.
2. When the eden space fills up, a minor garbage collection is triggered.
3. Referenced objects are moved to the first survivor space. Unreferenced objects are deleted when the eden space is cleared.
4. At the next minor GC, the same thing happens for the eden space. Unreferenced objects are deleted and referenced objects are moved to a survivor space. However, in this case, they are moved to the second survivor space (S1). In addition, objects from the last minor GC on the first survivor space (S0) have their age incremented and get moved to S1. Once all surviving objects have been moved to S1, both S0 and eden are cleared. Notice we now have differently aged objects in the survivor space.
5. At the next minor GC, the same process repeats. However, this time the survivor spaces switch. Referenced objects are moved to S0. Surviving objects are aged. Eden and S1 are cleared.
6. Promotion: after a minor GC, when aged objects reach a certain age threshold (8 in this example) they are promoted from the young generation to the old generation.
7. As minor GCs continue to occur, objects will continue to be promoted to the old generation space.
8. That pretty much covers the entire process with the young generation. Eventually, a major GC will be performed on the old generation, which cleans up and compacts that space.
Questions
Subject Questions
Experience What's your business objective?
Experience What companies have you worked for?
Experience What was the most interesting project you've worked on?
Experience Which were your greatest challenges?
Experience What type of development have you worked on? Backend / Frontend?
Experience Mention 3 things you are good at, and 3 weaknesses.
Experience Describe each of the projects you've worked on.
Programming What is Dependency Injection?
Programming What is AOP?
Programming What's the relation between Dependency Injection and AOP?
Programming When would you use an interface, and when an abstract class?
Collections What's the difference in performance between a TreeMap, ArrayList and
HashTable? (consider standard operations: insert and search)
Collections How do HashMap, HashSet, and HashTable work?
What is the order of complexity of the operations on average, in the worst case,
and how to avoid worst cases?
List [List] What is the difference between an ArrayList and a LinkedList
List [List] Describe the underlying structure of an ArrayList
List [List] What happens when you try to insert an element into an ArrayList and the
underlying array is full ?
List [List] Describe the underlying structure of a LinkedList
List [List] Describe an algorithm to find a loop in a singly linked list (without using any
additional data structure nor changing the nodes)
List If you insert and remove 1M elements from an ArrayList, does the capacity
shrink?
Map [Map] Describe the underlying structure of a HashMap
Map Is it possible for 2 different people to have the same HashCode?
Map HashTable vs HashMap. What are the differences? When would you use one or
the other?
HashMap [Map] What happens when two keys end up in the same bucket in a HashMap
Map [Map] What is the relationship between capacity and load factor in a HashMap
Map [Map] What happens with a HashMap with 2 million elements when: searching for
an element, inserting an element. What if all the entries fall into the same bucket ?
Map [Map] What is the order of complexity (big O notation) for put, get, both average
and worst case in a HashMap and in a TreeMap
Map [Map] Which Java structure would you use to associate a key with a value, but
maintain the order of insertion
Tree [Tree] Describe a Binary Tree
Tree [Tree] What is the order of complexity (big O notation) for put, get, both average
and worst case
Tree [Tree] Describe a Balanced Tree and how is different from a Binary Tree
Tree [Tree] What is the order of complexity (big O notation) for put, get, both average
and worst case (what happens with re balancing)
Graphs What is a DiGraph?
Graphs [Graphs] Describe an algorithm to find the shortest path from node A to B in a
directed graph
Graphs [Graphs] Describe an algorithm to find a loop in a directed graph
Language [Language] How equals and hashCode are related
Language [Language] What is the difference between checked and unchecked exceptions
Language [Language] What is a finally block ? When would you use it ?
Language [Language] What is the difference between an abstract class and an interface
Language [Language] What is the difference between Iterator and Enumeration ?
Concurrency What is a thread pool? What is an executor? How would you use it?
Concurrency [Concurrency] What does it mean for a method to be synchronized ?
Concurrency [Concurrency] What is the difference between a synchronized method and a
synchronized block of code
Concurrency [Concurrency] What happens if i have one class with two "synchronized" methods
(m1 and m2), I create an instance of the class, and have two threads invoking the
methods at the same time, thread1 invokes m1, thread2 invokes m2
Concurrency [Concurrency] What is the difference between an intrinsic lock and a Lock
(interface)
Concurrency [Concurrency] What is the keyword volatile for?
Concurrency [Concurrency] What is a deadlock? Provide an example of how one can occur.
Concurrency [Concurrency] What are the methods wait and notify for?
Concurrency [Concurrency] What happens when you synchronize a static method
Spring [Spring] What is Dependency Injection
Spring [Spring] How can you inject a service using spring ? (difference between
constructor and set)
Spring [Spring] What is the scope of a bean. And what are the possible scopes. Can you
create custom scopes
Spring [Spring] What is the difference between Singleton scope and Prototype scope ?
Spring [Spring] What would you use Prototype scope for?
Spring [Spring] I have a Spring project with Hibernate and we use Maven. We want to
have 2 separate deployments in 2 servers, one in New york the other in London.
One uses an OracleDB and the other MySQL. How can you approach the solution
with these tools, having the same code base?
Spring [Spring] What is AOP, and how is it related to Injection
DB [DB] What is JPA? How is it different from Hibernate?
DB [DB] What is the difference between JPA and JDBC Statements. What is the
impact on performance
Web [Web] What is REST
Web [Web] How are REST and SOAP related?
Web [Web] How can a servlet achieve a great level of concurrency? Do you know the
servlet lifecycle?
GC [GC] Mention the different memory generations
GC [GC] Can the GC be invoked?
GC [GC] Briefly describe the GC process
Algorithms Write an algorithm to check if there's a loop in a singly linked list (without using any
additional data structure nor changing the nodes)
Algorithms Modify the above program to find the cycle length
Algorithms Modify the above program to find the first repeating node
Algorithms Write a code that looks for the shortest path between 2 nodes in a digraph
Algorithms How would you implement a memcache for a RDBMS using standard Java
Collections
Algorithms Write an algorithm which reverses a string
Algorithms Write an algorithm which transposes a matrix in place.
Algorithms Write an algorithm which reverses a singly linked list