Codeforces Tutorial
In this lecture, we will try to improve your data structure skills. Stay with us and read on.
Important data structures:
Trees
Trees are one of the most useful data structures. A tree is a connected acyclic graph. There are many types of trees, like rooted trees, weighted trees, directed trees, tries, etc.
Partial sum
There are two types of problems solvable by partial sums.
1. Problems in which you are asked to answer some queries about the sum of a range of elements (without modify queries).
All of these problems have the same solution. You just need to know how to solve one of them.
Example: You are asked some queries on an array a_1, a_2, ..., a_n. Each query gives you numbers l and r and you should print a_l + a_{l+1} + ... + a_r.
Solution: Build another array s_1, s_2, ..., s_n where s_i = a_1 + a_2 + ... + a_i (and s_0 = 0). The answer is s_r - s_{l-1}.
2. Problems in which you are asked to perform some queries that modify a range of elements (without printing queries).
All of these problems have the same solution. You just need to know how to solve one of them.
Example: You need to perform some queries on an array a_1, a_2, ..., a_n. Each query gives you numbers l, r and v, and for each i such that l ≤ i ≤ r you should increase a_i by v. After performing all queries, you should print the whole array.
Solution: Keep another array p_1, p_2, ..., p_{n+1}, all of whose members are initially 0. For each query, increase p_l by v and decrease p_{r+1} by v.
Then, for each i starting from 1, increase p_i by p_{i-1} (with p_0 = 0). The final array is a_1 + p_1, a_2 + p_2, ..., a_n + p_n.
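Both solutions above fit in a few lines. The following is only a sketch: the names buildPrefix, rangeSum, Update and applyUpdates are made up for illustration, and the arrays are 1-based with index 0 unused, to match the text.

```cpp
#include <cassert>
#include <vector>

// Type 1: s[i] = a[1] + ... + a[i], with s[0] = 0 (a[0] is unused).
std::vector<long long> buildPrefix(const std::vector<long long>& a) {
    std::vector<long long> s(a.size(), 0);
    for (int i = 1; i < (int)a.size(); ++i) s[i] = s[i - 1] + a[i];
    return s;
}

// Sum of a[l..r] in O(1), for 1 <= l <= r <= n.
long long rangeSum(const std::vector<long long>& s, int l, int r) {
    assert(1 <= l && l <= r && r < (int)s.size());
    return s[r] - s[l - 1];
}

// Type 2: one "increase a[l..r] by v" query.
struct Update { int l, r; long long v; };

// Applies all updates offline and returns the final array.
std::vector<long long> applyUpdates(std::vector<long long> a,
                                    const std::vector<Update>& qs) {
    int n = (int)a.size() - 1;
    std::vector<long long> p(n + 2, 0);  // one spare cell so p[r + 1] exists for r = n
    for (const Update& q : qs) {
        p[q.l] += q.v;
        p[q.r + 1] -= q.v;
    }
    for (int i = 1; i <= n; ++i) {
        p[i] += p[i - 1];  // p[i] is now the total increase of a[i]
        a[i] += p[i];
    }
    return a;
}
```

After O(n) preprocessing each sum query costs O(1), and q range updates cost O(n + q) in total.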
Disjoint sets
Disjoint sets are also useful data structures. Using them is fast and easy. We use them in many algorithms, like Kruskal's.
Disjoint sets, or DSU (Disjoint Sets Union), are, as the name says, a collection of sets. Imagine we have some boxes and some tools, and initially each tool is in one box. Mostly, we are given some queries and asked to merge two boxes, print the members of a box, or find which box some tool is in.
For the rest of this section, let's assume that initially there is exactly one tool in each box. That is, we have n tools and n boxes and, initially, tool number i is in box number i.
Trees
Trees are the most useful containers for DSU. For each vertex, we keep its parent (and the parent of a root is -1). So, initially all parents are set to -1, and we have queries to find the root of each box (having the root, we can easily find the box's index) and queries for merging two trees. For better time complexity, every time we find the root of a vertex, we set its parent to the root for the next queries. And while merging, we always want to minimize the height of the tree, so when we merge two boxes, it's like we put all the tools of the box with fewer tools in the other box.
The best way I've seen to code this kind of DSU is the style of bmerry: (C++)
In the code above, for each root v, par[v] equals the negative of number of tools in that box.
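The listing referred to here did not survive in this copy. The sketch below is not bmerry's original code, only a reconstruction consistent with the description: a single array par where a negative value marks a root and stores the negated box size. The struct and method names are assumptions.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct DSU {
    // par[v] < 0: v is a root and -par[v] is the number of tools in its box.
    // par[v] >= 0: par[v] is the parent of v.
    std::vector<int> par;

    explicit DSU(int n) : par(n, -1) {}

    int root(int v) {
        assert(0 <= v && v < (int)par.size());
        return par[v] < 0 ? v : (par[v] = root(par[v]));  // path compression
    }

    // Merges the boxes of x and y; returns false if they were already the same box.
    bool merge(int x, int y) {
        x = root(x);
        y = root(y);
        if (x == y) return false;
        if (par[y] < par[x]) std::swap(x, y);  // now x is the larger box
        par[x] += par[y];                      // sizes add up (both are stored negated)
        par[y] = x;                            // hang the smaller box under the larger one
        return true;
    }

    int size(int v) { return -par[root(v)]; }
};
```

For example, after DSU d(5); d.merge(0, 1); d.merge(1, 2); the call d.size(0) gives 3, while vertices 3 and 4 are still alone in their boxes.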
Arrays, vectors
We keep the tools of each box in a vector (or an array), and when we have a query to merge two boxes, we put all the tools of the box with fewer tools into the other box.
The time complexity is good because each tool is moved to another box at most log(n) times (each time, the size of the vector containing it at least doubles), so all merges together take O(n log(n)).
Problems : Hamro and tools, TROY Query (Join the group ACM-OI first)
Tries
Tries are a kind of rooted tree in which each edge has a character on it. Actually, a trie is a kind of DFA (Deterministic Finite Automaton). For a bunch of strings, their trie is the smallest rooted tree with a character on each edge such that each of these strings can be built by writing down the characters on the path from the root to some node.
Its advantage is that the LCP (Longest Common Prefix) of two of these strings is the LCA (Lowest Common Ancestor) of their nodes in the trie (the node of a string is the node for which we can build the string by writing down the characters on the path from the root to that node).
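To make the LCP = LCA remark concrete, here is a minimal sketch of a trie over lowercase letters. The names Trie, insert and lcpLength are assumptions for illustration, and the LCA is found by simply climbing parent pointers, which is enough to show the idea (it is not the fastest way to compute an LCA).

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Trie {
    // Node 0 is the root. child[v][c] is the node reached from v by letter 'a' + c, or -1.
    std::vector<std::array<int, 26>> child;
    std::vector<int> parent, depth;

    Trie() { newNode(-1, 0); }

    int newNode(int p, int d) {
        std::array<int, 26> none;
        none.fill(-1);
        child.push_back(none);
        parent.push_back(p);
        depth.push_back(d);
        return (int)child.size() - 1;
    }

    // Inserts a lowercase word and returns the node that spells it.
    int insert(const std::string& w) {
        int v = 0;
        for (char ch : w) {
            assert('a' <= ch && ch <= 'z');
            int c = ch - 'a';
            if (child[v][c] == -1) {
                int u = newNode(v, depth[v] + 1);  // create first: push_back may reallocate
                child[v][c] = u;
            }
            v = child[v][c];
        }
        return v;
    }

    // Length of the LCP of the strings spelled by nodes u and v = depth of their LCA.
    int lcpLength(int u, int v) const {
        while (u != v) {
            if (depth[u] < depth[v]) std::swap(u, v);
            u = parent[u];  // climb from the deeper node
        }
        return depth[u];
    }
};
```

Inserting "abcde" and "abcxy" gives two nodes whose LCA is the node of "abc", so lcpLength returns 3.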
Suffix array
A suffix array is a data structure that stores all the suffixes of a string sorted in lexicographic order.
One) Non-deterministic algorithm: Use Rabin-Karp hashing. To check whether a suffix is lexicographically less than another one, find their LCP using binary search + hash and then compare the next character after their LCP.
Code :
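namespace HashSuffixArray {
    const int MAXN = 1 << 21;
    typedef unsigned long long hash;
    const hash BASE = 137; // hash base; the exact value is an assumption
    int N;
    char * S;
    int sa[MAXN];
    hash h[MAXN], hPow[MAXN];
    // hash of S[lo .. lo + size - 1], computed modulo 2^64
    #define getHash(lo, size) (h[lo] - h[lo + size] * hPow[size])
    inline bool sufCmp(int i, int j) {
        int lo = 1, hi = min(N - i, N - j);
        while (lo <= hi) { // binary search for the LCP of suffixes i and j
            int mid = (lo + hi) >> 1;
            if (getHash(i, mid) == getHash(j, mid)) lo = mid + 1;
            else hi = mid - 1;
        }
        return S[i + hi] < S[j + hi]; // compare the first character after the LCP
    }
    void buildSA()
    {
        N = strlen(S);
        hPow[0] = 1;
        for (int i = 1; i <= N; ++i)
            hPow[i] = hPow[i - 1] * BASE;
        h[N] = 0;
        for (int i = N - 1; i >= 0; --i)
            h[i] = h[i + 1] * BASE + S[i], sa[i] = i;
        stable_sort(sa, sa + N, sufCmp);
    }
} // end namespace HashSuffixArray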
Two) Deterministic algorithm: We sort the suffixes in log(MaxLength) steps. In the i-th step (counting from 0), we sort them according to their first 2^i characters and put the suffixes with the same prefix of 2^i characters in the same bucket.
Code :
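// Suffix array in O(n log^2 n), LCP table in O(n)
#define REP(i, n) for (int i = 0; i < (int)(n); ++i)
namespace SuffixArray
{
    const int MAXN = 1 << 21;
    char * S;
    int N, gap;
    int sa[MAXN], pos[MAXN], tmp[MAXN], lcp[MAXN];
    // compares suffixes i and j by their first 2 * gap characters,
    // using the ranks (pos) computed in the previous step
    bool sufCmp(int i, int j)
    {
        if (pos[i] != pos[j])
            return pos[i] < pos[j];
        i += gap;
        j += gap;
        return (i < N && j < N) ? pos[i] < pos[j] : i > j;
    }
    void buildSA()
    {
        N = strlen(S);
        REP(i, N) sa[i] = i, pos[i] = S[i];
        for (gap = 1;; gap *= 2)
        {
            sort(sa, sa + N, sufCmp);
            REP(i, N - 1) tmp[i + 1] = tmp[i] + sufCmp(sa[i], sa[i + 1]);
            REP(i, N) pos[sa[i]] = tmp[i];
            if (tmp[N - 1] == N - 1) break;
        }
    }
    void buildLCP()
    {
        for (int i = 0, k = 0; i < N; ++i) if (pos[i] != N - 1)
        {
            for (int j = sa[pos[i] + 1]; S[i + k] == S[j + k];)
                ++k;
            lcp[pos[i]] = k;
            if (k)--k;
        }
    }
} // end namespace SuffixArray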
(Codes by mukel)
Heaps
A heap is a binary rooted tree (a rooted tree in which each node has at most 2 children) in which each vertex has a value.
Heap property: A heap usually has a property, such as the value of each vertex being equal to or greater than the values of its children (we call this a max heap). We can use heaps in heap sort.
Fibonacci heaps
A Fibonacci heap is a kind of heap with better amortized complexities. We don't need to know how a Fibonacci heap works. Note that C++'s priority_queue is a binary heap built on push_heap and pop_heap, not a Fibonacci heap; its push and pop work in O(log(n)), which is enough for most problems.
Red-black trees
A red-black tree is a kind of BST that is rebalanced after each query in such a way that its height remains O(log(n)).
Unfortunately, set does not have any function to find the k-th smallest element or to find the index of an element, but there is a data structure in GNU C++ which does this in O(log(n)) (and also provides all set functions), tree:
#include<bits/stdc++.h>
#include<ext/pb_ds/assoc_container.hpp>
#include<ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
using namespace std;
template <typename T>
using ordered_set = tree<T, null_type, less<T>, rb_tree_tag,
tree_order_statistics_node_update>;
int main(){
    ordered_set<int> s;
    s.insert(1);
    s.insert(3);
    cout << s.order_of_key(2) << endl; // the number of elements in s less than 2
    cout << *s.find_by_order(0) << endl; // print the 0-th smallest number in s (0-based)
}
(Thanks to Swift for syntax using!)
You can read more about it, just google sgi STL.
SQRT Decomposition
Suppose we have an array a_1, a_2, ..., a_n and k = sqrt(n). We partition this array into k pieces, each containing k elements of a.
Doing this, we can do a lot of things in O(sqrt(n)). Usually we use this in problems with modify and ask queries.
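As one possible illustration, the sketch below keeps the sum of every block, so a point assignment costs O(1) and a range sum costs O(sqrt(n)). The struct name SqrtSum and its interface are assumptions, and the code is 0-based with half-open ranges [l, r).

```cpp
#include <cassert>
#include <vector>

struct SqrtSum {
    int n, k;                         // k is the block length, about sqrt(n)
    std::vector<long long> a, block;  // block[b] = sum of a[b * k .. b * k + k - 1]

    explicit SqrtSum(const std::vector<long long>& init)
        : n((int)init.size()), k(1), a(init) {
        while (k * k < n) ++k;        // smallest k with k * k >= n
        block.assign(k, 0);           // at most k blocks are needed
        for (int i = 0; i < n; ++i) block[i / k] += a[i];
    }

    // a[p] = x in O(1)
    void modify(int p, long long x) {
        assert(0 <= p && p < n);
        block[p / k] += x - a[p];
        a[p] = x;
    }

    // a[l] + ... + a[r - 1] in O(sqrt(n))
    long long sum(int l, int r) const {
        assert(0 <= l && l <= r && r <= n);
        long long res = 0;
        int i = l;
        while (i < r) {
            if (i % k == 0 && i + k <= r) {  // a whole block fits: take its stored sum
                res += block[i / k];
                i += k;
            } else {                         // partial block: take elements one by one
                res += a[i];
                ++i;
            }
        }
        return res;
    }
};
```

A query touches at most two partial blocks (fewer than 2k single elements) and at most k whole blocks, which gives the O(sqrt(n)) bound.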
Problems : Holes, DZY Loves Colors, RMQ (range minimum query) problem
Sparse Table
The main problem that we can solve is the RMQ problem: we have an array a_1, a_2, ..., a_n and some queries. Each query gives you numbers l and r (l ≤ r) and you should print the value of min(a_l, a_{l+1}, ..., a_r).
Solving using Sparse Table: For each i with 1 ≤ i ≤ n and for each j with 0 ≤ j and i + 2^j - 1 ≤ n, we keep the value of min(a_i, a_{i+1}, ..., a_{i+2^j-1}) in st[i][j] (preprocess).
Then, for each query, first find the maximum x such that 2^x ≤ r - l + 1; the answer is min(st[l][x], st[r - 2^x + 1][x]).
So, the main idea of Sparse Table is to keep the value for each interval of length 2^k (for each k).
You can use the same idea for the LCA problem and many other problems.
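The preprocessing and the query can be sketched as follows. This is an illustrative 0-based version, not the original listing; note that the table here is indexed as st[j][i] (level first), the reverse of the st[i][j] used in the text, and the helper array lg is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct SparseTable {
    // st[j][i] = min(a[i], ..., a[i + 2^j - 1])
    std::vector<std::vector<int>> st;
    std::vector<int> lg;  // lg[len] = floor(log2(len))

    explicit SparseTable(const std::vector<int>& a) {
        int n = (int)a.size();
        lg.assign(n + 1, 0);
        for (int len = 2; len <= n; ++len) lg[len] = lg[len / 2] + 1;
        st.push_back(a);  // level 0: intervals of length 1
        for (int j = 1; (1 << j) <= n; ++j) {
            st.push_back(std::vector<int>(n - (1 << j) + 1));
            for (int i = 0; i + (1 << j) <= n; ++i)
                st[j][i] = std::min(st[j - 1][i], st[j - 1][i + (1 << (j - 1))]);
        }
    }

    // min(a[l], ..., a[r]), both ends inclusive, in O(1)
    int query(int l, int r) const {
        assert(0 <= l && l <= r && r < (int)st[0].size());
        int x = lg[r - l + 1];  // maximum x with 2^x <= r - l + 1
        return std::min(st[x][l], st[x][r - (1 << x) + 1]);
    }
};
```

The two intervals used in query may overlap, which is fine for min because counting an element twice does not change the result.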
Heavy-light decomposition
In this kind of decomposition, we partition the vertices of a rooted tree into chains, and each vertex belongs to exactly one chain. For a vertex v with child u, call the edge (v, u) heavy if size_of_subtree(u) * 2 ≥ size_of_subtree(v), and light otherwise; each chain is a path of heavy edges.
There is at most one such (heavy) child for each vertex v. If we consider the path from any vertex v to the root, there will be at most log(n) light edges on it (go from v to the root; every time we see a light edge, the size of the subtree is at least doubled). So, the number of chains on the way is O(log(n)).
For each of these chains, we can keep a container or another data structure, like a segment tree.
Problem : GRASS PLANTING
Fenwick
Suppose that we have n elements numbered from 1 to n.
Fenwick or BIT (Binary Indexed Tree) is a data structure with n nodes in which node number i has some information about the elements in the interval (i - (i & -i), i].
Actually, you don't need to know what each node contains. The only thing you should know is this (then you can change and convert it):
We have an array a_1, a_2, ..., a_n, all initially 0. We are given some queries of two types:
1. Increase a_p by val.
2. Print a_1 + a_2 + ... + a_p.
The only thing you should know is how to solve this problem using Fenwick (and then you can change it and solve many other problems).
We perform each query in O(log(n)). Code: (1-based)
int fen[MAX_N]; // n is the number of elements
void update(int p, int val){ // a[p] += val
    for(int i = p; i <= n; i += i & -i) fen[i] += val;
}
int sum(int p){ // a[1] + a[2] + ... + a[p]
    int ans = 0;
    for(int i = p; i > 0; i -= i & -i) ans += fen[i];
    return ans;
}
Segment tree
We have an array of elements and some queries on intervals, so we would be glad if we could split any interval into O(log(n)) intervals about which we already have some information.
A segment tree does that for you. It is a tree in which each node corresponds to an interval.
Each node has either 0 or 2 children (left and right). If a node's interval is [l,r) and l+1 < r, the intervals of its children are [l,mid) and [mid,r) in order, where mid = (l+r)/2 (rounded down), so the height of this tree is O(log(n)).
Each node has an index; the root has index 1 and the children of a vertex x have indices 2x and 2x+1 in order.
Segment tree is the most useful data structure, and every problem solvable by Fenwick is also solvable by a segment tree.
If the size of the root's interval is n, all node indices are smaller than 4n, so arrays of size 4n are enough.
To split an interval into some nodes of this tree, we act like this:
Suppose that S is the set of nodes whose union is [x,y) and no two different nodes in S intersect.
A node i with interval [l,r) is in S if and only if x ≤ l < r ≤ y and, if it has a parent with interval [b,e), x > b or e > y.
C++ code :
vector<int> s;
void split(int x,int y, int id = 1,int l = 0, int r = n){// id is the index of the node
    if(x >= r or l >= y) return ; // in this case, the intersection of [l,r) and [x,y) is empty
    if(x <= l && r <= y){
        s.push_back(id);
        return ;
    }
    int mid = (l+r)/2;
    split(x,y,id * 2,l,mid);
    split(x,y,id * 2 + 1,mid,r);
}
Example: Consider two types of queries:
1. S l r: print the sum of a_l, a_{l+1}, ..., a_r.
2. M p x: modify a_p to x, i.e. a_p = x.
First of all we need to build the segment tree. For each node we keep the sum of its interval (for node i we call it s[i]), so we should build the initial segment tree.
Modify function:
void modify(int p,int x,int id = 1,int l = 0,int r = n){
    s[id] += x - a[p];
    if(r - l < 2){ // l = r - 1 = p
        a[p] = x;
        return ;
    }
    int mid = (l + r)/2;
    if(p < mid)
        modify(p, x, id * 2, l, mid);
    else
        modify(p, x, id * 2 + 1, mid, r);
}
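The build and sum functions are not shown in this copy, so here is a self-contained sketch of the whole structure under the same conventions (root index 1, children 2x and 2x+1, half-open intervals [l,r)). The struct name SegTree and the wrapper methods are assumptions; the recursive bodies follow the split and modify logic above.

```cpp
#include <cassert>
#include <vector>

struct SegTree {
    int n;
    std::vector<long long> a, s;  // s[id] = sum of the interval of node id

    explicit SegTree(const std::vector<long long>& init)
        : n((int)init.size()), a(init), s(4 * init.size() + 4, 0) {
        if (n > 0) build(1, 0, n);
    }

    // a[p] = x
    void modify(int p, long long x) {
        assert(0 <= p && p < n);
        modify(p, x, 1, 0, n);
    }

    // a[x] + ... + a[y - 1]
    long long sum(int x, int y) const { return n > 0 ? sum(x, y, 1, 0, n) : 0; }

private:
    void build(int id, int l, int r) {
        if (r - l < 2) { s[id] = a[l]; return; }
        int mid = (l + r) / 2;
        build(id * 2, l, mid);
        build(id * 2 + 1, mid, r);
        s[id] = s[id * 2] + s[id * 2 + 1];
    }

    void modify(int p, long long x, int id, int l, int r) {
        s[id] += x - a[p];  // every node on the path to the leaf contains p
        if (r - l < 2) { a[p] = x; return; }
        int mid = (l + r) / 2;
        if (p < mid) modify(p, x, id * 2, l, mid);
        else modify(p, x, id * 2 + 1, mid, r);
    }

    long long sum(int x, int y, int id, int l, int r) const {
        if (x >= r || l >= y) return 0;        // empty intersection
        if (x <= l && r <= y) return s[id];    // node fully inside [x, y)
        int mid = (l + r) / 2;
        return sum(x, y, id * 2, l, mid) + sum(x, y, id * 2 + 1, mid, r);
    }
};
```

Note that sum here takes a half-open range, so the sum of a_l, ..., a_r from the example corresponds to sum(l, r + 1) in 0-based indexing.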
Lazy propagation
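Imagine we have updates on intervals. What should we do?
Example:
1. S l r: print the sum of a_l, a_{l+1}, ..., a_r.
2. I l r x: for each i such that l ≤ i < r, increase a_i by x.
We shouldn't update all the nodes in this interval, just the maximal ones, and then pass the update to the children when we need to. This trick is called Lazy Propagation. We keep another array lazy (one value per node), initially all 0, and every time we perform an increase query, we increase lazy[id] by x.
So, the build function is the same as above. But we need some more functions:
void upd(int id,int l,int r,int x){// increase all members in this interval by x
    lazy[id] += x;
    s[id] += (r - l) * x;
}
void shift(int id,int l,int r){// pass the pending update to the children
    int mid = (l+r)/2;
    upd(id * 2, l, mid, lazy[id]);
    upd(id * 2 + 1, mid, r, lazy[id]);
    lazy[id] = 0;// passing is done
}
void increase(int x,int y,int v,int id = 1,int l = 0,int r = n){
    if(x >= r or l >= y) return ;
    if(x <= l && r <= y){ upd(id, l, r, v); return ; }
    shift(id, l, r);
    int mid = (l+r)/2;
    increase(x, y, v, id * 2, l, mid);
    increase(x, y, v, id * 2 + 1, mid, r);
    s[id] = s[id * 2] + s[id * 2 + 1];
}
The sum function is the same as before, except that it calls shift(id, l, r) before descending into the children.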
Problems : GSS1, GSS3, MULTQ3, DQUERY, KQUERY, POSTERS, PATULJCI, New Year
Domino, Copying Data, DZY Loves Fibonacci Numbers, FRBSUM
Persistent data structures
Suppose we perform some updates on a data structure and afterwards want information about its state after each update.
For this purpose, you take a data structure and, after each update, save the resulting version of it.
The most useful data structure for this purpose is the segment tree. I will explain the persistent segment tree; all other data structures (like Fenwick) work similarly.
We have an array a_1, a_2, ..., a_n, and at first q update queries and then u ask queries, which you have to answer online.
Each update query gives you numbers p and v and asks you to increase a_p by v.
Each ask query gives you three numbers i, x and y and asks you to print the value of a_x + a_{x+1} + ... + a_y after performing the i-th query.
Each update query changes the value of O(log(n)) nodes in the segment tree, so you should keep the rest of the nodes (those not containing p) and create O(log(n)) new nodes. In total, you need O(q log(n)) nodes. So you cannot use the normal segment tree indexing; you should keep the indices of the children in the arrays L and R.
If you update a node, you should assign a new index to its interval (for the i-th query).
You should keep an array root[q], which gives you the index of the root's interval ([0,n)) after performing each query, and a number ir = 0, which is the root's index in the initial segment tree (and, of course, an array s[MAXNODES] holding the sum of the elements in each node). You should also have NEXT_FREE_INDEX = 1, which is always the next free index for a node.
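Build function (called as build(ir, 0, n) for the initial tree):
void build(int id, int l, int r){
    if(r - l < 2){ s[id] = a[l]; return ; }
    int mid = (l + r)/2;
    L[id] = NEXT_FREE_INDEX ++;
    R[id] = NEXT_FREE_INDEX ++;
    build(L[id], l, mid);
    build(R[id], mid, r);
    s[id] = s[L[id]] + s[R[id]];
}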
Update function: (its return value is the index of the interval in the new version of the segment tree, and id is the index of the old one)
(For the first query (with index 0) we should run root[0] = upd(p, v, ir), and for the rest of them, for the j-th query we should run root[j] = upd(p, v, root[j-1]).)
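The update listing is missing in this copy, so the following is a sketch of the idea rather than the original code. It differs from the description in two labeled ways: it starts from an all-zero array and uses node 0 as a shared empty node instead of calling build, and versions are numbered so that version 0 is the state before any update. The struct and method names are assumptions.

```cpp
#include <cassert>
#include <vector>

struct PersistentSegTree {
    int n;
    std::vector<int> L, R;     // indices of the children; node 0 is the shared empty node
    std::vector<long long> s;  // s[id] = sum of the interval of node id
    std::vector<int> root;     // root[i] = root of the tree after i updates

    explicit PersistentSegTree(int n_)
        : n(n_), L(1, 0), R(1, 0), s(1, 0), root(1, 0) {
        assert(n > 0);
    }

    // Increases a[p] by v, stores the new version and returns its number.
    int update(int p, long long v) {
        assert(0 <= p && p < n);
        int r = upd(p, v, root.back(), 0, n);
        root.push_back(r);
        return (int)root.size() - 1;
    }

    // a[x] + ... + a[y - 1] as it was after `version` updates.
    long long sum(int version, int x, int y) const {
        assert(0 <= version && version < (int)root.size());
        return sum(x, y, root[version], 0, n);
    }

private:
    // Copies node id into a fresh node, applies the change, returns the new index.
    int upd(int p, long long v, int id, int l, int r) {
        int lc = L[id], rc = R[id];
        long long val = s[id] + v;
        int cur = (int)s.size();  // next free index
        L.push_back(lc);
        R.push_back(rc);
        s.push_back(val);
        if (r - l < 2) return cur;
        int mid = (l + r) / 2;
        if (p < mid) { int c = upd(p, v, lc, l, mid); L[cur] = c; }
        else         { int c = upd(p, v, rc, mid, r); R[cur] = c; }
        return cur;
    }

    long long sum(int x, int y, int id, int l, int r) const {
        if (id == 0 || x >= r || l >= y) return 0;  // empty node or empty intersection
        if (x <= l && r <= y) return s[id];
        int mid = (l + r) / 2;
        return sum(x, y, L[id], l, mid) + sum(x, y, R[id], mid, r);
    }
};
```

Each update creates one new node per level, so q updates use O(q log(n)) nodes, and every old root stays valid because existing nodes are never modified.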
PrinceOfPersia, 8 days ago
Comments (40)
M.D thanks :)
nic11 (7 days ago):
I didn't know that priority_queue is a Fibonacci heap. BTW, are you sure? cplusplus.com and cppreferenc [...]
push() works in logarithmic time.
P.S.1 This link (Hamro and tools problem in DSU section) gives access error: link
P.S.2 Isn't "sparse" a correct spelling?
edogrigqv2 (7 days ago):
Great intro. Still have 1 question. Does order_set works only in g++? How it is going to work in Visual C++ [...]
adamant (7 days ago):
Btw, tree isn't part of C++ standard, it is extension of GNU C++. You can read more about this data structures here and here :)
Also about persistent data structures. One can be interested in this: #TphcLk
(k-th order statistics in the array segment using persistent bit trie. Fully online, works with [...] numbers also!)
Also you can use this structure to answer all queries from this problem.
speedy03 (3 days ago):
Got it, thanks! The official solution is also based on trie. Any specific adv [...]
segment tree?
edogrigqv2 (7 days ago):
There is probably a mistake in the DSU implementation. par[x] += par[y] ??? What is this ?
PrinceOfPersia (7 days ago):
Read it !
In the code above, for each root v, par[v] equals the negative of number of tools in that box.
So, par[x] = -sizeofbox(x), par[y] = -sizeofbox(y). So, par[x] + par[y] = -sizeofbox(x union y).
edogrigqv2 (7 days ago):
Have I understood it correct now ? par[v] shows the parent of v, if v is not the root, o [...]
negative number of nodes in group. 2 in 1 array!!!
PrinceOfPersia (7 days ago):
Yep !
ikbal (7 days ago):
I thought c++ priority queues are binary heap. As far as I remember Fibonacci heaps has huge constant f [...]
share source where you get this information?
edogrigqv2 (7 days ago):
In my Visual C++ 2013 header of I have found make_heap and push_heap that use the same [...]
void push(const value_type& _Val)
{ // insert value in priority order
    c.push_back(_Val);
    push_heap(c.begin(), c.end(), comp);
}
Maybe g++ uses Fibonacci heap. But it is no more part of STL. cplusplus.com usually hints if i [...]
compilers something maybe different. Here it only states that priority_queue uses make_heap, [...]
pop_heap.
I also have found this.
Bugman (7 days ago):
Fenwick can be 0-based! Change i+=i&-i to i|=i+1 and change i-=i&-i to i=(i&(i+1))-1
Bugman (3 days ago):
:)
shu_mj (4 days ago):
At "Arrays, vectors", you said, "So time complexity would be O(n.log(n)) ."
However, I think the time complexity is O(n). The count of copied item is (1 + 2 + 4 + ... + N) where N < n [...]
non-negative integer. So the count of copied item is (N * 2 - 1), if original array-modify involved, the cou [...]
+ n). Time complexity is O(n).
Perlik's blog
Implicit cartesian tree in GNU C++ STL.
As it turned out, the rope is actually implemented in some versions of STL, for example, in SGI STL, and I should say that it's the most complete documentation of this class I've ever seen. And now let's find the rope in GNU C++. I used this task for testing. Here you should quickly move the block [l,r] to the beginning of the array 10^5 times, and the size of the array is not greater than 10^5:
#include <iostream>
#include <cstdio>
#include <ext/rope> //header with rope
using namespace std;
using namespace __gnu_cxx; //namespace with rope and some additional stuff
int main()
{
ios_base::sync_with_stdio(false);
rope <int> v; //use as usual STL container
int n, m;
cin >> n >> m;
for(int i = 1; i <= n; ++i)
v.push_back(i); //initialization
int l, r;
for(int i = 0; i < m; ++i)
{
cin >> l >> r;
--l, --r;
rope <int> cur = v.substr(l, r - l + 1);
v.erase(l, r - l + 1);
v.insert(v.mutable_begin(), cur);
}
for(rope <int>::iterator it = v.mutable_begin(); it != v.mutable_end(); ++it)
cout << *it << " ";
return 0;
}
It works perfectly, but it is 2 times slower than the handwritten implicit cartesian tree, though it uses less memory. As far as I can see, GNU C++ has the same implementation of rope as SGI STL. Visual C++ doesn't support the rope class.
There are several points that you should know about rope in C++. From SGI STL's documentation it's known that the rope doesn't cope well with modifications of single elements (that's why begin() and end() return const_iterator; if you want to use classic iterators, you should call mutable_begin() and mutable_end()). But my tests show that it's pretty fast (about O(log n)). Meanwhile, operator += performs in O(1) (of course, if we don't consider the time needed to construct the object on the right side).
You can see some features of the [ ] operator in the following code: http://pastebin.com/U8rG1tfu. Since the developers want to keep the rope in a permanent state, the operator [ ] returns a const reference, but there is a special method to overcome this. This solution works with the same speed. Furthermore, I forgot to mention that all iterators are RandomAccess.
If you test this container, please tell about your experience in the comments. In my opinion, we got a pretty fast array with O(log n) complexity for all operations :)
Perlik, 12 months ago
Comments (10)
Perlik (12 months ago):
Cool! I have also found the patricia trie and splay tree there as well as the binomial heap and s [...]
structures, but it seems to me that they are incomplete.
Zlobober (12 months ago):
Well, Last night I was investigating contents of pb_ds (Politics based data strucutre [...]
by default in g++. I've found that implementations of binary search trees there are a [...]
they support traveling with node iterators (such as moving to the left/right child, dere [...]
everything we need), splitting, mergeing and even contatining additional information [...]
its recalculation after every change of tree structure (such as size of subtree, minim [...]
There is also forward-linked list and bunch of hashmaps (I didn't tested yet).
That trees are really what we could use in our contests, If it hadn't linear-time split. [...]
to the developers of that library with some questions about why they didn't impleme [...]
Waiting for the answer from them.
(12 months ago):
What's their point of doing this in Linear time when it can be done in con [...]
really weird! Nice effort by the way I really appreciate it :)
adelnobel (12 months ago):
Excuse me, but in which file did you find the linear split method ? As from [...]
this page
http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/tree_based_containers [...]
You will find this: "These methods are efficient red-black trees are spl [...]
logarithmic complexity; ordered-vector trees are split and joined at linear [...]
alternatives have super-linear complexity."
So only ordered-vector trees are splitted in linear time, you should be alr [...]
black tree as the underlying data structure by specifying the rb_tree_tag [...]
object?
Zlobober (12 months ago):
(/ext/pb_ds/detail/bin_search_tree_/split_join_fn_imps.hpp, #1 [...]
Of course std::distance for iterators of this tree works in lin [...]
iterator until it is equal to the second iterator). You can run my [...]
big test to ensure that I'm right. It's surprising, but in this place [...]
is wrong.
adelnobel (12 months ago):
Aha I see...
niklasb (10 months ago):
Did the authors reply by now? I think it was a design issue because to up [...]
quickly one needs the subtree size augmentation that is only provided by [...]
the tree_order_statistics_node_update mixin.
We can work around the problem by overloading std::distance, but it' [...]
pretty: https://github.com/niklasb/contest-algos/blob/master/stl_splay_tree [...]
a lot of code to reimplement order statistics, so it's probably not really wo [...]
manually coded treap seems to be much faster and doesn't require a lot [...]
paladin8's blog
Z Algorithm
Today, the topic will be the Z Algorithm, which I learned as a result of failing to solve Problem B of Beta Round 93 (http://codeforces.com/contest/126/problem/B). There are some other solutions like binary search + hashing, but this one is quite nice. Anyway, first, a description of the algorithm and why it works; it is simple and makes a lot of sense (as all good algorithms are).
Algorithm
Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from S[i] which is also a prefix of S, i.e. the maximum k such that S[j] = S[i+j] for all 0 ≤ j < k. Note that Z[i] = 0 means that S[0] ≠ S[i]. For easier terminology, we will refer to substrings which are also a prefix as prefix-substrings.
The algorithm relies on a single, crucial invariant. As we iterate over the letters in the string (index i from 1 to n-1), we maintain an interval [L,R] which is the interval with maximum R such that 1 ≤ L ≤ i ≤ R and S[L...R] is a prefix-substring (if no such interval exists, just let L = R = -1).
For i=1, we can simply compute L and R by comparing S[0...] to S[1...]. Moreover, we also
get Z[1] during this.
Now suppose we have the correct interval [L,R] for i-1 and all of the Z values up to i-1. We will
compute Z[i] and the new [L,R] by the following steps:
If i > R, then there does not exist a prefix-substring of S that starts before i and ends at or after i. If such a substring existed, [L,R] would have been the interval for that substring rather than its current value. Thus we "reset" and compute a new [L,R] by comparing S[0...] to S[i...] and get Z[i] at the same time (Z[i] = R-L+1).
Otherwise, i ≤ R, so the current [L,R] extends at least to i. Let k = i-L. We know that Z[i] ≥ min(Z[k], R-i+1) because S[i...] matches S[k...] for at least R-i+1 characters (they are in the [L,R] interval which we know to be a prefix-substring). Now we have a few more cases to consider.
If Z[k] < R-i+1, then there is no longer prefix-substring starting at S[i] (or else Z[k] would be larger), meaning Z[i] = Z[k] and [L,R] stays the same. The latter is true because [L,R] only changes if there is a prefix-substring starting at S[i] that extends beyond R, which we know is not the case here.
If Z[k] ≥ R-i+1, then it is possible for S[i...] to match S[0...] for more than R-i+1 characters (i.e. past position R). Thus we need to update [L,R] by setting L = i and matching from S[R+1] forward to obtain the new R. Again, we get Z[i] during this.
The process computes all of the Z values in a single pass over the string, so we're done.
Correctness is inherent in the algorithm and is pretty intuitively clear.
Analysis
We claim that the algorithm runs in O(n) time, and the argument is straightforward. We never
compare characters at positions less than R, and every time we match a character R increases by
one, so there are at most n comparisons there. Lastly, we can only mismatch once for each i (it
causes R to stop increasing), so that's another at most n comparisons, giving O(n) total.
Code
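Simple and short. Note that the optimization L = R = i is used when S[0] ≠ S[i] (it doesn't affect the algorithm since at the next iteration i > R regardless).
int L = 0, R = 0;
for (int i = 1; i < n; i++) {
    if (i > R) {
        L = R = i;
        while (R < n && s[R-L] == s[R]) R++;
        z[i] = R-L; R--;
    } else {
        int k = i-L;
        if (z[k] < R-i+1) z[i] = z[k];
        else {
            L = i;
            while (R < n && s[R-L] == s[R]) R++;
            z[i] = R-L; R--;
        }
    }
}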
Application
One application of the Z Algorithm is for the standard string matching problem of finding matches for a pattern T of length m in a string S of length n. We can do this in O(n+m) time by using the Z Algorithm on the string T Φ S (that is, concatenating T, Φ, and S) where Φ is a character that matches nothing. The indices i with Z[i] = m correspond to matches of T in S.
Lastly, to solve Problem B of Beta Round 93, we simply compute Z for the given string S, then iterate i from 1 to n-1. If Z[i] = n-i then we know the suffix from S[i] is a prefix, and if the largest Z value we've seen so far is at least n-i, then we know some string inside also matches that prefix. That gives the result.
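The string matching application can be sketched like this. The Z function below is written in a compact form that tracks a half-open window [l, r) instead of the closed [L,R] used in the post, but it produces the same array. The function names and the default separator '\x01' are assumptions; the separator must not occur in either string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// z[i] = length of the longest prefix of s that starts at position i (z[0] is left as 0).
std::vector<int> zFunction(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {  // [l, r) is the prefix-substring with maximum r
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Returns all 0-based positions where pattern t occurs in s.
std::vector<int> findMatches(const std::string& t, const std::string& s, char sep = '\x01') {
    assert(!t.empty());
    assert(t.find(sep) == std::string::npos && s.find(sep) == std::string::npos);
    std::string joined = t + sep + s;
    std::vector<int> z = zFunction(joined);
    int m = (int)t.size();
    std::vector<int> res;
    for (int i = m + 1; i < (int)joined.size(); ++i)
        if (z[i] == m) res.push_back(i - m - 1);  // shift back to an index in s
    return res;
}
```

Because the separator matches nothing, no Z value in the S part can exceed m, so Z[i] = m is exactly the condition for a full match.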
paladin8, 3 years ago
Comments (36)
thphong (3 years ago):
The are many good blogs on Codeforces about algorithm. If they are post in same place (as tutorial) is s [...]
wil93 (3 years ago):
He simply said that there should be a place where all useful blog posts like this are [...]
paladin8 (3 years ago):
I agree. Maybe systematically tagging them with "tutorial" would do the trick.
abinash (3 weeks ago):
If you know any new tutorial blog post please comment , I will add them .
Thanks paladin8 for his awesome tutorial.
Empty (3 years ago):
That site is awesome, if someone could translate it to english, i think that it would be the best a [...]
reference ever.
(21 months ago):
The KMP algorithm computes memo[i] which is the length of the longest prefix of s that matches the tail o
s[i]. What are the advantages of the z-algorithm over KMP? Here's my code using KMP that solves the C
password problem.
import java.io.*;
import java.util.*;
public class D {
public static void main(String[] args) throws IOException{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String s = in.readLine();
char[] c = s.toCharArray();
int n = c.length;
int[] memo = new int[n];
/* memo[i] will store the length of the longest prefix of s that matches the tail of s1...si */
int max = 0;
for (int i=1; i<n-1; i++) max = Math.max(max, memo[i]);
/* max = Maximum internal match to prefix */
j = memo[n-1];
while (j>max) j = memo[j-1];
System.out.println((j==0)? "Just a legend" : s.substring(0, j));
}
}
paladin8 (3 years ago):
As far as I know, both approaches work fine. I learned KMP before, but it didn't seem very intu [...]
never fully understood string matching. The Z Algorithm is, in my opinion, easier to comprehen [...]
code, too). This motivated me to write this blog post :)
SkidanovAlex (3 years ago):
How?
I_love_natalia (3 years ago):
A portion of necroposting.
You can solve 2 problems: 1) Create a sample string with given Z-functi [...]
alphabet. Just follow Z-function generation algorithm and store all the eq [...]
check the result. 2) Create a sample string with given prefix-function ove [...]
Just follow prefix function generation algorithm and store all the equalitie [...]
check the result.
(in Russian: http://contest.samara.ru/ru/problemset/735/)
liruqi (16 months ago):
I dont know if your Z<-> prefix transformation is mutual. I've written a post to try to c [...]
kmp, http://liruqi.info/post/62511983824/kmp-z (In Chinese). Beside, you may chec [...]
here, https://raw.github.com/liruqi/topcoder/master/HackerRank/SaveHumanity.cpp
Burunduk1 (3 years ago):
No advantages =)
If z[i] = x, it means substrings [0..j) [i..i+j) are equals for all j from 0 to x.
If p[i] = x, it means [0..j) = (i-j..i] for all j = x, p[x], p[p[x]] and so on.
2) LCP (Largest Common Prefix)
Z-function in fact calculates LCP[0,j] for all j. It can be used for not only substring searching.
I also have two examples of problems which, I hope, show advantages Z-function over Prefix-f [...]
1) Determine number (No.) of the string in its suffix array in O(n).
2) Determine number (amount) of substrings, which have at least two occuarances in the strin [...]
course, you can solve it in O(n) using suffix tree).
kien_coi_1997 (8 months ago):
Your comment is valuable. But you have some error, I will show them, to prevent o [...]
misunderstanding. (Someone reopen the post, so I read your comment)
1) "monotony"
If z[i] = x, it means substrings [0..j) [i..i+j) are equals for all j from 0 to x-1.
If p[i] = x, it means [0..j) = (i-j..i] for all j = x, p[x], p[p[x]] and so on.
Burunduk1 (8 months ago):
Thanks. Fixed.
simp1eton (3 years ago):
Wondering, is there any reason why the z-algorithm is not as famous as KMP? This is the first time I hear [...]
seems that z-algorithm is easier to understand.
aquamongoose (3 years ago):
Sorry to revive this old topic, but I have a question. Let's say that i < R and z[i-L] > R-i+1. Wouldn't thi [...]
= R-i+1? Here is my reasoning.
s[R+1] != s[R-L+1], because if they were equal then z[L] would be greater than R-L+1.
In order for z[i] to be greater than R-i+1, s[R+1] must equal s[i-L+R-i+1].
This can be rewritten as s[R+1] must equal s[R-L+1], but it doesn't because of the first statement.
Is this true?
(3 years ago):
z array : 0102310
forthright48:
Amazing tutorial. Thanks to you and e-maxx I have learned a new algorithm today :D
Vivekscripts (22 months ago):
There seems to be one more condition missing in the routine, actually not missing but Algo is doing more [...]
case. So, there can be three cases when I < R and there are as follows
1) Z[k] < R-I+1 in this case Z[I] = Z[k] //Z[k] Length will be strictly less than R-I+1 2) Z[K] > R-I+1 in this ca [...]
//This is the missing case. 3) Z[k] = R-I+1 in this case Z[I] = R-I +1 + [keep computing the Z values startin [...]
position R] // This is because we know Z[K] is strictly = R-I+1 (Beta) but Z[I] can still match with Patterns n [...]
thus start computing the Z values for this Z[I] starting position from R until the mismatch is found.
saadtaame (8 months ago):
One of my favorite posts. Quick question: the algorithm seems to ignore z[0]; shouldn't it be the case tha [...]
Thanks.
saadtaame (8 months ago):
S is the string. S[i] is the symbol at position i. If S = "abcd", then S[0] = 'a', S[1] = 'b' [...]
A little bit of classics: dynamic programming over subsets and paths in graphs
By Ripatti, 5 years ago, translation
The author thanks adamax for translating this article into English.
Introduction
After Codeforces Beta Round #11 several participants expressed a wish to read something about
problems similar to problem D of that round. The author of this article, for the purpose of helping
them, tried searching for such information, but to his surprise couldn't find anything on the Internet. It
is not known whether the search was not thorough or there's really nothing out there, but (just in
case) the author decided to write his own article on this topic.
In some sense this article may be regarded as a tutorial for the problem D from Beta Round #11.
In this article quite well-known algorithms are considered: the search for optimal Hamiltonian walks
and cycles, finding their number, check for existence and something more. The so-called "dynamic
programming over subsets" is used. This method requires exponential time and memory and
therefore can be used only if the graph is very small - typically 20 vertices or less.
DP over subsets
Consider a set of elements numbered from 0 to N-1. Each subset of this set can be encoded by a
sequence of N bits (we will call this sequence "a mask"). The i-th element belongs to the subset if
and only if the i-th bit of the mask equals 1. For instance, the mask 00010011 means that the
subset of the set [0... 7] consists of elements 0, 1 and 4. There are 2^N masks in total, and
so 2^N subsets. Each mask is in fact an integer written in binary notation.
The method is to assign a value to each mask (and, therefore, to each subset) and compute the
values for new masks using already computed values. As a rule, to find the value for a subset A we
remove an element in every possible way and use values for obtained subsets A'_1, A'_2, ..., A'_k to
compute the value for A. This means that the values for A'_i must have been computed already, so
we need to establish an ordering in which masks will be considered. It's easy to see that the natural
ordering will do: go over masks in increasing order of corresponding numbers.
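A minimal sketch of why the natural ordering works, using a deliberately simple value: dp[mask] is the sum of the weights of the elements in the subset. The function name subset_sums and the weight array w are assumptions made for this illustration only. Removing an element clears a bit, so the resulting mask is a smaller integer and has already been processed.

```cpp
#include <cstddef>
#include <vector>

// dp[mask] = sum of w[i] over all elements i contained in mask.
std::vector<long long> subset_sums(const std::vector<long long>& w) {
    const int n = static_cast<int>(w.size());  // assumed small, about 20 or less
    std::vector<long long> dp(std::size_t{1} << n, 0);
    for (std::size_t mask = 1; mask < dp.size(); ++mask) {
        int i = 0;
        while (((mask >> i) & 1) == 0) ++i;  // lowest element of the subset
        const std::size_t smaller = mask ^ (std::size_t{1} << i);  // subset without i
        dp[mask] = dp[smaller] + w[i];  // smaller < mask, so dp[smaller] is ready
    }
    return dp;
}
```

With w = {3, 5, 10}, the mask 101 (elements 0 and 2) gets the value 13.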
Let dp[mask][i] be the length of the shortest Hamiltonian walk in the subgraph generated by
vertices in mask, that ends in the vertex i.
The answer is the minimum of dp[2^n-1][i] over all vertices i, where n is the number of vertices.
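The definition of dp[mask][i] above can be sketched in C++ as follows. This is an illustration, not the author's code: the adjacency-matrix input d, the NO_EDGE marker and the function name are all assumptions. A single-vertex subset has walk length 0, and a larger subset ending in i is obtained by removing i and trying every possible previous vertex j.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

const long long NO_EDGE = -1;  // hypothetical marker: d[j][i] == NO_EDGE means no edge

// d is an n x n matrix of edge weights. Returns the length of the shortest
// Hamiltonian walk, or -1 if the graph has none.
long long shortest_hamiltonian_walk(const std::vector<std::vector<long long>>& d) {
    const int n = static_cast<int>(d.size());
    if (n == 0) return 0;
    const long long INF = std::numeric_limits<long long>::max() / 4;
    std::vector<std::vector<long long>> dp(std::size_t{1} << n,
                                           std::vector<long long>(n, INF));
    for (int i = 0; i < n; ++i) dp[1 << i][i] = 0;  // walks with a single vertex
    for (int mask = 1; mask < (1 << n); ++mask) {
        for (int i = 0; i < n; ++i) {
            if (((mask >> i) & 1) == 0) continue;  // i must belong to the subset
            const int prev = mask ^ (1 << i);      // the subset without i
            for (int j = 0; j < n; ++j) {
                if (((prev >> j) & 1) == 0) continue;
                if (d[j][i] == NO_EDGE || dp[prev][j] == INF) continue;
                dp[mask][i] = std::min(dp[mask][i], dp[prev][j] + d[j][i]);
            }
        }
    }
    const long long best = *std::min_element(dp.back().begin(), dp.back().end());
    return best == INF ? -1 : best;
}
```

The table has 2^n rows of n entries and each entry is filled in O(n), which is where the memory and time bounds for this kind of DP come from.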
This solution, like solution 2, requires O(2^n n) of memory and O(2^n n^2) of time. It can be improved in
the following way.
Let dp'[mask] be the mask of the subset consisting of those vertices j for which there exists a
Hamiltonian walk over the subset mask ending in j. In other words, we 'compress' the previous
DP: dp'[mask] equals the sum of 2^j over all such vertices j. For the graph G write out n masks M_i,
which give the subset of vertices incident to the vertex i. That
is, M_i is the sum of 2^j over all vertices j adjacent to i.
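A minimal sketch of the compressed DP, assuming the graph is given directly as the adjacency masks M[i] (bit j of M[i] is set when j is adjacent to i). The function name is hypothetical. A walk over mask can end in i exactly when some walk over mask without i ends in a neighbour of i, and that test is a single bitwise AND.

```cpp
#include <cstddef>
#include <vector>

// M[i] is the adjacency mask of vertex i. Returns true if the graph
// has a Hamiltonian walk.
bool has_hamiltonian_walk(const std::vector<unsigned>& M) {
    const int n = static_cast<int>(M.size());
    if (n == 0) return false;
    // dp[mask] = mask of vertices in which a Hamiltonian walk over mask can end
    std::vector<unsigned> dp(std::size_t{1} << n, 0u);
    for (int i = 0; i < n; ++i) dp[1u << i] = 1u << i;
    for (unsigned mask = 1; mask < (1u << n); ++mask) {
        for (int i = 0; i < n; ++i) {
            if (((mask >> i) & 1u) == 0) continue;
            const unsigned prev = mask ^ (1u << i);
            // Some walk over prev ends in a neighbour of i.
            if (prev != 0 && (dp[prev] & M[i]) != 0) dp[mask] |= 1u << i;
        }
    }
    return dp[(1u << n) - 1] != 0;
}
```

In this form the table holds one integer per mask and each entry costs O(n) to fill.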
So dp[mask][i] contains the length of the shortest Hamiltonian walk over the subset mask, starting
at 0 and ending at i.
The answer is .
Exercises
CFBR11D
CCTOOLS
P.S. This article may be extended and fixed in future. The author would be grateful for
supplementing the Exercises section and for pointing out any mistakes and inaccuracies.
Comments (14)
5 years ago, # |
+3
http://www.spoj.pl/problems/ASSIGN/
For example
AlexErofeev Reply
5 years ago, # |
0
Nice tutorial. You could also add an explanation on how to count hamiltonian circuits on a grid of size n x
and m <= 10^9) for example which is also fairly classical.
jaher Reply
5 years ago, # ^ |
+1
Counting Hamiltonian circuits on a grid is quite different from problems which were viewed there
viewforemost masks over subsets and paths in graphs. But in your problem Motzkin words and
multiplication are used:)
Maybe I will view your problem in a future article:)
Ripatti
Reply
5 years ago, # ^ |
0
http://www.cs.ust.hk/mjg_lib/Library/Kwong_Ham_94.pdf
it is very old paper:)
Ripatti Reply
4 years ago, # ^ |
0
Top up
giongto35 Reply
4 years ago, # |
0
http://www.codechef.com/problems/TOOLS
is an example problem for part 5.
havaliza Reply
8 months ago, # |
0
Could you please further explain the recurrence in the first example (finding a shortest Hamiltonian walk)?
saadtaame Reply
8 months ago, # |
Rev. 3 0
It contains lots of preliminary analysis and at least the DP approaches described in 1. and 5. of your post.
algorithm is actually called "Held-Karp algorithm" in academic literature, but it was described independen
et al. in another paper.
A list of optimal algorithms for TSP with better runtime complexity can be found here.
Reply
6 months ago, # |
0
In 4th section's second dp formulation second part, the condition stated is count(mask)>1 only. Isn't bi
1condition should also be included?
saurabh060792 Reply
Harta's blog
Dynamic Programming Type
By Harta, 5 years ago
Dynamic Programming (DP) :
b. LIS
Problem: 1. Beautiful People
2. MDOLLS
3. MSTICK
4. MCARDS
c. Edit Distance
e. Knapsack
Problem: 1. Scubadiv
2. Advance DP
a. DP k-th lexicographical string
Problem: 1. z-01 paths
2. z-board
3. Linear Garden (IOI 2008)
b. DP tree
Problem: 1. z-sumpaths
2. River (IOI 2005)
3. z-company
4. Greedy Hydra (CNOI 2002)
5. VOCV
6. PT07F
7. PT07X
8. nagibni
e. DP pre-processing
Problem: 1. Oil (APIO 2009)
2. Garden (IOI 2005)
3. Pyramid (IOI 2006)
f. DP bitmask
Problem: 1. Reklame
2. Chess
3. Bond
4. TRSTAGE
5. HIST2
6. LAZYCOWS
i. DP + trie
Problem: 1. MORSE
j. DP + geometry
Problem: 1. MPOLY
2. CVXPOLY
3. MTRIAREA
k. DP + Binary Search
Problem: 1. Game (IOI 2008, Practice session)
l. DP + Knuth Optimization
Problem: 1. Breaking Strings
Other Problems in SPOJ can be found here by pt1989
Thanks to pt1989
Here are problems in acm.sgu.ru 269, 273, 304, 317, 356, 396, 445, 447, 458, 489, 494
Thanks to natalia
Reference:
1. Topcoder
2. Codechef
Segment Tree
By paladin8, 3 years ago
This blog post is motivated by an interesting problem I solved today: Timus 1846. I knew about
segment trees previously for solving range-minimum query, but was not fully aware of the obvious
generalizations of this nice data structure. Here I will describe the basics of a segment tree and
some applications.
Problem
The problem that a segment tree can solve is the following. We are given an array of
values a[0],a[1],...,a[N-1]. Assume without loss of generality that N=2^n; we can generally
pad the computations accordingly. Also, consider some associative binary function f. Examples
of f include sum, min, max, or gcd (as in the Timus problem). Segment trees allow for each of the
following two operations in O(log N) time:
compute f(a[i],a[i+1],...,a[j]) for i ≤ j; and
update a[x]=v.
Description
So how does a segment tree work? It's quite simple, in fact. We build a single two-dimensional
array t[0...N-1][0..n] where t[x][y]=f(a[x],a[x+1],...,a[x+2^y-1]) where x is divisible
by 2^y (i.e. most of the entries are ignored, but it is easiest to represent this way). We can initialize the
entries by the following procedure:
t[x][0]=a[x]; and
t[x][y]=f(t[x][y-1],t[x+2^(y-1)][y-1]) for y=1,...,n,
where the second step uses the associative property of f. The logic is that t[x][y-1]
computes f over the range [x,x+2^(y-1)) and t[x+2^(y-1)][y-1] computes f over the
range [x+2^(y-1),x+2^y) so when we combine them we get the corresponding computation
of f for t[x][y]. Now we can describe how each of the two operations is implemented (at a high
level).
We'll start with computing f(a[i],a[i+1],...,a[j]). It's pretty easy to see that this amounts to
representing the interval [i,j] as disjoint intervals [i_1,j_1]; [i_2,j_2]; ...; [i_k,j_k] where each of these is of
the form [x,x+2^y-1] where x is divisible by 2^y (i.e. it shows up in the table t somewhere). Then
we can just combine them with f (again using the associative property) to get the result. It is easy to
show that k=O(log N) as well.
Now for updating a[x]=v, notice that there are only a few terms in the segment tree which get
affected by this change. Obviously, t[x][0]=v. Also, for y=1,...,n, only the entry
t[x-(x&(2^y-1))][y] will update. This is because x-(x&(2^y-1)) is the only value divisible by 2^y (notice the
last y bits are zero) that also covers x at level y. Then it suffices to use the same formula as in the
initialization to recompute this entry in the tree.
Code
Here's a somewhat crude implementation of a segment tree. Note that the value of IDENTITY should
be such that f(IDENTITY,x)=x, e.g. 0 for sum, +∞ for min, -∞ for max, and 0 for gcd.
void init() {
t[x][0] = a[x];
t[x][0] = a[x] = v;
int xx = x-(x&((1<<y)-1));
i += (1<<h);
while (i < j) {
i += (1<<h);
return res;
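The listing above survives only in fragments in this copy, so here is a self-contained sketch written from the Description section rather than a restoration of the author's code. It assumes f = min with IDENTITY = INT_MAX; the struct name and the way get picks its blocks are choices made for this illustration. The query greedily takes, at each position, the largest aligned block that still fits, which is simple but searches for the level from scratch each time (so up to O(log^2 N) per query instead of O(log N)).

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

const int IDENTITY = INT_MAX;                   // f(IDENTITY, x) == x for f = min
int f(int a, int b) { return std::min(a, b); }  // any associative function works

struct SegmentTree {
    int n, N;                         // N == 2^n
    std::vector<std::vector<int>> t;  // t[x][y] = f(a[x], ..., a[x + 2^y - 1])

    // Pads the input with IDENTITY up to the next power of two.
    explicit SegmentTree(const std::vector<int>& a) : n(0), N(1) {
        while (N < static_cast<int>(a.size())) { N *= 2; ++n; }
        t.assign(N, std::vector<int>(n + 1, IDENTITY));
        for (int x = 0; x < static_cast<int>(a.size()); ++x) t[x][0] = a[x];
        for (int y = 1; y <= n; ++y)
            for (int x = 0; x < N; x += (1 << y))
                t[x][y] = f(t[x][y - 1], t[x + (1 << (y - 1))][y - 1]);
    }

    void set(int x, int v) {
        assert(0 <= x && x < N);
        t[x][0] = v;
        for (int y = 1; y <= n; ++y) {
            const int xx = x - (x & ((1 << y) - 1));  // level-y block covering x
            t[xx][y] = f(t[xx][y - 1], t[xx + (1 << (y - 1))][y - 1]);
        }
    }

    // f(a[i], ..., a[j]) with both ends inclusive.
    int get(int i, int j) const {
        assert(0 <= i && i <= j && j < N);
        int res = IDENTITY;
        const int end = j + 1;
        while (i < end) {
            int y = 0;  // grow the block while it stays aligned and inside [i, end)
            while (y < n && i % (1 << (y + 1)) == 0 && i + (1 << (y + 1)) <= end) ++y;
            res = f(res, t[i][y]);
            i += (1 << y);
        }
        return res;
    }
};
```

Swapping f and IDENTITY for gcd and 0 gives the variant used for the Timus problem mentioned in this post.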
Application
One possible application is letting f just be the sum of all arguments. This leads to a data structure
that allows you to compute sums of ranges and do updates, but is slightly less efficient than a Binary
Indexed Tree. The most well-known application seems to be the range-minimum query problem,
where f is the minimum of all arguments. This also allows you to compute the least-common
ancestor of nodes in a tree.
Lastly, in Timus 1846, we can use it with f as the GCD function. Every time we remove a number we
should update the corresponding entry to zero (effectively removing it from the GCD computation).
Then the answer at each step is f(a[0],a[1],...,a[k]) which can be computed directly from the
segment tree.
Comments (29)
3 years ago, # |
+14
Actually, it's known that you need only O(N) memory to store Segment Tree with N leaves. As you see,
many elements, that are not being updated or used at all. You can store segment tree in one-dimension
Root is being number 1. Two children of node v are 2v and 2v+1.
So the parent of node v is v/2. One can easily prove that it contains only O(N) nodes.
niyaznigmatul
Reply
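A minimal sketch of the one-dimensional layout described in the comment above, for range sums. The struct name SumTree and the bottom-up query loop are assumptions made for this illustration; the indexing (root 1, children 2v and 2v+1, parent v/2) is the one from the comment. The leaves sit at positions N..2N-1, so the whole tree fits in an array of size 2N.

```cpp
#include <cassert>
#include <vector>

struct SumTree {
    int N;                     // number of leaves, a power of two
    std::vector<long long> t;  // t[1] is the root; children of v are 2v and 2v+1

    explicit SumTree(const std::vector<long long>& a) : N(1) {
        while (N < static_cast<int>(a.size())) N *= 2;
        t.assign(2 * N, 0);
        for (int i = 0; i < static_cast<int>(a.size()); ++i) t[N + i] = a[i];
        for (int v = N - 1; v >= 1; --v) t[v] = t[2 * v] + t[2 * v + 1];
    }

    void set(int x, long long value) {
        assert(0 <= x && x < N);
        int v = N + x;
        t[v] = value;
        for (v /= 2; v >= 1; v /= 2) t[v] = t[2 * v] + t[2 * v + 1];  // walk up to the root
    }

    // Sum of a[i..j], both ends inclusive.
    long long sum(int i, int j) const {
        assert(0 <= i && i <= j && j < N);
        long long res = 0;
        for (int l = i + N, r = j + N + 1; l < r; l /= 2, r /= 2) {
            if (l & 1) res += t[l++];  // l is a right child: take it and step right
            if (r & 1) res += t[--r];  // r - 1 is a left child: take it
        }
        return res;
    }
};
```

With a = {1, 2, 3, 4, 5}, sum(1, 3) is 9, and after set(2, 10) it becomes 16.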
3 years ago, # ^ |
Rev. 2 0
Yes, you are right. But I think it is easier to think about it as a 2-dimensional array and only use
dimensional compression when you have to.
paladin8
Reply
23 months ago, # ^ |
0
you actually need 2*2^(log(N)+1) for memory, don't you? It contains O(N) leaves, not nodes. B
suggest i'm wrong and you're right, can you share your proof?
erreze
Reply
23 months ago, # ^ |
+3
2 * 2 ^ (log(N) + 1) = 4 * N = O(N)
AlexanderBolshakov Reply
8 months ago, # ^ |
-19
So you can't say that they are equal. Instead you can say 4 * N is a
yashar_sb_sb
Reply
3 years ago, # |
+8
Look at this cool implementation of Range Sum Query (RSQ). This implementation needs 2 * N memory,
implementations with only N memory, but as for me, this implementation is very simple, and it can be eas
with some non trivial actions, (for example, we can build an array with all indexes we need to update, and
= F(a[i * 2], a[i * 2 + 1]) in order)
Alias
Reply
3 years ago, # ^ |
Rev. 3 0
BTW, it can be easily extended to 2D or 3D arrays, but in this case it takes (2 * n)^k memory, w
for 1d, 2d, 3d, ...
Alias
upd
Reply
23 months ago, # ^ |
0
@Alias I would like to know how to update and query in 2d segment tree??
saikrishna17394 Reply
21 month(s) ago, # ^ |
+12
Who can help me with practice problems on 2d-, persistent and compressed segme
appreciate links to OJ problems.
caustique
Reply
3 years ago, # |
+3
Can you please explain the method for updating values over a range of indexes. I looked up in the intern
solution called lazy propagation, but could not understand clearly. It would be great if you can explain a li
!.
asif_iut
Reply
3 years ago, # ^ |
0
Lets start with my explanation, hope that somebody will give better explanation.
23 months ago, # |
0
Could you please give me an idea of how to implement a 2d-segment tree?. For example, to solve the pr
Census. Thanks in advance.
Empty
Reply
18 months ago, # ^ |
Rev. 3 +1
The idea behind 2D segment tree is that you first build segment tree for every row, then you bu
for every column. It's easier to operate with NxM array if N and M are powers of 2, if not, you c
with 0s, but if memory is more important, you can do without it. For example, you have the follo
1 2 3 4
4 3 2 1
5 6 7 8
8 7 6 5
Build a segment tree for every row. It will look like this:
10 3 7 1 2 3 4
10 7 3 4 3 2 1
26 11 15 5 6 7 8
26 15 11 8 7 6 5
(the first element of the row is the sum of elements (1...n), the second is the sum of (1...n/2), th
of (n/2+1...n) ...
Then build a segment tree for every column of the above matrix in the same way:
72 36 36 18 18 18 18
20 10 10 5 5 5 5
52 26 26 13 13 13 13
10 3 7 1 2 3 4
10 7 3 4 3 2 1
26 11 15 5 6 7 8
26 15 11 8 7 6 5
The tree contains 2N-1 rows and 2M-1 columns, and our initial matrix starts with element (N,M
If you want to update element (x,y), you update it, then all its parents in its column ((x,y/2),(x,y
divide x by 2, and do the same thing while x>0.
For finding the sum of the rectangle with opposite vertices (R1,C1) (R2,C2) (where R1<R2 and
recursively, start from the first row (of the tree). If the range of the row R has no intersection wit
return 0, if the range is not completely inside (R1...R2) but has an intersection, then call the func
((R*2) and (R*2+1)). Finally, if the range is completely inside (R1...R2) we do the same thing with
Rth row, but this time we return tree[R][C] if the range of the Cth element of the Rth row is com
(C1...C2).
sparik
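The layout in this comment can be sketched in C++ as below. This is an illustration under assumptions, not sparik's code: the struct name, the choice of a point "add" update and the bottom-up (non-recursive) queries are all mine. The tree is a (2N) x (2M) table where row 1 and column 1 are the roots, the original matrix starts at (N, M), and every cell is the sum of its two children in either direction.

```cpp
#include <cassert>
#include <vector>

struct SumTree2D {
    int N, M;  // padded numbers of rows and columns, both powers of two
    std::vector<std::vector<long long>> t;

    explicit SumTree2D(const std::vector<std::vector<long long>>& a) : N(1), M(1) {
        assert(!a.empty());
        const int rows = static_cast<int>(a.size());
        const int cols = static_cast<int>(a[0].size());
        while (N < rows) N *= 2;
        while (M < cols) M *= 2;
        t.assign(2 * N, std::vector<long long>(2 * M, 0));
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c) t[N + r][M + c] = a[r][c];
        for (int r = N; r < 2 * N; ++r)  // a segment tree for every row
            for (int c = M - 1; c >= 1; --c) t[r][c] = t[r][2 * c] + t[r][2 * c + 1];
        for (int r = N - 1; r >= 1; --r)  // then one for every column
            for (int c = 1; c < 2 * M; ++c) t[r][c] = t[2 * r][c] + t[2 * r + 1][c];
    }

    // Adds delta to element (x, y): every ancestor row, every ancestor column.
    void add(int x, int y, long long delta) {
        for (int r = x + N; r >= 1; r /= 2)
            for (int c = y + M; c >= 1; c /= 2) t[r][c] += delta;
    }

    // Sum of columns c1..c2 (inclusive) inside tree row r.
    long long row_sum(int r, int c1, int c2) const {
        long long res = 0;
        for (int lo = c1 + M, hi = c2 + M + 1; lo < hi; lo /= 2, hi /= 2) {
            if (lo & 1) res += t[r][lo++];
            if (hi & 1) res += t[r][--hi];
        }
        return res;
    }

    // Sum of the rectangle with corners (r1, c1) and (r2, c2), inclusive.
    long long sum(int r1, int c1, int r2, int c2) const {
        long long res = 0;
        for (int lo = r1 + N, hi = r2 + N + 1; lo < hi; lo /= 2, hi /= 2) {
            if (lo & 1) res += row_sum(lo++, c1, c2);
            if (hi & 1) res += row_sum(--hi, c1, c2);
        }
        return res;
    }
};
```

Built on the 4x4 matrix from the comment, the root cell t[1][1] holds 72, matching the top-left entry of the table above. A point update and a rectangle query each touch O(log N * log M) cells.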
18 months ago, # ^ |
Rev. 2 0
What if I want to update a rectangle, say (R1,C1) (R2,C2)? Is the complexity stil
Could we now use the Lazy Propagation technique here, like the one in the 1D S
danhnguyen0902
Reply
18 months ago, # ^ |
+5
faiyaz26
You might need google translate if you are not Russian. :)
Reply
15 months ago, # |
0
Do you know about other problems where segment trees should be used with some other associative bin
HAL9000 Reply
15 months ago, # ^ |
0
http://coj.uci.cu/24h/problem.xhtml?abb=2125
http://codeforces.com/problemset/problem/339/
k790alex
Reply
15 months ago, # ^ |
0
Thanks for your reply. I already solved the first one long time ago, and as I rememb
using some associative operation, but storing the progression terms in the nodes w
As for the second one, the link points to nowhere.
HAL9000
Reply
8 months ago, # |
Rev. 18 +17
We can generalize that a bit more: Let (M,f) be a semigroup (that is, a set M together with an associativ
operator ).
Let (T,c) be a monoid with identity element tid and a family of transformation f
updates), that take as their arguments a parameter and a sequence and output a
sequence of the same length. gk should respect the monoid structure of t, i.e. for
, gk(c(t1,t2),)should be equivalent to the composition and gk(tid,) should be the
Intuitively, this allows us to compress a series of updates into one single update.
c _ (Set, v) = (Set, v)
c (Add, x) (Add, y) = (Add, x + y)
c (Set, x) (Add, y) = (Set, x + y)
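The three rules for c above can be sketched in C++ as follows. The comment is truncated in this copy, so this rests on an assumption: c(first, second) is read as "apply first, then second", which is the reading under which all three rules hold. The names Update, compose, apply_update and ID_UPDATE are hypothetical; (Add, 0) plays the role of the identity tid.

```cpp
struct Update {
    enum Kind { Add, Set } kind;
    long long value;
};

const Update ID_UPDATE = {Update::Add, 0};  // adding 0 changes nothing

// One update equivalent to applying `first` and then `second`.
Update compose(const Update& first, const Update& second) {
    if (second.kind == Update::Set) return second;  // c _ (Set, v) = (Set, v)
    // second is an Add: (Add, x)(Add, y) -> (Add, x+y), (Set, x)(Add, y) -> (Set, x+y)
    return {first.kind, first.value + second.value};
}

// Effect of an update on a single stored value.
long long apply_update(const Update& u, long long x) {
    return u.kind == Update::Set ? u.value : x + u.value;
}
```

Because two pending updates collapse into one, a lazy segment tree node only ever needs to store a single Update.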
3 months ago, # |
0
Hi I have found one good example here Segment Tree Data Structure
androidrxample Reply
paladin8's blog
Partially Ordered Sets
Definitions
Let S be a set of elements and ≤ be a partial ordering on the set. That is, for some
elements x and y in S we may have x ≤ y. The only properties that ≤ must satisfy are reflexivity
(x ≤ x), antisymmetry (if x ≤ y and y ≤ x, then x=y), and transitivity (if x ≤ y and y ≤ z, then x ≤ z).
Note that because it is a partial ordering, not all x and y are comparable.
An example of a partially ordered set (poset) is the set of points (x,y) in the Cartesian plane with
the operator (x1,y1) ≤ (x2,y2) iff x1 ≤ x2 and y1 ≤ y2 (e.g. NCPC 2007 problem).
Dilworth's Theorem
We can define the width of a poset in two ways, which is the result of Dilworth's Theorem. One
definition is the size of the maximum antichain (a set of pairwise incomparable elements); the other is the
size of the minimum partition of S into chains (sets of pairwise comparable elements). It is
easy to see that any partition must have at least as many chains as the size of the maximum
antichain because every element of the antichain must be in a different chain. Dilworth's Theorem
tells us that there exists a partition of exactly that size and can be proved inductively (see Wikipedia
for proof).
The bounds on our problem are up to 20,000 so we can't use maximum matching. Luckily, points in
the Cartesian plane are more structured than general posets, and we can use the other definition of
width (maximum antichain) to solve this problem more efficiently. Consider iterating over the points
sorted in order by x. We maintain a set of pairs (a,b) which indicates that there is an antichain of
size b that ends at y-value a (among all points that have already been processed). Thus for any
future point (x,y) we can insert (y,b+1) into the set as long as y<a.
Notice, however, that if we have two points in the set (a,b) and (c,d) such
that c ≤ a and d ≤ b then the latter is redundant (it is always worse) and we can remove it. In this
way we keep the set small and only need to check a single element whenever we insert, which
is (a,b) with minimal a such that a>y. All of this can be done with a C++ set, for example. At the
end, the largest antichain we recorded is indeed the maximum one, and we are done.
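A minimal sketch of the same idea, with one substitution stated plainly: instead of a C++ set of (a,b) pairs it keeps the usual sorted "tails" array of a longest-subsequence search, which stores the same non-redundant information. It relies on the observation that, after sorting by x, an antichain is exactly a subsequence with strictly decreasing y. The function name and the tie-breaking rule are assumptions of this sketch.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Size of the largest antichain among points under the order
// (x1,y1) <= (x2,y2) iff x1 <= x2 and y1 <= y2.
int max_antichain(std::vector<std::pair<int, int>> pts) {
    // Sort by x, ties by increasing y: points sharing an x are always
    // comparable, and this order keeps two of them out of one decreasing run.
    std::sort(pts.begin(), pts.end());
    // tails[k] = smallest possible last key of a strictly increasing run of
    // length k+1 over the keys -y (that is, strictly decreasing in y).
    std::vector<int> tails;
    for (const auto& p : pts) {
        const int key = -p.second;
        auto it = std::lower_bound(tails.begin(), tails.end(), key);
        if (it == tails.end()) tails.push_back(key);
        else *it = key;
    }
    return static_cast<int>(tails.size());
}
```

By Dilworth's Theorem the returned value is also the minimum number of chains needed to cover the points. The whole computation is O(n log n), which is comfortable for 20,000 points.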
Exercise
A recent Google Code Jam problem uses these ideas.
Comments (3)
3 years ago, # |
Rev. 3 +8
(This may not be a problem in discrete math, but) to be more precise, the elements of a chain do not have
by integers; the cardinality of a chain can be larger than .
ikk
Reply
3 years ago, # |
+5