Preface

In July 2006 the participants at the 17th Australasian Workshop on Combinatorial Algorithms (AWOCA) met to consider future directions for a gathering that, from a humble beginning in 1989, had taken on an important role among Australian (and other) computer scientists and mathematicians working in combinatorial areas. The decision was taken, in the Aussie phrase, to go for it: to upgrade the academic profile of the workshop, to arrange for publication of the Proceedings by a world-class scientific publisher (College Publications), and to internationalise, changing the name to IWOCA (International Workshop on Combinatorial Algorithms). At the same time, the meeting urged that the traditional problem-oriented (enquiry-oriented) nature of AWOCA be preserved; this decision has led to the prominent scheduling of problem sessions in the IWOCA program, and to the establishment of a permanent combinatorial problem webpage, accessible from the permanent IWOCA website at http://www.iwoca.org.

IWOCA 2007 (arguably either the 18th or the first) was held 5–9 November 2007 at Rafferty's Resort on Lake Macquarie, a picturesque location close to Newcastle in the Australian state of New South Wales. The meeting was sponsored by The University of Newcastle and hosted by its School of Electrical Engineering & Computer Science.

Calls for papers were distributed around the world, using LISTSERV and other e-mail lists, resulting in 42 submissions, of which 24 were accepted after review by at least two (generally three) referees, using the EasyChair system. Of the accepted submissions, 20 were presented at the conference and are therefore included in these Proceedings. In addition, IWOCA 2007 featured talks by 8 invited speakers. Altogether, the authors at IWOCA 2007 represented universities from 16 territories: Australia, Canada, Chile, China, Cuba, Czech Republic, Greece, Indonesia, Israel, Japan, the Philippines, Poland, Russia, Spain, Taiwan, and the UK.
We gratefully acknowledge the fine work done by the members of the Program Committee and the Organising Committee in ensuring the success of the first IWOCA and thus assisting in the establishment of IWOCA as a forum for algorithmic research on combinatorial objects. For details of the meeting, including the complete program and a list of participants, please access http://www.eng.newcastle.edu.au/~iwoca2007/

November 2007
Ljiljana Brankovic
Yuqing Lin
William F. Smyth

Conference Organization

Steering Committee
Costas Iliopoulos, King's College London, UK
Mirka Miller, University of Ballarat, Australia, and University of West Bohemia, Czech Republic
Bill Smyth, McMaster University, Canada, and Curtin University, Australia

Programme Chairs
Ljiljana Brankovic, The University of Newcastle, Australia
Bill Smyth, McMaster University, Canada, and Curtin University, Australia

Programme Committee
Amihood Amir, Bar-Ilan University, Israel
Martin Baca, Technical University of Kosice, Slovakia
Edy Tri Baskoro, Institut Teknologi Bandung, Indonesia
Hajo Broersma, Durham University, UK
Pino Caballero-Gil, University of La Laguna, Spain
Kun-Mao Chao, National Taiwan University, Taiwan
Arbee L. P. Chen, National Chengchi University, Taiwan
Francis Y.L. Chin, Hong Kong University, Hong Kong
Charlie Colbourn, Arizona State University, USA
Derek Corneil, University of Toronto, Canada
Jackie Daykin, Royal Holloway College, UK
Frank Dehne, Carleton University, Canada
Diane Donovan, University of Queensland, Australia
Xiaodong Hu, Chinese Academy of Sciences, China
Andrei Kelarev, University of Tasmania, Australia
Jan Kratochvil, Charles University, Czech Republic
Selda Küçükçifçi, Koç University, Turkey
Gad M. Landau, University of Haifa, Israel
Thierry Lecroq, University of Rouen, France
Jimmy Lee, Hong Kong University, Hong Kong
Paulette Lieby, NICTA, Australian National University, Australia
Yuqing Lin, The University of Newcastle, Australia
Stefano Lonardi, University of California Riverside, USA
Prabhu Manyem, University of Ballarat, Australia
Laurent Mouchard, University of Rouen, France
Gonzalo Navarro, University of Chile, Chile
Takao Nishizeki, Tohoku University, Japan
Kunsoo Park, Seoul National University, Korea
Andrzej Proskurowski, Oregon State University, USA
Bharati Rajan, Loyola College, India
Rajeev Raman, University of Leicester, UK
Joe Ryan, University of Ballarat, Australia
Zdenek Ryjacek, University of West Bohemia, Czech Republic
Wojciech Rytter, Warsaw University, Poland
Jamie Simpson, Curtin University, Australia
Jozef Sirán, Open University, UK
Michiel Smid, Carleton University, Canada
Kathleen Steinhöfel, King's College London, UK
Anne Street, University of Queensland, Australia
Athanasios Tsakalidis, University of Patras, Greece
Gabriel Valiente, Technical University of Catalonia, Spain
Jito Vanualailai, The University of the South Pacific, Fiji
Koichi Wada, Nagoya Institute of Technology, Japan
Sue Whitesides, McGill University, Canada
Zhiyou Wu, University of Ballarat, Australia
Chee Yap, New York University, USA

Local Organization
Yuqing Lin (Chair), The University of Newcastle, Australia
Mousa Alfalayleh, The University of Newcastle, Australia
Ljiljana Brankovic, The University of Newcastle, Australia
Beti Georgievski, The University of Newcastle, Australia
Alexandre Mendes, The University of Newcastle, Australia
Jianmin Tang, University of Ballarat, Australia

External Reviewers
Fusheng Bai, Roman Cada, Feng Chen, Shir Feibish, Pinar Heggernes, Danny Hermelin, Takehiro Ito, Taisuke Izumi, Yoshiaki Katayama, Roman Kuzel, Slawomir Lasota, Hiroki Morizumi, Kiki Ariyanti Sugeng, Oren Weimann, Xiao Zhou

Table of Contents

L(2,1)-labeling of bipartite permutation graphs
Toru Araki, 1
Smaller and Faster Lempel-Ziv Indices. Diego Arroyuelo, Gonzalo Navarro, 11
Superconnectivity of regular graphs with small diameter. Camino Balbuena, Jianmin Tang, Kim Marshall, Yuqing Lin, 21
On diregularity of digraphs of defect two. Dafik, Mirka Miller, Costas Iliopoulos, Zdenek Ryjacek, 25
Change detection through clustering and spectral analysis. Diane Donovan, Birgit Loch, Bevan Thompson, Jayne Thompson, 35
TCAM representations of intervals of integers encoded by binary trees. Wojciech Fraczak, Wojciech Rytter, Mohammadreza Yazdani, 45
Binary Trees, Towers and Colourings (An Extended Abstract). Alan Gibbons, Paul Sant, 56
A Compressed Text Index on Secondary Memory. Rodrigo Gonzalez, Gonzalo Navarro, 66
Star-shaped Drawings of Planar Graphs. Seok-Hee Hong, Hiroshi Nagamochi, 78
Algorithms for Two Versions of LCS Problem for Indeterminate Strings. Costas Iliopoulos, M Sohel Rahman, Wojciech Rytter, 93
Three NP-Complete Optimization Problems in Seidel's Switching. Eva Jelinkova, 107
On Superconnectivity of (4,g)-Cages. Hongliang Lu, Yunjian Wu, Yuqing Lin, Qinglin Yu, 122
On the nonexistence of odd degree graphs of diameter 2 and defect 2. Minh Nguyen, Mirka Miller, Guillermo Pineda-Villavicencio, 129
Computing Incongruent Restricted Disjoint Covering Systems. Jamie Simpson, Gerry Myerson, Jacky Poon, 137
Existence of regular supermagic graphs. Andrea Semanicova, Jaroslav Ivanco, Petr Kovar, 143
Some Parameterized Problems Related to Seidel's Switching. Ondrej Suchy, 148
Computing the Moments of Costs over the Solution Space of the TSP in Polynomial Time. Paul Sutcliffe, Andrew Solomon, Jenny Edwards, 158
Vertex Coloring of Chordal+k1e-k2e Graphs. Yasuhiko Takenaga, Yusuke Miura, 170
Fault-free Hamiltonian Cycles in Alternating Group Graphs with Conditional Edge Faults. Ping-Ying Tsai, Jung-Sheng Fu, Gen-Huey Chen, 180
Regular Expression Matching Algorithms using Dual Position Automata. Hiroaki Yamamoto, 190

Appendix A: Abstracts of Invited Talks
The Volume of the Birkhoff Polytope. E. Rodney Canfield, Brendan D. McKay, 204
Orthogonal Drawings of Series-Parallel Graphs. Takao Nishizeki, 206
Time-Constrained Graph Searching. Brian Alspach, 207
Computing the k Most Representative Skyline Points. Xuemin Lin, 208
Distance constrained graph labeling: From frequency assignment to graph homomorphisms. Jan Kratochvil, 209
The use of decomposition in the study of even-hole-free graphs. Kristina Vušković, 210
Haplotype Inference Constrained by Plausible Haplotype Data. Gad M. Landau, 211
Full-Text Indexing in a Changing World. Moshe Lewenstein, 212

Appendix B: Open Problems
Open Problems in Dynamic Map Labeling. Chee Yap, 213
Graphs with no equal length cycles. ChunHui Lai, 215
Entropy-compressed suffix trees.
Gonzalo Navarro, 216
Indexed approximate string matching. Gonzalo Navarro, 217
Does a polynomial maximising algorithm imply a polynomial minimising algorithm? Prabhu Manyem, 218
Certificate Dispersal Problems. Koichi Wada, 219
The Maximum Number of Runs in a String. Bill Smyth, 221

International Workshop on Combinatorial Algorithms 07

L(2,1)-labeling of Bipartite Permutation Graphs

Toru Araki⋆
Department of Computer and Information Sciences, Iwate University, Morioka, Iwate 020-8551, Japan
arakit@cis.iwate-u.ac.jp

Abstract. An L(2,1)-labeling of a graph G is an assignment f from the vertices of G to the set of non-negative integers {0, 1, . . . , λ} such that |f(u) − f(v)| ≥ 2 if u and v are adjacent, and |f(u) − f(v)| ≥ 1 if u and v are at distance 2 apart. The minimum value of λ for which G has an L(2,1)-labeling is denoted by λ(G). The L(2,1)-labeling problem is related to the channel assignment problem for wireless networks. In this paper, we present a polynomial time algorithm for computing an L(2,1)-labeling of a bipartite permutation graph G such that the largest label is at most bc(G) + 1, where bc(G) is the biclique number of G. Since λ(G) ≥ bc(G) for any bipartite graph G, the upper bound is nearly optimal.

1 Introduction

The channel assignment problem for wireless networks is to assign a channel to each radio transmitter so that nearby transmitters receive channels that avoid interference. This situation can be modeled by a graph whose vertices are the radio transmitters, and whose adjacencies indicate possible interference.
The aim is to assign integers (corresponding to the channels) to the vertices such that adjacent vertices receive integers at least 2 apart, and nonadjacent vertices with a common neighbor receive distinct integers. This is called the L(2,1)-labeling problem, which is a widely accepted model for the channel assignment problem. A formal definition is given as follows.

Definition 1. An L(2,1)-labeling of G is an assignment f from V to the set of integers {0, 1, . . . , λ} such that |f(u) − f(v)| ≥ 2 if uv ∈ E and |f(u) − f(v)| ≥ 1 if dist(u, v) = 2. The minimum value of λ for which G has an L(2,1)-labeling is denoted by λ(G).

The notion of L(2,1)-labeling has attracted a lot of attention, not only for its motivation by the channel assignment problem, but also for its interesting graph theoretic properties. Griggs and Yeh [1] first considered this problem. There are many papers that study the problem for several graph classes (for example, see the surveys [2, 3]). Deciding whether λ(G) ≤ k is NP-complete for fixed k [1], and the problem remains NP-complete for bipartite graphs and for chordal graphs [4].

⋆ This work was supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Young Scientists (B) (no. 19700001).

In this paper, we focus on the class of bipartite permutation graphs, the graphs that are both permutation graphs and bipartite. This class was investigated by Spinrad, Brandstädt, and Stewart [5]. Studies of the class are motivated by the fact that many NP-hard problems can be solved efficiently on graphs of this class. For example, algorithms for domination problems [6, 7], the path partition problem [8], and the longest path problem [9] have been investigated. The books [10, 11] survey some algorithmic results for the class. Bodlaender et al.
[4] proved that λ(G) ≤ 5∆ − 2 for any permutation graph, where ∆ is the maximum degree of G, and such a labeling can be computed by a greedy polynomial time algorithm. We consider the L(2,1)-labeling problem for bipartite permutation graphs. We present a polynomial time algorithm for computing a nearly optimal labeling. More precisely, the maximum label assigned to a vertex is at most bc(G) + 1, where bc(G) is the biclique number.

2 Preliminaries

Let G = (V, E) be a graph with vertex set V and edge set E. The neighborhood of a vertex u is NG(u) = {v | uv ∈ E}. The degree of a vertex u is deg u = |NG(u)|. The distance between two vertices u and v, denoted by dist(u, v), is the length of a shortest path between u and v.

A graph G = (V, E) is bipartite if V can be partitioned into two subsets X and Y such that every edge joins a vertex in X to a vertex in Y. The partition X ∪ Y of V is called a bipartition. A bipartite graph with bipartition X ∪ Y is denoted by G = (X, Y, E). A bipartite graph G = (X, Y, E) is complete if each vertex in X is adjacent to every vertex in Y. For a bipartite graph, a subset of vertices is a biclique if it induces a complete bipartite subgraph. The biclique number of a bipartite graph G is the number of vertices in a maximum biclique of G, and it is denoted by bc(G).

A graph G = (V, E) with V = {v1, v2, . . . , vn} is called a permutation graph if there is a permutation π over {1, 2, . . . , n} such that vivj ∈ E if and only if (i − j)(π⁻¹(i) − π⁻¹(j)) < 0. When a permutation graph is bipartite, it is said to be a bipartite permutation graph. Intuitively, a permutation graph can be constructed from a permutation π = (π1, π2, . . . , πn) on {1, 2, . . . , n} in the following visual manner. Line up the numbers 1 to n horizontally on a line L1. On the line below it, line up the corresponding permutation so that πi is below i on a line L2.
Then connect each i on L1 to π⁻¹(i) on L2 with a line segment; this segment corresponds to vertex vi. The resulting diagram is referred to as a permutation diagram. In the permutation graph corresponding to π, two vertices vi and vj are adjacent if and only if the corresponding line segments cross. An example of a bipartite permutation graph and the corresponding permutation diagram is shown in Fig. 1.

Fig. 1. A bipartite permutation graph and the corresponding permutation diagram.
Fig. 2. A chain graph and the corresponding permutation diagram.

In the permutation diagram of a bipartite permutation graph G = (X, Y, E), we can order the line segments x1, x2, . . . , xm in X from left to right (these are drawn as solid lines in Fig. 1). We also order the vertices y1, y2, . . . , yn in Y from left to right (these are the dotted lines in Fig. 1). From now on, we suppose that the vertices in X = {x1, x2, . . . , xm} and Y = {y1, y2, . . . , yn} are sorted such that the corresponding line segments are arranged from left to right in the permutation diagram. It should be noted that Spinrad et al. [5] developed an O(|V| + |E|) time algorithm for recognizing whether a given graph is a bipartite permutation graph, producing such orderings of the vertices if so.

A bipartite graph G = (X, Y, E) is a chain graph if the vertices can be ordered by neighborhood inclusion: that is, there is an ordering of the vertices x1, x2, . . . , xm in X and y1, y2, . . . , yn in Y such that NG(x1) ⊆ NG(x2) ⊆ · · · ⊆ NG(xm) and NG(yn) ⊆ · · · ⊆ NG(y2) ⊆ NG(y1). It is known that any chain graph is a bipartite permutation graph [9].

Lemma 1 (Uehara, Valiente [9]). Let G = (X, Y, E) be a connected chain graph with NG(x1) ⊆ NG(x2) ⊆ · · · ⊆ NG(xm) and NG(yn) ⊆ · · · ⊆ NG(y2) ⊆ NG(y1). Then it has a corresponding permutation diagram such that (1) x1 < x2 < · · · < xm < y1 < y2 < · · · < yn on L1, and (2) y1 < x1 and yn < xm on L2.
Conversely, if a graph G has a corresponding permutation diagram satisfying conditions (1) and (2), then it is a connected chain graph. See Fig. 2.

3 Labeling of Chain Graphs

In this section, we show that an optimal L(2,1)-labeling of a chain graph can be computed in linear time. For simplicity, we may assume that the given graph is connected. The following is easily obtained.

Lemma 2. For a complete bipartite graph G = (X, Y, E), λ(G) = |X| + |Y|.

It is obvious that λ(G) ≥ λ(H) if H is a subgraph of G. Hence we obtain a lower bound on λ(G) for any bipartite graph G from Lemma 2.

Corollary 1. λ(G) ≥ bc(G) for any bipartite graph G.

Theorem 1. Let G = (X, Y, E) be a connected chain graph such that NG(x1) ⊆ NG(x2) ⊆ · · · ⊆ NG(xm) and NG(yn) ⊆ · · · ⊆ NG(y2) ⊆ NG(y1). Define a labeling cl of the vertices by
cl(xi) = bc(G) − m + i, for 1 ≤ i ≤ m,
cl(yj) = j − 1, for 1 ≤ j ≤ n.
Then cl is an optimal L(2,1)-labeling of G. The labeling cl satisfies the inequality 2 ≤ cl(xi) − cl(yj) ≤ bc(G) for xiyj ∈ E. Moreover, cl(xi) − cl(yj) = bc(G) for xiyj ∈ E if and only if i = m and j = 1.

Proof. Since the vertices in X (and likewise in Y) receive pairwise distinct labels, every pair of vertices at distance two apart have distinct labels. Next we show that cl(xi) − cl(yj) ≥ 2 if xiyj ∈ E. Suppose to the contrary that cl(xi) − cl(yj) ≤ 1. Then (k − m + i) − (j − 1) ≤ 1, where k = bc(G). Hence k ≤ m − i + j. On the other hand, the set of (m − i + 1) + j vertices {xi, xi+1, . . . , xm} ∪ {y1, y2, . . . , yj} is a biclique. Thus we obtain k ≥ (m − i + 1) + j. This contradicts the inequality k ≤ m − i + j. Since λ(G) ≥ bc(G) and maxv∈X∪Y cl(v) = cl(xm) = bc(G), the labeling cl is an optimal L(2,1)-labeling. ⊓⊔

Lemma 3. bc(G) = max1≤j≤n {j + deg yj} for a chain graph G.

Proof. This can be derived easily from the fact that NG(x1) ⊆ · · · ⊆ NG(xm) and NG(yn) ⊆ · · · ⊆ NG(y1). ⊓⊔
We present algorithms for computing the biclique number and an optimal labeling of a chain graph as Algorithms 1 and 2. Clearly, these algorithms run in linear time.

Algorithm 1: BICLIQUE_CHAIN(G)
Input: a chain graph G = (X, Y, E) with X = {x1, . . . , xm} and Y = {y1, . . . , yn}.
Output: the biclique number bc(G)
  bc ← 0
  for j ← 1 to n do
    if bc < j + deg yj then bc ← j + deg yj
  return bc

Algorithm 2: LABELING_CHAIN(G)
Input: a chain graph G = (X, Y, E) with X = {x1, . . . , xm} and Y = {y1, . . . , yn}.
Output: an L(2,1)-labeling cl of G
  bc ← BICLIQUE_CHAIN(G)    /* the biclique number of G */
  foreach xi ∈ X do cl(xi) ← bc − m + i
  foreach yj ∈ Y do cl(yj) ← j − 1
  return cl

Theorem 2. An optimal L(2,1)-labeling of a chain graph can be computed in O(N) time, where N is the number of vertices.

An example of the L(2,1)-labeling cl obtained by LABELING_CHAIN(G) is illustrated in Fig. 3. The chain graph G with |X| = 7 and |Y| = 6 has biclique number bc(G) = maxyj∈Y {j + deg yj} = 3 + deg y3 = 9 (in fact, the set {x2, . . . , x7} ∪ {y1, y2, y3} forms the maximum biclique).

4 Labeling of Bipartite Permutation Graphs

In this section, we present a polynomial time algorithm for calculating an L(2,1)-labeling f of a bipartite permutation graph G such that max f(v) ≤ bc(G) + 1.

Definition 2. Let G = (X, Y, E) be a bipartite permutation graph. For yj ∈ Y, let Gj be the subgraph of G induced by Xj ∪ Yj, where Xj = NG(yj) = {xi, xi+1, . . . , xk} and Yj = {yl | xkyl ∈ E and l ≥ j}.

Lemma 4. Gj is a chain graph such that NGj(xi) ⊆ NGj(xi+1) ⊆ · · · ⊆ NGj(xk) and NGj(yl) ⊆ · · · ⊆ NGj(yj+1) ⊆ NGj(yj), where yl is the maximum neighbor of xk.

Proof. It is easy to see that the vertices in Gj are arranged such that xi < xi+1 < · · · < xk < yj < yj+1 < · · · < yl on L1 of the permutation diagram, and yj < xi and yl < xk on L2. By Lemma 1, the lemma holds. ⊓⊔
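Algorithms 1 and 2 are short enough to sketch directly. The following Python version (a sketch, not the authors' code) represents a chain graph only by |X| and the degree sequence of y1, . . . , yn, which is all the two algorithms consume; it computes bc(G) via Lemma 3 and the labeling cl of Theorem 1.

```python
def biclique_chain(deg_y):
    """Lemma 3: bc(G) = max over 1 <= j <= n of (j + deg y_j),
    where y_1, ..., y_n are ordered by decreasing neighborhood.
    deg_y[j-1] is the degree of y_j."""
    return max(j + d for j, d in enumerate(deg_y, start=1))

def labeling_chain(m, deg_y):
    """Theorem 1: cl(x_i) = bc(G) - m + i and cl(y_j) = j - 1.
    Returns (bc, cl_x, cl_y) with cl_x[i-1] = cl(x_i) and
    cl_y[j-1] = cl(y_j)."""
    bc = biclique_chain(deg_y)
    cl_x = [bc - m + i for i in range(1, m + 1)]
    cl_y = [j - 1 for j in range(1, len(deg_y) + 1)]
    return bc, cl_x, cl_y

# Degrees consistent with the Fig. 3 example (|X| = 7, |Y| = 6,
# deg y_3 = 6, hence bc(G) = 3 + 6 = 9); the exact degree sequence
# is an assumption, since the figure itself is not reproduced here.
bc, cl_x, cl_y = labeling_chain(7, [7, 6, 6, 3, 2, 1])
```

Here cl_x runs 3, . . . , 9 and cl_y runs 0, . . . , 5, so the largest label equals bc(G) = 9, as Theorem 1 requires.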
4.1 Algorithm

The outline of our algorithm for a bipartite permutation graph G is as follows:

1. Visit yr ∈ Y starting from r = 1 up to r = n consecutively.
   (a) Construct a chain graph Gr and calculate a labeling cl of Gr by LABELING_CHAIN.
   (b) Determine the L(2,1)-labeling label(v) of the vertices v in Gr by adjusting cl to the already assigned labels of G.
2. After label has been determined for all vertices, calculate f(v) = label(v) mod (bc(G) + 2), and output the resulting label assignment f.

The details of the algorithm are described in Algorithm 3.

Fig. 3. A chain graph G with bc(G) = 9 and its optimal L(2,1)-labeling.

Algorithm 3: LABELING_BIPARTITE_PERMUTATION(G)
Input: a bipartite permutation graph G = (X, Y, E).
Output: an L(2,1)-labeling f of G
 1  foreach v ∈ X ∪ Y do label(v) ← undef
 2  xmax ← x0    /* the maximum vertex in X s.t. label(x) ≠ undef */
 3  ymax ← y0    /* the maximum vertex in Y s.t. label(y) ≠ undef */
 4  r ← 1
 5  while r ≤ n do
 6      if max NG(yr) > xmax then
 7          Construct the chain graph Gr    /* Definition 2 */
 8          cl ← LABELING_CHAIN(Gr)
 9          if r = 1 then s ← 0
10          else
11              s ← max{0, label(xmax) − cl(xmax), label(ymax) − cl(ymax)}
                /* Xr ∪ Yr is the bipartition of Gr */
12          foreach x ∈ Xr do
13              if label(x) = undef then label(x) ← cl(x) + s
14          if s = 0 or s = label(ymax) − cl(ymax) then
15              foreach y ∈ Yr do
16                  if label(y) = undef then label(y) ← cl(y) + s
17          else foreach y ∈ Yr do label(y) ← cl(y) + s
                /* label(xmax) − cl(xmax) > label(ymax) − cl(ymax) */
18          xmax ← max NGr(yr)
19          ymax ← max NGr(xmax)
20      r ← r + 1
21  foreach v ∈ X ∪ Y do f(v) ← label(v) mod (bc(G) + 2)
22  return f

4.2 Example of Our Algorithm

We present Figs. 4–8 as an example of our labeling algorithm for the bipartite permutation graph G of Fig. 1.

1. In Fig. 4, the chain graph G1 and its labeling cl are calculated. Then label is defined for the vertices in G1.
Fig. 4. The chain graph G1 and its labeling cl (left), and the labeling label of G (right). In this case, s = 0.
Fig. 5. G2 and its labeling cl, and label of G (s = 1).
Fig. 6. G3 and its labeling cl, and label of G (s = 2).
Fig. 7. G6 and its labeling cl, and label of G (s = 6).
Fig. 8. The L(2,1)-labeling of G obtained from label and bc(G) + 2 = 8.

2. The chain graph G2 and its labeling cl are obtained as in Fig. 5. In this case, s = max{0, label(x2) − cl(x2), label(y3) − cl(y3)} = max{0, 4 − 4, 2 − 1} = 1. Thus label(v) = cl(v) + 1 for v ∈ {x3, x4, y4, y5}.
3. The chain graph G3 and its labeling cl are obtained as in Fig. 6. In this case, s = max{0, label(x4) − cl(x4), label(y5) − cl(y5)} = max{0, 7 − 5, 4 − 2} = 2. Thus label(v) = cl(v) + 2 for v ∈ {x5, y6}.
4. The chain graph G6 and its labeling cl are obtained as in Fig. 7. In this case, s = max{0, label(x5) − cl(x5), label(y6) − cl(y6)} = max{0, 8 − 2, 5 − 0} = 6. Since s = 6 = label(x5) − cl(x5) > label(y6) − cl(y6), we have label(v) = cl(v) + 6 for v ∈ {x6, x7, y6, y7, y8} (line 17 of Algorithm 3).
5. Finally, the L(2,1)-labeling f of G is obtained by f(v) = label(v) mod 8, as shown in Fig. 8.

Note that, in each step, the biclique number bc(Gr) is equal to the value of label(x) − label(yr), where x is the maximum neighbor of yr. For example, in Fig. 5, bc(G2) = 6 = label(x4) − label(y2) holds. It should also be noted that if the condition s = label(xmax) − cl(xmax) > label(ymax) − cl(ymax) holds (line 17), the labels of the vertices of Y in Gr are increased. For example, label(y6) = 5 in Fig. 6; it is then increased to 6 in Fig. 7, because the situation s = label(x5) − cl(x5) > label(y6) − cl(y6) occurs.

4.3 Correctness

The labeling label calculated in the algorithm is an L(2,1)-labeling of G, which is guaranteed by the following two lemmas.

Lemma 5.
2 ≤ label(xi) − label(yj) ≤ bc(G) if xiyj ∈ E.

Proof (sketch). If an edge xiyj is in Gr, then cl(xi) − cl(yj) ≤ bc(Gr) by Theorem 1, where cl is the labeling of Gr. Since label(xi) − label(yj) ≤ cl(xi) − cl(yj), the inequality label(xi) − label(yj) ≤ bc(G) holds. It remains to show that label(xi) − label(yj) ≥ 2. This condition could be violated only when the following situation occurs: (i) label(yj) is increased in line 17 for some chain graph Gr, and (ii) the labels of the vertices of Xr in Gr are not consecutive numbers. An example of non-consecutive labels is G3 in Fig. 6. The vertices of X3, namely x2, x3, x4 and x5, have labels 4, 6, 7 and 8, respectively, in G, which are not consecutive numbers. Furthermore, vertex x2 is adjacent to y3 and label(x2) − label(y3) = 2. Thus, if label(y3) were increased after processing G3, then label(x2) − label(y3) < 2. However, we can show that situations (i) and (ii) do not occur simultaneously. The detailed proof will be presented in the full version of this paper. ⊓⊔

Lemma 6. The labeling label satisfies the following inequalities:
1. 1 ≤ label(xk) − label(xi) ≤ bc(G) − 2 if dist(xi, xk) = 2 and 1 ≤ i < k ≤ m.
2. 1 ≤ label(yl) − label(yj) ≤ bc(G) − 2 if dist(yj, yl) = 2 and 1 ≤ j < l ≤ n.

Proof. Suppose that dist(xi, xk) = 2 and i < k. Clearly label(xi) < label(xk). Let y be a common neighbor of xi and xk, and let q = label(y). By Lemma 5, we have label(xk) ≤ q + bc(G) and label(xi) ≥ q + 2. Hence label(xk) − label(xi) ≤ bc(G) − 2. Similarly, suppose that dist(yj, yl) = 2 and j < l. Clearly label(yj) < label(yl). Let x be a common neighbor of yj and yl, and let p = label(x). By Lemma 5, we have label(yl) ≤ p − 2 and label(yj) ≥ p − bc(G). Hence label(yl) − label(yj) ≤ bc(G) − 2. ⊓⊔

Theorem 3. The labeling f calculated by Algorithm 3 is an L(2,1)-labeling of G, and maxv∈X∪Y f(v) ≤ bc(G) + 1.
This algorithm runs in O(|V| + |E|) time.

Proof. Since f(v) = label(v) mod (bc(G) + 2), the inequality maxv∈X∪Y f(v) ≤ bc(G) + 1 holds. Let xy ∈ E, where x ∈ X and y ∈ Y. Then, by Lemma 5, 2 ≤ label(x) − label(y) ≤ bc(G). Since f(x) = label(x) mod (bc(G) + 2) and f(y) = label(y) mod (bc(G) + 2), the value of |f(x) − f(y)| cannot be 0 or 1. If dist(xi, xk) = 2, then label(xk) − label(xi) ≤ bc(G) − 2 by Lemma 6. Hence |f(xk) − f(xi)| ≥ 1. Similarly, we can show that |f(yl) − f(yj)| ≥ 1 if dist(yj, yl) = 2. If the degrees of yj and of max NG(yj) are d1 and d2, respectively, then the chain graph Gj has at most d1 + d2 vertices. Hence, cl and label of the vertices in Gj are calculated in O(d1 + d2) = O(∆) time, where ∆ is the maximum degree of G. Since the number of chain graphs constructed in our algorithm is O(|V|), the total running time of the algorithm is O(|V| + ∆|V|) = O(|V| + |E|). ⊓⊔

Corollary 2. Any bipartite permutation graph G satisfies λ(G) ≤ bc(G) + 1.

5 Conclusion

In this paper, we investigated the L(2,1)-labeling problem for bipartite permutation graphs. We showed that an optimal L(2,1)-labeling of a chain graph, a special class of bipartite permutation graphs, can be computed in linear time. We also presented a linear time algorithm for computing an L(2,1)-labeling of a bipartite permutation graph such that the maximum label is at most bc(G) + 1. Since λ(G) ≥ bc(G) for any bipartite graph G, our algorithm computes a nearly optimal solution. We conclude this paper by presenting two open problems.

It should be noted that there exists a bipartite permutation graph G with λ(G) = bc(G) + 1. For example, the bipartite permutation graph G in Fig. 9 has bc(G) = 6 and λ(G) = 7 = bc(G) + 1. Hence the set of bipartite permutation graphs is partitioned into two classes: one consisting of graphs G with λ(G) = bc(G), and the other of graphs G with λ(G) = bc(G) + 1.

Problem 1. Characterize the bipartite permutation graphs G with λ(G) = bc(G).
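The two conditions of Definition 1, which the correctness argument above establishes for f, can also be checked mechanically on small instances. A minimal checker sketch (the dictionary-of-neighbor-sets representation and the path example are assumptions for illustration, not taken from the paper):

```python
from itertools import combinations

def is_l21_labeling(adj, f):
    """Check Definition 1: adjacent vertices get labels at least 2
    apart, and vertices at distance exactly 2 get distinct labels.
    adj maps each vertex to the set of its neighbors."""
    for u, v in combinations(adj, 2):
        if v in adj[u]:                 # adjacent: need |f(u)-f(v)| >= 2
            if abs(f[u] - f[v]) < 2:
                return False
        elif adj[u] & adj[v]:           # common neighbor: distance is 2
            if f[u] == f[v]:
                return False
    return True

# Path a-b-c-d as a small sanity check.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
is_l21_labeling(path, {"a": 0, "b": 2, "c": 4, "d": 0})   # valid
is_l21_labeling(path, {"a": 0, "b": 1, "c": 3, "d": 5})   # a and b too close
```

Note that a and d may share label 0: they are at distance 3, so Definition 1 imposes no constraint on them.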
Fig. 9. A bipartite permutation graph G with λ(G) = bc(G) + 1.

Our algorithm does not guarantee that the resulting labeling is optimal. The complexity of the L(2,1)-labeling problem for bipartite permutation graphs is still open.

Problem 2. Develop a polynomial time algorithm for computing an optimal L(2,1)-labeling of a bipartite permutation graph, or prove NP-completeness of the problem for bipartite permutation graphs.

References
1. Griggs, J.R., Yeh, R.K.: Labelling graphs with a condition at distance 2. SIAM Journal on Discrete Mathematics 5(4) (1992) 586–595
2. Calamoneri, T.: The L(h, k)-labelling problem: A survey and annotated bibliography. The Computer Journal 49(5) (2006) 585–608
3. Yeh, R.K.: A survey on labeling graphs with a condition at distance two. Discrete Mathematics 306 (2006) 1217–1231
4. Bodlaender, H.L., Kloks, T., Tan, R.B., van Leeuwen, J.: Approximations for λ-coloring of graphs. The Computer Journal 47(2) (2004) 193–204
5. Spinrad, J., Brandstädt, A., Stewart, L.: Bipartite permutation graphs. Discrete Applied Mathematics 18(3) (1987) 279–292
6. Xu, G., Kang, L., Shan, E.: Acyclic domination on bipartite permutation graphs. Information Processing Letters 99 (2006) 139–144
7. Lu, C.L., Tang, C.Y.: Solving the weighted efficient edge domination problem on bipartite permutation graphs. Discrete Applied Mathematics 87 (1998) 203–211
8. Steiner, G.: On the k-path partition of graphs. Theoretical Computer Science 290 (2003) 2147–2155
9. Uehara, R., Valiente, G.: Linear structure of bipartite permutation graphs and the longest path problem. Information Processing Letters 103 (2007) 71–77
10. Golumbic, M.C.: Algorithmic graph theory and perfect graphs. 2nd edition. Elsevier B.V. (2004)
11. Brandstädt, A., Le, V.B., Spinrad, J.P.: Graph classes: A survey.
SIAM (1999)

Smaller and Faster Lempel-Ziv Indices⋆

Diego Arroyuelo and Gonzalo Navarro
Dept. of Computer Science, Universidad de Chile, Chile.
{darroyue,gnavarro}@dcc.uchile.cl

Abstract. Given a text T[1..u] over an alphabet of size σ = O(polylog(u)) and with k-th order empirical entropy Hk(T), we propose a new compressed full-text self-index based on the Lempel-Ziv (LZ) compression algorithm, which replaces T with a representation requiring about three times the size of the compressed text, i.e., (3 + ε)uHk(T) + o(u log σ) bits, for any ε > 0 and k = o(logσ u), and in addition gives indexed access to T: it is able to locate the occ occurrences of a pattern P[1..m] in the text in O((m + occ) log u) time. Our index is smaller than the existing indices that achieve this locating time complexity, and locates the occurrences faster than the smaller indices. Furthermore, our index is able to count the pattern occurrences in O(m) time, and it can extract any text substring of length ℓ in optimal O(ℓ/ logσ u) time. Overall, our indices appear as a very attractive alternative for space-efficient indexed text searching.

1 Introduction

With the huge amount of text data available nowadays, the full-text searching problem plays a fundamental role in modern computer applications. Full-text searching consists of finding the occ occurrences of a given pattern P[1..m] in a text T[1..u], where both P and T are sequences over an alphabet Σ = {1, 2, . . . , σ}. Unlike word-based text searching, we wish to find any text substring, not only whole words or phrases. This has applications in texts where the concept of word is not well defined (e.g., Oriental languages), or texts where words do not exist at all (e.g., DNA, protein, and MIDI pitch sequences). We assume that the text is large and known in advance of the queries, and that we need to perform several queries on it.
Therefore, we can construct an index on the text, which is a data structure allowing efficient access to the pattern occurrences, at the price of an increased space requirement. Our main goal is to provide fast access to the text using as little space as possible. Classical full-text indices, like suffix trees and suffix arrays, have the problem of a high space requirement: they require O(u log u) and u log u bits, respectively, which in practice is about 10 and 4 times the text size, not including the text itself.

Compressed self-indexing is a recent trend in full-text searching, which consists in developing full-text indices that store enough information to search and retrieve any part of the indexed text without storing the text itself, while requiring space proportional to the compressed text size. Because of their compressed nature, and since the text is replaced by the index, typical compressed self-indices are much smaller than classical indices, allowing one to store indices of large texts entirely in main memory, in cases where a classical index would have required access to the much slower secondary storage.

⋆ Supported in part by CONICYT PhD Fellowship Program (first author), and Fondecyt Grant 1-050493 (second author).

There exist two classical kinds of queries, namely: (1) count(T, P), which counts the number of occurrences of P in T; (2) locate(T, P), which reports the starting positions of the occ occurrences of P in T. Self-indices also need operation (3) extract(T, i, j), which decompresses the substring T[i..j], for any text positions i ≤ j.

Let H_k(T) denote the k-th order empirical entropy of a sequence of symbols T [12]. The value uH_k(T) provides a lower bound to the number of bits needed to compress T using any compressor that encodes each symbol considering only the context of k symbols that precede it in T.
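As an illustration of this definition, H_0 and H_k can be computed directly for small sequences. The following sketch is our own illustration, not code from the paper; it measures entropy in bits per symbol:

```python
from collections import Counter
from math import log2

def h0(s):
    """Zero-th order empirical entropy of a sequence (bits per symbol)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

def hk(t, k):
    """k-th order empirical entropy H_k(T): the H_0 of the symbols that
    follow each length-k context, averaged with weight |T_w|/u."""
    if k == 0:
        return h0(t)
    followers = {}
    for i in range(len(t) - k):
        ctx = t[i:i + k]
        followers.setdefault(ctx, []).append(t[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / len(t)
```

For example, "abababab" has H_0 = 1 bit per symbol, but H_1 = 0: each symbol is fully determined by the one preceding it, which is why uH_k(T) can be far below uH_0(T) on repetitive texts.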
It holds that 0 ≤ H_k(T) ≤ H_{k−1}(T) ≤ · · · ≤ H_0(T) ≤ log σ (by log we mean log_2 in this paper).

The main types of compressed self-indices [16] are Compressed Suffix Arrays (CSA) [8, 19], indices based on backward search [6] (which are alternative ways to compress suffix arrays), and indices based on the Lempel-Ziv compression algorithm (LZ-indices for short) [10]. LZ-indices have shown to be very effective in practice for locating occurrences and extracting text, outperforming other compressed indices. Compressed indices based on suffix arrays store extra noncompressible information to carry out these tasks, whereas the extra data stored by LZ-indices is compressible. Therefore, when the texts are highly compressible, LZ-indices can be smaller and faster than alternative indices; in other cases they offer very attractive space/time trade-offs. What characterizes the particular niche of LZ-indices is the O(uH_k(T)) space combined with O(log u) time per located occurrence.

The smallest LZ-indices available are those by Arroyuelo et al. [1] (ANS-LZI for short), which require (2 + ε)uH_k(T) + o(u log σ) bits of space, for any constant 0 < ε < 1 and any k = o(log_σ u); however, their time complexity of O(m² log m + (m + occ) log u) makes them suitable only for short patterns, for which the quadratic term is less significant. Other LZ-indices, like a compact version of that by Ferragina and Manzini [6], or the one by Russo and Oliveira [18] (ILZI for short), achieve O((m + occ) log u) time. They are, however, relatively large, requiring at least 5uH_k(T) + o(u log σ) bits of space.

In this paper we propose a new LZ-index scheme requiring (3 + ε)uH_k(T) + o(u log σ) bits of space, for σ = O(polylog(u)) and any k = o(log_σ u), with an overall locating time of O((m + occ) log u), a counting time of O(m), and an extracting time of O(ℓ/log_σ u) for a substring of length ℓ. In this way we achieve the same locating complexity as the larger LZ-indices [6, 18].
Note that the original index in [6] achieves better locate time, O(m + occ), yet it requires O(uH_k(T) log^γ u) bits of space, for any γ > 0. The CSA of Sadakane [19] (SAD-CSA for short) has a locating complexity of O((m + occ) log^ε u), for any ε > 0; however, the space requirement is proportional to the zero-th order empirical entropy plus a non-negligible extra space, ε^{−1} uH_0(T) + O(u log log σ) bits. The Alphabet-Friendly FM-index [7] (AF-FMI for short), on the other hand, requires uH_k(T) + o(u log σ) bits of space, but its locate complexity is O(m + occ log^{1+ε} u), which is slower than ours. Finally, the CSA of Grossi, Gupta, and Vitter [8] (GGV-CSA for short) requires ε^{−1} uH_k(T) + o(u log σ) bits of space, with a locating time of O((log u)^{(1−2ε)/(1−ε)} (log σ)^{ε/(1−ε)}) per occurrence, after a counting time of O(m/log_σ u + (log u)^{(1−3ε)/(1−ε)} (log σ)^{(1+ε)/(1−ε)}), where 0 < ε < 1/2. When ε approaches 1/3, the space requirement is about 3uH_k(T) + o(u log σ) bits, with a locating time of O(m/log_σ u + log² u + occ (log u)^{1/2} (log σ)^{1/2}). Thus, using the same space, their time per located occurrence is slightly lower, in exchange for an O(log² u) extra additive factor.

In Table 1 we summarize the space and time complexities of some existing compressed self-indices. Locate times in the table are per occurrence reported, after counting the pattern occurrences. In the case of our LZ-index, we have to pay an extra O(m log u) time.

Table 1. Comparison of our LZ-index with alternative compressed self-indices. The result for the GGV-CSA [8] is shown for ε = 1/3, and assuming σ = O(polylog(u)) in all cases.
Index         Space in bits                     count                    locate (per occurrence)     extract
SAD-CSA [19]  ε^{−1}uH_0(T) + O(u log log σ)    O(m log u)               O(log^ε u)                  O(ℓ + log^ε u)
GGV-CSA [8]   3uH_k(T) + o(u log σ)             O(m/log_σ u + log² u)    O(log^{1/2} u log^{1/2} σ)  O(log^{1/2} u log^{1/2} σ + ℓ/log_σ u)
AF-FMI [7]    uH_k(T) + o(u log σ)              O(m)                     O(log^{1+ε} u)              O(ℓ + log^{1+ε} u)
ANS-LZI [1]   (2 + ε)uH_k(T) + o(u log σ)       O((m + occ) log u)       free after counting         O(ℓ/log_σ u)
RO-LZI [18]   (5 + ε)uH_k(T) + o(u log σ)       O((m + occ) log u)       free after counting         O(ℓ/log_σ u)
Our LZI       (3 + ε)uH_k(T) + o(u log σ)       O(m)                     O(log u)                    O(ℓ/log_σ u)

2 Searching in Lempel-Ziv Compressed Texts

Assume that the text T[1..u] has been compressed using the LZ78 algorithm [21] into n + 1 phrases, T = B_0 . . . B_n, such that B_0 is the empty string. The search for a pattern P[1..m] in an LZ78-compressed text has the additional problem that, as the text is parsed into phrases, a pattern occurrence can span several (two or more) consecutive phrases. We call occurrences of type 1 those occurrences contained in a single phrase (there are occ_1 occurrences of type 1), and occurrences of type 2 those occurrences spanning two or more phrases (there are occ_2 occurrences of this type). Next we review the existing Lempel-Ziv self-indices.

The first compressed index based on LZ78 was that of Kärkkäinen and Ukkonen [10], which has a locating time of O(m² + (m + occ) log u) and a space requirement of O(uH_k(T)) bits [16], plus the text. However, this is not a self-index. Ferragina and Manzini [6] define the FM-LZI, a compressed self-index based on the LZ78 compression algorithm, requiring O(uH_k(T) log^γ u) bits of space, for any constant γ > 0. This index is able to report the occ pattern occurrences in optimal O(m + occ) time. This is the fastest existing compressed self-index, achieving the same time complexity as suffix trees, yet requiring o(u log u) bits and without needing the text to operate.
However, the extra O(log^γ u) factor makes this index large in practice. Nevertheless, we can replace their data structure for range queries by that of Chazelle [4], so that the resulting version of the FM-LZI requires (5 + ε)uH_k(T) + o(u log σ) bits of space, for any constant 0 < ε < 1 and any k = o(log_σ u), and is capable of locating the pattern occurrences in O((m + occ) log u) time.

The LZ-index of Navarro [15] (Nav-LZI for short) has a greater locate time than that of LZ-indices in general, yet the smallest existing LZ-index is a variant of the Nav-LZI: the index defined by Arroyuelo et al. [1] (ANS-LZI for short) requires (2 + ε)uH_k(T) + o(u log σ) bits of space, and its locate time is O(m² log m + (m + occ) log u). Although the locating time per occurrence is O(log u), as for other LZ-indices, the O(m² log m) term makes the ANS-LZI attractive only for short patterns. The key to achieving such a small space requirement is that the Nav-LZI (and therefore the ANS-LZI) does not use the concept of suffix arrays at all. Rather, the search is based only on an implicit representation of the text through its LZ78 parsing: the LZTrie, which is the trie representing the LZ78 phrases of T. As the text is scattered throughout the LZTrie, we have to distinguish between occurrences spanning two consecutive phrases (occurrences of type 2) and occurrences spanning more than two phrases (occurrences of type 3). For occurrences of type 3 we must consider the O(m²) possible substrings of P, search for all these strings in the LZTrie, then form maximal concatenations of consecutive phrases, and finally check every candidate [15]. All of this takes O(m² log m) time. The space of the ANS-LZI can be reduced to (1 + ε)uH_k(T) + o(u log σ) bits, with a locating time of O(m²) on average for patterns of length m ≥ 2 log_σ u, yet without worst-case guarantees at search time.
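For concreteness, the LZ78 parsing that underlies the LZTrie can be sketched as follows (our own illustration of the standard algorithm, not code from the paper). Each phrase is the longest previously generated phrase extended by one fresh symbol, which is why the phrase set is prefix-closed (except possibly for the last phrase, which may repeat an earlier one):

```python
def lz78_parse(t):
    """Parse t into LZ78 phrases B_1..B_n (B_0 = "" is implicit).
    The dictionary of known phrases is kept as a nested-dict trie."""
    trie = {}
    phrases = []
    i = 0
    while i < len(t):
        node, j = trie, i
        # walk down the trie while the next symbol extends a known phrase
        while j < len(t) and t[j] in node:
            node = node[t[j]]
            j += 1
        if j < len(t):          # extend the matched phrase by one new symbol
            node[t[j]] = {}
            j += 1
        phrases.append(t[i:j])
        i = j
    return phrases
```

For example, "abababab" parses into "a", "b", "ab", "aba", "b": five phrases whose trie is exactly the LZTrie described above.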
Russo and Oliveira [18] discard the LZ78 parsing of T and use a so-called maximal parsing instead, which is constructed for the reversed text. In this way they avoid the O(m²) checks for the different pattern substrings. The resulting LZ-index (the RO-LZI) requires (5 + ε)uH_k(T) + o(u log σ) bits of space, for any constant 0 < ε < 1 and any k = o(log_σ u). The locating time of the index is O((m + occ) log u). As for all the previous LZ-indices, the extract time for any text substring of length ℓ is the optimal O(ℓ/log_σ u).

3 Smaller and Faster Lempel-Ziv Indices

From the review in Section 2 we conclude that LZ-indices can require as little as (2 + ε)uH_k(T) + o(u log σ) bits of space, yet with a locate time of O(m² log m + (m + occ) log u); alternatively, we can achieve O((m + occ) log u) locate time with a larger index requiring (5 + ε)uH_k(T) + o(u log σ) bits of space. On the other hand, only the FM-LZI is fast at counting: O(m) counting time in the worst case. We show that we can be fast for all these operations with a significantly smaller LZ-index.

3.1 Index Definition

As we aim at a small index, we use the ANS-LZI as a base, specifically the version requiring (1 + ε)uH_k(T) + o(u log σ) bits of space, which is composed of:

– LZTrie: the trie storing the LZ78 phrases B_0, . . . , B_n of T. As the set of LZ78 phrases is prefix-closed, this trie has exactly n + 1 nodes.
We represent the trie structure using the following data structures:

(1) par[0..2n]: the tree shape of the LZTrie represented using dfuds [2], requiring 2n + o(n) bits of storage and allowing us to compute the operations parent_lz(x) (which gets the parent of node x), child_lz(x, i) (which gets the i-th child of x), subtreesize_lz(x) (which gets the size of the subtree of x, including x itself), depth_lz(x) (which gets the depth of x in the trie), LA_lz(x, d) (a level-ancestor query, which gets the ancestor of node x at depth d) — these last two computed on dfuds by using the idea of Jansson et al. [9] — and finally ancestor_lz(x, y) (which tells us whether x is an ancestor of node y), all of them in O(1) time. As in [1], we add the data structure of [20] to extract any text substring of length ℓ in optimal O(ℓ/log_σ u) time, requiring only o(u log σ) extra bits.

(2) ids[0..n]: the preorder sequence of LZ78 phrase identifiers. The permutation ids is represented using the data structure of Munro et al. [14] such that we can compute ids in O(1) time and its inverse permutation ids^{−1} in O(1/ε) time, requiring (1 + ε)n log n bits for any constant 0 < ε < 1.

(3) letts[1..n]: the array storing the edge labels of the LZTrie according to a dfuds traversal of the trie. We solve the operation child(x, α) (which gets the child of node x with label α ∈ {1, . . . , σ}) in constant time as follows (this is slightly different from the original approach [2]). Suppose that node x has position p within par. Let k be the number of αs up to position p − 1 in letts, and let p + i be the position of the (k + 1)-th α in letts. If p + i lies between positions p and p + degree(x), the child we are looking for is child(x, i + 1), which is computed in constant time over par; otherwise x has no child labeled α.
If σ = O(polylog(u)), we represent letts using the wavelet tree of [7] in order to compute k and p + i in constant time by using rank_α and select_α respectively, requiring n log σ + o(n) bits of space. We can also retrieve the symbol corresponding to node x (i.e., the symbol by which x descends from its parent) in constant time as letts[rank_((par, p) − 1]. The sequence letts is also used to get the symbols of the text for extract queries.

Overall, the LZTrie requires (1 + ε)n log n + 2n + n log σ + o(u log σ) bits, which is (1 + ε)uH_k(T) + o(u log σ) bits [11], for any k = o(log_σ u).

– RevTrie: the PATRICIA tree [13] of the reversed LZ78 phrases of T. In this trie there can be internal nodes not representing any phrase. We call these nodes empty. We compress empty unary nodes, so that only the empty non-unary nodes are represented. As a result, the number n′ of nodes in this trie satisfies n ≤ n′ ≤ 2n. The trie is represented using the dfuds representation, requiring 2n′ + n′ log σ + o(n′) ≤ 4n + 2n log σ + o(n) bits of space.

– R[1..n]: a mapping from RevTrie preorder positions (for non-empty nodes) to LZTrie preorder positions, requiring n log n = uH_k(T) + o(u log σ) bits.

– TPos[1..u]: a bit vector marking the n phrase beginnings. We represent TPos using the data structure of [17] for rank and select queries in O(1) time, requiring uH_0(TPos) + o(u) ≤ n log log n + o(u) = o(u log σ) bits of space¹.

We can compress the R mapping [1], so that it requires o(u log σ) bits, by adding suffix links to RevTrie, represented by a function ϕ. R(i) (seen as a function) can then be computed in constant time by using ϕ [1]. If we store the values of ϕ in preorder (according to RevTrie), the resulting sequence can be divided into at most σ strictly increasing subsequences, and hence it can be compressed using the δ-code of Elias [5] such that its space requirement is n log σ bits in the worst case, which is o(u log σ).
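The child(x, α) computation described above reduces to one rank and one select on letts. The toy sketch below is our own illustration: rank and select are naive O(n) scans (a wavelet tree makes both O(1)), and the mapping between par positions and letts positions is simplified to a single index p where the node's child labels are assumed to lie contiguously:

```python
def rank(seq, sym, i):
    """Occurrences of sym in seq[0..i] inclusive (O(n) here; O(1) with
    a wavelet tree as in the paper)."""
    return sum(1 for c in seq[:i + 1] if c == sym)

def select(seq, sym, k):
    """Position of the k-th occurrence of sym (1-based), or -1."""
    seen = 0
    for pos, c in enumerate(seq):
        if c == sym:
            seen += 1
            if seen == k:
                return pos
    return -1

def child_by_label(letts, p, degree, alpha):
    """Which child (1-based rank) of the node whose edge labels occupy
    letts[p .. p+degree-1] carries label alpha; None if there is none."""
    k = rank(letts, alpha, p - 1)       # alphas strictly before position p
    pos = select(letts, alpha, k + 1)   # next alpha at or after position p
    if pos != -1 and p <= pos < p + degree:
        return pos - p + 1
    return None
```

For instance, with letts = "abcab" and a node whose two child labels sit at positions 3–4, asking for label 'b' returns child 2, while asking for 'c' returns None, mirroring the rank-then-select argument in the text.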
The overall space requirement of the three above data structures is (1 + ε)uH_k(T) + o(u log σ) bits. To avoid the O(m² log m) term in the locating complexity, we must avoid occurrences of type 3 (which make the ANS-LZI slower). Hence we add the Alphabet-Friendly FM-index [7] of T (AF-FMI(T) for short) to our index. By itself this self-index is able to search for pattern occurrences, requiring uH_k(T) + o(u log σ) bits of space. However, its locate time per occurrence is O(log^{1+ε} u), for any constant ε > 0, which is greater than the O(log u) time per occurrence of LZ-indices. As AF-FMI(T) is based on the Burrows-Wheeler transform (BWT) [3] of T, it can conceptually be thought of as the suffix array of T.

To find occurrences spanning several phrases we define Range, a data structure for 2-dimensional range searching in the grid [1..u] × [1..n]. For each LZ78 phrase with identifier id, for 0 < id ≤ n, assume that the RevTrie node for id has preorder j, and that phrase (id + 1) starts at position p in T. Then we store the point (i, j) in Range, where i is the lexicographic rank of the suffix of T starting at position p. Suppose that we search for a given string s_2 in AF-FMI(T) and get the interval [i_1, i_2] in the BWT (equivalently, in the suffix array of T), and that the search for the string s_1^r in RevTrie yields a node such that the preorder interval for its subtree is [j_1, j_2]. Then a search for [i_1, i_2] × [j_1, j_2] in Range yields all phrases ending with s_1 such that an occurrence of s_2 starts at the beginning of the next phrase.

We transform the grid [1..u] × [1..n] indexed by Range into the equivalent grid [1..n] × [1..n] by defining a bit vector V[1..u] indicating (with a 1) which positions of AF-FMI(T) index an LZ78 phrase beginning. We represent V with the data

¹ rank_1(TPos, i) is the number of 1's in TPos up to position i. select_1(TPos, j) yields the position of the j-th 1 in TPos.
structure of [17] allowing rank queries, requiring uH_0(V) + o(u) = o(u log σ) bits of storage. Thus, instead of storing the point (i, j) as in the previous definition of Range, we store the point (rank_1(V, i), j). The search of the previous paragraph now becomes [rank_1(V, i_1), rank_1(V, i_2)] × [j_1, j_2]. As there is only one point per row and column of Range, we can use the data structure of Chazelle [4], requiring n log n(1 + o(1)) = uH_k(T) + o(u log σ) bits of space and allowing us to find the K points lying in a given two-dimensional range in O((K + 1) log n) time.

As a result, the overall space requirement of our LZ-index is (3 + ε)uH_k(T) + o(u log σ) bits, for any k = o(log_σ u) and any constant 0 < ε < 1.

3.2 Search Algorithm

Assume that P[1..m] = p_1 . . . p_m, for p_i ∈ Σ. We need to consider two types of occurrences of P in T.

Locating Occurrences of Type 1. Assume that phrase B_j contains P. If B_j does not end with P, and B_j = B_ℓ · c for some ℓ < j and c ∈ Σ, then by the LZ78 properties B_ℓ contains P as well. Therefore we must find the shortest possible phrases containing P, which according to LZ78 are exactly the phrases ending with P. This work can be done by searching for P^r in RevTrie. Say we arrive at node v. Any node v′ in the subtree of v (including v itself) corresponds to a phrase ending with P. Thus we traverse and report all the subtrees of the LZTrie nodes R(v′) corresponding to each v′. The total locate time is O(m + occ_1).

Locating Occurrences of Type 2. To find the pattern occurrences spanning two or more consecutive phrases we must consider the m − 1 partitions P[1..i] and P[i + 1..m] of P, for 1 ≤ i < m. For every partition we must find all phrases ending with P[1..i] such that the next phrase starts at the same position as an occurrence of P[i + 1..m] in T. Hence, as explained before, we must search for P^r[1..i] in RevTrie and for P[i + 1..m] in AF-FMI(T).
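The AF-FMI(T) side of this search is backward search on the BWT. A self-contained sketch (our own illustration, not the paper's code: the BWT is built by brute force, a terminator '$' is appended, and rank over the BWT is a naive scan where the real index uses O(1) wavelet-tree rank):

```python
def bwt_from(t):
    """Burrows-Wheeler transform of t + '$', built by brute force."""
    t += "$"
    rots = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rots)

def backward_count(t, p):
    """Count occurrences of p in t: process p right to left, refining the
    suffix-array interval [lo, hi) with C[] and rank on the BWT."""
    bwt = bwt_from(t)
    C, total = {}, 0                 # C[c] = #symbols in bwt smaller than c
    for c in sorted(set(bwt)):
        C[c] = total
        total += bwt.count(c)
    def occ(c, i):                   # occurrences of c in bwt[0..i)
        return bwt[:i].count(c)
    lo, hi = 0, len(bwt)
    for c in reversed(p):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

Each of the m steps touches one pattern symbol, which is exactly why the m − 1 intervals for p_m, p_{m−1}p_m, . . . , P[2..m] come out of a single O(m)-time backward-search run.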
Thus, every partition produces two one-dimensional intervals, one in each of the above structures. The m − 1 intervals in AF-FMI(T) can be found in O(m) time thanks to the backward-search concept, since the process of counting the number of occurrences of P[2..m] proceeds in m − 1 steps, each one taking constant time: in the first step we find the BWT interval for p_m, then we find the interval for occurrences of p_{m−1}p_m, and so on, until we finally find the interval for p_2 . . . p_m = P[2..m].

However, the work in RevTrie can take O(m²) time if we search for the strings P^r[1..i] separately, as done for the ANS-LZI. Fortunately, some of the work done to search for a given P^r[1..i] can be reused to search for the other strings. We have to search for the strings p_{m−1}p_{m−2} . . . p_1; p_{m−2} . . . p_1; . . . ; and p_1 in RevTrie. Note that every p_j . . . p_1 is the longest proper suffix of p_{j+1}p_j . . . p_1. Suppose that we successfully search for P^r[1..m−1] = p_{m−1}p_{m−2} . . . p_1, reaching the node with preorder i in RevTrie, hence finding the corresponding preorder interval in RevTrie in O(m) time. Now, to find the node representing the suffix p_{m−2} . . . p_1 we only need to follow the suffix link ϕ(i) (which takes constant time) instead of searching for it from the RevTrie root (which would take O(m) time again). The process of following suffix links can be repeated m − 1 times, until we reach the node corresponding to the string p_1, in total time O(m).

This is the main idea for obtaining the m − 1 preorder intervals in RevTrie in less than quadratic time. The general case is slightly more complicated and corresponds to the descend and suffix walk method used in the RO-LZI [18]. In the sequel we explain the way we implement the descend and suffix walk in our data structure. We first prove a couple of properties.
First, we know that every non-empty node in RevTrie has a suffix link [1], yet we need to prove that every RevTrie node (including empty non-unary nodes) also has a suffix link.

Property 1. Every empty non-unary node in RevTrie has a suffix link.

Proof. Assume that the node with preorder i in RevTrie is empty non-unary, and that it represents the string ax, for a ∈ Σ and x ∈ Σ∗. As node i is an empty non-unary node, it has at least two children. In other words, there exist at least two strings of the form axy and axz, for y, z ∈ Σ∗, y ≠ z, both corresponding to non-empty nodes, and hence these nodes have suffix links. These suffix links correspond to the strings xy and xz in RevTrie. Thus, there must exist a non-unary node for the string x, so every empty non-unary node i has a suffix link. ⊓⊔

We store the n′ ≤ 2n suffix links ϕ in preorder (as explained before), requiring 2n log σ bits of space in the worst case, which is o(u log σ). The second property is that, although RevTrie is a PATRICIA tree and hence we store only the first symbol of each edge label, we can retrieve the label in full.

Property 2. Any edge label in RevTrie can be extracted in optimal time.

Proof. To extract the label of the edge e_ij between the nodes with preorders i and j in RevTrie, note that the length of the edge label can be computed as depth_lz(R[j]) − depth_lz(R[i]). We can access the node from which to start the extraction as x = LA_lz(R[j], depth_lz(R[j]) − depth_lz(R[i])), in constant time. The label of e_ij is the label of the root-to-x path (read backwards). ⊓⊔

In this way, every time we arrive at a RevTrie node, the string represented by that node will match the corresponding prefix of the pattern. We showed above that it is possible to search for all the strings P^r[1..i] in O(m) time, assuming that P^r[1..m−1] exists in RevTrie (and therefore all the P^r[1..i] exist in RevTrie). The general case is as follows. Suppose that, searching for p_{m−1}p_{m−2} . . .
p_1, we arrive at a node with preorder i in RevTrie (hereafter node i), and we try to descend to a child node with preorder j (hereafter node j). Assume that node i represents the string ax, for a ∈ Σ and x ∈ Σ∗. According to Property 2, we are sure that ax = p_{m−1} . . . p_t, for some 1 ≤ t ≤ m − 1. Assume also that the edge e_ij between nodes i and j is labeled yz, for y, z ∈ Σ∗, z = z_1 . . . z_q. If we discover that p_{t−1} . . . p_k = y and p_{k−1} ≠ z_1, then symbol z_1 in the edge label differs from the corresponding symbol p_{k−1} in P^r[1..m−1], and so we cannot descend to node j. This means that there are no phrases ending with P^r[1..m−1], and we go on to consider P^r[1..m−2]. To reuse the work done up to node i, we follow the suffix link to get the node ϕ(i), and from this node we descend using y = p_{t−1} . . . p_k. As this substring of P has already been checked in the previous step, the descent is done checking only the first symbols of the labels, up to a node such that the next node in the path represents a string longer than |xy|. From this point on the descent is done as usual, extracting the edge labels and checking them against the pattern symbols. In this way the total amortized time is O(m).

If the search in RevTrie for P^r[1..i] yields the preorder interval [x, y], and the search for P[i + 1..m] in AF-FMI(T) yields the interval [x′, y′], then the two-dimensional range [x′, y′] × [x, y] in Range yields all pattern occurrences for the given partition of P. For every pattern occurrence we get a point (i′, j′). The corresponding phrase identifier can be found as id = ids(R(j′)), and the text position is finally computed as select_1(TPos, id + 1) − i. Overall, occurrences of type 2 are found in O((m + occ_2) log n) time.

For count queries we can achieve O(m) time by just using the AF-FMI(T).
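The split between type 1 and type 2 occurrences can be sanity-checked naively against an explicit parse. This brute-force classifier is our own illustration (hypothetical helper, quadratic-time, for tiny inputs only), not part of the index:

```python
def classify_occurrences(t, p, phrases):
    """Classify occurrences of p in t as type 1 (inside one phrase) or
    type 2 (spanning >= 2 phrases), given the parse as a phrase list."""
    starts, pos = [], 0
    for ph in phrases:
        starts.append(pos)
        pos += len(ph)
    ends = starts[1:] + [pos]       # half-open phrase intervals [s, e)
    occ1, occ2 = [], []
    i = t.find(p)
    while i != -1:
        # type 1 iff the occurrence fits inside a single phrase interval
        inside = any(s <= i and i + len(p) <= e for s, e in zip(starts, ends))
        (occ1 if inside else occ2).append(i)
        i = t.find(p, i + 1)        # advance by one to catch overlaps
    return occ1, occ2
```

On "abababab" parsed as "a" | "b" | "ab" | "aba" | "b", the pattern "ab" occurs at positions 0, 2, 4, 6: the occurrences at 2 and 4 lie inside single phrases (type 1), while those at 0 and 6 cross phrase boundaries (type 2).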
For extract queries we use the data structure of Sadakane and Grossi [20] in the LZTrie to extract any text substring T[p..p + ℓ] in optimal O(ℓ/log_σ u) time: the identifier of the phrase containing position p can be computed as id = rank_1(TPos, p). Then, by using ids^{−1} we compute the corresponding LZTrie node from which to extract the text. We have proved:

Theorem 1. There exists a compressed full-text self-index requiring (3 + ε)uH_k(T) + o(u log σ) bits of space, for σ = O(polylog(u)), any k = o(log_σ u), and any constant 0 < ε < 1, which is able to: report the occ occurrences of a pattern P[1..m] in a text T[1..u] in O((m + occ/ε) log u) worst-case time; count the pattern occurrences in O(m) time; and extract any text substring of length ℓ in O(ℓ/(ε log_σ u)) time.

References
1. D. Arroyuelo, G. Navarro, and K. Sadakane. Reducing the space requirement of LZ-index. In Proc. CPM, pages 319–330, 2006.
2. D. Benoit, E. Demaine, I. Munro, R. Raman, V. Raman, and S.S. Rao. Representing trees of higher degree. Algorithmica, 43(4):275–292, 2005.
3. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
4. B. Chazelle. A functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing, 17(3):427–462, 1988.
5. P. Elias. Universal codeword sets and representations of integers. IEEE Trans. Inform. Theory, 21(2):194–203, 1975.
6. P. Ferragina and G. Manzini. Indexing compressed texts. Journal of the ACM, 54(4):552–581, 2005.
7. P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro. Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG), 3(2):article 20, 2007.
8. R. Grossi, A. Gupta, and J. S. Vitter. High-order entropy-compressed text indexes. In Proc. SODA, pages 841–850, 2003.
9. J. Jansson, K. Sadakane, and W.-K. Sung.
Ultra-succinct representation of ordered trees. In Proc. SODA, pages 575–584, 2007.
10. J. Kärkkäinen and E. Ukkonen. Lempel-Ziv parsing and sublinear-size index structures for string matching. In Proc. WSP, pages 141–155, 1996.
11. R. Kosaraju and G. Manzini. Compression of low entropy strings with Lempel-Ziv algorithms. SIAM Journal on Computing, 29(3):893–911, 1999.
12. G. Manzini. An analysis of the Burrows-Wheeler transform. Journal of the ACM, 48(3):407–430, 2001.
13. D. R. Morrison. PATRICIA – practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM, 15(4):514–534, 1968.
14. I. Munro, R. Raman, V. Raman, and S.S. Rao. Succinct representations of permutations. In Proc. ICALP, LNCS 2719, pages 345–356, 2003.
15. G. Navarro. Indexing text using the Ziv-Lempel trie. J. Discrete Algorithms, 2(1):87–114, 2004.
16. G. Navarro and V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys, 39(1):article 2, 2007.
17. R. Raman, V. Raman, and S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In Proc. SODA, pages 233–242, 2002.
18. L. Russo and A. Oliveira. A compressed self-index using a Ziv-Lempel dictionary. In Proc. SPIRE, LNCS 4209, pages 163–180, 2006.
19. K. Sadakane. New text indexing functionalities of the compressed suffix arrays. Journal of Algorithms, 48(2):294–313, 2003.
20. K. Sadakane and R. Grossi. Squeezing succinct data structures into entropy bounds. In Proc. SODA, pages 1230–1239, 2006.
21. J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory, 24(5):530–536, 1978.
Superconnectivity of regular graphs with small diameter

Camino Balbuena¹⋆, Jianmin Tang², Kim Marshall², Yuqing Lin³
¹ Departament de Matemàtica Aplicada III, Universitat Politècnica de Catalunya, Barcelona, Spain
² School of Information Technology and Mathematical Sciences, University of Ballarat, Ballarat, Victoria 3353, Australia
³ School of Electrical Engineering and Computer Science, The University of Newcastle, NSW 2308, Australia

Abstract. The superconnectivity κ1 of a connected graph G is defined as the minimum cardinality over all vertex-cuts X such that no vertex x ∉ X has all its neighbors in X. In this paper we prove that, for any δ-regular graph G of diameter D and odd girth g with D ≤ g − 2, G is superconnected (κ1 > δ) when g ≥ 5, and G is a complete graph otherwise.

Key words. connectivity, superconnectivity, cutset, diameter, girth.

1 Introduction

Let G = (V, E) be a graph with vertex set V = V(G) and edge set E = E(G). Throughout this paper, only undirected simple graphs without loops or multiple edges are considered. Unless otherwise stated, we follow [7] for terminology and definitions.

The set of vertices adjacent to a vertex v is called the neighborhood of v and is denoted by N(v). A vertex in the neighborhood of v is a neighbor of v. The degree of a vertex v is deg(v) = |N(v)|, and the minimum degree δ = δ(G) of G is the minimum degree over all vertices of G. A graph is called δ-regular if all its vertices have the same degree δ. If S ⊂ V then G[S] stands for the subgraph induced by S. The degree of a vertex v in an induced subgraph H of G is deg_H(v) = |N(v) ∩ V(H)|. The minimum edge-degree of G, denoted by ξ = ξ(G), is defined as ξ(G) = min{d(u) + d(v) − 2 : uv ∈ E(G)}. The distance d(u, v) between two vertices u and v in G is the length of a shortest path between u and v. The diameter of a graph G, written D = D(G), is the maximum distance over all pairs of vertices of G.
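The two graph parameters central to this paper, the diameter and the girth, can both be computed by breadth-first search on small graphs. The following sketch is our own illustration (adjacency given as a dict of neighbor lists); the girth routine is the standard O(nm) BFS method, where a non-tree edge seen at depths d(v), d(w) closes a cycle of length d(v) + d(w) + 1:

```python
from collections import deque

def bfs_dist(adj, s):
    """Distances from s by BFS."""
    dist, q = {s: 0}, deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def diameter(adj):
    """Maximum distance over all pairs (graph assumed connected)."""
    return max(max(bfs_dist(adj, v).values()) for v in adj)

def girth(adj):
    """Length of a shortest cycle; inf if the graph is acyclic."""
    best = float("inf")
    for s in adj:
        dist, parent, q = {s: 0}, {s: None}, deque([s])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w], parent[w] = dist[v] + 1, v
                    q.append(w)
                elif parent[v] != w:      # non-tree edge closes a cycle
                    best = min(best, dist[v] + dist[w] + 1)
    return best
```

For the cycle C_5 this gives diameter 2 and girth 5, so C_5 satisfies the hypothesis D ≤ g − 2 of the abstract.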
The girth g = g(G) is the length of a shortest cycle in G. For S ⊂ V, d(w, S) = d_G(w, S) = min{d(w, s) : s ∈ S} denotes the distance between a vertex w and a set S. For every v ∈ V and every integer r ≥ 0, N_r(v) = {w ∈ V : d(w, v) = r} denotes the neighborhood of v at distance r. Similarly, for S ⊂ V, the neighborhood of S at distance r is denoted N_r(S) = {w ∈ V : d(w, S) = r}. Observe that N_0(S) = S. When r = 1 we write N(v) and N(S) instead of N_1(v) and N_1(S).

⋆ m.camino.balbuena@upc.edu, klmarshall@students.ballarat.edu.au, j.tang@ballarat.edu.au, yuqing.lin@newcastle.edu.au

A graph G is connected if there is a path between any two vertices of G. If X ⊂ V and G − X is not connected, then X is said to be a cutset or a disconnecting set. Analogously, if F ⊂ E and G − F is not connected, then F is said to be an edge-cut or an edge disconnecting set. We say that G is r-connected if the deletion of at least r vertices of G is required to disconnect the graph, and in this case we say that the vertex connectivity is κ = κ(G) ≥ r. A complete graph on r + 1 vertices is r-connected. A graph with minimum degree δ is maximally connected if it is δ-connected, or equivalently if κ = δ.

The notion of superconnectedness was proposed in [4–6]. A graph is superconnected, for short super-κ, if every minimum cutset consists of the vertices adjacent to a single vertex; see Boesch [5], Boesch and Tindell [6], and Fiol, Fàbrega and Escudero [9]. Observe that a superconnected graph is necessarily maximally connected, κ = δ, but the converse is not true. For example, a cycle C_g of length g ≥ 6 is a maximally connected graph that is not superconnected. A cutset X of G is called a non-trivial cutset if X does not contain the neighborhood N(u) of any vertex u ∉ X.
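The definition of a non-trivial cutset can be explored exhaustively on tiny graphs. The brute-force sketch below is our own illustration (exponential time, hypothetical helper names): it searches for a smallest disconnecting set X that contains no complete neighborhood N(u) of a vertex u outside X. On C_6 it returns 2, matching the observation above that C_g with g ≥ 6 is maximally connected but not superconnected:

```python
from itertools import combinations

def connected(adj, removed):
    """Is the graph still connected after deleting the vertex set removed?"""
    verts = [v for v in adj if v not in removed]
    if not verts:
        return True
    seen, stack = {verts[0]}, [verts[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(verts)

def smallest_nontrivial_cutset(adj):
    """Size of a smallest non-trivial cutset, or None if no cutset exists
    at all (e.g. complete graphs). Exponential; tiny graphs only."""
    for size in range(1, len(adj)):
        for X in combinations(adj, size):
            Xs = set(X)
            if connected(adj, Xs):
                continue                      # not a cutset
            nontrivial = all(not (set(adj[u]) <= Xs)
                             for u in adj if u not in Xs)
            if nontrivial:
                return size
    return None
```

When some non-trivial cutset exists, this quantity is precisely the superconnectivity discussed next; for C_6 it equals δ = 2, certifying that C_6 is not super-κ.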
Provided that some non-trivial cutset exists, the superconnectivity of G, denoted by κ1, was defined in [1, 9] as

κ1 = κ1(G) = min{|X| : X is a non-trivial cutset}.

A non-trivial cutset X is called a κ1-cut if |X| = κ1. Notice that if κ1 ≤ δ then κ1 = κ, and that κ1 > δ is a necessary and sufficient condition for G to be super-κ, since then all minimum disconnecting sets of cardinality δ must be trivial. Non-trivial edge-cuts, the edge-superconnectivity λ1 = λ1(G), and λ1-cuts are defined analogously.

Some known sufficient conditions on the diameter of a graph, in terms of its girth, that guarantee lower bounds on κ, λ, κ1 and λ1 are listed in the following theorem.

Theorem 1. Let G be a graph with minimum degree δ ≥ 2, diameter D, girth g, minimum edge-degree ξ, connectivities λ and κ, and superconnectivities κ1 and λ1. Then,
(i) [10] λ = δ if D ≤ 2⌊(g − 1)/2⌋.
(ii) [10] κ = δ if D ≤ 2⌊(g − 1)/2⌋ − 1.
(iii) [3] λ1 = ξ if D ≤ g − 2.
(iv) [2] κ1 ≥ ξ if D ≤ g − 3.

In this paper we improve Theorem 1 (ii) by proving that a δ-regular graph G with δ ≥ 3 and diameter at most g − 2 is super-κ when g is odd. To do this we require the following known result.

Proposition 1. [2] Let G = (V, E) be a connected graph with girth g and minimum degree δ ≥ 2. Let X ⊂ V be a κ1-cut with |X| < ξ(G). Then for each connected component C of G − X there exists some vertex u0 ∈ V(C) such that d(u0, X) ≥ ⌈(g − 3)/2⌉ and |N_{⌈(g−3)/2⌉}(u0) ∩ X| ≤ 1.

In Section 2 we present our results.

2 Main results

We use Proposition 1 to prove the existence of some structural properties of a component C when g is odd and max{d(u, X) : u ∈ V(C)} = (g − 3)/2.

Lemma 1. Let G be a κ1-connected graph with odd girth g and minimum degree δ ≥ 3. Let X be a κ1-cut with |X| = δ, and assume that there exists a connected component C of G − X such that max{d(u, X) : u ∈ V(C)} = (g − 3)/2.
Then the following assertions hold:
(i) If u ∈ V(C) is such that d(u, X) = (g − 3)/2 and |N(g−3)/2(u) ∩ X| = 1, then u has degree d(u) = δ and δ − 1 neighbors z such that d(z, X) = (g − 3)/2 and |N(g−3)/2(z) ∩ X| = 1.
(ii) If u ∈ V(C) is such that d(u, X) = (g − 3)/2 and |N(g−3)/2(u) ∩ X| = 1, then |N(g−1)/2(u) ∩ X| = δ − 1.
(iii) There exists a (δ−1)-regular subgraph Γ such that for every vertex w ∈ V(Γ), dG(w) = δ and d(w, X) = (g − 3)/2.
(iv) If g = 5 then |N(X) ∩ V(C)| ≥ δ(δ − 1), and if g ≥ 7 then |N(X) ∩ V(C)| ≥ (δ − 1)² + 2.

As a consequence of Lemma 1 we obtain Theorem 2, which is an improvement of Theorem 1 (ii) for regular graphs of odd girth.

Theorem 2. Let G be a δ-regular graph with δ ≥ 3 and odd girth g. If the diameter D ≤ g − 2, then G is super-κ when g ≥ 5, and a complete graph otherwise.

The graph depicted in Figure 1 shows a non-δ-regular graph with g = 5 and D = 3 which is not super-κ. Consequently, the hypothesis of regularity is essential in Theorem 2.

Fig. 1. A graph with g = 5 and κ1 = δ = 3.

Acknowledgments. Research supported by the Ministry of Science and Technology, Spain, and the European Regional Development Fund (ERDF) under project MTM2005-08990-C02-02.

References
1. C. Balbuena and A. Carmona, On the connectivity and superconnectivity of bipartite digraphs and graphs, Ars Combin. 61 (2001), 3–21.
2. C. Balbuena, M. Cera, A. Diánez, P. García-Vázquez, and X. Marcote, On the restricted connectivity and superconnectivity in graphs with given girth, Discrete Math. (2006), doi:10.1016/j.disc.2006.07.016.
3. C. Balbuena, P. García-Vázquez, and X. Marcote, Sufficient conditions for λ′-optimality in graphs with girth g, J. Graph Theory 52(1) (2006), 73–86.
4. D. Bauer, F. Boesch, C. Suffel, and R. Tindell, Connectivity extremal problems and the design of reliable probabilistic networks, The Theory and Application of Graphs, Y. Alavi and G.
Chartrand (Editors), Wiley, New York (1981), 89–98.
5. F.T. Boesch, Synthesis of reliable networks: a survey, IEEE Trans. Reliability 35 (1986), 240–246.
6. F.T. Boesch and R. Tindell, Circulants and their connectivities, J. Graph Theory 8(4) (1984), 487–499.
7. G. Chartrand and L. Lesniak, Graphs and Digraphs, third edition, Chapman and Hall, London, 1996.
8. J. Fàbrega and M.A. Fiol, Maximally connected digraphs, J. Graph Theory 13 (1989), 657–668.
9. M.A. Fiol, J. Fàbrega, and M. Escudero, Short paths and connectivity in graphs and digraphs, Ars Combin. 29B (1990), 17–31.
10. T. Soneoka, H. Nakada, M. Imase, and C. Peyrat, Sufficient conditions for maximally connected dense graphs, Discrete Math. 63 (1987), 53–66.

On diregularity of digraphs of defect two

Dafik 1,2, Mirka Miller 1,3, Costas Iliopoulos 4 and Zdenek Ryjacek 3

1 School of Information Technology and Mathematical Sciences, University of Ballarat, Australia
2 Department of Mathematics Education, Universitas Jember, Indonesia
3 Department of Mathematics, University of West Bohemia, Plzeň, Czech Republic
4 Department of Computer Science, King's College London, UK

d.dafik@students.ballarat.edu.au, m.miller@ballarat.edu.au, csi@dcs.kcl.ac.uk, ryjacek@kma.zcu.cz

Abstract: Since Moore digraphs do not exist for k ≠ 1 and d ≠ 1, the problem of establishing the existence of digraphs of out-degree d ≥ 2, diameter k ≥ 2 and order close to the Moore bound becomes an interesting problem. To prove the non-existence of such digraphs, we may first wish to establish their diregularity. It is easy to show that any digraph with out-degree at most d ≥ 2, diameter k ≥ 2 and order n = d + d² + … + dᵏ − 1, that is, two less than the Moore bound, must have all vertices of out-degree d. However, establishing the regularity or otherwise of the in-degree of such a digraph is not easy. In this paper we prove that all digraphs of defect two are out-regular and almost in-regular.
Key Words: Diregularity, digraph of defect two, degree-diameter problem.

(This research was supported by the Australian Research Council (ARC) Discovery Project grant DP04502994.)

1 Introduction

By a directed graph or a digraph we mean a structure G = (V(G), A(G)), where V(G) is a finite nonempty set of distinct elements called vertices, and A(G) is a set of ordered pairs (u, v) of distinct vertices u, v ∈ V(G) called arcs. The order of the digraph G is the number of vertices in G. An in-neighbour (respectively, out-neighbour) of a vertex v in G is a vertex u (respectively, w) such that (u, v) ∈ A(G) (respectively, (v, w) ∈ A(G)). The set of all in-neighbours (respectively, out-neighbours) of a vertex v is called the in-neighbourhood (respectively, the out-neighbourhood) of v and is denoted by N−(v) (respectively, N+(v)). The in-degree (respectively, out-degree) of a vertex v is the number of its in-neighbours (respectively, out-neighbours). If every vertex of a digraph G has the same in-degree (respectively, out-degree) then G is said to be in-regular (respectively, out-regular). A digraph G is called a diregular digraph of degree d if G is in-regular of in-degree d and out-regular of out-degree d. An alternating sequence v0 a1 v1 a2 … al vl of vertices and arcs in G such that ai = (vi−1, vi) for each i is called a walk of length l in G. A walk is closed if v0 = vl. If all the vertices of a v0 − vl walk are distinct, then such a walk is called a path. A cycle is a closed path. A digon is a cycle of length 2. The distance from vertex u to vertex v, denoted by δ(u, v), is the length of a shortest path from u to v, if any; otherwise, δ(u, v) = ∞. Note that, in general, δ(u, v) is not necessarily equal to δ(v, u). The in-eccentricity of v, denoted by e−(v), is defined as e−(v) = max{δ(u, v) : u ∈ V}, and the out-eccentricity of v, denoted by e+(v), is defined as e+(v) = max{δ(v, u) : u ∈ V}.
The radius of G, denoted by rad(G), is defined as rad(G) = min{e−(v) : v ∈ V}. The diameter of G, denoted by diam(G), is defined as diam(G) = max{e−(v) : v ∈ V}. Note that if G is a strongly connected digraph then, equivalently, we could have defined the radius and the diameter of G in terms of out-eccentricity instead of in-eccentricity. The girth of a digraph G is the length of a shortest cycle in G.

The well-known degree/diameter problem for digraphs is to determine the largest possible order nd,k of a digraph, given out-degree at most d ≥ 1 and diameter k ≥ 1. There is a natural upper bound on the order of digraphs given out-degree at most d and diameter k. For any given vertex v of a digraph G, we can count the number of vertices at a particular distance from that vertex. Let ni, for 0 ≤ i ≤ k, be the number of vertices at distance i from v. Then ni ≤ dⁱ, for 0 ≤ i ≤ k, and consequently,

nd,k = n0 + n1 + … + nk ≤ 1 + d + d² + … + dᵏ.  (1)

The right-hand side of (1), denoted by Md,k, is called the Moore bound. If equality holds in (1) then the digraph is called a Moore digraph. It is well known that Moore digraphs exist only in the cases d = 1 (directed cycles of length k + 1, Ck+1, for any k ≥ 1) or k = 1 (complete digraphs of order d + 1, Kd+1, for any d ≥ 1) [2, 11]. Note that every Moore digraph is diregular (of degree one in the case of Ck+1 and of degree d in the case of Kd+1). Since for d > 1 and k > 1 there are no Moore digraphs, we are next interested in digraphs of order n 'close' to the Moore bound. It is easy to show that a digraph of order n, with Md,k − Md,k−1 + 1 ≤ n ≤ Md,k − 1, out-degree at most d ≥ 2 and diameter k ≥ 2 must have all vertices of out-degree d. In other words, the out-degree of such a digraph is constant (= d).
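The Moore bound (1) is easy to compute; as a quick sanity check (illustrative, not part of the paper), the two known families of Moore digraphs attain it exactly.

```python
def moore_bound(d, k):
    """M_{d,k} = 1 + d + d^2 + ... + d^k, the Moore bound for
    out-degree at most d and diameter k."""
    return sum(d ** i for i in range(k + 1))

# The only Moore digraphs: directed cycles C_{k+1} (d = 1) and
# complete digraphs K_{d+1} (k = 1); their orders meet the bound.
def cycle_order(k):
    return k + 1       # order of C_{k+1}

def complete_order(d):
    return d + 1       # order of K_{d+1}
```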
This can easily be seen: if there were a vertex in the digraph with out-degree d1 < d (i.e., d1 ≤ d − 1), then the order of the digraph would satisfy

n ≤ 1 + d1 + d1 d + … + d1 dᵏ⁻¹ = 1 + d1 (1 + d + … + dᵏ⁻¹) ≤ 1 + (d − 1)(1 + d + … + dᵏ⁻¹) = (1 + d + … + dᵏ) − (1 + d + … + dᵏ⁻¹) = Md,k − Md,k−1 < Md,k − Md,k−1 + 1,

a contradiction. However, establishing the regularity or otherwise of the in-degree of an almost Moore digraph is not easy. It is well known that there exist digraphs of out-degree d and diameter k whose order is just two or three less than the Moore bound and in which not all vertices have the same in-degree. In Fig. 1 we give two examples of digraphs of diameter 2, out-degree d = 2, 3, respectively, and order Md,2 − d, with vertices not all of the same in-degree.

Fig. 1. Two examples of non-diregular digraphs.

Miller, Gimbert, Širáň and Slamin [7] considered the diregularity of digraphs of defect one, that is, of order n = Md,k − 1, and proved that such digraphs are diregular. For defect two, diameter k = 2 and any out-degree d ≥ 2, non-diregular digraphs always exist. One such family of digraphs can be generated from Kautz digraphs, which contain vertices with identical out-neighbourhoods, so we can apply the vertex deletion scheme of [8] to obtain non-diregular digraphs of defect two, diameter k = 2, and any out-degree d ≥ 2. Fig. 2(a) shows an example of a Kautz digraph G of order n = M3,2 − 1 which we will use to illustrate the vertex deletion scheme. Note the existence of identical out-neighbourhoods, for example, N+(v11) = N+(v12). Deleting vertex v12, together with its outgoing arcs, and then reconnecting its incoming arcs to vertex v11, we obtain a new digraph G1 of order n = M3,2 − 2, as shown in Fig. 2(b).

Fig. 2. Digraphs G of order 12 and G1 of order 11.

We now introduce the notion of 'almost diregularity'.
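The vertex deletion scheme just described can be sketched computationally. The code below is illustrative (not from the paper): it builds a Kautz digraph of out-degree 3 and diameter 2, merges a pair of vertices with identical out-neighbourhoods, and checks that the result has order M3,2 − 2, is out-regular, is not in-regular, and still has diameter 2.

```python
from collections import Counter
from itertools import product

def kautz(d, k):
    """Kautz digraph: vertices are length-k words over an alphabet of size
    d + 1 with no two consecutive letters equal; arcs shift in one new letter."""
    letters = range(d + 1)
    verts = [w for w in product(letters, repeat=k)
             if all(w[i] != w[i + 1] for i in range(k - 1))]
    return {v: [v[1:] + (c,) for c in letters if c != v[-1]] for v in verts}

def diameter(adj):
    worst = 0
    for s in adj:
        dist, frontier = {s: 0}, [s]
        while frontier:
            nxt = []
            for a in frontier:
                for b in adj[a]:
                    if b not in dist:
                        dist[b] = dist[a] + 1
                        nxt.append(b)
            frontier = nxt
        worst = max(worst, max(dist.values()))
    return worst

G = kautz(3, 2)               # order 12 = M_{3,2} - 1, diameter 2
u, v = (0, 1), (2, 1)         # identical out-neighbourhoods (same last letter)

# Vertex deletion scheme: delete v with its outgoing arcs and
# reconnect its incoming arcs to u.
G1 = {w: [u if x == v else x for x in outs]
      for w, outs in G.items() if w != v}

indeg = Counter(x for outs in G1.values() for x in outs)
```

The merged vertex ends up with in-degree 6 while the three former out-neighbours of the deleted vertex drop to in-degree 2, so G1 is out-regular but not in-regular.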
Throughout this paper, let S be the set of all vertices of G whose in-degree is less than d, and let S′ be the set of all vertices of G whose in-degree is greater than d. The in-excess is σ− = σ−(G) = Σ_{w∈S′} (d−(w) − d) = Σ_{v∈S} (d − d−(v)). Similarly, let R be the set of all vertices of G whose out-degree is less than d, and let R′ be the set of all vertices of G whose out-degree is greater than d. We define the out-excess σ+ = σ+(G) = Σ_{w∈R′} (d+(w) − d) = Σ_{v∈R} (d − d+(v)). A digraph of average in-degree d is called almost in-regular if the in-excess is at most d. Similarly, a digraph of average out-degree d is called almost out-regular if the out-excess is at most d. A digraph is almost diregular if it is almost in-regular and almost out-regular. Note that if σ− = 0 (respectively, σ+ = 0) then G is in-regular (respectively, out-regular). In this paper we prove that all digraphs of defect two, diameter k ≥ 3 and out-degree d ≥ 2 are out-regular and almost in-regular.

2 Results

Let G be a digraph of out-degree d ≥ 3, diameter k ≥ 3 and order Md,k − 2. Since the order of G is Md,k − 2, using a counting argument it is easy to show that for each vertex u of G there exist exactly two vertices r1(u) and r2(u) (not necessarily distinct) in G with the property that there are two u → ri(u) walks, for i = 1, 2, in G of length not exceeding k. The vertex ri(u), for each i = 1, 2, is called a repeat of u; this concept was introduced in [5]. We will use the following notation throughout. For each vertex u of a digraph G described above, and for 1 ≤ s ≤ k, let Ts+(u) be the multiset of all endvertices of directed paths in G of length at most s which start at u. Similarly, by Ts−(u) we denote the multiset of all starting vertices of directed paths of length at most s in G which terminate at u.
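The in-excess and out-excess defined at the start of this section can be computed directly. The sketch below is illustrative (not from the paper) and assumes digraphs given as dicts of out-neighbour lists.

```python
from collections import Counter

def excesses(adj, d):
    """Return (sigma_minus, sigma_plus): the in-excess and out-excess of a
    digraph {v: [out-neighbours]} relative to degree d."""
    indeg = Counter(w for outs in adj.values() for w in outs)
    sigma_minus = sum(indeg.get(v, 0) - d for v in adj if indeg.get(v, 0) > d)
    sigma_plus = sum(len(adj[v]) - d for v in adj if len(adj[v]) > d)
    return sigma_minus, sigma_plus

# A diregular digraph of degree 2 on four vertices.
diregular = {0: [1, 2], 1: [2, 3], 2: [3, 0], 3: [0, 1]}
# Redirect one arc (2 -> 0 becomes 2 -> 1): vertex 1 gains in-degree 3
# and vertex 0 drops to in-degree 1, so the in-excess becomes 1.
perturbed = {0: [1, 2], 1: [2, 3], 2: [3, 1], 3: [0, 1]}
```

Since the perturbed digraph has σ− = 1 ≤ d = 2 and σ+ = 0, it is almost diregular in the sense just defined.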
Observe that the vertex u is in both Ts+(u) and Ts−(u), as it corresponds to a path of zero length. Let Ns+(u) be the set of all endvertices of directed paths in G of length exactly s which start at u. Similarly, by Ns−(u) we denote the set of all starting vertices of directed paths of length exactly s in G which terminate at u. If s = 1, the sets T1+(u) \ {u} and T1−(u) \ {u} represent the out- and in-neighbourhoods of the vertex u in the digraph G; we denote these neighbourhoods simply by N+(u) and N−(u), respectively. We illustrate the notations Ts+(u) and Ns+(u) in Fig. 3.

We will use the following notation throughout.

Notation 1. We say that G is a (d, k, δ)-digraph, that is, G ∈ G(d, k, δ), if G is a digraph of defect δ, maximum out-degree d and diameter k.

We will present our new results concerning the diregularity of digraphs of order close to the Moore bound in the following sections.

Fig. 3. The multiset Tk+(u).

2.1 Diregularity of (d, k, 2)-digraphs

In this section we present a new result concerning the in-regularity of digraphs of defect two for any out-degree d ≥ 2 and diameter k ≥ 3. Let S be the set of all vertices of G whose in-degree is less than d, and let S′ be the set of all vertices of G whose in-degree is greater than d; the in-excess is σ− = Σ_{x∈S′} (d−(x) − d) = Σ_{v∈S} (d − d−(v)).

Lemma 1. Let G ∈ G(d, k, 2) and let S be the set of all vertices of G whose in-degree is less than d. Then S ⊆ N+(r1(u)) ∪ N+(r2(u)) for any vertex u.

Proof. Let v ∈ S. Consider an arbitrary vertex u ∈ V(G), u ≠ v, and let N+(u) = {u1, u2, …, ud}. Since the diameter of G is equal to k, the vertex v must occur in each of the sets Tk+(ui), i = 1, 2, …, d. It follows that for each i there exists a vertex xi ∈ {u} ∪ Tk−1+(ui) such that xi v is an arc of G. Since the in-degree of v is less than d, the in-neighbours xi of v are not all distinct. This implies that there exists some vertex which occurs at least twice in Tk+(u). Such a vertex must be a repeat of u. As G has defect 2, there are at most two vertices of G which are repeats of u, namely r1(u) and r2(u). Therefore, S ⊆ N+(r1(u)) ∪ N+(r2(u)). ✷

Combining Lemma 1 with the fact that every vertex in G has out-degree d gives

Corollary 1. |S| ≤ 2d.

In principle, we might expect that the in-degree of v ∈ S could attain any value between 1 and d − 1. However, the next lemma asserts that the in-degree cannot be less than d − 1.

Lemma 2. Let G ∈ G(d, k, 2). If v1 ∈ S then d−(v1) = d − 1.

Proof. Let v1 ∈ S. Consider an arbitrary vertex u ∈ V(G), u ≠ v1, and let N+(u) = {u1, u2, …, ud}. Since the diameter of G is equal to k, the vertex v1 must occur in each of the sets Tk+(ui), i = 1, 2, …, d. It follows that for each i there exists a vertex xi ∈ {u} ∪ Tk−1+(ui) such that xi v1 is an arc of G. If d−(v1) ≤ d − 3 then there are at least three repeats of u, which is impossible. Suppose that d−(v1) ≤ d − 2. By Lemma 1, the in-excess must satisfy

σ− = Σ_{x∈S′} (d−(x) − d) = Σ_{v1∈S} (d − d−(v1)) = |S| ≤ 2d.

We now consider the number of vertices in the multiset Tk−(v1). To reach v1 from all the other vertices in G, the number of distinct vertices in Tk−(v1) must account for the whole of V(G). Observe that

|Nt−(v1)| ≤ Σ_{u∈Nt−1−(v1)} d−(u) = d|Nt−1−(v1)| + εt,  (2)

where 2 ≤ t ≤ k and ε2 + ε3 + … + εk ≤ σ−. If d−(v1) = d − 2 then |N−(v1)| = |N1−(v1)| = d − 2. It is not difficult to see that a safe upper bound on |Tk−(v1)| is obtained from inequality (2) by setting ε2 = 2d and εt = 0 for 3 ≤ t ≤ k. This gives |Tk−(v1)| ≤ 1 + |N1−(v1)| + |N2−(v1)| + |N3−(v1)| + . . .
+ |Nk−(v1)| ≤ 1 + (d − 2) + (d(d − 2) + ε2) + (d(d(d − 2) + ε2) + ε3)(1 + d + · · · + dᵏ⁻³) = 1 + (d − 2) + (d(d − 2) + 2d) + (d(d(d − 2) + 2d) + 0)(1 + d + · · · + dᵏ⁻³) = 1 + d − 2 + d² + d³(1 + d + · · · + dᵏ⁻³) = Md,k − 2.

Since ε2 = 2d, εt = 0 for 3 ≤ t ≤ k, and G contains a vertex of in-degree d − 2, we have |S| = d. Let S = {v1, v2, …, vd}. Every vi, for i = 2, 3, …, d, has to reach v1 at distance at most k. Since v1 and every vi have exactly the same in-neighbourhood, v1 is forced to be a selfrepeat. This implies that v1 occurs twice in the multiset Tk−(v1). Hence |Tk−(v1)| < Md,k − 2, which is a contradiction. Therefore d−(v1) = d − 1, for any v1 ∈ S. ✷

Lemma 3. If S is the set of all vertices of G whose in-degree is d − 1 then |S| ≤ d.

Proof. Suppose |S| ≥ d + 1. Then there exist vertices vi ∈ S with d − d−(vi) = 1, for i = 1, 2, …, d + 1. The in-excess is σ− = Σ_{v∈S} (d − d−(v)) ≥ d + 1. This implies that |S′| ≥ 1. However, we cannot have |S′| = 1. Suppose, for a contradiction, that S′ = {x}. To reach v1 (and vi, i = 2, 3, …, d + 1) from all the other vertices in G, we must have x ∈ ∩_{i=1}^{d+1} N−(vi), which is impossible as the out-degree of x is d. Hence |S′| ≥ 2.

Let u ∈ V(G), u ≠ vi. To reach vi from u, we must have ∪_{i=1}^{d+1} N−(vi) ⊆ {r1(u), r2(u)}. Since the out-degree is d, |∪_{i=1}^{d+1} N−(vi)| = d. Without loss of generality, we suppose x1 ∈ ∪_{i=1}^{d} N−(vi) and x2 ∈ N−(vd+1), where x1, x2 ∈ S′. Now consider the multiset Tk+(x1). Since every vi, for i = 1, 2, …, d, must reach every vj, j ≠ i, j = 1, 2, …, d + 1, within distance at most k, the vertex x1 occurs three times in Tk+(x1); otherwise x1 would have at least three repeats, which is impossible. This implies that x1 is a double selfrepeat. Since two of the vi, say vk and vl, for k, l ∈ {1, 2, . . .
, d + 1}, occur in the walk joining two selfrepeats, vk and vl are selfrepeats. But then it is not possible for the d out-neighbours of x1 to reach vd+1, a contradiction. ✷

Theorem 1. For k ≥ 3 and d ≥ 2, every (d, k, 2)-digraph is out-regular and almost in-regular.

Proof. Out-regularity of (d, k, 2)-digraphs was explained in the Introduction. Hence we only need to prove that every (d, k, 2)-digraph is almost in-regular. If S = ∅ then the (d, k, 2)-digraph is diregular. By Lemma 2, if S ≠ ∅ then all vertices in S have in-degree d − 1. This gives

σ− = Σ_{x∈S′} (d−(x) − d) = Σ_{v∈S} (d − d−(v)) = |S| ≤ 2d.

Take an arbitrary vertex v ∈ S; then |N−(v)| = |N1−(v)| = d − 1. By the diameter assumption, the union of all the sets Nt−(v) for 0 ≤ t ≤ k is the entire vertex set V(G) of G, which implies that

|V(G)| ≤ Σ_{t=0}^{k} |Nt−(v)|.  (3)

To estimate the above sum we can observe the following inequality:

|Nt−(v)| ≤ Σ_{u∈Nt−1−(v)} d−(u) = d|Nt−1−(v)| + εt,  (4)

where 2 ≤ t ≤ k and ε2 + ε3 + … + εk ≤ σ−. It is not difficult to see that a safe upper bound on |V(G)| is obtained from inequality (4) by setting ε2 = σ− = |S| and εt = 0 for 3 ≤ t ≤ k; note that the latter is equivalent to assuming that all vertices from S \ {v} are contained in Nk−(v) and that all vertices of S′ belong to N1−(v). In this way we successively obtain:

|V(G)| ≤ 1 + |N1−(v)| + |N2−(v)| + |N3−(v)| + … + |Nk−(v)| ≤ 1 + (d − 1) + (d(d − 1) + |S|)(1 + d + · · · + dᵏ⁻²) = d + d² + · · · + dᵏ + (|S| − d)(1 + d + · · · + dᵏ⁻²) = Md,k − 2 + (|S| − d)(1 + d + · · · + dᵏ⁻²) + 1.

But G is a digraph of order Md,k − 2; this implies that

(|S| − d)(1 + d + · · · + dᵏ⁻²) + 1 ≥ 0, that is, (|S| − d)(dᵏ⁻¹ − 1)/(d − 1) + 1 ≥ 0, and hence |S| ≥ d − (d − 1)/(dᵏ⁻¹ − 1).

As 0 < (d − 1)/(dᵏ⁻¹ − 1) < 1 whenever k ≥ 3 and d ≥ 4, it follows that |S| ≥ d. Since 1 ≤ |S| ≤ d, this implies |S| = d. ✷

We conclude with a conjecture.
Conjecture 1. All digraphs of defect 2 are diregular for diameter k ≥ 3 and maximum out-degree d ≥ 2.

References
1. E.T. Baskoro, M. Miller, J. Plesník, On the structure of digraphs with order close to the Moore bound, Graphs and Combinatorics 14(2) (1998), 109–119.
2. W.G. Bridges, S. Toueg, On the impossibility of directed Moore graphs, J. Combin. Theory Ser. B 29 (1980), 339–341.
3. Dafik, M. Miller and Slamin, Diregularity of digraphs of defect two of out-degree three and diameter k ≥ 3, preprint.
4. B.D. McKay, M. Miller, J. Širáň, A note on large graphs of diameter two and given maximum degree, J. Combin. Theory Ser. B 74 (1998), 110–118.
5. M. Miller, I. Fris, Maximum order digraphs for diameter 2 or degree 2, Pullman Volume of Graphs and Matrices, Lecture Notes in Pure and Applied Mathematics 139 (1992), 269–278.
6. M. Miller, J. Širáň, Digraphs of degree two which miss the Moore bound by two, Discrete Math. 226 (2001), 269–280.
7. M. Miller, J. Gimbert, J. Širáň, Slamin, Almost Moore digraphs are diregular, Discrete Math. 216 (2000), 265–270.
8. M. Miller, Slamin, On the monotonicity of minimum diameter with respect to order and maximum out-degree, Proceedings of COCOON 2000, Lecture Notes in Computer Science 1558 (D.-Z. Du, P. Eades, V. Estivill-Castro, X. Lin, eds.) (2000), 193–201.
9. M. Miller, I. Fris, Minimum diameter of diregular digraphs of degree 2, Computer Journal 31 (1988), 71–75.
10. M. Miller, J. Širáň, Moore graphs and beyond: A survey of the degree/diameter problem, Electronic J. Combin. 11 (2004).
11. J. Plesník, Š. Znám, Strongly geodetic directed graphs, Acta F. R. N. Univ. Comen. – Mathematica XXIX (1974).
12. Slamin, M. Miller, Diregularity of digraphs close to Moore bound, Prosiding Konferensi Nasional X Matematika, ITB Bandung, MIHMI 6(5) (2000), 185–192.
Change Detection through Clustering and Spectral Analysis

Diane Donovan 1, Birgit Loch 2, H.B. Thompson 1 and Jayne Thompson 3

1 Department of Mathematics, University of Queensland, St Lucia 4072, Australia
2 Department of Mathematics and Computing, University of Southern Queensland, Toowoomba 4350, Australia
3 School of Physics, Melbourne University, Parkville 3052, Australia

Abstract. This paper defines a metric on a sequence of graphs which can be used to measure the distance between consecutive graphs. The method is based on vertex clustering through spectral analysis of the Laplace matrix.

Keywords: Networks, Clustering, Spectral Graph Theory

1 Introduction

Our aim is to investigate intragraph clustering in large enterprise networks and to define a metric, for quantifying network change, based on the evolution of vertex clusters. The use of clustering as a technique for change detection is a relatively recent idea and one which requires further study. In this paper, a metric will be defined on a sequence of graphs G1, …, Gs. The basis for the metric will be a clustering procedure which uses edge weights to determine a partition Ch of the vertex set of graph Gh. Then a distance measure d(Ch, Ch+1), 1 ≤ h ≤ s − 1, will be computed and used to quantify the distance d(Gh, Gh+1). The clustering procedure will be based on spectral analysis. This combination of spectral analysis, intragraph clusters and change detection is a new field of study and one which will be explored in the current paper. In spectral graph theory (see [11], Section 8.6, page 452, for a general discussion), the eigenvalues and eigenvectors of associated matrices are calculated and used to characterize a graph's global structure. The literature includes some papers studying spectral theory and quantifying change in networks. For instance, for each graph Gh selected from a sequence of graphs G1, . . .
, Gs, Bunke, Dickinson, Kraetzl and Wallis (see [1], page 72) determine the k largest positive eigenvalues λ1h, …, λkh and quantify a graph distance measure by setting

d(Gh, Gh+1) = Σ_{j=1}^{k} (λjh − λjh+1)² / min( Σ_{j=1}^{k} (λjh)², Σ_{j=1}^{k} (λjh+1)² ).  (1)

This measure is termed the spectral distance, and it will be one of the techniques used to calibrate the results obtained in this paper (see Section 6). Using similar ideas, Robles-Kelly and Hancock [8] calculate the largest eigenvalue of the adjacency matrix of a graph. The associated eigenvector is then used to identify an ordered path traversing every vertex. This ordered path forms the basis for computing the distance between graphs within the sequence. Robles-Kelly and Hancock also go on to discuss clustering; however, individual clusters are determined using the largest eigenvalue. In this paper, we take a different approach and adapt techniques which have been used extensively for image analysis and graphs embedded in Euclidean space; see [6]. Initially, we will focus on a rigorous theoretical discussion of the eigenvectors of the Laplace matrix and associated techniques for partitioning the vertex set. By carefully analysing the theory we are able to demonstrate that the eigenvector associated with the second smallest eigenvalue provides a good measure for determining vertex cluster sets. As noted by Hagen and Kahng [4], the advantage of this technique is that the partitioning is based on global information extracted from the overall network. Once we have determined the vertex clustering for each graph in a sequence, the Rand Index (see Dickinson, Bunke, Dadej and Kraetzl [3] and also [1], page 118) is used to define a graph distance measure. Section 3 provides the necessary background on the Rand Index. The underlying theory is reviewed in Section 4, providing a rigorous justification for the techniques under investigation.
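The spectral distance (1) can be sketched in a few lines. The code below is illustrative only (not from the paper); it assumes symmetric adjacency matrices with at least one positive eigenvalue each, and pads with zeros when a graph has fewer than k positive eigenvalues.

```python
import numpy as np

def spectral_distance(A1, A2, k):
    """Spectral distance of equation (1): compare the k largest positive
    eigenvalues of two symmetric (weighted) adjacency matrices."""
    def top(A):
        vals = np.sort(np.linalg.eigvalsh(np.asarray(A, float)))[::-1]
        vals = vals[vals > 0][:k]
        return np.pad(vals, (0, k - len(vals)))  # zero-pad if fewer than k
    l1, l2 = top(A1), top(A2)
    return float(((l1 - l2) ** 2).sum() / min((l1 ** 2).sum(), (l2 ** 2).sum()))

# Path on three vertices versus a triangle: identical graphs are at
# distance zero, distinct graphs at positive distance.
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
tri   = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```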
Section 5 discusses the necessary heuristics and provides some justification for the methods used. The theory is then tested in Section 6, where a comparison is made with the distance measures studied by Bunke, Dickinson, Kraetzl and Wallis [1].

2 Definitions

A graph G = (V, E(V)) is a vertex set V together with a collection E(V) of 2-element subsets of V. If {u, v} ∈ E(V), then {u, v} is called an edge, and u and v are the endpoints of the edge. In this paper all graphs will be simple, in that there are no repeated edges and all edges have two distinct endpoints. Each edge in the graph will be assigned a label or weight. Thus it will be assumed that there exists a function β : E(V) → Z⁺ ∪ {0}. When we wish to emphasize the fact that the edges of the graph are labeled, we will use the notation G = (V, E(V), β). It will be productive to define a number of matrices which summarize specific information about a graph. We begin by setting m = |V| and defining an ordering on the vertex set; that is, the vertices are given an arbitrary order v1, …, vm, and this ordering is used to define a vector V = (v1, …, vm)ᵀ. An adjacency matrix A = (aij) for a graph G = (V, E(V), β) is defined to be the |V| × |V| matrix with entries

aij = β({vi, vj}) if {vi, vj} ∈ E(V), and aij = 0 otherwise.

Given the adjacency matrix A, we let ai = Σj aij be the sum of the ith row (or column, since A is symmetric) and define the diagonal matrix D = (dij), where dij = ai δij. Finally, B = D − A is the disconnection or Laplace matrix. Note that since A is symmetric, B is self-adjoint. Hence, by the Spectral Theorem, B has a full complement of orthogonal eigenvectors.

Example 1. These concepts are illustrated for the graph given in Figure 1.
• The vertex set is V = {A, B, C, D, E, F};
• the vertex ordering is given by V = (A, B, C, D, E, F)ᵀ;
• the edge set is E(V) = {{A, D}, {B, D}, {B, C}, {C, E}, {D, E}, {D, F}};
• the weights are β({A, D}) = 5, β({B, D}) = 1, β({B, C}) = 2, β({C, E}) = 5, β({D, E}) = 2, β({D, F}) = 3;
• the matrices are D = diag(5, 3, 7, 11, 7, 3) and

A =
( 0  0  0  5  0  0 )
( 0  0  2  1  0  0 )
( 0  2  0  0  5  0 )
( 5  1  0  0  2  3 )
( 0  0  5  2  0  0 )
( 0  0  0  3  0  0 ),

so that

B = D − A =
(  5   0   0  −5   0   0 )
(  0   3  −2  −1   0   0 )
(  0  −2   7   0  −5   0 )
( −5  −1   0  11  −2  −3 )
(  0   0  −5  −2   7   0 )
(  0   0   0  −3   0   3 ).

Figure 1: A weighted graph G = (V, E(V), β)

3 Distance Measure

The focus of this paper will be the partitioning of the vertex set V of a graph G into cluster sets C1, …, Ct, such that V = C1 ∪ … ∪ Ct and Ci ∩ Cj = ∅ for 1 ≤ i < j ≤ t. The collection C = {C1, …, Ct} is termed a clustering of V. This partition defines an equivalence relation ρ(C) on the vertex set V; that is, for any two vertices x, y ∈ V, (x, y) ∈ ρ(C) if and only if there exists Ci ∈ C such that x, y ∈ Ci. Given two graphs G1 and G2, with associated clusterings C1 and C2, we seek to quantify the distance d(G1, G2) by specifying the distance between C1 and C2. We say a pair of vertices x, y ∈ V is consistent if either (x, y) ∈ ρ(C1) and (x, y) ∈ ρ(C2), or (x, y) ∉ ρ(C1) and (x, y) ∉ ρ(C2). Otherwise the vertices x and y are said to be inconsistent. That is, two vertices x and y are consistent if they belong to the same cluster set in C1 and the same cluster set in C2, or they are in different cluster sets in both C1 and C2. Then R+ = |{{x, y} | x, y are consistent vertices in V}| and R− = |{{x, y} | x, y are inconsistent vertices in V}|. Note that if |V| = m, then R+ + R− = m(m − 1)/2.
Finally, the Rand Index for a pair of clusterings C1 and C2 is defined to be

R(C1, C2) = 1 − R+ / (R+ + R−).

We note that R(C1, C2) ∈ [0, 1], with R(C1, C2) = 0 if and only if |C1| = |C2| and all pairs of vertices are consistent, and R(C1, C2) = 1 if all pairs of vertices are inconsistent. For a sequence of graphs, the Rand Index will be used to measure the distance between consecutive graphs in the sequence. That is, for a given sequence of graphs G1, …, Gs, we will determine a sequence of clusterings C1, …, Cs, where Ci is a clustering on the vertex set of graph Gi. Then d(Gi, Gi+1) = R(Ci, Ci+1).

4 The clustering procedure

In this section we follow the work of Hall [6], including ideas of Hagen and Kahng [4], and develop the techniques needed to partition the vertex set of a general graph; see also [5]. Initially the partition will give two disjoint subsets, but repeated application will provide a hierarchical clustering tree with branches of degree 2, the root of the tree corresponding to V and the leaves corresponding to the cluster sets. Thus initially we seek to take a graph G = (V, E(V), β) and partition the vertex set V into two subsets U and W such that the sum of the weights of edges connecting U and W, defined to be w(U, W) = Σ_{x∈U, y∈W} β({x, y}), is minimized. More specifically, we seek to minimize the cut ratio

r = w(U, W) / (|U| · |W|).

We begin with the work of Hall [6], where G = (V, E(V), β) is a graph embedded in R², and hence each vertex vi corresponds to a point (xi, yi) ∈ R². The vector Xᵀ = (x1, …, xm), where xi ≤ xj for i < j, defines an ordering on the vertex set V = (v1, …, vm). Hall seeks to reposition the vertices so as to minimise the sum of the edge weights times the squared distances between the corresponding x-coordinates of Xᵀ, minimizing

z = (1/2) Σᵢ₌₁ᵐ Σⱼ₌₁ᵐ (xᵢ − xⱼ)² aᵢⱼ.
Hall's method clusters together those vertices which are "strongly" connected. We note that, since A is symmetric,

z = (1/2) Σᵢ Σⱼ (xᵢ − xⱼ)² aᵢⱼ
  = (1/2) Σᵢ Σⱼ (xᵢ² − 2xᵢxⱼ + xⱼ²) aᵢⱼ
  = (1/2) ( Σᵢ xᵢ² aᵢ − 2 Σᵢ Σⱼ xᵢxⱼ aᵢⱼ + Σⱼ xⱼ² aⱼ )
  = Σᵢ xᵢ² aᵢ − Σᵢ Σⱼ xᵢxⱼ aᵢⱼ
  = XᵀDX − XᵀAX = XᵀBX.

Hence to minimize the weighted sum of the squares of the distances between the x-coordinates of the vertices we seek to minimize the expression XᵀBX, where B is symmetric and positive semidefinite. Hagen and Kahng [4] extend these ideas to study general graphs, not just those which are embedded in the plane. So let G = (V, E(V), β) be any graph and {U, W} be any partition of the vertex set V = {v1, …, vm} into two subsets; that is, V = U ∪ W and U ∩ W = ∅. Set p = |U|/m and q = |W|/m, and let Xᵀ = (x1, x2, …, xm) be a vector of length m with coordinates defined by

xᵢ = q if vᵢ ∈ U, and xᵢ = −p if vᵢ ∈ W.  (2)

Note that p + q = |U|/m + |W|/m = (|U| + |W|)/m = 1, that Xᵀ1 = qpm + (−p)qm = 0, and that |xᵢ − xⱼ| = 1 if vᵢ and vⱼ lie in different subsets of the partition, while |xᵢ − xⱼ| = 0 if they lie in the same subset. Also note

w(U, W) = (1/2) Σ_{{vᵢ,vⱼ}∈E(V)} (xᵢ − xⱼ)² aᵢⱼ.

Moreover,

|X|² = Σᵢ₌₁ᵐ xᵢ² = q²pm + (−p)²qm = pqm(q + p) = pqm²/m = |U||W|/m.
Consequently, for the Lagrangian

    L = X^T B X - λ(X^T X - 1) - μ X^T 1,

we have

    ∇L = 2BX - 2λX - μ1.

After noting that B1 = 0, so that 1 is an eigenvector of B with eigenvalue 0, the optimal solution occurs when 0 = (B - λI)X, and non-trivial solutions for X can be obtained by calculating the eigenvalues λ_i of B and their associated eigenvectors. Premultiplying by X^T gives 0 = X^T B X - λ X^T X, and since it is assumed that X^T X = 1, we have λ = X^T B X = z. Thus (see the Courant-Fischer minimax principle, [2], page 106) the second smallest eigenvalue satisfies

    λ = min_{X ⊥ 1, X ≠ 0} X^T B X / |X|^2,

and so Equation (3) implies

    r = w(U, W) / (|U||W|) ≥ λ / m.

This suggests r can be minimized by using the Fiedler eigenvector, the eigenvector corresponding to the second smallest eigenvalue λ. Further, this eigenvector can be used to determine the coordinates of the "position" vector X which satisfies the conditions given in (2). That is, z = (1/2) Σ_{{v_i, v_j} ∈ E(V)} (x_i - x_j)^2 a_ij = X^T B X is minimized by the Fiedler eigenvector of norm 1, and so intuitively the best approximation to this minimum will be obtained by assigning values to X which reflect the differences in the coordinates of the eigenvector and satisfy the conditions given in (2). More precisely, if the difference between two coordinates of the eigenvector is small, then the difference between the corresponding coordinates given in (2) should be small; conversely, if the difference between two coordinates of the eigenvector is large, then the difference between the corresponding coordinates given in (2) should be large.

So, to determine a partition of the vertices of V into two subsets, the components of the Fiedler vector are ordered in ascending numerical value, giving F = (f_1, ..., f_m). The same ordering is applied to the vector V to obtain a new ordering of the vertex set, or equivalently a new vector V^+.
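The construction above — form the Laplacian B = D − A, take the eigenvector of the second smallest eigenvalue, and order the vertices by its components — can be sketched as follows. This is an illustrative sketch with NumPy, not the authors' implementation; the two-triangle graph is a made-up example.

```python
import numpy as np

def fiedler_order(A):
    """Order the vertices of a weighted graph by the Fiedler vector.

    A is a symmetric m x m weight matrix.  B = D - A is the Laplacian used
    in the text; the eigenvector of its second smallest eigenvalue (the
    Fiedler vector) minimises X^T B X over unit vectors orthogonal to 1.
    """
    D = np.diag(A.sum(axis=1))
    B = D - A                          # symmetric, positive semidefinite
    _, eigvecs = np.linalg.eigh(B)     # eigh returns eigenvalues ascending
    fiedler = eigvecs[:, 1]
    return fiedler, np.argsort(fiedler)  # F and the vertex ordering V+

# Hypothetical example: two triangles joined by one light edge (weight 0.1).
# The Fiedler ordering places one triangle before the other.
A = np.zeros((6, 6))
for u, v, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    A[u, v] = A[v, u] = w
F, order = fiedler_order(A)
print(sorted(order[:3]), sorted(order[3:]))
```

The sign of an eigenvector is arbitrary, so either triangle may come first in the ordering; what matters is that the two triangles are separated.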
This vector can now be split into subvectors reflecting the partitioning of V into two subsets U and W such that the sum of the weights of the edges joining vertices from different subsets is minimized. The value |U| is termed the splitting index. A heuristic for determining the splitting index is discussed in the next section. Once U and W are determined using the splitting index, the process is repeated for the induced graphs G(U) = (U, E(U), β) and G(W) = (W, E(W), β).

5 Splitting Index

Hagen and Kahng [4] propose four heuristics for determining the splitting index:

(i) partition V^+ = (v_1^+, ..., v_m^+) based on the sign of the corresponding component in the Fiedler vector; that is, U = {v_i^+ | f_i ∈ F, f_i < 0} and W = {v_i^+ | f_i ∈ F, f_i ≥ 0};

(ii) partition V^+ around the median value of F;

(iii) partition V^+ = (v_1^+, ..., v_m^+) by determining the maximum difference between consecutive components of F; that is, with a_i = |f_i - f_{i+1}| for 1 ≤ i ≤ m - 1, set U = {v_1^+, ..., v_t^+ | a_t = max{a_i}} and W = {v_{t+1}^+, ..., v_m^+};

(iv) partition V^+ to obtain the least cut ratio; that is, for 1 ≤ i ≤ m - 1, calculate r_i = w(U_i, W_i)/(|U_i||W_i|), where U_i = {v_1^+, ..., v_i^+} and W_i = {v_{i+1}^+, ..., v_m^+}, then set U = {v_1^+, ..., v_t^+ | r_t = min{r_i}} and W = {v_{t+1}^+, ..., v_m^+}.

It is suggested in [4] that method (iv) be used. However, our research indicates that the above methods are susceptible to singularities, and thus we follow the work of Hopcroft, Khan, Kulis and Selman [7] by investigating instabilities in the data and thus determining the most appropriate heuristic. Full details of the data set are given in Section 6; however, for a random selection of graphs the following observations were made. The graphs contained a large number of pendant vertices (vertices incident with at most one edge). For these graphs, heuristics (i), (iii) and (iv) were equivalent.
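Heuristics (ii) and (iv) can be sketched as follows. This is illustrative Python, not the authors' code; the weight matrix is assumed to be already permuted into the Fiedler ordering, and the example graph is made up.

```python
import numpy as np

def split_by_min_ratio(A_sorted):
    """Heuristic (iv): the splitting index t minimising the cut ratio
    r_i = w(U_i, W_i) / (|U_i| |W_i|), where U_i is the first i vertices
    of the Fiedler ordering.  A_sorted is the weight matrix with rows and
    columns already permuted into that ordering."""
    m = len(A_sorted)
    best_t, best_r = 1, np.inf
    for i in range(1, m):
        r = A_sorted[:i, i:].sum() / (i * (m - i))  # crossing weight / |U||W|
        if r < best_r:
            best_t, best_r = i, r
    return best_t, best_r

def split_at_median(F):
    """Heuristic (ii): split the Fiedler-ordered vertices around the median of F."""
    return len(F) // 2

# Hypothetical example: two triangles bridged by a weight-0.1 edge, with the
# vertices assumed already in Fiedler order; the best split is after vertex 3.
A = np.zeros((6, 6))
for u, v, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    A[u, v] = A[v, u] = w
print(split_by_min_ratio(A))  # t = 3 minimises the cut ratio
```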
Thus heuristics (ii) and (iv) were tested on a number of randomly selected graphs, before and after the removal of the pendant vertices. When heuristic (iv) was applied, the pendant vertices dominated the majority of the cluster sets, and their removal reduced the clustering to a small number of cluster sets containing a large number of vertices. When heuristic (ii) was applied and the pendant vertices were removed, the number of cluster sets did not change; at times some of the pendant vertices were collected together into the same cluster, but at times they were fairly evenly distributed across the cluster sets. Hence, given the data set, it was decided that heuristic (ii) should be applied to determine the splitting index.

6 Experimentation

In this section we apply the above procedure (using heuristic (ii)) to data obtained from an enterprise communication network.^4 This data set was obtained by placing probes on physical links and recording the volume of information in daily communications. Three types of statistics were gathered: sender and receiver identifications and a count of the TCP/IP traffic. The sender and receiver identifications were clustered into 328 business domains, or vertices, in the resultant network. Data was collected over a period of 102 days, and this information gives rise to a series of graphs G_1, ..., G_102. In [1], Bunke, Dickinson, Kraetzl and Wallis conducted a number of tests on the distance between consecutive graphs in the above sequence. We have selected two techniques proposed in [1], the Spectral Distance (see (1), Section 1) and the Edit Distance, and compared these techniques with the procedure presented in this paper. The results of these tests are given below, but first we provide a brief discussion of the edit distance.
Let G_h = (V, E^h(V), β^h) and G_{h+1} = (V, E^{h+1}(V), β^{h+1}), where |V| = m, and let B^h = [b^h_ij] and B^{h+1} = [b^{h+1}_ij] represent the corresponding m × m disconnection matrices. Then the edit distance is given by

    d(G_h, G_{h+1}) = Σ_{1 ≤ i < j ≤ m} |b^h_ij - b^{h+1}_ij| + Σ_{b^h_ii ≠ 0, b^{h+1}_ii = 0} 1 + Σ_{b^h_ii = 0, b^{h+1}_ii ≠ 0} 1.

That is, it is the sum of the differences in the weights of the edges in G_h and G_{h+1}, plus the number of vertices of non-zero degree which occur in only one of G_h or G_{h+1}. This is a robust measure, as it accurately records the overall change in communication traffic across the network.

The value of d(G_h, G_{h+1}), using each of the three techniques (spectral distance, edit distance and the clustering technique), has been calculated for 1 ≤ h ≤ 101, and the results are displayed in the following graphs. In the first of these graphs, we note that both the edit distance and the clustering procedure detect major changes in the network between days 20 and 25, 60 and 65, and 85 and 90. The spectral distance detects the first of these changes, but not the other two major perturbations. A more interesting aspect highlighted in the first graph is the differences between days 20 and 90. Other than the major changes mentioned above, the edit distance shows relatively little change from one day to the next. However, the clustering technique tends to imply that during the same period there are changes in intergroup communications, indicating that the dynamics of the network may be changing through this period. These results suggest that the clustering technique presented here may be useful not only in detecting major changes in the overall communication traffic, but also in detecting changes in group dynamics.

^4 We wish to acknowledge the generous support of Miro Kraetzl, who among other things has given us access to the data set.
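The edit distance above is straightforward to compute from the two matrices. A sketch, not the authors' code; it assumes, as the formula suggests, that the diagonal entry b_ii is non-zero exactly when vertex i has non-zero degree, and the example matrices are hypothetical.

```python
import numpy as np

def edit_distance(Bh, Bh1):
    """Edit distance d(G_h, G_{h+1}) from the two m x m disconnection
    matrices: the sum over pairs i < j of |b^h_ij - b^{h+1}_ij|, plus the
    number of vertices whose diagonal entry is non-zero in exactly one of
    the two matrices (b_ii != 0 iff vertex i has non-zero degree)."""
    m = len(Bh)
    iu = np.triu_indices(m, k=1)                     # pairs with i < j
    weight_term = np.abs(Bh[iu] - Bh1[iu]).sum()
    vertex_term = np.sum((np.diag(Bh) != 0) != (np.diag(Bh1) != 0))
    return weight_term + vertex_term

# Hypothetical example: edge (0,1) changes weight 2 -> 3, edge (1,2) is
# dropped, and vertex 2 becomes isolated: distance 1 + 1 + 1 = 3.
Bh  = np.array([[1., 2., 0.], [2., 1., 1.], [0., 1., 1.]])
Bh1 = np.array([[1., 3., 0.], [3., 1., 0.], [0., 0., 0.]])
print(edit_distance(Bh, Bh1))  # 3.0
```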
[Figure 2: Time Series: Network Evolution. Two plots of distance d(G, G′) against days: "Comparison of Edit Distance with normalized Rand Index" (Rand Index with 9 cluster sets vs. Edit Distance) and "Comparison of Rand Index and Spectral Distance" (Rand Index with 9 clusters vs. Spectral Distance).]

7 Conclusions and further research

The above results show that the Fiedler vector can be used to determine an intra-graph clustering. Further, it may tentatively be suggested that this clustering technique provides a method comparable to the edit distance for detecting change in networks; for the data tested here, there are some instances where the clustering technique detects changes which were not so obvious in the edit distance. These results are positive and indicate that further investigation of the technique is warranted, and may lead to more refined methods providing a robust tool for the detection of change within large enterprise networks. Possible areas for further investigation are: an extended analysis of the heuristics used to determine the splitting index; an investigation of the use of the M-cut ratio as proposed by Shi and Malik [9]; and an investigation of related algorithms which overcome the problems of a hierarchical clustering process, for instance some of the techniques proposed by Tolliver and Miller [10].

References

1. Horst Bunke, Peter J Dickinson, Miro Kraetzl and Walter D Wallis: A Graph-Theoretic Approach to Enterprise Network Dynamics. Birkhäuser, Boston (2007)
2. John W Dettman: Mathematical Methods in Physics and Engineering, 2nd edition. McGraw-Hill, New York (1969)
3. Peter J Dickinson, Horst Bunke, Arek Dadej and Miro Kraetzl: A novel graph distance measure and its application to monitoring change in computer networks.
In the Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics, volume 3, Orlando, FL (2003) 33–338
4. Lars Hagen and Andrew B Kahng: New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design, Santa Clara CA (1992) 422–427
5. Lars W Hagen and Andrew B Kahng: Combining problem reduction and adaptive multistart: A new technique for superior iterative partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 16 No. 7 (1997) 709–717
6. Kenneth M Hall: An r-dimensional quadratic placement algorithm. Management Science 17 No. 3 (1970) 219–229
7. John Hopcroft, Omar Khan, Brian Kulis and Bart Selman: Tracking evolving communities in large linked networks. Proceedings of the National Academy of Sciences of the United States of America (2004) 1–5 doi:10.1073/pnas.0307750100
8. Antonio Robles-Kelly and Edwin R Hancock: Graph edit distance from spectral seriation. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 No. 3 (2005) 365–378
9. Jianbo Shi and Jitendra Malik: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 No. 8 (2000) 888–905
10. David A Tolliver and Gary L Miller: Graph partitioning by spectral rounding: application in image segmentation and clustering. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (2006) 1–8
11. Douglas B West: Introduction to Graph Theory. Prentice-Hall Inc., Upper Saddle River (1996)

TCAM representations of intervals of integers encoded by binary trees⋆

Wojciech Fraczak^{1,3}, Wojciech Rytter^{2,4}, and Mohammadreza Yazdani^3

^1 Dépt d'informatique, Université du Québec en Outaouais, Gatineau PQ, Canada
^2 Inst.
of Informatics, Warsaw University, Warsaw, Poland
^3 Systems and Computer Engineering, Carleton University, Ottawa ON, Canada
^4 Department of Mathematics and Informatics, Copernicus University, Torun, Poland

Abstract. We consider the problem of minimal representations of intervals of positive integers by Ternary Content Addressable Memory (TCAM). The integers are encoded (represented) as binary strings of the same length n, and a TCAM is a string-oriented representation in terms of unions of simple sets, called rules. Each rule is a concatenation (of length n) of singleton sets (i.e., single digits 0 and 1) or the set {0, 1}, denoted by ∗. Two important encodings are the lexicographic encoding (the standard binary representation of integers) and the binary reflected Gray encoding. We consider a family of encoding schemes, called dense-tree encodings, which includes both of them: each integer i ∈ {0, 1, ..., 2^n − 1} is represented as the label of the path from the root to the i-th leaf of a perfect binary tree T of height n, whose edges are labeled by zeros and ones. A set X ⊆ {0, 1, ..., 2^n − 1} corresponds to the set T[X] ⊆ {0, 1}^n of binary strings representing the integers in X as branches of the tree T. We provide exact bounds (with respect to n) on the minimal sizes of TCAMs representing sets of strings T[X], for an interval X. Three important cases are analyzed: T can correspond to the lexicographic, Gray or general dense-tree encoding. Some other issues related to the minimal sizes and numbers of essential rules of TCAMs are also investigated.

1 Introduction

The Ternary Content Addressable Memory (TCAM) [8, 7] is a type of associative memory with a highly parallel architecture which is used for performing very fast (constant-time) table look-up operations. The problem of interval representation by TCAMs appears in network processing engines, where the header fields of each IP packet (e.g., source address, destination address, port number, etc.)
should be matched under strict time constraints against the entries of an Access Control List (ACL) [2, 6, 11]. In these engines, an ACL is often represented by a TCAM. Each entry of the ACL defines either a single value or an interval of values for the fields of the packet header. If an ACL entry defines only single values for all header fields, then it can be directly and very efficiently represented using a single TCAM rule. However, if the ACL entry defines some non-trivial intervals for some header fields of the packet, then it may need more than one TCAM rule [9, 10].

A TCAM can be defined as a two-dimensional array of cells, where each cell carries one of the three values 0, 1, or ∗. Each row of this array is called a TCAM rule. An example of a TCAM is shown in Figure 1.

    1:  0 ∗ ∗ ∗ 1
    2:  1 0 ∗ ∗ ∗
    3:  ∗ 1 ∗ 0 ∗
    4:  ∗ ∗ 0 1 ∗
    5:  ∗ ∗ 1 ∗ 0

Fig. 1. A TCAM T of width 5 with 5 rules; L(T) = {0, 1}^5 \ {00000, 11111}.

A rule of a TCAM of width n is a sequence r = e_1 e_2 ... e_n, where e_i ∈ {0, 1, ∗} for i ∈ {1, ..., n}. It defines the following non-empty language L(r):

    L(r) = L(e_1) L(e_2) ... L(e_n),

where L(0) = {0}, L(1) = {1}, and L(∗) = {0, 1}. For example, L(0∗1) = {0} · {0, 1} · {1} = {001, 011}. We say that r covers a set of strings S if w ∈ L(r) for all w ∈ S. A TCAM T of width d with k rules defines the following partial mapping T : {0, 1}^d → {1, ..., k}: T(w) = i if and only if w is a word of the language defined by rule number i, and w is not a word of the language defined by any rule with a smaller number. The language L(T) of a TCAM T is the union of the languages defined by its rules, i.e., the domain of the partial mapping T.

⋆ The research of the first, second, and third authors was supported by grants of NSERC, the Polish Ministry of Science and Higher Education (N 206 004 32/0806), and an Ontario Graduate Scholarship, respectively.
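The first-match semantics of the mapping T, and the language of the Figure 1 TCAM, can be sketched as follows. This is illustrative Python, not part of the paper; the five rules are a reconstruction of the Figure 1 table.

```python
from itertools import product

def tcam_lookup(rules, w):
    """Return the 1-based index of the first rule matching word w, or None.

    Each rule is a string over {'0', '1', '*'}; '*' matches either bit.
    This mirrors the partial mapping T: w maps to the smallest rule number
    whose language contains w."""
    for i, rule in enumerate(rules, start=1):
        if all(r in ('*', c) for r, c in zip(rule, w)):
            return i
    return None

# The five rules of Figure 1 (width 5).
T = ["0***1", "10***", "*1*0*", "**01*", "**1*0"]

# L(T) is every 5-bit word except 00000 and 11111.
covered = {"".join(bits) for bits in product("01", repeat=5)
           if tcam_lookup(T, "".join(bits)) is not None}
assert covered == {"".join(b) for b in product("01", repeat=5)} - {"00000", "11111"}
```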
The number of rules in a TCAM is called the size of the TCAM. For a ∈ {0, 1}, we denote by ā the complement of a, i.e., 0̄ = 1 and 1̄ = 0.

The problem of interval representation in TCAM is sometimes referred to as the problem of "range representation" in TCAM [10, 9, 2]. In this paper we present a systematic approach for tackling this problem. For a family of encoding schemes which includes the lexicographic encoding and the binary reflected Gray encoding, we provide exact bounds on the minimal sizes of TCAMs for interval representation. We also provide bounds on the number of essential TCAM rules (prime implicants) for intervals in the lexicographic encoding and the binary reflected Gray encoding.

2 Dense-tree encodings of integers and interval sets

We define a full tree of height n as a perfect binary tree of height n such that each pair of sibling edges is labeled 0 and 1. The assignment of labels to the edges (two alternatives per internal node) can be chosen arbitrarily. Let T_n be a full tree of height n. The label w ∈ {0, 1}^n of the path from the root to the i-th leaf defines the n-bit encoding of the number i, with 0 corresponding to the leftmost leaf of the tree. In that way, T_n defines a bijection T_n : {0, 1}^n → {0, 1, ..., 2^n − 1}, called an n-bit dense-tree encoding. The lexicographic encoding (i.e., the standard unsigned binary encoding) and the binary reflected Gray encoding [3, 5] are two important examples of dense-tree encodings. They are presented in Figure 2 in the form of full trees for 4-bit encodings.

[Fig. 2. Two sample 4-bit dense-tree encodings, drawn as full trees of height 4 with leaves 0, 1, ..., 15: the lexicographic encoding (Lex) and the reflected Gray encoding (Gray).]
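The two encodings of Figure 2 are easy to compute directly. A sketch, not part of the paper; gray uses the standard i XOR (i >> 1) formula for the binary reflected Gray code.

```python
def gray(i, n):
    """n-bit binary reflected Gray encoding of i (the label of the path to
    the i-th leaf of the reflected dense tree in Figure 2)."""
    return format(i ^ (i >> 1), "0{}b".format(n))

def lex(i, n):
    """n-bit lexicographic (standard unsigned binary) encoding of i."""
    return format(i, "0{}b".format(n))

# Consecutive integers differ in exactly one bit under the Gray encoding,
# but may differ in many bits under the lexicographic encoding.
print(lex(7, 4), lex(8, 4))    # 0111 1000
print(gray(7, 4), gray(8, 4))  # 0100 1100
```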
In the context of a dense-tree encoding T_n, a set X ⊆ {0, 1}^n defines both the set T_n(X) of integers and a subset of the leaves of T_n. X can be represented by a skeleton tree (see Figure 3). The skeleton tree of X is obtained from T_n by: (1) removing all edges which do not lead to the leaves of X; (2) then turning all full subtrees into leaves.

We say that a skeleton tree S is a chain if every vertex of S has at most one non-leaf child. A double-chain is a skeleton tree with at most one vertex v having two non-leaf children and such that all ancestors of v have only one child. Examples of a chain and a double-chain are illustrated in Figure 4.

For two integers x, y, we denote by [x, y] the set {x, x + 1, ..., y} and call it an interval. A set of binary strings X of length n is an interval-set of a dense-tree encoding T_n if T_n(X) is an interval.

Lemma 1. Let X ⊆ {0, 1}^n. The following statements are equivalent:
1. There exists a dense-tree encoding T_n for which T_n(X) = [0, |X| − 1];
2. There exists a dense-tree encoding T_n for which T_n(X) = [2^n − |X|, 2^n − 1];
3. For any dense-tree encoding the skeleton tree of X is a chain.

[Fig. 3. An interval tree T and its skeleton version.]

[Fig. 4. A chain and a double-chain.]

Lemma 2. Let X ⊆ {0, 1}^n. The following statements are equivalent:
1. There exists a dense-tree encoding T_n for which T_n(X) is an interval;
2. For any dense-tree encoding the skeleton tree of X is a double-chain.

Corollary 1. There is a linear-time algorithm to test whether there exists a dense-tree encoding T_n for which T_n(X) is an interval.

In contrast, the problem of determining whether a given TCAM defines an interval is intractable, as stated by the following lemma.

Lemma 3. The problem of testing, for a given TCAM T, whether L(T) is an interval set is co-NP-complete.

Proof.
Testing whether a TCAM of width n corresponds to a full tree (i.e., whether L(T) = {0, 1}^n) is co-NP-complete, since determining whether a given Boolean formula in disjunctive normal form is a tautology is known to be co-NP-complete. For a given TCAM T of width n, we construct the TCAM T′ = {10} · T ∪ {∗1^{n+1}} of width n + 2. TCAM T′ defines an interval set if and only if T corresponds to a full tree. ⊓⊔

3 Prefix rules and essential rules of TCAMs

Let E : {0, 1}^n → {0, 1, ..., 2^n − 1} be an encoding of integer values by n-bit strings. Any subset S ⊆ {0, 1, ..., 2^n − 1} can be represented by a TCAM T of width n; that is, T represents S iff E^{−1}(S) = L(T). The problem of finding a minimal-size TCAM T (i.e., a TCAM with the minimal number of rules) for a given set S is known to be NP-hard, as it corresponds to the problem of finding a minimal disjunctive normal form for a Boolean expression. However, in this paper we are interested only in subsets which are intervals.

A TCAM rule is called a prefix rule if all its non-star symbols occur as a prefix of it; e.g., 0101∗∗∗∗ is a prefix rule.

Theorem 1. Let X ⊆ {0, 1}^n be an interval-set of a dense-tree encoding T_n. The minimum number of prefix rules covering X equals the number of leaves in the skeleton tree of X in T_n.

Theorem 2. Let the skeleton tree of X ⊆ {0, 1}^n in a dense-tree encoding T_n be a chain with k leaves. The minimal size of a TCAM for X is k.

Proof. We first show that k rules are sufficient for representing X: obviously k prefix rules are enough, each rule corresponding to a leaf. Now we show that we need at least k rules (prefix or non-prefix). Assume, by contradiction, that we have k′ rules e_1, e_2, ..., e_{k′} with k′ < k. Let d = d[1]d[2]...d[n] be the label of the path from the root to the sibling of the bottom leaf of the chain, which means that d ∉ X.
Each rule e_i has to have at least one position p_i such that e_i[p_i] is the complement of d[p_i]; we always choose the first such position. Since k′ < k, there is a position r which corresponds to the incoming edge of a leaf of the chain and which was not selected as any of the positions p_i. We change the symbol at this position of d to the complement of d[r]. The resulting string belongs to the interval, but it is not covered by any of the rules. ⊓⊔

Corollary 2. In an n-bit dense-tree encoding, representing each of the intervals [1, 2^n − 1] and [0, 2^n − 2] needs exactly n TCAM rules.

Consider a set X ⊆ {0, 1}^n. A TCAM rule r is called an X-limited rule if it does not cover any string outside X, i.e., L(r) ⊆ X. An X-limited rule r is said to be X-essential if there is no other X-limited TCAM rule that covers r. In the context of two-level logic generation, an X-essential TCAM rule is called a "prime implicant" of X (see [1]). Prime implicants play an important role in the process of two-level logic generation: any coverage of X by a TCAM of size k can be turned into a coverage by k X-essential TCAM rules. Therefore, one can consider only X-essential TCAM rules in the process of finding a minimal coverage of X.

In the general case of two-level logic minimization, the number of prime implicants for a given set may be exponential with respect to the number n of input bits [1]. In the following, we prove that in the case of the n-bit lexicographic encoding and the n-bit reflected Gray encoding, an interval-set X ⊆ {0, 1}^n admits no more than n(n − 1) + 1 and 2n X-essential TCAM rules, respectively.

Lemma 4. Let X ⊆ {0, 1}^n be such that the skeleton tree of X in a dense-tree encoding T_n is a chain with k ≤ n leaves. Then there are exactly k different X-essential TCAM rules.

Proof. It can be verified that every X-essential rule is of the form s_1 s_2 ... s_i ...
s_n, where i is the level of a leaf in the chain and, for 1 ≤ j ≤ n,

    s_j = d̄[i]  if j = i,
    s_j = d[j]  if j < i and there is no leaf at level j,
    s_j = ∗     otherwise,

where d = d[1]d[2]...d[n] is the path from the root to the sibling of the bottom leaf of the chain. ⊓⊔

Theorem 3. Let X ⊆ {0, 1}^n be an interval-set in the n-bit lexicographic encoding Lex_n. There are at most n(n − 1) + 1 different X-essential TCAM rules.

Proof. Let Lex_n(X) = [x, y]. The proof is by induction on n. Suppose that Lex_n(0u) = x and Lex_n(1v) = y for some u, v ∈ {0, 1}^{n−1}. (The easy case, in which the encodings of x and y start with the same digit, is skipped.) The set P([0u, 1v]) of all X-essential TCAM rules can be split into three disjoint sets P_0, P_1, and P_∗, where P_a, a ∈ {0, 1, ∗}, denotes the set of all X-essential TCAM rules that start with a. By Lemma 4, we have |P_0| ≤ n − 1 and |P_1| ≤ n − 1. Also |P_∗| = |P([u, v])|, where the interval [u, v] is encoded on n − 1 bits. Thus p(n) ≤ 2(n − 1) + p(n − 1), with p(1) = 1, where p(i) denotes the maximum number of different X-essential TCAM rules over all intervals I with Lex_i(X) = I. ⊓⊔

Theorem 4. Let X ⊆ {0, 1}^n be an interval-set in the reflected Gray encoding Gray_n. There are no more than 2n different X-essential TCAM rules.

Proof. Let Gray_n(X) = [x, y] be such that the encodings of x and y differ in the first digit. We assume that the dense-tree encoding is "reflective", i.e., the labels of the right-hand side and left-hand side subtrees rooted at every node are mirror copies of each other. Suppose that x < y, x is encoded as av, and y as āu, for a ∈ {0, 1} and u, v ∈ {0, 1}^{n−1}. Then, by the reflective property, au encodes 2^n − 1 − y. In the case x ≤ 2^n − 1 − y there is no X-essential TCAM rule which starts with ā, and symmetrically, if x ≥ 2^n − y then there is no X-essential TCAM rule which starts with a. Without loss of generality we may assume that x ≤ 2^n − 1 − y.
Therefore, the X-essential TCAM rules can be split into two sets: P_a (the rules starting with a) and P_∗ (the rules starting with ∗). Every rule from P_a with the initial a removed must be an essential TCAM rule for the interval [x, 2^{n−1} − 1] on n − 1 bits, and every rule from P_∗ with the initial ∗ removed must be an essential TCAM rule for the interval [2^n − 1 − y, 2^{n−1} − 1] on n − 1 bits. Since the skeleton trees of both these intervals are chains (or empty), by Lemma 4 each of them has at most n − 1 (or, if empty, no) different essential TCAM rules. ⊓⊔

4 Bounds on the size of a TCAM representing an interval

The minimum number of prefix rules required for representing an interval in TCAM is equal to the number of leaves in its corresponding skeleton tree. For some intervals, this number can be significantly reduced by using non-prefix TCAM rules.

Theorem 5. There is a dense-tree encoding T_n and an interval I such that the minimum number of prefix rules representing I is 2n − 2, while the minimum TCAM size is only n.

Proof. Consider the interval I = [1, 2^n − 2] in the lexicographic encoding on n bits. The skeleton tree for X ⊂ {0, 1}^n such that Lex_n(X) = I has 2n − 2 leaves, and thus by Theorem 1 it cannot be covered by fewer than 2n − 2 prefix rules. However, X can be covered by the following n rules: {rot^k(01∗...∗) | 0 ≤ k < n}, where rot denotes the left-rotation of a word, i.e., rot(aw) = wa for a ∈ {0, 1, ∗} and w ∈ {0, 1, ∗}^{n−1}. (A TCAM with this language for n = 5 is presented in Figure 1.) ⊓⊔

[Fig. 5. Two skeleton trees: the lexicographic tree in (A) needs at least 2n − 4 rules for the lexicographic encoding; the (non-lexicographic) tree in (B) needs at least 2n − 3 TCAM rules.]

Theorem 6. (a) Let T_n be the lexicographic or the reflected Gray encoding of length n. There is an interval I = [x, y] which cannot be represented with fewer than max(n, 2n − 4) TCAM rules.
(b) For each n ≥ 1 there exists a dense-tree encoding T_n and an interval I = [x, y] which needs max(n, 2n − 3) TCAM rules.

Proof. Point (a). For 1 ≤ n ≤ 3 we have max(n, 2n − 4) = n, and the claim was proved in Corollary 2. For n ≥ 4 we give the proof for the lexicographic encoding; the proof for the reflected Gray encoding is similar. Consider the interval

    I = [r_l, r_h] = [Lex_n(s_l), Lex_n(s_h)] = [Lex_n(0100...001), Lex_n(1011...110)],

whose corresponding interval-set has a skeleton tree as illustrated in Figure 5(A) for the lexicographic encoding: s_l starts with 01, followed by n − 3 zeros, and ends with 1; s_h starts with 10, followed by n − 3 ones, and ends with 0. We divide this interval into the two sub-intervals

    I_1 = [r_l, 2^{n−1} − 1] = [Lex_n(0100...001), Lex_n(0111...11)]

and

    I_2 = [2^{n−1}, r_h] = [Lex_n(1000...00), Lex_n(1011...110)].

The encodings of all values in the interval I_1 start with 01, and thus they are within the first half of the domain [0, 2^n − 1]; this means that all the leaves corresponding to the values in I_1 fall on the left side of the vertical line that passes through the root of the tree. The encodings of the values of the interval I_2, however, start with 10, and so their corresponding leaves fall on the right side of that vertical line. Consequently, any TCAM rule that represents a value from I_1 cannot represent any value from I_2, and vice versa: a rule covering values from both sub-intervals would have to start with ∗∗, and would then also cover values with the prefix 00, which lie outside I. As such, the minimum number of TCAM rules required for representing I equals the sum of the minimum numbers of rules required to represent I_1 and I_2 separately.

Since the encodings of all the values from I_1 start with 01, we can represent this interval by prepending 01 to the rules that represent the interval [1, 2^{n−2} − 1] = [Lex_{n−2}(00...01), Lex_{n−2}(11...11)] over the domain [0, 2^{n−2} − 1].
Hence, using Corollary 2, we need at least n − 2 TCAM rules (of width n − 2) to represent this interval. Similarly, it can be seen that representing I_2 needs at least n − 2 TCAM rules. This means that the interval I cannot be represented by fewer than 2n − 4 TCAM rules.

Point (b). An example of such a dense-tree encoding T_n and the interval [1, 2^{n−1} + 2^{n−2} − 2] is shown in Figure 5(B). ⊓⊔

Theorem 7. Let T_n be a dense-tree encoding of length n. Every interval I = [x, y] can be represented by max(n, 2n − 3) TCAM rules.

Proof. If 1 ≤ n ≤ 3, we have max(n, 2n − 3) = n, and it can be checked manually that every interval can be represented by n or fewer TCAM rules. For n > 3, if the skeleton tree of I has 2n − 3 or fewer leaves, then by Theorem 1 I can be represented by 2n − 3 or fewer prefix TCAM rules. The skeleton tree has the maximal number of 2n − 2 leaves iff it has a shape similar to the tree T of Figure 6. We decompose this tree into three subtrees T_1, T_2, and T_3. The subtrees T_1 and T_2 are chains with n − 3 leaves each, and thus altogether they need 2n − 6 TCAM rules. It is enough to show that T_3 needs only 3 TCAM rules. Let abc (resp. āb′c′) be the labels of the left (resp. right) branch of T_3, as shown in Figure 6. We consider the possible cases as follows. If b′c′ = b̄c̄, then the rules ab̄∗, ∗bc̄ and ā∗c represent T_3; if b′c′ = b̄c, then the rules ∗∗c̄, ab̄c and ābc represent T_3; if b′c′ = bc̄, then the rules ∗b̄∗, abc̄ and ābc represent T_3; and if b′c′ = bc, then the two rules ∗b̄∗ and ∗bc̄ suffice. ⊓⊔

[Fig. 6. The case when the skeleton tree has the maximal number, 2n − 2, of prefix rules. T is decomposed into T_1, T_2, and T_3. For T_1 and T_2 we use 2n − 6 prefix rules; for T_3 three general rules are enough (instead of 4 prefix rules).]

Theorem 8. Let T_n ∈ {Lex_n, Gray_n} be the lexicographic or the reflected Gray encoding of length n. Every interval I = [x, y] can be represented by max(n, 2n − 4) TCAM rules.

Proof.
We show the proof only for the lexicographic encoding; the proof for the Gray encoding is similar. For n ≤ 4, it can be manually verified that every interval I can be represented by n or fewer TCAM rules. For n ≥ 5, if the skeleton tree has 2n − 2 leaves, then by the proof of Theorem 5 the interval can be represented by n TCAM rules. If the skeleton tree has 2n − 3 leaves, then it has one of the forms shown in Figure 7, where C_l and C_r are chains. In all three cases it can be proved that the leaf labeled B can be represented by a combination of the TCAM rules that represent the other labeled leaves, and thus (by Theorem 1) the whole interval can be represented by 2n − 4 TCAM rules. For instance, in the skeleton tree of Figure 7(a), changing the second bit and the first bit of the prefix rules that represent the leaves A and D, respectively, to ∗ results in two rules that represent A′ and D′ as well; as such, representing leaf B does not need a separate TCAM rule. ⊓⊔

[Fig. 7. Three possible forms, (a)-(c), of a skeleton tree with 2n − 3 leaves, for n ≥ 5; in each, C_l and C_r are chains.]

5 Conclusion and future work

In this paper we studied the problem of interval representation by TCAM. This problem appears in Internet routers that classify and filter packets by implementing ACL policy rules. The available methods of interval representation by TCAM use the lexicographic encoding to encode the integers of a given interval. We broadened this choice and explored the problem assuming that the underlying encoding scheme belongs to the family of dense-tree encoding schemes, which includes the lexicographic encoding and the binary reflected Gray encoding.
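The rotation construction of Theorem 5, one of the results summarized here, can be checked by brute force for small n. This is an illustrative sketch, not part of the paper; rule/word matching is as defined in Section 1.

```python
from itertools import product

def middle_interval_rules(n):
    """The n rules rot^k(01*...*), 0 <= k < n, from the proof of Theorem 5."""
    base = "01" + "*" * (n - 2)
    return [base[k:] + base[:k] for k in range(n)]  # rot^k = left-rotate k times

def language(rules, n):
    """All n-bit words covered by at least one rule."""
    return {"".join(w) for w in product("01", repeat=n)
            if any(all(r in ("*", c) for r, c in zip(rule, w)) for rule in rules)}

# For each small n, the n rotation rules cover exactly the lexicographic
# interval [1, 2^n - 2], i.e. everything except 0...0 and 1...1: a word
# avoids every rotation of 01*...* iff it has no cyclic occurrence of "01",
# i.e. iff it is constant.
for n in range(2, 9):
    expected = {format(i, "0{}b".format(n)) for i in range(1, 2 ** n - 1)}
    assert language(middle_interval_rules(n), n) == expected
```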
We introduced the notion of a skeleton tree of a set X ⊆ {0, 1}^n and used it to devise a linear-time algorithm which determines whether X is an interval in some dense-tree encoding scheme. Skeleton trees appear to be useful also in estimating the sizes of TCAMs. We also showed that the number of prime implicants (called here the essential rules) of a given interval set X ⊆ {0, 1}^n is at most n(n − 1) + 1 and 2n for the lexicographic encoding and the reflected Gray encoding, respectively. Finally, we proved that for the class of dense-tree encoding schemes, every interval over the domain [0, 2^n − 1] can be represented by 2n − 3 TCAM rules and, in addition, there are intervals over the domain [0, 2^n − 1] which need at least max(n, 2n − 4) rules for their TCAM representation.

Suppose that we use a fixed n-bit encoding scheme for representing the integer values of a domain D = [0, 2^n − 1]. In the continuation of our work, we would devise an algorithm for finding a TCAM representation of a given interval over D with a minimum number of TCAM rules. This problem is a special case of finding a TCAM representation of an arbitrary subset of the integers of domain D with a minimal number of TCAM rules. The latter problem is NP-hard, as the NP-complete problems of finding the Minimal Disjunctive Normal Form of a Boolean formula [4] and Two-Level Logic Minimization [1] can be polynomially reduced to it.

References

1. Robert K. Brayton, Gary D. Hachtel, Curtis T. McMullen, and Alberto L. Sangiovanni-Vincentelli. Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers, 1984.
2. G. Davis, C. Jeffries, and J. van Lunteren. Method and system for performing range rule testing in a ternary content addressable memory, April 2005. US Patent 6,886,073.
3. E. N. Gilbert. Gray codes and paths on the n-cube. Bell Systems Technical Journal, 37:815–826, 1958.
4. J. F. Gimpel. A method of producing a boolean function having an arbitrarily prescribed prime implicant table.
IEEE Trans. Computers, 14:485–488, 1965.
5. F. Gray. Pulse code communications, March 1953. US Patent 2,632,058.
6. Hae-Jin Jeong, Il-Seop Song, Taeck-Geun Kwon, and Yoo-Kyoung Lee. A multi-dimension rule update in a TCAM-based high-performance network security system. In AINA '06: Proceedings of the 20th International Conference on Advanced Information Networking and Applications - Volume 2, pages 62–66, Washington, DC, USA, 2006. IEEE Computer Society.
7. R. Kempke and A. McAuley. Ternary CAM memory architecture and methodology, August 1996. US Patent 5,841,874.
8. T. Kohonen. Content-Addressable Memories. Springer-Verlag, New York, 1980.
9. Karthik Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for advanced packet classification with ternary CAMs. In SIGCOMM, pages 193–204, 2005.
10. Huan Liu. Efficient mapping of range classifier into ternary-CAM. In HOTI '02: Proceedings of the 10th Symposium on High Performance Interconnects (HotI'02), page 95, Washington, DC, USA, 2002. IEEE Computer Society.
11. V. Srinivasan, S. Suri, G. Varghese, and M. Waldvogel. Fast and scalable layer four switching. In Proceedings of ACM SIGCOMM, volume 28, pages 191–202, October 1998.

Binary Trees, Towers and Colourings (An Extended Abstract)

Alan Gibbons (Department of Computer Science, King's College London, UK) and Paul Sant (Computing and Information Systems Department, University of Bedfordshire, UK)

1 Introduction

This paper explores new avenues into an old problem. Although the famous Four-Colour Problem (FCP) of planar graphs has a published solution [6–8], with later improvements [9], there has always been a body of opinion that the extant proofs are unsatisfactory, lacking conciseness and lucidity and requiring hours of electronic computation.
Historically (see, for example, [10]), work on FCP has contributed significantly to the body of graph theory and combinatorics and continues to do so, partly through papers like this. Tait [3] showed that 3-edge colouring of cubic bridgeless graphs is equivalent to FCP. Building indirectly on this work, others (for example, [4, 5]) showed that FCP is equivalent, by a linear-time reduction, to the so-called Colouring Pairs of Binary Trees problem (CPBT). This is our starting point. Specifically, the paper explores the notion of so-called towers in this context. Other papers cited in the references, by the same authors, investigate CPBT through other avenues.

Figure 1 shows an instance of CPBT. Such an instance always consists of two binary trees of the same size (that is, having the same number of leaves). For analytical reasons, we usually consider that the trees have so-called root-edges, shown at the top of these structures. Any 3-edge colouring of a binary tree defines a sentence over the colours {a, b, c} which is given by reading (from left to right) the colours of those edges which have leaves as endpoints. In this figure, for both trees, the sentence is bbabb. The sentence is said to colour the tree. In the example, bbabb is said specifically to a-colour both trees because a is the colour of the root-edge in both cases. More formally, an a-colouring of a binary tree is one in which the colours of edges are assigned in such a way that the root-edge is forced (by the colours assigned to all other edges) to be a. For any pair of binary trees of the same size, CPBT is to prove that there always exists at least one sentence that colours both trees. Other problems in a variety of areas of combinatorics (for example, in language, automata and integer linear equation theories [4, 5]) have, in turn, been shown to be equivalent to CPBT, which adds to its interest.
In [1, 2] the authors study CPBT through the idea of rotations in trees and by looking at broad classes of CPBT problem instances which are solvable independently from FCP. This paper takes a fresh look at CPBT through the notion of towers. The following sections define towers and the decomposition of binary trees into towers. They also develop combinatorics and algorithmics appropriate to this initial study of towers in the context of CPBT. Before embarking on that, we present a theorem which, for the first time, completely characterises the set of all the strings each of which a-colours at least one binary tree with n leaves. In the theorem, Sn is the cardinality of this set.

[Fig. 1. A solution to CPBT for two trees with 5 leaves.]

Theorem 1. Let S be a string of length n (> 1) over the alphabet {a, b, c} and let A, B, C respectively be the number of a's, b's and c's in S. Then:
· S a-colours some binary tree with n leaves if and only if
  1. S contains at least two of the characters a, b and c;
  2. for n even, A is even and B, C are odd; for n odd, A is odd and B, C are even.
· Sn = (3^n − 1)/4 for n even, and Sn = (3^n − 3)/4 for n odd.

Proof. (sketch) First part: In a 3-edge colouring of any 3-regular rootless tree T, the numbers of edges attached to leaves that are coloured a, b or c are either all even or all odd. This forces the parities of A, B and C of S defining the colouring of any binary tree obtained by rooting T. On the other hand, given any S over {a, b, c}, we can simulate a colouring of a binary tree by replacing recursively any pair of adjacent and differently coloured letters by the third letter, only if S satisfies the conditions of the theorem for a-colouring. Second part: It is easy to prove that of the 3^n distinct strings of length n over {a, b, c}, (3^n + 1)/2 have A even and (3^n − 1)/2 have A odd.
For n even, the (3^n + 1)/2 strings with even A include the string consisting only of a's. This is disallowed by (1) above, leaving (3^n + 1)/2 − 1 = (3^n − 1)/2 strings. Each of these contains at least one character that is not a, and each has either B, C both odd or B, C both even. We can form pairs of strings such that one string of a pair differs from the other in just one respect: the first character that is not an a is a b in one and a c in the other. Clearly every one of the (3^n − 1)/2 strings can be accounted for in this way. It follows that for n even, there are (3^n − 1)/4 strings with A even, B and C odd, and containing at least two characters from {a, b, c}. For n odd, the proof is similar but we start with the (3^n − 1)/2 strings with A odd.

Corollary 1. In any instance of CPBT, each tree with n leaves can be a-coloured by 2^(n−1) distinct strings. It follows that the full space of all strings a-colouring trees of size n is much too large to guarantee that the two sets of strings defined by the two trees of such an instance will have a non-empty intersection.

2 Towers

If both children of an internal node of a binary tree are leaves, then that internal node is said to be a terminal. A tower is a binary tree with exactly one terminal. Before going on to look at the decomposition of trees into towers and its relevance for CPBT, we present the following theorem which formulates the number of binary trees, Tk,n, that have n leaves and k terminals.

Theorem 2.

    Tk,n = (1/k) C(2k−2, k−1) C(n−2, 2k−2) 2^(n−2k)

where C(·, ·) denotes a binomial coefficient.

Proof. (brief sketch) A tree with n leaves may be constructed from a tree with (n − 1) leaves by attaching a pair of edges (creating a terminal vertex) to any leaf. This process of adding an edge pair may (if the new terminal is not a child of an old terminal) or may not (if the new terminal is a child of an old terminal) create one more terminal in the new tree than in the old.
Thus, a particular tree with n leaves and k terminals may be constructed from k different smaller trees with (n − 1) leaves, some of which have k terminals and some of which have (k − 1) terminals. In a tree with (n − 1) leaves and (k − 1) terminals, there are (n − 2k + 1) leaves that are not the children of terminals. In a tree with (n − 1) leaves and k terminals, there are 2k leaves that are the children of terminals. The above considerations establish the following recurrence relation:

    Tk,n = ((n − 2k + 1) Tk−1,n−1 + 2k Tk,n−1)/k    (1)

In addition, and by inspection of the trees, we have that:

    T1,2 = 1, T1,3 = 2, T1,4 = 4, T2,4 = 1, T1,5 = 8, T2,5 = 6

The closed form for Tk,n in the theorem statement can now be found using standard means too long to include here. The reader may (tediously) verify that the closed form does indeed satisfy the recurrence relation and initial values.

2.1 Decomposition of Trees into Towers

There are several ways to decompose trees into towers. At this time it is not clear what strategy might best help, for example, to provide an algorithmic solution to CPBT (this being a major motivator). Let a join be an internal vertex neither of whose children is a leaf. Natural methods for decomposition then involve cutting edges that connect a join to a child (a copy of that edge being retained by both the join and its child). Such methods might: (a) cut both edges from each join to its children, or (b) cut just one edge from each join to a child. We look briefly at both of these possibilities. Cutting both edges produces more towers from a given tree, but the method seems to provide, in some contexts, greater analytic utility. For example, we can reproduce the result of Theorem 2 by considering the reverse process of such a decomposition.
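The recurrence (1), the quoted initial values, and the closed form of Theorem 2 can be cross-checked numerically; a small sketch (all function names are ours):

```python
from math import comb

def towers_closed_form(k, n):
    # Closed form of Theorem 2: Tk,n = (1/k) C(2k-2, k-1) C(n-2, 2k-2) 2^(n-2k).
    return comb(2 * k - 2, k - 1) * comb(n - 2, 2 * k - 2) * 2 ** (n - 2 * k) // k

def towers_recurrence(k, n, memo={}):
    # Recurrence (1) with base case T(1,2) = 1; T(k,n) = 0 when k or n is too small.
    if k < 1 or n < 2 * k:
        return 0
    if (k, n) == (1, 2):
        return 1
    if (k, n) not in memo:
        memo[(k, n)] = ((n - 2 * k + 1) * towers_recurrence(k - 1, n - 1)
                        + 2 * k * towers_recurrence(k, n - 1)) // k
    return memo[(k, n)]

# The two agree on the initial values quoted in the text and beyond.
print([towers_recurrence(1, n) for n in range(2, 6)])   # [1, 2, 4, 8]
print(towers_recurrence(2, 5), towers_closed_form(2, 5))  # 6 6
```

The base cases T1,n (towers with n leaves) come out as powers of two, in line with the counting of towers used in the proof sketch below.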
Suppose that a tree has k terminals; then we can think of the towers produced by process (a) as super-edges in a "tree" of super-edges which has k "leaves" and, in all, (2k − 1) super-edges arranged in one of the standard topologies of a normal k-leaved binary tree. This is illustrated in Figure 2. For the tree, T, of (a) the edge-cuts decomposing (that is, separating) the tree into towers are indicated. Its super-edge tree is shown in (b). Now, suppose that we wish to count the number of trees with n leaves and k terminals. We can look at all the distinct ways of generating (2k − 1) towers and combining them through topologies such as that of Figure 2(b). The following method does this, carefully avoiding the duplication of individual binary trees. We start with a large tower with n leaves. Such a tower that will produce T of Figure 2(a) is shown in (c). Internal edges of the large tower are cut to produce the required (2k − 1) super-edges. Note that the stacking of super-edges is topologically ordered according to the digital labels of (b) and (c).

[Fig. 2. Decomposition of a tree, T, into towers: (a) the tree T with its edge-cuts; (b) the super-edge tree of T; (c) the tower of super-edges.]

How many different trees with n leaves and k terminals can be constructed in this way for a fixed topology such as that of (b)? We start with a tower (such as (c)) with n leaves; there are 2^(n−2) such towers. We cut (2k − 2) internal edges of the tower to make (2k − 1) super-edges; there are (n − 2) internal edges, so these cuts can be made in C(n−2, 2k−2) ways. However, the same set of super-edges can be produced by 2^(2k−2) different large towers which differ only in the connections made at the cuts. For example, in Figure 2(c), the topmost cut severs y from z, but after the cut z might just as well have been connected to x.
Thus, removing duplications in the count, there are overall C(n−2, 2k−2) 2^(n−2k) different ways of producing trees with n leaves and k terminals with a fixed topology (such as that indicated in (b)). Notice that we employ, for every such tree, the same mapping of super-edges in the tower to positions in the tree (exemplified by the digital labels in (b) and (c)), and this avoids duplications in the trees generated by the process. Now there are (1/k) C(2k−2, k−1) different topologies for the super-edge tree (this is the Catalan number giving the number of binary trees with k leaves). Multiplying the Catalan number by the number of super-edge trees with a specific topology then reproduces the result of Theorem 2.

We very briefly look at the second option, (b), of decomposing a tree into towers. It is easily appreciated that this option (however the choice is made as to which edge of a pair is cut) produces a minimum number of towers (equal to the number of terminals in the tree). A lemma similar to the following also holds for the previous method of decomposition.

Lemma 1. A decomposition of a binary tree into a minimum number of towers can be achieved in linear time.

Proof. (sketch) We employ a depth-first traversal of the tree which (on exiting a tower) can detect (on the basis of local information and in constant time) whether an edge should be cut. It is well-known that such a traversal takes linear time.

2.2 Towers and the CPBT problem

In deriving solutions to specific CPBT problems we shall be concerned to show that the 3-regular graph, obtained by conjoining corresponding leaves of the trees, has an even-cycle cover. A 3-edge colouring then follows immediately by assigning the colours a and b to edges alternately around each cycle and assigning the colour c to every other edge. The solution to CPBT is then given by the series of colours assigned to the conjoined edges.
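A one-pass decomposition in the spirit of Lemma 1 and option (b) can be sketched as follows, under our own representation of a binary tree as nested pairs (None marks a leaf): at every join we cut the edge to the right child, so each resulting component keeps exactly one terminal.

```python
def decompose_into_towers(tree):
    """Split a binary tree into the minimum number of towers.

    A tree is a pair (left, right); None is a leaf. At each join (an
    internal node with two internal children) we cut the right edge;
    the cut-off subtree roots a new tower and a leaf stub stays behind.
    """
    towers = []

    def visit(node):
        if node is None:
            return None
        left, right = visit(node[0]), visit(node[1])
        if left is not None and right is not None:  # a join: cut the right edge
            towers.append(right)
            right = None                            # the join retains a leaf stub
        return (left, right)

    towers.append(visit(tree))
    return towers

def count_terminals(tree):
    # A terminal is an internal node whose children are both leaves.
    if tree is None:
        return 0
    if tree == (None, None):
        return 1
    return count_terminals(tree[0]) + count_terminals(tree[1])

# A tree with 3 terminals decomposes into 3 towers, each with one terminal.
t = (((None, None), None), ((None, None), (None, None)))
ts = decompose_into_towers(t)
print(len(ts), [count_terminals(x) for x in ts])  # 3 [1, 1, 1]
```

The single post-order traversal mirrors the constant-time local test of the proof sketch, so the whole decomposition runs in linear time.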
We first show that if one of the trees in a CPBT problem is a left (or right) spine, then a solution is easily found. A right (left) spine is, of course, just a tower with its terminal as parent of the rightmost (leftmost) pair of leaves.

Theorem 3. If (T1, T2) is an instance of CPBT where T1 is any binary tree and T2 is a right (or left) spine, then this instance is solvable.

Proof. We first show that the so-called Halin graph based on T1 contains a Hamiltonian cycle. A Halin graph is constructed from T1 by adding a circuit of edges, each connecting adjacent leaves of T1, as indicated in Figure 3(a). Here, T1 is enclosed in a so-called skirt of edges and e is the root-edge. A Hamiltonian circuit for any Halin graph can be inductively constructed by replacing vertices on the skirt, in turn, by triangles. This is indicated in Figure 3(b), where the starting point for any T1 is the graph on the left of that figure.

[Fig. 3. A Halin graph (a) and construction of its Hamiltonian circuit (b).]

In that starting position, the bold (Hamiltonian) circuit is the seed for our final Hamiltonian circuit. At each stage, a vertex (indicated by a heavy dot) is replaced by a triangle and the bold circuit is locally expanded to incorporate two sides of the triangle so that it remains Hamiltonian. A 3-edge colouring of the Halin graph is now obtained by assigning the colours a and b to the Hamiltonian circuit and c to the other edges. By removing part of the skirt (for our example, this is shown in Figure 4(a)) we begin to construct the graph which is the conjoining of T1 and a right- or left-spine. This produces four free ends: W, X, Y and Z in the figure. Notice that whatever T1, the free ends W and X will be connected by a 2-coloured chain of edges (coloured a and b in our example) which does not contain Y or Z.
To complete the construction, we interchange the colours of this chain and conjoin (as will always be possible) the free ends X and Y. We can now observe (see Figure 4(b)) the graph T1 (above the dotted line) and a spine (below it). By an obvious variation in the construction, the choice of left- or right-spine is always possible. For our example, the solution to CPBT is babbbc.

[Fig. 4.]

Unlike the case for Theorem 3, instances of CPBT will generally consist of tree pairs neither of which is a tower. Our vision then first consists of a decomposition of these trees into towers. Then the towers are to be systematically reconnected and coloured. We might start with a pair of towers with some conjoined edges and add towers one or more at a time until the job is done. This process is too ambitious to be tackled in this initial study. However, a starting point has to be the solution of CPBT for pairs of towers. This occupies the remainder of the paper. Incidentally (as the bigger picture requires), solving CPBT for problem instances consisting of two towers also lets us colour pairs of towers where there is only partial overlap in terms of conjoined edges.

2.3 The lozenge and restricted lozenge graph

In general, two towers with corresponding leaves conjoined form the archetypical graph shown in Figure 5(a). The edges of one tree are indicated in bold and its longest path can be traced through the vertices Y (its root), e, f and a. The other edges form the second tree, whose longest path is traced through X (its root), b, c and d. Note that for some pairs of trees the longest paths might be traced respectively by (X, b, a, f) and (Y, e, d, c), but the overall graph is always archetypically as shown. At the heart of the figure is the lozenge traced through the vertices a, b, c, d, e and f. For this reason, we call such a graph a lozenge
graph.

[Fig. 5. Archetypical lozenge (a) and restricted lozenge (b) graphs.]

Within the lozenge are a number of horizontal edges which we call slats. If there are zero or one slats, then a Hamiltonian path from X, through the lozenge and on to Y is easy to trace. For these cases the CPBT problem is trivially solved, so we will normally assume there are several or more slats. Outside the lozenge, the edges are shown in a curved manner. In the figure, these edges on the left [right] form (as it were) concentric arcs away from the edge (a, b) [(e, d)]. The edges (a, b) and (d, e) are called the free edges, while (b, c) and (e, f) are called the leaf-spanning edges of the lozenge (they are actually segments on the longest path of one tree which span between the two leaves at maximum depth in the other tree). Similarly, those edges defining the perimeter of the lozenge which are not free or leaf-spanning we will call inter-slat edges (although in the lozenge graph as a whole they will be segments of the longest paths of one of the trees). The rectangular shapes within the lozenge between successive slats will be called boxes. By the degree of a perimeter edge of the lozenge, we will mean the number of curved edges that have an endpoint on that edge; notice that the degree of the free edges is always zero.

If there are no curved edges on (say) the right-hand side of Figure 5(a), then one of the towers is a right- or left-spine and (by Theorem 3) the corresponding CPBT problem is solvable. One approach to solving the CPBT problem for any pair of towers might then be to start with such a coloured lozenge graph and attempt to add and colour the missing edges. However, we will take a different starting point which, we believe, will present fewer problems in extending to the general case.
Given an instance of CPBT on the lozenge graph as exemplified in Figure 5(a), the corresponding problem on the restricted lozenge graph is obtained by deleting all those curved edges that have no endpoint on the lozenge. Of course, it is possible for our original instance of CPBT to be already in this restricted form. In this case, what follows describes how a solution may be found; otherwise the restricted form is a useful starting point for solving the original problem. To be clear, Figure 5(b) shows the archetypical form of an instance of CPBT on the restricted lozenge graph.

Lemma 2. Without loss of generality, we may assume that each leaf-spanning edge of a restricted lozenge graph has non-zero degree.

Proof. (sketch) Suppose that a leaf-spanning edge has zero degree; then, locally, the restricted lozenge graph is as depicted on the left of Figure 6(a). We assume that the number of slats is greater than 1, otherwise the graph is 3-edge colourable as described earlier. In Figure 6(a), the degrees of the inter-slat edges (X, i) and (Y, j) are 2 and 1 respectively, although the construction to be described works whatever their degrees. As Figure 6(a) indicates, we construct a smaller restricted lozenge by removing that part of the graph enclosed by the dotted curve and joining the three loose edges left (crossing the curve) at a single new vertex Z, as shown on the right of Figure 6(a). If the smaller restricted lozenge is 3-edge colourable, then we show that the reconstructed graph is also 3-edge colourable. Notice that the edges meeting at Z must be differently coloured. We therefore only need to show that the removed portion of the graph has a 3-edge colouring in which the corresponding edges are all differently coloured. That this is possible is indicated in Figure 6(b). Here (from left to right) the degree of

[Fig. 6.]
the original inter-slat edge (X, i) is increased (in an obvious inductive manner) from 0 upwards until its original value is attained. It is possible (if the degree of the inter-slat edge (Y, j) of the original graph is zero) that our smaller restricted lozenge in Figure 6(a) will also have a leaf-spanning edge of zero degree. In this case we simply repeat the operation described above until this is not the case or until the number of slats is one.

Lemma 3. Without loss of generality, we may assume that at least one of the leaf-spanning edges of a restricted lozenge graph has odd degree (and that the other has non-zero degree).

Proof. (sketch) Suppose that one of the leaf-spanning edges has even degree. The situation is depicted on the left of Figure 7(a). Here the degree of (X, Y)

[Fig. 7.]

is 2, but the construction we describe works just as long as it is non-zero and even. As Figure 7(a) indicates, we construct a smaller restricted lozenge graph simply by deleting the edges contributing to the degree of the leaf-spanning edge (X, Y). If the smaller graph is 3-edge colourable, then it is easy to obtain a 3-edge colouring of the reconstructed graph. Figure 7(b) shows how this is possible (for each pair of edges contributing to the even degree of (X, Y)), given that (on the left) the replaced edges connect differently coloured edges or (on the right) they connect similarly coloured edges. Notice that the deconstruction of Figure 7(a) yields a restricted lozenge graph with a leaf-spanning edge of zero degree. We then apply the process of Lemma 2 to obtain an even smaller example with a non-zero degree leaf-spanning edge.
If this degree is even, then we repeat the construction of this lemma, and so on, until we obtain either a restricted lozenge graph with a single slat or a restricted lozenge graph with a leaf-spanning edge of odd degree.

Theorem 4. Every restricted lozenge instance of CPBT is solvable.

Proof. In view of Lemma 3, we may assume that in our instance of CPBT at least one of its leaf-spanning edges has odd degree and that the other has non-zero degree. Figure 8 shows that a Hamiltonian cycle of the graph exists in this case. In this example, the leaf-spanning edge (e, f) has degree 3, but provided that it is odd, a similar construction is possible in which all edges that contribute to the degree also lie on the Hamiltonian cycle. The construction works whatever the degree of the other leaf-spanning edge (c, d).

[Fig. 8.]

3 Conclusion

We have proposed a new approach to an old problem (FCP) whose importance is emphasised by its links to a variety of areas in combinatorics and graph theory. The lemmas and theorems of this paper are a substantial start to this work. We might next extend Theorem 4 to the full lozenge graph (perhaps relatively easily) and then fully consider issues of inductively combining and colouring towers for general instances of CPBT (challenging). There are also many other challenging combinatorial problems (like finding the size of the set of strings each a-colouring some tower) in this general area.

References

1. Alan Gibbons and Paul Sant. Rotation sequences and edge-colouring of binary tree pairs. Theoretical Computer Science, 326(1–3), pages 409–418, 2004.
2. Alan Gibbons and Paul Sant. Stringology and the Four Colour Problem of planar maps. In String Algorithmics, Volume 2 (Texts in Algorithms), C. Iliopoulos and T. Lecroq (Eds.), King's College Publications, 2004.
3. P. G.
Tait. On the colouring of maps. Proceedings of the Royal Society of Edinburgh (Section A), Volume 10, 501–503, 1880.
4. A. Czumaj and A. M. Gibbons. Guthrie's problem: new equivalences and rapid reductions. Theoretical Computer Science, Volume 154, Issue 1, 3–22, January 1996.
5. A. M. Gibbons. Problems on pairs of trees equivalent to the four colour problem of planar maps. Second British Colloquium for Theoretical Computer Science, Warwick, March 1986.
6. K. Appel and W. Haken. Every planar map is four colourable. Part I. Discharging. Illinois Journal of Mathematics 21 (1977), 429–490.
7. K. Appel, W. Haken and J. Koch. Every planar map is four colourable. Part II. Reducibility. Illinois Journal of Mathematics 21 (1977), 491–567.
8. K. Appel and W. Haken. Every planar map is four colourable. Contemporary Mathematics 98 (1989), entire issue.
9. N. Robertson, D. P. Sanders, P. D. Seymour and R. Thomas. A new proof of the four-colour theorem. Electron. Res. Announc. Amer. Math. Soc. 2 (1996), no. 1, 17–25; also: The four colour theorem, J. Combin. Theory Ser. B 70 (1997), 2–44.
10. T. L. Saaty and P. C. Kainen. The Four-Colour Problem: Assaults and Conquest. McGraw-Hill (1977).

A Compressed Text Index on Secondary Memory

Rodrigo González⋆ and Gonzalo Navarro⋆⋆
Department of Computer Science, University of Chile. {rgonzale,gnavarro}@dcc.uchile.cl

Abstract. We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides very good I/O times for searching, which in particular improve when the text is compressible. In this respect our index is unique, as compressed indexes have been slower than their classical counterparts on secondary memory. We analyze our index and show experimentally that it is extremely competitive on compressible texts.
⋆ Funded by Millennium Nucleus Center for Web Research, Grant P04-067-F, Mideplan, Chile.
⋆⋆ Partially funded by Fondecyt Grant 1-050493, Chile.

1 Introduction and Related Work

Compressed full-text self-indexing [22] is a recent trend that builds on the discovery that traditional text indexes like suffix trees and suffix arrays can be compacted to take space proportional to the compressed text size, and moreover be able to reproduce any text context. Therefore self-indexes replace the text, take space close to that of the compressed text, and in addition provide indexed search into it. Although a compressed index is slower than its uncompressed version, it can run in main memory in cases where a traditional index would have to resort to the (orders of magnitude slower) secondary memory. In those situations a compressed index is extremely attractive. There are, however, cases where even the compressed text is too large to fit in main memory. One would still expect some benefit from compression in this case (apart from the obvious space savings). For example, sequentially searching a compressed text is much faster than searching a plain text, because far fewer disk blocks must be scanned [25]. However, this has usually not been the case with indexed searching: the existing compressed text indexes for secondary memory are usually slower than their uncompressed counterparts.

A self-index built on a text T1,n = t1 t2 . . . tn over an alphabet Σ of size σ supports at least the following queries:
– count(P1,m): counts the number of occurrences of pattern P in T.
– locate(P1,m): locates the positions of all the occ occurrences of P1,m.
– extract(l, r): extracts the subsequence Tl,r of T, with 1 ≤ l, r ≤ n.

The most relevant text indexes for secondary memory follow:
– The String B-tree [7] is based on a combination of B-trees and Patricia tries.
locate(P1,m) takes O((m + occ)/b̃ + log_b̃ n) worst-case I/O operations, where b̃ is the disk block size measured in integers. This time complexity is optimal, yet the String B-tree is not a compressed index. Its static version takes about 5–6 times the text size, plus the text.
– The Compact Pat Tree (CPT) [4] represents a suffix tree in secondary memory in compact form. It does not provide theoretical space or time guarantees, but the index works well in practice, requiring 2–3 I/Os per query. Still, its size is 4–5 times the text size, plus the text.
– The disk-based Suffix Array [2] is a suffix array on disk plus some memory-resident structures that improve the cost of the search. We divide the suffix array into blocks of h elements, and for each block store the first m symbols of its first suffix. It takes at best 4 + m/h times the text size, plus the text, and needs 2(1 + log h) I/Os for counting and ⌈occ/b̃⌉ I/Os for locating (in this paper log x stands for ⌈log2(x + 1)⌉). This is not yet a compressed index.
– The disk-based Compressed Suffix Array (CSA) [17] adapts the main-memory compressed self-index [24] to secondary memory. It requires n(O(log log σ) + H0) bits of space (Hk is the k-th order empirical entropy of T [18]). It takes O(m log_b̃ n) I/O time for count(P1,m). Locating requires O(log n) accesses per occurrence, which is too expensive.
– The disk-based LZ-index [1] adapts the main-memory self-index [21]. It uses 8nHk(T) + o(n log σ) bits. It does not provide theoretical bounds on time complexity, but it is very competitive in practice.

In this paper we present a practical self-index for secondary memory, which is built from three components: for count, we develop a novel secondary-memory version of backward searching [8]; for locate, we adapt a recent technique to locally compress suffix arrays [12]; and for extract, we adapt a technique to compress sequences to k-th order entropy while retaining random access [11].
Depending on the available main memory, our data structure requires 2(m − 1) to 4(m − 1) disk accesses for count(P1,m) in the worst case. It locates the occurrences in ⌈occ/b̃⌉ I/Os in the worst case, and on average in cr · occ/b̃ I/Os, where 0 < cr ≤ 1 is the compression ratio achieved (compressed divided by original text size). Similarly, extracting Tl,r takes ⌈(r − l + 1)/b⌉ I/Os in the worst case (where b is the number of symbols in a disk block), and that time is multiplied by cr on average. With sufficient main memory our index takes O(Hk log(1/Hk) n log n) bits of space, which in practice can be up to 4 times smaller than suffix arrays. Thus, our index is the first that is compressed and at the same time takes advantage of compression in secondary memory, as its locate and extract times are faster when the text is compressible. Counting time does not improve with compression, but it is usually better than, for example, that of disk-based suffix arrays and CSAs. We show experimentally that our index is very competitive against the alternatives, offering a relevant space/time tradeoff when the text is compressible.

Algorithm count(P[1, m])
  i ← m; c ← P[m]; First ← C[c] + 1; Last ← C[c + 1];
  while (First ≤ Last) and (i ≥ 2) do
    i ← i − 1; c ← P[i];
    First ← C[c] + Occ(c, First − 1) + 1;
    Last ← C[c] + Occ(c, Last);
  if (Last < First) then return 0 else return Last − First + 1;

Fig. 1. Backward search algorithm to find and count the suffixes in SA prefixed by P (or the occurrences of P in T).

2 Background and Notation

We assume that the symbols of T are drawn from an alphabet A = {a1, . . . , aσ} of size σ. We will have different ways to express the size of a disk block: b will be the number of symbols, b̄ the number of bits, and b̃ the number of integers in a block. The suffix array SA[1, n] of a text T contains all the starting positions of the suffixes of T, such that TSA[1],n < TSA[2],n < . . .
< TSA[n],n, that is, SA gives the lexicographic order of all suffixes of T. All the occurrences of a pattern P in T correspond to an interval of SA.

The Burrows-Wheeler transform (BWT) is a reversible permutation T bwt of T [3] which puts together characters sharing a similar context, so that k-th order compression can be easily achieved. There is a close relation between T bwt and SA: T bwt[i] = T[SA[i] − 1]. This is the key reason why one can search using T bwt instead of SA. The inverse transformation is carried out via the so-called "LF mapping", defined as follows:
– For c ∈ A, C[c] is the total number of occurrences of symbols in T (or T bwt) which are alphabetically smaller than c.
– For c ∈ A, Occ(c, q) is the number of occurrences of character c in the prefix T bwt[1, q].
– LF(i) = C[T bwt[i]] + Occ(T bwt[i], i), the "LF mapping".

Backward searching is a technique to find the area of SA containing the occurrences of a pattern P1,m by traversing P backwards and making use of the BWT. It was first proposed for the FM-index [8, 9], a self-index composed of a compressed representation of T bwt and auxiliary structures to compute Occ(c, q). Fig. 1 gives the pseudocode to obtain the area SA[First, Last] with the occurrences of P. It requires at most 2(m − 1) calls to Occ. Depending on the variant, each call to Occ can take constant time for small alphabets [8] or O(log σ) time in general [9], using wavelet trees (see below).

A rank/select dictionary over a binary sequence B1,n is a data structure that supports the functions rankc(B, i) and selectc(B, i), where rankc(B, i) returns the number of times c appears in the prefix B1,i and selectc(B, i) returns the position of the i-th appearance of c within B. Both rank and select can be computed in constant time using o(n) bits of space in addition to B [20, 10], or nH0(B) + o(n) bits [23]. In both cases the o(n) term is Θ(n log log n / log n).
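To make the backward-search machinery concrete, the following minimal in-memory sketch builds SA, T bwt and the C array naively and runs the loop of Fig. 1; Occ is answered by plain scanning, with none of the compression or disk layout this paper develops. All function names are illustrative, not the paper's.

```python
# In-memory sketch of backward search over the BWT (the loop of Fig. 1).
# Naive construction and naive Occ; no compression or disk blocking.

def bwt_index(text):
    text += "\0"                       # unique terminator, smallest symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    C, total = {}, 0                   # C[c]: symbols in text smaller than c
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    return bwt, C, alphabet

def Occ(bwt, c, q):
    return bwt[:q].count(c)            # occurrences of c in T^bwt[1, q]

def count(bwt, C, alphabet, P):
    def upper(c):                      # C of the symbol following c
        k = alphabet.index(c)
        return C[alphabet[k + 1]] if k + 1 < len(alphabet) else len(bwt)
    c = P[-1]
    if c not in C:
        return 0
    First, Last = C[c] + 1, upper(c)   # 1-based SA interval for suffix c
    i = len(P) - 1
    while First <= Last and i >= 1:
        i -= 1
        c = P[i]
        if c not in C:
            return 0
        First = C[c] + Occ(bwt, c, First - 1) + 1
        Last = C[c] + Occ(bwt, c, Last)
    return 0 if Last < First else Last - First + 1

bwt, C, A = bwt_index("abracadabra")
print(count(bwt, C, A, "abra"))        # 2
print(count(bwt, C, A, "cad"))         # 1
```

Note that only 2(m − 1) calls to Occ are made, as in Fig. 1; a secondary-memory index must make each such call cheap, which is the subject of Section 3.2.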
Let s be the number of bits set in B1,n. Then nH0(B) ≈ s log(n/s), and thus the o(n) terms above are too large if s is not close to n. Existing lower bounds [19] show that constant-time rank can only be achieved with Ω(n log log n / log n) extra bits. As in this paper we will have s ≪ n, we are interested in techniques with less overhead over the entropy, even if they are not constant-time (this will not be an issue for us). One such rank dictionary [14] encodes the gaps between successive 1's in B using δ-encoding and adds some data to support a binary-search-based rank. It requires s(log(n/s) + log log s + 2 log log(n/s)) + O(log n) bits of space and supports rank in O(log s) time. We call this structure GR.

The wavelet tree [13] wt(S) over a sequence S[1, n] is a perfect binary tree of height ⌈log σ⌉, built on the alphabet symbols, such that the root represents the whole alphabet and each leaf represents a distinct alphabet symbol. If a node v represents the alphabet range Av = [i, j], then its left child vl represents Avl = [i, ⌊(i + j)/2⌋] and its right child vr represents Avr = [⌊(i + j)/2⌋ + 1, j]. We associate to each node v the subsequence Sv of S formed by the characters in Av. However, Sv is not really stored at the node. Instead, we store a bit sequence Bv telling whether each character of Sv goes left or right, that is, Bv[i] = 1 if Sv[i] ∈ Avr. The wavelet tree has all its levels full, except for the last one, which is filled left to right. In this paper S will be T bwt. A plain wavelet tree of S requires n log σ bits of space. If we compress the wavelet tree using a numbering scheme [23] we obtain nHk(T) + o(n log σ) bits of space for any k ≤ α logσ n and any 0 < α < 1 [16].

The wavelet tree permits us to calculate Occ(c, i) using binary ranks over the bit sequences Bv. Starting from the root v of the wavelet tree, if c belongs to the right half of the alphabet, we set i ← rank1(Bv, i) and move to the right child of v.
Similarly, if c belongs to the left half we update i ← rank0(Bv, i) and go to the left child. We repeat this until reaching the leaf that represents c, where the current i value is the answer to Occ(c, i).

The locally compressed suffix array (LCSA) [12] is built on well-known regularity properties that show up in suffix arrays when the text they index is compressible [22]. The LCSA uses differential encoding on SA, which converts those regularities into true repetitions. Those repetitions are then factored out using Re-Pair [15], a compression technique that builds a dictionary of phrases and permits fast local decompression using only the dictionary (whose size one can control at will, at the expense of losing some compression). The Re-Pair dictionary is further compressed with a novel technique. The LCSA can extract any portion of the suffix array very fast after adding a small set of sampled absolute values. It is proved in [12] that the size of the LCSA is O(Hk log(1/Hk) n log n) bits for any k ≤ α logσ n and any constant 0 < α < 1. The LCSA consists of three substructures: the sequence of phrases SP, the compressed dictionary CD needed to decompress the phrases, and the absolute samples needed to restore the suffix array values. One disadvantage of the original structure is the space and time needed to construct it. A heuristic to overcome this is presented in [5]: it can run with limited main memory and performs sequential passes on disk. It might not choose the pairs to replace as well as the original algorithm, but it can trade construction time for precision.

3 A Compressed Secondary Memory Structure

We present a secondary-memory structure that answers count, locate and extract queries. It is composed of three substructures, each one responsible for one type of query, and allowing diverse trade-offs depending on how much main memory space they occupy.
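As a toy illustration of the wavelet-tree mechanism behind Occ(c, i) recalled in Section 2, the sketch below builds a pointer-based wavelet tree with plain list bitmaps and answers Occ with one binary rank per level. Real indexes replace the naive rank counting with o(n)-bit rank structures and store the bitmaps in one contiguous sequence; names here are illustrative.

```python
# Toy wavelet tree: Occ(c, i) = rank_c(S, i) via one binary rank per level.
# Bitmaps are plain lists and ranks are naive counts; a real index would
# use constant-time rank structures over the bitmaps instead.

def build(S, alpha):
    if len(alpha) == 1:
        return None                        # leaf: a single symbol
    mid = (len(alpha) + 1) // 2            # left child gets [i, floor((i+j)/2)]
    left_alpha, right_alpha = alpha[:mid], alpha[mid:]
    left_set = set(left_alpha)
    bits = [0 if c in left_set else 1 for c in S]
    left = build([c for c in S if c in left_set], left_alpha)
    right = build([c for c in S if c not in left_set], right_alpha)
    return (bits, left, right, left_set)

def occ(node, c, i):
    # occurrences of c in S[1, i]; i is a 1-based prefix length
    if node is None:
        return i
    bits, left, right, left_set = node
    if c in left_set:
        return occ(left, c, bits[:i].count(0))   # rank0, go left
    return occ(right, c, bits[:i].count(1))      # rank1, go right

S = "abracadabra"
root = build(list(S), sorted(set(S)))
print(occ(root, 'a', 5))   # 'a' occurs twice in "abrac" -> 2
```

Each query descends ⌈log σ⌉ levels, performing one binary rank per level, which is exactly the O(log σ) cost quoted for the FM-index variant [9].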
3.1 Entropy-compressed rank dictionary on secondary memory

As our structure will require several bitmaps with few bits set, we describe an entropy-compressed rank dictionary, suitable for secondary memory, to represent a binary sequence B1,n. In case it fits in main memory, we use GR (Section 2). Otherwise we use DEB, the δ-encoded form of B: we encode the gaps between consecutive 1's in B as variable-length integers, so that x is represented using log x + 2 log log x bits. DEB uses at most s log(n/s) + 2s log log(n/s) + O(log n) bits of space [16, Sec. 3.4.1].

Let b̄ be the number of bits contained in a disk block. We split DEB into blocks of at most b̄ bits: if a δ-code spans two blocks, we move it to the next block. Each block is stored in secondary memory and, at the beginning of block i, we also store the number of 1's accumulated up to block i − 1; we call this value OBi. To access DEB, we keep in main memory an array Ba, where Ba[i] is the number of bits of B represented in blocks 1 to i − 1. Ba uses (s log(n/s) + 2s log log(n/s) + O(log n)) · (log n)/b̄ bits of space.

To answer rank1(B, i) with this structure, we carry out the following steps: (1) we binary search Ba to find j such that Ba[j] ≤ i < Ba[j + 1]; (2) we read block j from disk; (3) we decompress the δ-codes in block j until reaching position i, summing up the bits set; (4) rank1(B, i) is that sum plus OBj, the accumulator of 1's stored in the block. Overall this costs O(log s + log log n + b̄) CPU time and just one disk access. When we use these structures in the paper, s will be Θ(n/b). Table 1 shows some real sizes and times obtained for the structures when s = n/b. As can be seen, the second scheme requires very little main memory, and for moderate-size bitmaps even the GR option is good.

3.2 Counting

We run the algorithm of Fig. 1 to answer a counting query.
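Returning to the dictionary of Section 3.1, a minimal in-memory simulation of the DEB layout follows: gaps between consecutive 1's are grouped into fixed-capacity "disk blocks", each block carries its accumulator OB, and a small directory Ba in "RAM" drives a binary search. Gaps are stored as plain integers rather than δ-codes and the block capacity is tiny; both are simplifications, and all names are ours.

```python
# Sketch of the DEB + Ba rank dictionary of Section 3.1, simulated in RAM.
# Each "disk block" holds a few gaps plus OB (number of 1's before it);
# the directory Ba holds the first bit position covered by each block.

import bisect

BLOCK_GAPS = 4   # gaps per block; a real block holds ~b-bar bits of delta-codes

def build(ones):
    # ones: sorted, 1-based positions of the set bits of B
    blocks, Ba, prev, acc = [], [], 0, 0
    for k in range(0, len(ones), BLOCK_GAPS):
        chunk = ones[k:k + BLOCK_GAPS]
        gaps = [p - q for p, q in zip(chunk, [prev] + chunk[:-1])]
        Ba.append(prev + 1)          # first position whose rank needs this block
        blocks.append((acc, gaps))   # (OB, gap-encoded 1-positions)
        prev, acc = chunk[-1], acc + len(chunk)
    return blocks, Ba

def rank1(blocks, Ba, i):
    j = bisect.bisect_right(Ba, i) - 1   # (1) binary search the directory
    if j < 0:
        return 0
    OB, gaps = blocks[j]                 # (2) one "disk" read
    pos, cnt = Ba[j] - 1, 0
    for g in gaps:                       # (3) decode gaps up to position i
        pos += g
        if pos > i:
            break
        cnt += 1
    return OB + cnt                      # (4) accumulator + local count

blocks, Ba = build([3, 5, 9, 12, 20, 22])
print(rank1(blocks, Ba, 10))   # ones at 3, 5, 9 -> 3
print(rank1(blocks, Ba, 21))   # ones at 3, 5, 9, 12, 20 -> 5
```

The single block read per query mirrors the one-disk-access guarantee of Section 3.1; the directory plays the role of Ba and stays small because one entry suffices per block.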
Table C uses σ log n bits and easily fits in main memory, thus the problem is how to calculate Occ over T bwt. We describe four different structures to count, depending on how we represent T bwt, and enumerate the versions from 1 to 4. In versions 1 and 2, we use an uncompressed form of T bwt and pay one I/O per call to Occ. In versions 3 and 4, we use a compressed form of T bwt and pay one or two I/Os per call to Occ. In versions 1 and 3, we spend O(b) CPU operations per call to Occ; in versions 2 and 4, this is reduced to O(log σ). Version 4 is omitted from now on as it is not competitive.

Table 1. Different sizes and times obtained to answer rank, for some relevant choices of n and b (with s = n/b). DEB is stored in secondary memory and is accessed using Ba; Ba and GR reside in main memory. Tb, Gb, etc. mean terabits, gigabits, etc.; TB, GB, etc. mean terabytes, gigabytes, etc.

Structure | Space (bits) | CPU time for rank | n=1 Tb, b=32 KB | n=1 Gb, b=8 KB | n=1 Gb, b=4 KB | n=1 Mb, b=4 KB
GR | s log(n/s) + s log log s + 2s log log(n/s) + O(log n) | O(log s) | 100 MB | 354 KB | 667 KB | 677 B
DEB | s log(n/s) + 2s log log(n/s) + O(log n) | O(log s + log log n + b̄) | 93 MB | 326 KB | 613 KB | 600 B
+ Ba | (s log(n/s) + 2s log log(n/s) + O(log n)) (log n)/b̄ | | 14 KB | 153 B | 575 B | 1 B

To calculate Occ(c, i), we need to know the number of occurrences of symbol c before each block on disk. To do so, we store a two-level structure: the first level stores, for every t-th block, the number of occurrences of every c from the beginning; the second level stores the number of occurrences of every c from the last t-th block. The first level is maintained in main memory and the second level on disk, together with the representation of T bwt (i.e., the entry of each block is stored within the block). Let K be the total number of blocks. We define:
– Ec(j): the number of occurrences of symbol c in blocks 0 to (j − 1)·t, with Ec(0) = 0, 1 ≤ j < ⌊K/t⌋.
– E′c(j): j goes from 0 to K − 1. For j mod t = 0 we have E′c(j) = 0, and for the rest E′c(j) is the number of occurrences of symbol c in blocks ⌊j/t⌋·t to j − 1.

Summing up all the entries, E uses ⌈K/t⌉·σ log n bits and E′ uses K·σ·log(t·n/K) bits of space in versions 1 and 2. In version 3, the numbering scheme [23] has a compression limit n/K ≤ b·log n/(2 log log n). Thus, for version 3, E′ uses at most K·σ·log(t·b·log n/(2 log log n)) bits.

To use these structures, we first need to know in which block T bwt[i] lies:
– In versions 1 and 2, where the block size is constant, we know directly that T bwt[i] belongs to block ⌊i/b⌋, where b is the number of symbols that fit in a disk block.
– In version 3, the block size is variable. Compression ensures that there are at most n/b blocks. We use a binary sequence EB1,n to mark where each block starts; thus the block of T bwt[i] is rank1(EB, i). We use an entropy-compressed rank dictionary (Section 3.1) for EB. If we need to use the DEB variant, we add one more I/O per access to T bwt.

With this sorted out, we can compute Occ(c, i) = Ec(j div t) + E′c(j) + Occ′(Bj, c, offset), where j is the block where i belongs, offset is the position of i within block j, and Occ′(Bj, c, offset) is the number of occurrences of symbol c within block Bj up to offset.

Fig. 2. Block propagation over the wavelet tree. Making ranks over the first level of WT (rank0(12) = 6, rank0(24) = 10 and rank1(i) = i − rank0(i)), we determine the propagation over the second level of WT, and so on.

Now we explain the three ways to represent T bwt, which give us three different ways to calculate Occ′(Bj, c, offset).

Version 1. We use T bwt directly, without any compression. If a disk block can store b symbols (i.e., b log σ bits), we will have K = ⌈n/b⌉ blocks.
Occ′(Bj, c, offset) is calculated by traversing the block and counting the occurrences of c up to offset.

Version 2. Let b be the number of symbols within a disk block. We divide the first level of WT = wt(T bwt) into blocks of b bits. Then, for each block, we gather its propagation over WT by concatenating the subsequences in breadth-first order, thus forming a sequence of b log σ bits (see Fig. 2). Note that this propagation generates 2^(j−1) intervals at level j of WT. Some definitions:
– B_i^j: the i-th interval of level j, with 1 ≤ j ≤ ⌈log σ⌉ and 1 ≤ i ≤ 2^(j−1).
– L_i^j: the length of interval B_i^j.
– O_i^j / Z_i^j: the number of 1's / 0's in interval B_i^j.
– D^j = B_1^j . . . B_{2^(j−1)}^j, with 1 ≤ j ≤ ⌈log σ⌉: all the concatenated intervals of level j.
– B = D^1 D^2 . . . D^⌈log σ⌉: the concatenation of all the D^j.

Some relationships hold: (1) L_i^j = O_i^j + Z_i^j. (2) Z_i^j = rank0(B_i^j, L_i^j). (3) L_i^j = Z_{(i+1)/2}^(j−1) if i is odd (B_i^j is a left child), and L_i^j = O_{i/2}^(j−1) otherwise. (4) |D^j| = L_1^1 = b for j < ⌊log σ⌋; the last level can be different if σ is not a power of 2. With those properties, L_i^j, O_i^j and Z_i^j are determined recursively from B and b. We only store B plus the structures to answer rank1 on it in constant time [10]. Note that any rank1 over some B_i^j is answered via two ranks over B. Fig. 3 shows how we calculate Occ′ in O(log σ) constant-time steps. To navigate the wavelet tree, we use some properties:
1. Block D^j begins at bit (j − 1)·b + 1 of B, and |B| = b log σ.
2. To know where B_i^j begins, we only need to add to the beginning of D^j the lengths of B_1^j, . . . , B_{i−1}^j. Each B_k^j, with 1 ≤ k ≤ i − 1, belongs to a left branch that we do not follow to reach B_i^j from the root. So, when we descend through the wavelet tree to B_i^j, every time we take a right branch we accumulate the number of bits of the left branch (the zeroes of the parent).
3. node is the number of the current interval at the current level.
4. We do not materialize B_node^level; we just maintain its position within B.

Algorithm Occ′(B, c, j)
  node ← 1; ans ← j; des ← 0; B_1^1 = B[1, b];
  for level ← 1 to ⌈log σ⌉ do
    if c belongs to the left subtree of node then
      ans ← rank0(B_node^level, ans); len ← Z_node^level; node ← 2·node − 1;
    else
      ans ← rank1(B_node^level, ans); len ← O_node^level; des ← des + Z_node^level; node ← 2·node;
    B_node^level = B[level·b + des + 1, level·b + des + len];
  return ans;

Fig. 3. Algorithm to obtain the number of occurrences of c inside a disk block, for version 2.

Version 3. We compress block B of version 2 using a numbering scheme [23], yet without any structure for rank. In this case the division of T bwt is not uniform: we add symbols from T bwt to the disk block as long as its compressed WT fits in the block. By doing this, we compress T bwt to nHk(T) + O(σ^(k+1) log n + n log log n / log n) bits for any k [16]. To calculate Occ′(B, c, offset), we decompress block B and apply the same algorithm of version 2, in O(b) time.

Table 2. Different sizes and times obtained to answer count(P1,m).

Version | Main Memory | Secondary Memory | I/O | CPU
1 | O((n/(bt))·σ log n) | n log σ + O((n/b)·σ log(t·b)) | 2(m − 1) | O(m·b)
2 | O((n/(bt))·σ log n) | n log σ + O((n/b)·σ log(t·b)) | 2(m − 1) | O(m log σ)
3a | O((n/(bt))·σ log n + (n/b)·log n) | nHk(T) + O(σ^(k+1) log n) + O((n/b)·σ log(t·b log n)) | 2(m − 1) | O(m(b + log n))
3b | O((n/(bt))·σ log n + (n/b²)·log n·log b) | nHk(T) + O(σ^(k+1) log n) + O((n/b)·σ log(t·b log n)) | 4(m − 1) | O(m(b + log n))

In Table 2 we can see the different sizes and times needed by our three versions. We added the times to do rank on the entropy-compressed bit arrays.

3.3 Locating

Our locating structure is a variant of the LCSA (see Section 2). The array SP of the LCSA is split into disk blocks of b̃ integers. Also, we store in each block the absolute value of the suffix array at the beginning of the block.
To optimize I/Os, the dictionary is maintained in main memory; we thus compress the differential suffix array until we reach the desired dictionary size. Finally, we need a compressed bitmap LB to mark the beginning of each disk block. LB is entropy-compressed and can reside in main or secondary memory.

To locate every match of a pattern P1,m, we first use our counting substructure to obtain the interval [First, Last] of the suffix array of T (see Section 2). Then we find the block First belongs to, j = rank1(LB, First). Finally, we read the necessary blocks until we reach Last, decompressing them using the dictionary of the LCSA. We define occ = Last − First + 1 and occ′ = cr·occ, where 0 < cr ≤ 1 is the compression ratio of SP. This process takes, apart from counting, ⌈occ′/b̃⌉ I/O accesses, plus one if we store LB in secondary memory. This I/O cost is optimal and improves thanks to compression. We perform O(occ + b̃) CPU operations to decompress the interval of SP.

3.4 Extracting

To extract arbitrary portions of the text we use a variant of [11], which compresses T blockwise using a semistatic statistical modeler of order k plus an encoder EN. This compresses the text to nHk(T) + fEN(n) bits, where fEN(n) is the redundancy of the encoder: for example, a Huffman coder has redundancy at most n bits, whereas an arithmetic encoder has redundancy 2 bits. The data generated by the modeler, DM, is maintained in main memory, which requires σ^(k+1) log n bits. This restricts the maximum k that can be used: if we have ME bits to store the data generated by the modeler, then k ≤ logσ(ME / log n) − 1. To store the structure in secondary memory, we split the compressed text into disk blocks of b̄ bits (thus the overhead over the entropy is (n/b)·fEN(b̄)). If we would store fewer than b = b̄/log σ symbols in a particular disk block, we store the block uncompressed instead.
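A drastically simplified picture of this blockwise scheme follows: the text is cut into fixed-size symbol blocks, each block is compressed independently, and extraction touches only the blocks covering the requested substring. zlib stands in for the semistatic order-k encoder of [11], and the per-block context and ER bitmap are omitted; all names are illustrative.

```python
# Sketch of blockwise extraction (Section 3.4), with the statistical
# encoder stubbed out: extract(l, r) reads only the ceil((r-l+1)/b)
# blocks (plus alignment) that cover T[l, r].

import zlib

B = 8  # symbols per "disk block"; really b = b-bar / log sigma symbols

def build(text):
    # compress each B-symbol block independently
    return [zlib.compress(text[i:i + B].encode()) for i in range(0, len(text), B)]

def extract(blocks, l, r):
    # 1-based, inclusive, as in extract(l, r) of the introduction
    out = []
    for j in range((l - 1) // B, (r - 1) // B + 1):   # blocks covering T[l, r]
        out.append(zlib.decompress(blocks[j]).decode())
    s = "".join(out)
    off = (l - 1) % B
    return s[off:off + (r - l + 1)]

blocks = build("to be or not to be, that is the question")
print(extract(blocks, 4, 8))   # "be or"
```

Independent per-block compression is what bounds the I/O cost by the number of blocks spanned, at the price of the per-block redundancy fEN(b̄) noted above.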
An extra bit per block indicates whether this was the case. Each block also contains the order-k context of the first symbol stored in the block (k log σ bits). To know where a symbol of T is stored, we need a compressed rank dictionary ER, in which we mark the beginning of each block. This can be kept in main memory or in secondary memory, the latter requiring one more I/O access. The algorithm to extract Tl,r is: (1) find the block j = rank1(ER, l) where Tl is stored; (2) read block j and decompress it using DM and the stored order-k context; (3) continue reading and decompressing blocks until reaching Tr. Using this scheme we need at most ⌈(r − l + 1)/b⌉ I/O operations, which on average is ⌈(r − l + 1)Hk(T)/b̄⌉. We add one I/O operation if we use the secondary-memory version of the rank dictionary. The total CPU time is O(r − l + b + log n).

4 Experiments

We consider two text files for the experiments: the text wsj (Wall Street Journal) from the trec collection from year 1987, of 126 MB, and the 200 MB XML file provided in the Pizza&Chili Corpus (http://pizzachili.dcc.uchile.cl). We searched for 5,000 random patterns, of length 5 to 50, generated from these files. As in [6] and [1], we assume a disk page size of 32 KB.

[Fig. 4 comprises two plots, "Compressed dictionary – Sequence of phrases" and "Compressed dictionary – Different texts" (XML, DNA, English, Pitches, Proteins, Sources, WSJ), showing SP size as a percentage of SA versus CD size as a percentage of SA.] Fig. 4. Compression ratio achieved on XML as a function of the percentage allowed to the dictionary (CD). Both are percentages over the size of SA; the right plot shows other texts.

We first study the compressibility we achieve as a function of the dictionary size, |CD| (as CD must reside in RAM). Fig.
4 shows that the compressibility depends on the percentage |CD|/|SA| and not on the absolute size |CD|. In the following, we let our CD use 2% of the suffix array size. For counting we use version 1 (GR, Section 3.2) with t = log n. With this setting our index uses 19.15 MB of RAM for XML and 12.54 MB for WSJ (for GR, CD and DM). It compresses the SA of XML to 34.30% and that of WSJ to 80.28% of its original size.

We compared our results against the String B-tree [7], the Compact Pat Tree (CPT) [4], the disk-based Suffix Array (SA) [2] and the disk-based LZ-index [1], adding our results to those of [1, Sec. 4]. We omit the disk-based CSA [17] as it is not implemented, but it is strictly worse than ours.

Fig. 5 (left) shows counting experiments. Our structure needs at most 2(m − 1) disk accesses. We show our index with and without the substructures for locating. Fig. 5 (right) shows locating experiments. For m = 5, we report more occurrences than the block could store in raw format. We can see that the result depends a lot on the compressibility of the text: in the highly compressible XML our index occupies a very relevant niche in the tradeoff curves, whereas in WSJ it is subsumed by String B-trees. Thus, our index is very competitive on compressible texts.

We have used texts of up to 200 MB, but our results show how the outcome scales: the RAM needed grows linearly, the counting cost stays at most 2(m − 1), and the locating cost depends on the number of occurrences of P. Thus it is easy to predict other scenarios.

References

1. D. Arroyuelo and G. Navarro. A Lempel-Ziv text index on secondary storage. In Proc. CPM, LNCS 4580, pages 83–94, 2007.
2. R. Baeza-Yates, E. F. Barbosa, and N. Ziviani. Hierarchies of indices for text searching. Inf. Systems, 21(6):497–514, 1996.
3. M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Tech. Rep. 124, DEC, 1994.
4. D. Clark and I. Munro.
Efficient suffix trees on secondary storage. In Proc. SODA, pages 383–391, 1996.
5. F. Claude and G. Navarro. A fast and compact web graph representation. In Proc. SPIRE, LNCS 4726, pages 105–116, 2007.
6. P. Ferragina and R. Grossi. Fast string searching in secondary storage: theoretical developments and experimental results. In Proc. SODA, pages 373–382, 1996.
7. P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in external memory and its applications. J. ACM, 46(2):236–280, 1999.
8. P. Ferragina and G. Manzini. Indexing compressed texts. J. ACM, 52(4):552–581, 2005.
9. P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro. Compressed representations of sequences and full-text indexes. ACM TALG, 3(2):article 20, 2007.
10. R. González, Sz. Grabowski, V. Mäkinen, and G. Navarro. Practical implementation of rank and select queries. In Proc. Posters WEA, pages 27–38, Greece, 2005. CTI Press and Ellinika Grammata.
11. R. González and G. Navarro. Statistical encoding of succinct data structures. In Proc. CPM, LNCS 4009, pages 295–306, 2006.
12. R. González and G. Navarro. Compressed text indexes with fast locate. In Proc. CPM, LNCS 4580, pages 216–227, 2007.
13. R. Grossi, A. Gupta, and J. Vitter. High-order entropy-compressed text indexes. In Proc. SODA, pages 841–850, 2003.
14. A. Gupta, W.-K. Hon, R. Shah, and J. Vitter. Compressed data structures: dictionaries and data-aware measures. In Proc. WEA, pages 158–169, 2006.
15. J. Larsson and A. Moffat. Off-line dictionary-based compression. Proc. IEEE, 88(11):1722–1732, 2000.
16. V. Mäkinen and G. Navarro. Implicit compression boosting with applications to self-indexing. In Proc. SPIRE, LNCS 4726, pages 214–226, 2007.
17. V. Mäkinen, G. Navarro, and K. Sadakane. Advantages of backward searching — efficient secondary memory and distributed implementation of compressed suffix arrays. In Proc. ISAAC, LNCS 3341, pages 681–692, 2004.
18. G. Manzini.
An analysis of the Burrows-Wheeler transform. J. ACM, 48(3):407–430, 2001.
19. P. Miltersen. Lower bounds on the size of selection and rank indexes. In Proc. SODA, pages 11–12, 2005.
20. I. Munro. Tables. In Proc. FSTTCS, LNCS 1180, pages 37–42, 1996.
21. G. Navarro. Indexing text using the Ziv-Lempel trie. J. Discrete Algorithms, 2(1):87–114, 2004.
22. G. Navarro and V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys, 39(1):article 2, 2007.
23. R. Raman, V. Raman, and S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In Proc. SODA, pages 233–242, 2002.
24. K. Sadakane. New text indexing functionalities of the compressed suffix arrays. J. Algor., 48(2):294–313, 2003.
25. N. Ziviani, E. Moura, G. Navarro, and R. Baeza-Yates. Compression: A key for next-generation text retrieval systems. IEEE Computer, 33(11):37–44, 2000.

[Fig. 5 comprises eight plots: counting cost (disk accesses) and locating cost (occurrences per disk access) versus index size as a fraction of text size (including the text), for the XML and WSJ texts with m = 5 and m = 15, comparing the LZ-index, GN-index (with and without locating structures), String B-trees, SA and CPT.] Fig. 5. Search cost vs. space requirement for the different indexes we tested. Counting on the left and locating on the right.

Star-shaped Drawings of Planar Graphs ⋆

Seok-Hee Hong(1) and Hiroshi Nagamochi(2)
(1) School of Information Technologies, University of Sydney, shhong@it.usyd.edu.au
(2) Department of Applied Mathematics and Physics, Kyoto University, nag@amp.i.kyoto-u.ac.jp

Abstract. A star-shaped drawing of a plane graph is a straight-line drawing such that each inner facial cycle is drawn as a star-shaped polygon and the outer facial cycle is drawn as a convex polygon. Given a biconnected planar graph, we consider the problem of finding a star-shaped drawing of the graph with the minimum number of concave corners. We derive an effective lower bound on the number of concave corners, and prove that the problem can be solved in linear time.

1 Introduction

Graph drawing has attracted much attention over the last ten years due to its wide range of applications, such as VLSI design, software engineering and bioinformatics. Two- or three-dimensional drawings of graphs with a variety of aesthetics and edge representations have been extensively studied (see [1]). One of the most popular drawing conventions is the straight-line drawing, where all the edges of a graph are drawn as straight-line segments. Every planar graph is known to have a planar straight-line drawing [5]. A straight-line drawing is called a convex drawing if every facial cycle is drawn as a convex polygon. Note that not all planar graphs admit a convex drawing.
Tutte [14] gave a necessary and sufficient condition for a plane graph to admit a convex drawing. He also showed that every triconnected plane graph with a given boundary drawn as a convex polygon admits a convex drawing using that polygonal boundary. Later, Thomassen [13] gave a necessary and sufficient condition for a biconnected plane graph to admit a convex drawing. Based on this result, Chiba et al. [4] presented a linear-time algorithm for finding a convex drawing (if any) of a biconnected plane graph with a specified convex boundary. In general, the convex drawing problem has been well investigated over the last ten years. Recently, Hong and Nagamochi gave conditions for hierarchical plane graphs to admit a convex drawing [7], and for c-planar clustered graphs to admit a convex drawing in which every cluster is also drawn as a convex polygon [8].

⋆ This is an extended abstract. This research was done when the second author was visiting NICTA (National ICT Australia) and partially supported by the Scientific Grant-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Another variation of convex drawing, with minimum outer apices, was studied by Miura et al. [12], who gave a linear-time algorithm for finding a convex drawing with minimum outer apices for an internally triconnected plane graph. However, not much attention has been paid to the problem of finding a convex drawing with a non-convex boundary or non-convex faces. Recently, in our companion paper [6], we proved that every triconnected plane graph whose boundary is fixed with a star-shaped polygon has an inner-convex drawing (a drawing in which every inner face is drawn as a convex polygon), if its kernel has positive area. Note that this is an extension of the classical result by Tutte [14], since any convex polygon is a star-shaped polygon.
In this paper, we deal with biconnected planar graphs and consider how to draw these graphs nicely. However, Kant [11] already proved that the problem of deciding whether a biconnected planar graph can be drawn with at most k non-convex faces is NP-complete. We define a "star-shaped drawing" of a plane graph to be a straight-line drawing such that each inner facial cycle is drawn as a star-shaped polygon and the outer facial cycle is drawn as a convex polygon. We consider the problem of finding a star-shaped drawing of a biconnected planar graph with the minimum number of concave corners (counting outer apices as concave corners). We first derive an effective lower bound on the number of concave corners by identifying two characteristic structures, "multipaths" and "bi-facial cycles", that require concave corners in any straight-line drawing. Based on this, we design linear-time algorithms for finding an optimal plane embedding F of G and for computing a star-shaped drawing of the embedding F.

Theorem 1. Let G be a biconnected planar graph. A star-shaped drawing with the minimum number of concave corners among all straight-line drawings of G can be obtained in linear time. ⊓⊔

2 Preliminaries

Graphs. Let G = (V, E) be a graph. The set of edges incident to a vertex v ∈ V is denoted by E(v). The degree of a vertex v in G is denoted by dG(v) (i.e., dG(v) = |E(v)|). For a subset X ⊆ E (resp., X ⊆ V), G − X denotes the graph obtained from G by removing the edges in X (resp., the vertices in X together with the edges in ∪v∈X E(v)). A graph G = (V, E) is called planar if its vertices and edges can be drawn as points and curves in the plane so that no two curves intersect except at their endpoints, where no two vertices are drawn at the same point. In such a drawing, the plane is divided into several connected regions, each of which is called a face. A face is characterized by the cycle of G that surrounds the region.
Such a cycle is called a facial cycle. A set F of facial cycles in a drawing is called a plane embedding of a planar graph G, where one face is specified as the outer face in a plane embedding and is denoted by fFo. Let F(G) denote the set of all plane embeddings of G. Figure 1(a) and (b) show two different plane embeddings of the same planar graph.

Fig. 1. Example of a biconnected planar graph G = (V, E): (a) a plane embedding F1 ∈ F({u1, u2, u7}; G); (b) a plane embedding F2 ∈ F({u1, u2, u7}; G); (c) a star-shaped drawing of embedding F2 ∈ F({u1, u2, u7}; G) with six concave corners (concave vertices marked).

A planar graph G = (V, E) with a fixed embedding F of G is called a plane graph. The set of vertices, set of edges and set of facial cycles of a plane graph H may be denoted by V(H), E(H) and F(H), respectively. A vertex (resp., an edge) in the outer facial cycle is called an outer vertex (resp., an outer edge), while a vertex (resp., an edge) not in the outer facial cycle is called an inner vertex (resp., an inner edge). For a subset W ⊆ V, let F(W; G) denote the set of all plane embeddings F of G such that W ⊆ V(fFo). A biconnected plane graph G is called internally triconnected if, for any cut-pair {u, v}, u and v are outer vertices and each component in G − {u, v} contains an outer vertex. For two points p1, p2 in the plane, [p1, p2] denotes the line segment with end points p1 and p2, and for three points p1, p2, p3, [p1, p2, p3] denotes the triangle with corners p1, p2, p3. For a polygon P, let V(P) denote the set of vertices of P. The kernel K(P) of a polygon P is the set of all points from which all points in P are visible.
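In practice a plane embedding is often specified combinatorially as a rotation system, the cyclic order of neighbours around each vertex, from which the facial cycles can be traced. The helper below is our own illustrative sketch (the name `trace_faces` and the counter-clockwise convention are assumptions, not from the paper):

```python
def trace_faces(rotation):
    """Trace the facial cycles of a plane embedding given as a rotation
    system: rotation[v] lists the neighbours of v in cyclic order."""
    faces, seen = [], set()
    darts = [(u, v) for u in rotation for v in rotation[u]]
    for start in darts:
        if start in seen:
            continue
        face, dart = [], start
        while dart not in seen:
            seen.add(dart)
            face.append(dart[0])
            u, v = dart
            # next dart of the same face: the edge following u in the
            # rotation at v
            nbrs = rotation[v]
            dart = (v, nbrs[(nbrs.index(u) + 1) % len(nbrs)])
        faces.append(face)
    return faces

# A triangle has two faces (one inner, one outer): V - E + F = 3 - 3 + 2 = 2.
tri = {0: [1, 2], 1: [2, 0], 2: [0, 1]}
print([len(f) for f in trace_faces(tri)])  # [3, 3]
```

Each directed edge (dart) lies on exactly one traced face, so the traversal visits every face once.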
The boundary of a kernel, if it is non-empty, is a convex polygon. A polygon P is called star-shaped if K(P) ≠ ∅. A straight-line drawing of a graph G = (V, E) in the plane is an embedding of G in the two-dimensional space ℝ², such that each vertex v ∈ V is drawn as a point ψ(v) ∈ ℝ², and each edge (u, v) ∈ E is drawn as a straight-line segment (ψ(u), ψ(v)), where ℝ is the set of reals. A straight-line drawing ψ of a plane graph G = (V, E, F) is called a convex drawing if every facial cycle is drawn as a convex polygon. We say that a drawing ψ of a graph G is extended from a drawing ψ′ of a subgraph G′ of G if ψ(v) = ψ′(v) for all v ∈ V(G′). A convex polygon drawn for the outer facial cycle of a biconnected plane graph can be extended to a convex drawing when the following condition holds.

Theorem 2. [4, 13] Let G = (V, E, F) be a biconnected plane graph. Then a drawing of fFo on a convex polygon P can be extended to a convex drawing of G if and only if the following conditions (i)-(iii) hold: (i) for each inner vertex v with dG(v) ≥ 3, there exist three paths, disjoint except for v, each connecting v and an outer vertex; (ii) every cycle of G which has no outer edge has at least three vertices v with dG(v) ≥ 3; and (iii) letting Q1, Q2, ..., Qk be the subpaths of fFo, each corresponding to a side of P, the graph G − V(fFo) has no component H such that all the outer vertices adjacent to vertices in H are contained in a single path Qi, and there is no inner edge (u, v) whose end vertices are contained in a single path Qi. ⊓⊔

In a straight-line drawing of a planar graph G, the whole angle around a vertex v is split into dG(v) angles, each of which is formed by two consecutive edges in the drawing and is called a corner. A corner is called concave if its angle is greater than π. A vertex v in a straight-line drawing is called concave if one of the corners around v is concave.
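The star-shapedness test K(P) ≠ ∅ can be made concrete: the kernel is the intersection of the left half-planes of the edges of a counter-clockwise polygon, computable by successive polygon clipping. The sketch below is our own illustration under that CCW assumption (function names and the two example polygons are ours, not from the paper):

```python
def kernel(poly, big=10**6):
    """Kernel K(P) of a simple CCW polygon: intersect the left
    half-planes of all edges, starting from a huge bounding square."""
    def cross(a, b, p):
        return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])
    region = [(-big, -big), (big, -big), (big, big), (-big, big)]
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        out = []
        m = len(region)
        for j in range(m):
            p, q = region[j], region[(j + 1) % m]
            sp, sq = cross(a, b, p), cross(a, b, q)
            if sp >= 0:                 # p lies on the kept (left) side
                out.append(p)
            if sp * sq < 0:             # edge pq crosses the clip line
                t = sp / (sp - sq)
                out.append((p[0] + t*(q[0]-p[0]), p[1] + t*(q[1]-p[1])))
        region = out
    return region

def star_shaped(poly):
    return len(kernel(poly)) >= 3       # non-degenerate kernel exists

dart  = [(0, 0), (2, 1), (4, 0), (2, 4)]                 # one reflex corner
notch = [(0, 0), (2, 0), (2, 3), (4, 3), (4, 0),
         (6, 0), (6, 5), (0, 5)]                         # rectangular notch
print(star_shaped(dart), star_shaped(notch))  # True False
```

The dart is star-shaped despite its reflex corner, while the notched rectangle is not: its two vertical notch edges impose the contradictory constraints x ≤ 2 and x ≥ 4, so the kernel is empty.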
For a straight-line drawing D of a biconnected plane graph G, let Λ(D) denote the set of all concave vertices in D. A star-shaped drawing of a plane graph is a straight-line drawing such that each inner facial cycle is drawn as a star-shaped polygon and the outer facial cycle is drawn as a convex polygon. An outer vertex in a straight-line drawing of a plane graph is called an apex if it is concave in the drawing and its concave corner appears in the outer face. Figure 1(c) shows an example of a straight-line drawing D of the plane graph in Figure 1(b), where Λ(D) = {u4, u10, u14, u16, u17, u18} and u4, u14, and u18 are the apices. We easily observe the following.

Lemma 1. Any straight-line drawing of a biconnected graph with at least three vertices requires at least three apices on its boundary. ⊓⊔

We call a graph G trivial if G is a triconnected planar graph or a cycle. By Theorem 2, a trivial graph admits a convex drawing with a specified convex boundary. If we specify the boundary as a triangle, it has a convex drawing with three concave corners (three apices), and this is an optimal solution to our problem. In the following sections, we deal with nontrivial biconnected planar graphs. The SPQR tree of a biconnected graph G = (V, E) represents the adjacency relations among the triconnected components of G (see [2, 3] for details). Each node ν in the SPQR tree is associated with a graph σ(ν) = (Vν, Eν) (Vν ⊆ V), called the skeleton of ν, which corresponds to a triconnected component of G. Figure 2 shows the SPQR tree of the biconnected planar graph in Figure 1. We treat the SPQR tree of a graph G as a rooted tree Tν∗ by choosing an arbitrary node ν∗ as its root. The parent virtual edge of a node ν is denoted by parent(ν) (we let parent(ν) = ∅ if ν is the root). We define the parent cut-pair of ν as the two endpoints of its parent virtual edge. We denote the graph formed from σ(ν) by deleting its parent virtual edge by σ−(ν) = (Vν, Eν−), where Eν− = Eν − {parent(ν)}.
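Given the drawn positions of a facial cycle, its concave corners can be detected with a cross-product test: on a counter-clockwise cycle, an interior angle greater than π corresponds to a right turn. Summing over the faces of a drawing yields Λ(D). A minimal sketch, with an assumed example polygon of our own (not a face from Figure 1):

```python
def concave_corners(poly):
    """Indices of concave corners (interior angle > pi) of a simple
    polygon given as a counter-clockwise list of (x, y) points."""
    n = len(poly)
    reflex = []
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = poly[i - 1], poly[i], poly[(i + 1) % n]
        # z-component of the cross product of edge vectors ab and bc;
        # negative means a right turn, i.e. a reflex (concave) corner
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross < 0:
            reflex.append(i)
    return reflex

# A rectangle with a notch cut into its bottom side (CCW order): the two
# inner notch corners (2, 3) and (4, 3) are concave.
notched = [(0, 0), (2, 0), (2, 3), (4, 3), (4, 0), (6, 0), (6, 5), (0, 5)]
print(concave_corners(notched))  # [2, 3]
```

For a clockwise cycle (e.g. the outer face traversed in the usual direction) the sign of the test flips.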
Let G−(ν) denote the subgraph of G which consists of the vertices and real edges in the graphs σ−(µ) for all descendants µ of ν, including ν itself.

Fig. 2. The SPQR tree of the biconnected planar graph G in Fig. 1.

For a cut-pair {u, v} of a biconnected graph G, a u,v-component H is a connected subgraph of G that either consists of a single edge (u, v) or is a maximal subgraph such that H − {u, v} remains connected. We may treat a u,v-component H of a plane graph G as a plane graph under the same embedding as G. In this case, we define the u,v-boundary path fuvo(H) of H to be the path obtained by traversing the boundary of H from u to v in clockwise order. A simple path with end vertices u and v of a graph G is called a u,v-path, and is called an induced u,v-path if every internal vertex (i.e., non-end vertex) is of degree 2. Let A ⊆ V in a plane graph G = (V, E, F). For a plane subgraph H of G and two outer vertices u and v of H, we denote Auv(H) = A ∩ (V(Quv(H)) − {u, v}). If H is the graph G−(ν) for a node ν in the SPQR tree and (u, v) = parent(ν), then we denote Auv(H) by Auv(ν) and Auv(ν) ∪ Avu(ν) by A(ν); for the root ν∗, we denote A ∩ V(fFo) by A(ν∗).

3 Structure of Multipaths and Bi-facial Cycles

This section identifies two structures, "multipaths" and "bi-facial cycles", which cannot admit a straight-line drawing without introducing concave corners. A set of at least two induced u,v-paths is a multipath (a path may be a single edge). We call the vertices u and v the terminals of a multipath, and denote by T(M) the set of terminals of a multipath M.
A multipath between terminals u and v is called maximal if it contains all induced u,v-paths. For a multipath M, we denote by |M| the number of paths in M. For example, the graph G in Figure 1(a) has three maximal multipaths M1 = {Q1 = (u1, u5, u2), Q2 = (u1, u3, u4, u2)}, M2 = {Q3 = (u7, u12), Q4 = (u7, u18, u12), Q5 = (u7, u17, u12)} and M3 = {Q6 = (u8, u9), Q7 = (u8, u10, u9)}. We observe the following.

Lemma 2. Assume that a nontrivial biconnected planar graph G contains a multipath M. In any straight-line drawing of G, each of at least |M| − 1 paths in M must have a concave vertex among its internal vertices. ⊓⊔

We then see, by definition, that for any multipath, its terminals appear as vertices in the skeleton σ(ν) of a P-node ν (recall that G is not a cycle). Hence we observe the following.

Lemma 3. Let Tν∗ be the rooted SPQR-tree of a nontrivial biconnected planar graph G, and f be a face of the skeleton σ(ν∗) of the root ν∗ in Tν∗. Then the set of all maximal multipaths is the same for all plane embeddings in F(V(f); G). ⊓⊔

In a fixed plane embedding F of a planar graph G, a cycle C is called a bi-facial cycle if it is the boundary of a u,v-component H of G with fFo ∉ F(H) (H is considered as a plane graph), where the vertices u and v are called the terminals of the bi-facial cycle C. We denote by T(C) the set of terminals of a bi-facial cycle C. For example, in the plane embedding F1 in Figure 1(a), C1 = (u14, u15, u13, u16), C2 = (u1, u14, u12, u13), C3 = (u8, u2, u9), and C4 = (u6, u8, u2) are bi-facial cycles. We observe the following.

Lemma 4. Assume that a plane embedding F of a nontrivial biconnected graph G contains a bi-facial cycle C. In any straight-line drawing of F, at least one of the non-terminal vertices in C is concave.
⊓⊔

Note that a bi-facial cycle in one plane embedding of a graph G may not be a bi-facial cycle in a different embedding of G. For example, the cycle C4 = (u6, u8, u2) is a bi-facial cycle in the plane embedding F1 in Figure 1(a) but not in the plane embedding F2 in Figure 1(b). A bi-facial cycle C with T(C) = {u, v} is called minimal if the corresponding u,v-component H contains no other bi-facial cycle C′ which shares an edge with C and no multipath M sharing an edge with C. For our running example, C1 and C2 are minimal bi-facial cycles in both embeddings F1 and F2 in Figure 1(a) and (b), but the bi-facial cycles C3 and C4 in embedding F1 are not minimal, since C3 and C4 share an edge with the bi-facial cycle C4 and the multipath M3, respectively. It is not difficult to observe that the next property holds.

Lemma 5. Let Tν∗ be the rooted SPQR-tree of a nontrivial biconnected planar graph G, and f be a face of σ(ν∗) of the root ν∗ in Tν∗. Let C be a bi-facial cycle for some plane embedding F ∈ F(V(f); G). Then C remains a bi-facial cycle for all plane embeddings in F(V(f); G) if and only if C is a minimal bi-facial cycle in the plane embedding F. ⊓⊔

By definition and Lemmas 3 and 5, we have the following.

Lemma 6. Let Tν∗ be the rooted SPQR-tree of a nontrivial biconnected planar graph G, f be a face of σ(ν∗) of the root ν∗ in Tν∗, and F ∈ F(V(f); G). Let L = {C1, ..., Cp, M1, ..., Mq} be the set of minimal bi-facial cycles Ci and maximal multipaths Mj such that (V(H) − T(H)) ∩ V(f) = ∅ for all H ∈ L. Then: (i) every two subgraphs H, H′ ∈ L are edge-disjoint, and (V(H) − T(H)) ∩ (V(H′) − T(H′)) = ∅; (ii) L is the same for all plane embeddings in F(V(f); G). ⊓⊔

For a face f in σ(ν∗) of the root ν∗, we denote by L(f; G) the set L of minimal bi-facial cycles and maximal multipaths such that (V(H) − T(H)) ∩ V(f) = ∅ for all H ∈ L.
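As an illustrative sketch (our own helper, not the paper's algorithm), maximal multipaths can be enumerated by walking the degree-2 chains between branch vertices and grouping the resulting induced paths by their terminal pair; we assume here that terminals have degree at least 3. The theta-shaped toy graph below is an assumed example, not the graph of Figure 1:

```python
def maximal_multipaths(adj):
    """Group the induced u,v-paths of a graph (adjacency dict of sets)
    by terminal pair; any group of >= 2 paths is a maximal multipath."""
    deg = {v: len(adj[v]) for v in adj}
    branch = [v for v in adj if deg[v] != 2]   # potential terminals
    paths = set()
    for u in branch:
        for w in adj[u]:
            path = [u, w]
            while deg[path[-1]] == 2:          # follow the degree-2 chain
                prev, cur = path[-2], path[-1]
                (nxt,) = adj[cur] - {prev}
                path.append(nxt)
            t = tuple(path)
            paths.add(min(t, t[::-1]))         # canonical orientation
    groups = {}
    for p in paths:
        groups.setdefault(frozenset({p[0], p[-1]}), []).append(p)
    return {ts: ps for ts, ps in groups.items() if len(ps) >= 2}

# Theta graph: three induced 0,1-paths (a direct edge, 0-2-1, 0-3-4-1).
theta = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0, 4}, 4: {1, 3}}
for terms, ps in maximal_multipaths(theta).items():
    print(sorted(terms), sorted(ps))  # [0, 1] [(0, 1), (0, 2, 1), (0, 3, 4, 1)]
```

By Lemma 2, at least |M| − 1 = 2 of the three paths of this multipath need a concave internal vertex in any straight-line drawing.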
In our running example, for the root ν∗ = νr and face f = (u1, u2, u7) ∈ F(σ(νr)) in the SPQR tree in Figure 2, we have L(f; G) = {C1, C2, M1, M2, M3}. We are now ready to derive a lower bound on the number of concave corners in a straight-line drawing of a plane embedding F ∈ F(V(f); G). Denote L(f; G) = {C1, ..., Cp, M1, ..., Mq} as in Lemma 6, and define

a(f) := p + (|M1| − 1) + · · · + (|Mq| − 1).  (1)

Note that any straight-line drawing of F ∈ F(V(f); G) must have at least a(f) concave corners on the subgraphs in L(f; G). Fix an embedding F ∈ F(V(f); G), and denote by b(F; f) the number of those concave corners that appear on the boundary of F. More specifically, b(F; f) is given as follows. If the boundary of F consists of two induced paths Q, Q′ ∈ Mj for some Mj ∈ L(f; G), then we define

b(F; f) := 2 if |Mj| ≥ 3, and b(F; f) := 1 if |Mj| = 2;  (2)

otherwise we define

b(F; f) := |{H ∈ L(f; G) | (V(H) − T(H)) ∩ V(fFo) ≠ ∅}|.  (3)

Hence we have the following.

Lemma 7. (lower bound) Let Tν∗ be the rooted SPQR-tree of a nontrivial biconnected planar graph G, and f be a face of σ(ν∗) of the root ν∗ in Tν∗. Then for any straight-line drawing D of a plane embedding F ∈ F(V(f); G), it holds that

|Λ(D)| ≥ a(f) + max{0, 3 − b(F; f)}.  (4)

⊓⊔

In general, equality in (4) does not hold. In fact, if F contains a bi-facial cycle which is edge-disjoint from all minimal bi-facial cycles, then equality in (4) does not hold. For example, the embedding F1 in Figure 1(a) contains the bi-facial cycle C4 = (u6, u8, u2), which is edge-disjoint from every subgraph in L(f; G) with f = (u1, u2, u7). We now consider a plane embedding F of a planar graph G = (V, E) and a candidate set A ⊆ V of concave vertices, and compute a straight-line drawing of F whose concave vertices are given by A.
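Formulas (1) and (4) are simple to evaluate once L(f; G) is known. For the running example, L(f; G) = {C1, C2, M1, M2, M3} with |M1| = |M3| = 2 and |M2| = 3 gives a(f) = 2 + (1 + 2 + 1) = 6, matching the six concave corners of the drawing in Figure 1(c) (we assume b(F; f) = 3 for that embedding, consistent with its three apices; the function name is ours):

```python
def lower_bound(num_min_bifacial, multipath_sizes, b):
    """Lower bound (4): a(f) + max(0, 3 - b(F; f)), with a(f) as in (1)."""
    a = num_min_bifacial + sum(m - 1 for m in multipath_sizes)
    return a + max(0, 3 - b)

# Running example: p = 2 minimal bi-facial cycles, multipaths of sizes
# 2, 3, 2, and b(F; f) = 3, so the bound is 2 + (1 + 2 + 1) + 0 = 6.
print(lower_bound(2, [2, 3, 2], 3))  # 6
```

With no bi-facial cycles or multipaths and b = 0, the bound degenerates to 3, the three apices required by Lemma 1.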
When an embedding F of G is given, we assume that, for a P-node ν, the indices of the edges in Eν− = {e1, e2, ..., ek} are given by the order in which the corresponding subgraphs of G are embedded in F (the subgraphs for e1 and ek enclose the other subgraphs).

Definition 1. For a plane embedding F ∈ F(V(f); G), a subset A ⊆ V is called proper if the following conditions are satisfied. (i) For each minimal bi-facial cycle C ∈ L(f; G), |A ∩ (V(C) − T(C))| = 1; (ii) for each maximal multipath M = {Q1, Q2, ..., Qk} ∈ L(f; G), there is a path Qℓ such that |A ∩ (V(Qi) − T(Qi))| = 1 for i ∈ {1, 2, ..., k} − {ℓ} and A ∩ (V(Qℓ) − T(Qℓ)) = ∅, where if M contains a single edge, then Qℓ is that edge; (iii) A contains max{0, 3 − b(F; f)} vertices from Vν∗; (iv) for each P-node ν in the rooted SPQR-tree Tν∗ with Eν− = {e1, e2, ..., ek}, there are indices h′ and h as follows: h′ = h or h′ = h − 1, and if h′ = h − 1 then eh represents an induced u,v-path Q such that A ∩ (V(Q) − {u, v}) = ∅, where (u, v) = parent(ν); moreover, each ei, 1 ≤ i ≤ h′, is a virtual edge whose corresponding child node µi satisfies Avu(µi) ≠ ∅, while each ej, h + 1 ≤ j ≤ k, is a virtual edge whose corresponding child node µj satisfies Auv(µj) ≠ ∅ (see Figure 3(a)).

Fig. 3. (a) Definition 1(iv) for a P-node ν; (b) an example of a non-proper set A.

Note that the total number of the vertices in A ∩ (V(C) − T(C)) in (i) and A ∩ (V(Qi) − T(Qi)) in (ii) is a(f). Hence by (iii), a proper pair (A, F) satisfies |A| = a(f) + max{0, 3 − b(F; f)}. Condition (iv) is necessary for an embedding F to have a straight-line drawing without introducing concave vertices other than those in A. For example, Figure 3(b) shows a subset A = {w1, . . .
, w4, u, v} in a plane graph G = (V, E, F) from which no straight-line drawing can be constructed without introducing some other concave vertices in V − A.

4 Finding a Best Embedding

This section shows how to maximize b(F; f) over all F ∈ F(V(f); G) for a given face f ∈ F(σ(ν∗)) of the skeleton of the root ν∗ of the rooted SPQR tree of G.

Lemma 8. Let Tν∗ be the rooted SPQR-tree of a nontrivial biconnected planar graph G, and f be a face of the skeleton σ(ν∗) of the root ν∗ in Tν∗. Then an embedding F ∈ F(V(f); G) that maximizes b(F; f) and a proper set A ⊆ V with |A| = a(f) + max{0, 3 − b(F; f)} can be found in linear time. ⊓⊔

We can compute an embedding F that maximizes b(F; f) in a bottom-up manner along the rooted SPQR tree Tν∗. For a non-root node ν in the rooted SPQR tree Tν∗ with parent cut-pair {u, v}, let L(ν; ν∗) denote the set of minimal bi-facial cycles and maximal multipaths H ∈ L(f; G) such that (V(H) − T(H)) ∩ {u, v} = ∅ (note that L(ν; ν∗) remains unchanged even if another face of σ(ν∗) is chosen as f). Let b(ν; ν∗) be the maximum number of subgraphs H ∈ L(ν; ν∗) that share an edge with the u,v-boundary path Quv(G−(ν)) of G−(ν), where the maximum is taken over all embeddings in F({u, v}; G−(ν)). By definition, it is not difficult to see that L(ν; ν∗) and b(ν; ν∗) for all non-root nodes ν can be computed in O(|V|) time by a dynamic programming algorithm based on a recursive formula (the details are omitted due to the space limitation). From L(ν; ν∗) and b(ν; ν∗), ν ∈ Ch(ν∗), we can compute a(f) + max{b(F; f) | F ∈ F(V(f); G)} for all faces f ∈ F(σ(ν∗)) in O(|Vν∗| + |Eν∗|) time. Hence a face fν∗ ∈ F(σ(ν∗)) that maximizes a(f) + max{b(F; f) | F ∈ F(V(f); G)} can be obtained in the same time complexity.
We can apply the above dynamic programming algorithm to another choice of root ν in the SPQR tree to find such a maximizer fν in O(|V|) time. However, we can compute such fν for all nodes ν in O(|V|) time in total by reusing the information used to compute b(ν; ν∗) for all descendants ν of the first choice of root ν∗. When we choose a node ν1 adjacent to the current root ν∗ as a new root, we can compute b(ν; ν1), ν ∈ Ch(ν1), in O(|Vν1| + |Eν1|) time. Therefore, we obtain a face f∗ of a skeleton σ(ν) of a node that maximizes b(f∗) in O(|V|) time, and we can conclude that the minimum number of concave corners in any straight-line drawing is at least a(f∗) + max{0, 3 − b(f∗)}. For the optimal face f∗, we can also compute an embedding F ∈ F(V(f∗); G) and a proper subset A ⊆ V in F such that |A| = a(f∗) + max{0, 3 − b(f∗; F)} in O(|V|) time. The remaining task in proving Theorem 1 is to show that, given a proper set A ⊆ V in a plane embedding F of G, a star-shaped drawing D with Λ(D) = A under the specified embedding F always exists, and that such a drawing D can be computed in linear time.

5 Constructing Star-shaped Drawings

This section describes how to construct a star-shaped drawing for a given proper pair (A, F). We prove the following.

Lemma 9. Let Tν∗ be the rooted SPQR-tree of a biconnected planar graph G, and f be a face of σ(ν∗) of the root ν∗ in Tν∗. Let L = L(f; G), and let (A, F) be a proper pair. Then there exists a star-shaped drawing D of F such that |Λ(D)| = a(f) + max{0, 3 − b(F; f)}, and such a drawing D can be constructed in linear time. ⊓⊔

Let F be a plane embedding of G and A be a proper subset. For a node ν in the SPQR tree Tν∗, let fFo(H) denote the boundary of an induced subgraph H of G (where H is regarded as a plane graph induced from G by the embedding F). We compute a star-shaped drawing D with Λ(D) = A in a top-down manner along the SPQR tree Tν∗.
We first explain the key strategies used to maintain a star-shaped drawing recursively:

1. After choosing an arbitrary |A(ν∗)|-gon Bν∗ with V(Bν∗) = A(ν∗) for the root ν∗, we process all nodes ν in Tν∗ from the root to the leaves by repeatedly computing a drawing Dν of the skeleton σ−(ν) (or σ(ν)), where the line segments in Dν for virtual edges will be replaced with new drawings Dµ of the nodes corresponding to those virtual edges.

2. When we process a non-root R-node ν whose parent is an R-node, we fix the boundary of G−(ν) as an (|A(ν)| + 2)-gon Bν, and then compute a convex drawing Dν of the skeleton σ−(ν) as an extension of Bν (we use the linear-time convex drawing algorithm due to Chiba et al. [4] to compute such a convex extension).

3. When we process a non-root R-node ν whose parent is an R-node (resp., an S-node), we compute a convex drawing Dν of the skeleton σ−(ν) (resp., of the skeleton σ(ν)), where the boundary of the skeleton has been fixed as a convex polygon Bν, and we extend Bν to such a drawing Dν.

Fig. 4. (a) Fixing the boundary Bν of G−(ν) of an S-node ν of type I; (b) fixing the boundaries Bµi of G−(µi) of the child nodes µi ∈ Ch(ν) of an S-node ν of type I.

4. When we process a non-root S-node ν whose parent is an R-node (resp., a P-node), the virtual edge corresponding to ν is drawn as a line segment Lν (resp., a convex polygon Bν), and we then compute an (|A(µ)| + 2)-gon Bµ for each of its child nodes µ ∈ Ch(ν), and replace Lν with (resp., extend Bν to) the sequence of these convex polygons Bµ, µ ∈ Ch(ν).

5.
When we process a non-root P-node ν with {u, v} = parent(ν), the boundary of G−(ν) has been fixed as Bν, and we fix the boundary of G−(µ) for each child node µ ∈ Ch(ν) as an (|Avu(µ)| + 2)-, (|Auv(µ)| + 2)- or (|A(µ)| + 2)-gon Bµ, depending on the position of the corresponding virtual edge in the indexing of Definition 1(iv).

6. When new inner faces are introduced after computing a drawing Dν of σ−(ν) (or σ(ν)) of an R-node ν, we choose a point rf inside each new face f as the view point of f, which will be kept as a visible point in the kernel of the face f until a final drawing is obtained.

7. A convex polygon Bν for a node ν will be chosen so that the view point(s) rf of the face(s) adjacent to the corresponding virtual edge remain visible from everywhere in the face(s).

Fig. 5. (a) Fixing the boundary Bν∗ of G for the root R-node ν∗; (b) fixing the boundaries of G−(µ) and G−(µ′) for the child P- and R-nodes µ ∈ Ch(ν) and the child P- and R-nodes µ′ ∈ Ch(µ) with child S-nodes µ ∈ Ch(ν).

We distinguish type I from type II for each of the S- and R-nodes. We briefly explain what we need to compute for each type of node. (A full description of the algorithm is omitted due to the space limitation.) When a vertex u is drawn as a point in the plane, the point may be denoted by u for notational simplicity.

Type I S-node. A non-root S-node ν is called type I if the virtual edge eν corresponding to ν is an outer edge (resp., an edge) in the drawing for its parent R-node (resp., its parent P-node).
Input: The view point rf of the face incident to eν and a convex polygon Bν drawn for fvuo(G−(ν)) (or fuvo(G−(ν))) are given.

Fig. 6.
(a) For an R-node ν of type I, a convex (|Avu(ν)| + 2)-gon Bν is drawn for fuvo(ν); (b) extending Bν into a convex drawing of σvu(ν); (c) fixing the boundaries of G−(µ) and G−(µ′) for the child P- and R-nodes µ ∈ Ch(ν) and the child P- and R-nodes µ′ ∈ Ch(µ) with child S-nodes µ ∈ Ch(ν).

Output: For each child node µ ∈ Ch(ν), we place the parent cut-pair (s, t) on Bν and fix the boundary of G−(µ) as a convex (|A(µ)| + 2)-gon Bµ with V(Bµ) = A(µ) ∪ {s, t} by combining the line segments of Bν between s and t and a new line segment between s and t inside Bν (see Figure 4, and the virtual edge e1 in Figure 5).

Type II S-node. A non-root S-node ν is called type II if its parent node is an R-node and the virtual edge eν corresponding to ν is an inner edge in the drawing of the parent node.
Input: The view points r1 and r2 of the two faces incident to eν and a line segment Bν = [u, v] drawn for parent(ν) = (u, v) are given.
Output: For each (R- or P-) node µ, we place the parent cut-pair (s, t) on Bν and fix the boundary of G−(µ) as a convex (|A(µ)| + 2)-gon Bµ with V(Bµ) = A(µ) ∪ {s, t} inside [s, t, r1] ∪ [s, t, r2] (see the virtual edge e2 in Figure 5).

For a non-root node ν with (u, v) = parent(ν), two plane drawings for σ(ν) can be obtained from the plane graph σ−(ν); one still has fuvo(σ−(ν)) as part of its boundary, and the other fvuo(σ−(ν)), where we denote the former and the latter plane graphs by σuv(ν) and σvu(ν), respectively.

Type I R-node. A non-root R-node ν is called type I if a convex drawing of the plane graph σvu(ν) (resp., σuv(ν)) is required to be computed. Let (u, v) = parent(ν).
Input: A convex (|Avu(ν)| + 2)-gon Bν for fvuo(ν) (resp., a (|Auv(ν)| + 2)-gon Bν for fuvo(ν)) is given (see Figure 6(a)).

Fig. 7.
(a) For an R-node ν of type II, a convex (|A(ν)| + 2)-gon Bν is given as the boundary of G−(ν); (b) extending Bν into a convex drawing of σ−(ν); (c) fixing the boundaries of G−(µ) and G−(µ′) for the child P- and R-nodes µ ∈ Ch(ν) and the child P- and R-nodes µ′ ∈ Ch(µ) with child S-nodes µ ∈ Ch(ν).

Output: A convex drawing of σvu(ν) (resp., σuv(ν)) as an extension of Bν (see Figure 6(b)).

Type II R-node. An R-node ν is called type II if a convex drawing of σ−(ν) is required to be computed.
Input: A convex (|A(ν)| + 2)-gon Bν for the boundary of G−(ν) is given (see Figure 7(a)), where a convex |A(ν∗)|-gon Bν∗ for the boundary of G is given if ν is the root ν∗.
Output: A convex drawing of σ−(ν) as an extension of Bν (see Figure 7(b)).

P-node. Let ν be a P-node, and let u, v be the vertices in Vν∗ if ν is the root ν∗, or the vertices in parent(ν) otherwise.
Input: A convex boundary Bν of G−(ν), where (e1, e2, ..., ej∗, ..., ek) denotes the sequence of Eν−, ej∗ = c(ν) is the central edge, and fi denotes the face between the two edges ei and ei+1 in the plane graph σ−(ν) (see Figure 8(a)).
Output: Drawings of σ−(µ) of the child nodes µ ∈ Ch(ν). We process the left edges in Eν− from e1 to ej∗−1 and the right edges in Eν− from ek to ej∗+1 before the central edge ej∗ is processed.
Left edges: If a left edge ei corresponds to an S-node µi, then we choose a view point rfi and treat µi as a type I S-node. If a left edge ei corresponds to an R-node µi, then we treat µi as a type I R-node, and compute a convex drawing Dµ of σvu(µi) (including the virtual edge parent(ν)), where the view point rfi is computed during the computation of Dµ (see Figure 9(a) and (b)).
Right edges: We apply the above procedure symmetrically to the right edges ek, ek−1, ..., ej∗+1.
We treat the central edge ej∗ as follows. If ej∗ corresponds to an R-node µj∗, then we treat µj∗ as a type II R-node.
Fig. 8. (a) Bν for the root P-node or an internal P-node ν; (b) e1 corresponds to an S-node µ1, where Snode(µ1, Bµ1, rf1, ∅, I) is executed after choosing the view point rf1.

Since we can show that the boundary Bν of a node ν can be fixed in time linear in the size of Bν, the entire algorithm can be implemented to run in O(|V| + |E|) time. This proves Lemma 9.

6 Concluding Remarks

In this paper, we considered the problem of finding a star-shaped drawing of a given biconnected planar graph with the minimum number of concave corners. By deriving an effective lower bound on the number of concave corners, we proved that the problem can be solved in linear time. A natural question related to the problem is whether the problem of finding a straight-line drawing that minimizes the number of concave corners for a given plane embedding F of a biconnected planar graph is hard or not. Remember that equality in (4) does not hold in general, as observed in the example in Fig. 1(a). However, recently we showed that this problem can be solved in linear time [9].

References

1. G. Di Battista, P. Eades, R. Tamassia and I. G. Tollis, Graph Drawing: Algorithms for the Visualization of Graphs, Prentice-Hall, 1998.
2. G. Di Battista and R. Tamassia, On-line planarity testing, SIAM J. on Comput., 25(5), pp. 956-997, 1996.
3. G. Di Battista and R. Tamassia, On-line maintenance of triconnected components with SPQR-trees, Algorithmica, 15, pp. 302-318, 1996.
4. N. Chiba, T. Yamanouchi and T. Nishizeki, Linear algorithms for convex drawings of planar graphs, Progress in Graph Theory, Academic Press, pp. 153-173, 1984.
5. I. Fáry, On straight line representations of planar graphs, Acta Sci. Math. Szeged, 11, pp. 229-233, 1948.
Fig. 9. (a) e1 corresponds to an R-node µ1 of type I, where Rnode(µ1, Bµ1, σvu(µ1), I) computes a convex drawing Dµ of σvu(µ1) and returns a view point rf1; (b) Rnode(µ1, Bµ1, σvu(µ1), I) draws the boundaries of the child nodes of µ1.

6. S.-H. Hong and H. Nagamochi, Convex drawings of graphs with non-convex boundary, Proc. of WG 2006, Lecture Notes in Computer Science, vol. 4271, Springer-Verlag, pp. 113-124, 2006.
7. S.-H. Hong and H. Nagamochi, Convex drawings of hierarchical plane graphs, 17th Australasian Workshop on Combinatorial Algorithms (AWOCA 2006), Uluru, NT, Australia, July 13-16, 2006.
8. S.-H. Hong and H. Nagamochi, Fully convex drawings of clustered planar graphs, Proc. of Korea-Japan Joint Workshop on Algorithms and Computation, August 9-10, 2007, Chonnam National University, Gwangju, Korea, pp. 32-39.
9. S.-H. Hong and H. Nagamochi, Star-shaped drawings of plane graphs, Workshop on Algorithms and Computation 2008 (WALCOM 2008), February 7-8, 2008, Dhaka, Bangladesh (submitted).
10. J. E. Hopcroft and R. E. Tarjan, Dividing a graph into triconnected components, SIAM J. on Comput., 2, pp. 135-158, 1973.
11. G. Kant, Algorithms for Drawing Planar Graphs, Ph.D. Dissertation, Department of Computer Science, University of Utrecht, Holland, 1993.
12. K. Miura, M. Azuma and T. Nishizeki, Convex drawings of plane graphs of minimum outer apices, Int. J. Found. Comput. Sci., 17, pp. 1115-1128, 2006.
13. C. Thomassen, Planarity and duality of finite and infinite graphs, J. of Combinatorial Theory, Series B, 29, pp. 244-271, 1980.
14. W. T. Tutte, Convex representations of graphs, Proc. of London Math. Soc., 10, no. 3, pp. 304-320, 1960.
15. W. T. Tutte, Graph Theory, Encyclopedia of Mathematics and Its Applications, Vol. 21, Addison-Wesley, Reading, MA, 1984.
Algorithms for Two Versions of LCS Problem for Indeterminate Strings⋆

Costas S. Iliopoulos1,4,⋆⋆, M. Sohel Rahman1,4,⋆⋆⋆,†, and Wojciech Rytter2,3,5,‡

1 Algorithm Design Group, Department of Computer Science, King's College London, Strand, London WC2R 2LS, England, http://www.dcs.kcl.ac.uk/adg
2 Institute of Informatics, Warsaw University, Warsaw, Poland
3 Department of Mathematics and Informatics, Copernicus University, Torun, Poland
4 {sohel,csi}@dcs.kcl.ac.uk
5 rytter@mimuw.edu.pl

Abstract. We study the complexity of the longest common subsequence (LCS) problem from a new perspective. By an indeterminate string (i-string, in short) we mean a sequence X̃ = X̃[1]X̃[2] . . . X̃[n], where X̃[i] ⊆ Σ for each i, and Σ is a given alphabet of potentially large size. A subsequence of X̃ is any usual string over Σ which is an element of the finite (but usually of exponential size) language X̃[i1]X̃[i2] . . . X̃[ip], where 1 ≤ i1 < i2 < . . . < ip ≤ n, p ≥ 0. Similarly, we define a supersequence of X̃. Our first version of the LCS problem is Problem ILCS: for given i-strings X̃ and Ỹ, find their longest common subsequence. From the complexity point of view, new parameters of the input correspond to |Σ| and the maximum size ℓ of the subsets in X̃ and Ỹ. There is also a third parameter R, which gives a measure of similarity between X̃ and Ỹ. The smaller R is, the less time is needed to solve Problem ILCS. Our second version of the LCS problem is Problem CILCS (constrained ILCS): for given i-strings X̃ and Ỹ and a plain string Z, find the longest common subsequence of X̃ and Ỹ which is, at the same time, a supersequence of Z. In this paper, we present several efficient algorithms to solve both the ILCS and CILCS problems.
The efficiency of our algorithms is obtained in particular by using an efficient data structure for a special type of range maxima query and fast multiplication of Boolean matrices.

⋆ Part of this research work was carried out when Costas Iliopoulos and M. Sohel Rahman were visiting the Institute of Informatics, Warsaw University.
⋆⋆ Supported by EPSRC and Royal Society grants.
⋆⋆⋆ Supported by the Commonwealth Scholarship Commission in the UK under the Commonwealth Scholarship and Fellowship Plan (CSFP).
† On leave from Department of CSE, BUET, Dhaka-1000, Bangladesh.
‡ Supported by the grant of the Polish Ministry of Science and Higher Education N 206 004 32/0806.

1 Introduction

This paper deals with two interesting variants of the classical and well-studied longest common subsequence (LCS) problem: the LCS problem for indeterminate strings (i-strings) and the constrained LCS (CLCS) problem, also for i-strings. In an i-string, at each position the string may contain a set of characters. The LCS problem and variants thereof have long been the focus of extensive research in the computer science literature. Given two strings, the LCS problem consists of computing a subsequence of maximum length common to both strings. In CLCS, the computed longest common subsequence must also be a supersequence of a third given string. The motivation for the CLCS problem comes from bioinformatics: in the computation of the homology of two biological sequences it is important to take into account a common specific or putative structure [25]. The longest common subsequence problem for k strings (k > 2) was first shown to be NP-hard [17] and later proved to be hard to approximate [15]. The restricted, and probably more studied, version dealing with two strings has received extensive attention.
The classic dynamic programming solution to the LCS problem, invented by Wagner and Fischer [27], has O(n^2) worst-case running time, where n is the length of the two strings. Masek and Paterson [18] improved this algorithm using the "Four Russians" technique [1] to reduce the worst-case running time to O(n^2 / log n).⁶ Since then, not much improvement in terms of n can be found in the literature. However, several algorithms exist with complexities depending on other parameters. For example, Myers [20] and Nakatsu et al. [21] presented O(nD) algorithms, where the parameter D is the simple Levenshtein distance between the two given strings [16]. Another interesting and perhaps more relevant parameter for this problem is R, the total number of ordered pairs of positions at which the two strings match. Hunt and Szymanski [11] presented an algorithm running in O((R + n) log n) time. They also cited applications where R ~ n, and thereby claimed that for these applications the algorithm would run in O(n log n) time. Very recently, Rahman and Iliopoulos presented an improved LCS algorithm running in O(R log log n + n) time [23]. Notably, an O(R log log n) time algorithm for LCS was also reported in [19], but its running time excludes a costly preprocessing time of O(n^2 log n). For a comprehensive comparison of the well-known algorithms for the LCS problem and a study of their behaviour in various application environments, the reader is referred to [5]. The CLCS problem, on the other hand, was introduced quite recently by Tsai [25]. In [25], a dynamic programming formulation for CLCS was presented, leading to an O(pn^4) time algorithm, where p is the length of the third string, which imposes the constraint. Later, Chin et al. [7] and, independently, Arslan and Eğecioğlu [2] presented improved CLCS algorithms with O(pn^2) time and space complexity.
⁶ Employing different techniques, the same worst-case bound was achieved in [10]. In particular, for most texts, the time complexity achieved in [10] is O(hn^2 / log n), where h ≤ 1 is the entropy of the text.

The problem was also studied very recently in [13], where an algorithm running in O(pR log log n + n) time was devised. In this paper, we revisit the LCS and CLCS problems, but in a different setting: instead of standard strings, we consider i-strings, where at each position the string may contain a set of characters. The motivation for our study comes from the fact that i-strings are extensively used in molecular biology to express polymorphism in DNA sequences, e.g. the polymorphism of protein coding regions caused by redundancy of the genetic code, or polymorphism in binding site sequences of a family of genes. To the best of our knowledge, the only work in the literature in this context is the recent paper [14], where a finite-automata-based solution for the CLCS problem for i-strings was presented with worst-case running time O(|Σ|pn^2). Here we present a number of improved algorithms to solve both the LCS and CLCS problems for i-strings. In particular, we use some novel techniques to preprocess the given i-strings, which let us use the corresponding solutions for normal strings to get efficient solutions for i-strings. The rest of the paper is organized as follows. In Section 2, we present the preliminary concepts and formally define the problems handled in this paper. In Section 3, we handle the LCS problem for i-strings (Problem ILCS). To elaborate, in Sections 3.1 to 3.3, we present three different preprocessing steps leading to efficient algorithms for Problem ILCS. In particular, in Section 3.1, we reduce the problem at hand to the Boolean matrix multiplication problem and use fast multiplication of Boolean matrices to gain efficiency.
In Sections 3.2 and 3.3 we take a different approach and consider algorithms with running time dependent on R. In particular, in Section 3.3, we (implicitly) use the efficient data structure of [23] for a special type of range maxima query to get efficient algorithms for ILCS. Notably, the use of RMQ in solving LCS problems and variants thereof was explored in [19] and later in [12, 22, 23]. Thus, the preprocessing step in Section 3.3 could be viewed as an extension of the work of [19, 12, 22, 23]. We handle the CLCS problem for i-strings (Problem CILCS) in Section 4. In particular, we present two different preprocessing steps for CILCS in Sections 4.1 and 4.2, extending the ideas and techniques used in Sections 3.1 and 3.2, respectively. Finally, we briefly conclude in Section 5.

2 Preliminaries

We use LCS(X, Y) to denote a longest common subsequence of X and Y. We denote the length of LCS(X, Y) by L(X, Y). Given two strings X[1..n] and Y[1..n] and a third string Z[1..p], a common subsequence S of X, Y is said to be constrained by Z if, and only if, Z is a subsequence of S. We use LCS_Z(X, Y) to denote a longest common subsequence of X and Y that is constrained by Z. We denote the length of LCS_Z(X, Y) by L_Z(X, Y).

Example 1. Suppose X = TCCACA, Y = ACCAAG and Z = AC. As is evident from Fig. 1, S_1 = CCAA is an LCS(X, Y). However, S_1 is not an LCS_Z(X, Y) because Z is not a subsequence of S_1. On the other hand, S_2 = ACA is an LCS_Z(X, Y). Note that, in this case, L_Z(X, Y) < L(X, Y).

[Figure 1 appeared here, aligning X = TCCACA, Y = ACCAAG and Z = AC.]
Fig. 1. |LCS(X, Y)| = 4 and |LCS_Z(X, Y)| = 3.

In this paper, we are interested in indeterminate strings (i-strings, in short). In contrast, usual strings are called here standard strings.
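Example 1 can be checked mechanically with a textbook dynamic program for plain strings plus a brute-force constrained search. The sketch below is ours (the function names are hypothetical), not one of the algorithms developed in this paper:

```python
from itertools import combinations

def lcs_len(x, y):
    """Classic Wagner-Fischer DP: T[i][j] = L(x[:i], y[:j])."""
    n, m = len(x), len(y)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                T[i][j] = T[i - 1][j - 1] + 1
            else:
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[n][m]

def is_subseq(z, s):
    """True iff z is a subsequence of s."""
    it = iter(s)
    return all(c in it for c in z)

def clcs_len_bruteforce(x, y, z):
    """Constrained LCS by enumeration -- only viable for tiny examples."""
    for r in range(len(x), 0, -1):
        for idx in combinations(range(len(x)), r):
            s = "".join(x[i] for i in idx)
            if is_subseq(s, y) and is_subseq(z, s):
                return r
    return 0

X, Y, Z = "TCCACA", "ACCAAG", "AC"
```

Here lcs_len(X, Y) is 4 and clcs_len_bruteforce(X, Y, Z) is 3, matching Fig. 1; efficient CLCS algorithms are discussed in the introduction.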
A string X̃[1..n] is said to be indeterminate if it is built over the 2^|Σ| − 1 potential non-empty sets of letters belonging to Σ. Each X̃[i], 1 ≤ i ≤ n, can be thought of as a set of characters, and we have |X̃[i]| ≥ 1, 1 ≤ i ≤ n. The length of the i-string X̃, denoted by |X̃|, is the number of sets (of characters) in it, i.e., n. In this paper, the set containing the letters A and C will be denoted by [AC], and the singleton [C] will simply be denoted by C for ease of reading. Also, we use the following convention: we use plain letters like X to denote normal strings; the same letter may be used to denote an i-string if written as X̃. For i-strings, the notion of symbol equality is extended to a single-symbol match between two (indeterminate) letters in the following way. Given two subsets A, B ⊆ Σ, we say that A matches B, and write A ≈ B, iff A ∩ B ≠ ∅. Note that the relation ≈, referred to as the 'indeterminate equality' henceforth, is not transitive.

Example 2. Suppose we have i-strings X̃ = AC[CTG]TG[AC]C and Ỹ = TC[AT][AT]TTC. Here we have X̃[3] ≈ Ỹ[3] because X̃[3] ∩ Ỹ[3] = [CTG] ∩ [AT] = [T] ≠ ∅. Similarly we have X̃[3] ≈ Ỹ[1], and also X̃[3] ≈ Ỹ[2], etc.

We can extend the notion of a subsequence to i-strings in a natural way, replacing the equality of symbols by the relation ≈, as follows. A subsequence of X̃ is a plain string U over Σ which is an element of the finite (but usually of exponential size) language X̃[i_1]X̃[i_2]...X̃[i_p], where 1 ≤ i_1 < i_2 < ... < i_p ≤ n, p ≥ 0. Similarly, we define a supersequence of X̃. The notions of common and longest common subsequence for i-strings can now be extended easily. In what follows, for ease of exposition, we assume that |X̃| = |Ỹ| = n; but our results can be easily extended when |X̃| ≠ |Ỹ|. We are interested in the following two problems.

Problem "ILCS" (LCS for Indeterminate Strings). Given two i-strings X̃
and Ỹ, we want to compute an LCS(X̃, Ỹ).

Problem "CILCS" (CLCS for Indeterminate Strings). Given two i-strings X̃ and Ỹ and another (plain) string Z, we want to compute an LCS_Z(X̃, Ỹ).

Example 3. Suppose we are given the i-strings X̃ = [AF]BDDAAA, Ỹ = [AC]BA[CD]AA[DF], and Z = BDD. Figure 2 shows an LCS(X̃, Ỹ) and an LCS_Z(X̃, Ỹ). Note that, although L(X̃, Ỹ) = 5, L_Z(X̃, Ỹ) = 4.

[Figure 2 appeared here, aligning X̃, Ỹ and Z.]
Fig. 2. LCS(X̃, Ỹ) and LCS_Z(X̃, Ỹ) of Example 3.

3 Algorithm for ILCS

In this section, we devise efficient algorithms for Problem ILCS. We start with a brief review of the traditional dynamic programming technique employed to solve LCS [27] for standard strings. Here the idea is to determine the longest common subsequences for all possible prefix combinations of the input strings. The recurrence relation for extending the length of an LCS for each prefix pair (X[1..i], Y[1..j]), i.e. L(X[1..i], Y[1..j]), is as follows [27]:

T[i, j] = 0, if i = 0 or j = 0;
T[i, j] = max(T[i − 1, j − 1] + δ(i, j), T[i − 1, j], T[i, j − 1]), otherwise,    (1)

where δ(i, j) = 1 if X[i] = Y[j], and δ(i, j) = 0 otherwise. Here we have used the tabular notation T[i, j] to denote L(X[1..i], Y[1..j]). After the table has been filled, L(X, Y) can be found in T[n, n], and an LCS(X, Y) can be found by backtracking from T[n, n] (for details please refer to [27] or any textbook on algorithms, e.g. [9]). It is easy to see that Equation 1 can be realized in O(n^2) time. Interestingly, Equation 1 can be adapted to solve Problem ILCS quite easily. The only thing we need to do is to replace the equality check ('=') in Equation 1 with the indeterminate equality ('≈').
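Concretely, the adapted recurrence can be sketched as follows, representing each i-string position as a set of characters (a sketch of ours; the cost of computing δ via set intersection is discussed next):

```python
def ilcs_len(X, Y):
    """Equation 1 with '=' replaced by the indeterminate equality:
    two positions match iff their character sets intersect."""
    n, m = len(X), len(Y)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # delta(i, j) = 1 iff X~[i] ∩ Y~[j] is non-empty
            delta = 1 if X[i - 1] & Y[j - 1] else 0
            T[i][j] = max(T[i - 1][j - 1] + delta, T[i - 1][j], T[i][j - 1])
    return T[n][m]

# Example 2's i-strings, written as lists of sets:
Xt = [{"A"}, {"C"}, {"C", "T", "G"}, {"T"}, {"G"}, {"A", "C"}, {"C"}]
Yt = [{"T"}, {"C"}, {"A", "T"}, {"A", "T"}, {"T"}, {"T"}, {"C"}]
```

With singleton sets the routine reduces to the plain-string DP; on Example 2's i-strings it yields an ILCS of length 4 (e.g. CTTC).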
However, this 'simple' change affects the running time of the algorithm, because instead of the constant-time equality check we need to perform an intersection between two sets. To deduce the precise running time of the resulting algorithm, we make the assumption that the sets of characters of each input i-string are given in sorted order. Under this assumption, we can perform the intersection operation in O(|Σ|) time in the worst case, since there can be at most |Σ| characters in a set of an i-string. So we have the following theorem.

Theorem 1. Problem ILCS can be solved in O(|Σ|n^2) time.

In the rest of this section, we try to devise algorithms with better running times than the one reported in Theorem 1. We assume that the alphabet Σ is indexed, which is the case in most practical situations. We also assume that the sets of characters of each input i-string are given in sorted order; recall that the latter assumption is required to get the running time of Theorem 1. To improve the running time, we plan to do some preprocessing to realize Equation 1 more efficiently. In particular, we want to preprocess the two given strings so that the indeterminate equality check can be realized in O(1) time. In the next subsections, we present three different preprocessing steps and analyze the time and space complexity of the resulting algorithms.

3.1 Preprocessing 1 for ILCS

Here the idea is to first compute a table I[i, j], 1 ≤ i, j ≤ n, as defined below:

I[i, j] = 1 if X̃[i] ∩ Ỹ[j] ≠ ∅, and I[i, j] = 0 otherwise.    (2)

It is easy to realize that, with the table I in our hands, we can realize Equation 1 in O(n^2) time, because the indeterminate equality check reduces to a constant-time lookup of the corresponding entry in the I-table. Now it remains to see how efficiently we can compute the table I. Recall that our ultimate goal is to get an overall running time better than the one reported in Theorem 1.
To compute the table I, we first encode each X̃[i], 1 ≤ i ≤ n, and Ỹ[j], 1 ≤ j ≤ n, as a binary vector of size |Σ| as follows. We use X̂ and Ŷ to denote the encodings of X̃ and Ỹ, respectively. For all 1 ≤ i ≤ n and c ∈ Σ, the encoding of X̃[i] is defined below:

X̂[i][c] = 1 if c ∈ X̃[i], and X̂[i][c] = 0 otherwise.    (3)

The encoding of Ỹ is defined analogously. Now, we can view X̂ and Ŷ as two ordered lists of n binary vectors each, where each vector is of size |Σ|. And it is easy to realize that the computation of I reduces to the matrix multiplication of X̂ and Ŷ. To speed up this computation, we perform the following trick. Without loss of generality, we assume that n is divisible by |Σ|. We divide both the Boolean matrices X̂ and Ŷ into square blocks, each of size |Σ| × |Σ|. Now we can perform the matrix multiplication by performing square matrix multiplications of the constituent square blocks. Next, we analyze the running time of the preprocessing discussed above. The encodings of the two input i-strings X̃ and Ỹ require O(n|Σ|) time and space. For square matrix multiplication, the best known algorithm is due to Coppersmith and Winograd [8]. Their algorithm works in O(N^2.376) time, where the involved matrices are of size N × N. Now, recall that in our case the square matrices are of size |Σ| × |Σ|. Also, it is easy to see that, in total, we need (n/|Σ|)^2 such computations. Therefore, the worst-case computational effort required is O((n/|Σ|)^2 × |Σ|^2.376) = O(n^2 |Σ|^0.376). To sum up, the total time required to solve Problem ILCS is O(n|Σ| + n^2 |Σ|^0.376 + n^2) = O(n|Σ| + n^2 |Σ|^0.376) in the worst case. This implies the following result.

Theorem 2. Problem ILCS can be solved in O(n|Σ| + n^2 |Σ|^0.376) time.

Before concluding this section, we briefly review some other Boolean matrix multiplication results that may be used in our algorithm.
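As a concrete illustration of the encoding of Equation 3 and the resulting table I, the following sketch (ours, not the paper's construction) packs each |Σ|-bit vector into a machine-word bitmask and computes the product entrywise, rather than via the sub-cubic block matrix multiplication described above:

```python
def build_I(Xt, Yt, sigma):
    """Encode every character set as a |Sigma|-bit mask (Equation 3),
    then set I[i][j] = 1 iff the masks of X~[i] and Y~[j] share a bit."""
    idx = {c: k for k, c in enumerate(sigma)}

    def enc(S):
        # characteristic bit-vector of the set S
        m = 0
        for c in S:
            m |= 1 << idx[c]
        return m

    mx = [enc(s) for s in Xt]
    my = [enc(s) for s in Yt]
    return [[1 if a & b else 0 for b in my] for a in mx]

# Example 2's i-strings over sigma = "ACGT":
Xt = [{"A"}, {"C"}, {"C", "T", "G"}, {"T"}, {"G"}, {"A", "C"}, {"C"}]
Yt = [{"T"}, {"C"}, {"A", "T"}, {"A", "T"}, {"T"}, {"T"}, {"C"}]
I = build_I(Xt, Yt, "ACGT")
```

With 0-based indices, I[2][2] = 1 reflects X̃[3] ≈ Ỹ[3] and I[2][0] = 1 reflects X̃[3] ≈ Ỹ[1] from Example 2.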
There is a simple so-called "Four Russians" algorithm of Arlazarov et al. [1], which performs Boolean N × N matrix multiplication in O(N^3 / log N) time. This was eventually improved slightly to O(N^3 / log^1.5 N) time by Atkinson and Santoro [3]. Rytter [24] and, independently, Basch, Khanna, and Motwani [4] gave an O(N^3 / log^2 N) algorithm for Boolean matrix multiplication on the (log n)-word RAM. A similar result also follows from a very recent paper [28]. Therefore, Problem ILCS can be solved in O(n|Σ| + n^2 |Σ| / log^2 |Σ|) time without algebraically sophisticated matrix multiplication.

3.2 Preprocessing 2 for ILCS

We first present some notation needed to discuss the next preprocessing phase. We define the term ℓ as follows:

ℓ = max{|X̃[i]|, |Ỹ[i]| : 1 ≤ i ≤ n}.

Also, we say that a pair (i, j), 1 ≤ i, j ≤ n, defines a match if X̃[i] ≈ Ỹ[j]. The set of all matches, M, is defined as follows:

M = {(i, j) | X̃[i] ≈ Ỹ[j], 1 ≤ i, j ≤ n}.

Observe that R = |M|.

Algorithm 1 Computation of the Table I
1: for each i, j do I[i, j] := 0
2: for each a ∈ Σ do
3:   L_X̃[a] := ∅; L_Ỹ[a] := ∅
4: for i = 1 to n do
5:   for each a ∈ X̃[i] do
6:     insert(i, L_X̃[a])
     for each b ∈ Ỹ[i] do insert(i, L_Ỹ[b])
7: for each a ∈ Σ do
8:   for each i ∈ L_X̃[a] do
9:     for each j ∈ L_Ỹ[a] do I[i, j] := 1

In the algorithm we pre-compute M, and then fill up the table I. We proceed as follows. We construct, for each symbol a ∈ Σ, two separate lists, L_X̃[a] and L_Ỹ[a]. For each a ∈ Σ, L_X̃[a] (L_Ỹ[a]) stores the positions at which a appears in X̃ (Ỹ), if any. We have, for 1 ≤ i, j ≤ n,

I[i, j] = 1 ⇔ there exists a ∈ Σ such that i ∈ L_X̃[a] and j ∈ L_Ỹ[a].

The initialization of the table I requires O(n^2) time. The construction of the lists L_X̃[a], L_Ỹ[a], a ∈ Σ, can be done in O(nℓ) time, simply by scanning X̃ and Ỹ in turn. Traversing the two lists L_X̃[a] and L_Ỹ[a] for each a ∈ Σ to fill up I requires O(Rℓ) time.
This follows from the fact that there are in total R positions at which we can have a match, and at each such position we can have up to ℓ matches. Thus the total running time required for the preprocessing is O(nℓ + n^2 + Rℓ).

Theorem 3. Problem ILCS can be solved in O(nℓ + n^2 + Rℓ) time.

We remark that in the worst case we have ℓ = |Σ|. However, ℓ can be much smaller than |Σ| in many cases. Also, Rℓ and nℓ are 'pessimistic' upper bounds, in the sense that rarely will we have |X̃[i]| = ℓ (|Ỹ[i]| = ℓ) for all 1 ≤ i ≤ n. Also, in the worst case we have R = O(n^2); but in many practical cases R turns out to be o(n^2), or even O(n). Another remark is that, to compute an actual LCS, we will additionally require O(nℓ) time in the worst case.

3.3 Preprocessing 3 for ILCS

The preprocessing of Section 3.2 can be slightly modified to devise an efficient algorithm based on the parameter R. There exist efficient algorithms for computing an LCS depending on the parameter R. The recent work of Rahman and Iliopoulos [23] presents an O(R log log n + n) algorithm (referred to as LCS-II in [23]) for computing the LCS. LCS-II computes the set M, sorts it in a 'prescribed' order, and then considers each (i, j) ∈ M and does some useful computation (instead of performing computations for all n^2 entries of the usual dynamic programming matrix). The efficient computation in LCS-II is based on the use of the famous vEB data structure invented by van Emde Boas [26] to solve a restricted dynamic version of the Range Maxima Query problem. The vEB data structure allows us to maintain a sorted list of integers in the range [1..n] in O(log log n) time per insertion and deletion. In addition, it can return next(i) (the successor of i in the list) and prev(i) (the predecessor of i in the list) in constant time. On the other hand, the range maxima query (RMQ) problem is to preprocess an array of numbers so as to answer queries asking for the maximum in a given range of the array.
In [23], the authors observed that, to compute the LCS, one only needs to solve a restricted but dynamic version of the RMQ problem.⁷ Using the vEB structure, they solved this restricted RMQ problem in O(log log n) time per update and per query, which leads to an overall O(R log log n + n) time algorithm for computing the LCS. Now, the essential point about LCS-II is that, if M is computed and supplied to it, it can compute the LCS in O(R log log n) time. Based on the results of [23], we get the following theorem.

Theorem 4. Problem ILCS can be solved in O(Rℓ + R log log n + n) time.

⁷ Similar ideas were also utilized in [19] to solve LCS, although employing a slightly different strategy and using a different data structure.

Proof. (Sketch) We can slightly change Algorithm 1 to compute the set M instead of the table I. This can be done simply by replacing the assignment I[i, j] := 1 (Step 9) of Algorithm 1 with the statement M := M ∪ {(i, j)}. We also need to initialize M to ∅ just before the for loop of Step 7. Now, for plain strings, M can be computed in O(R) time; for i-strings, it requires O(Rℓ) time, because at each match position we may have up to ℓ matches in the worst case. However, recall that our goal is to use LCS-II, for which we need M in a prescribed order. This can be achieved using the same preprocessing algorithm (Algorithm Pre) used in [23]. Algorithm Pre computes M in O(R + n) time (for normal strings) and, using the vEB structures, maintains the prescribed order spending O(R log log n) time. In our case, we have already computed M for the degenerate strings, and hence we use Algorithm Pre only to maintain the order. Therefore, in total, our preprocessing requires O(Rℓ + R log log n) time. Finally, once M (for the degenerate strings) is computed in the prescribed order, we can employ LCS-II directly to solve Problem ILCS, requiring a further O(R log log n) time.
Therefore, in total, the running time to solve Problem ILCS remains O(Rℓ + R log log n + n) in the worst case. □

Remark 1. LCS-II can compute an actual LCS in O(L(X, Y)) time. However, in our adaptation of that algorithm to i-strings, we would need O(L(X, Y) × ℓ) time, because we do not keep track of the matched character and are therefore required to perform the intersection operations to find a match. However, this can be reduced to O(L(X, Y)) simply by keeping track of the matched character (at least one of them, if more exist) in the set M.

4 Algorithm for CILCS

In this section, we present algorithms to solve Problem CILCS, i.e. the constrained LCS problem for i-strings. We follow the same strategy as in Section 3: we use the best known dynamic programming algorithm for CLCS and try to devise an efficient algorithm for CILCS by doing some preprocessing. We use the dynamic programming formulation for CLCS presented in [2]. Extending our tabular notation from Equation 1, we use T[i, j, k], 1 ≤ i, j ≤ n, 0 ≤ k ≤ p, to denote L_{Z[1..k]}(X[1..i], Y[1..j]). We have the following formulation for Problem CLCS from [2]:

T[i, j, k] = max{T′[i, j, k], T″[i, j, k], T[i, j − 1, k], T[i − 1, j, k]},    (4)

where

T′[i, j, k] = 1 + T[i − 1, j − 1, k − 1], if (k = 1 or (k > 1 and T[i − 1, j − 1, k − 1] > 0)) and X[i] = Y[j] = Z[k]; T′[i, j, k] = 0 otherwise,    (5)

and

T″[i, j, k] = 1 + T[i − 1, j − 1, k], if (k = 0 or T[i − 1, j − 1, k] > 0) and X[i] = Y[j]; T″[i, j, k] = 0 otherwise.    (6)

The following boundary conditions are assumed in Equations 4 to 6: T[i, 0, k] = T[0, j, k] = 0, for 0 ≤ i, j ≤ n, 0 ≤ k ≤ p. It is straightforward to give an O(n^2 p) algorithm realizing the dynamic programming formulation of Equations 4 to 6. Now, for CILCS, we have to make the following changes. First of all, every equality check of the form X[i] = Y[j] has to be replaced by the check
X̃[i] ≈ Ỹ[j].    (7)

Here, a constant-time operation is replaced by an O(|Σ|)-time operation in the worst case. On the other hand, every triple equality check of the form X[i] = Y[j] = Z[k] has to be replaced by the check

Z[k] ∈ X̃[i] ∩ Ỹ[j].    (8)

Once again, constant-time operations are replaced by O(|Σ| + log |Σ|)-time operations. So we have the following theorem.

Theorem 5. Problem CILCS can be solved in O(|Σ|n^2 p) time.

As before, our goal is to do some preprocessing to facilitate an O(1)-time realization of Checks 7 and 8, and thereby improve the running time reported in Theorem 5.

4.1 Preprocessing 1 for CILCS

It is clear that we need the table I as defined in Section 3.1 for a constant-time realization of Check 7. In addition, to realize Check 8 in constant time, we compute two more tables, B_X̃[i, k], 1 ≤ i ≤ n, 1 ≤ k ≤ p, and B_Ỹ[j, k], 1 ≤ j ≤ n, 1 ≤ k ≤ p, as defined below:

B_X̃[i, k] = 1 if Z[k] ∈ X̃[i], and B_X̃[i, k] = 0 otherwise;    (9)
B_Ỹ[j, k] = 1 if Z[k] ∈ Ỹ[j], and B_Ỹ[j, k] = 0 otherwise.    (10)

It is easy to realize that Check 8 evaluates to true if, and only if, we have B_X̃[i, k] = B_Ỹ[j, k] = I[i, j] = 1. Therefore, with all three tables pre-computed, we can evaluate Check 8 in constant time. We have already discussed the construction of the table I in Section 3.1. The other two tables can be computed in exactly the same way, requiring O(np|Σ|^0.376) time each. Note that p ≤ n. So we have:

Theorem 6. Problem CILCS can be solved in O(n|Σ| + n^2 |Σ|^0.376 + n^2 p) time.

Theorem 7. Problem CILCS can be solved in O(n|Σ| + n^2 |Σ| / log^2 |Σ| + n^2 p) time.

However, instead of applying the complex matrix multiplication algorithms to compute B_X̃ and B_Ỹ, we can do it more simply by employing a set-membership check for each entry of the n × p tables. This requires O(np log |Σ|) time to compute B_X̃ and B_Ỹ; the overall asymptotic running time, however, remains unimproved.
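A sketch of the B-tables and of the constant-time realization of Check 8 follows (names are ours; for brevity, the table I is built here by direct intersection rather than by the method of Section 3.1):

```python
def build_B(Wt, Z):
    """Equations 9-10: B[i][k] = 1 iff Z[k] lies in the i-th set of W~."""
    return [[1 if z in S else 0 for z in Z] for S in Wt]

def triple_match(I, BX, BY, i, j, k):
    """Check 8 in O(1): Z[k] in X~[i] ∩ Y~[j] iff all three table entries are 1."""
    return I[i][j] == 1 and BX[i][k] == 1 and BY[j][k] == 1

# Example 3's instance (0-based indices):
Xt = [{"A", "F"}, {"B"}, {"D"}, {"D"}, {"A"}, {"A"}, {"A"}]
Yt = [{"A", "C"}, {"B"}, {"A"}, {"C", "D"}, {"A"}, {"A"}, {"D", "F"}]
Z = "BDD"
I = [[1 if a & b else 0 for b in Yt] for a in Xt]
BX, BY = build_B(Xt, Z), build_B(Yt, Z)
```

For instance, triple_match(I, BX, BY, 1, 1, 0) holds, since X̃[2] = Ỹ[2] = [B] = Z[1] in the paper's 1-based notation.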
4.2 Preprocessing 2 for CILCS

In this section, we adapt the preprocessing of Section 3.2 to solve Problem CILCS. We first define the set of all 'triple' matches, M_3, as follows:

M_3 = {(i, j, k) | X̃[i] ≈ Ỹ[j] and Z[k] ∈ X̃[i] ∩ Ỹ[j], 1 ≤ i, j ≤ n, 1 ≤ k ≤ p}.

We let R_3 = |M_3|. Now our goal is to compute the table I_3[i, j, k], 1 ≤ i, j ≤ n, 1 ≤ k ≤ p, as defined below:

I_3[i, j, k] = 1 if Z[k] ∈ X̃[i] ∩ Ỹ[j], and I_3[i, j, k] = 0 otherwise.    (11)

It is easy to realize that, with the tables I and I_3, we can evaluate Checks 7 and 8, respectively, in constant time each. And it is quite straightforward to adapt Algorithm 1 to compute I_3 (see Algorithm 2). All we need to do is to construct lists L_Z[a] (similar to L_X̃[a] and L_Ỹ[a]) for each symbol a ∈ Σ and incorporate them in the for loop of Step 9.

Algorithm 2 Computation of the Table I_3
1: for a ∈ Σ do
2:   insert the positions of a in X̃ into L_X̃[a] in sorted order
3:   insert the positions of a in Ỹ into L_Ỹ[a] in sorted order
4:   insert the positions of a in Z into L_Z[a] in sorted order
5: for i = 1 to n do
6:   for j = 1 to n do
7:     for k = 1 to p do
8:       I_3[i, j, k] := 0
9: for a ∈ Σ do
10:   for i ∈ L_X̃[a] do
11:     for j ∈ L_Ỹ[a] do
12:       for k ∈ L_Z[a] do
13:         I_3[i, j, k] := 1

The preprocessing time to compute I_3 can be easily deduced following the analysis of Algorithm 1. To compute I_3, we need to create 3|Σ| lists. These can be constructed simply by scanning X̃, Ỹ and Z in turn, requiring in total O(2 × nℓ + n) = O(nℓ) time in the worst case. The initialization of I_3 requires O(n^2 p) time. Filling up I_3 requires O(R_3 ℓ) time. Note that we also need to compute I. Thus the total running time required for the preprocessing is O(nℓ + n^2 + n^2 p + Rℓ + R_3 ℓ) = O(nℓ + n^2 p + ℓ(R + R_3)). Since we already have an n^2 p component in the above running time, the total running time for the CILCS problem remains the same as above.

Theorem 8.
Problem CILCS can be solved in O(nℓ + n^2 p + ℓ(R + R_3)) time.

We note that the remarks of Section 3.2, regarding ℓ and R, apply here as well. We further remark that in the worst case we have R_3 = O(n^2 p), but in many practical cases R_3 may turn out to be o(n^2 p). Also, since ℓ can be much smaller than |Σ| in many cases, R_3 ℓ remains a 'pessimistic' upper bound.

5 Conclusion

In this paper, we have studied the classic and well-studied longest common subsequence (LCS) problem and a recent variant of it, namely the constrained LCS (CLCS) problem, when the inputs are i-strings. In LCS, given two strings, we want to find a common subsequence of maximum length; in CLCS, in addition, the solution must also be a supersequence of a third given string. We have presented efficient algorithms to solve both LCS and CLCS for i-strings. In particular, we have used some novel techniques to preprocess the given strings, which let us use the corresponding DP solutions for normal strings to get efficient solutions for i-strings. It would be interesting to see how well the presented algorithms behave in practice, and to compare them among themselves on the basis of their practical performance.

Acknowledgement. We would like to express our gratitude to the anonymous reviewers for their helpful comments, and especially for pointing out that techniques similar to those used in [12, 23] to solve LCS have also been used in [19] to get efficient algorithms.

References

1. V. Arlazarov, E. Dinic, M. Kronrod, and I. Faradzev. On economic construction of the transitive closure of a directed graph (English translation). Soviet Math. Dokl., 11:1209–1210, 1975.
2. A. N. Arslan and Ö. Eğecioğlu. Algorithms for the constrained longest common subsequence problems. Int. J. Found. Comput. Sci., 16(6):1099–1109, 2005.
3. M. D. Atkinson and N. Santoro. A practical algorithm for boolean matrix multiplication. Inf. Process.
Lett., 29(1):37–38, 1988.
4. J. Basch, S. Khanna, and R. Motwani. On diameter verification and boolean matrix multiplication. Technical Report STAN-CS-95-1544, Department of Computer Science, Stanford University, 1995.
5. L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In String Processing and Information Retrieval (SPIRE), pages 39–48. IEEE Computer Society, 2000.
6. H. Broersma, S. S. Dantchev, M. Johnson, and S. Szeider, editors. Algorithms and Complexity in Durham 2007 – Proceedings of the Third ACiD Workshop, 17–19 September 2007, Durham, UK, volume 9 of Texts in Algorithmics. King's College, London, 2007.
7. F. Y. L. Chin, A. D. Santis, A. L. Ferrara, N. L. Ho, and S. K. Kim. A simple algorithm for the constrained sequence problems. Inf. Process. Lett., 90(4):175–179, 2004.
8. D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comput., 9(3):251–280, 1990.
9. T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press and McGraw-Hill Book Company, 1989.
10. M. Crochemore, G. M. Landau, and M. Ziv-Ukelson. A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. In Symposium on Discrete Algorithms (SODA), pages 679–688, 2002.
11. J. W. Hunt and T. G. Szymanski. A fast algorithm for computing longest common subsequences. Commun. ACM, 20(5):350–353, 1977.
12. C. S. Iliopoulos and M. S. Rahman. Algorithms for computing variants of the longest common subsequence problem. Theoretical Computer Science, 2007. To appear.
13. C. S. Iliopoulos and M. S. Rahman. New efficient algorithms for LCS and constrained LCS problem. In Broersma et al. [6], pages 83–94.
14. C. S. Iliopoulos, M. S. Rahman, M. Voracek, and L. Vagner. Computing constrained longest common subsequence for degenerate strings using finite automata. In Broersma et al. [6], pages 95–106.
15. T. Jiang and M. Li.
On the approximation of shortest common supersequences and longest common subsequences. SIAM Journal on Computing, 24(5):1122–1139, 1995.
16. V. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Problems in Information Transmission, 1:8–17, 1965.
17. D. Maier. The complexity of some problems on subsequences and supersequences. Journal of the ACM, 25(2):322–336, 1978.
18. W. J. Masek and M. Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18–31, 1980.
19. V. Mäkinen, G. Navarro, and E. Ukkonen. Transposition invariant string matching. Journal of Algorithms, 56:124–153, 2005.
20. E. W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(2):251–266, 1986.
21. N. Nakatsu, Y. Kambayashi, and S. Yajima. A longest common subsequence algorithm suitable for similar text strings. Acta Inf., 18:171–179, 1982.
22. M. S. Rahman and C. S. Iliopoulos. Algorithms for computing variants of the longest common subsequence problem. In T. Asano, editor, ISAAC, volume 4288 of Lecture Notes in Computer Science, pages 399–408. Springer, 2006.
23. M. S. Rahman and C. S. Iliopoulos. A new efficient algorithm for computing the longest common subsequence. In M.-Y. Kao and X.-Y. Li, editors, AAIM, volume 4508 of Lecture Notes in Computer Science, pages 82–90. Springer, 2007.
24. W. Rytter. Fast recognition of pushdown automaton and context-free languages. Information and Control, 67(1–3):12–22, 1985.
25. Y.-T. Tsai. The constrained longest common subsequence problem. Inf. Process. Lett., 88(4):173–176, 2003.
26. P. van Emde Boas. Preserving order in a forest in less than logarithmic time and linear space. Inf. Process. Lett., 6:80–82, 1977.
27. R. A. Wagner and M. J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168–173, 1974.
28. R. Williams.
Matrix-vector multiplication in sub-quadratic time (some preprocessing required). In SODA, pages 1–11. ACM Press, 2007. International Workshop on Combinatorial Algorithms 07 107 Three NP-Complete Optimization Problems in Seidel’s Switching Eva Jelı́nková Department of Applied Mathematics Faculty of Mathematics and Physics, Charles University Malostranské nám. 25, 118 00 Praha 1, Czech Republic eva@kam.mff.cuni.cz Abstract. Seidel’s switching is a graph operation which makes a given vertex adjacent to precisely those vertices to which it was non-adjacent before, while keeping the rest of the graph unchanged. Two graphs are called switching-equivalent if one can be made isomorphic to the other by a sequence of switches. In this paper, we show the NP-completeness of the problem Switch-cnClique for each c ∈ (0, 1): determine if a graph G is switching-equivalent to a graph containing a clique of size at least cn, where n is the number of vertices of G. We also prove the NP-completeness of the problems Switch-Max-Edges and Switch-Min-Edges which decide if a given graph is switching-equivalent to a graph having at least or at most a given number of edges, respectively. 1 Introduction The concept of Seidel’s switching was introduced by a Dutch mathematician J. J. Seidel in connection with algebraic structures, such as systems of equiangular lines, strongly regular graphs, or the so-called two-graphs. For more structural properties of two-graphs, cf. [11–13]. Since then, switching has been studied by many others. Apart from the algebraic structures, consequences of switching arise in other research fields as well; for example, Seidel’s switching plays an important role in Hayward’s polynomial-time algorithm for solving the P3 -structure recognition problem [6]. As proved by Colbourn and Corneil [4] (and independently by Kratochvı́l et al. [9]), deciding whether two given graphs are switching equivalent is an isomorphism-complete problem. 
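Seidel's switching is easy to experiment with on small graphs. The following is a minimal sketch (not from the paper; the function names and the set-of-frozensets graph encoding are my own). It uses the subset form of the operation that the paper introduces in Definition 2 below, together with a brute-force equivalence test that is exponential and only meant for tiny graphs:

```python
from itertools import combinations, permutations

def switch(edges, vertices, subset):
    """Seidel switch of `subset`: complement exactly the edges that cross
    between `subset` and the rest (symmetric difference with a cut)."""
    cut = {frozenset((u, v)) for u in subset for v in vertices - subset}
    return edges ^ cut

def switching_equivalent(edges_g, edges_h, vertices):
    """Brute force: G ~ H iff some switch of G is isomorphic to H.
    Exponential in |V|; for illustration only."""
    verts = sorted(vertices)
    for r in range(len(verts) + 1):
        for sub in combinations(verts, r):
            s = switch(edges_g, vertices, set(sub))
            for perm in permutations(verts):
                relabel = dict(zip(verts, perm))
                if {frozenset(relabel[x] for x in e) for e in s} == edges_h:
                    return True
    return False

V = {0, 1, 2}
P3 = {frozenset((0, 1)), frozenset((1, 2))}   # the path 0-1-2
# Switching the middle vertex isolates it, giving the discrete graph I3:
assert switch(P3, V, {1}) == set()
# Switching the same subset twice restores the original graph:
assert switch(switch(P3, V, {0}), V, {0}) == P3
assert switching_equivalent(P3, set(), V)
```

Already on three vertices this shows that two very different-looking graphs (a path and a discrete graph) can lie in the same switching class.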
In this paper, we prove the NP-completeness of several problems related to Seidel's switching. We examine the complexity of deciding whether a given graph is switching-equivalent to a graph having a certain desired property. This paradigm has already been addressed by several authors. As observed by Kratochvíl et al. [9] and also by Ehrenfeucht et al. [5], there is no correlation between the complexity of such a switching problem and the complexity of deciding the property P itself. For example, Kratochvíl et al. [9] proved that every graph is switching-equivalent to a graph containing a Hamiltonian path, and it can be decided in polynomial time whether a graph is switching-equivalent to a graph containing a Hamiltonian cycle. However, the problems of deciding whether a graph itself contains a Hamiltonian path or cycle are well known to be NP-complete [7]. On the other hand, the problem of deciding switching-equivalence to a regular graph was proven to be NP-complete by Kratochvíl [10], while switching-equivalence to a k-regular graph for a fixed k is polynomial, although both the regularity and the k-regularity of a graph can be tested in polynomial time. Three-colorability and switching-equivalence to a three-colorable graph are both NP-complete [5]. One of the problems we address in this paper is deciding switching-equivalence to a graph containing a clique of a certain size. It is well known that deciding for instances (G, k) whether the graph G itself contains a clique of size at least k is NP-complete [7]. The corresponding switching problem, to decide for instances (G, k) whether G is switching-equivalent to a graph with a clique of size at least k, was shown by Ehrenfeucht et al. [5] to be NP-complete as well. If k is fixed (not part of the input), then the problem can be solved by testing all induced subgraphs of size k.
The whole graph G is switching-equivalent to a graph with a k-clique if and only if at least one induced subgraph of G on k vertices is switching-equivalent to a clique, and that can be determined in polynomial time. In Section 3 we extend the results of Ehrenfeucht et al. [5] by proving the NP-completeness of deciding switching-equivalence to a graph with a clique of size at least cn, where n is the number of vertices of G, for every fixed constant c in (0, 1). We further examine the complexity of the problems Switch-Min-Edges and Switch-Max-Edges, which for instances (G, k) decide whether G is switching-equivalent to a graph with at most or at least k edges, respectively. We prove that both problems are NP-complete. Such a result may be unexpected, because switching a vertex affects the number of edges in a simple way. On the other hand, Suchý [14] recently proved that the problems Switch-Min-Edges and Switch-Max-Edges are fixed-parameter tractable. Hence, for fixed k they are polynomial, which complements our result. This paper is organized as follows. In Section 2, we introduce the notation and definitions used throughout the paper. In Section 3, we prove the NP-completeness of Switch-cn-Clique. In Section 4 we prove the NP-completeness of Switch-Min-Edges and Switch-Max-Edges, and describe a connection of these problems to graph theoretic codes and the Maximum Likelihood Decoding problem.

2 Basic Definitions

2.1 Preliminaries

In this paper, we use standard graph theoretic notation. Unless defined otherwise, by n we denote the number of vertices of the currently discussed graph. The graph G = (V, \binom{V}{2}), whose edge set consists of all 2-element subsets of V, is called a complete graph and denoted by Kn. A complete subgraph on k vertices is called a k-clique. A path with n vertices is denoted by Pn, and a graph with n vertices and no edges is called discrete and denoted by In. The symmetric difference of sets A and B is denoted by A △ B.
2.2 Seidel's Switching

Definition 1. Let G be a graph. Seidel's switch of a vertex v ∈ VG results in the graph S(G, v) whose vertex set is the same as that of G and whose edge set is the symmetric difference of EG and the full star centered at v, i.e.,

VS(G,v) = VG,
ES(G,v) = (EG \ {xv : x ∈ VG, xv ∈ EG}) ∪ {xv : x ∈ VG, x ≠ v, xv ∉ EG}.

It is easy to observe that the result of a sequence of vertex switches in G depends only on the parity of the number of times each vertex is switched. This allows generalizing switching to vertex subsets of G.

Definition 2. Let G be a graph. The Seidel switch of a vertex subset A ⊆ VG is denoted S(G, A), where

S(G, A) = (VG, EG △ {xy : x ∈ A, y ∈ VG \ A}).

We say that two graphs G and H are switching-equivalent (denoted by G ∼ H) if there is a set A ⊆ VG such that S(G, A) is isomorphic to H.

3 Searching for a Switch with a cn-Clique

In this section, we consider the following problem.

Problem: Switch-cn-Clique
Input: A graph G on n vertices.
Question: Is G switching-equivalent to a graph containing a clique of size at least cn?

Theorem 1. The problem Switch-cn-Clique is NP-complete for any c ∈ (0, 1).

Proof. We prove the theorem in two steps: first we prove the statement for rational numbers c only; then we extend it to numbers c which are irrational. For rational c, it is clear that the problem is in NP: a polynomial-size certificate consists of vertex subsets A and C such that S(G, A)[C] is a clique of the desired size. In the case of irrational c, we assume that an oracle can be used for querying the bits of c in constant time. This ensures that the certificate can be checked in time polynomial in n. In the first step, we show the NP-hardness of the problem by reducing SAT to it, whereas in the second step we reduce 3-SAT. Both SAT and 3-SAT are well known to be NP-complete [7]. Suppose that c is rational and equal to p/q, where p, q ∈ N and p < q.
We have an instance of SAT: a formula ϕ in CNF with k clauses and l occurrences of literals, and we ask whether ϕ is satisfiable. Without loss of generality we can assume that k < l and k ≥ 2.

Let G = Gp,q(ϕ) be the graph constructed in the way illustrated in Fig. 1. The vertices of G are VG = L ∪ K ∪ Z, where L, K, Z are pairwise disjoint and |L| = l, |K| = pl + p − k, |Z| = (q − p − 1)(l + 1) + k + 1.

[Fig. 1. The graph Gp,q(ϕ), with parts K of size pl + p − k, L of size l, and Z of size (q − p − 1)(l + 1) + k + 1.]

The edges of G are defined as follows:
– K induces a clique, and every vertex in K is adjacent to all vertices in L and to no vertex in Z.
– Every vertex in Z is adjacent to all vertices in L and to nothing more.
– Vertices of L represent occurrences of literals in ϕ. Two vertices l1, l2 ∈ L are adjacent if and only if
  • l1 and l2 occur in different clauses, and
  • they are not of the form l1 = ¬l2 nor l2 = ¬l1.

Lemma 1. Let j be an integer. The graph Gp,q(ϕ)[L] contains a j-clique if and only if the formula ϕ contains j simultaneously satisfiable clauses.

Proof. Mutually adjacent vertices of Gp,q(ϕ)[L] correspond to simultaneously satisfiable literals in distinct clauses. ⊓⊔

Corollary 1. The formula ϕ is satisfiable if and only if Gp,q(ϕ)[L] contains a clique of size k (where k is the number of clauses in ϕ).

Let us now consider cliques of size pl + p in the whole graph, either in the original graph G or in its switches. The reader can verify that n = |L ∪ K ∪ Z| = ql + q and (pl + p)/n = p/q = c; therefore cliques of size pl + p are exactly cn-cliques.

Lemma 2. The following statements are equivalent for G = Gp,q(ϕ).
(a) The graph G[L] contains a k-clique.
(b) The graph G contains a (pl + p)-clique.
(c) There exists a set A ⊆ VG such that S(G, A) contains a (pl + p)-clique.

Proof. First we prove that (a) implies (b).
Any clique in G[L] forms a larger clique together with all vertices of K. So, if G[L] contains a k-clique, then G[L ∪ K] contains a clique of size k + (pl + p − k) = pl + p. The implication from (b) to (c) is obvious. To prove that (c) implies (a), suppose that there is a set A ⊆ VG such that S(G, A) contains a (pl + p)-clique on a vertex set C. The set C does not contain more than two vertices of Z, because they are pairwise non-adjacent in G and in S(G, A) they induce a bipartite graph. From the assumptions k < l and k ≥ 2 it follows that l > 2, and p ≥ 1, so pl + p > 2. Therefore C contains some vertices of L or K. But all vertices of Z are non-adjacent in G and have the same neighborhood in G[L ∪ K]; surely all vertices in Z ∩ C have the same neighborhood in S(G, A)[C] (otherwise C would not induce a clique). But then either (Z ∩ C) ⊆ A or (Z ∩ C) ∩ A = ∅, so switching A does not affect edges inside S(G, A)[Z ∩ C], and any two vertices in S(G, A)[Z ∩ C] are non-adjacent. Therefore C contains at most one vertex of Z. Since 1 + |K| = 1 + (pl + p − k) < pl + p, the clique C contains at least one vertex of L. But then it cannot contain vertices of both K and Z, because in the graph G they have the same neighborhood in L and there is no edge between K and Z. Also, the set C cannot consist only of vertices of L, because pl + p > l. Therefore C contains one of the following:
– pl + p − 1 (which is at least k) vertices of L and one vertex of Z, or
– at least k vertices of L and at least one vertex of K.
In both cases, C contains at least k vertices of L, and a vertex v of K ∪ Z. Since C induces a clique in S(G, A), the vertex v is adjacent to all other vertices in C. But in G, by definition, the vertex v is adjacent to all vertices of L, too. So switching A cannot have changed any edge connecting v and these k vertices, which means that either all these k + 1 vertices are in A or none of them is.
But then they induce a (k + 1)-clique in G as well, and G[L] contains a k-clique, which we wanted to prove. ⊓⊔

Corollary 1 and Lemma 2 together give us that ϕ is satisfiable if and only if there exists a set A ⊆ VG such that S(G, A) contains a (pl + p)-clique. But we have already shown that pl + p = cn; and clearly a graph contains a clique of size exactly cn if and only if it contains a clique of size at least cn. That concludes the reduction. The graph Gp,q(ϕ), with q(l + 1) = O(l) vertices and O(l^2) edges, can be constructed in time polynomial in the size of ϕ. Hence the problem Switch-cn-Clique is NP-complete for every rational constant c ∈ (0, 1).

Proving the NP-hardness of Switch-cn-Clique for irrational numbers c is slightly more complicated. We use a theorem of Arora et al. [1] and certain number theoretic results to get suitable numbers p, q, and then, analogously to the rational case, we reduce an instance of 3-SAT to Switch-cn-Clique for the graph Gp,q. Due to space limitations, the rest of the proof is placed in the Appendix. ⊓⊔

4 Minimizing the Number of Edges

In this section, we prove the NP-completeness of the following problem:

Problem: Switch-Min-Edges
Input: A graph G, an integer k.
Question: Is G switching-equivalent to a graph with at most k edges?

The problem Switch-Max-Edges is defined analogously, with "at most" replaced by "at least". It is easy to observe that these two problems are polynomially equivalent. Therefore it suffices to show the NP-completeness of Switch-Min-Edges only, which is done in the proof of Theorem 3.

4.1 The Connection to Maximum Likelihood Decoding

Let V be a fixed set of n vertices. For an edge set E ⊆ \binom{V}{2}, by χE we denote the characteristic vector of E, i.e., the element of Z_2^{\binom{n}{2}} such that χE(e) = 1 if and only if e ∈ E. Thus any graph on the vertex set V can be represented by a vector of length \binom{n}{2}.
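This encoding, and the fact (formalized in Observation 2 below) that switching a vertex subset amounts to adding the characteristic vector of a cut over Z_2, can be sketched as follows (not from the paper; the function names are my own):

```python
from itertools import combinations

def char_vector(n, edges):
    """chi_E: one Z_2 coordinate per unordered pair of {0, ..., n-1},
    taken in a fixed (lexicographic) order."""
    return [1 if frozenset(p) in edges else 0 for p in combinations(range(n), 2)]

def add_mod2(a, b):
    """Coordinatewise sum over Z_2."""
    return [(x + y) % 2 for x, y in zip(a, b)]

n = 3
E = {frozenset((0, 1)), frozenset((1, 2))}                       # the path 0-1-2
V1 = {1}                                                         # subset to switch
S = {frozenset((u, v)) for u in V1 for v in set(range(n)) - V1}  # cut in K_n
switched = E ^ S                                                 # Seidel switch of V1
# chi of the switched graph equals chi_E + chi_S over Z_2 ...
assert char_vector(n, switched) == add_mod2(char_vector(n, E), char_vector(n, S))
# ... and the Hamming weight of the vector counts the edges of the switch.
assert sum(char_vector(n, switched)) == len(switched)
```

Minimizing the number of edges over all switches is thus exactly minimizing the Hamming weight of chi_E + chi_S over all cuts S, which is the decoding viewpoint developed next.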
The following observation expresses how switching works in terms of characteristic vectors.

Observation 2. Let Kn = (V, \binom{V}{2}) be the complete graph, let V1, V2 be a partition of V, and let S = {{x, y} : x ∈ V1, y ∈ V2} be the corresponding cut in Kn. Then for any G = (V, E),

χS(G,V1) = χS(G,V2) = χE + χS.

(Note that the summation is done over Z_2.) Therefore, if we seek a switch of G with the minimum number of edges, we seek a characteristic vector χE + χS with the minimum Hamming weight. Equivalently, we seek a cut S in Kn minimizing the Hamming distance between χS and χE.

It is a well-known fact that the cut space C∗(G) of a graph G is a vector space, and that the cycle space C(G) is also a vector space, orthogonal to C∗(G). The dimension of C∗(G) is |V| − 1, and C∗(G) can also be viewed as a linear [|E|, |V| − 1] code with a parity-check matrix H whose rows are |E| − |V| + 1 linearly independent characteristic vectors of cycles in G. Such a code is called a graph theoretic code; the concept of graph theoretic codes was introduced by Hakimi and Frank [8].

The problem of finding a codeword in a linear code that is closest to a given vector is an important problem in coding theory. It can be formulated as a decision problem in the following way.

Problem: Maximum Likelihood Decoding
Input: A binary p × q matrix H, a vector r ∈ Z_2^p, and an integer w > 0.
Question: Is there a vector e ∈ Z_2^q of Hamming weight at most w such that He = r?

This problem was proven to be NP-complete by Berlekamp et al. [2]. Note that Switch-Min-Edges is indeed a special case of it, which we formalize in the following lemma (proved in the Appendix).

Lemma 3. Switch-Min-Edges is a special case of Maximum Likelihood Decoding, where H is the parity-check matrix of the code of cuts in a complete graph.

Special cases of Maximum Likelihood Decoding have been studied.
It is known that the problem is NP-complete even if we allow unbounded time for preprocessing the code H. This was proven by Bruck and Naor [3], who showed that Maximum Likelihood Decoding is NP-complete for the cut code of a special fixed graph; no preprocessing can help, because this fixed code can be known in advance. Our proof in Subsection 4.2 provides an alternative proof of Bruck and Naor's result, using Kn as the fixed graph.

4.2 Proof of NP-Completeness

We use a reduction from the following well-known NP-complete problem [7].

Problem: Simple-Max-Cut
Input: A graph G, an integer j.
Question: Does there exist a partition V1, V2 of VG such that the cut between V1 and V2 in G contains at least j edges?

Theorem 3. Switch-Min-Edges is NP-complete.

Proof. From an instance (G, j) of Simple-Max-Cut we create an instance (G′, k) of Switch-Min-Edges in the following way. For each vertex of G we create a corresponding non-adjacent vertex pair in G′. An edge in G is represented by four edges completely interconnecting the two pairs, and a non-edge in G is represented by only two edges connecting the two pairs in a parallel way. More formally, we set

VG′ = {v′, v′′ : v ∈ VG},
EG′ = {{u′, v′}, {u′′, v′′} : u, v ∈ VG, u ≠ v} ∪ {{u′, v′′}, {u′′, v′} : {u, v} ∈ EG}.

The following lemma relates cuts in G to switches of G′.

Lemma 4. The following statements are equivalent:
(a) There is a cut in G having at least j edges;
(b) there exists a set A ⊆ VG′ such that S(G′, A) contains at most 2\binom{|VG|}{2} + 2|EG| − 4j edges.

Proof. For a cut C in G we define a corresponding vertex subset A = A(C) of G′. Suppose that C separates a vertex set V1 from the remaining vertices of G. We set A to be the set {v′, v′′ : v ∈ V1}. Note that such a set A satisfies the following condition.

Definition 3.
We say that a vertex subset of G′ is legal if, for every v ∈ VG, it contains an even number of vertices of the pair v′, v′′. Otherwise, we say that it is illegal.

Legality is a desired property, because there is an obvious correspondence between legal sets and cuts in G. Also, the number of edges in S(G′, A) is determined by the size of the cut C. The original graph G′ contains two edges for every vertex pair of G and two more edges for every edge of G, which is 2\binom{|VG|}{2} + 2|EG| edges altogether. Since A is legal, it can easily be checked that every non-edge {u, v} of G corresponds to two edges in both G′ and S(G′, A), regardless of the cut C. For every edge {u, v} in the cut C we have u′, u′′ ∈ A and v′, v′′ ∉ A (or vice versa), so switching A destroys all four corresponding edges and creates none. For an edge not present in the cut C, switching A does not modify the corresponding edges, so there are still four of them in S(G′, A). To sum up, S(G′, A) has

2\binom{|VG|}{2} + 2|EG| − 4|C|

edges. This proves that statement (a) implies (b). As for the converse implication, by reversing the construction of A(C) from a cut C, we get that it holds for legal sets A. It remains to deal with possible illegal switches. For that purpose, we introduce another definition.

Definition 4. We say that a vertex u ∈ VG is broken in A if A contains exactly one vertex of the pair u′, u′′. We say that a vertex pair {u, v} ⊆ VG is broken in A if at least one of its vertices u, v is broken. Otherwise, we say that it is legal in A.

Lemma 5. For every illegal set A there is a legal set A′ such that S(G′, A′) contains at most as many edges as S(G′, A).

Proof. Let A be an illegal set. As can be seen in Fig. 2, a broken non-edge never decreases the resulting number of edges, and thus is no more profitable than a legal non-edge.
Therefore, if all the broken pairs in A correspond to non-edges, then the set A minus the union of all broken pairs is legal, and it yields a switch with at most as many edges as A does. Assume now that there are m broken edges, where m > 0. As shown in Fig. 2, a broken edge can in certain cases decrease the number of edges in G′ by more than a legal edge not present in the cut would.

[Fig. 2. The possible legal and illegal switches of non-edges (on the left and in the middle) and edges (on the right) in G, and their influence on the number of edges in G′.]

We create a legal set A′ from A using the following greedy algorithm.

1. A′ := A.
2. Find a vertex v broken in A′. If there is none, STOP.
3. Consider the vertices u that are legal in A′ and such that {u, v} is an edge in G. If more such vertices lie in A′ than in VG \ A′, set A′ := A′ \ {v′, v′′}; otherwise set A′ := A′ ∪ {v′, v′′}.
4. Go back to step 2.

It remains to prove that the algorithm finds a legal set that is better than A. In each iteration of step 3, the algorithm legalizes one vertex. It clearly finishes after a finite number of steps and creates a legal set. It does not modify legal vertices or legal edges of A; therefore the number of edges in S(G′, A′) corresponding to pairs legal in A remains unchanged. Also, legalizing a non-edge does not increase the number of edges of S(G′, A′) in comparison to S(G′, A). As we already know, any legal set gives us a cut in G; consider the cut obtained from A′. In each iteration of step 3, at least half of the newly legalized edges become cut edges, with one endpoint in A′ and the other not in A′. Since each broken edge is legalized exactly once, at least m/2 legalized edges become cut edges (each decreasing the edge count by at least 3), and at most m/2 legalized edges end up outside the cut (each increasing the edge count by at most 1).
Therefore the number of edges in S(G′, A′) is lower by at least 3·(m/2) − m/2 = m, which is positive by assumption. ⊓⊔

To finish the proof of Lemma 4, it remains to prove that (b) implies (a). Let S(G′, A) be a switch with at most 2\binom{|VG|}{2} + 2|EG| − 4j edges. If A is illegal, Lemma 5 assures that there is a legal set A′ such that S(G′, A′) has at most as many edges. The legal set yields a partition V1 = {v : v′, v′′ ∈ A′} and V2 = VG \ V1 such that the cut between V1 and V2 contains at least j edges. ⊓⊔

According to Lemma 4, the graph G contains a cut of size at least j if and only if G′ is switching-equivalent to a graph with at most k = 2\binom{|VG|}{2} + 2|EG| − 4j edges. The size of (G′, k) is clearly polynomial in the size of (G, j). That concludes the proof of NP-hardness of Switch-Min-Edges. To prove that Switch-Min-Edges is in NP, it suffices to note that a positive answer can be certified by a vertex set A that gives a switch with the desired number of edges. ⊓⊔

Acknowledgments

I am grateful to Jan Kratochvíl for numerous useful tips and valuable discussions, and to Jiří Sgall for advice concerning some relevant references.

References

1. S. Arora, C. Lund, R. Motwani, M. Sudan, M. Szegedy: Proof verification and the hardness of approximation problems, Journal of the ACM 45(3) (1998), pp. 501–555. An extended abstract appeared in FOCS 1992.
2. E. R. Berlekamp, R. J. McEliece, H. C. A. van Tilborg: On the inherent intractability of certain coding problems, IEEE Trans. Inform. Theory, vol. IT-24 (1978), pp. 384–386.
3. J. Bruck, M. Naor: The hardness of decoding linear codes with preprocessing, IEEE Trans. Inform. Theory, vol. 36 (1990), pp. 381–384.
4. C. J. Colbourn, D. G. Corneil: On deciding switching equivalence of graphs, Discrete Appl. Math., vol. 2 (1980), pp. 181–184.
5. A. Ehrenfeucht, J. Hage, T. Harju, G. Rozenberg: Complexity issues in switching classes, in: Theory and Application of Graph Transformations, LNCS 1764, Springer-Verlag (2000), pp. 59–70.
6. R. B. Hayward: Recognizing P3-structure: a switching approach, J. Combin. Theory Ser. B 66 (1996), pp. 247–262.
7. M. R. Garey, D. S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.
8. S. L. Hakimi, H. Frank: Cut-set matrices and linear codes, IEEE Trans. Inform. Theory, vol. IT-11 (1965), pp. 457–458.
9. J. Kratochvíl, J. Nešetřil, O. Zýka: On the computational complexity of Seidel's switching, Proc. 4th Czech. Symp., Prachatice 1990, Ann. Discrete Math. 51 (1992), pp. 161–166.
10. J. Kratochvíl: Complexity of hypergraph coloring and Seidel's switching, in: Graph Theoretic Concepts in Computer Science (H. Bodlaender, ed.), Proceedings WG 2003, Elspeet, June 2003, LNCS 2880, Springer-Verlag (2003), pp. 297–308.
11. J. J. Seidel: Graphs and two-graphs, in: Proc. 5th Southeastern Conf. on Combinatorics, Graph Theory, and Computing, Winnipeg, Canada (1974).
12. J. J. Seidel: A survey of two-graphs, in: Teorie combinatorie, Atti Conv. Lincei, vol. 17, Accademia Nazionale dei Lincei, Rome (1973), pp. 481–511.
13. J. J. Seidel, D. E. Taylor: Two-graphs, a second survey, in: Algebraic Methods in Graph Theory, Vol. II, Conf. Szeged 1978, Colloq. Math. Soc. János Bolyai 25 (1981), pp. 689–711.
14. O. Suchý: Some parametrized problems related to Seidel's switching, to appear in Proceedings of IWOCA 2007.
15. V. V. Vazirani: Approximation Algorithms, Springer-Verlag, 2001.

Appendix

Proof (of Theorem 1, continued). To prove the NP-hardness of Switch-cn-Clique for irrational numbers c, we use a theorem of Arora et al. proved in [1] (and restated in this form in [15] as Lemma 29.10).

Theorem 4.
(Arora, Lund, Motwani, Sudan, Szegedy) There exists a polynomial-time transformation T from 3-CNF to 3-CNF and a constant ε > 0 such that
– if ψ is satisfiable, then T(ψ) is satisfiable;
– if ψ is not satisfiable, then at most a (1 − ε) fraction of the clauses of T(ψ) are simultaneously satisfiable.

Our aim is to reduce from an instance ψ of 3-SAT. We will use the graph Gp,q, as in the previous part of the proof, this time for the transformed formula T(ψ) and for numbers p and q such that p/q is sufficiently close to the irrational number c. Then we examine the relationship between cn-cliques and (p/q)n-cliques in the resulting graph. To show that suitable numbers p and q exist, we will make use of Lemma 6, which is a variant of Dirichlet's Theorem, and of Lemma 7.

Lemma 6. For any real number α ∈ [0, 1], any ε > 0 and any r ∈ R there exists n ∈ N such that n > r and {nα} < ε (where {nα} stands for the fractional part of nα).

Proof. Without loss of generality we can assume that ε < α and α ∈ (0, 1). We prove by induction that for each k ∈ N0 there exists nk ∈ N such that {nk α} ≤ α/2^k, with nk+1 > nk for all k. Then we take n = nk for some k > max{r, log2(α/ε)}.

We set n0 = 1, because {1 · α} = α ≤ α/2^0. Now assume that we already have nk for some k ≥ 0 and want to find nk+1. Let β = {nk α}; we want to find an integer m such that {mβ} ≤ β/2 and m > 1. If β = 0, then the inequality {mβ} ≤ β/2 clearly holds for any integer m, so we can set m = 2. Otherwise we consider the number ⌊1/β⌋β. It is clear that ⌊1/β⌋β ≤ 1; in the case of equality we have {⌊1/β⌋β} = 0, while ⌊1/β⌋ is nonzero. Hence m can be either ⌊1/β⌋ or any of its integral multiples larger than 1. The remaining case is that β > 0 and ⌊1/β⌋β < 1. Then 1 < ⌈1/β⌉β < β + 1, and after subtracting 1 we get that

{⌈1/β⌉β} < β.    (1)

We want m to be an integer such that {mβ} ≤ β/2. Note that ⌈1/β⌉ > 1, since ⌊1/β⌋ > 0 for any β ∈ (0, 1).
So suppose that m cannot be ⌈1/β⌉ because {⌈1/β⌉β} > β/2. Then we define

δ = β + 1 − ⌈1/β⌉β.

The assumption {⌈1/β⌉β} > β/2 together with (1) implies that δ ∈ (0, β/2). As before we obtain the inequalities ⌊β/(2δ)⌋δ ≤ β/2 and β/2 ≤ ⌈β/(2δ)⌉δ. Moreover, it is surely true that ⌈β/(2δ)⌉ ≤ 1 + ⌊β/(2δ)⌋; hence

β/2 ≤ ⌈β/(2δ)⌉δ ≤ β/2 + δ ≤ β.    (2)

Now we set

m = ⌈β/(2δ)⌉ (⌈1/β⌉ − 1) + 1.

It is clear that such an m is larger than one, and by substituting for δ according to its definition, it can easily be verified that

mβ = ⌈β/(2δ)⌉ − ⌈β/(2δ)⌉δ + β.    (3)

By plugging the inequalities of (2) into (3), we obtain

⌈β/(2δ)⌉ ≤ mβ ≤ ⌈β/(2δ)⌉ + β/2,

which immediately gives us that {mβ} ≤ β/2, as required. It now remains to set nk+1 = m·nk and verify that {nk+1 α} ≤ α/2^(k+1). Indeed, we have

{nk+1 α} = {m nk α} = {mβ} ≤ β/2 = {nk α}/2 ≤ (α/2^k)/2 = α/2^(k+1),

where the last inequality holds by the induction hypothesis. In all cases considered we chose m larger than one, hence nk+1 > nk, and we are done. ⊓⊔

Lemma 7. For each irrational c ∈ (0, 1) and each ε > 0, there exist p, q ∈ N such that p/q ∈ (0, 1) and

c ∈ ((1 − ε/(4p)) · (p/q), p/q).

Proof. We shall find an integer p such that the interval (p/c − ε/(4c), p/c) contains another integer q. We want p to satisfy the condition

{p/c} < ε/(4c),    (4)

and additionally we request that

p > ε/(4(1 − c)).    (5)

It is true that {p/c} = {p·{1/c}}, the number {1/c} lies in the interval (0, 1), and surely ε/(4c) > 0; hence Lemma 6 for α = {1/c}, ε′ = ε/(4c) and r = ε/(4(1 − c)) ensures the existence of such a p. Then we set q = ⌊p/c⌋ and verify that it is indeed an integer in the interval (p/c − ε/(4c), p/c). The number p/c is irrational, so we have q < p/c. The fact that ⌊p/c⌋ = p/c − {p·{1/c}}, together with (4), gives us the other inequality q > p/c − ε/(4c).
Moreover, from (5) we obtain

p/c − ε/(4c) > p,

so any integer q in the interval (p/c − ε/(4c), p/c) is larger than p, and thus p/q ∈ (0, 1). Also, by rewriting the inequalities q < p/c and q > p/c − ε/(4c), we get the desired inequality

(1 − ε/(4p)) · (p/q) < c < p/q. ⊓⊔

Let c be an irrational number in (0, 1), let ε be the constant from Theorem 4, and let p, q be the integers given by Lemma 7 for ε and c. We take an instance ψ of 3-SAT and construct the graph G = Gp,q(T(ψ)) in the same way as in the previous part of the proof. Let us again denote the number of clauses of T(ψ) by k. The number of occurrences of literals is then l = 3k, and n stands for the number of vertices of G. If ψ is satisfiable, then again by Corollary 1 the graph G[L] contains a k-clique, and by Lemma 2 the graph G contains a (pl + p)-clique, which is a (p/q)n-clique. We shall show that if ψ is not satisfiable, then for no set A ⊆ VG does the graph S(G, A) contain a clique of size larger than (1 − ε/(4p))(p/q)n. We limit ourselves to instances ψ such that (1 − ε)k > 1 and pl + p − εk ≥ 2, which we can do without loss of generality. Let us first show the following lemma.

Lemma 8. Let ψ be a formula such that (1 − ε)k > 1 and pl + p − εk ≥ 2. If ψ is not satisfiable, then for any set A ⊆ VG the graph S(G, A) does not contain a clique of size larger than pl + p − εk.

Proof. Suppose that for some A ⊆ VG the graph S(G, A) contains a clique on a vertex set C of size larger than pl + p − εk. Then (similarly as in the proof of Lemma 2) the set C does not contain more than two vertices of Z, because they are pairwise non-adjacent in G and in S(G, A) they induce a bipartite graph. Since |C| is more than two, C contains some vertices of L or K. But all vertices of Z are non-adjacent in G and have the same neighborhood in G[L ∪ K]; surely all vertices in Z ∩ C have the same neighborhood in S(G, A)[C] (otherwise C would not be a clique).
But then either (Z ∩ C) ⊆ A or (Z ∩ C) ∩ A = ∅, so switching A does not affect edges inside S(G, A)[Z ∩ C], and any two vertices in S(G, A)[Z ∩ C] are non-adjacent. Therefore C contains at most one vertex of Z.

The set C contains at least one vertex of L, because 1 + |K| = 1 + (pl + p − k) < (1 − ε)k + pl + p − k = pl + p − εk. But then C cannot contain vertices of both K and Z, because in G they have the same neighborhood in L and there is no edge between K and Z. By Lemma 1, the clique on C ∩ L corresponds to |C ∩ L| simultaneously satisfiable clauses. Hence by Theorem 4, the maximum clique size in G[L] is at most (1 − ε)k, which is not enough for C: since (1 − ε)k < (1 − ε)k + pl + p − k = pl + p − εk, the set C cannot consist only of vertices of L. Therefore C consists of one of the following:
– more than pl + p − εk − 1 (which is larger than (1 − ε)k) vertices of L, and one vertex of Z, or
– more than (1 − ε)k vertices of L, and at most pl + p − k vertices of K.
In both cases, C contains more than (1 − ε)k vertices of L, and a vertex v from K or Z. Since C induces a clique in S(G, A), the vertex v is adjacent to all other vertices of C. But in G, by definition, v is adjacent to all vertices of L, too. So switching A cannot have changed any edge connecting v to the other vertices, which means that either all these vertices are in A or none of them is. But then they induce a clique of size larger than (1 − ε)k in G[L] as well. As we have already shown, the maximum clique size in G[L] is at most (1 − ε)k, a contradiction. ⊓⊔

By Lemma 8, if ψ is not satisfiable, then the maximum clique size in S(G, A) for any A is at most pl + p − εk. But

εk/n = εk/(q(l + 1)) = εk/(q(3k + 1)) ≥ ε/(4q) = (ε/(4p)) · (p/q),

so the maximum clique size divided by n is

(pl + p − εk)/n = p(l + 1)/(q(l + 1)) − εk/n ≤ p/q − (ε/(4p)) · (p/q) = (1 − ε/(4p)) · (p/q).

We have chosen the numbers p, q so that

c ∈ ((1 − ε/(4p)) · (p/q), p/q),

hence the maximum clique ratio is at most the lower bound of the interval containing c. To sum it all up, we have shown that
– if ψ is satisfiable, then there exists an A ⊆ VG such that S(G, A) contains a clique of size (p/q)n, which is at least cn;
– if ψ is not satisfiable, then for no set A ⊆ VG does the graph S(G, A) contain a clique of size more than (1 − ε/(4p))(p/q)n, and in particular none of size at least cn.

Hence ψ is satisfiable if and only if G can be switched to contain a clique of size at least cn. The graph Gp,q(T(ψ)), with q(l + 1) = O(l) vertices and O(l^2) edges, can be constructed in polynomial time. That concludes the polynomial-time reduction of 3-SAT to Switch-cn-Clique for an irrational constant c, and also the proof that the problem is NP-complete. ⊓⊔

Proof (of Lemma 3). Given a graph G with edge set E, we set H to be the parity-check matrix of the cut space C∗(Kn) (its rows are characteristic vectors of cycles in Kn), r = HχE, and w = k. Then a vector e is a solution of Maximum Likelihood Decoding if and only if H(e + χE) = 0, which means that the vector e + χE is an element of the cut space C∗(Kn) and its Hamming distance from χE is at most k. Therefore, by Observation 2 there exists a switch S(G, A) whose characteristic vector is e. Since the Hamming weight of e is at most k, the switch S(G, A) has at most k edges, and we are done. ⊓⊔

On Superconnectivity of (4, g)-Cages

Hongliang Lu (1), Yunjian Wu (1), Yuqing Lin (2), Qinglin Yu (1,3)
(1) Center for Combinatorics, LPMC, Nankai University, Tianjin, China
(2) School of Electrical Engineering and Computer Science, The University of Newcastle, Newcastle, Australia
(3) Department of Mathematics and Statistics, Thompson Rivers University, Kamloops, BC, Canada

Abstract. A (k, g)-cage is a graph that has the least number of vertices among all k-regular graphs with girth g.
It was conjectured by Fu, Huang and Rodger [3] that all (k, g)-cages are k-connected for every k ≥ 3. A k-connected graph G is called superconnected if every k-cutset S is trivial. Moreover, if G − S has precisely two components, then G is called tightly superconnected. In [9, 13], the authors showed that every (4, g)-cage is 4-connected. In this extended abstract, we prove that every (4, g)-cage is tightly superconnected when g is odd.

Key words: cage, superconnected, tightly superconnected

1 Introduction

Throughout the paper, only undirected simple graphs are considered. Unless otherwise defined, we follow [1] for terminology and definitions. Let G = (V, E) be a graph with vertex set V(G) and edge set E(G). For u, v ∈ V(G), dG(u, v) denotes the length of a shortest path between u and v in G. For vertex sets T1, T2 ⊆ V(G), E(T1, T2) is the set of the edges between T1 and T2, and d(T1, T2) = dG(T1, T2) = min{d(t1, t2) | t1 ∈ T1, t2 ∈ T2} denotes the distance between T1 and T2. For S ⊂ V(G), G − S is the subgraph of G obtained by deleting the vertices in S and all the edges incident with them. The set of vertices at distance r from S in G is denoted by Nr(S) = {v ∈ V(G) | d(v, S) = r}, where r is an integer. We write N(S) instead of N1(S). The length of a shortest cycle in G is called the girth of G, denoted by g(G). The diameter of G is the maximum distance between any two vertices in G. A k-regular graph with girth g is called a (k, g)-graph. A (k, g)-cage is a (k, g)-graph with the least number of vertices for given k and g. We use f(k, g) to denote the number of vertices in (k, g)-cages. A cutset X of G is called a non-trivial cutset if X does not contain the neighborhood N(u) of any vertex u ∉ X. A k-connected (or k-vertex-connected) graph G is called superconnected if for every vertex cutset S ⊆ V(G) with |S| = k, S is a trivial cutset.
Moreover, if G − S has precisely two components, then G is called tightly superconnected. Provided there exists a non-trivial cutset, the superconnectivity of G is denoted by κ1 = κ1(G) = min{|X| : X is a non-trivial cutset}. The edge-superconnectivity λ1 is defined similarly.

Cages were introduced by Tutte in 1947, and have been extensively studied. Most of the work carried out so far has focused on the existence problem, whereas very little is known about the structural properties of (k, g)-cages. For more information see the survey [12]. Recently, several researchers have studied the connectivity of cages. Fu, Huang and Rodger [3] proved that all cages are 2-connected, and then subsequently showed that all cubic cages are 3-connected. They then conjectured that (k, g)-cages are k-connected. Daven and Rodger [2], and independently Jiang and Mubayi [4], proved that all (k, g)-cages are 3-connected for k ≥ 3. In [9, 13], some authors also showed that every (4, g)-cage is 4-connected. Tang et al. [10] conjectured that every (4, g)-cage with odd girth is tightly superconnected. In this paper, we show that this conjecture is true. For the edge-connectivity of (k, g)-cages, Wang, Xu and Wang [11] showed that (k, g)-cages are k-edge-connected when g is odd, and subsequently, Lin, Miller and Rodger [7] proved that (k, g)-cages are k-edge-connected when g is even. Recently, Lin et al. [5, 8] proved that (k, g)-cages are edge-superconnected.

2 Main Results

First, we list several known results which will be used in proving our main theorem.

Theorem 1 (see [3]) Let G be a (k, g)-cage with diameter D, where k ≥ 2 and g ≥ 3. Then f(k, g) < f(k, g + 1) and D ≤ g.

Theorem 2 ([8]) Every (k, g)-cage with odd girth g ≥ 5 is edge-superconnected.

For edge-connectivity, Tang et al. [10] conjectured the following:

Conjecture 1 ([10]) Every (k, g)-cage of odd girth g ≥ 5 has λ1 = 2k − 2.
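The basic definitions above can be exercised on the smallest interesting case by machine. The sketch below (a Python illustration, not part of the paper) builds the Petersen graph, the unique (3, 5)-cage with f(3, 5) = 10, and verifies 3-regularity and girth 5 by breadth-first search:

```python
from collections import deque

def girth(adj):
    """Shortest cycle length: BFS from every vertex, detecting cross edges."""
    best = float("inf")
    for s in adj:
        dist, parent, q = {s: 0}, {s: None}, deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v], parent[v] = dist[u] + 1, u
                    q.append(v)
                elif parent[u] != v:
                    # closed walk through s; never shorter than the girth,
                    # and exact for some root on a shortest cycle
                    best = min(best, dist[u] + dist[v] + 1)
    return best

# Petersen graph: outer 5-cycle, spokes, inner pentagram
adj = {v: set() for v in range(10)}
def add(u, v):
    adj[u].add(v); adj[v].add(u)
for i in range(5):
    add(i, (i + 1) % 5)           # outer cycle
    add(i, i + 5)                 # spoke
    add(5 + i, 5 + (i + 2) % 5)   # inner pentagram

assert all(len(adj[v]) == 3 for v in adj)   # 3-regular
assert girth(adj) == 5                      # girth 5, on 10 vertices
```

So a (3, 5)-graph on 10 vertices exists; minimality (that f(3, 5) = 10) is the classical fact quoted above, not checked here.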
We can show that the conjecture is true for k = 4 and present it as a lemma below.

Lemma 1 Every (4, g)-cage of girth g ≥ 5 has λ1 = 6.

The following lemma has been proven in [10].

Lemma 2 ([10]) Let G be a (4, g)-cage with odd girth g ≥ 5. Assume that there exists a non-trivial cutset X, and C is a component of G − X. Then there exists a vertex u ∈ V(C) such that d(u, X) ≥ (g − 1)/2.

We now provide a stronger version of this lemma for (4, g)-cages.

Lemma 3 Let G be a (4, g)-cage with odd girth g ≥ 5. Assume that there exists a non-trivial cutset X, and C is a component of G − X. Then max{d(u, X) : u ∈ V(C)} = (g − 1)/2.

Proof. By contradiction. Assume G − X contains exactly two components C and C′. Suppose the lemma is not true; then there is a vertex u ∈ C such that dC(u, X) = (g + 1)/2. We know that the diameter of G is at most g, and also, by Lemma 2, there exists a vertex v ∈ V(C′) such that dC′(v, X) ≥ (g − 1)/2. Denote NC(u) = {u1, u2, u3, u4}, NC′(v) = {v1, v2, v3, v4} and X = {x1, x2, x3, x4}; then d(ui, vj) ≥ g − 2, where i, j = 1, 2, 3, 4.

Claim 1. There are at least two pairs of vertices (ui, vj) such that d(ui, vj) ≥ g − 1.

Otherwise there exists a vertex s ∈ N(u) ∪ N(v), and d(ui, vj) = g − 2 if ui, vj ≠ s. Then each vertex in N(u) − s is at distance (g − 1)/2 to each vertex in X, and there are at least twelve shortest paths of length (g − 1)/2 from N(u) − s to X, which cannot share a common vertex of N(X) − X, otherwise a cycle of length shorter than g appears in G. So there are at least twelve edges from X to C, and then at most four edges are left from X to C′, a contradiction to Lemma 1.

Without loss of generality, assume d(u1, v1) = d(u2, v2) = g − 1. Then we can reconstruct a (4, g′)-graph as follows: in G′ = G − u − v, add a vertex y and six edges u1v1, u2v2, yu3, yu4, yv3 and yv4.
So |V(G′)| < |V(G)|, and it is clear that g′ ≥ g, a contradiction to Theorem 1. □

Suppose U and W are two vertex sets with |U| = |W|. For a one-to-one map f : U → W, we define E(f) = {uw | f(u) = w, u ∈ U, w ∈ W}. The following lemma is a key technical tool for the construction of a new (4, g)-graph of smaller order in our proof.

Lemma 4 Let H be a bipartite graph with bipartition (U, W), where |U| = |W| = 4, such that |E(H)| ≤ 4 and ∆(H) ≤ 3. Let H∗ be a copy of H with bipartition (U∗, W∗). Let G = H ∪ H∗. Then there exist two one-to-one maps f : W → U∗ and f∗ : W∗ → U such that no new 4-cycle is created in the graph G ∪ E(f) ∪ E(f∗).

Proof. It suffices to show that the result holds for |E(H)| = 4. Suppose that we can partition H into two vertex-disjoint subgraphs H1 = (U1, W1) and H2 = (U2, W2) such that there is no edge between H1 and H2, where U1 = {a1, b1}, U2 = {c1, d1}, W1 = {a2, b2} and W2 = {c2, d2}. Let the two maps be defined by E(f) = {a2c1∗, b2d1∗, c2a1∗, d2b1∗} and E(f∗) = {a2∗a1, b2∗b1, c2∗c1, d2∗d1}. No new 4-cycles will be created in the graph G ∪ E(f) ∪ E(f∗). If we cannot partition H into two vertex-disjoint subgraphs as above, then H must be one of the graphs shown in Figure 1, since |E(H)| = 4 and ∆(H) ≤ 3. We indicate the two maps by dotted lines in Figure 1 for each case. □

Fig. 1. Graph G ∪ E(f) ∪ E(f∗)

To prove that all (4, g)-cages with odd girth g ≥ 11 are tightly superconnected, we argue by contradiction, assuming that there exists a non-trivial cutset S of order 4 in G. Let G1 be the smaller component of G − S and G2 = G − S − G1. We know that there exists a vertex u ∈ G1 such that d(u, S) = (g − 1)/2 by Lemma 3, and |V(G1)| ≤ ⌊|V(G)|/2⌋ − 2. We proceed by constructing a (4, g′)-graph of order less than |V(G)|, where g′ ≥ g, which then yields a contradiction to Theorem 1. Let S = {s1, s2, s3, s4}.
Based on the degree distribution of the vertices of S in the component G1, we need to consider two major cases, which are stated as lemmas below. Because of the page limit, in this extended abstract we provide a detailed argument for one case in Lemma 5 and a sketch of the construction for the second case; hopefully, the reader can get a taste of the main idea in both lemmas. For Lemma 6, we omit the entire proof.

Lemma 5 If dG1(si) = 2 and dG2(si) = 2, where si ∈ S for i = 1, 2, 3, 4, then G is not a (4, g)-cage.

Sketch of Proof. Let N(u) = {u1, u2, u3, u4} and Wi = N(ui) − u = {ui1, ui2, ui3} for i = 1, 2, 3, 4. We consider two cases according to the neighbors of u, but first we show the following claim.

Claim 1. For each Wi, i = 1, 2, 3, 4, if there is at most one vertex xj ∈ Wi such that d(xj, S) = (g − 5)/2, then G is not a (4, g)-cage.

No two vertices of Wi can be at distance (g − 3)/2 to the same vertex in S; otherwise a cycle of length g − 1 appears. Similarly, it is impossible to have d(ui, s) = d(uj, s) ≤ (g − 3)/2 for two different vertices ui, uj and a vertex s ∈ S. It is clear that there is a vertex in Wi at distance (g − 5)/2 or (g − 3)/2 to S; otherwise the vertex ui would be at distance (g + 1)/2 to S, which is impossible because of Lemma 3. Without loss of generality, assume ui1 ∈ Wi is the vertex that satisfies d(ui1, S) = (g − 5)/2 or (g − 3)/2. In the rest of this paper, connecting two vertices means joining the two vertices by an edge, and connecting a vertex x to a set R means joining x to every vertex in R.

Let W = {W1′, W2′, W3′, W4′}, where Wi′ = Wi − ui1. We shall construct a bipartite graph H = (W, S), where |W| = |S| = 4 and Wi′sj ∈ E(H) if and only if dG1(sj, Wi − ui1) ≤ (g − 3)/2 in G. It is clear that there are at most eight paths in total of length at most (g − 3)/2 from W1 ∪ W2 ∪ W3 ∪ W4 to S; otherwise a cycle of length shorter than g appears.
This implies that there are at most four paths of length at most (g − 3)/2 from W to S. It is clear that at most two of the paths can share the same vertex in S, since dG1(si) = 2. Also it is easy to see that these four paths cannot start from the same Wi′ = Wi − ui1, because, by the Pigeonhole Principle, it would imply that ui1 and another vertex from Wi have distance at most (g − 3)/2 to the same vertex in S, and this is impossible. So we can conclude that ∆(H) ≤ 3 and |E(H)| ≤ 4. Let H∗ be a copy of H. By Lemma 4, there are two one-to-one maps f : S → W∗ and f∗ : S∗ → W such that no new 4-cycles are created in H ∪ H∗ ∪ E(f) ∪ E(f∗). We consider a subgraph N = G[(V(G1) − u − N(u)) ∪ S]. Let N∗ be a copy of N. For every x ∈ V(N), let x∗ denote its image in N∗. Now we can construct a 4-regular graph G′ (see Figure 2) with girth at least g by using N and N∗:

Fig. 2. Illustration of the construction for Claim 1, where f∗(si∗) = Wi′, i = 1, 2, 3, 4; f(s1) = W2′∗, f(s2) = W1′∗, f(s3) = W4′∗, f(s4) = W3′∗.

(a) connect ui1 and ui1∗ for i = 1, 2, 3, 4;
(b) si is connected with uj2∗ and uj3∗ if and only if f(si) = Wj′∗, for i, j = 1, 2, 3, 4;
(c) si∗ is connected with uj2 and uj3 if and only if f∗(si∗) = Wj′, for i, j = 1, 2, 3, 4.

Consider the additional edges: any new cycle, say C, which was introduced in the construction has to use at least two new edges. If C goes through two edges in (a), then the cycle C has length at least 2(g − 4) + 2 > g, since g ≥ 11. If C contains two edges in (b) and (c), then the cycle C has length at least g: since H ∪ H∗ ∪ E(f) ∪ E(f∗) creates no new 4-cycles, the length of C is at least (g − 1)/2 + (g − 3)/2 + 2 = g.
If C goes through one edge in (a) and one edge in (b) or (c), then the cycle C has length at least (g − 4) + 2 + (g − 3)/2 > g, since g ≥ 11. It is obvious that if the cycle C goes through more than two new edges, its length is at least g. Hence G′ is 4-regular and has girth g, but |V(G′)| = |N∗| + |N| = 2|V(G1)| − 2 < |V(G)|, a contradiction to the statement that G is a cage.

Case 1. All the neighbors of u are at distance (g − 3)/2 to S. This is a special case of Claim 1; it is clear that in this case G is not a (4, g)-cage.

Case 2. There are at most three neighbors of u at distance (g − 3)/2 to S. The basic idea for this case is somewhat similar to that of Case 1; we sketch it here. In this case, there exists a vertex v ∈ N(u) such that d(v, S) = d(u, S) = (g − 1)/2. Let N(u) = {u1, u2, u3, v}, N(v) = {v1, v2, v3, u}, Wi = N(ui) − u = {ui1, ui2, ui3}, and Ti = N(vi) − v = {vi1, vi2, vi3}, i = 1, 2, 3. If there is at most one vertex x ∈ Wi such that d(x, S) = (g − 5)/2 for i = 1, 2, 3, then from Claim 1 we know that G is not a (4, g)-cage. Assume that there exist two sets Wi and Tj, say W3 and T3, such that |N(g−5)/2(S) ∩ T3| ≥ 2 and |N(g−5)/2(S) ∩ W3| ≥ 2. By counting the number of paths of length at most (g − 3)/2 from N2(u) to S, we can show that there are exactly two paths of length (g − 5)/2 and no paths of length (g − 3)/2 from W3 to S. Moreover, the ends of the two paths of length (g − 5)/2 are distinct in the set N(u3). Thus there are only two paths of length (g − 7)/2 and no paths of length (g − 5)/2 from N2(u3) to S. Now, regarding u3 as u, we can construct a graph G′ in a fashion similar to that in Claim 1 and obtain a contradiction. □

Lemma 6 If dG1(s1) = dG1(s2) = 3, dG2(s1) = dG2(s2) = 1 and dG1(s3) = dG1(s4) = dG2(s3) = dG2(s4) = 2, where si ∈ S, then G is not a (4, g)-cage.
With the preparation of the two lemmas above, we can now prove our main result.

Theorem 3 Every (4, g)-cage with odd girth g ≥ 11 is superconnected.

Proof. Suppose G is not superconnected. Then we choose a non-trivial cutset S of G such that S minimizes the order of the smaller component of G − S among all non-trivial cutsets. Since 4|V(G1)| − |E(S, G1)| = Σv∈V(G1) dG1(v) ≡ 0 (mod 2), we have |E(S, G1)| ≡ 0 (mod 2). Similarly, |E(S, G2)| ≡ 0 (mod 2). Since every (4, g)-cage is edge-superconnected, we need only discuss the three cases for the cutset S shown in Figure 3. Cases (a) and (b) are impossible by Lemmas 5 and 6. For (c), we can simply delete the edge s1s2 from G[S] and obtain a contradiction as in Lemma 5. □

Corollary 1 Every (4, g)-cage with odd girth g ≥ 11 is tightly superconnected.

Acknowledgments

The authors are indebted to the anonymous referees for their constructive suggestions.

Fig. 3. The three extremal cutsets of a (4, g)-cage G

References

1. B. Bollobás, Extremal Graph Theory, Academic Press, London, 1978.
2. M. D. Daven and C. A. Rodger, (k, g)-cages are 3-connected, Discrete Math., 199(1999), 207–215.
3. H. L. Fu, K. C. Huang and C. A. Rodger, Connectivity of cages, J. Graph Theory, 24(1997), 187–191.
4. T. Jiang and D. Mubayi, Connectivity and separating sets of cages, J. Graph Theory, 29(1998), 35–44.
5. Y. Lin, M. Miller, C. Balbuena and X. Marcote, All (k, g)-cages are edge-superconnected, Networks, 47(2006), 102–110.
6. Y. Lin, M. Miller and C. Balbuena, Improved lower bound for the vertex connectivity of (δ, g)-cages, Discrete Math., 299(2005), 162–171.
7. Y. Lin, M. Miller and C. A. Rodger, All (k, g)-cages are k-edge-connected, J. Graph Theory, 48(2005), 219–227.
8. X. Marcote and C. Balbuena, Edge-superconnectivity of cages, Networks, 43(2004), 54–59.
9. X. Marcote, C. Balbuena, I. Pelayo and J.
Fábrega, (δ, g)-cages with g ≥ 10 are 4-connected, Discrete Math., 301(2005), 124–136.
10. J. Tang, C. Balbuena, Y. Lin and M. Miller, An open problem: (4, g)-cages with odd g ≥ 5 are tightly superconnected, Proceedings of the Thirteenth Australasian Symposium on Theory of Computing, 65(2007), 141–144.
11. P. Wang, B. Xu and J. F. Wang, A note on the edge-connectivity of cages, Electron. J. Combin., 10(2003), N4.
12. P. K. Wong, Cages – A survey, J. Graph Theory, 6(1982), 1–22.
13. B. Xu, P. Wang and J. F. Wang, On the connectivity of (4, g)-cage, Ars Combin., 64(2002), 181–192.

On the nonexistence of odd degree graphs of diameter 2 and defect 2⋆

Mirka Miller1, Minh Hoang Nguyen2 and Guillermo Pineda-Villavicencio3
1 School of Information Technology and Mathematical Sciences, University of Ballarat, P.O. Box 663, Vic 3353, Australia, and Department of Mathematics, University of West Bohemia, Pilsen, Czech Republic, m.miller@ballarat.edu.au
2 GNOC Sydney – Ericsson Australia, 112-118 Talavera Road, North Ryde, NSW 2113, Australia, and School of Information Technology and Mathematical Sciences, University of Ballarat, P.O. Box 663, Vic 3353, Australia, minh.n.nguyen@ericsson.com
3 School of Information Technology and Mathematical Sciences, University of Ballarat, P.O. Box 663, Vic 3353, Australia, and Department of Computer Science, University of Oriente, Santiago de Cuba, Cuba, gpinedavillavicencio@students.ballarat.edu.au

Abstract. In 1960, Hoffman and Singleton investigated the existence of Moore graphs for diameter 2 and found that Moore graphs only exist for maximum degree d = 2, 3, 7 and possibly 57. In 1980, Erdős et al. asked the following more general question: given nonnegative numbers d and ∆, is there a (d, 2, ∆)-graph, that is, a graph of diameter 2, maximum degree d and order d² + 1 − ∆? Erdős et al. solved the case ∆ = 1: C4 is the only possible graph. In this paper, we consider the next case (∆ = 2).
We prove the nonexistence of such graphs for infinitely many values of odd d, and we conjecture that they do not exist for any odd d greater than 5.

Keywords: Moore graphs; diameter 2; degree/diameter problem

1 Introduction

The degree/diameter problem is to determine, for each d and k, the largest order nd,k of a graph of maximum degree d and diameter at most k. It is easy to show that nd,k ≤ Md,k, where Md,k is the Moore bound given by

Md,k = 1 + d + d(d − 1) + · · · + d(d − 1)^(k−1).

⋆ Research partially supported by the Australian Research Council grant ARC DP0450294.

In this paper, we concentrate on the case when the diameter is equal to 2. Since a graph of diameter 2 and maximum degree d has at most d² + 1 vertices, it was asked in [3]: given nonnegative numbers d and ∆ (the defect), is there a graph of diameter 2 and maximum degree d with d² + 1 − ∆ vertices? In 1960, it was proved by Hoffman and Singleton [5] that if ∆ = 0 then there are unique graphs corresponding to d = 2, 3, 7 and possibly d = 57. The case ∆ = 1 was solved by Erdős et al. [3]. In this paper, for the case ∆ = 2, we show the nonexistence of such graphs for infinitely many values of odd d.

We refer to a graph of maximum degree d, diameter k ≥ 2 and order Md,k − ∆ (∆ ≥ 1) as a (d, k, ∆)-graph. Let G be a (d, k, ∆)-graph.

Definition 1. Let u be a vertex in G. A vertex v in G is called a repeat of u with multiplicity mv(u) (1 ≤ mv(u) ≤ ∆) if there are exactly mv(u) + 1 different paths of lengths at most k from u to v.

It is immediate that

Observation 1. Vertex u is a repeat of v with multiplicity mu(v) if and only if v is a repeat of u with the same multiplicity.

A repeat with multiplicity 1 will be called a single repeat, a repeat with multiplicity 2 a double repeat, and a repeat with multiplicity ∆ a maximal repeat. We denote by Rs(u) the set of all repeats of a vertex u in G.
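As a small aside (an illustration, not part of the paper), the Moore bound defined in the introduction is easy to tabulate, and the diameter-2 values used throughout this section follow directly:

```python
def moore_bound(d, k):
    """M_{d,k} = 1 + d + d(d-1) + ... + d(d-1)^(k-1)."""
    return 1 + sum(d * (d - 1) ** i for i in range(k))

assert moore_bound(3, 2) == 10   # attained by the Petersen graph
assert moore_bound(7, 2) == 50   # attained by the Hoffman-Singleton graph
# for diameter 2 the bound is d^2 + 1, so a (d, 2, Delta)-graph
# has order d^2 + 1 - Delta; e.g. a (5, 2, 2)-graph has 24 vertices
assert moore_bound(5, 2) - 2 == 24
```

The last line matches Fig. 1(iii) below, whose vertex counts n0 = 9, n1 = 6, n2c = 9 also sum to 24.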
Taking into account the multiplicities of repeats, we denote by Rm(u) the multiset of all repeats of a vertex u in G, containing each repeat v of u exactly mv(u) times. Let u be a vertex in G; we denote by N(u) the set of all neighbours of u. If A is a multiset of vertices of G, then N(A) denotes the multiset of all neighbours of the vertices of A. We use Rm(A) to denote the multiset of all repeats of all vertices in A.

Proposition 1. If G is regular then for all u ∈ V(G), |Rm(u)| = Σv∈Rs(u) mv(u) = ∆.

Definition 2. A subset S of V(G) is called a closed repeat set if Rm(S) = S. A closed repeat set is minimal if none of its proper subsets is a closed repeat set.

Definition 3. A repeat subgraph HS of a closed repeat set S of G is a multigraph whose vertex set is V(HS) = S and in which the number of parallel edges between a vertex u and any of its repeats, say v ∈ Rm(u), equals the multiplicity mv(u).

We observe that

Observation 2. If ∆ < 1 + (d − 1) + · · · + (d − 1)^(k−1) then G is regular.

It is also true that

Observation 3. If G is regular then the repeat graph HG of G is ∆-regular.

For the purpose of this paper, we shall consider each pair of parallel edges in HG as a cycle of length 2.

Observation 4. HG is the union of cycles of lengths ≥ 2, each cycle being a minimal closed repeat set of G.

2 Structural properties of (d, 2, 2)-graphs

Let G be a (d, 2, 2)-graph for d ≥ 3. From Observation 2, we deduce that

Observation 5. Every (d, 2, 2)-graph for d ≥ 3 is regular.

Let us consider repeat configurations in (d, 2, 2)-graphs. Let u be a vertex of a (d, 2, 2)-graph. Then there are two possibilities:
– u has two single repeats (ri(u), i = 1, 2);
– u has one double (maximal) repeat (r(u) = r1(u) = r2(u)) with multiplicity 2.
With respect to repeats in G, there are five possible repeat configurations, as depicted in Fig. 2.
We denote by n0, n1, n2a, n2b, n2c the number of vertices of the corresponding repeat types. We now point out the following theorem, which was proved in [8].

Theorem 1. [8] For odd d, one of the following holds:
– d = 3 and G is the graph in Fig. 1(i) or 1(ii);
– d = 5 and G is the graph in Fig. 1(iii);
– d ≥ 7 and n2b = d² − 1.

Fig. 1. All known (d, 2, 2)-graphs for odd d: (i) d = 3, n0 = 3, n1 = 2, n2c = 3; (ii) d = 3, n2b = 8; (iii) d = 5, n0 = 9, n1 = 6, n2c = 9.

Fig. 2. Repeat configurations in a (d, 2, 2)-graph: types 0, 1, 2a, 2b and 2c.

3 Nonexistence of infinitely many (d, 2, 2)-graphs for odd d

Let us now return to the graph HG. For d ≥ 7, from Theorem 1, we see that each vertex u ∈ G has exactly two different repeats, that is, each component of HG is a cycle of length at least 4. From now on, each cycle in HG will be called a repeat cycle. Let A be the adjacency matrix of G and let B be the adjacency matrix of HG, called the defect matrix, in which the main diagonal consists entirely of 0's and the row and column sums are equal to 2. With a suitable labeling of HG, B becomes a direct sum of symmetric circulants of order a of the form

Da = [ 0 1 0 ... 0 1 ]
     [ 1 0 1 ... 0 0 ]
     [ 0 1 0 ... 0 0 ]
     [ ............. ]
     [ 1 0 0 ... 1 0 ],  a ≥ 4.

This matrix has been studied in [2] and [6] in the context of regular graphs of girth 5. Let n be the order of G, let b be the number of cycles in HG and let ai, i = 1, . . . , b, be the lengths of these cycles. We need to consider the following equation:

A² + A − (d − 1)I = J + B  (1)

where J is a matrix all of whose entries are 1 and I is the identity matrix of order n. The special case when there is just one repeat cycle, that is, b = 1 and HG = Cn, was studied by Fajtlowicz in [4].
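The structure of the defect matrix B can be sketched concretely. The snippet below (an illustration only; the helper names are ours) builds the circulant block Da as the adjacency matrix of an a-cycle (a ≥ 3, matching the a ≥ 4 cycles that occur here) and assembles B as a direct sum, then checks the properties quoted above:

```python
import numpy as np

def cycle_matrix(a):
    """Adjacency matrix D_a of the a-cycle (a symmetric circulant), a >= 3."""
    D = np.zeros((a, a), dtype=int)
    for i in range(a):
        D[i][(i + 1) % a] = 1
        D[(i + 1) % a][i] = 1
    return D

def defect_matrix(cycle_lengths):
    """B as a direct sum of circulant blocks D_{a_1}, ..., D_{a_b}."""
    n = sum(cycle_lengths)
    B = np.zeros((n, n), dtype=int)
    off = 0
    for a in cycle_lengths:
        B[off:off + a, off:off + a] = cycle_matrix(a)
        off += a
    return B

B = defect_matrix([4, 6])                # two repeat cycles, lengths 4 and 6
assert (B == B.T).all()                  # symmetric
assert (B.sum(axis=1) == 2).all()        # every row and column sum is 2
assert (np.diag(B) == 0).all()           # zero main diagonal
```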
In that paper, Fajtlowicz proved the following.

Theorem 2. [4] If B is the adjacency matrix of the n-cycle then d = 3.

As A, B and J are symmetric matrices, they are diagonalizable. Since J commutes with A, B commutes with A as well, and hence all three matrices are simultaneously diagonalizable; that is, there is an orthogonal matrix P for which P⁻¹AP, P⁻¹BP and P⁻¹JP are diagonal, and the columns of P are corresponding eigenvectors for each of these matrices. Note also that these eigenvectors form an orthogonal basis of Rⁿ. Furthermore, it is well known that the eigenvalues of a matrix representing an m-cycle, together with their respective multiplicities, are:
– for even m: 2 (multiplicity 1), 2cos(2πj/m) for j = 1, . . . , m/2 − 1 (each with multiplicity 2), and −2 (multiplicity 1);
– for odd m: 2 (multiplicity 1) and 2cos(2πj/m) for j = 1, . . . , (m − 1)/2 (each with multiplicity 2).

Therefore, the eigenvalues of J + B are n + 2, 2 and 2cos(2πci/ai), with ci = 1, . . . , ai − 1 and i = 1, . . . , b, of multiplicities 1, b − 1 and 1, respectively. Thus, the spectrum of A in the general case b ≥ 1 is:
(i) the eigenvalue d with multiplicity 1,
(ii) b − 1 roots of the equation α² + α − (d − 1) = 2,  (2)
(iii) one root of each of the equations α² + α − (d − 1) = 2cos(2πci/ai),  (3)
where i = 1, . . . , b and ci = 1, . . . , ai − 1.

We denote by m(α) the multiplicity of an eigenvalue α of A. The solutions of (2) are β1 = (−1 + √(4d + 5))/2 and β2 = (−1 − √(4d + 5))/2, with m(β1) + m(β2) = b − 1. The general solution of (3) is

β = (−1 ± √(4d + 8cos(2πci/ai) − 3))/2.

For each even ai, i = 1, . . . , b, and when ci = ai/2, there is exactly one eigenvalue β of A with multiplicity 1 satisfying (3). In other words, corresponding to the special case when cos(2πci/ai) = −1, there are eigenvalues β3 = (−1 + √(4d − 11))/2 and β4 = (−1 − √(4d − 11))/2. Let me = m(β3) + m(β4). Then me is exactly the number of even cycles in HG.
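The cycle spectrum quoted above is easy to confirm numerically. This sketch (an illustration only; m = 8 is an arbitrary even cycle length) compares the eigenvalues of an m-cycle's adjacency matrix with the values 2cos(2πj/m), and checks that −2 appears for even m:

```python
import numpy as np

m = 8  # an even cycle length chosen for illustration
D = np.zeros((m, m))
for i in range(m):
    D[i][(i + 1) % m] = D[(i + 1) % m][i] = 1

eig = np.sort(np.linalg.eigvalsh(D))
expected = np.sort([2 * np.cos(2 * np.pi * j / m) for j in range(m)])
assert np.allclose(eig, expected)      # spectrum is {2 cos(2*pi*j/m)}
assert np.isclose(eig[0], -2.0)        # -2 occurs exactly when m is even
assert np.isclose(eig[-1], 2.0)        # 2 always occurs, with multiplicity 1
```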
Observation 6. For odd d, n = d² − 1 is even. Therefore, if d is odd, then me ≡ b (mod 2).

Using a method similar to that described in [6], we next show:

Theorem 3. G does not exist for any odd d ≥ 7 such that d ≠ l² + l + 3 and d ≠ l² + l − 1 for each nonnegative integer l.

Proof. If all four eigenvalues β1, . . . , β4 are irrational, then m(β1) = m(β2) = (b − 1)/2 and m(β3) = m(β4) = me/2. But, by Observation 6, these equations cannot hold at the same time: the first forces b to be odd, while the second forces me, and hence b, to be even. We can then see that for those odd values of d with d ≠ l² + l + 3 and d ≠ l² + l − 1 for each nonnegative integer l (that is, for which neither 4d − 11 nor 4d + 5 is a perfect square), all four eigenvalues β1, . . . , β4 are irrational, and thus the proof follows.

By counting the total number N5 of 5-cycles in G, we are able to derive some further necessary conditions for the existence of G.

Theorem 4. G does not exist for any odd d ≥ 7 such that N5 is not an integer, where

N5 = ((d² − 1)/5) · [ (1/2)(d − 3)(d² + d + 4) + d + 2 ].

Proof. From Theorem 1, we know that n2b = d² − 1. Let u0 be a vertex of type 2b. Let v, w be the two repeats of u0 and N(u0) = {u1, . . . , ud}. We also denote by A, B, C, D, E, F the sets N(u1) \ {u0, v}, {v}, N(u2) \ {u0, v, w}, {w}, N(u3) \ {u0, w}, and ∪i=4,...,d (N(ui) \ {u0}), respectively, as shown in Fig. 3. Let e(BE) be the number of edges in G between the two sets B and E. To reach v from u3 in two steps we must have e(BE) ≥ 1. But since u3 is of type 2b, v cannot be a repeat of u3; thus e(BE) = 1. Similarly, we have e(AD) = 1. As a result, e(BF) = e(DF) = d − 3. By an analogous argument, we get e(AE) = d − 2, which means that e(AC) = e(CE) = d − 3. Then it is not difficult to show that e(CF) = (d − 3)², e(AF) = e(EF) = (d − 2)(d − 3) and e(FF) = (1/2)(d − 2)(d − 3)².

Let c5 be the number of different cycles of length 5 on which u0 lies.
Then

c5 = e(FF) + e(FA) + 2e(FB) + e(FC) + 2e(FD) + e(FE) + e(EC) + 2e(EB) + e(EA) + 2e(AD) + e(AC) = (1/2)(d − 3)(d² + d + 4) + d + 2.

For odd d ≥ 7, every vertex of G is of type 2b, and so the number of different cycles of length 5 in G is N5 = (1/5)(d² − 1)c5. The theorem then follows.

Fig. 3. Illustration for the proof of Theorem 4.

The results of Theorems 3 and 4 improve the upper bound for the order of (d, 2, 2)-graphs so that nd,2 ≤ d² − 3 for infinitely many odd degrees d. For d ≥ 7, the first 50 values of d for which G might still exist are shown in Table 1.

Table 1. The first 50 values of d for which a (d, 2, 2)-graph might still exist for odd d: 9, 11, 19, 23, 29, 33, 41, 59, 71, 89, 93, 109, 113, 131, 159, 181, 209, 213, 239, 243, 271, 309, 341, 379, 383, 419, 423, 461, 509, 551, 599, 603, 649, 653, 701, 759, 811, 869, 873, 929, 933, 991, 1059, 1121, 1189, 1193, 1259, 1263, 1331, 1409.

Finally, we conjecture the following.

Conjecture 1. For odd d ≥ 7, (d, 2, 2)-graphs do not exist.

References

1. Bannai, E., Ito, T.: On finite Moore graphs. J. Fac. Sci. Tokyo Univ. 20 (1973) 191–208
2. Brown, W. G.: On the non-existence of a type of regular graphs of girth 5. Canad. J. Math. 19 (1967) 644–648
3. Erdős, P., Fajtlowicz, S., Hoffman, A. J.: Maximum degree in graphs of diameter 2. Networks 10 (1980) 87–90
4. Fajtlowicz, S.: Graphs of diameter 2 with cyclic defect. Colloq. Math. 51 (1987) 103–106
5. Hoffman, A. J., Singleton, R. R.: On Moore graphs with diameters 2 and 3. IBM J. Res. Dev. 4 (1960) 497–504
6. Kovács, P.: The non-existence of certain regular graphs of girth 5. J. Combin. Theory Ser. B 30 (1981) 282–28
7. Miller, M., Širáň, J.: Moore graphs and beyond: A survey of the degree/diameter problem. Electronic Journal of Combinatorics DS14 (2005) 1–61
8. Nguyen, M. H., Miller, M.: Structural properties of graphs of diameter 2 with defect 2.
preprint

Computing Incongruent Restricted Disjoint Covering Systems

Gerry Myerson1, Jacky Poon1 and Jamie Simpson2
1 Department of Mathematics, Macquarie University, Sydney, NSW, gerry@ics.mq.edu.au, jackypoon@optusnet.com.au
2 Department of Mathematics and Statistics, Curtin University of Technology, Perth, WA, simpson@maths.curtin.edu.au

Abstract. An incongruent restricted disjoint covering system of length n is a sequence of positive integers s1, . . . , sn with the property that si = m for some m if and only if si+km = m for all k for which i + km is in [1, n], and further such that any integer appearing in the sequence appears at least twice. An example of length 11 is 6, 9, 3, 4, 5, 3, 6, 4, 3, 5, 9. We describe two algorithms for finding all such systems of a given length.

1 Introduction

An incongruent restricted disjoint covering system (henceforth IRDCS) of length n is a sequence of positive integers s1, . . . , sn with the property that si = m for some m if and only if si+km = m for all k for which i + km is in [1, n], and further such that any integer appearing in the sequence appears at least twice. An example of length 11 is 6, 9, 3, 4, 5, 3, 6, 4, 3, 5, 9. We introduced the idea of an IRDCS in [5] and [3] as a new twist on the venerable theme of covering systems of congruences. We begin with some background on this, partly by way of motivation and partly to explain the impressive title of our paper.

We write S(m, a) for the congruence class {x : x ≡ a (mod m)}. A covering system of congruences is a set of congruence classes with the property that every integer belongs to at least one class. If no integer belongs to more than one class then the system is disjoint (or exact); if the moduli of the classes are distinct then the system is incongruent.
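Both properties can be checked mechanically over a single period, i.e. modulo the lcm of the moduli. The sketch below (an illustration only, not from the paper; the helper name `covers` is ours) tests the classic disjoint cover by evens and odds, and the five-class incongruent system given in the text:

```python
from functools import reduce
from math import lcm

def covers(classes):
    """classes: list of (m, a) pairs for S(m, a) = {x : x = a (mod m)}.
    Returns (covers_every_integer, covers_every_integer_exactly_once),
    decided over one full period, i.e. modulo lcm of the moduli."""
    L = reduce(lcm, (m for m, _ in classes))
    counts = [sum(1 for m, a in classes if x % m == a % m) for x in range(L)]
    return min(counts) >= 1, all(c == 1 for c in counts)

# disjoint (but not incongruent): the two classes mod 2
assert covers([(2, 0), (2, 1)]) == (True, True)
# incongruent (distinct moduli, hence not disjoint)
cov, disjoint = covers([(2, 0), (3, 0), (4, 1), (6, 1), (12, 11)])
assert cov and not disjoint
```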
An example of a disjoint covering system is {S(2, 0), S(2, 1)}, and an example of an incongruent system is {S(2, 0), S(3, 0), S(4, 1), S(6, 1), S(12, 11)}. It is not possible for a system to be both disjoint and incongruent (see [6]). These systems were introduced by Erdős in [1] and have spawned a large literature (see the survey [6] and sections F13 and F14 in [2]).

We define a restricted disjoint covering system on [1, n] as a set of congruence classes such that each integer in the interval [1, n] belongs to exactly one class, and each class contains at least two members of the interval. The condition that the classes contain at least two members is included to avoid trivialities. As with standard covering systems, we describe such a system as incongruent if the moduli are distinct. Here is an example of an IRDCS on [1, 11].

Example 1. S(6, 1), S(9, 2), S(3, 0), S(4, 0), S(5, 0).

Rather than exhibiting an IRDCS in this way, we can do so by writing down a sequence of n integers, the ith of which equals the modulus of the unique congruence class to which i belongs. Thus Example 1 becomes 6, 9, 3, 4, 5, 3, 6, 4, 3, 5, 9, which is the sequence we started with. It is easy to see that our original definition and the definition just given in terms of congruence classes are equivalent.

We have found all IRDCS of lengths up to 40 and found that their number increases quickly with length: there are 1,805,096 of length 40. In [3] we reported on some observations from this data, proved some results about the structure of IRDCS and presented some open problems. Some of these problems are repeated at the end of this paper and some properties of the data are given in Table 1. The main purpose of the present paper is to discuss the algorithms we used to find these IRDCS. We used two algorithms, one based on Knuth's recursive Dancing Links algorithm and the other a more intuitive backtracking algorithm.
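The sequence form of the definition is easy to check mechanically. A small sketch (the function name is ours), verified against the length-11 example: the positions holding a value m must be exactly one congruence class modulo m restricted to [1, n], and must hit the interval at least twice.

```python
def is_irdcs(seq):
    """Test the sequence form of the IRDCS definition (1-based positions)."""
    n = len(seq)
    for m in set(seq):
        pos = [i for i, v in enumerate(seq, start=1) if v == m]
        first = pos[0]
        # occurrences of m must be exactly {first, first + m, ...} within [1, n] ...
        if pos != list(range(first, n + 1, m)):
            return False
        # ... with first being the smallest member of its class in [1, n],
        # and the class must meet the interval at least twice
        if first - m >= 1 or len(pos) < 2:
            return False
    return True

# The example from the text is an IRDCS of length 11:
assert is_irdcs([6, 9, 3, 4, 5, 3, 6, 4, 3, 5, 9])
```

Dropping the last term breaks the system, since 9 then appears only once.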
We will describe the second in some detail and the first more sketchily. We finish with a comparison of the algorithms, a general discussion and the open problems.

2 Backtracking

We present an algorithm that finds all IRDCS on input n. The systems are built in an array x[1..n], initially having a 1 in each position, and are built up one modulus at a time. If we find that a modulus is causing a clash with an already used modulus, that is, if we are trying to use two moduli at the same position, we try a larger modulus if one is available; otherwise we backtrack to the previous modulus used and increase that.

Two other arrays are used. The modusage array shows which moduli have already been used: true if used, false if not. The modulus currently being considered is m. When a new modulus is being chosen, we increase from the previous value until we find one which hasn't yet been used. The other array is primary. A primary array element is true if that position in the x array is empty or if it was the first position which used some modulus. After a modulus has been placed in a certain position we fill other appropriate positions with this modulus, but these later (or secondary) occurrences of the modulus get 'false' in the corresponding position of primary.

The variable position shows the position at which we've just inserted a primary modulus or where we're just about to do so. The variable position starts at ⌊n/2⌋ + 1, then changes by having increment[i] added to it. Increment[i] goes 1, −2, 3, −4, ..., ±(n+1) for odd n, with signs reversed for even n. The next increment to be used (if we're not backtracking) is increment[ctr] (ctr for counter). If we're backtracking we have to subtract an increment, and this requires decreasing ctr by 1 first. The point of this is that we fill the middle of the interval [1, n] first, then move towards the ends. We found this was faster than starting at one end of the interval.
If we get a clash while entering secondary moduli, the variable clash takes the value true.

Initialisation
    Input n
    Set all entries of x[1..n] to 1, primary[1..n] to true, modusage[2..n] to false,
        and increment[i] to (−1)^(n+1) · i for i = 1..n+1.
    Set position := ⌊n/2⌋ + 1, ctr := 1, finished := false and clash := false.

Begin main loop
    while not finished do
        (Choose next modulus)
        Set m := next unused modulus after x[position]
        if this modulus is feasible at this position then
            (Feasible modulus found: enter it while checking for clashes)
            i := position − m · ⌊(position − 1)/m⌋
            while i ≤ n and not clash do
                if i ≠ position then
                    if x[i] = 1 then
                        x[i] := x[position]
                        primary[i] := false
                    else
                        clash := true
                        i := i − m
                    end if
                end if
                i := i + m
            end while
            (If a clash has occurred, clear the last modulus except at the current position)
            if clash then
                modusage[m] := false
                for i ≡ position (mod m) with primary[i] = false do
                    x[i] := 1
                    primary[i] := true
            (Check to see if we have finished the current system)
            else
                while x[position] > 1 and ctr < n do
                    if ctr < n then
                        position := position + increment[ctr]
                    end if
                    ctr := ctr + 1
                end while
                if ctr > n then
                    output system
                    ctr := ctr + 1
                    modusage[m] := false
                    set position to the primary position for m
                    reinitialise the other positions where this m was used
                end if
            end if
        else
            (Backtrack or finish)
            if position = ⌊n/2⌋ + 1 then
                finished := true
                output "No more solutions"
            else
                set m equal to the last modulus entered
                reinitialise all positions where this m was used
                change position to the last modulus' primary position
            end if
        end if
    end main loop

3 Recursion

The second algorithm we used was Knuth's Dancing Links implementation of his Algorithm X [4] for solving the NP-complete Exact Cover problem. Exact Cover may be given as follows.

Instance: An m × n matrix A with each entry either 0 or 1.
Question: Find a subset of the rows of A such that for each column exactly one row in the set has a 1 in that column.
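A minimal sketch of Algorithm X on a set-based representation (no dancing links; the representation and names are ours). Columns still to cover form a set, and each row maps to the set of columns in which it has a 1; the solver branches on a column covered by the fewest remaining rows, Knuth's usual heuristic.

```python
def exact_cover(columns, rows):
    """Yield every exact cover: a list of row ids whose column sets
    partition `columns`. rows: dict mapping row id -> frozenset of columns."""
    if not columns:
        yield []
        return
    # branch on a column contained in the fewest remaining rows
    c = min(columns, key=lambda col: sum(col in s for s in rows.values()))
    for r, cols_r in rows.items():
        if c not in cols_r:
            continue
        # choosing r removes its columns and every row sharing a column with it
        reduced = {r2: s for r2, s in rows.items() if not (s & cols_r)}
        for rest in exact_cover(columns - cols_r, reduced):
            yield [r] + rest
```

On the tiny instance with columns {1, 2, 3} and rows a = {1, 2}, b = {3}, c = {2, 3}, the only exact cover is {a, b}.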
To find a length-n IRDCS we consider all congruence classes that might appear in the IRDCS. These are

    {S(m, a) : 2 ≤ m ≤ n − 1, 1 ≤ a ≤ min{m, n − m}}

(the bound a ≤ n − m ensures that the class meets [1, n] at least twice). For each such congruence class, construct a row of a matrix A which has a 1 in column j if j ∈ S(m, a) and 0 otherwise. Thus the first few rows of the matrix for n = 11 would be

    1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0
    1 0 0 1 0 0 1 0 0 1 0
    0 1 0 0 1 0 0 1 0 0 1
    0 0 1 0 0 1 0 0 1 0 0

Clearly, finding an IRDCS of length n is equivalent to finding an exact cover for this matrix. Algorithm X solves the problem recursively: choose a row i from A, then for each j such that a_{i,j} = 1 delete column j from A, and for each k ≠ i such that a_{k,j} = 1 delete row k. Then apply the algorithm to this reduced matrix. The algorithm is expensive in space and requires the computer to spend most of its time searching for 1s. For our problem A will have n²/4 + O(n) rows. The Dancing Links algorithm implements the algorithm using circular doubly-linked lists of the 1s in the matrix. There is a list of 1s for each row and each column: each 1 in the matrix has a link to the next 1 above, below, to the left, and to the right of itself.

Table 1 compares the speed of our two algorithms, showing that the second is substantially quicker. Both algorithms were run on a PC with a 2.0 GHz processor. We had not expected Knuth's algorithm to be faster than backtracking, since it is designed to solve a very general problem and does not exploit the special structure of an IRDCS. Perhaps another, faster algorithm can be found.

Knuth's algorithm solves an NP-complete problem. A decision problem relating to our situation is: "given a set of positive integers, does there exist an IRDCS with these integers as moduli?" We do not know the complexity of this question.
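The candidate rows can be generated directly from the congruence classes. A sketch (function name ours), which also confirms that the five classes of Example 1 cover each position of [1, 11] exactly once:

```python
def irdcs_rows(n):
    """0/1 rows of the exact-cover matrix: one row per candidate class S(m, a),
    with 2 <= m <= n-1 and 1 <= a <= min(m, n - m)."""
    return {(m, a): [1 if j % m == a % m else 0 for j in range(1, n + 1)]
            for m in range(2, n)
            for a in range(1, min(m, n - m) + 1)}

rows = irdcs_rows(11)
# first row shown in the text: S(2, 1) hits the odd positions of [1, 11]
assert rows[(2, 1)] == [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
# Example 1: S(6,1), S(9,2), S(3,0)=S(3,3), S(4,0)=S(4,4), S(5,0)=S(5,5)
cover = [rows[(6, 1)], rows[(9, 2)], rows[(3, 3)], rows[(4, 4)], rows[(5, 5)]]
assert all(sum(col) == 1 for col in zip(*cover))  # an exact cover of [1, 11]
```

For n = 11 this produces 29 rows, consistent with the n²/4 + O(n) count above.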
Table 1. Times taken to find all IRDCS of various lengths using the two algorithms described above.

    Length   Number of IRDCS   Backtracking (s)   Dancing Links (s)
      35            68,176            3                   1
      36            85,762            4                   2
      37           304,892           10                   3
      38           855,072           24                   7
      39         1,229,050           41                  11
      40         1,805,096           68                  18

4 Open Problems

We end with some open problems.

(1) Is there a more efficient method of finding IRDCS than those described here?

(2) Do there exist IRDCS with all moduli odd?

(3) Can the smallest modulus of an IRDCS be arbitrarily large?

(4) In [3] we showed that the number of distinct moduli appearing in an IRDCS of length n is at most (n − 1)/2. This bound is attained in the example with n = 11 given above, but computational results suggest that a stronger result should be possible for large n. What is it?

(5) The following two IRDCS both have length 43 and their sets of moduli are disjoint:

    {S(24, 1), S(2, 2), S(4, 3), S(36, 5), S(12, 9), S(16, 13), S(20, 17)},
    {S(25, 1), S(33, 2), S(7, 3), S(8, 4), S(9, 5), S(21, 6), S(18, 7), S(13, 8), S(10, 9), S(11, 11), S(27, 13), S(15, 15), S(26, 16), S(19, 18)}.

Generally, if each of {S(m_1, a_1), ..., S(m_s, a_s)} and {S(n_1, b_1), ..., S(n_t, b_t)} is an IRDCS for [1, n] and their sets of moduli are disjoint, as in the case above, then

    {S(3m_i, 3a_i + 1) : i = 1, ..., s} ∪ {S(3n_i, 3b_i + 2) : i = 1, ..., t} ∪ {S(3, 0)}

is an IRDCS for [1, 3n] in which every modulus is divisible by 3. This suggests the question: for which values of k does an IRDCS exist with every modulus divisible by k? The example above shows it's possible for k = 3. We can easily produce examples for k = 2 by doubling each modulus and then inserting a 2 in every second position, so that, for instance, the n = 11 example produces 2, 12, 2, 18, 2, 6, 2, 8, 2, 10, 2, 6, 2, 12, 2, 8, 2, 6, 2, 10, 2, 18, 2.

(6) Our definition of an IRDCS requires that each congruence is satisfied at least twice.
Do analogous systems exist in which each congruence is satisfied at least k times, for values of k exceeding 2?

References
1. P. Erdős, On integers of the form 2^k + p and some related problems, Summa Brasil. Math. 2 (1950), 192–210.
2. R.K. Guy, Unsolved Problems in Number Theory, 3rd edition, Springer, 2004.
3. Gerry Myerson, Jacky Poon and Jamie Simpson, Incongruent restricted disjoint covering systems, submitted to Discrete Mathematics.
4. Donald Knuth, Dancing Links, http://www-cs-faculty.stanford.edu/~knuth/preprints.html, P159, retrieved 7/7/7.
5. G. Myerson, ed., Western Number Theory Problems, Western Number Theory Conference, 18 and 20 Dec. 2006, website in preparation.
6. Š. Porubský and J. Schönheim, Covering systems of Paul Erdős: past, present and future, in Halász, Gábor (ed.), Paul Erdős and his Mathematics I, Bolyai Soc. Math. Stud. 11, 581–627 (2002).

Existence of Regular Supermagic Graphs

Andrea Semaničová¹, Jaroslav Ivančo² and Petr Kovář³
andrea.semanicova@tuke.sk, jaroslav.ivanco@upjs.sk, petr.kovar@vsb.cz

¹ Department of Appl. Mathematics, Technical University, Letná 9, 04200 Košice, Slovakia
² Institute of Mathematics, P.J. Šafárik University, Jesenná 5, 04154 Košice, Slovakia
³ Department of Appl. Mathematics, VŠB – Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic

Abstract. A graph is called supermagic if it admits a labeling of the edges by pairwise different consecutive integers such that the sum of the labels of the edges incident with a vertex is independent of the particular vertex. In this talk we deal with the existence of r-regular supermagic graphs of order n. Some constructions of supermagic labelings of regular graphs are described.

Keywords: regular graph, supermagic graph, (a, 1)-antimagic graph, circulant graph

1 Introduction

We consider finite undirected graphs without loops and multiple edges.
For a graph G, we denote by V(G) and E(G) the vertex set and edge set of G, respectively. Let G be a graph and f be a mapping from E(G) into the positive integers. The index-mapping of f is the mapping f* from V(G) into the positive integers defined by

    f*(v) = Σ_{e ∈ E(G)} η(v, e) f(e)   for every v ∈ V(G),

where η(v, e) is equal to 1 when e is an edge incident with the vertex v, and 0 otherwise. An injective mapping f from E(G) into the positive integers is called a magic labeling of G for an index λ if its index-mapping f* satisfies f*(v) = λ for all v ∈ V(G). A magic labeling f of G is called supermagic if the set {f(e) : e ∈ E(G)} consists of consecutive positive integers. A graph G is called supermagic (magic) if and only if there exists a supermagic (magic) labeling of G.

The concept of magic graphs was introduced by Sedláček [12]. The regular magic graphs are characterized in [3]. Two different characterizations of all magic graphs are given in [10] and [9]. Supermagic graphs were introduced by B.M. Stewart [13]. It is easy to see that the classical concept of a magic square of n² boxes corresponds to the fact that the complete bipartite graph K_{n,n} is supermagic for every positive integer n ≠ 2 (see also [13]). Stewart [14] characterized supermagic labelings of complete graphs. In [7], supermagic regular complete multipartite graphs and supermagic cubes are characterized. In [8], characterizations are given for magic line graphs of general graphs and for supermagic line graphs of regular bipartite graphs. In [11] and [1], supermagic labellings of the Möbius ladders and of two special classes of 4-regular graphs are constructed. Constructions of supermagic labellings of various classes of regular graphs are described in [6] and [7]. More comprehensive information on magic and supermagic graphs can be found in [5].
Trenkler proved the following condition.

Proposition 1 [15] A connected magic graph with n vertices and ε edges exists if and only if n = 2 and ε = 1, or n ≥ 5 and 5n/4 < ε ≤ n(n − 1)/2.

However, for supermagic graphs we do not have similar results. In [4] it is proved that if d is the greatest common divisor of the integers n and ε, and if n/d and ε are both even, then there exists no supermagic graph of order n and size ε. Moreover, in [4] the following bounds were established for the number of edges in a supermagic graph of order n:

    ⌈(3 − √3)n⌉ ≤ |E(G)| ≤  …                for n = 5, 8,
                            n(n − 1)/2       for 6 ≤ n ≢ 0 (mod 4),
                            n(n − 1)/2 − 1   for 8 ≤ n ≡ 0 (mod 4).

The existence of supermagic graphs is a very complicated problem. In this talk we focus on the existence of regular supermagic graphs of a given order.

2 Regular supermagic graphs

Ivančo proved the following necessary conditions for supermagic regular graphs.

Proposition 2 [7] Let G be an r-regular supermagic graph of order n. Then the following statements hold:
(i) if r ≡ 1 (mod 2), then n ≡ 2 (mod 4);
(ii) if r ≡ 2 (mod 4) and n ≡ 0 (mod 2), then G contains no component of odd order;
(iii) if n > 2, then r > 2.

It is easy to see that K2 is the only 1-regular supermagic graph and that there exists no 2-regular supermagic graph. For 3-regular supermagic graphs, let us consider the Möbius ladder. The Möbius ladder Mn, where 6 ≤ n ≡ 0 (mod 2), is the 3-regular graph consisting of the cycle Cn of length n in which all pairs of opposite vertices are connected. Sedláček [11] proved that if 6 ≤ n ≡ 2 (mod 4) then the Möbius ladder is supermagic.

Proposition 3 [14] A complete graph of order n is supermagic if and only if n = 2 or 5 < n ≢ 0 (mod 4).

From this result it is clear that an (n − 1)-regular supermagic graph of order n exists if and only if n = 2 or 5 < n ≢ 0 (mod 4). Before we present the main result, we first introduce some definitions.
Let n, m and a_1, ..., a_m be positive integers with 1 ≤ a_i ≤ ⌊n/2⌋ and a_i ≠ a_j for all 1 ≤ i < j ≤ m. The undirected graph with vertex set V = {v_1, ..., v_n} and edge set E = {v_i v_{i+a_j} : 1 ≤ i ≤ n, 1 ≤ j ≤ m}, the indices being taken modulo n, is called a circulant graph and is denoted by C_n(a_1, ..., a_m).

A graph G is called (a, 1)-antimagic if it is possible to label its edges with the integers from the set {1, 2, ..., |E(G)|} such that {f*(v) : v ∈ V(G)} = {a, a + 1, ..., a + |V(G)| − 1}. The (a, 1)-antimagic labeling is a special type of the (a, d)-antimagic labeling defined by Bodendiek and Walther [2]. A k-regular spanning subgraph of a graph is called a k-regular factor of the graph. In [7] it is proved:

Proposition 4 [7] Let G be a graph decomposable into pairwise edge-disjoint supermagic regular factors. Then G is supermagic.

In the next sections we deal with constructions of supermagic r-regular graphs of order n for r ≥ 4. We consider two cases, r odd and r even.

3 Main results

According to Proposition 2, if r = 2k + 1 then n ≡ 2 (mod 4) for a graph G. For r = 4k + 1, we proved:

Theorem 1 Let n, k be positive integers, 4k + 2 ≤ n ≡ 2 (mod 4). Then there exists a (4k + 1)-regular supermagic graph of order n.

Proof. Let n, k be positive integers, 4k + 2 ≤ n ≡ 2 (mod 4). If 4k + 2 = n then the (4k + 1)-regular graph is isomorphic to the complete graph K_{4k+2} and, according to Proposition 3, is supermagic. If 4k + 2 < n ≡ 2 (mod 4) then the graph

    G := C_n(2, 4, ..., 4A − 2, 4A; 1, 3, ..., 4B − 3, 4B − 1, n/2)

is a supermagic graph of order n. To prove this, we decompose this graph into two edge-disjoint supermagic regular factors G1 and G2, where

    G1 := C_n(2, 4, ..., 4A − 2, 4A, n/2),
    G2 := C_n(1, 3, ..., 4B − 3, 4B − 1),

with A + B = k, 8A + 2 ≤ n and 8B + 2 ≤ n. It is not difficult to prove that both graphs G1 and G2 are supermagic.
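The circulant factors used here are straightforward to generate and sanity-check. A sketch (0-based vertices, function names ours); taking n = 14, k = 2 and A = B = 1 gives the connection set {2, 4; 1, 3, 7}, and the resulting graph is indeed (4k + 1)-regular:

```python
def circulant_edges(n, conns):
    """Edge set of the circulant graph C_n(a_1, ..., a_m), vertices 0..n-1."""
    return {frozenset({i, (i + a) % n}) for i in range(n) for a in conns}

def degrees(n, edges):
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    return deg

# C_14(2, 4; 1, 3, 7): the graph G of the proof for n = 14, A = B = 1
G = circulant_edges(14, [1, 2, 3, 4, 7])
assert all(d == 9 for d in degrees(14, G))  # (4k+1)-regular with k = 2
```

Each connection a < n/2 contributes two neighbours per vertex, while a = n/2 contributes one, and the two factors C_14(2, 4, 7) and C_14(1, 3) are edge-disjoint, as the decomposition requires.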
According to Proposition 4, the graph G is supermagic.

Analogously, to construct a (4k + 3)-regular supermagic graph of order n we decompose the graph into three factors:

    G1 := C_n(1, n/2);
    G2 := C_n(3, 5, 7, 9, ..., 4A − 1, 4A + 1);
    G3 := C_n(2, 4, 6, 8; 10, 12, 14, 16; ...; 8B − 6, 8B − 4, 8B − 2, 8B),

where A, B are arbitrary positive integers such that 4k + 3 = 3 + 4A + 8B, 4A ≤ n/2 − 3 and 4B ≤ n/2 − 1. It is not difficult to show that G1, G2 and G3 are supermagic graphs. According to Proposition 4, the graph with the factorization into G1, G2 and G3 is a (4k + 3)-regular supermagic graph of order n. Thus we have

Theorem 2 Let n, k be positive integers, 4k + 4 < n ≡ 2 (mod 4). Then there exists a (4k + 3)-regular supermagic graph of order n.

Using the above ideas, we construct 8k-regular supermagic graphs of arbitrary order and 4k-regular supermagic graphs of even order.

Theorem 3 Let n, k be positive integers. Then the following statements hold:
(i) if 8k + 1 ≤ n, then there exists an 8k-regular supermagic graph of order n;
(ii) if 4k + 1 ≤ n ≡ 0 (mod 2), then there exists a 4k-regular supermagic graph of order n.

Moreover, we are able to prove the following two lemmas.

Lemma 1 A 4-regular supermagic graph of order n exists if and only if n ≥ 6.

Lemma 2 A 6-regular supermagic graph of order n exists if n ≥ 7.

Analogously to the proof of Theorem 2, we prove that, up to a finite number of cases, there exists a 2r-regular supermagic graph of order n, by decomposing the original graph into 4-, 6- and 8k-regular supermagic factors.

4 Conclusion

In this talk we introduced some constructions of r-regular supermagic graphs of order n for all r and n except certain values. However, there are some difficulties in applying this technique to dense graphs. This is a topic for further investigation.

Acknowledgement. Support of Slovak VEGA Grant 1/4005/07 is acknowledged.
Research for this article was partially supported by the institutional project MSM6198910027.

References
1. M. Bača, I. Holländer, Ko-Wei Lih, Two classes of super-magic graphs, J. Combin. Math. Combin. Comput. 23 (1997) 113–120.
2. R. Bodendiek and G. Walther, Arithmetisch antimagische Graphen, in: K. Wagner, R. Bodendiek, eds., Graphentheorie III (BI-Wiss. Verl., Mannheim, 1993).
3. M. Doob, Characterizations of regular magic graphs, J. Combin. Theory, Ser. B 25 (1978) 94–104.
4. S. Drajnová, J. Ivančo, A. Semaničová, Numbers of edges in supermagic graphs, J. Graph Theory 52 (2006) 15–26.
5. J.A. Gallian, A dynamic survey of graph labeling, Electronic J. Combinatorics #DS6 (2007).
6. N. Hartsfield, G. Ringel, Pearls in Graph Theory, Academic Press, Inc., San Diego, 1990.
7. J. Ivančo, On supermagic regular graphs, Mathematica Bohemica 125 (2000) 99–114.
8. J. Ivančo, Z. Lastivková, A. Semaničová, On magic and supermagic line graphs, Mathematica Bohemica 129 (2004) 33–42.
9. R.H. Jeurissen, Magic graphs, a characterization, Europ. J. Combin. 9 (1988) 363–368.
10. S. Jezný, M. Trenkler, Characterization of magic graphs, Czechoslovak Math. J. 33 (1983) 435–438.
11. J. Sedláček, On magic graphs, Math. Slovaca 26 (1976) 329–335.
12. J. Sedláček, Problem 27, Theory of Graphs and Its Applications, Proc. Symp. Smolenice, Praha (1963) 163–164.
13. B.M. Stewart, Magic graphs, Canad. J. Math. 18 (1966) 1031–1059.
14. B.M. Stewart, Supermagic complete graphs, Canad. J. Math. 19 (1967) 427–438.
15. M. Trenkler, Numbers of vertices and edges of magic graphs, Ars Combinatoria 53 (2000) 93–96.

Some Parameterized Problems Related to Seidel's Switching

Ondřej Suchý

Department of Applied Mathematics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

Abstract.
Seidel's switching of a vertex set is an operation which deletes the edges leaving this set and adds those edges between the set and the rest of the graph that were not there originally. Other edges remain untouched by this operation. The usual question in parameterized complexity is whether the exponential part of the algorithms for hard problems can be bounded by some function of only a selected parameter, which we assume to be small. We study, from the parameterized point of view, the complexity of the question whether a given graph can be turned into a graph with some property P using Seidel's switching. We show fixed-parameter tractability of switching to a regular graph, to a graph with bounded vertex degrees or with a bounded number of edges, to a graph without a forbidden subgraph, and to a bipartite graph.

1 Introduction

Parameterized complexity has become one of the standard tools for the exact solving of hard problems. The basic notions and ideas of parameterized complexity were introduced by Downey and Fellows [1]. We say that a parameterized problem P ⊆ Σ* × N, where Σ is some fixed alphabet, is fixed-parameter tractable (FPT) if there is an algorithm that decides whether the input (I, k) belongs to P in time f(k) · |I|^c, where c is a fixed constant and f(k) is a function independent of the size of the input |I|. If (I, k) ∈ Σ* × N is an instance of the problem P, then I is called the main part and k is called the parameter. We use the standard O*(·) notation for the asymptotic running time of an exact algorithm, which suppresses the polynomial part of the function.

Seidel's switching is possibly the most elegant graph transformation, since we can achieve a global modification of a graph by switching just one vertex. It was introduced by the Dutch mathematician J.J. Seidel [2].
For a simple undirected graph G = (V, E) and a subset A ⊆ V of its vertices, we define Seidel's switching of the set A in the graph G to be the graph S(G, A) = (V, E′), where

    E′ = E △ {{u, v} : u ∈ A, v ∈ V \ A}.

In particular, if A = {v}, we talk about Seidel's switching of the vertex v.

It is quite easy to see (cf. Lemma 1) that the following relation is an equivalence: graphs G and H are switching equivalent (denoted by G ∼ H) iff there is an A ⊆ V_G such that S(G, A) is isomorphic to H. An equivalence class of this relation, i.e. the set [G] = {S(G, A) : A ⊆ V_G}, is called the switching class of the graph G. Since all the switchings in this paper are Seidel's, we sometimes omit this adjective.

There have been many results concerning structural properties as well as complexity questions related to Seidel's switching. Probably the most studied problem is whether a given graph (or, in the structural approach, every graph) can be switched to have some property P. This problem is usually denoted by S(P). It is highly interesting that the complexity of the problem S(P) is not related to the complexity of the original problem P. In particular, there are known properties P such that the problem P is NP-complete and S(P) is polynomial, and vice versa. For a survey of results in this field see e.g. [3] or [4].

For the problems we concentrate on, the most important is the work of Kratochvíl [5]. He proved that the problem of switching to a k-regular graph is NP-complete when k is a part of the input. On the other hand, he proved that for fixed k and for a polynomial-time recognizable class P of k-degenerate graphs (e.g. the class of all k-regular graphs), the problem S(P) is polynomial-time solvable. Another important result is due to Jelínková [6], who proved that it is NP-complete to decide whether a graph can be switched to a graph having at most k edges, when k is a part of the input.
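The definition E′ = E △ {{u, v} : u ∈ A, v ∈ V \ A} translates directly into code. A sketch (names ours) that also checks two of the basic identities, S(S(G, A), A) = G and S(G, A) = S(G, V \ A):

```python
def seidel_switch(n, edges, A):
    """S(G, A) for G = (V, E) with V = {0, ..., n-1}:
    toggle exactly the pairs with one endpoint in A and one outside A."""
    A = set(A)
    E = {frozenset(e) for e in edges}
    cut = {frozenset({u, v}) for u in A for v in range(n) if v not in A}
    return E ^ cut  # symmetric difference, as in the definition of E'

# the path P4 switched at its endpoint 0
P4 = [(0, 1), (1, 2), (2, 3)]
H = seidel_switch(4, P4, {0})
assert H == {frozenset(p) for p in [(1, 2), (2, 3), (0, 2), (0, 3)]}
assert seidel_switch(4, H, {0}) == {frozenset(p) for p in P4}        # involution
assert seidel_switch(4, P4, {0}) == seidel_switch(4, P4, {1, 2, 3})  # A vs V \ A
```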
In the next section we provide some basic results about switching. Then we examine the problem whether a given graph can be switched to a graph with all degrees at most k. As with all the other problems considered here, we show fixed-parameter tractability. Since the classical complexity is not known yet, the main reason to consider this problem is to provide a tool for the problems that follow. We note that the parameterized dual of the problem (switching to a graph with all degrees at least k) is only of limited interest, since it is trivially fixed-parameter tractable due to Corollary 1. The complexity of that problem parameterized above the guaranteed value remains open.

We continue by proving that the parameterized version of the NP-complete problem of switching to a k-regular graph is in FPT. The same is then proven for another NP-complete problem, switching to a graph with at most k edges. We also show a tractability result for the problem of switching to an H-free graph. The last section contains some open problems.

2 Basic properties of Seidel's switching

We now give a few basic results that we will use throughout the paper.

Lemma 1. Let G be a graph and A, B ⊆ V two subsets of its vertices. Then
– S(G, A)[A] = G[A],
– S(G, A) = S(G, V \ A),
– S(G, ∅) = S(G, V) = G,
– S(S(G, A), B) = S(S(G, B), A) = S(G, A △ B),
– S(S(G, A), A) = G,
– S(Ḡ, A) is the complement of S(G, A).

Lemma 2. If G is a graph, v is a vertex of G and X ⊆ V \ {v}, then there is a unique graph H ∈ [G] such that N_H(v) = X.

Lemma 3. Let G = (V, E) be a graph on n vertices. Then there is a graph H ∈ [G] such that deg_H(v) ≥ ⌊n/2⌋ for all v ∈ V.

Corollary 1. It is trivially fixed-parameter tractable to decide whether the input graph can be switched to some graph that has all degrees at least k.

Proof. If k < n/2 then answer YES.
Otherwise n ≤ 2k, and trying all possibilities takes time at most n² · 2^(n−1) ≤ k² · 2^(2k+1).

3 Switching to a graph with all vertices of degree at most k

The classical complexity of this problem is unknown. Indeed, the ideas of the algorithm we present here are the crucial ones in the proofs of tractability of the other problems. Moreover, it is the best known algorithm for this problem. The exact specification of the problem is:

SwitchSmallDegs (Switching to a graph with all the vertices of degree at most k)
Input: A graph G = (V, E)
Parameter: A positive integer k, the desired maximal degree
Task: Is there a subset of vertices A ⊆ V such that the degree of each vertex in the switching of the set A in the graph G is at most k, i.e. deg_{S(G,A)}(v) ≤ k for all v ∈ V?

Before we present an algorithm to solve the problem, observe that we may assume that the input graph contains an isolated vertex v0. If it does not, we can switch one in linear time (in the number of edges of the original graph), according to Lemma 2, to obtain an isolated vertex. The answer remains unchanged due to Lemma 1. Since A is a solution iff V \ A is, we may assume without loss of generality that v0 ∉ A. These two facts together give us that N_{S(G,A)}(v0) = A. Thus necessarily |A| ≤ k.

During the run of the algorithm we denote by l the number of vertices we can still switch. At the beginning we set l := k. Now we introduce two simple rules that solve the "big" instances:

Lemma 4 (Rule 1). Every vertex v of degree greater than k + l must be switched in order to obtain a solution.

Proof. By contradiction. Suppose that A determines a switching solving the problem and that v ∉ A. The vertex v has at least k + l + 1 neighbours in G. By switching, v could lose at most |A| of them, and thus at most l. So it still has at least k + 1 neighbours. The switching doesn't solve the problem, a contradiction.

Lemma 5 (Rule 2).
No vertex v of degree less than n − k − l can be switched to obtain a solution.

This lemma is proved similarly to the previous one, so we omit the proof. The following statement is an easy corollary of the previous two lemmas:

Corollary 2 (Boundary). If n ≥ 2k + 2l + 1, then at least one of Rules 1 and 2 applies to each vertex.

This means we have already obtained a kernel of size 4k for the problem, and thus the problem is in FPT. As we want to give a better bound on the running time of the algorithm, we concentrate on instances with 0 ≤ p ≤ k − 1 and n = 2k + 2p + 2 or n = 2k + 2p + 1. In this case, after switching k − p vertices it is possible to decide which vertices should be switched and which should not. The algorithm proceeds as follows:

1. Set l := k.
2. For every vertex v (except v0):
   – Check which of Rules 1 and 2 applies to v.
   – If both rules apply, answer NO and quit.
   – If Rule 1 applies, switch v and decrease l.
   – If l < 0, answer NO and quit.
3. If n > 4k, then check whether all vertices have degree at most k in this switching, and answer accordingly.
4. If n ≤ 4k, then try all possibilities A of choosing up to l − p vertices that can be switched and that were not switched already. For each choice of A, switch according to A, apply the rules once again, and check whether we obtain the desired solution.

Remark 1. The algorithm can also be used to enumerate all the graphs with the desired property.

The correctness of the algorithm follows immediately from Lemmas 4 and 5 and Corollary 2. To count the running time of the algorithm we must bound the number of candidate sets in step 4. It is easy to observe that this is Σ_{i=0}^{k−p} C(n−1, i), which is always at most C(2k+2p+1, k−p) · k (writing C(n, r) for the binomial coefficient), so we have to find the biggest number among

    C(2k+1, k), C(2k+3, k−1), ..., C(2k+2p+1, k−p), ..., C(4k−3, 2), C(4k−1, 1).

By comparing two neighbouring terms we obtain a cubic equation with parameter k, with only one real root, which is asymptotically p = ⌈0.223k⌉.
This gives C(2⌈1.223k⌉+1, ⌊0.777k⌋) · k as the biggest term. We summarize our results in the following theorem:

Theorem 1. The problem SwitchSmallDegs is linear fixed-parameter tractable and it can be solved in time O*(4.62^k).

Proof. A graph G with an isolated vertex can be obtained in time O(m). Then we just test some candidate sets, each of size at most k. It takes O(k · n) time to switch the graph according to the candidate set and O(n) time to check whether it is a solution. If n > 4k then we have at most one candidate. If n ≤ 4k then we can have up to C(2⌈1.223k⌉+1, ⌊0.777k⌋) · k candidates, as we have counted before. This grows approximately as 4.614^k.

4 Switching to a k-regular graph

SwitchReg (Switching to a k-regular graph)
Input: A graph G = (V, E)
Parameter: A positive integer k, the desired degree of the vertices
Task: Is there a subset of vertices A ⊆ V such that the graph S(G, A) is k-regular?

Kratochvíl [5] proved that the problem SwitchReg is NP-complete, but that it can be solved in polynomial time for any fixed k. We show fixed-parameter tractability of this problem. In an arbitrary k-regular graph each vertex has degree at most k; this means that we can use our previous algorithm to decide the problem. We just enumerate all the graphs of maximum degree at most k and check whether some of them is k-regular. But we can slightly improve the algorithm. In this case we always have to switch exactly k vertices, so if there is a candidate set of size k − p that leads to the solution, then there is also one using just the first n − p − 1 vertices different from v0. Using this idea we can bound the number of candidate tests by C(n−p−1, k−p) ≤ C(2k+p+1, k−p). Finding the bound for this is somewhat easier than in the previous case, the answer being C(⌈2.171k⌉+1, ⌊0.829k⌋) asymptotically (for p = ⌈0.171k⌉). Our results are summarized by the following theorem:
The problem SwitchReg is linear fixed-parameter tractable and can be solved in time $O^*(4.263^k)$.

5 Switching to a graph with at most k edges

Switching to a graph with at most k edges (SwitchMinEdges)
Input: A graph G = (V, E).
Parameter: A positive integer k — the desired number of edges.
Task: Is there a subset of vertices A ⊆ V such that the graph S(G, A) has at most k edges?

The problem SwitchMinEdges is NP-complete [6]. But since the desired graph has at most k edges, each vertex has degree at most k, so the problem is fixed-parameter tractable by arguments similar to the ones used for the SwitchSmallDegs problem. We present a much more efficient algorithm for this problem. Following the way our previous algorithm works, we may suppose that our graph has an isolated vertex v0 that is not switched in the solution sought. We denote by l the upper bound on the number of vertices we can still switch and by B the set of vertices already switched. At the beginning B := ∅, and during the entire algorithm we maintain l := k − |B|. We introduce two rules. They are more powerful and the proof is different.

Rule 1: Switch every vertex of degree greater than l.

Lemma 6 (Correctness of Rule 1). A vertex v ∈ V \ B \ {v0} of degree deg_{S(G,B)}(v) > l must be switched in order to obtain a solution.

Proof. Suppose that v ∈ V \ B \ {v0} is a vertex of degree d = deg_{S(G,B)}(v) > l and A ⊂ V \ B \ {v0} is a set such that its switching in S(G, B) gives a graph H. We show that if v ∉ A, then the graph H has too many edges. Denote D = A ∩ N_{S(G,B)}(v). Then {{v0, x} | x ∈ D ∪ B} and {{v, x} | x ∈ N_{S(G,B)}(v) \ D} are two disjoint sets of edges of the graph H such that the size of their union is |B| + |D| + |N_{S(G,B)}(v) \ D| = |B| + |N_{S(G,B)}(v)| = d + k − l > k. Thus the graph H has more than k edges and the switching of the set A does not lead to a solution.

Lemma 7 (Rule 2).
A vertex v ∈ V \ B \ {v0} of degree deg_{S(G,B)}(v) < n − l − 1 cannot be switched in order to obtain a solution.

Proof. Let again d = deg_{S(G,B)}(v) < n − l − 1, A ⊂ V \ B \ {v0} and H = S(G, A ∪ B). Suppose v ∈ A. Then {{v0, x} | x ∈ A ∪ B \ {v}} and {{v, x} | x ∈ V \ N_{S(G,B)}(v) \ A} are two disjoint sets of edges of the graph H. Their total size is at least (|B| + |A| − 1) + (n − d − |A|) = k − l − 1 + n − d > k − l − 1 + n − (n − l − 1) = k. Thus the graph H has more than k edges and the switching of the set A does not lead to a solution.

Lemma 8 (Boundary). If n > 2l + 1, then to each vertex either Rule 1 or Rule 2 applies.

The rest of the algorithm remains almost unchanged; only the kernel bound is now 2k. The time complexity can be determined in a similar way as in the previous cases, giving us the following theorem:

Theorem 3. SwitchMinEdges is fixed-parameter tractable and there is an algorithm running in time $O^*\left(\binom{2\lceil 0.611k\rceil + 1}{\lfloor 0.389k\rfloor}\right) \approx O^*(2.148^k)$.

6 Switching to an H-free graph

There are many polynomial time algorithms known that decide whether the input graph can be switched to a graph containing no subgraph isomorphic to a fixed graph H. Most of them use a reduction to 2-SAT, or to a system of linear equations over GF[2], or a characterization by forbidden subgraphs. But these methods cannot be applied when the number of switches is to be optimized. We show that these problems are fixed-parameter tractable with the same parameter (i.e. the number of switches allowed). Before we state the general theorem, we show the method on the special case of the problem:

Switching to a triangle-free graph (SwitchTriangle-Free)
Input: A graph G = (V, E).
Parameter: A positive integer k — the number of switches allowed.
Task: Is there a subset of vertices A ⊆ V of size at most k such that the graph S(G, A) is triangle-free?

Hayward et al. [7] and Hage et al.
[3] independently presented two different algorithms recognizing the graphs that can be switched to triangle-free graphs. Both of them work in time $O(n^3)$ by means of a reduction to 2-SAT. In contrast, it is known that Weighted 2-SAT is an NP-complete problem [1].

Theorem 4. It is possible to decide the problem SwitchTriangle-Free in time $O(3^k \cdot n^3)$.

Proof. We use the bounded search tree technique. Our algorithm is recursive and each call gets on the input a graph G, a number k and a set A of already switched vertices. At the beginning of each call, we search the graph S(G, A) for a triangle. If there is none, we answer YES and the set A determines the solution. Otherwise, if |A| = k or the graph G[A] = S(G, A)[A] contains a triangle, then we answer NO. If that is not the case, we pick an (arbitrary) triangle with the largest number of vertices in A. For each of its vertices v not in A we call the algorithm on the triple (G, k, A ∪ {v}) recursively. We answer YES if any of the calls answers so, taking the appropriate solution set, and NO otherwise. The result is obtained by the call on (G, k, ∅). Correctness and the time complexity of the algorithm are easy to check.

We generalize this result to an arbitrary set of forbidden induced subgraphs. Suppose that we are given a set S = {H1, H2, ..., Hp}. We consider this problem:

Switching to a graph with no induced subgraph from the set S (Switch S-free)
Input: A graph G = (V, E).
Parameter: A positive integer k — the number of switches allowed.
Task: Is there a subset of vertices A ⊆ V of size at most k such that the graph S(G, A) contains no induced subgraph from the set S?

Denote by l(S) = max{|V_H| : H ∈ S} the size of the biggest graph in the set S.

Theorem 5. For each finite set S it can be decided in time $O(l(S)^k \cdot n^{l(S)})$ whether the input graph G can be switched to a graph containing no induced subgraph from the set S. Hence the corresponding problem Switch S-free is fixed-parameter tractable for each finite set S of graphs.
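The search tree from the proof of Theorem 4 can be sketched directly. The following Python sketch is an illustration, not the authors' implementation: the helper names are ours, and the triangle search is brute force (exponential in n overall), whereas the theorem assumes an O(n^3)-per-node routine.

```python
from itertools import combinations

def switch(edges, n, A):
    """Seidel switch S(G, A): complement exactly the edges between A and V \\ A."""
    A = set(A)
    return {frozenset((u, v)) for u, v in combinations(range(n), 2)
            if (frozenset((u, v)) in edges) != ((u in A) != (v in A))}

def find_triangle(edges, n, prefer=frozenset()):
    """Return a triangle maximising its overlap with `prefer`, or None."""
    best = None
    for t in combinations(range(n), 3):
        if all(frozenset(p) in edges for p in combinations(t, 2)):
            if best is None or len(set(t) & prefer) > len(set(best) & prefer):
                best = t
    return best

def switch_triangle_free(edges, n, k, A=frozenset()):
    """Bounded search tree: branch on the three vertices of a triangle.
    Returns a switching set of size <= k, or None if there is none."""
    g = switch(edges, n, A)
    t = find_triangle(g, n, A)
    if t is None:
        return A                        # S(G, A) is triangle-free
    if len(A) == k or set(t) <= A:      # budget exhausted, or G[A] has a triangle
        return None
    for v in set(t) - A:                # some vertex of the triangle must be switched
        res = switch_triangle_free(edges, n, k, A | {v})
        if res is not None:
            return res
    return None
```

On K4, for example, no single switch destroys every triangle, but switching two vertices leaves a perfect matching, so the search succeeds with k = 2 and fails with k = 1.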
The proof is a slight modification of the ideas of the triangle-free case and we omit it here.

7 Switching to a bipartite graph

The last problem we consider is:

Switching to a bipartite graph (SwitchBipartite)
Input: A graph G = (V, E).
Parameter: A positive integer k — the number of switches allowed.
Task: Is there a subset of vertices A ⊆ V of size at most k such that the graph S(G, A) is bipartite?

This problem can be formulated as switching to a graph without odd cycles. As the set of forbidden graphs is infinite, our algorithm for Switch S-free does not immediately apply. Hage et al. [3] showed an algorithm for the case of an unbounded number of switches that works in time $O(n^3)$ by a reduction to 2-SAT. This indicated that the bounded problem could be significantly harder. We show fixed-parameter tractability of the problem.

We start with some key observations. First of all, each graph that is not bipartite contains not only an odd cycle, but also an induced odd cycle. Hence it is sufficient to get rid of induced odd cycles. Moreover, an odd cycle Cn of length n ≥ 7 cannot be switched to a bipartite graph [8]. Hence a graph that contains an odd cycle of length at least 7 as an induced subgraph cannot be switched to a bipartite graph. So we deal almost only with triangles and 5-cycles.

Fig. 1. A five-cycle (a), with one vertex (b), two adjacent (c) and two non-adjacent vertices (d) switched.

Moreover, if we have a 5-cycle ({1, 2, 3, 4, 5}, {{1,2}, {2,3}, {3,4}, {4,5}, {5,1}}), then we must switch at least two vertices to obtain a bipartite graph (see Fig. 1). The sets {1,3}, {2,4}, {3,5}, {4,1}, {5,2} and their complements determine the only suitable switches.
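These claims about the five-cycle can be verified exhaustively. The sketch below (the helper names are our own; the switch operation follows the definition of S(G, A)) switches every subset of the five vertices and tests bipartiteness of the result.

```python
from itertools import combinations

def switch(edges, n, A):
    """Seidel switch S(G, A): complement the edges between A and V \\ A."""
    A = set(A)
    return {frozenset((u, v)) for u, v in combinations(range(n), 2)
            if (frozenset((u, v)) in edges) != ((u in A) != (v in A))}

def is_bipartite(edges, n):
    """2-colour each component by depth-first search; fail on a conflict."""
    colour = {}
    for s in range(n):
        if s in colour:
            continue
        colour[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for v in range(n):
                if v != u and frozenset((u, v)) in edges:
                    if v not in colour:
                        colour[v] = 1 - colour[u]
                        stack.append(v)
                    elif colour[v] == colour[u]:
                        return False
    return True

n = 5
c5 = {frozenset((i, (i + 1) % n)) for i in range(n)}   # the 5-cycle on vertices 0..4
good = [A for r in range(n + 1) for A in combinations(range(n), r)
        if is_bipartite(switch(c5, n, A), n)]
```

The enumeration confirms the text: no set of fewer than two vertices works, and the successful sets are exactly the five non-adjacent pairs and their three-element complements.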
Since each of the suitable three-element sets contains one of the suitable two-element sets as a subset, when dealing with a five-cycle we can always switch one of the five two-element sets, possibly switching some more vertices in later steps. This significantly reduces the size of the search tree.

Now we introduce our recursive algorithm, leaving some details to the next paragraph. Each call gets on the input a graph G, a number k and a set A of already switched vertices. At the beginning of each call, we search the graph S(G, A) for the shortest odd cycle (the shortest one is always induced). If there is none, we answer YES and the set A determines the solution. Otherwise, if |A| = k or the entire cycle is contained in the graph G[A] = S(G, A)[A], then we answer NO. Otherwise we continue according to what we found:

– Triangle: For each of the vertices v of the triangle not in A, call the algorithm on the triple (G, k, A ∪ {v}) recursively. Answer YES if any of the calls answers so, taking the appropriate solution set, and NO otherwise.
– Five-cycle {a1, . . . , a5}: Check whether |A| is at most k − 2. If it is not, answer NO. Otherwise, for each of the sets {x, y} from {a1, a3}, {a2, a4}, {a3, a5}, {a4, a1} and {a5, a2} that has empty intersection with A, call the algorithm on (G, k, A ∪ {x, y}). Answer YES if any of the calls answers so, taking the appropriate solution set, and NO otherwise.
– An induced odd cycle of length seven or more: Answer NO.

The result is obtained by the call on (G, k, ∅).

The search for the shortest odd cycle can be done by running breadth-first search (BFS) from each vertex u and looking for an edge between two vertices at the same distance from u. When running from one vertex u it is sufficient to use each edge at most twice. As there are at most $n^2$ edges, the search can be done in time $O(n^3)$. The only remaining thing needed to bound the running time of the algorithm is the maximum number of recursive calls.
It is sufficient to count the number f(k) of leaves of the search tree of height k. It holds that f(0) = 1, and we have two recurrences for the function f(k):

f(k) = 3 · f(k − 1) for the triangle case,
f(k) = 5 · f(k − 2) for the five-cycle case.

The first recurrence gives the worse result of f(k) = 3^k. We summarize the result:

Theorem 6. SwitchBipartite can be solved in time $O(3^k \cdot n^3)$.

8 Open problems

The classical complexity of several problems in this area is still unknown. We mention at least one of them that is closely related to the problems we have solved.

Switching to a k-degenerate graph. Kratochvíl [5] presented an $O(n^{k+5})$ time algorithm for this problem. If k is a part of the input, no polynomial time algorithm is known. Fixed-parameter tractability would be a good alternative.

Acknowledgments

The author would like to thank his advisor Jan Kratochvíl for his useful advice and tips, and also Petr Hliněný for valuable discussions and suggestions.

References

1. Downey, R.G., Fellows, M.R.: Parameterized Complexity. Monographs in Computer Science. Springer (1999)
2. Seidel, J.: Graphs and two-graphs. In: Proc. 5th Southeastern Conf. on Combinatorics, Graph Theory, and Computing, Winnipeg, Canada, Utilitas Mathematica Publishing Inc. (1974)
3. Hage, J., Harju, T., Welzl, E.: Euler graphs, triangle-free graphs and bipartite graphs in switching classes. In: Proceedings of ICGT 2002. Volume 2505 of LNCS, Springer (2002) 148–160
4. Ondráčková, E.: Computational complexity in graph theory. Master's thesis, Department of Applied Mathematics, Faculty of Mathematics and Physics, Charles University in Prague (2006)
5. Kratochvíl, J.: Complexity of Hypergraph Coloring and Seidel's Switching. In Bodlaender, H.L., ed.: WG. Volume 2880 of Lecture Notes in Computer Science, Springer (2003) 297–308
6. Jelínková, E.: Three NP-Complete Optimization Problems in Seidel's Switching.
In: Proceedings of IWOCA 2007
7. Hayward, R.B., Hougardy, S., Reed, B.A.: Polynomial time recognition of P4-structure. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2002) 382–389
8. Hage, J., Harju, T.: Towards a characterization of bipartite switching classes by means of forbidden subgraphs. Technical Report UU-CS-2006-028, Institute of Information and Computing Sciences, Utrecht University (2006)

Computing the Moments of Costs over the Solution Space of the TSP in Polynomial Time

Paul J. Sutcliffe, Andrew Solomon, and Jenny Edwards
Faculty of Information Technology, University of Technology, Sydney, Sydney, Australia.
psutclif,andrews,jenny@it.uts.edu.au

Abstract. We give polynomial time algorithms to compute the third and fourth moments about the mean of tour costs over the solution space of the general symmetric Travelling Salesman Problem (TSP). These algorithms complement previous work on the population variance and provide a tractable method to compute the skewness and kurtosis of the probability distribution of tour costs. The methodology is generalisable to higher moments. Experimental evidence is given that suggests the skewness asymptotically approaches a limit point as the instance size is increased in several problem types.

1 Introduction

1.1 The TSP

The travelling salesman problem (TSP) is a classic problem in combinatorial optimization. Extensive references include [1–3]. Linear programming reductions are surveyed in [4], while the properties of frequently used local search heuristics are considered in [5]. It is natural to define the symmetric TSP in terms of a complete undirected graph Γ = (V, E), with the vertices V representing cities and the edges E representing the connections between cities. We label the set of n vertices as {1, 2, . . . , n}, and an n-cycle permutation of these is a tour or solution, π.
The set of all tours, the solution space, is denoted Θ. The distance between cities (or cost of an edge) is a function cost : E → R, which we extend to the function Ω : Θ → R, defined as the cost of a tour: $\Omega(\pi) = \sum_{i=1}^{n} cost(\{\pi(i), \pi((i \bmod n) + 1)\})$. The TSP is to find some n-cycle permutation π of V for which Ω(π) is smallest. Such a permutation π* is called a global minimum tour. If there are n cities then the number of tours is |Θ| = (n − 1)!/2.

1.2 Survey of Statistical Results

Previous theoretical work on the probability distribution of the TSP is surveyed in [6, 7]; these works largely concern the case of the Euclidean TSP with city coordinates as n random variables in bounded subsets of R^d. Beardwood et al. [8] prove that Ω(π*) approaches a constant as n → ∞. Steele proves the variance of costs over the solution space is bounded [6]. Rhee and Talagrand prove that the tails of the cost distribution approach those of a Gaussian as the number of cities increases [9]. In the more general case, Krauth and Mézard [10] extend the result of Beardwood et al. to problems with uniform random edge costs. More recently Wästlund [11] extends it to the TSP on bipartite graphs with uniform random edge costs. Basel et al. [12] show by random sampling a remarkable linear correlation between the square root of a problem's size and an estimate of the number of standard deviations between the mean tour cost and the known optimal tour cost in a real world set of approximately Euclidean problems. Sutcliffe et al. [13] give a constructive proof that the population variance of tour costs over the solution space of an instance of size n cities can be computed in O(n^2); see Theorem 1 below. Applying this, they confirm the linear relationship found by Basel et al. and show a similar, although non-linear, relationship in the case of a set of non-Euclidean real world problems.
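For brute-force cross-checks of the closed forms developed below, the solution space Θ can be enumerated directly for tiny n by fixing one city as the start and keeping a single orientation of each cycle. The following Python sketch uses our own naming and 0-based city labels; it is usable only for very small instances, since |Θ| grows factorially.

```python
from itertools import permutations
from math import factorial

def tours(n):
    """Enumerate Θ: fix city 0 first and keep one orientation of each
    cycle, so every tour appears exactly once, giving (n-1)!/2 tours."""
    for rest in permutations(range(1, n)):
        if rest[0] < rest[-1]:          # canonical direction of travel
            yield (0,) + rest

def tour_cost(tour, cost):
    """Ω(π): the sum of edge costs around the cycle."""
    n = len(tour)
    return sum(cost[frozenset((tour[i], tour[(i + 1) % n]))] for i in range(n))
```

For n = 6 this yields 5!/2 = 60 tours, and the sample mean of Ω over them agrees with the closed form for µ quoted in Sect. 1.3.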
1.3 Moments in Terms of the TSP

In terms of a TSP with solution space Θ, cost function Ω and mean tour cost µ, the kth moment about the mean, or central moment [14], can be written

$mm_k(\Theta) = \frac{\sum_{\pi\in\Theta} (\Omega(\pi) - \mu)^k}{|\Theta|}.$ (1)

It is reported in [15] (and a simple proof follows from Lemma 1) that the mean tour cost over the solution space of a problem of size n cities with edge set E is $\mu = \frac{2}{n-1}\sum_{e\in E} cost(e)$. The second moment, or population variance, is given by Theorem 1 below. Comparison of the second and third moments provides the well known statistic, the skewness, $\alpha_3(X) = mm_3(X) / mm_2(X)^{3/2}$, which reflects the degree of symmetry of a probability distribution [14].

Theorem 1. The population variance of tour costs over the solution space of a TSP of size n cities and with edge set E and vertex set V is

$var = \frac{2\beta_1}{n-1} - \frac{4\beta_1 + 2\beta_2}{(n-1)(n-2)}$

with the values β1, β2 being defined as

$\beta_1 = \sum_{e\in E} c_0(e)^2$ (2)

$\beta_2 = \sum_{e=\{x,y\}\in E} \left[ c_0(e)(S_x + S_y - 2c_0(e)) \right]$ (3)

where $c_0(e) = cost(e) - \mu/n$, I_x is the set of edges incident to a vertex x with $S_x = \sum_{e\in I_x} c_0(e)$, and similarly for S_y.

2 The Third and Fourth Moment of Costs over the Solution Space

We begin with a technical lemma providing the number of tours containing various configurations of edges. Table 1 enumerates the eleven cases to be used.

Lemma 1. Given a TSP with graph Γ, let P be a set of m non-cyclic, non-singleton paths over Γ sharing no vertices. Let k be the number of vertices not appearing in any path of P. Then there are $2^{m-1}(k + m - 1)!$ tours containing all the paths in P.

Proof. Label the paths of P as p_j with j ∈ [1 . . . m]. We recall that a tour is a cyclic permutation of vertices. Therefore, without loss of generality, fix p1 in position and orientation and write a tour as (p1, i1, i2, . . . , iq, p2, i_{q+1}, . . . , pm, . . . , ik). There are (k + m − 1)! orderings of the free paths and vertices.
Each path is at least 2 vertices long, and so each of the m − 1 free paths has 2 orientations, implying the result. ⊓⊔

Table 1. The eleven ways that up to four unlabelled edges can be arranged into paths in tours of size n. The − character represents an edge, so −− means a path with 2 edges and three vertices. The ! symbol represents the (possibly empty) set of free vertices between unconnected paths. The number of paths is given by m, while k is the number of free vertices.

case  pattern     m  k        num. tours  cities n
1     −!          1  (n − 2)  (n − 2)!    n > 2
2     −−!         1  (n − 3)  (n − 3)!    n > 2
3     −!−!        2  (n − 4)  2(n − 3)!   n > 3
4     −−−!        1  (n − 4)  (n − 4)!    n > 3
5     −−!−!       2  (n − 5)  2(n − 4)!   n > 4
6     −!−!−!      3  (n − 6)  4(n − 4)!   n > 5
7     −−−−!       1  (n − 5)  (n − 5)!    n > 4
8     −−−!−!      2  (n − 6)  2(n − 5)!   n > 5
9     −−!−−!      2  (n − 6)  2(n − 5)!   n > 6
10    −−!−!−!     3  (n − 7)  4(n − 5)!   n > 6
11    −!−!−!−!    4  (n − 8)  8(n − 5)!   n > 7

2.1 Computing the Third Moment

In order to prove our central theorem we provide some notational machinery. Let Θ be the solution space of a TSP with edge set E and cost function Ω. We index each π in Θ with an integer m ∈ [1 . . . |Θ|]; similarly we label the edges of E as e_i with i ∈ [1 . . . |E|]. We define the function t : [1 . . . |Θ|] × [1 . . . |E|] → {0, 1} by t_{mi} = 1 if edge e_i is in tour m, and t_{mi} = 0 otherwise. Under this arrangement, if m is the index of a tour π, then the cost of π is $\Omega(\pi) = t_{m1}\,cost(e_1) + t_{m2}\,cost(e_2) + \ldots + t_{m|E|}\,cost(e_{|E|})$, and specializing (1) to k = 3, the third moment about the mean µ is

$mm_3(\Theta) = \frac{\sum_{m=1}^{|\Theta|} \left( t_{m1}\,cost(e_1) + t_{m2}\,cost(e_2) + \ldots + t_{m|E|}\,cost(e_{|E|}) - \mu \right)^3}{|\Theta|}.$ (4)

Now |Θ| is, of course, factorial in n, and so this formulation is impractical for all but the smallest problems. In Theorem 2 we give a polynomial time solution to the problem. Returning to notational matters, let A_p be the set of edges adjacent to edge e_p. Let N_{p,q,...} be the set of edges neither adjacent to nor equal to the edges e_p, e_q, . . .
, so $N_{p,q,\ldots} = E - (A_p \cup \{e_p\} \cup A_q \cup \{e_q\} \cup \ldots)$.

Theorem 2. The third moment about the mean of tour costs over the solution space of a TSP with n > 3 cities, mean tour cost µ, and with edge set E is

$mm_3 = \frac{2\gamma_1}{n-1} + \frac{2(\gamma_2 + 2\gamma_3)}{(n-1)(n-2)} + \frac{2(\gamma_4 + 2\gamma_5 + 4\gamma_6)}{(n-1)(n-2)(n-3)}$

with the values γ1, γ2, γ3, γ4, γ5, γ6 given by

$\gamma_1 = \sum_{e\in E} c_0(e)^3$

$\gamma_2 = 3 \sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in A_p} c_0(e_q)$

$\gamma_3 = 3 \sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in N_p} c_0(e_q)$

$\gamma_4 = 3 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in A_q - (A_p \cup \{e_p\})} c_0(e_r)$

$\gamma_5 = 3 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r)$

$\gamma_6 = \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in N_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r)$

where $c_0(e) = cost(e) - \mu/n$.

Proof. Consider (4). Each tour has only n edges, so for a given m there are just n values t_{mi} equal to 1, the remainder being equal to 0. So let $c_0(e_i) = cost(e_i) - \mu/n$; since each tour contributes exactly n nonzero terms, subtracting µ/n per edge accounts for the subtraction of µ. Then (4) is written

$mm_3(\Theta) = \frac{\sum_{m=1}^{|\Theta|} \left( t_{m1} c_0(e_1) + t_{m2} c_0(e_2) + \ldots + t_{m|E|} c_0(e_{|E|}) \right)^3}{|\Theta|} = \frac{\sum_{m=1}^{|\Theta|} \sum_{k=1}^{|E|} \sum_{j=1}^{|E|} \sum_{i=1}^{|E|} t_{mi} t_{mj} t_{mk}\, c_0(e_i) c_0(e_j) c_0(e_k)}{|\Theta|}.$

The product $t_{mi} t_{mj} t_{mk} = 1$ if and only if tour m contains the edges e_i, e_j, e_k, and there are six ways in which this can occur:

case 1 All of e_i, e_j, e_k are equal. By Lemma 1 and Case 1 of Table 1 there are (n − 2)! tours containing the edge.
case 2 Two of e_i, e_j, e_k are equal and the third is adjacent. By Lemma 1 and Case 2 of Table 1 there are (n − 3)! tours containing the edges so configured.
case 3 Two of e_i, e_j, e_k are equal and the third is non-adjacent to them. By Lemma 1 and Case 3 of Table 1 there are 2(n − 3)! tours containing the edges so configured.
case 4 The three edges e_i, e_j, e_k form a path. By Lemma 1 and Case 4 of Table 1 there are (n − 4)! tours containing the edges so configured.
case 5 Two of e_i, e_j, e_k are adjacent and the third is non-adjacent to either. By Lemma 1 and Case 5 of Table 1 there are 2(n − 4)!
tours containing the three edges so configured.
case 6 All of e_i, e_j, e_k are non-adjacent to each other. By Lemma 1 and Case 6 of Table 1 there are 4(n − 4)! tours containing the three edges so configured.

For each of these six cases we write the sum of edge cost products as γ1 to γ6 as in the statement of the theorem. Upon collecting like terms we have:

$mm_3(\Theta) = \left( (n-2)!\,\gamma_1 + (n-3)!\,\gamma_2 + 2(n-3)!\,\gamma_3 + (n-4)!\,\gamma_4 + 2(n-4)!\,\gamma_5 + 4(n-4)!\,\gamma_6 \right) / |\Theta|$
$= \frac{2\left( (n-2)!\,\gamma_1 + (n-3)!(\gamma_2 + 2\gamma_3) + (n-4)!(\gamma_4 + 2\gamma_5 + 4\gamma_6) \right)}{(n-1)!}$
$= \frac{2\gamma_1}{n-1} + \frac{2(\gamma_2 + 2\gamma_3)}{(n-1)(n-2)} + \frac{2(\gamma_4 + 2\gamma_5 + 4\gamma_6)}{(n-1)(n-2)(n-3)},$

as required. ⊓⊔

2.2 Reducing the Computational Complexity of the Third Moment

The set A_p is O(n) in size, while the sets E, N_p, N_{p,q} are all O(n^2) in size. This implies that a naive application of Theorem 2 above would have complexity O(n^6), being that of the sum γ6. Here we show that this can be reduced to O(n^4). Let I_x be the set of edges incident to the vertex x and let $S_x = \sum_{e\in I_x} c_0(e)$ be the sum of edge costs incident to x. Now |I_x| = n − 1, so the time complexity of pre-computing all n values S_x is O(n^2) and the space complexity of saving them is O(n).

Lemma 2. γ2 can be found in O(n^2).

Proof. Recall that $\gamma_2 = 3\sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in A_p} c_0(e_q)$. Consider the right-most sum, over A_p. We show this can be found in constant time. Writing each edge e_p as e_p = {p1, p2} and noting that $A_p = (I_{p_1} \cup I_{p_2}) - \{e_p\}$ gives

$\gamma_2 = 3\sum_{e_p\in E} c_0(e_p)^2 (S_{p_1} + S_{p_2} - 2c_0(e_p)) = -6\gamma_1 + 3\sum_{e_p\in E} c_0(e_p)^2 (S_{p_1} + S_{p_2}).$

This along with |E| ∈ O(n^2) implies the result. ⊓⊔

Lemma 3. γ3 = −γ2 − 3γ1.

Proof. Recall that $\gamma_3 = 3\sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in N_p} c_0(e_q)$. Consider the right-most sum, over $N_p = E - (A_p \cup \{e_p\})$.
So $\sum_{e\in N_p} c_0(e) = \sum_{e\in E} c_0(e) - \sum_{e\in A_p} c_0(e) - c_0(e_p)$, but $\sum_{e\in E} c_0(e) = 0$, thus

$\gamma_3 = 3\sum_{e_p\in E} c_0(e_p)^2 \left[ -\sum_{e_q\in A_p} c_0(e_q) - c_0(e_p) \right] = -3\sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in A_p} c_0(e_q) - 3\sum_{e_p\in E} c_0(e_p)^3 = -\gamma_2 - 3\gamma_1,$

as required. ⊓⊔

Lemma 4. γ4 can be found in O(n^3).

Proof. Recall that $\gamma_4 = 3\sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in A_q - (A_p\cup\{e_p\})} c_0(e_r)$. We show that the right-most sum can be found in constant time given e_p and e_q. Let e_p = {s, p}, let e_q = {s, q} be adjacent to it, sharing the vertex s, and let e_{pq} = {p, q} be adjacent to both. In addition, let I_q be the set of edges incident to the vertex q and let S_q be the pre-computed edge sum. Then $A_q - (A_p \cup \{e_p\}) = I_q - (\{e_q\} \cup \{e_{pq}\})$ and

$\gamma_4 = 3\sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \left( S_q - c_0(e_q) - c_0(e_{pq}) \right).$

This along with |E| ∈ O(n^2) and |A_p| ∈ O(n) implies the result. ⊓⊔

Lemma 5. γ5 can be found in O(n^3).

Proof. Recall that $\gamma_5 = 3\sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r)$. We show that the right-most sum can be found in constant time. Let e_p = {s, p}, let e_q = {s, q} be adjacent to it, sharing the vertex s, and let e_{pq} = {p, q} be adjacent to both. In addition, let I_s, I_p, I_q be the sets of edges incident to the vertices s, p, q respectively and let S_s, S_p, S_q be the pre-computed edge sums. Now $N_{p,q} = E - (I_s \cup I_p \cup I_q)$, but $\sum_{e\in E} c_0(e) = 0$ and the edges e_{pq}, e_q, e_p are each elements of two of I_s, I_p, I_q, so $\sum_{e_r\in N_{p,q}} c_0(e_r) = c_0(e_p) + c_0(e_q) + c_0(e_{pq}) - S_s - S_p - S_q$ and

$\gamma_5 = 3\sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \left[ c_0(e_p) + c_0(e_q) + c_0(e_{pq}) - S_s - S_p - S_q \right]$
$= 2\gamma_2 + 3\sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \left[ c_0(e_{pq}) - S_s - S_p - S_q \right].$

This along with |E| ∈ O(n^2) and |A_p| ∈ O(n) implies the result. ⊓⊔

Lemma 6. γ6 can be found in O(n^4).

Proof. Recall that $\gamma_6 = \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in N_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r)$. We show that the right-most sum can be found in constant time and that the middle sum can be rewritten over A_p.
Let e_p = {p1, p2} and let e_q = {q1, q2} be non-adjacent. In addition, let I_{p1}, I_{p2}, I_{q1}, I_{q2} be the sets of edges incident to these vertices and let S_{p1}, S_{p2}, S_{q1}, S_{q2} be the pre-computed edge sums. Now $N_{p,q} = E - (I_{p_1} \cup I_{p_2} \cup I_{q_1} \cup I_{q_2})$, but $\sum_{e\in E} c_0(e) = 0$ and the edges e_p, e_q, {p1, q1}, {p1, q2}, {p2, q1}, {p2, q2} are each elements of two of I_{p1}, I_{p2}, I_{q1}, I_{q2}. Therefore write

$SN_{p,q} = \sum_{e\in N_{p,q}} c_0(e) = -S_{p_1} - S_{p_2} - S_{q_1} - S_{q_2} + c_0(e_p) + c_0(e_q) + c_0(\{p_1,q_1\}) + c_0(\{p_1,q_2\}) + c_0(\{p_2,q_1\}) + c_0(\{p_2,q_2\})$

and

$\gamma_6 = \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in N_p} c_0(e_q)\, SN_{p,q},$

as required. ⊓⊔

Theorem 3. The complexity of computing the third moment about the mean of tour costs over the solution space of a TSP with n cities is O(n^4).

Proof. This follows directly from Theorem 2, the comments at the beginning of Sect. 2.2, and Lemmas 2 to 6. ⊓⊔

2.3 Computing the Fourth Moment about the Mean of Tour Costs

Theorem 4. The fourth moment about the mean of tour costs over the solution space of a TSP with mean tour cost µ, size n > 5 cities and with edge set E is

$mm_4 = \frac{2\delta_1}{n-1} + \frac{2(\delta_{2a} + \delta_{2b} + 2\delta_{3a} + 2\delta_{3b})}{(n-1)(n-2)} + \frac{2(\delta_{4a} + \delta_{4b} + 2\delta_{5a} + 2\delta_{5b} + 4\delta_6)}{(n-1)(n-2)(n-3)} + \frac{2(\delta_7 + 2\delta_8 + 2\delta_9 + 4\delta_{10} + 8\delta_{11})}{(n-1)(n-2)(n-3)(n-4)},$

with the values δ1, δ2a, δ2b, δ3a, δ3b, δ4a, δ4b, δ5a, δ5b, δ6, δ7, δ8, δ9, δ10, δ11 given by

$\delta_1 = \sum_{e\in E} c_0(e)^4$

$\delta_{2a} = 3 \sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in A_p} c_0(e_q)^2$

$\delta_{2b} = 4 \sum_{e_p\in E} c_0(e_p)^3 \sum_{e_q\in A_p} c_0(e_q)$

$\delta_{3a} = 3 \sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in N_p} c_0(e_q)^2$

$\delta_{3b} = 4 \sum_{e_p\in E} c_0(e_p)^3 \sum_{e_q\in N_p} c_0(e_q)$

$\delta_{4a} = 12 \sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in A_q,\ e_r\notin A_p,\ e_r\ne e_p} c_0(e_r)$

$\delta_{4b} = 6 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q)^2 \sum_{e_r\in A_q,\ e_r\notin A_p,\ e_r\ne e_p} c_0(e_r)$

$\delta_{5a} = 12 \sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r)$

$\delta_{5b} = 6 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r)^2$

$\delta_6 = 6 \sum_{e_p\in E} c_0(e_p)^2 \sum_{e_q\in N_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r)$
$\delta_7 = 12 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in A_q,\ e_r\notin A_p,\ e_r\ne e_p} c_0(e_r) \sum_{e_s\in A_r,\ e_s\notin A_q,\ e_s\notin A_p} c_0(e_s)$

$\delta_8 = 12 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in A_q,\ e_r\notin A_p,\ e_r\ne e_p} c_0(e_r) \sum_{e_s\in N_{p,q,r}} c_0(e_s)$

$\delta_9 = 3 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r) \sum_{e_s\in A_r,\ e_s\in N_{p,q}} c_0(e_s)$

$\delta_{10} = 6 \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in A_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r) \sum_{e_s\in N_{p,q,r}} c_0(e_s)$

$\delta_{11} = \sum_{e_p\in E} c_0(e_p) \sum_{e_q\in N_p} c_0(e_q) \sum_{e_r\in N_{p,q}} c_0(e_r) \sum_{e_s\in N_{p,q,r}} c_0(e_s)$

where $c_0(e) = cost(e) - \mu/n$.

Proof. Specializing (1) to k = 4 and proceeding as we did for the third moment, we have

$mm_4(\Theta) = \frac{\sum_{m=1}^{|\Theta|} \sum_{l=1}^{|E|} \sum_{k=1}^{|E|} \sum_{j=1}^{|E|} \sum_{i=1}^{|E|} t_{mi} t_{mj} t_{mk} t_{ml}\, c_0(e_i) c_0(e_j) c_0(e_k) c_0(e_l)}{|\Theta|}.$

The product $t_{mi} t_{mj} t_{mk} t_{ml} = 1$ if and only if tour m contains the edges e_i, e_j, e_k, e_l, and there are eleven ways in which this can occur.

case 1 All of e_i, e_j, e_k, e_l are equal. By Lemma 1 there are (n − 2)! tours containing the edge. The value δ1 is the sum of terms in this case.
case 2 Among e_i, e_j, e_k, e_l there are 2 distinct edges and they form a path. By Lemma 1 there are (n − 3)! tours containing the edges. The values δ2a, δ2b are the sums of terms in this case.
case 3 Among e_i, e_j, e_k, e_l there are 2 distinct edges and they are non-adjacent. By Lemma 1 there are 2(n − 3)! tours containing the edges. The values δ3a, δ3b are the sums of terms in this case.
case 4 Among e_i, e_j, e_k, e_l there are 3 distinct edges and they form a path. By Lemma 1 there are (n − 4)! tours containing the edges. The values δ4a, δ4b are the sums of terms in this case.
case 5 Among e_i, e_j, e_k, e_l there are 3 distinct edges, two of which form a path; the third is non-adjacent. By Lemma 1 there are 2(n − 4)! tours containing the edges. The values δ5a, δ5b are the sums of terms in this case.
case 6 Among e_i, e_j, e_k, e_l there are 3 distinct edges, all non-adjacent. By Lemma 1 there are 4(n − 4)!
tours containing the edges. The value δ6 is the sum of terms in this case.
case 7 The edges e_i, e_j, e_k, e_l are distinct and form a path. By Lemma 1 there are (n − 5)! tours containing the edges. The value δ7 is the sum of terms in this case.
case 8 The edges e_i, e_j, e_k, e_l are distinct; 3 form a path and the other is non-adjacent. By Lemma 1 there are 2(n − 5)! tours containing the edges. The value δ8 is the sum of terms in this case.
case 9 The edges e_i, e_j, e_k, e_l are distinct and form 2 non-adjacent paths of 2 edges each. By Lemma 1 there are 2(n − 5)! tours containing the edges. The value δ9 is the sum of terms in this case.
case 10 The edges e_i, e_j, e_k, e_l are distinct; two are adjacent and the remaining ones are non-adjacent. By Lemma 1 there are 4(n − 5)! tours containing the edges. The value δ10 is the sum of terms in this case.
case 11 The edges e_i, e_j, e_k, e_l are distinct and all non-adjacent. By Lemma 1 there are 8(n − 5)! tours containing the edges. The value δ11 is the sum of terms in this case.

For each of these cases we write the sum of edge cost products as δ1 to δ11 as above. Upon collecting like terms we have

$mm_4(\Theta) = \frac{(n-2)!\,\delta_1}{|\Theta|} + \frac{(n-3)!\,\delta_{2a} + (n-3)!\,\delta_{2b} + 2(n-3)!\,\delta_{3a} + 2(n-3)!\,\delta_{3b}}{|\Theta|} + \frac{(n-4)!\,\delta_{4a} + (n-4)!\,\delta_{4b} + 2(n-4)!\,\delta_{5a} + 2(n-4)!\,\delta_{5b} + 4(n-4)!\,\delta_6}{|\Theta|} + \frac{(n-5)!\,\delta_7 + 2(n-5)!\,\delta_8 + 2(n-5)!\,\delta_9 + 4(n-5)!\,\delta_{10} + 8(n-5)!\,\delta_{11}}{|\Theta|}.$

Recall |Θ| = (n − 1)!/2, so upon cancellation we have the result. ⊓⊔

3 Empirical Examination of the Relationship between Skewness and Problem Size

We examine four problem sets, two real world and two randomly generated. The four types are summarized in Table 2.

Table 2. Problem types

Problem Type      Size in Cities  Cases  Problem Description
Random Euclidean  10–1000         21     2-d Euclidean metric of TSPLIB [16].
VLSI              131–984         10     2-d Euclidean metric of TSPLIB.
Random no embed.  10–1000         21     Random integer edge costs from U(0, 999).
RH Data           68–662          39     Non-Euclidean. Genomics problems.
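Before turning to the measurements, a quick sanity check of Theorem 1 against brute-force enumeration on a tiny random instance is sketched below (the function names are our own, and the brute force is factorial-time, so it is usable only for testing). With mm2 and mm3 in hand, the skewness is simply mm3 / mm2^{3/2}.

```python
import random
from itertools import combinations, permutations

def brute_moments(n, cost):
    """Exact mu, mm2, mm3 by enumerating all (n-1)!/2 tours (tiny n only)."""
    costs = []
    for rest in permutations(range(1, n)):
        if rest[0] < rest[-1]:                        # one orientation per cycle
            t = (0,) + rest
            costs.append(sum(cost[frozenset((t[i], t[(i + 1) % n]))]
                             for i in range(n)))
    mu = sum(costs) / len(costs)
    mm2 = sum((c - mu) ** 2 for c in costs) / len(costs)
    mm3 = sum((c - mu) ** 3 for c in costs) / len(costs)
    return mu, mm2, mm3

def variance_theorem1(n, cost):
    """O(n^2) population variance following Theorem 1."""
    mu = 2 / (n - 1) * sum(cost.values())             # closed form for the mean
    c0 = {e: w - mu / n for e, w in cost.items()}     # c0(e) = cost(e) - mu/n
    S = dict.fromkeys(range(n), 0.0)                  # S_x: sum of c0 over edges at x
    for e, w in c0.items():
        for x in e:
            S[x] += w
    b1 = sum(w * w for w in c0.values())
    b2 = sum(w * (sum(S[x] for x in e) - 2 * w) for e, w in c0.items())
    return 2 * b1 / (n - 1) - (4 * b1 + 2 * b2) / ((n - 1) * (n - 2))
```

On a random 7-city instance the closed form agrees with the enumerated second moment to floating-point accuracy.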
Of the real world sets, the first set originated in the production of very large scale integrated circuits (VLSI) and uses the 2-dimensional Euclidean metric of [16]. The second set, of 39 instances, approximately obeys the triangle inequality but is non-Euclidean. These instances originate in the genomics community and arise from physical mapping of canine DNA by the radiation-hybrid (RH) method. The specific data set used was obtained from the RHDF9000 dog radiation hybrid panel [17].

Fig. 1. The skewness versus the problem size in four problem types.

3.1 Results

The skewness of each instance was found using Theorems 1 and 2 in conjunction with Lemmas 2 to 6. Figure 1 shows its relationship to problem size. The relationship suggests that, in the case of the non-RH data sets, the skewness asymptotically approaches 0 with size. The RH data set is also somewhat suggestive of convergence, but to a lower limit point.

4 Conclusions and Future Work

In this paper we have given constructive proofs that the third central moment of tour costs over the solution space of any instance of a TSP of size n cities can be computed in O(n^4) and that the fourth central moment can be computed in O(n^8). Experience with the third moment suggests this computational complexity may be reduced to O(n^6). The method can be generalised to higher moments (at increased cost) and to variations of the problem such as the asymmetric TSP. Previous theoretical work on the probability distribution of the TSP was largely confined to the Euclidean case and did not extend to providing the moments. Future work will investigate the role of the third and fourth moments in refining current methods to estimate the optimal solution cost and in understanding the solution space of the problem.
Experimental evidence is given suggesting that the skewness asymptotically approaches 0 as the problem size is increased, in randomly generated non-embeddable instances and in both random and real-world 2-dimensional Euclidean instances. This implies that in these problem types the distribution of tour costs becomes more symmetric as the problem size increases. This may make it possible to find bounds on the value of the odd moments of the cost distribution in certain classes of problem.
Acknowledgements
The authors would like to thank Andre Rohe, Simon de Givry, Thomas Schiex, Christophe Hitte and Jill Maddox for their assistance in obtaining the real world data sets.
References
1. Gutin, G., Punnen, A.P.: The Traveling Salesman Problem and Its Variations. Kluwer Academic Publishers (2002)
2. Ausiello, G., Protasi, M., Marchetti-Spaccamela, A., Gambosi, G., Crescenzi, P., Kann, V.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer-Verlag New York, Inc., Secaucus, NJ, USA (1999)
3. Reinelt, G.: The Traveling Salesman: Computational Solutions for TSP Applications. Springer-Verlag (1994). LNCS 840
4. Orman, A.J., Williams, H.P.: A survey of different integer programming formulations of the travelling salesman problem. Working Paper LSEOR 04.67, Department of Operational Research, London School of Economics and Political Science, London (2004)
5. Colletti, B., Barnes, J.: Local search structure in the symmetric travelling salesperson problem under a general class of rearrangement neighborhoods. Applied Mathematics Letters 14(1) (2001) 105–108
6. Steele, J.M.: Probability Theory and Combinatorial Optimization. SIAM (1997)
7. Yukich, J.E.: Probability Theory of Classical Euclidean Optimization Problems. Volume 1675 of Lecture Notes in Mathematics. Springer (1998)
8. Beardwood, J., Halton, J., Hammersley, J.: The shortest path through many points. Proc.
Cambridge Philos. Soc. 55 (1959) 299–327
9. Rhee, W.T., Talagrand, M.: A sharp deviation inequality for the stochastic traveling salesman problem. Annals of Probability 17 (1989) 1–8
10. Krauth, W., Mézard, M.: The cavity method and the travelling-salesman problem. Europhys. Lett. 8(3) (1988)
11. Wästlund, J.: The limit in the mean field bipartite travelling salesman problem. Unpublished manuscript (2006), www.mai.liu.se/jowas
12. Basel III, J., Willemain, T.R.: Random tours in the traveling salesman problem: analysis and application. Comput. Optim. Appl. 20(2) (2001) 211–217
13. Sutcliffe, P., Solomon, A., Edwards, J.: Finding the population variance of costs over the solution space of the TSP in polynomial time. In Psarris, K., Jones, A.D., eds.: Math 07, Proceedings of the Eleventh WSEAS International Conference on Applied Mathematics, WSEAS (March 22-24, 2007) 23–28
14. Freund, J.E.: Mathematical Statistics. First edn. Prentice Hall (1972)
15. Punnen, A., Margot, F., Kabadi, S.: TSP heuristics: Domination analysis and complexity. Technical report, Dept. of Mathematics, Univ. of Kentucky (2001)
16. Reinelt, G.: TSPLIB — a traveling salesman problem library. ORSA Journal on Computing 3(4) (1991) 376–384
17. Faraut, T., de Givry, S., Chabrier, P., Derrien, T., Galibert, F., Hitte, C., Schiex, T.: A comparative genome approach to marker ordering. In: Proc. of ECCB-06 (2007) 7p.

Vertex Coloring of Chordal+k1 e−k2 e Graphs
Yasuhiko Takenaga and Yusuke Miura⋆
The University of Electro-Communications, Chofu, Tokyo, Japan
takenaga@cs.uec.ac.jp
Abstract. F+ke (F−ke, resp.) graphs are the classes of graphs which can be obtained by adding (deleting, resp.) at most k edges to a graph in a graph class F. The complexity of problems on these graphs can be measured in terms of the parameter k, which represents the closeness to the graph class F.
In this paper, we consider chordal+k1 e−k2 e graphs, the classes of graphs obtained from chordal graphs by adding and deleting edges at the same time. We give an algorithm for vertex coloring of chordal+k1 e−k2 e graphs and prove that it is fixed parameter tractable.
1 Introduction
Many graph problems are NP-complete for general graphs. It is natural to expect that if a graph problem is tractable for a graph class F, it is also tractable for a class of graphs which are close to graphs in F. The classes of graphs obtained by adding or deleting edges or vertices of an F-graph are very natural classes which are close to F-graphs. F+ke (F−ke, resp.) graphs are the classes of graphs obtained by adding (deleting, resp.) k edges to F-graphs. Similarly, F+kv (F−kv, resp.) graphs are the classes of graphs obtained by adding (deleting, resp.) k vertices to F-graphs. Recently, the complexity of several problems on such parameterized graph classes has attracted interest from the viewpoint of parameterized complexity [2, 5, 7]. In general, problems become more difficult as k increases. A problem with parameter k is said to be fixed parameter tractable if it can be solved in f(k)·n^c time, where f is an arbitrary function depending only on k, n is the size of the instance, and c is a constant [3]. Though such parameterized graph classes have been studied recently, only the addition or the deletion of edges (or vertices) to F-graphs has been considered; adding and deleting edges at the same time has not been considered yet. In this paper, we first consider F+k1 e−k2 e graphs, the classes of graphs obtained by adding at most k1 edges and deleting at most k2 edges from F-graphs. We call the added edges the plus modulators and the removed edges the minus modulators; the modulators are the union of the plus and minus modulators. In this paper, we consider the vertex coloring problem of chordal+k1 e−k2 e graphs.
For fixed k1 and k2, chordal+k1 e−k2 e graphs can be recognized in polynomial time by exhaustively checking all possibilities for the plus and minus modulators. However, it is not known whether recognition is fixed parameter tractable. Thus, we assume that chordal+k1 e−k2 e graphs are given together with their modulators.
⋆ Presently with SIC CO., LTD.
In this paper, we show that the vertex coloring problem of chordal+k1 e−k2 e graphs is fixed parameter tractable for parameters k1 and k2, that is, it can be solved in f(k1, k2)·n^c time. When only the addition or the deletion of edges is made to chordal graphs, it is already known that vertex coloring is fixed parameter tractable. For chordal+ke graphs, Marx showed a fixed parameter tractable algorithm [7]. For chordal−ke graphs, it is known that vertex coloring of F−ke graphs is fixed parameter tractable if F is closed under edge contraction [2]. Our algorithm for coloring chordal+k1 e−k2 e graphs is based on the idea of the algorithm of [7], with additional operations to deal with minus modulators.
This paper is organized as follows. Section 2 gives the definition of tree decomposition. In Section 3, we overview the vertex coloring algorithm for chordal+ke graphs. In Section 4, we show that vertex coloring of chordal+k1 e−k2 e graphs is fixed parameter tractable.
2 Tree Decomposition
A graph which does not contain an induced cycle of length four or larger is called a chordal graph. In this section, we give the characterization of chordal graphs in relation to tree decompositions [4].
Theorem 1. The following two statements are equivalent.
– G(V, E) is a chordal graph.
– There exist a tree T(U, F) and a subtree Tv ⊆ T for each v ∈ V such that (u, v) ∈ E iff Tu ∩ Tv ≠ ∅.
The tree T (with its subtrees Tv) is called the tree decomposition of G. A tree decomposition of a chordal graph can be found in linear time [1].
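Theorem 1's clique-tree characterization is equivalent to chordal graphs admitting a perfect elimination ordering, and the bags encountered while eliminating simplicial vertices are exactly the node sets of a tree decomposition. A minimal (non-linear-time) sketch of a chordality check along these lines, with illustrative names:

```python
def perfect_elimination_order(adj):
    """Repeatedly delete a simplicial vertex (one whose remaining
    neighbourhood is a clique); a graph is chordal iff this never gets
    stuck. Returns an elimination order, or None if not chordal."""
    g = {v: set(ns) for v, ns in adj.items()}
    order = []
    while g:
        for v in g:
            nb = g[v]
            if all(u in g[w] for u in nb for w in nb if u != w):
                order.append(v)       # bag {v} | nb is a decomposition node
                for u in nb:
                    g[u].discard(v)
                del g[v]
                break
        else:
            return None               # no simplicial vertex left
    return order

# A small chordal graph (two triangles joined by a path plus a pendant):
chordal = {'a': {'b'}, 'b': {'a', 'c', 'e'}, 'c': {'b', 'e'},
           'd': {'e', 'f'}, 'e': {'b', 'c', 'd', 'f'}, 'f': {'d', 'e'}}
assert perfect_elimination_order(chordal) is not None
# C4, the smallest non-chordal graph, has no simplicial vertex:
c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
assert perfect_elimination_order(c4) is None
```

The linear-time algorithm of [1] achieves the same result via lexicographic BFS; the quadratic scan above is only meant to make the characterization concrete.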
Note that we use the term ‘node’ for a decomposition tree and ‘vertex’ for a graph to be colored. For a node x ∈ U, let Vx be the set of vertices of G whose corresponding subtrees include x or a descendant of x. Let the subgraph of G induced by Vx be Gx = G[Vx]. For a node x ∈ U, let Kx be the set of vertices v such that x is a node in Tv. We consider T as a rooted tree with some root. A tree decomposition that satisfies the following conditions is called a nice tree decomposition [6]. A nice tree decomposition can be obtained from a tree decomposition in polynomial time.
– Each node x ∈ U has at most two children.
– If x ∈ U has two children y, z, then Kx = Ky = Kz holds. In this case, x is called a join node.
– If x ∈ U has only one child y, either Kx = Ky ∪ {v} or Kx = Ky \ {v} holds for some v ∈ V. In the former case, x is called an introduce node, and in the latter case, x is called a forget node.
– If x ∈ U has no child, Kx consists of exactly one vertex. In this case, x is called a leaf node.
[Figure 1: a chordal graph G on vertices a–f and its nice tree decomposition rooted at r; the nodes are labeled with their vertex sets Kx (def, de, bde, bd, abd, ab, a, be, bce, bc, c).]
Fig. 1. A chordal graph and its nice tree decomposition.
An example of a chordal graph and its nice tree decomposition is shown in Fig. 1. Node r is the root of the decomposition tree. The vertices in Kx are shown in each node x. Ks = Kt = {b, d, e} as shown in the figure. Vs = {a, b, c, d, e} and Vt = {a, b, d, e}.
3 Vertex Coloring of Chordal+ke Graphs
Our algorithm for coloring chordal+k1 e−k2 e graphs is based on the algorithm for coloring chordal+ke graphs proposed by Marx [7], so we briefly explain that algorithm in this section. Though the notation for our algorithm defined in the next section is similar to that used in Marx's algorithm, much of it is modified to deal with minus modulators. Thus, to avoid confusion, we give only a few definitions in this section and describe only the outline of the algorithm.
Let H be a chordal+ke graph, and let G be the chordal graph obtained by removing the modulators from H. Let Hx be the subgraph of H induced by Vx. The algorithm is based on the nice tree decomposition of chordal graphs. In the algorithm, all the possible C-colorings of the subgraph Hx are listed for each node x of the decomposition tree. A coloring of Hx is represented by a set S of pairs of vertices with the same color. The set system Sx is the collection of sets of pairs which have a corresponding C-coloring of Hx. The listing of C-colorings starts from the leaves of the decomposition tree and proceeds to their parents. In the algorithm, when we deal with node x, C − |Kx| dummy vertices u1, · · · , uC−|Kx| are added to Kx so as to make a C-clique consisting of Kx and the dummy vertices. Let Hx∗ be the graph that consists of Hx and the dummy vertices. At each node of the decomposition tree, the C-colorings of Hx∗ are computed from the C-colorings of the subgraphs corresponding to the children of x. The operation to list the colorings depends on the kind of node: leaf node, introduce node, forget node or join node. When vertex v is introduced, a dummy vertex is replaced by v in the sets of pairs. When vertex v is forgotten, v is replaced by a dummy vertex. Then for each set S and each i, the set S[ui] obtained by exchanging ui and uC−|Kx| is added to the set system. At a join node, by merging the sets for its children, all the possible colorings obtained by joining the colorings of the two subgraphs are listed. In addition, when a modulator first appears in Hx∗ at an introduce node or a join node, the colorings that assign the same color to the endpoints of the modulator must be omitted. For the root r of the decomposition tree, Hr equals H. Therefore, H is C-colorable iff there exists a C-coloring when the listing finishes at the root. The number of sets in the set system may become too large.
However, instead of keeping all the possible C-colorings, it is sufficient to keep a representative subsystem [8]. By using representative sets, the number of sets is reduced at small computational cost; hence the algorithm is fixed parameter tractable.
4 Vertex Coloring of Chordal+k1 e−k2 e Graphs
In this section, we propose an algorithm for coloring chordal+k1 e−k2 e graphs. We call an endpoint of a plus modulator a plus special vertex, and an endpoint of a minus modulator a minus special vertex. Let the sets of plus special vertices and minus special vertices be W+ and W−, respectively. Here, |W+| ≤ 2k1 and |W−| ≤ 2k2 hold. Let Wx+ (Wx−, resp.) denote the set of plus (minus, resp.) special vertices in Hx∗. In the algorithm for coloring chordal+k1 e−k2 e graphs, for each minus modulator we have to consider two cases: the case when its endpoints have the same color and the case when they have different colors. When there exist minus modulators in Kx whose endpoints have the same color, the number of colors used in Kx is less than |Kx|. Let |Kx|′ denote the number of colors used in Kx. Let Hx∗(|Kx|′) be the graph obtained by adding a clique that consists of |C| − |Kx|′ dummy vertices u1, u2, . . . , u|C|−|Kx|′ to Hx and connecting each dummy vertex with each vertex of Kx. Let Kx∗(|Kx|′) = Kx ∪ {u1, u2, . . . , u|C|−|Kx|′}. In the following, we simply write them as Hx∗ and Kx∗ unless this is confusing. Let y be a child of x in the nice tree decomposition. A minus special vertex v is called latter-in if v is introduced at node x when v− ∈ Ky for some minus modulator (v, v−). A minus special vertex v is called former-out if v is forgotten at node x when v− ∈ Ky for some minus modulator (v, v−). When a latter-in minus special vertex is introduced, one or more minus modulators appear in Hx. On the other hand, when a former-out minus special vertex is forgotten, one or more minus modulators disappear from Hy.
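The representative-set pruning just cited from [8] (formalized as Definition 4 in Section 4.3) can be illustrated by a deliberately naive construction: for every small blocker B that some set of the system avoids, keep one avoiding set. This brute force is exponential in q, unlike Monien's method; all names below are illustrative:

```python
from itertools import combinations

def naive_q_representative(system, universe, q):
    """Keep, for every B with |B| <= q that some member of `system`
    avoids, one avoiding set. The result is q-representative by
    construction, though far from Monien's size bound."""
    keep = []
    for r in range(q + 1):
        for B in map(set, combinations(universe, r)):
            if any(A.isdisjoint(B) for A in keep):
                continue                     # B is already represented
            for A in system:
                if A.isdisjoint(B):
                    keep.append(A)
                    break
    return keep

system = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
universe = {1, 2, 3, 4}
rep = naive_q_representative(system, universe, 1)
# Representativeness: for every B of size <= 1, `rep` avoids B iff `system` does.
assert all(any(A.isdisjoint(B) for A in rep) ==
           any(A.isdisjoint(B) for A in system)
           for r in range(2) for B in map(set, combinations(universe, r)))
assert len(rep) < len(system)
```

In the coloring algorithm the role of B is played by the blocker B(U) of a partial coloring, so a pruned system still certifies exactly the same extendable colorings.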
4.1 Set Systems
In our algorithm, a coloring of a graph is represented by a set system. Here we use the same notation as in [7]; however, some modifications are made because there are two kinds of special vertices. When some minus modulators form a clique, more than two vertices in Kx can have the same color. In such a coloring, the vertices with the same color are represented by pairs constructed according to the order in which the vertices are forgotten. When vertices v1, · · · , vs have the same color and the vertices are forgotten in this order, the set representing the coloring includes the pairs (vs, vs−1), · · · , (v2, v1). If some of the vertices in v1, · · · , vs are in Kr for the root r of the tree decomposition, their order can be decided arbitrarily. Among the vertices with the same color as v, let l(v) denote the minus special vertex forgotten last, and let f(v) be the minus special vertex forgotten first.
Definition 1. To each C-coloring ψ of Hx∗, we associate a set Sx(ψ) ⊆ (Kx∗ × W+) ∪ ((W− ∩ Kx) × (W− ∩ Kx)) that is obtained by the following rules. i) For v1, · · · , vs ∈ Kx that have the same color in ψ, the pairs representing them are in Sx(ψ). ii) For w ∈ W+ which does not appear on the right of any pair made by rule i), (f(v), w) ∈ Sx(ψ) iff ψ(v) = ψ(w) for v ∈ Kx∗.
The set system Sx contains a set S iff there is a coloring ψ of Hx∗ such that S = Sx(ψ).
[Figure 2: a chordal+1e−2e graph on vertices a–f.]
Fig. 2. A chordal+1e−2e graph.
Fig. 2 is an example of a chordal+1e−2e graph obtained by adding and removing edges of the chordal graph G of Fig. 1. Here, (a, c) is the plus modulator and (b, d), (b, e) are the minus modulators. In this example, when |C| = 3, St consists of four sets of pairs {(d, b), (e, a)}, {(e, a)}, {(d, b), (u1, a)} and {(e, b), (u1, a)}. In each of them, two colors are used in Kt.
Hence, each set corresponds to a coloring of Ht∗(2), which consists of vertices a, b, d, e and u1.
Definition 2. A set S ⊆ (Kx∗ × W+) ∪ ((W− ∩ Kx) × (W− ∩ Kx)) is called regular if it contains at most one pair (v, w) for each w ∈ W+.
Definition 3. The blocker of S, denoted B(S), is the set of pairs (v, w) ∈ Kx∗ × W+ such that (v, w′) ∈ S for some plus modulator (w, w′). We say that sets S1 and S2 form a non-blocking pair if B(S1) ∩ S2 = S1 ∩ B(S2) = ∅.
4.2 Algorithm
In this section, we describe the operations executed at each node x of the tree decomposition. The operations differ according to the kind of node.
Leaf node If the vertex v of Hx is a plus special vertex, Sx contains only the set {(v, v)}. Otherwise, Sx contains only the empty set.
Introduce node Let y be the child of x, and let vertex v be introduced at x.
Case 1. v is neither a latter-in minus special vertex nor a plus special vertex. If S ∈ Sy includes a minus modulator in Ky∗, remove S from Sy. After that, obtain Sx by replacing each u|C|−|Ky|′ by v in each set in Sy.
Case 2. v is a plus special vertex and not a latter-in minus special vertex. When v′ ∈ Wx+ for some plus modulator (v, v′), remove the sets including (u|C|−|Ky|′, v′) from Sy. After that, execute the same operations as in Case 1. Finally, if v′ ∉ Wx+, add (v, v) to each set.
Case 3. v is a latter-in minus special vertex and not a plus special vertex. For each set in Sy, execute both 1) and 2) in the following. 1) corresponds to the case when v has the same color as some of its pairs in Ky, and 2) corresponds to the case when v has a color different from any of its pairs in Ky.
1) Let (v, v1), · · · , (v, vt) be all the minus modulators from v. Divide v1, . . . , vt into sets of vertices with the same color. For each of them, if there does not exist a vertex in Kx \ {v1, . . .
, vt} with the same color, make a set in Kx by adding v to the vertices with the same color. In this case, pairs are made so as to follow the rule described in Section 4.1. If v is forgotten first among the vertices with the same color, replace the pair (vs, w) (w ∉ Kx) by (v, w) when v has the same color as vs.
2) Make a set in Kx by replacing u|C|−|Kx|′ by v.
Case 4. v is both a plus special vertex and a latter-in minus special vertex. This case is similar to Case 3. However, there are two differences. In 1), when v′ ∈ Wx+ for some plus modulator (v, v′), v cannot have the same color as vi for the set including (vi, v′). If v = l(v), add (v, v) to the set. In 2), add (v, v) to the set.
Forget node Let y be the child of x, and let vertex v be forgotten at x.
Case 1. v is not a former-out minus special vertex. Replace each v on the left of a pair by u|C|−|Kx|′ in each set. Then, add S[ui] (1 ≤ i ≤ |C| − |Kx|′) to Sx for each S ∈ Sx.
Case 2. v is a former-out minus special vertex. For a set that does not include a minus modulator (v−, v), do the same operations as in Case 1. For a set that includes a minus modulator (v−, v), execute the following operations. If v is not a plus special vertex, remove the pair (v−, v) from S. If v is a plus special vertex, replace (v, v′) by (v−, v′).
Join node Let y and z be the children of x. Sx = {Sy ∪ Sz | Sy ∈ Sy and Sz ∈ Sz form a non-blocking pair, and Sy and Sz include the same minus modulators in Ky and Kz}.
In the following, we prove the correctness of the algorithm.
Lemma 1. Any set created by the above algorithm is regular.
Proof. In the above algorithm, a new pair is added to a set only in the operations of an introduce node and a join node.
Introduce node We consider the case when a plus special vertex v is also a latter-in minus special vertex; the other cases are obvious.
Pair (v, v) is added only if either v is not colored with any of its pairs or v is forgotten last among the vertices with the same color in Kx∗. No other pair of the form (∗, v) is added in either case.
Join node Assume that Sy ∈ Sy and Sz ∈ Sz are joined. Consider v ∈ Kx. Sy and Sz have the same coloring on the vertices in Kx. Therefore, if both sets have a pair of the form (∗, v), they must be the same. For v ∉ Kx, either Sy or Sz does not include a pair of the form (∗, v). Therefore, the lemma holds for Sy ∪ Sz. ⊓⊔
Lemma 2. For each set S ∈ Sx, there exists a coloring of Hx∗ that does not contradict the pairs in S.
Proof. We prove the lemma inductively on the decomposition tree. A leaf node is the base case, and the induction step must be shown for the three kinds of non-leaf nodes.
Leaf node For a leaf node x, the set created at x obviously corresponds to a proper coloring of Hx∗.
Introduce node Assume that a set Y ∈ Sy corresponds to a coloring of Hy∗.
Case 1. v is neither a latter-in minus special vertex nor a plus special vertex. There exists a coloring corresponding to the set produced from Y because Hx∗ is the same as Hy∗ except that u|C|−|Kx|′ is replaced by v.
Case 2. v is a plus special vertex and not a latter-in minus special vertex. This case is similar to Case 1.
Case 3. v is a latter-in minus special vertex and not a plus special vertex. 1) When v has the same color as some of its pairs in Ky, it is checked that the vertices form a clique of minus modulators. Thus the vertices have the same color. When (f(v), w) ∈ Y, v and w have the same color in Hx∗. Thus, the set produced from Y corresponds to a proper coloring of Hx∗. 2) When v has a color different from any of its pairs in Ky, it is obvious because Hx∗ is equivalent to Hy∗ except that u|C|−|Kx|′ is replaced by v.
Case 4. v is both a plus special vertex and a latter-in minus special vertex.
This case is similar to Case 3.
Forget node Assume that a set Y ∈ Sy corresponds to a coloring of Hy∗.
Case 1. v is not a former-out minus special vertex. As Hx∗ is equivalent to Hy∗ except that v is replaced by u|C|−|Kx|′, the set obtained by the replacement has a corresponding proper coloring. In addition, S[ui] also has a corresponding coloring, in which only the colors of ui and u|C|−|Kx|′ are exchanged from the coloring corresponding to S.
Case 2. v is a former-out minus special vertex. When (v, v−) ∈ Y for a minus modulator (v, v−), the corresponding coloring does not change by replacing (v, ∗) by (l(v), ∗) because v and v− have the same color. When (v, v−) ∉ Y, it is similar to Case 1.
Join node Let Sy and Sz include the same minus modulators. In addition to the regularity of the sets, from the algorithm, if there exists a pair (v, v′) for a plus special vertex v′ ∈ Hx∗ \ Kx∗ in Sy and Sz, then v = f(v) holds. Therefore, if Sy and Sz form a non-blocking pair, there is no conflict between the colorings. Thus the set obtained by merging them has a corresponding coloring. ⊓⊔
The next lemma is almost obvious from the algorithm.
Lemma 3. For each coloring of Hx∗, there exists a set S ∈ Sx that does not contradict the coloring.
From Lemmas 2 and 3, there exists a coloring of G iff the set system is nonempty at the root node of the decomposition tree.
4.3 Representative System
Definition 4. [8] A set system S′ ⊆ S is q-representative for S if the following holds: for every set B of size at most q, there is a set A ∈ S with A ∩ B = ∅ iff there is a set A′ ∈ S′ with A′ ∩ B = ∅.
Lemma 4. [8] Given a set system S containing n sets of size at most p, a q-representative subsystem S′ ⊆ S of size at most Σ_{i=0}^{q} p^i can be found in O(pq · Σ_{i=0}^{q} p^i · n) time.
The above definition of a representative set can be applied to our set system representing vertex colorings in the following manner.
Definition 5.
[7] A subsystem Sx∗ ⊆ Sx is representative for Sx if the following holds: for each regular set U ⊆ Kx × W that does not contain vertices from Wx \ Kx∗, if Sx contains a set S disjoint from B(U), then Sx∗ also contains a set S′ disjoint from B(U).
In the algorithm described in Section 4.2, we can replace a set system Sx by a representative system Sx∗ at any node x. The following lemmas show that the algorithm works correctly even when we use the representative systems.
Lemma 5. If Sy∗ is representative for Sy, then Sx∗ generated from Sy∗ by introducing a vertex is representative for Sx.
Sketch of proof. We show that, if a set S ∈ Sx which corresponds to a coloring ψ of Hx∗ is disjoint from B(U), there exists a set S′ ∈ Sx∗ which is disjoint from B(U). In the case when v is not a latter-in minus special vertex, the proof is similar to the proof for chordal+ke graphs [7]. The only difference is that there may be more than one vertex with the same color. Even when v is a latter-in minus special vertex, if v has a color different from any vertex of Kx in ψ, it is equivalent to the case when v is not a latter-in minus special vertex. In the following, we consider the case when v is a latter-in minus special vertex and v has the same color as vertices v1, . . . , vt in Kx. Let a coloring ψ′ of Hy∗ be obtained by simply removing v from Hx∗. Note that the number of dummy vertices in Hx∗ and Hy∗ is the same. The set R corresponding to ψ′ is obtained from S by removing v or replacing v by a vertex with the same color. Thus, as S is disjoint from B(U), R is also disjoint from B(U). As Sy∗ is representative, there exists R′ ∈ Sy∗ which is disjoint from B(U). It is easy to check that the sets created from R′ by our algorithm are disjoint from B(U). ⊓⊔
Lemma 6. If Sy∗ is representative for Sy, then Sx∗ generated from Sy∗ by forgetting v is representative for Sx.
Sketch of proof.
The outline of the proof is the same as that of Lemma 5. As the other cases are similar to the case of chordal+ke graphs, we consider the case when v is a former-out minus special vertex and S includes a minus modulator (v, v−). By adding v to the set of vertices with the same color as v− in Hx∗, a coloring ψ′ of Hy∗ is obtained. The set R corresponding to ψ′ is equivalent to the set obtained by adding v to the set of vertices with the same color as v−, except for the replacement of v by vertices with the same color. Therefore, R is disjoint from B(U). As Sy∗ is representative, there exists R′ ∈ Sy∗ which is disjoint from B(U). It can easily be seen that the set obtained from R′ by our algorithm is disjoint from B(U). ⊓⊔
Lemma 7. If Sy∗ and Sz∗ are representative for Sy and Sz respectively, then Sx∗ obtained by joining them is representative for Sx.
Lemma 7 can be proved similarly to the case of chordal+ke graphs. In a set of pairs, the number of pairs which include a plus special vertex is bounded by the number of plus special vertices. The number of pairs which do not include a plus special vertex is bounded by the number of minus modulators. Thus, the size of each set is bounded by a function of k1 and k2. Whenever we construct Sx∗, we can apply Lemma 4 to sets which have the same minus modulators in Kx. The number of different sets of minus modulators in Kx is bounded by 2^{k2}. Therefore, throughout the algorithm, the size of Sx∗ is always bounded by a function of k1 and k2. The time to compute Sx∗ is also bounded by a function of k1 and k2. From the above discussion, the following theorem holds.
Theorem 2. The vertex coloring problem of chordal+k1 e−k2 e graphs is fixed parameter tractable for parameters k1 and k2.
Acknowledgements
This research was partially supported by a Scientific Grant-in-Aid from the Ministry of Education, Science, Sports and Culture of Japan.
References
1. J. R. S. Blair and B. W.
Peyton, An Introduction to Chordal Graphs and Clique Trees, in J. A. George, J. R. Gilbert, and J. W. H. Liu, editors, Graph Theory and Sparse Matrix Computation, pp. 1–30, Springer-Verlag, IMA Volumes in Mathematics and its Applications, Vol. 56, 1993.
2. L. Cai, Parameterized Complexity of Vertex Colouring, Discrete Applied Math. 127, pp. 415–429, 2003.
3. R. G. Downey and M. R. Fellows, Parameterized Complexity, Monographs in Computer Science, Springer-Verlag, New York, 1999.
4. M. C. Golumbic, Algorithmic Graph Theory and Perfect Graphs, 2nd Ed., Annals of Discrete Mathematics 57, Elsevier, Amsterdam, 2004.
5. J. Guo, F. Hüffner and R. Niedermeier, A Structural View on Parameterizing Problems: Distance from Triviality, IWPEC 2004, LNCS 3162, pp. 162–173, 2004.
6. T. Kloks, Treewidth, LNCS 842, Springer-Verlag, Berlin, 1994.
7. D. Marx, Parameterized Coloring Problems on Chordal Graphs, Theoret. Comput. Sci., 351(3), pp. 407–424, 2006.
8. B. Monien, How to Find Long Paths Efficiently, Analysis and Design of Algorithms for Combinatorial Problems (Udine, 1982), Vol. 109, North-Holland Math. Stud., pp. 239–254, North-Holland, Amsterdam, 1985.
9. Y. Takenaga and K. Higashide, Vertex Coloring of Comparability+ke and −ke Graphs, 32nd International Workshop on Graph-Theoretic Concepts in Computer Science, LNCS 4271, pp. 102–112, 2006.

Fault-free Hamiltonian Cycles in Alternating Group Graphs with Conditional Edge Faults
Ping-Ying Tsai1,∗, Jung-Sheng Fu2, and Gen-Huey Chen1
1 Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, ROC
2 Department of Electronics Engineering, National United University, Miaoli, Taiwan, ROC
∗ bytsai0808@gmail.com
Abstract.
The alternating group graph, which belongs to the class of Cayley graphs, is one of the most versatile interconnection networks for parallel and distributed computing. In this paper, adopting the conditional fault model, in which each vertex is assumed to be incident with two or more fault-free edges, we show that an n-dimensional alternating group graph can tolerate up to 4n − 13 edge faults, where n ≥ 4, while retaining a fault-free Hamiltonian cycle. The result is optimal with respect to the number of edge faults tolerated. Previously, for the same problem, at most 2n − 6 edge faults could be tolerated if the random fault model is adopted.
Keywords: alternating group graph, embedding, Hamiltonian cycle, fault tolerance, conditional fault model, Cayley graph.
1 Introduction
Network topology is a crucial factor for interconnection networks since it determines their performance. Many interconnection network topologies have been proposed in the literature for the purpose of connecting hundreds or thousands of processing elements. An interconnection network can be represented by a graph where nodes represent processors and edges represent links between processors. The alternating group graphs [11], like the well-known star graphs [1] and hypercubes [14], belong to the class of Cayley graphs [2, 12]. Furthermore, it has been shown in [7] that a class of generalized star graphs called the arrangement graphs also contains the alternating group graphs as members. Indeed, a proof given in [7] showed that the n-dimensional alternating group graph AGn is isomorphic to the (n, n − 2)-arrangement graph An,n−2. Arrangement graphs have been shown to be vertex and edge symmetric, strongly hierarchical, maximally fault tolerant, and strongly resilient [8], and hence so are alternating group graphs. Besides, alternating group graphs have sublogarithmic degree and diameter [11]. All of these properties are desirable when building an interconnection topology for a parallel and distributed system.
An efficient communication algorithm for shortest-path routing is available for alternating group graphs [11]. The study of graph embeddings arises naturally in a number of computational problems: finding storage schemes for logical data structures, layout of circuits in VLSI, and portability of algorithms across various parallel architectures, to mention a few [12]. Among them, the ring is one of the most fundamental networks for parallel and distributed computation, and it is suitable for developing simple and efficient algorithms. Numerous algorithms designed on rings for solving various algebraic and graph problems can be found in [3, 13]. A ring can also be used as a control/data flow structure for distributed computation in a network. These applications motivate the embedding of cycles in networks. It was shown that cycles of arbitrary length can be embedded in an alternating group graph [11]. Besides, the alternating group graph can also embed some other fundamental networks, such as grids [11], trees [11], and arbitrary paths [6]. Since node faults and/or link faults may occur in networks, it is important to consider faulty networks. Many fundamental problems such as diameter, routing, broadcasting, gossiping, embedding, etc., have been studied on various faulty networks. Two fault models have been adopted: one is the random fault model, and the other is the conditional fault model. The random fault model assumes that faults may occur anywhere without any restriction, whereas the conditional fault model assumes that the distribution of faults satisfies some properties, e.g., two or more fault-free links incident to each node. Clearly, it is more difficult to solve problems under the conditional fault model than under the random fault model. Previous related work on embedding in faulty networks under the conditional fault model can be found in [4, 5, 9, 15, 16].
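The side condition of the conditional fault model is easy to state operationally; a minimal sketch (names are illustrative, not from the paper):

```python
def satisfies_conditional_model(adj, faulty_edges):
    """Conditional fault model: every node must keep at least two
    fault-free incident links. `adj` maps each node to its neighbours."""
    faulty = {frozenset(e) for e in faulty_edges}
    return all(sum(frozenset((u, v)) not in faulty for v in nbrs) >= 2
               for u, nbrs in adj.items())

# 4-cycle a-b-c-d-a: every node has exactly two links, so any single
# fault already drops its two endpoints below the threshold.
ring = {'a': ['b', 'd'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['a', 'c']}
assert satisfies_conditional_model(ring, [])
assert not satisfies_conditional_model(ring, [('a', 'b')])
```

Under the random fault model no such predicate constrains the fault set, which is why fewer faults can be tolerated there.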
In this paper, under the conditional fault model and with the assumption of at least two fault-free links incident to each node, we show that an n-dimensional alternating group graph can tolerate up to 4n − 13 link faults, where n ≥ 4, while retaining a fault-free Hamiltonian cycle. The result is optimal with respect to the number of link faults tolerated. For the same problem, at most 2n − 6 link faults can be tolerated if the random fault model is adopted [10]. With our result, all parallel algorithms developed on rings can be executed as well on an n-dimensional alternating group graph with up to 4n − 13 link faults.

The rest of the paper is organized as follows. In Section 2, the structure of the alternating group graph is reviewed. Necessary definitions, notations, and some properties of the alternating group graph are also introduced in order to prove the main result. Then the main result and its proof are shown in Section 3. Finally, this paper concludes with some remarks in Section 4.

2 Preliminaries

It is convenient to represent a network by a graph G, where each vertex (edge) of G uniquely represents a node (link) of the network. For graph definitions and notation, we follow [17]. We use V(G) and E(G) to denote the vertex set and edge set of G, respectively. Given a vertex u in G, we define N(u) = {v | (u, v) ∈ E(G)} to be the neighborhood of u, which is the set of vertices that are adjacent to u in G. The degree of u, denoted by deg(u), is the size of N(u), i.e., deg(u) = |N(u)|. We use δ(G) to denote min{deg(u) | u ∈ V(G)}. Let V′ be a vertex subset of G. We define N(V′) = ⋃u∈V′ N(u) − V′ to be the neighborhood of V′. A path Px0 xt = ⟨x0, x1, . . . , xt⟩ is a sequence of vertices such that every two consecutive vertices are adjacent. In addition, Px0 xt is a cycle if x0 = xt. A path ⟨x0, x1, . . . , xt⟩ may contain another path as a subpath, denoted as ⟨x0, x1, . . .
, xi, Pxi xj, xj, . . . , xt⟩, where Pxi xj = ⟨xi, xi+1, . . . , xj−1, xj⟩. A path (or cycle) in G is called a Hamiltonian path (or Hamiltonian cycle) if it contains every vertex of G exactly once. A graph is called Hamiltonian if it has a Hamiltonian cycle. A graph G is called Hamiltonian connected if every two vertices of G are connected by a Hamiltonian path. All Hamiltonian connected graphs except K1 and K2 are Hamiltonian. Let p = a1 a2 · · · an be a permutation on {1, 2, . . . , n}. A pair of symbols ai and aj in p is said to be an inversion if ai < aj whenever i > j. A permutation is an even permutation if it has an even number of inversions. The alternating group An is the set consisting of all even permutations on {1, 2, . . . , n}, where |An| = n!/2. The following is a formal definition of alternating group graphs in terms of graph theory.

Definition 1. An n-dimensional alternating group graph, denoted by AGn, has the vertex set V(AGn) = {a1 a2 · · · an | a1 a2 · · · an is an even permutation of 1, 2, . . . , n} and the edge set E(AGn) = {(a1 a2 a3 · · · an, a2 ai a3 · · · ai−1 a1 ai+1 · · · an), (a1 a2 a3 · · · an, ai a1 a3 · · · ai−1 a2 ai+1 · · · an) | a1 a2 · · · an ∈ V(AGn) and 3 ≤ i ≤ n}.

Fig. 1. Examples of alternating group graphs. (a) AG3. (b) AG4.

From the definition, it is easy to see that the vertex set of AGn is the alternating group An. AGn has n!/2 vertices, each of degree 2(n − 2), and has (n − 2)n!/2 edges. The alternating group graphs AG3 and AG4 are shown in Fig. 1. Let u = a1 a2 · · · an be any vertex of the alternating group graph AGn. The edges (a1 a2 · · · an, a2 ai a3 · · · ai−1 a1 ai+1 · · · an) and (a1 a2 · · · an, ai a1 a3 · · · ai−1 a2 ai+1 · · · an), denoted by e^(i)(u), are referred to as the i-dimensional edges of u, where 3 ≤ i ≤ n. We use E^(i)(AGn) to denote the set of all i-dimensional edges in AGn.
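Definition 1 and the counts just stated can be checked mechanically for small n. The sketch below is our own illustration (the function names are ours, not the paper's); it generates AGn by enumerating even permutations and applying the two i-dimensional edge rules, and confirms the n!/2 vertices, degree 2(n − 2), and (n − 2)n!/2 edges:

```python
from itertools import permutations

def is_even(p):
    # an even permutation has an even number of inversions
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return inv % 2 == 0

def ag_vertices(n):
    # V(AG_n): all even permutations of 1..n
    return [p for p in permutations(range(1, n + 1)) if is_even(p)]

def ag_neighbors(v):
    # the two i-dimensional edges of Definition 1 for each i with 3 <= i <= n
    # (0-based position i - 1): rotate the symbols in positions 1, 2 and i
    n = len(v)
    nbrs = []
    for i in range(2, n):
        u = list(v); u[0], u[1], u[i] = v[1], v[i], v[0]   # a2 ai ... a1 ...
        w = list(v); w[0], w[1], w[i] = v[i], v[0], v[1]   # ai a1 ... a2 ...
        nbrs.extend((tuple(u), tuple(w)))
    return nbrs

def ag_edges(n):
    return {frozenset((v, u)) for v in ag_vertices(n) for u in ag_neighbors(v)}
```

For n = 4 this reproduces the 12-vertex, 24-edge graph AG4 of Fig. 1(b).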
Alternating group graphs are vertex symmetric, edge symmetric, and strongly hierarchical [11]. For 1 ≤ k ≤ n, let AGn(k) denote the subgraph of AGn induced by those vertices u with an = k. Clearly, each AGn(k) is isomorphic to AGn−1 for 1 ≤ k ≤ n. Due to the strongly hierarchical structure, the alternating group graph can also be defined recursively: AGn is constructed from n disjoint copies of the (n − 1)-dimensional alternating group graph AGn−1. We use Ẽ^(n)_{p,q}(AGn) to represent the set of those n-dimensional edges in AGn that connect AGn(p) and AGn(q), where 1 ≤ p ≠ q ≤ n. Throughout this paper, the paired terms network and graph, node and vertex, and link and edge are used interchangeably. Since AGn is isomorphic to the (n, n − 2)-arrangement graph An,n−2, the following lemma for AGn is deduced from a result on arrangement graphs.

Lemma 1. ([10]) For any F′ ⊆ V(AGn) ∪ E(AGn), AGn − F′ is Hamiltonian if |F′| ≤ 2n − 6, and Hamiltonian connected if |F′| ≤ 2n − 7, where n ≥ 4.

We also present some properties of AGn in the following. They are necessary in order to show our main result in the next section. Besides, we use F (⊆ E(AGn)) to denote the set of edge faults in AGn.

Lemma 2. |Ẽ^(n)_{i,j}(AGn)| = (n − 2)!, where n ≥ 4.

Proof. Consider p = p1 p2 · · · pn ∈ V(AGn(i)), where pn = i. Suppose that p connects to AGn(j); then we have p1 = j or p2 = j. If p1 = j, then there are (n − 2)!/2 choices for p2, p3, . . . , pn−1. The discussion is similar if p2 = j. So |Ẽ^(n)_{i,j}(AGn)| = (n − 2)!.

Lemma 3. Suppose that I = {k1, k2, . . . , km} ⊆ {1, 2, . . . , n}, where n ≥ 5 and m ≥ 2. Let AGn(I) denote the subgraph of AGn induced by ⋃k∈I V(AGn(k)). If AGn(k) − F is Hamiltonian connected for every k ∈ I and |Ẽ^(n)_{kj,kj+1}(AGn) − F| ≥ 3 for all 1 ≤ j < m, then there is a Hamiltonian path Pst in AGn(I) − F, where s ∈ V(AGn(k1)) and t ∈ V(AGn(km)).

Proof.
Note that if v = c1 c2 · · · cn ∈ V(AGn), then v ∈ N(V(AGn(c1))) ∩ N(V(AGn(c2))); thus, the two edges of e^(n)(v) are incident to different AGn(c)'s. Let u1 = s. Since |Ẽ^(n)_{kj,kj+1}(AGn) − F| > 2 for all 1 ≤ j < m, we can find an edge (v1, u2) ∈ E^(n)(AGn) − F such that v1 ≠ u1 and u2 ∈ V(AGn(k2)). Similarly, we can find edges (v2, u3), (v3, u4), . . . , (vm−2, um−1) ∈ E^(n)(AGn) − F, where ui and vi are two distinct vertices in AGn(ki) for all i ∈ {2, 3, . . . , m − 1}. Since |Ẽ^(n)_{km−1,km}(AGn) − F| ≥ 3, we can find an edge (vm−1, um) ∈ E^(n)(AGn) − F such that vm−1 ≠ um−1, um ≠ t, and um ∈ V(AGn(km)). Let vm = t. In addition, since AGn(ki) − F is Hamiltonian connected for all ki ∈ I, there is a Hamiltonian path Pui vi in AGn(ki) − F for all i ∈ {1, 2, . . . , m}. A Hamiltonian path Pst in AGn(I) − F is constructed as follows (see Fig. 2): ⟨s, Psv1, v1, u2, Pu2 v2, v2, . . . , um−1, Pum−1 vm−1, vm−1, um, Pum t, t⟩.

Fig. 2. A Hamiltonian path Pst in AGn(I) − F.

3 Main Result

In this section, we show that, with the assumption of two or more fault-free edges incident to each vertex, an n-dimensional alternating group graph can tolerate up to 4n − 13 edge faults while retaining a fault-free Hamiltonian cycle, where n ≥ 4.

Theorem 1. AGn − F is Hamiltonian if |F| ≤ 4n − 13 and δ(AGn − F) ≥ 2, where n ≥ 4.

Proof. We proceed by induction on n. When n = 4, a computer program can be used to check that the result is true [18]. Assume that the result holds for AGn for some n ≥ 4. Consider AGn+1 with |F| ≤ 4n − 9 and δ(AGn+1 − F) ≥ 2. For brevity, assume that |F| = 4n − 9. Without loss of generality, assume that |E^(n+1)(AGn+1) ∩ F| ≥ |E^(n)(AGn+1) ∩ F| ≥ . . . ≥ |E^(3)(AGn+1) ∩ F|.
If n ≥ 7, we have |E^(n+1)(AGn+1) ∩ F| ≥ ⌈(4n − 9)/(n − 1)⌉ ≥ 4, |F − E^(n+1)(AGn+1)| ≤ 4n − 13, and |E(AGn+1(r)) ∩ F| ≤ 4n − 13 for all 1 ≤ r ≤ n + 1. In addition, when 4 ≤ n ≤ 6, we have |E^(n+1)(AGn+1) ∩ F| ≥ 3, |F − E^(n+1)(AGn+1)| ≤ 4n − 12, and |E(AGn+1(r)) ∩ F| ≤ 4n − 12 for all 1 ≤ r ≤ n + 1. Without loss of generality, assume that |E(AGn+1(n + 1)) ∩ F| ≥ |E(AGn+1(n)) ∩ F| ≥ . . . ≥ |E(AGn+1(1)) ∩ F|. Note that when n ≥ 5, by Lemma 2, |Ẽ^(n+1)_{j,k}(AGn+1)| = ((n + 1) − 2)! = (n − 1)! > 4n − 6 = |F| + 3; hence we have |Ẽ^(n+1)_{j,k}(AGn+1) − F| ≥ 3 for all j, k ∈ {1, 2, . . . , n + 1} with j ≠ k. If n = 4, we have |Ẽ^(5)_{j,k}(AG5)| = (5 − 2)! = 6 < 7 = 4n − 9 for all j, k ∈ {1, 2, . . . , 5} with j ≠ k. Hence, it is possible that |Ẽ^(5)_{j′,k′}(AG5) − F| < 3 for some j′, k′ ∈ {1, 2, . . . , 5} with j′ ≠ k′. We use i1, i2, . . . , in+1 to denote the n + 1 distinct integers from 1 to n + 1 (i.e., {i1, i2, . . . , in+1} = {1, 2, . . . , n + 1}). Four cases are considered:

Case 1. |E(AGn+1(n + 1)) ∩ F| ≤ 2n − 7. In this case, δ(AGn+1(r) − F) ≥ 2 for all 1 ≤ r ≤ n + 1. Let i1 = n + 1 and I = {1, 2, . . . , n}. We can find u1, v1 ∈ V(AGn+1(i1)) such that (v1, u2), (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where u2 ∈ V(AGn+1(i2)) and vn+1 ∈ V(AGn+1(in+1)). By Lemma 1 and Lemma 3, we can find a Hamiltonian path Pu1 v1 in AGn+1(i1) − F and a Hamiltonian path Pu2 vn+1 in AGn+1(I) − F (when n = 4 and if j′, k′ exist, let {j′, k′} ≠ {ir, ir+1} for 2 ≤ r ≤ n). Hence ⟨u1, Pu1 v1, v1, u2, Pu2 vn+1, vn+1, u1⟩ forms a Hamiltonian cycle in AGn+1 − F.

Case 2. 2n − 6 ≤ |E(AGn+1(n + 1)) ∩ F| ≤ 4n − 13 and |E(AGn+1(n)) ∩ F| ≤ 2n − 7. Then we have |E(AGn+1(r)) ∩ F| ≤ 2n − 7 for all r ∈ {1, 2, . . . , n}. Let i1 = n + 1. Three cases are further considered:
Fig. 3. A Hamiltonian cycle in AGn+1 − F. (a) |E(AGn+1(n + 1)) ∩ F| ≥ 2n − 6, |E(AGn+1(n)) ∩ F| ≤ 2n − 7, and δ(AGn+1(i1) − F) ≥ 1. (b) |E(AGn+1(n + 1)) ∩ F| ≥ 2n − 6, |E(AGn+1(n)) ∩ F| ≤ 2n − 7, and δ(AGn+1(i1) − F) = 0. (c) |E(AGn+1(n + 1)) ∩ F| = |E(AGn+1(n)) ∩ F| = 2n − 6.

Case 2.1. δ(AGn+1(i1) − F) ≥ 2. The induction hypothesis assures that there exists a Hamiltonian cycle C in AGn+1(i1) − F. We can find (u1, v1) ∈ C such that (v1, u2), (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where u2 ∈ V(AGn+1(i2)) and vn+1 ∈ V(AGn+1(in+1)). Let Pu1 v1 = C − {(u1, v1)} and I = {1, 2, . . . , n}. By Lemma 3, we can find a Hamiltonian path Pu2 vn+1 in AGn+1(I) − F (when n = 4 and if j′, k′ exist, let {j′, k′} ≠ {ir, ir+1} for 2 ≤ r ≤ n). Hence a desired Hamiltonian cycle in AGn+1 − F can be obtained as shown in Fig. 3(a).

Case 2.2. δ(AGn+1(i1) − F) = 1. There is exactly one vertex v1 with degree one in AGn+1(i1) − F. Since δ(AGn+1 − F) ≥ 2, we have (v1, u2) ∈ E^(n+1)(AGn+1) − F for some u2 ∈ V(AGn+1(i2)). Select (u1, v1) ∈ E(AGn+1(i1)) ∩ F such that (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where vn+1 ∈ V(AGn+1(in+1)). We can always find such (u1, v1) since |{z | (v1, z) ∈ F and z ∈ V(AGn+1(i1))}| = 2n − 5, |e^(n+1)(z)| = 2 for all z ∈ V(AGn+1(i1)), |E^(n+1)(AGn+1) ∩ F| ≤ (4n − 9) − (2n − 5) = 2n − 4, and 2(2n − 5) > 2n − 4 when n ≥ 4. Moreover, since |E(AGn+1(i1)) ∩ (F − {(u1, v1)})| ≤ 4n − 14, the induction hypothesis assures that there is a Hamiltonian cycle C in AGn+1(i1) − (F − {(u1, v1)}). Note that (u1, v1) must be contained in C. Then the construction of a Hamiltonian cycle in AGn+1 − F is similar to Fig. 3(a) (when n = 4 and if j′, k′ exist, let {j′, k′} ≠ {ir, ir+1} for 2 ≤ r ≤ n).

Case 2.3. δ(AGn+1(i1) − F) = 0.
Note that 4n − 13 = 3 < 4 = 2(n − 2) when n = 4, hence this case can occur only when n ≥ 5; thus, j′ and k′ do not exist. There is exactly one vertex s with degree zero in AGn+1(i1) − F. Since δ(AGn+1 − F) ≥ 2, we have e^(n+1)(s) ∩ F = ∅. Let (v3, s), (s, u4) ∈ e^(n+1)(s), where v3 ∈ V(AGn+1(i3)) and u4 ∈ V(AGn+1(i4)). Additionally, there exist (v1, u2), (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where u1, v1 are two distinct vertices in V(AGn+1(i1)) − {s}, u2 ∈ V(AGn+1(i2)), and vn+1 ∈ V(AGn+1(in+1)). In addition, let F′ = {s} ∪ (E(AGn+1(i1)) ∩ F − {(s, z) | z ∈ V(AGn+1(i1))}). Note that |F′| ≤ 2n − 8. By Lemma 1, we can find a Hamiltonian path Pu1 v1 in AGn+1(i1) − F′. Let I1 = {i2, i3} and I2 = {1, 2, . . . , n + 1} − {i1, i2, i3}. By Lemma 3, we can find a Hamiltonian path Pu2 v3 in AGn+1(I1) − F and a Hamiltonian path Pu4 vn+1 in AGn+1(I2) − F. Hence a desired Hamiltonian cycle in AGn+1 − F can be obtained as shown in Fig. 3(b).

Case 3. |E(AGn+1(n + 1)) ∩ F| = 4n − 12. Note that this case can occur only when 4 ≤ n ≤ 6. In addition, j′ and k′ do not exist and |E(AGn+1(r)) ∩ F| = 0 for all r ∈ {1, 2, . . . , n}. Let i1 = n + 1. Three cases are further considered:

Case 3.1. δ(AGn+1(i1) − F) ≥ 2. We can find (u1, v1) ∈ E(AGn+1(i1)) ∩ F such that (v1, u2), (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where u2 ∈ V(AGn+1(i2)) and vn+1 ∈ V(AGn+1(in+1)). Since |E(AGn+1(i1)) ∩ (F − {(u1, v1)})| = 4n − 13, the induction hypothesis assures that there is a Hamiltonian cycle C in AGn+1(i1) − (F − {(u1, v1)}). Assume that C contains (u1, v1) (otherwise, the discussion is easier). Then the construction of a Hamiltonian cycle in AGn+1 − F is similar to Fig. 3(a).

Case 3.2. δ(AGn+1(i1) − F) = 1. There is exactly one vertex v1 with degree one in AGn+1(i1) − F. Since δ(AGn+1 − F) ≥ 2, we have (v1, u2) ∈ E^(n+1)(AGn+1) − F for some u2 ∈ V(AGn+1(i2)).
Select (u1, v1) ∈ E(AGn+1(i1)) ∩ F such that (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where vn+1 ∈ V(AGn+1(in+1)). We can always find such (u1, v1) since |{z | (v1, z) ∈ F and z ∈ V(AGn+1(i1))}| = 2n − 5, |e^(n+1)(z)| = 2 for all z ∈ V(AGn+1(i1)), |E^(n+1)(AGn+1) ∩ F| = (4n − 9) − (4n − 12) = 3, and 2(2n − 5) > 3 when n ≥ 4. Moreover, since |E(AGn+1(i1)) ∩ (F − {(u1, v1)})| = 4n − 13, the induction hypothesis assures that there is a Hamiltonian cycle C in AGn+1(i1) − (F − {(u1, v1)}). Note that (u1, v1) must be contained in C. Then the construction of a Hamiltonian cycle in AGn+1 − F is similar to Case 3.1.

Case 3.3. δ(AGn+1(i1) − F) = 0. There is exactly one vertex s with degree zero in AGn+1(i1) − F. Since δ(AGn+1 − F) ≥ 2, we have e^(n+1)(s) ∩ F = ∅. Let (v3, s), (s, u4) ∈ e^(n+1)(s), where v3 ∈ V(AGn+1(i3)) and u4 ∈ V(AGn+1(i4)). Additionally, there exist (v1, u2), (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where u1, v1 ∈ V(AGn+1(i1)) − {s}, u2 ∈ V(AGn+1(i2)), and vn+1 ∈ V(AGn+1(in+1)). In addition, let F′ = {s} ∪ (E(AGn+1(i1)) ∩ F − {(s, z) | z ∈ V(AGn+1(i1))}). Note that |F′| ≤ 2n − 7. By Lemma 1, we can find a Hamiltonian path Pu1 v1 in AGn+1(i1) − F′. Let I1 = {i2, i3} and I2 = {1, 2, . . . , n + 1} − {i1, i2, i3}. By Lemma 3, we can find a Hamiltonian path Pu2 v3 in AGn+1(I1) − F and a Hamiltonian path Pu4 vn+1 in AGn+1(I2) − F. Hence a desired Hamiltonian cycle in AGn+1 − F can be obtained as shown in Fig. 3(b).

Case 4. |E(AGn+1(n + 1)) ∩ F| = |E(AGn+1(n)) ∩ F| = 2n − 6. Note that this case can occur only when 4 ≤ n ≤ 6. In addition, j′ and k′ do not exist in this case and |E(AGn+1(r)) ∩ F| = 0 for all r ∈ {1, 2, . . . , n − 1}. In this case, we also have δ(AGn+1(r) − F) ≥ 2 for all 1 ≤ r ≤ n + 1. Let i1 = n + 1 and i3 = n.
The induction hypothesis assures that there are a Hamiltonian cycle C1 in AGn+1(i1) − F and a Hamiltonian cycle C3 in AGn+1(i3) − F. We can find (u1, v1) ∈ C1 and (u3, v3) ∈ C3 such that (v1, u2), (v2, u3), (v3, u4), (u1, vn+1) ∈ E^(n+1)(AGn+1) − F, where u2, v2 ∈ V(AGn+1(i2)), u4 ∈ V(AGn+1(i4)), and vn+1 ∈ V(AGn+1(in+1)). Let I = {1, 2, . . . , n + 1} − {i1, i2, i3}, Pu1 v1 = C1 − {(u1, v1)}, and Pu3 v3 = C3 − {(u3, v3)}. By Lemma 1 and Lemma 3, we can find a Hamiltonian path Pu2 v2 in AGn+1(i2) − F and a Hamiltonian path Pu4 vn+1 in AGn+1(I) − F. Hence a desired Hamiltonian cycle in AGn+1 − F can be obtained as shown in Fig. 3(c).

Fig. 4. A distribution of 4n − 12 edge faults over AGn.

The result is optimal with respect to the number of edge faults tolerated. Since alternating group graphs can be constructed recursively, it is easy to see that there exists a cycle of length 3 in AGn, where n ≥ 4. Fig. 4 shows a distribution of 4n − 12 edge faults over AGn, where ⟨w, u, v, w⟩ is a cycle and (u, v), (u, w) (respectively, (v, u), (v, w)) are the only two fault-free edges incident to u (respectively, v). It is easy to see that no fault-free Hamiltonian cycle exists in the faulty AGn.

4 Concluding Remarks

Since processor faults and/or link faults may occur in multiprocessor systems, it is both practically significant and theoretically interesting to study the fault tolerance of multiprocessor systems. Most previous work used the random fault model, which assumes that faults may occur everywhere without any restriction. There is another fault model, the conditional fault model, which assumes that the fault distribution must satisfy some properties.
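Both computer checks implicit in Section 3 can be replayed by brute force for n = 4. The sketch below is our own illustrative code (not the program of [18]): it verifies the base case of Theorem 1 — every fault set F with |F| = 3 = 4n − 13 and δ(AG4 − F) ≥ 2 leaves a Hamiltonian cycle — and replays the Fig. 4 construction, which places 4n − 12 = 4 faults around a 3-cycle, keeps δ ≥ 2, and destroys every Hamiltonian cycle:

```python
from itertools import permutations, combinations

def is_even(p):
    return sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4)) % 2 == 0

V = [p for p in permutations((1, 2, 3, 4)) if is_even(p)]

def nbrs(v):
    # the i-dimensional edges of Definition 1, i = 3, 4 (0-based positions 2, 3)
    out = []
    for i in (2, 3):
        a = list(v); a[0], a[1], a[i] = v[1], v[i], v[0]; out.append(tuple(a))
        b = list(v); b[0], b[1], b[i] = v[i], v[0], v[1]; out.append(tuple(b))
    return out

ADJ = {v: set(nbrs(v)) for v in V}
EDGES = list({frozenset((v, u)) for v in V for u in ADJ[v]})

def hamiltonian(adj):
    # backtracking search for a Hamiltonian cycle through a fixed start vertex
    start = V[0]
    def extend(path, seen):
        if len(path) == len(V):
            return start in adj[path[-1]]
        return any(extend(path + [y], seen | {y})
                   for y in adj[path[-1]] if y not in seen)
    return extend([start], {start})

def minus(F):
    # the graph AG_4 - F as an adjacency map
    return {x: {y for y in ADJ[x] if frozenset((x, y)) not in F} for x in V}

def check_base_case():
    # Theorem 1 base case: every F with |F| = 3 and delta >= 2 leaves AG_4 Hamiltonian
    return all(hamiltonian(adj)
               for F in combinations(EDGES, 3)
               for adj in [minus(set(F))]
               if min(len(s) for s in adj.values()) >= 2)

def fig4_faults():
    # triangle <w, u, v, w>: rotating positions 1, 2, 3 three times returns to w;
    # all edges at u and v become faulty except (u, v), (u, w), (v, w)
    w = V[0]
    u = (w[1], w[2], w[0], w[3])
    v = (u[1], u[2], u[0], u[3])
    keep = {frozenset(e) for e in ((u, v), (u, w), (v, w))}
    return {frozenset((x, y)) for x in (u, v) for y in ADJ[x]} - keep
```

Any Hamiltonian cycle in the Fig. 4 configuration would have to use all three triangle edges, closing a 3-cycle, which is impossible on 12 vertices.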
In this paper, adopting the conditional fault model and assuming that there are two or more fault-free edges incident to each vertex, we constructed a fault-free Hamiltonian cycle in an n-dimensional alternating group graph with up to 4n − 13 edge faults. The main construction methods were demonstrated in Fig. 3. This result is optimal with respect to the number of edge faults tolerated. Since an n-dimensional alternating group graph AGn is isomorphic to the (n, n − 2)-arrangement graph An,n−2, our results and methods might be useful for those who want to solve Hamiltonian problems on faulty arrangement graphs under the conditional fault model and the same assumption.

Acknowledgments

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract No. NSC 95-2221-E-239-002-.

References

1. S. B. Akers, D. Horel, and B. Krishnamurthy. The Star Graph: An Attractive Alternative to the n-cube. Proceedings of the International Conference on Parallel Processing, pp. 393-400, 1987.
2. S. B. Akers and B. Krishnamurthy. A Group-Theoretic Model for Symmetric Interconnection Networks. IEEE Transactions on Computers, vol. 38 (4), pp. 555-566, 1989.
3. S. G. Akl. Parallel Computation: Models and Methods. Prentice Hall, NJ, 1997.
4. Y. A. Ashir and I. A. Stewart. Fault-Tolerant Embedding of Hamiltonian Circuits in k-ary n-cubes. SIAM Journal on Discrete Mathematics, vol. 15 (3), pp. 317-328, 2002.
5. M. Y. Chan and S. J. Lee. On the Existence of Hamiltonian Circuits in Faulty Hypercubes. SIAM Journal on Discrete Mathematics, vol. 4 (4), pp. 511-527, 1991.
6. J. M. Chang, J. S. Yang, Y. L. Wang, and Y. Cheng. Panconnectivity, Fault-Tolerant Hamiltonicity and Hamiltonian-Connectivity in Alternating Group Graphs. Networks, vol. 44 (4), pp. 302-310, 2004.
7. W. K. Chiang and R. J. Chen. On the Arrangement Graph. Information Processing Letters, vol.
66, pp. 215-219, 1998.
8. K. Day and A. Tripathi. Arrangement Graphs: A Class of Generalized Star Graphs. Information Processing Letters, vol. 42, pp. 235-241, 1992.
9. J. S. Fu. Conditional Fault-Tolerant Hamiltonicity of Star Graphs. Parallel Computing, vol. 33, pp. 488-496, 2007.
10. H. C. Hsu, T. K. Li, Jimmy J. M. Tan, and L. H. Hsu. Fault Hamiltonicity and Fault Hamiltonian Connectivity of Arrangement Graphs. IEEE Transactions on Computers, vol. 53 (1), pp. 39-53, 2004.
11. J. S. Jwo, S. Lakshmivarahan, and S. K. Dhall. A New Class of Interconnection Networks Based on the Alternating Group. Networks, vol. 23, pp. 315-326, 1993.
12. S. Lakshmivarahan, J. S. Jwo, and S. K. Dhall. Symmetry in Interconnection Networks Based on Cayley Graphs of Permutation Groups: A Survey. Parallel Computing, vol. 19, pp. 1-47, 1993.
13. F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, CA, 1992.
14. Y. Saad and M. H. Schultz. Topological Properties of Hypercubes. IEEE Transactions on Computers, vol. 37 (7), pp. 867-872, 1988.
15. C. H. Tsai. Linear Array and Ring Embeddings in Conditional Faulty Hypercubes. Theoretical Computer Science, vol. 314, pp. 431-443, 2004.
16. P. Y. Tsai, J. S. Fu, and G. H. Chen. Hamiltonian-Laceability in Star Graphs with Conditional Edge Faults. Proceedings of the International Computer Symposium, pp. 144-152, 2006.
17. D. B. West. Introduction to Graph Theory (2nd Edition). Prentice Hall, Upper Saddle River, 2001.
18. http://inrg.csie.ntu.edu.tw/~bytsai/

Regular Expression Matching Algorithms using Dual Position Automata ⋆

Hiroaki Yamamoto
Department of Information Engineering, Shinshu University, 4-17-1 Wakasato, Nagano-shi, 380-8553 Japan. yamamoto@cs.shinshu-u.ac.jp

Abstract.
This paper introduces a new automaton model called a dual position automaton (a dual PA), and then presents a faster translation algorithm from a regular expression (RE) to a dual PA together with RE matching algorithms using a dual PA. For any RE r over an alphabet Σ, our translation algorithm generates a dual PA consisting of m̃(m̃ + 1) bits in O(⌈m̃/w⌉ m̃) time and space, where w is the length of a computer word, m̃ = ∑a∈Σ ma, and ma is the number of occurrences of alphabet symbol a in r. Furthermore, we give a method to construct a compact DFA representation consisting of only (m̃ + 1) ∑a∈Σ 2^ma bits from a dual PA. Using such a DFA representation, we can solve an RE matching problem fast. Finally, we generalize our algorithm by introducing a parameter Ka for each a ∈ Σ.

1 Introduction

Regular expression (RE) pattern matching problems play an important role in the fields of computer science, computational biology, and so on. For this reason, RE pattern matching algorithms have been studied intensively [1, 3, 12–16]. We are here concerned with the following RE pattern matching problem: Let r be an RE and let x be a text string. Then the RE matching problem is to decide whether or not there is a substring y of x such that y ∈ L(r), where L(r) denotes the language generated by r. In general, RE matching algorithms are divided into two parts, a preprocessing part and a matching part. The preprocessing part translates a given RE into a finite automaton, and the matching part does the matching job using the generated finite automaton. Currently, many algorithms make use of nondeterministic finite automata (NFAs) because the translation can be done efficiently. Hence constructing NFAs of smaller size for given REs is crucial in practical applications, and a lot of research on efficiently generating smaller NFAs has been done (see [2, 5, 7–11]).
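As a concrete statement of the problem (an illustration only: Python's backtracking `re` engine stands in here for the automaton-based algorithms this paper studies), deciding whether some substring y of x lies in L(r) is exactly what a search routine does:

```python
import re

# r = (00 ∨ 10)*1, written (00|10)*1 in the syntax of Python's `re` module
r = re.compile(r"(00|10)*1")

def matches_substring(text):
    # re.search scans `text` for any substring in L(r);
    # since the star may be empty, "1" itself is in L(r)
    return r.search(text) is not None
```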
Deterministic finite automata (DFAs) are also useful models for the RE matching problem because an RE matching can be done in linear time with a DFA. However, the DFA corresponding to an RE may be of exponential size in the length of the RE in the worst case.

⋆ This research has been supported in part by a Grant-in-Aid for Scientific Research, Ministry of Education, Culture, Sports, Science and Technology, Japan.

Hence constructing as compact a representation of DFAs as possible is desired. In techniques using automata, the time for constructing an NFA or a DFA from an RE is also crucial when the length of a given RE is large. Therefore, a faster translation algorithm from an RE to a finite automaton is also required. The aim of this paper is to introduce a new automaton model and to design efficient RE matching algorithms as well as a faster translation algorithm. Two types of NFAs for REs are widely known: one is the Thompson automaton and the other is the position automaton (PA for short, also called a Glushkov automaton). Let r be an RE over an alphabet Σ, let m be the total number of occurrences of alphabet symbols and operator symbols in r, and let m̃ be the number of occurrences of alphabet symbols in r. Without loss of generality, we may assume m = O(m̃), as mentioned in [7]. As seen in [8], a Thompson automaton is an NFA with ε-moves and has at most 2m states and 4m transitions. Thompson automata can be constructed recursively, based on the inductive definition of REs, in O(m̃) time and space. Given a text string of length n, the traditional algorithm using a Thompson automaton solves the RE matching problem in O(mn) time and O(m) space. Myers [12] has improved it using the Four Russians technique so that his algorithm solves the RE matching problem in O(mn/log n) time and space. Recently, Bille [4] has proposed a new algorithm which improves the O(mn) time while preserving O(m) space.
His algorithm is also based on a Thompson automaton. On the other hand, a PA is an ε-free NFA (that is, an NFA without any ε-moves) and has exactly m̃ + 1 states and at most m̃² + m̃ transitions. PAs are also important models in practical applications because they become smaller than Thompson automata for some kinds of REs. For this reason, some studies on efficiently constructing PAs have been done, and O(m̃²) time and space algorithms have been developed [5, 6]. Recently, Yamamoto, Miyazaki and Okamoto [17] have presented a faster bit-parallel algorithm generating a PA and related automata. PAs have another important property: for any state, all incoming transitions to the state have the same symbol. This property has a potential for improving the complexities of RE matching algorithms. Indeed, Navarro and Raffinot [14, 15] have made use of this property to obtain a compact DFA representation and have presented a faster RE matching algorithm. Their compact DFA representation requires only O(m̃·2^m̃) bits, while a traditional DFA representation obtained from a Thompson automaton requires O(m·2^(2m)) bits. In this paper, we will introduce a new automaton model called a dual position automaton (dual PA), and present an efficient translation algorithm from an RE to a dual PA as well as RE matching algorithms. A dual PA is also an ε-free NFA with exactly m̃ + 1 states and at most m̃² + m̃ transitions. Unlike a PA, however, it has the property that for any state, all outgoing transitions from the state have the same symbol. Clearly, by reversing a PA, we can get an NFA satisfying such a property, but it accepts a reversed string. We first give a faster algorithm for translating an RE into a dual PA using a Thompson automaton. Our algorithm translates a given RE into a dual PA in O(m̃⌈m̃/w⌉) time and space, where w is the length of a computer word. Furthermore, a generated dual PA is represented with m̃(m̃ + 1) bits.
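The flavor of such bit-parallel automaton simulations can be seen in the classical Shift-And algorithm, which simulates the position automaton of a plain string pattern (rather than a full RE) with one bit per pattern position. This is our own background sketch of the general technique, not the dual-PA algorithm of this paper:

```python
def shift_and(pattern, text):
    """Report end positions of occurrences of `pattern` in `text` by
    simulating the pattern's position automaton bit-parallel."""
    m = len(pattern)
    B = {}
    for i, c in enumerate(pattern):      # B[c] has bit i set iff pattern[i] == c
        B[c] = B.get(c, 0) | (1 << i)
    D, hits = 0, []
    for pos, c in enumerate(text):
        D = ((D << 1) | 1) & B.get(c, 0)  # advance every active state on symbol c
        if D & (1 << (m - 1)):            # the final position is active
            hits.append(pos)
    return hits
```

Each text symbol updates all pattern positions at once in O(⌈m/w⌉) word operations; this is the kind of saving that the bit-vector representations discussed in this paper exploit.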
Hence if m̃ = O(w), then the algorithm runs in O(m̃) time and space. To the best of our knowledge, no previous algorithm directly generates a dual PA from r. Furthermore, it would take O(m̃²) time even if a technique similar to the construction of a PA were used, because that construction takes O(m̃²) time. Next we show a compact DFA representation obtained from a dual PA. We improve an idea of Navarro and Raffinot [14, 15] by grouping all states of a dual PA by the symbol on their outgoing transitions. That is, we partition the set of states into at most |Σ| subsets Qa according to the symbol a on the transitions. In other words, if the outgoing transitions of states q and p have the same symbol, then q and p belong to the same subset. Then, for each subset Qa, we make DFA transitions. In addition, we introduce the idea of a lookahead symbol into the matching algorithm. By these improvements, the matching time comes to depend on mα rather than on m̃, where mα is defined as mα = max{ma | a ∈ Σ} and ma denotes the number of occurrences of an alphabet symbol a in r. Our DFA representation requires only (m̃ + 1) ∑a∈Σ 2^ma bits. Since Navarro and Raffinot's representation is (m̃ + 1) ∏a∈Σ 2^ma = (m̃ + 1) 2^m̃ bits, ours improves the space. Especially, the space required would be much smaller when an RE consists of many kinds of symbols. The RE matching algorithm using our representation runs in O(n⌈mα/w⌉) time. Furthermore, the preprocessing time for constructing a DFA representation from r is O(⌈m̃/w⌉(m̃ + ∑a∈Σ 2^ma)). Hence if m̃ = O(w), then the matching time is O(n) and the preprocessing time is O(m̃ + ∑a∈Σ 2^ma). Finally, we apply a decomposition technique to our algorithm. Although Navarro and Raffinot [15] decomposed the whole set of states, we decompose each subset Qa. That is, we introduce a parameter 1 ≤ Ka ≤ w for each a ∈ Σ and decompose each Qa by the parameter Ka into ⌈ma/Ka⌉ subsets. Again, by this improvement, the matching time comes to depend on mα rather than on m̃.
Indeed, we show an RE matching algorithm running in O(⌈mα²/(wKα)⌉ n) time using O((m̃ + 1) ∑a∈Σ ⌈ma/Ka⌉ 2^Ka) bits. Hence if Ka = O(log n) and m̃ = O(w), then we get an RE matching algorithm running in O(⌈mα/log n⌉ n) time and O(m̃n/log n) space. This time, the preprocessing time is O(m̃ + m̃n/log n). We will rely on a w-bit uniform RAM to estimate the complexities of the algorithms. In general, since most papers assume w ≥ log n, we do so as well. Note that we will describe algorithms using m̃-bit vectors or mα-bit vectors for the sake of convenience. In a practical implementation, however, these bit vectors must be divided into w-bit vectors. Hence factors of ⌈m̃/w⌉ and ⌈mα/w⌉ appear in the time and space bounds. The paper is organized as follows. In Section 2, we give basic definitions of REs. In Section 3, we explain the Thompson automaton and the dual PA, and then give a translation algorithm from an RE into a dual PA. In Section 4 we give RE matching algorithms.

2 Regular Expressions and Some Notations

We here give some definitions for regular expressions.

Fig. 1. Translation of an RE into an NFA. (a) union, (b) concatenation and (c) Kleene closure.

Definition 1. Let Σ be an alphabet. The regular expressions (REs) over Σ are defined as follows.
1. ∅, ε (the empty string) and a (∈ Σ) are REs that denote the empty set, the set {ε} and the set {a}, respectively.
2. Let r1 and r2 be REs denoting the sets R1 and R2, respectively. Then (r1 ∨ r2), (r1 r2) and (r1*) are also REs that denote the sets R1 ∪ R2 (union), R1 R2 (concatenation), and R1* (Kleene closure or star), respectively.

In this paper, we use the following notations.
– By L(r) we denote the language generated by an RE r.
– By m we denote the number of occurrences of alphabet symbols and operator symbols in r. The length of r means this m.
– By ma we denote the number of occurrences of an alphabet symbol a in r. In addition, we define m̃ = ∑a∈Σ ma and mα = max{ma | a ∈ Σ}.

As mentioned in the Introduction, we may assume m = O(m̃). Hence we will use the parameter m̃ to state our results.

3 From Thompson Automata to Dual Position Automata

3.1 Thompson Automata

Thompson automata are constructed recursively, based on the definition of REs; the construction algorithm is widely known (for example, see [8]). We give an outline of the construction in Fig. 1. In Fig. 1, (a), (b) and (c) show the recursive constructions for union (r1 ∨ r2), concatenation (r1 r2) and Kleene closure (r1*), respectively. Here M1 and M2 denote NFAs for the REs r1 and r2, respectively. Let M = (Q, Σ, δ, q0, qf) be a Thompson automaton obtained from an RE r of length m, where Q is a set of states, Σ is an alphabet, δ is a transition function, q0 is the initial state and qf is the final state. Note that a Thompson automaton has just one initial state and one final state. Then M has at most 2m states and 4m transitions. Furthermore, for any state q ∈ Q, all outgoing transitions from q are caused either by the empty string ε or by an alphabet symbol a ∈ Σ. If the transitions from q are caused by an alphabet symbol, then we call state q a sym-state; otherwise, an ε-state. We partition the set of sym-states into several subsets according to the alphabet symbol. Let us call a sym-state an a-state if the alphabet symbol of the sym-state is a ∈ Σ. Then we define Qa = {q | q is an a-state}. Such subsets Qa play an important role in improving the efficiency of an algorithm. We have the following proposition.

Proposition 1. For any RE r of length m, we can construct the Thompson automaton with at most 2m states and 4m transitions in O(m̃) time and space.
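The recursive construction of Fig. 1 can be sketched directly (an illustrative implementation under our own naming: a fragment is an (initial, final) state pair over a shared state allocator, and ε-transitions are kept as a separate relation):

```python
class Thompson:
    """Builds Thompson fragments; a fragment is an (initial, final) state pair."""
    def __init__(self):
        self.eps = {}    # epsilon transitions: state -> set of states
        self.sym = {}    # sym-states: state -> (symbol, target states)
        self.count = 0

    def _state(self):
        self.count += 1
        return self.count - 1

    def _eps(self, p, q):
        self.eps.setdefault(p, set()).add(q)

    def leaf(self, a):                    # a single alphabet symbol
        q, p = self._state(), self._state()
        self.sym[q] = (a, {p})
        return q, p

    def union(self, M1, M2):              # Fig. 1(a)
        q0, p0 = self._state(), self._state()
        for q, p in (M1, M2):
            self._eps(q0, q); self._eps(p, p0)
        return q0, p0

    def concat(self, M1, M2):             # Fig. 1(b)
        self._eps(M1[1], M2[0])
        return M1[0], M2[1]

    def star(self, M1):                   # Fig. 1(c); (p1 -> q1) is the back transition
        q0, p0 = self._state(), self._state()
        q1, p1 = M1
        self._eps(q0, q1); self._eps(p1, q1)
        self._eps(q0, p0); self._eps(p1, p0)
        return q0, p0

def eclose(b, S):
    S, stack = set(S), list(S)
    while stack:
        for q in b.eps.get(stack.pop(), ()):
            if q not in S:
                S.add(q); stack.append(q)
    return S

def accepts(b, M, x):
    S = eclose(b, {M[0]})
    for c in x:
        step = set()
        for q in S:
            if q in b.sym and b.sym[q][0] == c:
                step |= b.sym[q][1]
        S = eclose(b, step)
    return M[1] in S

# r = (00 ∨ 10)*1 over {0, 1}, as in Example 1
b = Thompson()
r = b.concat(b.star(b.union(b.concat(b.leaf('0'), b.leaf('0')),
                            b.concat(b.leaf('1'), b.leaf('0')))),
             b.leaf('1'))
```

The simulation by ε-closure at the end is the traditional O(mn)-time matcher mentioned in the Introduction.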
We define the reversed automaton M^R of M by reversing all transitions and interchanging the initial state and the final state, leaving sym-states and ǫ-states unchanged. That is, for any states q and p of M, p ∈ δ(q, a) if and only if q ∈ δ^R(p, a), where δ^R is the transition function of M^R. Hence if q is a sym-state in M, then its incoming transition in M^R is caused by an alphabet symbol. Let us introduce some notions for M and M^R. We call a state with two incoming transitions a junction state. We say that a state p is a predecessor of a state q if there is a transition from p to q. Furthermore, a sequence of transitions from one state to another is called a path. As seen in Fig. 1(c), a star operator generates a transition going back to a previous state. We call this transition a back transition; that is, the transition from p1 to q1 is a back transition. Note that a back transition in M remains a back transition in M^R. By removing such back transitions from M, we can sort the states of M from the initial state in a topological order. Here, by a topological order, we mean that for any states p and q, p < q if and only if there is a directed path from p to q. It is clear that a topological order on the states of M^R is obtained by reversing that of M. Thompson automata have the following important property.

Lemma 1 ([13], Lemma 1). Let M be a Thompson automaton. Then any loop-free path in M has at most one back transition.

By the symmetric structure of a Thompson automaton M, we notice that Proposition 1 and Lemma 1 also hold for M^R. With Lemma 1 and the reversed automaton, we can efficiently generate a bit vector representation of a dual position automaton.

Example 1. Let us consider the RE r = (00 ∨ 10)*1 over Σ = {0, 1}. Fig. 2 shows the Thompson automaton for r.

3.2 Position Automata and Dual Position Automata

First we explain position automata (PAs). Let r be an RE with m̃ occurrences of alphabet symbols.
We number all alphabet symbols in r with their positions and denote the resulting RE by r̄. The existing algorithms for generating a PA have focused on computing the sets of positions First, Last and Follow(i). Here First = {i | ai α ∈ L(r̄)}, Last = {i | αai ∈ L(r̄)} and Follow(i) = {j | αai aj β ∈ L(r̄)}. A PA G is an ǫ-free NFA and has the following properties.

Fig. 2. Thompson automaton for r = (00 ∨ 10)*1

Property 1 For any state q of G, all incoming transitions to q are activated by the same alphabet symbol a ∈ Σ.
Property 2 The number of initial states is exactly one, and the number of final states is at least one.
Property 3 The number of states is exactly m̃ + 1 and the number of transitions is at most m̃² + m̃.

Now let us define a dual version of PAs. For any RE r, we call an NFA Ḡ satisfying the following properties a dual position automaton (a dual PA).

Property 1' For any state q of Ḡ, all outgoing transitions from q are activated by the same alphabet symbol a ∈ Σ.
Property 2' The number of initial states is at least one, and the number of final states is exactly one.
Property 3 The number of states is exactly m̃ + 1 and the number of transitions is at most m̃² + m̃.

As we can see, Properties 1 and 1', and Properties 2 and 2', are symmetric, while Property 3 holds for both a PA and a dual PA. We will make use of a dual PA in an RE matching algorithm. Hence, to generate a bit representation of a dual PA, we also make use of a Thompson automaton, as in Yamamoto [17].

Example 2. Let us consider the RE r = (00 ∨ 10)*1. Fig. 3 shows the PA and the dual PA for r. The PA is made from the Thompson automaton in Fig. 2 by considering only the states p3, p5, p7, p9, p13 and the initial state p0.
On the other hand, the dual PA is made by considering only the states p2, p4, p6, p8, p12 and the final state p13. The PA has one initial state p0 and one final state p13, while the dual PA has three initial states p2, p6, p12 and one final state p13.

3.3 Bit-Parallel Translation Algorithm from an RE to a Dual PA

We focus on the sym-states and the final state of a Thompson automaton. By computing the reachability between these states, we can construct a dual PA from a Thompson automaton.

Fig. 3. Position automaton and dual position automaton for r = (00 ∨ 10)*1

We will present an efficient bit-parallel algorithm for translating an RE into a dual PA by using a Thompson automaton. We number the sym-states in order to implement a set of sym-states as a bit vector. To do this, when Σ = {a1, · · ·, al}, we number the states in the order Qa1, Qa2, · · ·, Qal. That is, the states of Qa1 are numbered from 1 to ma1, the states of Qa2 are numbered from ma1 + 1 to ma1 + ma2, · · ·, and the states of Qal are numbered from Σ_{j<l} maj + 1 to Σ_{j≤l} maj. We assign the number 0 to the final state. For any state q, we denote by num(q) the number assigned to q. In Fig. 2, the number attached to each sym-state is the number of the state. Let r be an RE over Σ and let M = (Q, Σ, δ, q0, qf) be the Thompson automaton constructed from r. To compute the transition function of a dual PA, we compute an array NEXT[q, σ] whose elements are bit vectors of m̃ + 1 bits, where q ∈ Q and σ ∈ Σ ∪ {ǫ}. Note that if σ = a ∈ Σ, then q ∈ Qa. The array NEXT[q, σ] satisfies that the ith bit of NEXT[q, σ] is equal to 1 if and only if there is a path from state q to a sym-state p with number i. If q is an a-state, then this means that M can move from state q to p by the alphabet symbol a.
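Property 1' can be seen in action on Example 2: every transition out of a dual-PA state is labelled by that state's own symbol, so a set-based simulation only fires the states whose symbol equals the current text symbol. The following sketch transcribes the transition sets of Fig. 3(b) (the transcription and all names are ours) and checks membership in L(r) for r = (00 ∨ 10)*1.

```python
# Dual PA for r = (00 v 10)*1, transcribed from Fig. 3(b).
# SYM[q] is the label of every outgoing transition of q (Property 1').
SYM = {'p2': '0', 'p4': '0', 'p6': '1', 'p8': '0', 'p12': '1'}
NEXT = {
    'p2':  {'p4'},                 # first 0 of "00" -> second 0
    'p4':  {'p2', 'p6', 'p12'},    # after "00": loop in the star or leave it
    'p6':  {'p8'},                 # 1 of "10" -> 0 of "10"
    'p8':  {'p2', 'p6', 'p12'},    # after "10": loop in the star or leave it
    'p12': {'p13'},                # the final 1 -> final state
}
INITIAL = {'p2', 'p6', 'p12'}      # three initial states, one final state
FINAL = 'p13'

def accepts(x):
    """Simulate the dual PA on x: state q fires only when x[i] == SYM[q]."""
    states = set(INITIAL)
    for i, c in enumerate(x):
        nxt = set()
        for q in states:
            if q != FINAL and SYM[q] == c:
                nxt |= NEXT[q]
        states = nxt
        if i == len(x) - 1:
            return FINAL in states
    return False  # the empty string: eps is not in L(r) here
```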
We can efficiently compute NEXT[q, a] by using the property of Lemma 1 and the reversed automaton M^R of M. Note that Lemma 1 also holds for M^R. Finally, the desired dual PA is represented by NEXT[q, a] over all sym-states q and alphabet symbols a. Hence the size of NEXT[q, a] becomes m̃(m̃ + 1) bits. The algorithm starts with REtoDPA, given in Fig. 4. In the algorithm, the operator | denotes bitwise OR. Furthermore, we use two functions BitSet and BitCheck. BitSet(v, i) sets the ith bit of v to 1. BitCheck(v, i) checks whether or not the ith bit of v is equal to 1; if so, it returns 1, and otherwise it returns 0. Since these functions can easily be implemented to run in O(1) time, the details are omitted here.

Algorithm REtoDPA(r)
Input: an RE r.
Output: Dual PA Ḡ = (Q', Σ, δ', I, qf).
Step 1. Translate r into the Thompson automaton M. At the same time, compute Σr, the set of alphabet symbols occurring in r.
Step 2. For all a ∈ Σr, number each state of Qa.
Step 3. Let q0 be the initial state of M. Then add to M a new initial state qini and a transition from qini to q0 by ǫ.
Step 4. For all states q of M, if q is an a-state for an alphabet symbol a ∈ Σr, then NEXT[q, a] := 0; otherwise NEXT[q, ǫ] := 0.
Step 5. Generate M^R from M and do ReachState(M^R, NEXT, Σr).
Step 6. Generate Ḡ as follows:
1. define Q' to be the set {q | q is a sym-state or the final state of M},
2. define the final state qf to be the final state of M,
3. for all a ∈ Σ and all q ∈ Qa, define δ'(q, a) to be NEXT[q, a],
4. for all states q ∈ Q', if BitCheck(NEXT[qini, ǫ], num(q)) = 1, then add q to I.
Fig. 4. The algorithm REtoDPA

For any RE r, REtoDPA translates r into the Thompson automaton M, and then invokes the procedure ReachState for M^R. The procedure ReachState, given in Fig. 5, computes the array NEXT[q, a] using M^R.
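On arbitrary-precision integers, the two helper functions used by these procedures are one-liners; a sketch (the procedures above treat BitSet as mutating v in place, so here we return the updated vector instead):

```python
def bit_set(v, i):
    """BitSet(v, i): return v with its i-th bit set to 1."""
    return v | (1 << i)

def bit_check(v, i):
    """BitCheck(v, i): 1 if the i-th bit of v is 1, else 0."""
    return (v >> i) & 1

v = bit_set(0, 3)        # 0b1000
v = bit_set(v, 0)        # 0b1001
w = v | bit_set(0, 1)    # bitwise OR merges reachability vectors: 0b1011
```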
Lemma 1 guarantees that we can correctly compute NEXT[q, a] by traversing all states of M^R twice in a topological order. We have the following theorem.

Theorem 1. Let r be an RE with m̃ occurrences of alphabet symbols. Then the algorithm REtoDPA correctly translates r into the dual PA in O(m̃⌈m̃/w⌉) time and space. If m̃ = O(w), then it runs in O(m̃) time and space.

The proof is given in the appendix.

4 RE Matching Algorithms

We first give a compact DFA representation and a matching algorithm using it, and then extend them by introducing grouping parameters Ka.

4.1 Compact DFA Representation and Matching Algorithm

We generate a compact DFA representation D[a, b, Ia] and FINAL[a] from a dual PA Ḡ with NEXT[p, a], where a, b ∈ Σ and 0 ≤ Ia ≤ 2^ma − 1. Each element of D[a, b, Ia] is a bit vector of mb bits, and if D[a, b, Ia] = Ib (0 ≤ Ib ≤ 2^mb − 1), then there is a transition from the subset Ia of a-states to the subset Ib of b-states in Ḡ. Note that Ia is a bit vector representation of a subset of the set Qa of a-states of Ḡ. The element v of the array FINAL[a] is a bit vector of 2^ma bits, which satisfies that the ith bit of v is 1 if and only if there is a transition on a from a state of the ith subset of Qa to the final state of Ḡ, where the subsets of Qa are numbered from 0 to 2^ma − 1.

Procedure ReachState(M^R, NEXT, Σr)
Repeat the following twice: for all states q of M^R, do the following in a topological order:
1. if q is the initial state, then BitSet(NEXT[q, ǫ], num(q)),
2. if q is an a-state with an incoming transition from a state p1 (note that in this case q has exactly one incoming transition), then NEXT[q, a] := NEXT[p1, ǫ] and BitSet(NEXT[q, ǫ], num(q)),
3. if q is a junction state, then NEXT[q, ǫ] := NEXT[p1, ǫ] | NEXT[p2, ǫ], where p1 and p2 are the two predecessors of q;
4. otherwise NEXT[q, ǫ] := NEXT[p1, ǫ], where p1 is the predecessor of q.
Fig. 5. The procedure ReachState
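The layout of the tables D and FINAL can be made concrete on a toy instance. The sketch below builds both tables by brute force from the dual PA of Example 2 (transition sets transcribed by us from Fig. 3(b); all names are hypothetical) and then runs a matching loop in the style of REMatchDFA, with the one-symbol lookahead and the self-loop on initial states.

```python
# Toy tables for the dual PA of Example 2 (r = (00 v 10)*1); names are ours.
# 0-states, bit order: p2 = bit 0, p4 = bit 1, p8 = bit 2.
# 1-states, bit order: p6 = bit 0, p12 = bit 1.
M = {'0': 3, '1': 2}                        # m_a: group sizes
# STEP[a][i][b]: mask of b-states reached from the i-th a-state on symbol a,
# transcribed from Fig. 3(b); FIN_STATE marks a-states that move to p13.
STEP = {'0': [{'0': 0b010, '1': 0b00},      # p2  -> {p4}
              {'0': 0b001, '1': 0b11},      # p4  -> {p2, p6, p12}
              {'0': 0b001, '1': 0b11}],     # p8  -> {p2, p6, p12}
        '1': [{'0': 0b100, '1': 0b00},      # p6  -> {p8}
              {'0': 0b000, '1': 0b00}]}     # p12 -> {p13} (final)
FIN_STATE = {'0': 0b000, '1': 0b10}
INIT = {'0': 0b001, '1': 0b11}              # initial states p2 / p6, p12

# D[a][b][Ia]: b-state mask reached from the a-state set Ia.
# FINAL[a]: bit Ia is set iff some state of Ia moves to the final state on a.
D, FINAL = {}, {}
for a in M:
    FINAL[a] = 0
    D[a] = {b: [0] * (1 << M[a]) for b in M}
    for Ia in range(1 << M[a]):
        for i in range(M[a]):
            if (Ia >> i) & 1:
                for b in M:
                    D[a][b][Ia] |= STEP[a][i][b]
        if Ia & FIN_STATE[a]:
            FINAL[a] |= 1 << Ia

def match_ends(x):
    """Endpoints i (1-based) of substrings of x matching r, REMatchDFA-style."""
    out = []
    cur = INIT[x[0]]                        # cur is a mask over x[i]-states
    for i in range(len(x) - 1):
        if (FINAL[x[i]] >> cur) & 1:
            out.append(i + 1)
        cur = D[x[i]][x[i + 1]][cur] | INIT[x[i + 1]]   # step + self-loop
    if (FINAL[x[-1]] >> cur) & 1:
        out.append(len(x))
    return out
```

Note how the current state set is always a mask over the states of one symbol group, which is exactly what the lookahead symbol b in D[a, b, Ia] makes possible.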
Then D[a, b, Ia] and FINAL[a] are generated by the procedure GenDFA(Ḡ) given in Fig. 7, which is an extension of a technique used in [14, 15]. To compute D[a, b, Ia], we first compute an array E[Ia] denoting a transition from a set Ia of a-states to sym-states, and then compute a transition from the set Ia of a-states to b-states. The size of D[a, b, Ia] is m̃ Σ_{a∈Σ} 2^ma bits because m̃ = Σ_{a∈Σ} ma, and the size of FINAL[a] is Σ_{a∈Σ} 2^ma bits. Hence the total size is (m̃ + 1) Σ_{a∈Σ} 2^ma bits. In the algorithm, the operator & denotes bitwise AND and the operator >> denotes shift right. Furthermore, B[b] denotes a bit mask for b-states; that is, its value is an (m̃ + 1)-bit vector in which only the bits corresponding to b-states are 1 and all other bits are 0. The matching algorithm REMatchDFA(r, x), given in Fig. 6, outputs the endpoints of all substrings of x matching r. This algorithm makes use of one lookahead symbol, if any, so as to hold only a-states for some alphabet symbol a at a time. In D[a, b, Ia], the symbol a corresponds to the current symbol and the symbol b corresponds to the next symbol (that is, the lookahead symbol). Now let us check the time for generating the DFA representation D[a, b, Ia] from a given RE r. By Theorem 1, it takes O(m̃) time to generate a dual PA. The procedure GenDFA(Ḡ) generates D[a, b, Ia] in O(⌈m̃/w⌉ Σ_{a∈Σ} 2^ma) time. Hence it takes O(⌈m̃/w⌉(m̃ + Σ_{a∈Σ} 2^ma)) time in total. We get the following theorem. Here, as mentioned before, m̃ is the number of occurrences of alphabet symbols in a given RE r, ma is the number of occurrences of an alphabet symbol a, and mα = max{ma | a ∈ Σ}.

Theorem 2. The matching part of the algorithm REMatchDFA runs in O(n⌈mα/w⌉) time using O((m̃ + 1) Σ_{a∈Σ} 2^ma) bits. Furthermore, the preprocessing time for constructing the DFA representation is O(⌈m̃/w⌉(m̃ + Σ_{a∈Σ} 2^ma)). If m̃ = O(w), then the matching time is O(n) and the preprocessing time is O(m̃ + Σ_{a∈Σ} 2^ma).

Algorithm REMatchDFA(r, x)
Input: an RE r and a text string x = x1 · · · xn, where xi ∈ Σ.
Output: endpoints i of all substrings of x matching r.
Step 1. Generate a dual PA Ḡ = (Q, Σ, δ, Iq, qf) using REtoDPA(r).
Step 2. Generate a DFA D[a, b, Ia] using GenDFA(Ḡ).
Step 3. /* Setting initial states. Each bit vector INIT[a] is set for all initial a-states */
1. INIT := 0, /* INIT is an (m̃ + 1)-bit vector */
2. for all states q ∈ Iq, BitSet(INIT, num(q)),
3. J := 1,
4. for a = a1, · · ·, al,
(a) INIT[a] := (INIT & B[a]) >> J,
(b) J := J + ma,
Step 4. if the final state qf is included in Iq, then output 0, /* this means that ǫ matches r */
Step 5. CSTATE := INIT[x1].
Step 6. for i = 1 to n − 1 do
1. if BitCheck(FINAL[xi], CSTATE) = 1, then output the position i,
2. CSTATE := D[xi, xi+1, CSTATE],
3. /* set the self-loop on initial states to find all substrings matching r */
CSTATE := CSTATE | INIT[xi+1]
Step 7. if BitCheck(FINAL[xn], CSTATE) = 1, then output the position n
Fig. 6. The algorithm REMatchDFA

4.2 Generalization of the Algorithm by Grouping States

Navarro and Raffinot [15] partition the whole set of states of a PA into several subsets and construct DFA-like transitions for each subset of states. For each a ∈ Σ, we introduce a parameter 1 ≤ Ka ≤ w, and then partition each subset Qa (but not the whole set) into t = ⌈ma/Ka⌉ subsets Qa^0, · · ·, Qa^{t−1}, each of which consists of Ka states. Then we generate a DFA representation for every subset. We call such a DFA representation a partial DFA representation, represented by an array D[a, b, Ia^ha, ha], where a, b ∈ Σ, 0 ≤ Ia^ha ≤ 2^Ka and 0 ≤ ha ≤ t − 1. The array D[a, b, Ia^ha, ha] is computed by the procedure GenPartial given in Fig. 9 and satisfies

D[a, b, Ia] = D[a, b, Ia^0, 0] | D[a, b, Ia^1, 1] | · · · | D[a, b, Ia^{t−1}, t − 1].

This leads to a more efficient matching algorithm, REMatchPartial, given in Fig. 8.

Theorem 3.
The matching part of the algorithm REMatchPartial runs in O(⌈mα²/(wKα)⌉n) time using O((m̃ + 1) Σ_{a∈Σ} ⌈ma/Ka⌉2^Ka) bits. Furthermore, the preprocessing time for constructing the partial DFA representation is O(⌈m̃/w⌉(m̃ + Σ_{a∈Σ} ⌈ma/Ka⌉2^Ka)).

Each parameter Ka can be regarded as a measure of the degree of determinism. If m̃ = O(w) and Ka = O(log n), then we get an RE matching algorithm running in O(mα n/log n) time with preprocessing time O(m̃ + m̃n/log n).

Procedure GenDFA(Ḡ)
Input: a dual PA Ḡ with NEXT[q, a].
1. I := 1 and E[0] := 0
2. for a = a1, · · ·, al /* Σ = {a1, · · ·, al} */
3. for i = 0, · · ·, ma − 1
4. for j = 0, · · ·, 2^i − 1
5. E[2^i + j] := E[j] | NEXT[I + i, a]
6. if BitCheck(E[2^i + j], 0) = 1, then BitSet(FINAL[a], 2^i + j)
7. J := 1
8. for b = a1, · · ·, al
9. D[a, b, 2^i + j] := (E[2^i + j] & B[b]) >> J
10. J := J + mb
11. for-end
12. for-end
13. for-end
14. I := I + ma
15. for-end
Fig. 7. The algorithm GenDFA

References
1. A.V. Aho, Algorithms for finding patterns in strings, in J. van Leeuwen, ed., Handbook of Theoretical Computer Science, Elsevier Science Pub., 1990.
2. V. Antimirov, Partial derivatives of regular expressions and finite automaton constructions, Theoret. Comput. Sci., 155, 291-319, 1996.
3. A. Apostolico and Z. Galil, eds., Pattern Matching Algorithms, Oxford University Press, 1997.
4. P. Bille, New algorithms for regular expression matching, Proc. of ICALP 2006, LNCS 4051, 643-654, 2006.
5. A. Brüggemann-Klein, Regular expressions into finite automata, Theoret. Comput. Sci., 120, 197-213, 1993.
6. C.H. Chang and R. Paige, From regular expressions to DFA's using compressed NFA's, Theoret. Comput. Sci., 178, 1-36, 1997.
7. J.-M. Champarnaud, Evaluation of three implicit structures to implement nondeterministic automata from regular expressions, IJFCS, 13, 99-113, 2002.
8. J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, Mass., 1979.
9. J. Hromkovič, S. Seibert and T. Wilke, Translating regular expressions into small ǫ-free nondeterministic finite automata, JCSS, 62, 565-588, 2001.
10. L. Ilie and S. Yu, Constructing NFAs by optimal use of positions in regular expressions, Proc. of CPM 2002, LNCS 2373, 279-288, 2002.
11. L. Ilie and S. Yu, Follow automata, Information and Computation, 186, 140-162, 2003.
12. G. Myers, A four Russians algorithm for regular expression pattern matching, J. ACM, 39, 4, 430-448, 1992.
13. E. Myers and W. Miller, Approximate matching of regular expressions, Bull. of Mathematical Biology, 51, 1, 5-37, 1989.
14. G. Navarro and M. Raffinot, Compact DFA representation for fast regular expression search, Proc. of WAE 2001, LNCS 2141, 1-12, 2001.
15. G. Navarro and M. Raffinot, New techniques for regular expression searching, Algorithmica, 41, 89-116, 2004.
16. S. Wu, U. Manber and E. Myers, A sub-quadratic algorithm for approximate regular expression matching, J. of Algorithms, 19, 346-360, 1995.
17. H. Yamamoto, T. Miyazaki and M. Okamoto, Bit-parallel algorithms for translating regular expressions into NFAs, IEICE Trans. Inf. & Syst., Vol. E90-D, No. 2, 418-427, 2007.

Algorithm REMatchPartial(r, x)
Input: an RE r and a text string x = x1 · · · xn, where xi ∈ Σ.
Output: endpoints i of all substrings of x matching r.
Step 1. Generate a dual PA Ḡ = (Q, Σ, δ, Iq, qf) using REtoDPA(r).
Step 2. Generate a partial DFA D[a, b, Ia^ha, ha] using GenPartial(Ḡ).
Step 3. /* Setting initial states. Each bit vector INIT[a] is set for all initial a-states */
1. INIT := 0, /* INIT is an (m̃ + 1)-bit vector */
2. for all states q ∈ Iq, BitSet(INIT, num(q)),
3. J := 1,
4. for a = a1, · · ·, al,
(a) INIT[a] := (INIT & B[a]) >> J,
(b) J := J + ma,
Step 4. if the final state qf is included in Iq, then output 0, /* this means that ǫ matches r */
Step 5. CSTATE := INIT[x1].
Step 6. for i = 1 to n − 1 do
1. if BitCheck(FINAL[xi], CSTATE) = 1, then output i,
2. Temp := 0,
3. for h = 0, · · ·, ⌈mxi/Kxi⌉ − 1,
(a) KSTATE := CSTATE & 0 · · · 01^Kxi, /* extract the lowest Kxi bits of CSTATE */
(b) Temp := Temp | D[xi, xi+1, KSTATE, h],
(c) CSTATE := CSTATE >> Kxi, /* expose the next group of Kxi bits */
4. CSTATE := Temp | INIT[xi+1], /* update the current state and set the self-loop */
Step 7. if BitCheck(FINAL[xn], CSTATE) = 1, then output n
Fig. 8. The algorithm REMatchPartial

Procedure GenPartial(Ḡ)
Input: a dual PA Ḡ with NEXT[q, a].
1. I := 1 and E[0] := 0
2. for a = a1, · · ·, al /* Σ = {a1, · · ·, al} */
3. for h = 0, · · ·, ⌈ma/Ka⌉ − 1
4. if h ≠ ⌈ma/Ka⌉ − 1, then K := Ka; otherwise K := ma − Ka·h
5. for i = 0, · · ·, K − 1
6. for j = 0, · · ·, 2^i − 1
7. E[2^i + j] := E[j] | NEXT[I + i, a]
8. if BitCheck(E[2^i + j], 0) = 1, then BitSet(FINAL[a], 2^i + j)
9. J := 1
10. for b = a1, · · ·, al
11. D[a, b, 2^i + j, h] := (E[2^i + j] & B[b]) >> J
12. J := J + mb
13. for-end
14. for-end
15. for-end
16. I := I + K + 1
17. for-end
18. for-end
Fig. 9. The algorithm GenPartial

Appendix

5 Proof of Theorem 1

Let M be the Thompson automaton for a given RE r. First we prove the following lemma to show the correctness of the algorithm.

Lemma 2. The ith bit of NEXT[q, a] becomes 1 if and only if M can move from an a-state q to a sym-state p with num(p) = i by the alphabet symbol a.

Proof. First let us show the only-if part. For any states q and p of M, it is clear that there is a path from q to p if and only if there is a path from p to q in M^R. We can easily see that if the ith bit of NEXT[q, σ] is set to 1 by ReachState, then for the sym-state p with num(p) = i, there is a path from p to q in M^R. Now, if q is an a-state for some alphabet symbol a, then σ = a and the sequence of symbols along the path is ǫ · · · ǫ · a.
That is, M can move from q to p by the alphabet symbol a. Next let us show the reverse direction, that is, the if part. To do this, it is sufficient to show the following claim. Here Q' = {q | q is a sym-state or the final state of M}.

Claim. For any states q, p ∈ Q', if M can move from q to p by a symbol a, then ReachState sets the num(p)th bit of NEXT[q, a] to 1.

Proof of the claim. Since M can move from q to p by a symbol a, there is a loop-free path from q to p in M. Hence the reversed path from p to q is also loop-free in M^R. Furthermore, this reversed path has at most one back transition, because Lemma 1 holds for M^R. Now let this path be Z = p1 (= p), p2, . . ., pt (= q). If Z does not contain any back transition, then ReachState sets the num(p)th bit of NEXT[q, a] to 1 in the first traversal of Step 2. This is because p1 < p2 < · · · < pt in a topological order and all states other than q and p are ǫ-states. Next suppose that Z contains one back transition. Without loss of generality, let the transition from pl to pl+1 be the back transition. Then p1 < · · · < pl and pl+1 < · · · < pt in a topological order. Hence, since ReachState sees in the first traversal of Step 2 that there is a path from p to pl, it sets the num(p)th bit of NEXT[pl, ǫ] to 1. However, at this moment it may not know whether or not there is a path from pl to q. In the second traversal of Step 2, ReachState sees that there is a path from p to q, because it can use the value of NEXT[pl, ǫ] via the back transition from pl to pl+1. Hence ReachState sets the num(p)th bit of NEXT[q, a] to 1. This proves the claim.

It follows from this lemma that the algorithm REtoDPA correctly computes the dual PA. Next let us discuss the complexity.
By Proposition 1, we can construct the Thompson automaton M and the reversed automaton M^R, with O(m̃) states and transitions, in O(m̃) time and space. The procedure ReachState traverses all states and transitions of M at most twice. Since it takes O(⌈m̃/w⌉) time to process each state, where w is the word length of the computer, Step 3 takes O(m̃⌈m̃/w⌉) time. Step 4 also takes O(m̃⌈m̃/w⌉) time. Hence the total time is O(m̃⌈m̃/w⌉). The space mainly depends on the array NEXT[q, a], which requires O(m̃⌈m̃/w⌉) space. Hence the space is O(m̃⌈m̃/w⌉). If m̃ = O(w), then the algorithm runs in O(m̃) time and space. Thus the theorem is proven.

Appendix A

Abstracts of Invited Talks for International Workshop on Combinatorial Algorithms 07

The Volume of the Birkhoff Polytope
E. Rodney Canfield¹ and Brendan D. McKay²
¹ Department of Computer Science, University of Georgia, Athens, GA 30602, USA, ercanfie@uga.edu
² Department of Computer Science, Australian National University, Canberra, ACT 0200, Australia, bdm@cs.anu.edu.au

Abstract. For an integer n, define Bn to be the polytope of all n × n non-negative real matrices whose rows and columns each sum to 1. Using a recent asymptotic enumeration of non-negative integer matrices (Canfield and McKay, 2007), we determine the asymptotic volume of Bn as n → ∞.

A doubly stochastic matrix of order n is an n × n non-negative real matrix whose rows and columns each sum to 1. As is well known, the set of all doubly stochastic matrices of order n forms a polytope, the Birkhoff–von Neumann polytope, whose vertices are the permutation matrices. We are concerned with the volume of this polytope. The Birkhoff polytope is an example of a lattice polytope, since its vertices lie on the integer lattice. It sits in the space R^{n×n}, but the constraints on the row and column sums imply that it spans an affine subspace of dimension only (n−1)².
Two types of volume are customarily defined for lattice polytopes. We can illustrate the difference using the example

B2 = { ( z 1−z ; 1−z z ) | 0 ≤ z ≤ 1 } = [ ( 1 0 ; 0 1 ), ( 0 1 ; 1 0 ) ],

where a matrix is written row by row and the last notation indicates a closed line segment in R^{2×2}. The length of this line segment is the volume vol(B2) = 2. We can also consider the lattice induced by Z^{2×2} on the affine span of B2: this consists of the matrices ( z 1−z ; 1−z z ) for integer z. The polytope B2 consists of a single basic cell of this lattice, so it has relative volume ν(B2) = 1. In general, vol(Bn) is the volume in units of the ordinary (n−1)²-dimensional Lebesgue measure, while ν(Bn) is the volume in units of basic cells of the lattice induced by Z^{n×n} on the affine span of Bn. It was proved by Diaconis and Efron [3] that vol(Bn) = n^{n−1} ν(Bn). If we expand Bn by a large factor, its volume can be approximated by the number of points of the integer lattice which lie inside it. This can be made rigorous:

ν(Bn) = lim_{s→∞} M(n, s) / s^{(n−1)²},

where M(n, s) is the number of non-negative integer matrices of order n such that all rows and columns sum to s. The latter is known asymptotically for n → ∞ due to recent work of Canfield and McKay [2]. This leads eventually to

vol(Bn) = 1 / ((2π)^{n−1/2} n^{(n−1)²}) · exp(1/3 + n² + O(n^{−1/2+ǫ})).

This asymptotic formula compares very well with the exact values known up to n = 10 [1]. We also achieve some generalization to non-square matrices.

References
1. M. Beck and D. Pixton, The Ehrhart polynomial of the Birkhoff polytope, Discrete Comput. Geom., 30 (2003) 623-637.
2. E.R. Canfield and B.D. McKay, Asymptotic enumeration of contingency tables with constant margins, submitted (2007). Preprint available at http://www.arxiv.org/abs/math.CO/0703600.
3. P. Diaconis and B. Efron, Testing for independence in a two-way table: New interpretations of the chi-square statistic, Ann. Stat., 13 (1985) 845-874.
Orthogonal Drawings of Series-Parallel Graphs⋆
Takao Nishizeki
Graduate School of Information Sciences, Tohoku University, Aoba-yama 6-6-05, Sendai 980-8579, Japan

Abstract. In an orthogonal drawing of a planar graph G, each vertex is drawn as a point, each edge is drawn as a sequence of alternating horizontal and vertical line segments, and no two edges cross except at a common end. A bend is a point where an edge changes its direction. A drawing of G is called an optimal orthogonal drawing if the number of bends is minimum among all orthogonal drawings of G. In this talk we deal with the class of series-parallel (multi)graphs of degree at most 3, and give a simple linear algorithm to find an optimal orthogonal drawing in the variable embedding setting.

The graph G in Fig. 1 is series-parallel and has various plane embeddings, two of which are illustrated in Figs. 1(b) and (c). There is no plane embedding having an orthogonal drawing with no bend; however, the embedding in Fig. 1(b) has an orthogonal drawing with one bend, as illustrated in Fig. 1(a), and hence that drawing is optimal, while the embedding in Fig. 1(c) needs three bends, as illustrated in Fig. 1(d). Given G, our algorithm finds the optimal drawing in Fig. 1(a). Our algorithm works well even if G has multiple edges or is not biconnected, and is much simpler and faster than the known algorithms for biconnected series-parallel simple graphs; we use neither the min-cost flow technique nor the SPQ*R tree, but rather some structural features of series-parallel graphs. We furthermore obtain a best possible upper bound on the minimum number of bends.

Fig. 1. (a) An optimal orthogonal drawing with one bend, (b), (c) two embeddings of the same planar graph, and (d) an orthogonal drawing with three bends.

⋆ This work is supported by JSPS grants.
Time-Constrained Graph Searching
Brian Alspach
School of Mathematical and Physical Sciences, The University of Newcastle

Abstract. Searching graphs or digraphs for an intruder has been studied since the 1970s. There has been a boost in activity in the area because of mobile software agents. Most of the research has concentrated on the minimum number of searchers required to capture an intruder, for a variety of searching models. There are applications for which the cost of searchers is negligible and minimizing the time required to capture an intruder is of interest. The latter is the subject of this talk.

Computing the k Most Representative Skyline Points
Xuemin Lin
School of Computer Science & Engineering, University of New South Wales, Sydney, NSW 2052, Australia, lxue@cse.unsw.edu.au

Abstract. Skyline computation has many applications, including multi-criteria decision making. In this talk, we study the problem of selecting k skyline points so that the number of points dominated by at least one of these k skyline points is maximized. We first present an efficient dynamic-programming-based exact algorithm in a 2d-space. Then we show that the problem is NP-hard when the dimensionality is 3 or more, and that it can be approximately solved by a polynomial-time algorithm with guaranteed approximation ratio 1 − 1/e. To speed up the computation, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate and scalable.
Distance constrained graph labeling: From frequency assignment to graph homomorphisms
Jan Kratochvil
Charles University, Prague

Abstract. The notion of distance constrained graph labelings stems from the practical problem of assigning frequencies to transmitters with the aim of avoiding unwanted interference. Apart from this applied motivation, the problem is rather interesting from a theoretical point of view as well. We will survey recent results and open problems, and in particular explore the connections to the algebraically motivated notion of graph homomorphisms.

The use of decomposition in the study of even-hole-free graphs
Kristina Vušković
School of Computing, University of Leeds, UK

Abstract. We consider finite and simple graphs. We say that a graph G contains a graph F if F is isomorphic to an induced subgraph of G. A graph G is F-free if it does not contain F. Let F be a (possibly infinite) family of graphs. A graph G is F-free if it is F-free for every F ∈ F. Many interesting classes of graphs can be characterized as being F-free for some family F. The most famous such example is the class of perfect graphs. A graph G is perfect if for every induced subgraph H of G, χ(H) = ω(H), where χ(H) denotes the chromatic number of H and ω(H) denotes the size of a largest clique in H. The famous Strong Perfect Graph Theorem states that a graph is perfect if and only if it contains neither an odd hole nor an odd antihole (where a hole is a chordless cycle of length at least four). In the last 15 years a number of other classes of graphs defined by excluding a family of induced subgraphs have been studied, perhaps originally motivated by the study of perfect graphs.
The kinds of questions this line of research has focused on are whether excluding induced subgraphs affects the global structure of the particular class in a way that can be exploited for putting bounds on parameters such as χ and ω, for constructing optimization algorithms (for problems such as finding the size of a largest clique or a minimum coloring), for recognition algorithms, and for the explicit construction of all graphs belonging to the particular class. A number of these questions were answered by obtaining a structural characterization of a class through decomposition (as was the case with the proof of the Strong Perfect Graph Theorem). In this talk we survey some of the most recent uses of decomposition theory in the study of classes of even-hole-free graphs. Even-hole-free graphs are related to β-perfect graphs in a similar way to that in which odd-hole-free graphs are related to perfect graphs. β-perfect graphs are a particular class of graphs that can be colored in polynomial time, by coloring greedily on a particular, easily constructible ordering of the vertices.

Haplotype Inference Constrained by Plausible Haplotype Data
Gad M. Landau⋆
Department of Computer Science, University of Haifa, Israel

Abstract. The haplotype inference problem (HIP) asks to find a set of haplotypes which resolve a given set of genotypes. This problem is of enormous importance in many practical fields, such as the investigation of diseases or other types of genetic mutations. In order to find the haplotypes that are as close as possible to the real set of haplotypes that comprise the genotypes, two models have been suggested which by now have become widely accepted: the perfect phylogeny model and the pure parsimony model. All algorithms known up till now for the above problem may find haplotypes that are not necessarily plausible, i.e. very rare haplotypes or haplotypes that were never observed in the population.
In order to overcome this disadvantage we study in this paper, for the first time, a new constrained version of HIP under the above mentioned models. In this new version, a pool of plausible haplotypes H is given together with the set of genotypes G, and the goal is to find a subset of H that resolves G.

⋆ Joint work with Tzvika Hartman, Danny Hermelin and Liat Leventhal.

Full-Text Indexing in a Changing World

Moshe Lewenstein, Department of Computer Science, Bar Ilan University, Israel

Abstract
Full-text indices have been around for over 30 years now. Nevertheless, the advent of the web, growing databases and a growing collection of applications dictate new needs from full-text indices: they must be fast, space efficient, and able to handle new issues such as errors in the text. In this talk several of these problems will be presented, along with some of the new approaches to their solution.

Appendix B: Open Problems Presented at the International Workshop on Combinatorial Algorithms 07

1 Open Problems in Dynamic Map Labeling

Brief description
The problem of map labeling is well-studied in GIS and computational geometry. The standard setting for such problems is the traditional (static) map. We are interested in dynamic maps, the sort we are familiar with on the web. The user can zoom and pan to view different portions of some large map that does not fit in the screen. One idea for solving dynamic labeling is to reduce it directly to static labeling. This is essentially the approach of Petzold [3, 4, 6, 5]. The problem with this approach is the consistency issue pointed out by [1], who listed a set of consistency desiderata and formalized the dynamic labeling problem.
In particular, they introduced a natural class of dynamic placements called point-invariant placements, which have a nice interpretation as geometric cones (the third dimension is scale space). They also provided a practical solution satisfying the desiderata. This solution is implemented in the G-Vis system [2], a dynamic map of the continental USA which is accessible from any browser. Since this problem is quite new, most issues are open. But here are two specific problems taken from [1]:

1. The solution in [1] assumes each label has a single placement. It should not be hard to generalize this to a (small) finite set of discrete possibilities per label. More challenging is to accommodate labels with a continuous set of possibilities.

2. The complexity of the problem of active range optimization (ARO) in [1] is open. Intuitively, we can view a dynamic label as a cone (with a rectangular base) in 3-D. The problem is to truncate these cones so that they are non-overlapping, subject to some optimization criteria. Recently, Sheung-Hung Poon (private communication) noted that the 1-D optimization version of the simplified ARO is solvable in polynomial time, but the 2-D version remains open.

More detail
Here is the standard (simplified) setting: we are given a map M, viewed as a rectangular region of the plane containing a set of (map) features. There are three kinds of features: points, lines, and regions. Note that a line feature is really a polygonal line, and a region feature is really a connected polygonal region. Each of these features has a label (viewed as a floating rectangle) which may be placed on the map subject to suitable constraints. For example, a point label must be within a certain radius of the point (but not cover it). A line label must be placed parallel to one of the line segments of the polyline, either above or below that segment within some distance. A region label must intersect the region.
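These placement constraints translate directly into code. Below is a minimal Python sketch of the point-label rule only; the `Rect` type and the reading of "within a certain radius" as the label's nearest edge lying within the radius are my assumptions, not part of the problem statement:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Axis-aligned rectangle with x1 < x2, y1 < y2."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, px, py):
        return self.x1 <= px <= self.x2 and self.y1 <= py <= self.y2

    def intersects(self, other):
        # Two axis-aligned rectangles overlap unless separated on an axis.
        return not (self.x2 <= other.x1 or other.x2 <= self.x1 or
                    self.y2 <= other.y1 or other.y2 <= self.y1)

def valid_point_label(rect, px, py, radius):
    """A point label must lie near the point (nearest edge within `radius`)
    but must not cover the point itself."""
    if rect.contains(px, py):
        return False
    # Nearest point of the rectangle to (px, py).
    nx = min(max(px, rect.x1), rect.x2)
    ny = min(max(py, rect.y1), rect.y2)
    return (nx - px) ** 2 + (ny - py) ** 2 <= radius ** 2
```

The `intersects` test is also the primitive needed for the "no two labels overlap" condition in the computational problem below.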
The computational problem is to (1) select a subset of the labels, and (2) place each selected label in a suitable position satisfying the above constraints, such that no two labels overlap. The optimization criterion is usually to maximize the number of selected labels. Sometimes we assume the selection subproblem (1) has been solved, and we only want to do the placement subproblem (2).

In the dynamic setting, we assume that only a portion of the map M is visible. That is, we can choose a "viewing window" or portal P. Intuitively, we can zoom and pan this portal to view different parts of the map at different scales. For simplicity, assume the portal shape is fixed (say, it is a square). A portal is parametrized by three numbers (s, x, y), where s > 0 is the zoom scale and (x, y) is the portal position. This determines a square P(s, x, y) centered at position (x, y) on M; the sides of P(s, x, y) have length s. We imagine this portion of M is then transformed to a fixed-size window W on the computer screen. Note that W has the same shape as the portal, and a label displayed on W has a fixed size (i.e., it does not depend on the scale s). This means that as the scale changes, labels effectively grow or shrink relative to the map, and two previously non-overlapping labels may begin to overlap. Petzold et al. [3, 4, 6, 5] used a pre-processing solution that reduces dynamic to static map labeling. The idea is this: given (s, x, y), we first retrieve a superset L of the labels that are potentially visible in the portal P(s, x, y). Then we run a static labeling algorithm on this set L. The problem with this solution is that there is no "consistency" between different choices of s, x, y. A set of consistency requirements is specified in [1], which also provides a solution that satisfies them.

References
1. K. Been, E. Daiches, and C. Yap, Dynamic map labelling, IEEE Transactions on Visualization and Computer Graphics, 12(5):773–780, 2006. Proc.
12th Symp. on Information Visualization (InfoVis06), Baltimore, Maryland, October 2006.
2. G-Vis Dynamic Labeling Demo, 2002. URL http://sage.mc.yu.edu/gvis/.
3. I. Petzold, Textplazierung in dynamisch erzeugten Karten. Diploma thesis, Institute for Computer Science, Bonn, 1996.
4. I. Petzold, Beschriftung von Bildschirmkarten in Echtzeit - Konzept und Struktur. Ph.D. thesis, Institute of Cartography and Geoinformation, University of Bonn, 2003.
5. I. Petzold, G. Gröger, and L. Plümer, Fast screen map labeling - data-structures and algorithms. In Proc. 21st International Cartographic Conference (ICC'03), pages 288–298, 2003.
6. I. Petzold, L. Plümer, and M. Heber, Label placement for dynamically generated screen maps. In Proc. 19th International Cartographic Conference (ICC'99), pages 893–903, 1999.

Chee Yap, New York University, USA.

2 Graphs with no equal length cycles

Let f(n) be the maximum number of edges in a graph on n vertices in which no two cycles have the same length. In 1975, Erdős raised the problem of determining f(n) (see [1]). Y. Shi [4] proved that

f(n) ≥ n + ⌊(√(8n − 23) + 1)/2⌋ for n ≥ 3.

Boros et al. [2] proved that

f(n) ≤ n + 1.98√n (1 + o(1)).

Chunhui Lai [3] proved that

lim inf_{n→∞} (f(n) − n)/√n ≥ √2.4.

Combining this with the upper bound in [2], we get

1.98 ≥ lim sup_{n→∞} (f(n) − n)/√n ≥ lim inf_{n→∞} (f(n) − n)/√n ≥ √2.4.

We make the following conjecture:

Conjecture. lim_{n→∞} (f(n) − n)/√n = √2.4.

References
1. J.A. Bondy and U.S.R. Murty, Graph Theory with Applications (Macmillan, New York, 1976), p. 247, Problem 11.
2. E. Boros, Y. Caro, Z. Füredi and R. Yuster, Covering non-uniform hypergraphs, Journal of Combinatorial Theory, Series B 82 (2001), 270–284.
3. Chunhui Lai, Graphs without repeated cycle lengths, Australasian Journal of Combinatorics 27 (2003), 101–105.
4. Y. Shi, On maximum cycle-distributed graphs, Discrete Math. 71 (1988), 57–71.
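As a numeric aside, the gap between the bounds above is easy to see by tabulation. The sketch below (plain arithmetic, using only the formulas stated above) evaluates Shi's lower bound and shows that its ratio (f(n) − n)/√n tends to √2 ≈ 1.414, below Lai's lower bound of √2.4 ≈ 1.549:

```python
import math

def shi_lower_bound(n):
    # Shi's bound: f(n) >= n + floor((sqrt(8n - 23) + 1) / 2), for n >= 3.
    return n + int((math.sqrt(8 * n - 23) + 1) // 2)

for n in (100, 10_000, 1_000_000):
    ratio = (shi_lower_bound(n) - n) / math.sqrt(n)
    # The ratio tends to sqrt(2) ~ 1.4142, below sqrt(2.4) ~ 1.5491.
    print(n, round(ratio, 4))
```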
ChunHui Lai, Zhangzhou Teachers College, P.R. China.

3 Entropy-compressed suffix trees

Given a text T[1, n] over an alphabet of size s, a plain representation requires n log s bits (log is to base 2). A k-th order compressor can reduce its size to nH_k bits, where H_k is the empirical k-th order entropy of T [1]. Classical text indexes such as suffix trees and suffix arrays [2] require O(n log n) bits. This waste of space is troublesome when indexing large texts that could fit in main memory but whose indexes (sometimes 20 times larger than the text!) cannot. Thus there is a strong interest in reduced-space representations that retain reasonable efficiency. Many recent developments [3] achieve suffix array functionality using nH_k + o(n log s) bits, for any k ≤ a log_s n and any constant 0 < a < 1. This space also contains the text, in the sense that the structure is capable of reproducing any text substring. By "suffix array functionality" I mean counting the number of occurrences of any pattern, and enumerating its text positions. The former can be done, say, in O(m log s) time (m being the pattern length), and the latter in O(polylog(n)) time per reported occurrence. Suffix tree functionality is more ambitious. It permits navigating the (concrete or virtual) suffix tree with operations like parent, child-labeled-a, first-child, next-sibling, suffix-link (leading from the node representing ax to the node representing x, a being a symbol and x a string), queries like subtree-size, first-leaf, last-leaf, and optionally other more ambitious ones like level-ancestor, lowest-common-ancestor, etc. There have been recent achievements on succinct suffix trees with full functionality, most notably Sadakane's [4]. Yet all of them still require O(n) extra bits of space on top of the entropy. In principle, nH_k + o(...)
bits should be sufficient (as at worst one can uncompress the text, build the suffix tree, and do the operation!), but no one has devised a way to operate efficiently on a suffix tree structure that is fully entropy-compressed, without any extra linear space. I believe this should be possible.

References
1. G. Manzini. An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3):407–430, 2001.
2. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
3. G. Navarro and V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys 39(1):article 2, 2007.
4. K. Sadakane. Compressed suffix trees with full functionality. Theory of Computing Systems, to appear. Preliminary version available at http://tcslab.csce.kyushu-u.ac.jp/~sada/papers/cst.ps

Gonzalo Navarro, University of Chile, Chile.

4 Indexed approximate string matching

This is the problem of finding all the approximate occurrences, in a text T[1, n], of a pattern P[1, m], both over an alphabet of size s. By "approximate occurrence" I mean that at most k "edit operations" need to be applied to some text substring to make it match the pattern. The most popular edit operations are insertions, deletions, and substitutions of characters [1]. In particular I refer to the indexed variant of the problem [2], where one builds an index on T to speed up the searches for arbitrary patterns. Although there has been progress on this problem, one still finds that either the index is of exponential size (in k or m or s), or the search takes exponential time. See e.g. [3, 4]. I believe this is a fundamental space/time barrier, but as far as I know this has not been proved.

References
1. G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys 33(1):31–88, 2001.
2. G. Navarro, R. Baeza-Yates, E. Sutinen, J. Tarhio. Indexing methods for approximate string matching.
IEEE Data Engineering Bulletin 24(4):19–27, 2001.
3. R. Cole, L. Gottlieb, M. Lewenstein. Dictionary matching and indexing with errors and don't cares. Proc. STOC'04, pp. 91–100, 2004.
4. M. Maas, J. Nowak. Text indexing with errors. Proc. CPM'05, pp. 21–32, 2005.

Gonzalo Navarro, University of Chile, Chile.

5 Does a polynomial maximising algorithm imply a polynomial minimising algorithm?

Optimisation problems come in two flavours: maximisation and minimisation. Among these, some are polynomially solvable. Polynomially solvable optimisation problems can be said to form a class P_opt. Thus P_opt = P_max ∪ P_min, where P_max (P_min) is the class of polynomially solvable maximisation (respectively, minimisation) problems. Of course, P_max ∩ P_min = ∅. For a problem P ∈ P_opt, let P′ be the dual of P.

(a) If P ∈ P_max, is P′ ∈ P_min always? (In other words, is P′ also always polynomially solvable?)
(b) Is the opposite always true — if P ∈ P_min, is P′ ∈ P_max always?
(c) Are both (a) and (b) true always? In other words, is P_opt closed under duality?

Are there any results known on these problems? The same "problem" as above can also be asked when the optimisation problems are polynomially bounded — that is, when the optimal solution value is bounded by a polynomial in the size of the problem instance.

Prabhu Manyem, University of Ballarat, Australia.

6 Certificate Dispersal Problems

We consider a mobile network, where each node u has a private key pri.u and a public key pub.u. Users themselves create their public and private keys. In this network, in order for a node u to send a message m to a node v securely, u needs to know pub.v to encrypt the message with it; the encrypted message is denoted by pub.v<m>. If a node u knows the public key pub.v of another node v in the network, then the node u can issue a certificate from u to v that identifies pub.v.
A certificate from a node u to a node v has the following form: pri.u<u, v, pub.v>. The certificate is encrypted using pri.u and it contains three items: (i) the identity of the certificate issuer u, (ii) the identity of the certificate subject v, and (iii) the public key of the certificate subject, pub.v. Any node that knows pub.u can use it to decrypt the certificate from u to v, obtaining pub.v. When a node u wants to obtain the public key of another node v, u acquires a sequence of certificates

pri.u<u, v_0, pub.v_0>, pri.v_0<v_0, v_1, pub.v_1>, . . . , pri.v_ℓ<v_ℓ, v, pub.v>

which are stored in either u or v. All certificates issued by nodes in a network can be represented by a directed graph, called a certificate graph, denoted by G = (V, E). We define a dispersal D of a directed graph G = (V, E) as a family of sets of edges indexed by V, where D = {D_v ⊆ E | v ∈ V}. We define a request for a certificate graph G as a reachable ordered pair of nodes in G, denoted by (u, v). A set of requests, denoted by R, is called full if all reachable pairs of nodes in G are contained in R. We say that a dispersal D of a directed graph G = (V, E) satisfies a set of requests R if, for any request (u, v) in R, there exists a path from u to v in D_u ∪ D_v, where D_u and D_v are the edge sets assigned to u and v. Let D be a dispersal of G satisfying R. The cost of the dispersal D, denoted by c.D, is the sum of the cardinalities of the sets in D. A dispersal D of G satisfying R is optimal if and only if, for any other dispersal D′ satisfying R, c.D ≤ c.D′. The MINIMUM CERTIFICATE DISPERSAL PROBLEM (MCD) is defined as follows:

INPUT: A directed graph G = (V, E) and a set of requests R.
OUTPUT: A dispersal D of G satisfying R with minimum cost.

It has been shown that MCD is NP-complete, even if the input graph is restricted to a strongly connected one. Also, a polynomial-time 2-approximation algorithm can be constructed for strongly connected graphs.
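The definitions above translate directly into a checker. Here is a minimal Python sketch (function names are mine, not from the literature) that tests whether a dispersal satisfies a request set and computes its cost c.D:

```python
from collections import defaultdict, deque

def satisfies(dispersal, requests):
    """dispersal: dict mapping node v -> set of directed edges D_v.
    A request (u, v) is satisfied if a path from u to v exists using
    only the edges stored at u or at v, i.e. in D_u | D_v."""
    for u, v in requests:
        edges = dispersal.get(u, set()) | dispersal.get(v, set())
        adj = defaultdict(list)
        for a, b in edges:
            adj[a].append(b)
        # BFS from u restricted to the pooled edge set.
        seen, queue = {u}, deque([u])
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if v not in seen:
            return False
    return True

def cost(dispersal):
    """c.D: the sum of the cardinalities of the per-node edge sets."""
    return sum(len(es) for es in dispersal.values())
```

For the directed 3-ring with each node storing its outgoing edge, `satisfies` accepts every request to a neighbour but rejects (1, 3), since the pooled sets of nodes 1 and 3 lack the edge (2, 3).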
Moreover, it can be shown that this algorithm outputs optimal dispersals for complete graphs, trees, rings and Cartesian products of graphs. In general, MCD is NP-complete for strongly connected graphs. A remaining question is whether MCD remains NP-complete for bidirectional graphs (or undirected graphs) and/or full requests. Can the approximation ratio of MCD algorithms be improved? That is, can a polynomial-time α-approximation algorithm with α < 2 be constructed for any directed graph? The 2-approximation algorithm shown in [1] becomes an optimal one for complete graphs, trees, rings and hypercubes (more generally, Cartesian products of graphs). For some useful graph classes, such as chordal graphs or interval graphs, can we construct an optimal MCD algorithm?

References
1. H. Zheng, S. Omura, K. Wada: A 2-approximation algorithm for minimum certificate dispersal problems, Proceedings of the 16th Australasian Workshop on Combinatorial Algorithms, 384–394 (2005).

Koichi Wada, Nagoya Institute of Technology, Japan.

7 The Maximum Number of Runs in a String

Given a nonempty string u and an integer e ≥ 2, we call u^e a repetition; if u itself is not a repetition, then u^e is a proper repetition. Given a string x, a repetition in x is a substring x[i..i + e|u| − 1] = u^e, where u^e is a proper repetition and neither x[i + e|u|..i + (e + 1)|u| − 1] nor x[i − |u|..i − 1] equals u. We say the repetition has period |u| and exponent e; it can be specified by the integer triple (i, |u|, e). It is well known [2] that the maximum number of repetitions in a string x = x[1..n] is Θ(n log n), and that the number of repetitions in x can be computed in Θ(n log n) time [2, 1, 10]. A string u is a run if and only if it is periodic of (minimum) period p ≤ |u|/2. Thus x = abaabaabaabaab = (aba)^4 ab is a run of period |aba| = 3.
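The claim about x = abaabaabaabaab can be checked mechanically. A small Python sketch using the naive definition of minimal period:

```python
def min_period(s):
    """Smallest p >= 1 such that s[i] == s[i + p] for all valid i."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[i] == s[i + p] for i in range(n - p)):
            return p
    return n

def is_run_string(s):
    """A string u is a run iff its minimum period p satisfies p <= |u|/2."""
    return min_period(s) <= len(s) / 2

x = "abaabaabaabaab"   # (aba)^4 ab, from the text
p = min_period(x)      # -> 3
e = len(x) // p        # exponent -> 4
```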
A substring u = x[i..j] of x is a run or maximal periodicity in x if and only if it is a run of period p and neither x[i − 1..j] nor x[i..j + 1] is a run of period p. The run u has exponent e = ⌊|u|/p⌋ and possibly empty tail t = x[i + ep..j] (a proper prefix of x[i..i + p − 1]). Thus x = baaabaabaababa (positions 1, . . . , 14) contains a run x[3..12] of period p = 3 and exponent e = 3 with tail t = a of length |t| = 1. It can be specified by the 4-tuple (i, p, e, |t|) = (3, 3, 3, 1), and it includes the repetitions (aab)^3, (aba)^3 and (baa)^2 of period p = 3. In general it is easy to see that a run encodes |t| + 1 repetitions for e = 2, and p repetitions for e > 2. Clearly, computing all the runs in x specifies all the repetitions in x. The idea of a run was introduced in [9]. Let r_x denote the number of runs that actually occur in a given string x, and let ρ(n) denote the maximum number of runs that can possibly occur in any string x of given length n. A string x = x[1..n] such that r_x = ρ(n) is said to be run-maximal. In [7, 8] it was shown that there exist universal positive constants k_1 and k_2 such that

ρ(n)/n < k_1 − k_2 log_2 n/√n,

but the proof was nonconstructive and provided no way of estimating the magnitude of k_1 and k_2. In [7], using a brute-force algorithm, a table of ρ(n) was computed for n = 5, 6, . . . , 31, giving also for each n an example of a run-maximal string; for every n in this range, ρ(n)/n < 1 and ρ(n) ≤ ρ(n − 1) + 2. In [5] an infinite sequence X = {x_1, x_2, . . .} of strings was described, with |x_{i+1}| > |x_i| for every i ≥ 1, such that

lim_{i→∞} r_{x_i}/|x_i| = 3/(2φ),

where φ = (1 + √5)/2 is the golden mean. Moreover, it was conjectured that in fact

lim_{n→∞} ρ(n)/n = 3/(2φ).   (1)

Recently a different and simpler construction was found [6] to yield another infinite sequence X of strings for which the ratio r_{x_i}/|x_i| approaches the same limit; in addition, it was shown that for every ε > 0 and for every sufficiently large n = n(ε), 3/(2φ) − ε provides an asymptotic lower bound on ρ(n)/n. In 2006 considerable progress was made on the estimation of an upper bound on ρ(n)/n:

∗ ρ(n)/n ≤ 5.0 [12];
∗ ρ(n)/n ≤ 3.48 [11];
∗ ρ(n)/n ≤ 3.44 [13];
∗ ρ(n)/n ≤ 1.6 [3].

Thus the problem may be stated as follows: Is conjecture (1) true? If not, then characterize the function ρ(n)/n. Help may be found in recent work studying the limitations imposed on the existence and length of runs in neighbourhoods of positions where two runs are known to exist [4, 14].

References
1. Alberto Apostolico & Franco P. Preparata, Optimal off-line detection of repetitions in a string, Theoret. Comput. Sci. 22 (1983) 297–315.
2. Maxime Crochemore, An optimal algorithm for computing the repetitions in a word, Inform. Process. Lett. 12–5 (1981) 244–250.
3. Maxime Crochemore & Lucian Ilie, Maximal repetitions in strings, submitted for publication (2006).
4. Kangmin Fan, Simon J. Puglisi, W. F. Smyth & Andrew Turpin, A new periodicity lemma, SIAM J. Discrete Math. 20–3 (2006) 656–668.
5. Frantisek Franek, R. J. Simpson & W. F. Smyth, The maximum number of runs in a string, Proc. 14th Australasian Workshop on Combinatorial Algs., M. Miller & K. Park (eds.) (2003) 26–35.
6. Frantisek Franek & Qian Yang, An asymptotic lower bound for the maximum-number-of-runs function, Proc. Prague Stringology Conference '06, Jan Holub & Jan Žďárek (eds.) (2006) 3–8.
7. Roman Kolpakov & Gregory Kucherov, Maximal Repetitions in Words or How to Find all Squares in Linear Time, Rapport LORIA 98-R-227, Laboratoire Lorrain de Recherche en Informatique et ses Applications (1998) 22 pp.
8. Roman Kolpakov & Gregory Kucherov, On maximal repetitions in words, J. Discrete Algs. 1 (2000) 159–186.
9.
Michael G. Main, Detecting leftmost maximal periodicities, Discrete Applied Maths. 25 (1989) 145–153.
10. Michael G. Main & Richard J. Lorentz, An O(n log n) algorithm for finding all repetitions in a string, J. Algs. 5 (1984) 422–432.
11. Simon J. Puglisi, R. J. Simpson & W. F. Smyth, How many runs can a string contain?, submitted for publication (2006).
12. Wojciech Rytter, The number of runs in a string: improved analysis of the linear upper bound, Proc. 23rd Symp. Theoretical Aspects of Computer Science, B. Durand & W. Thomas (eds.), LNCS 2884, Springer-Verlag (2006) 184–195.
13. Wojciech Rytter, The number of runs in a string, submitted for publication (2006).
14. R. J. Simpson, Intersecting periodic words, to appear, Theoret. Comput. Sci.

Bill Smyth, McMaster University, Canada and Curtin University, Australia.

Author Index

Alspach, Brian 207
Araki, Toru 1
Arroyuelo, Diego 11
Balbuena, Camino 21
Canfield, E. Rodney 204
Chen, Gen-Huey 180
Dafik 25
Donovan, Diane 35
Edwards, Jenny 158
Fraczak, Wojciech 45
Fu, Jung-Sheng 180
Gibbons, Alan 56
Hong, Seok-Hee 78
Iliopoulos, Costas 25, 93
Ivanco, Jaroslav 143
Jelinkova, Eva 107
Kovar, Petr 143
Kratochvil, Jan 209
Lai, ChunHui 215
Landau, Gad M. 211
Lewenstein, Moshe 212
Lin, Xuemin 208
Lin, Yuqing 21, 122
Loch, Birgit 35
Lu, Hongliang 122
Manyem, Prabhu 218
Marshall, Kim 21
McKay, Brendan 204
Miller, Mirka 25, 129
Miura, Yusuke 170
Myerson, Gerry 137
Nagamochi, Hiroshi 78
Navarro, Gonzalo 11, 66, 216, 217
Nguyen, Minh 129
Nishizeki, Takao 206
Pineda-Villavicencio, Guillermo 129
Poon, Jacky 137
Rahman, M. Sohel 93
Ryjacek, Zdenek 25
Rytter, Wojciech 45, 93
Sant, Paul 56
Semanicova, Andrea 143
Simpson, Jamie 137
Smyth, Bill 221
Solomon, Andrew 158
Suchy, Ondrej 148
Sutcliffe, Paul 158
Takenaga, Yasuhiko 170
Tang, Jianmin 21
Thompson, Bevan 35
Thompson, Jayne 35
Tsai, Ping-Ying 180
Vušković, Kristina 210
Wada, Koichi 219
Wu, Yunjian 122
Yamamoto, Hiroaki 190
Yap, Chee 213
Yazdani, Mohammadreza 45
Yu, Qinglin 122