research-article

Data-parallel finite-state machines

Authors: Todd Mytkowicz, Madanlal Musuvathi, Wolfram Schulte

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Pages 529 - 542

https://doi.org/10.1145/2541940.2541988

Published: 24 February 2014

Abstract

A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-access patterns, which makes them difficult to parallelize. This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol. This allows the algorithm to exploit the sources of data parallelism available on modern hardware, including vector instructions and multiple processors/cores. On benchmarks from three FSM applications (regular expressions, Huffman decoding, and HTML tokenization), the parallel algorithm achieves up to a 3x speedup over optimized sequential baselines on a single core, and linear speedups up to 21x on 8 cores.
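The enumeration idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation, which targets vector instructions and multiple cores; it is a plain Python illustration under stated assumptions. The toy machine `TOY_FSM` (which detects the substring "ab") and the function names `run_sequential`, `enumerate_chunk`, and `run_enumerative` are hypothetical and chosen only for this example. Each chunk of input is processed from every possible start state, so chunks no longer depend on one another, and the per-chunk results are then composed in order.

```python
from typing import Dict, List

# Hypothetical toy FSM over the alphabet {a, b}; not taken from the paper.
# States: 0 = nothing seen, 1 = just saw 'a', 2 = "ab" found (absorbing).
# TOY_FSM[symbol][state] gives the next state.
TOY_FSM: Dict[str, List[int]] = {
    "a": [1, 1, 2],
    "b": [0, 2, 2],
}
NUM_STATES = 3


def run_sequential(fsm: Dict[str, List[int]], start: int, text: str) -> int:
    """Baseline: each step depends on the state produced by the previous step."""
    state = start
    for symbol in text:
        state = fsm[symbol][state]
    return state


def enumerate_chunk(fsm: Dict[str, List[int]], num_states: int, chunk: str) -> List[int]:
    """Run the chunk from every start state at once.

    Entry s of the result is the state reached after the chunk when the
    machine entered the chunk in state s.
    """
    reached = list(range(num_states))  # identity mapping before any input
    for symbol in chunk:
        table = fsm[symbol]
        reached = [table[s] for s in reached]
    return reached


def run_enumerative(fsm: Dict[str, List[int]], num_states: int, start: int,
                    text: str, num_chunks: int) -> int:
    """Split the input, enumerate each chunk independently, then compose."""
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # The chunks are independent of one another, so this step could be
    # distributed across cores; it is done serially here for clarity.
    mappings = [enumerate_chunk(fsm, num_states, c) for c in chunks]
    state = start
    for mapping in mappings:
        state = mapping[state]  # cheap lookup replaces re-running the chunk
    return state
```

The sketch does `num_states` times more work per symbol than the sequential loop, which is why the abstract stresses enumerating transitions efficiently; the table lookups over all states are the part that maps naturally onto vector hardware.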

Published In

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

February 2014

780 pages

ISBN:9781450323055

DOI:10.1145/2541940

General Chairs: Rajeev Balasubramonian (University of Utah), Al Davis (University of Utah)
Program Chair: Sarita Adve (University of Illinois at Urbana-Champ

ACM SIGARCH Computer Architecture News Volume 42, Issue 1
ASPLOS '14
March 2014
729 pages
ISSN:0163-5964
DOI:10.1145/2654822
Editor:
Doug DeGroot
acm dot org
ACM SIGPLAN Notices Volume 49, Issue 4
ASPLOS '14
April 2014
729 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2644865
Editors: Mark W. Bailey (Hamilton College, Clinton, NY), Rajeev Balasubramonian (University of Utah), Al Davis (University of Utah), Sarita Adve (University of Illinois at Urbana-Champ

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASPLOS '14

ASPLOS '14: Architectural Support for Programming Languages and Operating Systems

March 1 - 5, 2014

Salt Lake City, Utah, USA

Acceptance Rates

ASPLOS '14 Paper Acceptance Rate 49 of 217 submissions, 23%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 103
Total Downloads: 882
Downloads (Last 12 months): 53
Downloads (Last 6 weeks): 6

Reflects downloads up to 30 Dec 2024

Citations

Cited By

Sun M, Xie G, Zhang F, Guo W, Fan X, Li T, Chen L, Du J (2024). PTME: A Regular Expression Matching Engine Based on Speculation and Enumerative Computation on FPGA. ACM Transactions on Reconfigurable Technology and Systems 18:1 (1-28). DOI: 10.1145/3655626. Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1145/3655626

Ge T, Zhang T, Liu H, Tsafrir D, Musuvathi M, Gupta R, Abu-Ghazaleh N (2024). ngAP: Non-blocking Large-scale Automata Processing on GPUs. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (268-285). DOI: 10.1145/3617232.3624848. Online publication date: 27-Apr-2024.
https://dl.acm.org/doi/10.1145/3617232.3624848

Cicolini L, Carloni F, Santambrogio M, Conficconi D, Grosser T, Dubach C, Steuwer M, Xue J, Ottoni G, Quintão Pereira F (2024). One Automaton to Rule Them All: Beyond Multiple Regular Expressions Execution. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization (193-206). DOI: 10.1109/CGO57630.2024.10444810. Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444810

Liu H, Pai S, Jog A (2023). Asynchronous Automata Processing on GPUs. Proceedings of the ACM on Measurement and Analysis of Computing Systems 7:1 (1-27). DOI: 10.1145/3579453. Online publication date: 2-Mar-2023.
https://dl.acm.org/doi/10.1145/3579453

Yao Y (2023). SE-CNN: Convolution Neural Network Acceleration via Symbolic Value Prediction. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 13:1 (73-85). DOI: 10.1109/JETCAS.2023.3244767. Online publication date: Mar-2023.
https://doi.org/10.1109/JETCAS.2023.3244767

Li L, Taura K (2023). SimdFSM: An Adaptive Vectorization of Finite State Machines for Speculative Execution. Parallel and Distributed Computing, Applications and Technologies (481-493). DOI: 10.1007/978-3-031-29927-8_37. Online publication date: 8-Apr-2023.
https://doi.org/10.1007/978-3-031-29927-8_37

Liu Y, Zhang X, Zhuang D, Fu X, Song S (2022). DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata Processor. ACM Transactions on Architecture and Code Optimization 19:4 (1-26). DOI: 10.1145/3556976. Online publication date: 7-Oct-2022.
https://dl.acm.org/doi/10.1145/3556976

Voetter R, Huijben M, Rietveld K, Sterpone L, Bartolini A, Butko A (2022). Compilation on the GPU? Proceedings of the 19th ACM International Conference on Computing Frontiers (230-236). DOI: 10.1145/3528416.3530249. Online publication date: 17-May-2022.
https://dl.acm.org/doi/10.1145/3528416.3530249

Wang Y, Watling R, Qiu J, Wang Z (2022). GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (481-491). DOI: 10.1109/IPDPS53621.2022.00053. Online publication date: May-2022.
https://doi.org/10.1109/IPDPS53621.2022.00053

Sengupta S, Pratap V, Hannun A (2022). Parallel Composition of Weighted Finite-State Transducers. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (6542-6546). DOI: 10.1109/ICASSP43922.2022.9747713. Online publication date: 23-May-2022.
https://doi.org/10.1109/ICASSP43922.2022.9747713
