
Data-parallel finite-state machines

Published: 24 February 2014

Abstract

A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, tokenizing text, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-access patterns, making them difficult to parallelize. This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol. This allows the algorithm to utilize various sources of data parallelism available on modern hardware, including vector instructions and multiple processors/cores. For instance, on benchmarks from three FSM applications (regular expressions, Huffman decoding, and HTML tokenization), the parallel algorithm achieves up to a 3x speedup over optimized sequential baselines on a single core, and linear speedups up to 21x on 8 cores.
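
The enumeration idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and omits its vector-instruction and multicore optimizations. The three-state automaton `DELTA` (which detects the substring "ab"), the function names, and the chunking scheme are all assumptions made for illustration. Each chunk of input is summarized as a mapping from every possible start state to the state reached after the chunk; because these mappings do not depend on each other, they could be computed in parallel and then composed in input order.

```python
from functools import reduce

# Hypothetical DFA over {'a', 'b'}: state 2 means the input contains "ab".
# DELTA[state][symbol] gives the next state.
DELTA = [
    {"a": 1, "b": 0},  # state 0: nothing useful seen yet
    {"a": 1, "b": 2},  # state 1: last symbol was 'a'
    {"a": 2, "b": 2},  # state 2: "ab" already seen (absorbing)
]


def run_sequential(delta, start, text):
    """Baseline: each step depends on the state produced by the previous step."""
    state = start
    for symbol in text:
        state = delta[state][symbol]
    return state


def chunk_map(delta, chunk):
    """Enumerate all start states: result[s] is the state reached from s."""
    reached = list(range(len(delta)))  # identity mapping before any input
    for symbol in chunk:
        reached = [delta[s][symbol] for s in reached]
    return reached


def compose(first, second):
    """Mapping equivalent to applying `first`, then `second`."""
    return [second[s] for s in first]


def run_enumerative(delta, start, text, num_chunks=4):
    """Split the input, summarize each chunk independently, then combine."""
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # The per-chunk maps are independent of one another, so this list
    # could be built in parallel; it is sequential here for simplicity.
    maps = [chunk_map(delta, chunk) for chunk in chunks]
    identity = list(range(len(delta)))
    return reduce(compose, maps, identity)[start]


if __name__ == "__main__":
    text = "bbabba"
    print(run_sequential(DELTA, 0, text), run_enumerative(DELTA, 0, text))
```

In this sketch each chunk costs work proportional to the number of states, which is the overhead that enumeration introduces; function composition is associative, so the combining step could also be organized as a parallel reduction.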

Published In

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
February 2014
780 pages
ISBN: 9781450323055
DOI: 10.1145/2541940

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data parallel
  2. finite state machine
  3. regular expression

Qualifiers

  • Research-article

Conference

ASPLOS '14

Acceptance Rates

ASPLOS '14 paper acceptance rate: 49 of 217 submissions (23%)
Overall acceptance rate: 535 of 2,713 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 53
  • Downloads (last 6 weeks): 6
Reflects downloads up to 30 Dec 2024.

Cited By

  • (2024) PTME: A Regular Expression Matching Engine Based on Speculation and Enumerative Computation on FPGA. ACM Transactions on Reconfigurable Technology and Systems 18:1 (1-28). DOI: 10.1145/3655626. Online publication date: 1-Apr-2024
  • (2024) ngAP: Non-blocking Large-scale Automata Processing on GPUs. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (268-285). DOI: 10.1145/3617232.3624848. Online publication date: 27-Apr-2024
  • (2024) One Automaton to Rule Them All: Beyond Multiple Regular Expressions Execution. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization (193-206). DOI: 10.1109/CGO57630.2024.10444810. Online publication date: 2-Mar-2024
  • (2023) Asynchronous Automata Processing on GPUs. Proceedings of the ACM on Measurement and Analysis of Computing Systems 7:1 (1-27). DOI: 10.1145/3579453. Online publication date: 2-Mar-2023
  • (2023) SE-CNN: Convolution Neural Network Acceleration via Symbolic Value Prediction. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 13:1 (73-85). DOI: 10.1109/JETCAS.2023.3244767. Online publication date: Mar-2023
  • (2023) SimdFSM: An Adaptive Vectorization of Finite State Machines for Speculative Execution. Parallel and Distributed Computing, Applications and Technologies (481-493). DOI: 10.1007/978-3-031-29927-8_37. Online publication date: 8-Apr-2023
  • (2022) DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata Processor. ACM Transactions on Architecture and Code Optimization 19:4 (1-26). DOI: 10.1145/3556976. Online publication date: 7-Oct-2022
  • (2022) Compilation on the GPU? Proceedings of the 19th ACM International Conference on Computing Frontiers (230-236). DOI: 10.1145/3528416.3530249. Online publication date: 17-May-2022
  • (2022) GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (481-491). DOI: 10.1109/IPDPS53621.2022.00053. Online publication date: May-2022
  • (2022) Parallel Composition of Weighted Finite-State Transducers. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (6542-6546). DOI: 10.1109/ICASSP43922.2022.9747713. Online publication date: 23-May-2022
