research-article

ASPEN: a scalable in-SRAM architecture for pushdown automata

Authors: Kevin Angstadt, Arun Subramaniyan, Elaheh Sadredini, Westley Weimer, Reetuparna Das

MICRO-51: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture

Pages 921 - 932

https://doi.org/10.1109/MICRO.2018.00079

Published: 20 October 2018

Abstract

Many applications, such as parsing XML or JSON web content and various data mining tasks, process some form of tree-structured or recursively nested data. Typical CPU-based solutions are hindered by branch misprediction penalties incurred while reconstructing nested structures, and by irregular memory access patterns. Recent work has demonstrated improved performance for many data processing applications through memory-centric automata processing engines. Unfortunately, these architectures do not support a computational model rich enough for tasks such as XML parsing.

In this paper, we present ASPEN, a general-purpose, scalable, and reconfigurable memory-centric architecture for processing tree-like data. We take inspiration from previous automata processing architectures, but support the richer deterministic pushdown automata computational model. We propose a custom datapath capable of performing the state matching, stack manipulation, and transition routing operations of pushdown automata, all efficiently stored and computed in memory arrays. Further, we present compilation algorithms for transforming large classes of existing grammars to pushdown automata executable on ASPEN, and demonstrate their effectiveness on four different languages: Cool (object-oriented programming), DOT (graph visualization), JSON, and XML.
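To make the deterministic pushdown automaton model concrete, here is a minimal software sketch in Python. It is not ASPEN's datapath or compiler output. The table encoding, the names `build_bracket_dpda` and `run_dpda`, and the bracket language are all assumptions chosen for illustration. Each step looks up the current state, input symbol, and top of stack (state matching), replaces the top of the stack (stack manipulation), and moves to the next state (transition routing).

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption, not ASPEN's format):
# (state, input symbol, top of stack) -> (next state, symbols replacing the top).
Transition = Tuple[str, Tuple[str, ...]]
Table = Dict[Tuple[str, str, str], Transition]

BOTTOM = "$"  # marks the bottom of the stack


def build_bracket_dpda() -> Table:
    """Build a one-state table that accepts well-nested '()' and '[]'."""
    table: Table = {}
    openers = {"(": ")", "[": "]"}
    for top in (BOTTOM, "(", "["):
        for opener in openers:
            # Opener: restore the popped top and push the opener above it.
            table[("q", opener, top)] = ("q", (top, opener))
    for opener, closer in openers.items():
        # Closer: only defined when the top is the matching opener (pop it).
        table[("q", closer, opener)] = ("q", ())
    return table


def run_dpda(table: Table, text: str, start: str = "q") -> bool:
    """Run the automaton; accept if only the bottom marker remains."""
    state = start
    stack: List[str] = [BOTTOM]
    for symbol in text:
        top = stack.pop()
        key = (state, symbol, top)
        if key not in table:
            return False  # no transition defined: reject
        state, pushed = table[key]
        stack.extend(pushed)
    return stack == [BOTTOM]


if __name__ == "__main__":
    dpda = build_bracket_dpda()
    for sample in ["([])[]", "([)]", "("]:
        print(repr(sample), run_dpda(dpda, sample))
```

A finite automaton has no stack, so it cannot track nesting of unbounded depth. That is why inputs such as nested XML elements call for the pushdown model.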

Finally, we present an empirical evaluation of two application scenarios for ASPEN: XML parsing and frequent subtree mining. Across 23 benchmarks, the proposed architecture parses XML at an average of 704.5 ns per KB, compared to 9983 ns per KB for a state-of-the-art XML parser. We also demonstrate end-to-end speedups of 37.2x and 6x over CPU and GPU implementations of subtree mining, respectively.
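As a quick check on the two XML parsing figures, the short Python sketch below computes their ratio. The abstract does not state this ratio. It is a ratio of the two reported averages, which may differ from an average of per-benchmark speedups. The constant and function names are illustrative assumptions.

```python
# Averages quoted in the abstract (ns per KB, across 23 benchmarks).
ASPEN_NS_PER_KB = 704.5
BASELINE_NS_PER_KB = 9983.0


def speedup(baseline: float, accelerated: float) -> float:
    """Return how many times faster the accelerated time is than the baseline."""
    return baseline / accelerated


ratio = speedup(BASELINE_NS_PER_KB, ASPEN_NS_PER_KB)
print(f"{ratio:.1f}x")  # prints "14.2x"
```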

Published In

MICRO-51: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture

October 2018

1015 pages

ISBN:9781538662403

General Chairs: Mark Oskin (University of Washington), Koji Inoue (Kyushu University)

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Press

Qualifiers

Research-article

Conference

MICRO-51

Sponsor:

SIGMICRO

MICRO-51: The 51st Annual IEEE/ACM International Symposium on Microarchitecture

October 20 - 24, 2018

Fukuoka, Japan

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

Wang X, Gong L, Cao J, Lou W, Wang W, Wang C, Zhou X, Ienne P, Zhang Z (2023). hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA. Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 185-196. Online publication date: 12-Feb-2023.
https://dl.acm.org/doi/10.1145/3543622.3573190
Sadredini E, Rahimi R, Imani M, Skadron K (2021). Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 311-323. Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3466752.3480934
Romero-Gainza E, Stewart C, Li A, Hale K, Morris N (2021). Memory Mapping and Parallelizing Random Forests for Speed and Cache Efficiency. 50th International Conference on Parallel Processing Workshop, pp. 1-5. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3458744.3474052
Qin Y, Gonzalez S, Angstadt K, Wang X, Forrest S, Das R, Leach K, Weimer W, Zhang Y, Sion R (2020). MARTINI. Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop, pp. 77-90. Online publication date: 9-Nov-2020.
https://dl.acm.org/doi/10.1145/3411495.3421353
Sabet A, Qiu J, Zhao Z, Krishnamoorthy S (2020). Reliability Analysis for Unreliable FSM Computations. ACM Transactions on Architecture and Code Optimization 17:2, pp. 1-23. Online publication date: 29-May-2020.
https://dl.acm.org/doi/10.1145/3377456
Angstadt K, Jeannin J, Weimer W, Larus J, Ceze L, Strauss K (2020). Accelerating Legacy String Kernels via Bounded Automata Learning. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 235-249. Online publication date: 9-Mar-2020.
https://dl.acm.org/doi/10.1145/3373376.3378503
Fang Y, Zou C, Chien A (2019). Accelerating raw data analysis with the ACCORDA software and hardware architecture. Proceedings of the VLDB Endowment 12:11, pp. 1568-1582. Online publication date: 1-Jul-2019.
https://dl.acm.org/doi/10.14778/3342263.3342634
Jiang L, Sun X, Farooq U, Zhao Z, Bahar I, Herlihy M, Witchel E, Lebeck A (2019). Scalable Processing of Contemporary Semi-Structured Data on Commodity Parallel Processors - A Compilation-based Approach. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 79-92. Online publication date: 4-Apr-2019.
https://dl.acm.org/doi/10.1145/3297858.3304008
