DOI: 10.1109/MICRO.2018.00079
research-article

ASPEN: a scalable in-SRAM architecture for pushdown automata

Published: 20 October 2018

Abstract

Many applications process some form of tree-structured or recursively nested data; examples include parsing XML or JSON web content and various data mining tasks. Typical CPU solutions are hindered by branch misprediction penalties incurred while reconstructing nested structures, and by irregular memory access patterns. Recent work has demonstrated improved performance for many data processing applications through memory-centric automata processing engines. Unfortunately, these architectures do not support a computational model rich enough for tasks such as XML parsing.
In this paper, we present ASPEN, a general-purpose, scalable, and reconfigurable memory-centric architecture for processing tree-like data. We take inspiration from previous automata processing architectures, but support the richer deterministic pushdown automata computational model. We propose a custom datapath capable of performing the state matching, stack manipulation, and transition routing operations of pushdown automata, all efficiently stored and computed in memory arrays. Further, we present compilation algorithms for transforming large classes of existing grammars to pushdown automata executable on ASPEN, and demonstrate their effectiveness on four different languages: Cool (object-oriented programming), DOT (graph visualization), JSON, and XML.
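As a rough software illustration of the deterministic pushdown automaton model named above, the following sketch recognizes properly nested brackets. It is not ASPEN's datapath or compiler output: the table layout, the `BOTTOM` marker, the single `scan` state, and the function names are all assumptions made for this example. Each step does the three things the abstract lists: match on the current state, input symbol, and top of stack; push or pop; and route to the next state.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (state, input symbol, top of stack)
#   -> (next state, (stack operation, symbol to push or None))
Action = Tuple[str, Optional[str]]
Table = Dict[Tuple[str, str, str], Tuple[str, Action]]

BOTTOM = "$"  # assumed bottom-of-stack marker, never popped


def build_bracket_dpda() -> Table:
    """Build a one-state DPDA for nested (), [] and {}."""
    pairs = {"(": ")", "[": "]", "{": "}"}
    tops = [BOTTOM] + list(pairs)
    table: Table = {}
    for opener, closer in pairs.items():
        # An opener is pushed regardless of what is currently on top.
        for top in tops:
            table[("scan", opener, top)] = ("scan", ("push", opener))
        # A closer is legal only when its matching opener is on top.
        table[("scan", closer, opener)] = ("scan", ("pop", None))
    return table


def accepts(table: Table, text: str, start: str = "scan") -> bool:
    """Run the automaton; at most one transition applies per step."""
    state = start
    stack: List[str] = [BOTTOM]
    for ch in text:
        key = (state, ch, stack[-1])      # state and stack matching
        if key not in table:
            return False                  # no transition: reject
        state, (op, sym) = table[key]     # transition routing
        if op == "push" and sym is not None:
            stack.append(sym)             # stack manipulation
        elif op == "pop":
            stack.pop()
    # Accept only if every opener was closed.
    return state == "scan" and stack == [BOTTOM]


if __name__ == "__main__":
    dpda = build_bracket_dpda()
    for sample in ["{[()]}", "([)]", "("]:
        print(repr(sample), accepts(dpda, sample))
```

A finite automaton without the stack cannot recognize this language for unbounded nesting depth, which is the gap between earlier automata engines and the pushdown model that the abstract points to.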
Finally, we present an empirical evaluation of two application scenarios for ASPEN: XML parsing and frequent subtree mining. Across 23 benchmarks, the proposed architecture parses XML at an average of 704.5 ns per KB, compared to 9983 ns per KB for a state-of-the-art XML parser. We also demonstrate end-to-end speedups of 37.2x and 6x over CPU and GPU implementations of subtree mining, respectively.
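To put the two XML parsing figures on a common scale, the short calculation below derives the ratio between them and an equivalent throughput. These derived values are not quoted from the abstract; the conversion assumes 1 KB = 1024 bytes, which the abstract does not specify, and the variable and function names are made up for this sketch.

```python
# Figures taken from the abstract (average over 23 benchmarks).
ASPEN_NS_PER_KB = 704.5
BASELINE_NS_PER_KB = 9983.0

# Assumption: the abstract does not say whether 1 KB is 1000 or 1024 bytes.
BYTES_PER_KB = 1024


def speedup(baseline_ns: float, accelerated_ns: float) -> float:
    """Ratio of per-KB parsing times (higher means faster)."""
    return baseline_ns / accelerated_ns


def throughput_gb_per_s(ns_per_kb: float) -> float:
    """Bytes per nanosecond equals 10^9 bytes per second."""
    return BYTES_PER_KB / ns_per_kb


if __name__ == "__main__":
    ratio = speedup(BASELINE_NS_PER_KB, ASPEN_NS_PER_KB)
    print(f"parsing time ratio: {ratio:.1f}x")
    print(f"ASPEN:    {throughput_gb_per_s(ASPEN_NS_PER_KB):.2f} GB/s")
    print(f"baseline: {throughput_gb_per_s(BASELINE_NS_PER_KB):.2f} GB/s")
```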


    Published In

    MICRO-51: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture
    October 2018
    1015 pages
    ISBN:9781538662403

    Publisher

    IEEE Press

    Author Tags

    1. accelerators
    2. emerging technologies (memory and computing)
    3. pushdown automata

    Qualifiers

    • Research-article

    Conference

    MICRO-51
    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Cited By

    • (2023) hAP: A Spatial-von Neumann Heterogeneous Automata Processor with Optimized Resource and IO Overhead on FPGA. Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 185-196. DOI: 10.1145/3543622.3573190. Online publication date: 12-Feb-2023
    • (2021) Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 311-323. DOI: 10.1145/3466752.3480934. Online publication date: 18-Oct-2021
    • (2021) Memory Mapping and Parallelizing Random Forests for Speed and Cache Efficiency. 50th International Conference on Parallel Processing Workshop, pp. 1-5. DOI: 10.1145/3458744.3474052. Online publication date: 9-Aug-2021
    • (2020) MARTINI. Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop, pp. 77-90. DOI: 10.1145/3411495.3421353. Online publication date: 9-Nov-2020
    • (2020) Reliability Analysis for Unreliable FSM Computations. ACM Transactions on Architecture and Code Optimization 17:2, pp. 1-23. DOI: 10.1145/3377456. Online publication date: 29-May-2020
    • (2020) Accelerating Legacy String Kernels via Bounded Automata Learning. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 235-249. DOI: 10.1145/3373376.3378503. Online publication date: 9-Mar-2020
    • (2019) Accelerating raw data analysis with the ACCORDA software and hardware architecture. Proceedings of the VLDB Endowment 12:11, pp. 1568-1582. DOI: 10.14778/3342263.3342634. Online publication date: 1-Jul-2019
    • (2019) Scalable Processing of Contemporary Semi-Structured Data on Commodity Parallel Processors - A Compilation-based Approach. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 79-92. DOI: 10.1145/3297858.3304008. Online publication date: 4-Apr-2019
