Regular Expressions (REs) are widely used to find patterns in data, as in genomic marker research for DNA analysis, deep packet inspection, or signature-based detection for network intrusion detection systems. This paper proposes a novel and efficient RE matching architecture for FPGAs, based on the concept of a matching core. An RE can be software-compiled into a sequence of basic matching instructions that a matching core runs on input data, and the sequence can be replaced to change the RE being matched. The architecture scales easily with the available resources and is customizable to multiple usage scenarios. We ran several experiments and compared the results with a software solution: running at 130 MHz, the architecture reaches speedups of over 100x with respect to a Flex-based matching application running on an Intel i7 CPU at 2.8 GHz.
This document describes research on implementing Curran's approximation algorithm for pricing Asian options using a dataflow architecture. The algorithm was implemented on a Maxeler dataflow engine (DFE) and compared to a CPU implementation. Different fixed-point precisions were tested on the DFE and 54-bit fixed-point provided the best balance of precision and resource usage. Implementing the algorithm across multiple DFEs provided speedups of 5-12x over a 48-core CPU. Further optimization of dynamic ranges allowed increasing the unrolling factor, improving performance and energy efficiency.
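The fixed-point precision trade-off the summary describes can be illustrated in a few lines of Python; the bit widths and values below are illustrative, not the ones from the study:

```python
# Illustrative fixed-point quantization: round a real value to a
# fixed-point number with the given number of fractional bits.
def to_fixed(x, frac_bits):
    """Quantize x to fixed point with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

# Quantization error shrinks as fractional bits grow, at the cost of
# wider datapaths (more DFE resources).
x = 3.141592653589793
for bits in (8, 16, 24):
    print(bits, abs(x - to_fixed(x, bits)))
```

Choosing the narrowest width whose error stays within tolerance is exactly the kind of sweep the study performed before settling on 54 bits.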
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs - Pandey_G
Presentation for the paper C-SAW: A Framework for Graph Sampling and Random Walk on GPUs published in SC20.
Paper link: https://arxiv.org/pdf/2009.09103.pdf
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl... - PingCAP
This document discusses methods for optimizing query performance in a query optimizer called Scope by selecting alternative rule configurations. It proposes using rule signatures to group similar queries and generate candidate rule configurations to execute for each group. A learning model is then trained on execution results to select the best configuration for future queries in each group. The goal is to improve upon the default configuration by adapting to workloads and addressing inaccuracies in cardinality estimation that can lead to suboptimal plans.
Photon Technical Deep Dive: How to Think Vectorized - Databricks
Photon is a new vectorized execution engine powering Databricks written from scratch in C++. In this deep dive, I will introduce you to the basic building blocks of a vectorized engine by walking you through the evaluation of an example query with code snippets. You will learn about expression evaluation, compute kernels, runtime adaptivity, filter evaluation, and vectorized operations against hash tables.
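The building blocks mentioned above (compute kernels, filter evaluation, selection vectors) can be sketched in plain Python; the kernel names and batch layout below are illustrative, not Photon's actual C++ internals:

```python
# A toy vectorized engine step: evaluate (price * qty) > 100 over a
# column batch, producing a selection vector of surviving row indices.
def mul_kernel(a, b):
    """Compute kernel: element-wise multiply of two columns."""
    return [x * y for x, y in zip(a, b)]

def gt_filter(col, threshold):
    """Filter kernel: indices of rows where col[i] > threshold."""
    return [i for i, v in enumerate(col) if v > threshold]

price = [10.0, 50.0, 7.5, 30.0]
qty   = [5,    3,    20,  4]
revenue = mul_kernel(price, qty)        # [50.0, 150.0, 150.0, 120.0]
selection = gt_filter(revenue, 100.0)   # rows that pass the filter
print(selection)  # → [1, 2, 3]
```

Later operators then read only the rows named in the selection vector, which is how a vectorized engine avoids materializing filtered batches.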
This document discusses batch processing using Apache Flink. It provides code examples of using Flink's DataSet and Table APIs to perform batch word count jobs. It also covers iterative algorithms in Flink, including how Flink handles bulk and delta iterations more efficiently than other frameworks like Spark and MapReduce. Delta iterations are optimized by only processing changes between iterations to reduce the working data set size over time.
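For readers unfamiliar with the batch word-count example, here is the same pipeline in plain Python; Flink's DataSet API expresses it as flatMap, groupBy, and sum and runs it distributed, while this sketch only illustrates the dataflow:

```python
from collections import Counter

def word_count(lines):
    """Word count as a flatMap -> groupBy(word) -> sum(1) pipeline."""
    counts = Counter()
    for line in lines:            # flatMap: split each line into words
        for word in line.lower().split():
            counts[word] += 1     # groupBy(word).sum(1)
    return dict(counts)

print(word_count(["to be or not to be"]))
# → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```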
This document discusses a supercomputer called HYPE-2 built by Santosh Pandey, Ram Sharan Chaulagain, and Prakash Gyawali under the supervision of Prof. Dr. Subarna Shakya. It provides an overview of multiprocessor and multicore systems and discusses how HYPE-2 uses a distributed memory architecture with dynamic scaling to achieve high performance computing capabilities for research applications like cryptography, data mining, and weather forecasting. Performance tests showed near-linear speedup as nodes were added, with the system able to handle complex computations through inter-process communication, though it is not as powerful as larger supercomputers.
Apache Flink: API, runtime, and project roadmap - Kostas Tzoumas
The document provides an overview of Apache Flink, an open source stream processing framework. It discusses Flink's programming model using DataSets and transformations, real-time stream processing capabilities, windowing functions, iterative processing, and visualization tools. It also provides details on Flink's runtime architecture, including its use of pipelined and staged execution, optimizations for iterative algorithms, and how the Flink optimizer selects execution plans.
This is a talk given by Badrish Chandramouli at Portland State University on May 30, 2017, and overviews his recent and ongoing research directions in the space of stream processing and big data analytics.
Advances in the Solution of Navier-Stokes Eqs. in GPGPU Hardware. Modelling F... - Storti Mario
In this article we compare the results obtained with an implementation of the Finite Volume method for structured meshes on GPGPUs with experimental results and also with a Finite Element code using a boundary-fitted strategy. The example is a fully submerged spherical buoy immersed in a cubic water recipient. The recipient undergoes a harmonic linear motion imposed with a shake table. The experiment is recorded with a high-speed camera and the displacement of the buoy is obtained from the video with a MoCap (Motion Capture) algorithm. The amplitude and phase of the resulting motion allow the added mass and drag of the sphere to be determined indirectly.
This document provides concise summaries of key points about Flink:
1) After submitting a Flink job, the client creates and submits the job graph to the JobManager, which then creates an execution graph and deploys tasks across TaskManagers for parallel execution.
2) The batch optimizer chooses optimal execution plans by evaluating physical execution strategies like join algorithms and data shipping approaches to minimize data shuffling and network usage.
3) Flink iterations are optimized by having the runtime directly handle caching, state maintenance, and pushing work out of loops to avoid scheduling overhead between iterations. Delta iterations further improve efficiency by only updating changed elements in each iteration.
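The delta-iteration idea, re-processing only the elements that changed in the previous round, can be sketched with a toy connected-components example in Python; this illustrates the workset/solution-set mechanics, not Flink's actual runtime:

```python
# Delta iteration: each round touches only vertices whose component
# label changed (the "workset"), so the working set shrinks over time.
def connected_components(vertices, edges):
    label = {v: v for v in vertices}   # solution set: vertex -> component id
    workset = set(vertices)            # initially every vertex is "changed"
    neighbors = {v: [] for v in vertices}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    while workset:
        changed = set()
        for v in workset:              # only re-process changed vertices
            for n in neighbors[v]:
                if label[v] < label[n]:
                    label[n] = label[v]
                    changed.add(n)
        workset = changed              # next round's (smaller) workset
    return label

print(connected_components([1, 2, 3, 4], [(1, 2), (3, 4)]))
```

A bulk iteration would rescan every vertex each round; here the loop terminates as soon as no label changes.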
This document provides an overview of Apache Flink internals. It begins with an introduction and recap of Flink programming concepts. It then discusses how Flink programs are compiled into execution plans and executed in a pipelined fashion, as opposed to being executed eagerly like regular code. The document outlines Flink's architecture including the optimizer, runtime environment, and data storage integrations. It also covers iterative processing and how Flink handles iterations both by unrolling loops and with native iterative datasets.
Distributed model-to-model transformations can be computationally expensive for large models or complex transformations. The authors present an approach to distribute ATL model transformations using MapReduce. Local match and apply phases are performed in parallel by mappers. Global resolve is done by reducers to combine local results. An evaluation shows near-linear speedup on Amazon EMR for models up to 100,000 lines of code. Challenges include load balancing, persistence for concurrent read/write, and parallelizing all transformation phases.
Developing Your Own Flux Packages by David McKay | Head of Developer Relation... - InfluxData
Flux is easy to contribute to, and it is easy to share functions and libraries of Flux code with other developers. Although there are many functions in the language, the true power of Flux is its ability to be extended with custom functions. In this session, David will show you how to write your own custom function to perform some new analytics.
This document discusses different frameworks for big data processing at ResearchGate, including Hive, MapReduce, and Flink. It provides an example of using Hive to find the top 5 coauthors for each author based on publication data. Code snippets in Hive SQL and Java are included to implement the top k coauthors user defined aggregate function (UDAF) in Hive. The document evaluates different frameworks based on criteria like features, performance, and usability.
Migrate 10TB to Exadata -- Tips and Tricks - Amin Adatia
This document provides tips and tricks for migrating 10TB of data from an AIX database to an Exadata database within a limited 6 hour downtime window. It discusses approaches taken for different object types including non-partitioned tables, partitioned tables with and without LOB columns, tables with Oracle Text indexes, and tables using Oracle Label Security. Key steps taken included rebuilding Oracle Text indexes in parallel rather than using transportable tablespaces, and replacing source label tags with target tags during data migration rather than updating tags post-migration. The migration was completed on time with all objectives met.
How to Introduce Telemetry Streaming (gNMI) in Your Network with SNMP with Te... - InfluxData
This document provides an overview of introducing network telemetry using streaming protocols like gNMI with Telegraf. It discusses gNMI as a streaming telemetry protocol, using Telegraf to collect metrics from network devices via gNMI and SNMP, and how to normalize and enrich the collected data through Telegraf processors before outputting to a time-series database. It also includes a demo of collecting interface counters from devices supporting gNMI and SNMP, and processing the data in Telegraf.
Addressing performance issues in titan+cassandra - Nakul Jeirath
Slides from presentation at Graph Day Texas discussing some of the problems we faced and what we did to fix them to keep our customer facing response times low and our data ingestion pipeline humming.
Brief introduction on Hadoop, Dremel, Pig, FlumeJava and Cassandra - Somnath Mazumdar
This document provides an overview of several big data technologies including MapReduce, Pig, Flume, Cascading, and Dremel. It describes what each technology is used for, how it works, and example applications. MapReduce is a programming model for processing large datasets in a distributed environment, while Pig, Flume, and Cascading build upon MapReduce to provide higher-level abstractions. Dremel is an interactive query system for nested and complex datasets that uses a column-oriented data storage format.
TiReX is a tiled regular expression matching architecture developed by researchers at Politecnico di Milano. It uses a customized instruction set architecture implemented on an FPGA to compile regular expressions into low-level instructions and execute them in parallel across multiple processor cores. Evaluation shows it can match regular expressions over 37 times faster than software and over 100 times faster than a desktop CPU. The multi-core design allows flexible matching of multiple regular expressions over data in parallel.
The increasing demand for computing power in fields such as biology, finance, and machine learning is pushing the adoption of reconfigurable hardware in order to keep up with the required performance level at a sustainable power consumption. Within this context, FPGA devices represent an interesting solution as they combine the benefits of power efficiency, performance, and flexibility. Nevertheless, the steep learning curve and the experience needed to develop efficient FPGA-based systems represent one of the main limiting factors to a broad utilization of such devices.
In this talk, we present CAOS, a framework which helps the application designer identify acceleration opportunities and guides them through the implementation of the final FPGA-based system. The CAOS platform targets the full stack of the application optimization process, from the identification of the kernel functions to accelerate, to the optimization of such kernels, to the generation of the runtime management and configuration files needed to program the FPGA.
This document proposes a highly parallel semi-dataflow FPGA architecture for accelerating large-scale N-body simulations. The key aspects of the proposed design are: 1) A hardware/software partitioning that accelerates the computationally intensive force calculation step on the FPGA; 2) An optimized data transfer approach to reduce memory traffic; 3) A semi-dataflow architecture providing high parallelism through 48 computation pipelines; and 4) A tiling approach to further improve performance and resource utilization. Experimental results show the design achieves up to 4400 million particle-pairs per second, outperforming CPU and GPU implementations in terms of performance and performance-per-watt.
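The force-calculation step that dominates such simulations is a simple pairwise loop; the Python sketch below shows the O(N^2) computation that the 48 pipelines parallelize (the gravitational form and softening constant are illustrative, not taken from the paper):

```python
# Pairwise gravitational acceleration with a softening term eps to
# avoid division by zero; the inner loop is the particle-pair stream
# that the FPGA pipelines process in parallel.
def accelerations(pos, mass, eps=1e-3):
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc

a = accelerations([(0, 0, 0), (1, 0, 0)], [1.0, 1.0])
print(a[0][0])  # positive: particle 0 is pulled toward particle 1
```

Tiling, as the paper describes, amounts to blocking these two loops so each block of particles is reused from on-chip memory.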
The document discusses using a regular expression matching architecture called ReCPU for network intrusion detection systems (NIDS). ReCPU can efficiently match regular expressions in hardware and is well-suited for the high-speed regular expression matching needs of NIDS. It describes the ReCPU architecture, which uses parallel comparators to match multiple characters simultaneously, and how its design can be adapted for NIDS computation.
Klessydra-T: Designing Configurable Vector Co-Processors for Multi-Threaded E... - RISC-V International
The document summarizes the Klessydra-T architecture for designing vector coprocessors for multi-threaded edge computing cores. It describes the interleaved multi-threading baseline and parameterized vector acceleration schemes using the Klessydra vector intrinsic functions. Performance results show up to 3x speedup over a baseline core for benchmarks like convolution, FFT, and matrix multiplication on FPGA implementations with different configurations of vector lanes, functional units, and scratchpad memories.
Synthesis & gate-level simulation is introduced. The key topics covered include basic concepts of logic synthesis using Design Compiler, including logic level optimization, mapping, boundary optimization, and static timing analysis. Simulation of the gate-level netlist generated after synthesis is also discussed. An example lab is outlined to synthesize a simple 8-bit microprocessor and simulate the gate-level netlist.
International Journal of Engineering Research and Development - IJERD Editor
- Electrical, Electronics and Computer Engineering
- Information Engineering and Technology
- Mechanical, Industrial and Manufacturing Engineering
- Automation and Mechatronics Engineering
- Material and Chemical Engineering
- Civil and Architecture Engineering
- Biotechnology and Bio Engineering
- Environmental Engineering
- Petroleum and Mining Engineering
- Marine and Agriculture Engineering
- Aerospace Engineering
A Cryptographic Hardware Revolution in Communication Systems using Verilog HDL - idescitation
The Advanced Encryption Standard (AES) is an advancement of the Federal Information Processing Standards (FIPS) initiated by NIST. AES specifies the Rijndael algorithm, a symmetric block cipher that processes fixed 128-bit data blocks using cipher keys of 128, 192, or 256 bits. The original Rijndael algorithm had the advantage of combining data block sizes of 128, 192, and 256 bits with any of these key lengths. AES can be programmed in pure hardware with Verilog HDL, including a multiplexer to make the ciphertext more secure. The results indicate that the hardware implementation proposed in this project reduces resource utilization and power consumption (113 mW) compared with other implementations, and using an FPGA improves reliability. This project presents the AES algorithm with regard to FPGA and Verilog HDL. The software used for simulation is ModelSim-Altera 6.3g_p1 (Quartus II 8.1). Synthesis and implementation of the code are carried out on Xilinx ISE 13.4, and the XC6VCX240T device is used for hardware evaluation.
This document summarizes a research paper that proposes implementing the Advanced Encryption Standard (AES) cryptographic algorithm using Verilog HDL for hardware implementation on FPGAs. The paper describes the AES algorithm, its encryption and decryption processes, and a hardware design for AES that was tested on a Xilinx FPGA. The results showed the hardware implementation utilized less resources and had lower power consumption compared to other AES FPGA designs.
The document summarizes benchmarking results for four magnetic fusion simulation codes: GTS, TGYRO, BOUT++, and VORPAL. It was performed on the Cray XE6 "Hopper" supercomputer at NERSC to evaluate performance, scalability, memory usage, and communication overhead at large scales. For GTS, weak scaling tests showed computation time remained constant while communication time increased slightly with up to 49,152 cores. Testing also examined the codes' sensitivity to reduced memory bandwidth by increasing core count per node. Overall results provide insight to improve fusion code design and inform exascale co-design efforts.
IRJET- A Review on Various Secured Data Encryption Models based on AES Standard - IRJET Journal
This document reviews various secure data encryption models based on the AES standard. It discusses AES encryption, which involves SubBytes, ShiftRows, MixColumns, and AddRoundKey steps over multiple rounds. Various FPGA-based AES encryption models are compared on throughput, area, maximum frequency, bit width, and number of pipeline stages; the models achieve different trade-offs between speed and area. AES is widely used for data security in applications such as e-commerce and banking due to its standardization and its ability to encrypt data securely.
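Two of the four AES round steps are simple enough to show exactly; the Python sketch below implements AddRoundKey (a byte-wise XOR with the round key) and ShiftRows (row i of the 4x4 state rotated left by i), omitting SubBytes and MixColumns for brevity:

```python
# AES state and round key are modeled as 4x4 lists of byte values.
def add_round_key(state, round_key):
    """AddRoundKey: XOR each state byte with the round-key byte."""
    return [[s ^ k for s, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]

def shift_rows(state):
    """ShiftRows: rotate row i left by i positions."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]

state = [[0x00, 0x01, 0x02, 0x03]] * 4
key   = [[0xFF] * 4] * 4
print(add_round_key(state, key)[0])  # → [255, 254, 253, 252]
print(shift_rows(state)[1])          # → [1, 2, 3, 0]
```

In the fully pipelined FPGA designs the review compares, each such step of each of the 10 rounds becomes its own pipeline stage.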
This document describes the implementation of the AES (Advanced Encryption Standard) algorithm using a fully pipelined design on an FPGA. It first provides background on the AES algorithm, including its key components and previous hardware implementations. It then details the proposed fully pipelined design, which implements each of AES's 10 rounds as separate pipeline stages to achieve high throughput. Key generation is also pipelined internally. Simulation results show the design achieves a throughput higher than previous reported implementations.
Novel Adaptive Hold Logic Circuit for the Multiplier using Add Round Key and ... - IJMTST Journal
Digital multipliers are among the most critical arithmetic functional units in many applications, such as the Fourier transform, discrete cosine transforms, and digital filtering. The throughput of these applications depends on the multipliers: if the multipliers are too slow, the performance of the entire circuit is reduced. The negative bias temperature instability effect occurs when a PMOS transistor is under negative bias (Vgs = −Vdd), increasing the threshold voltage of the PMOS transistor and reducing the multiplier speed. Similarly, positive bias temperature instability occurs when an NMOS transistor is under positive bias. Both effects degrade the speed of the transistors, and in the long term the system may fail due to timing violations. It is therefore necessary to design reliable high-performance multipliers. In this paper, we implement an aging-aware multiplier design with a novel adaptive hold logic (AHL) circuit. The multiplier provides higher throughput through variable latency and can adjust the AHL circuit to lessen the performance degradation due to the aging effect. The proposed design can be applied to the column-bypass multiplier.
An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture - Waqas Tariq
In this paper, we present a comparative performance analysis of different parallel sorting algorithms: Bitonic sort and Parallel Radix Sort. In order to study the interaction between the algorithms and architecture, we implemented both the algorithms in OpenCL and compared its performance with Quick Sort algorithm, the fastest algorithm. In our simulation, we have used Intel Core2Duo CPU 2.67GHz and NVidia Quadro FX 3800 as graphical processing unit.
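Bitonic sort maps well to GPUs because every compare-exchange within a stage is independent and can be assigned to its own work-item; the Python sketch below runs the same sorting network serially (array length must be a power of two):

```python
# Iterative bitonic sorting network: outer loop grows the bitonic
# sequence size k, inner loop halves the compare-exchange distance j.
def bitonic_sort(a):
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):           # each i is independent: one
                partner = i ^ j          # GPU work-item per element
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 1, 8, 2, 6, 5, 4]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

The fixed, data-independent comparison pattern is what makes the network friendly to SIMD hardware, unlike quicksort's data-dependent branching.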
This paper presents 16 software implementations of the Advanced Encryption Standard (AES) cipher mapped to a fine-grained many-core processor array. The implementations explore different levels of data and task parallelism. The smallest design uses 6 cores for offline key expansion and 8 cores for online expansion, while the largest uses 107 and 137 cores respectively. Compared to other software platforms, the designs achieve 3.5-15.6 times higher throughput per chip area and 8.2-18.1 times higher energy efficiency.
Area efficient parallel LFSR for cyclic redundancy check - IJECEIAES
Cyclic Redundancy Check (CRC) codes for error detection find many applications in digital communication, data storage, control systems, and data compression. CRC encoding is carried out using a Linear Feedback Shift Register (LFSR). A serial implementation of CRC requires a number of clock cycles equal to the message length plus the degree of the generator polynomial, whereas a parallel implementation requires a single clock cycle if the whole message is applied at once. In previous work on parallel LFSRs, the hardware complexity of the architecture was reduced using a technique called state space transformation. This paper presents a searching algorithm and a new technique to find the number of XOR gates required for different CRC algorithms. A comparison between the proposed and previous architectures shows that the number of XOR gates is reduced, improving hardware efficiency. The searching algorithm and all matrix computations have been performed using MATLAB simulations.
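The serial baseline described above, one message bit shifted through the LFSR per clock cycle, can be sketched in a few lines of Python; CRC-8 with polynomial 0x07 is used here as an example, and the paper's XOR-gate search algorithm is not reproduced:

```python
# Bit-serial CRC: each inner-loop pass models one LFSR clock cycle,
# which is why the serial form needs (message bits + degree) cycles.
def crc_serial(data, poly=0x07, width=8):
    reg = 0
    for byte in data:
        for bit in range(7, -1, -1):            # MSB first
            msb = (reg >> (width - 1)) & 1      # LFSR feedback tap
            inp = (byte >> bit) & 1
            reg = (reg << 1) & ((1 << width) - 1)
            if msb ^ inp:
                reg ^= poly                     # XOR network of the LFSR
    return reg

print(hex(crc_serial(b"123456789")))  # CRC-8 of the standard check string
```

A parallel LFSR unrolls this inner loop algebraically so that a whole input word updates the register in one cycle, at the cost of a wider XOR network, whose gate count is what the paper minimizes.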
IBM recently announced the POWER10 processor. POWER10 brings a rich set of architectural capabilities to the processor core. Features like prefix instruction support and Matrix Multiply Assist (MMA), which were introduced in the Open POWER ISA V3.01, are implemented in the POWER10 processor. MMA is an on-chip AI acceleration capability which accelerates matrix multiplication. This talk covers these two key concepts introduced in POWER ISA V3.01: 1) prefix instructions, and how they can help extend the POWER ISA in the next generation; 2) the Matrix Multiply Assist architecture and its implementation in POWER10.
Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well-suited for VLSI implementations. In this paper, a novel framework is introduced which allows the design of parallel-prefix Ling adders. The proposed approach saves one logic level of implementation compared to the parallel-prefix structures proposed for the traditional definition of the carry look-ahead equations, and reduces the fan-out requirements of the design. Experimental results reveal that the proposed adders achieve delay reductions of up to 14 percent when compared to the fastest parallel-prefix architectures presented for the traditional definition of the carry equations.
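The prefix computation at the heart of such adders can be illustrated in Python: generate/propagate pairs are combined with the associative prefix operator over log2(n) combining levels, Kogge-Stone style. This shows ordinary parallel-prefix addition, not the Ling-adder variant the paper proposes:

```python
# Parallel-prefix addition: compute per-bit generate/propagate, run a
# Kogge-Stone prefix over them (each while-pass is one hardware level),
# then form the sum bits from the resulting carries.
def prefix_add(a, b, width=8):
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    d = 1
    while d < width:                 # log2(width) combining levels
        ng, np_ = g[:], p[:]
        for i in range(d, width):    # (g,p)[i] = (g,p)[i] o (g,p)[i-d]
            ng[i] = g[i] | (p[i] & g[i - d])
            np_[i] = p[i] & p[i - d]
        g, p = ng, np_
        d *= 2
    carry = [0] + g[:width - 1]      # carry into bit i (carry-in = 0)
    s = 0
    for i in range(width):
        s |= ((((a >> i) ^ (b >> i)) & 1) ^ carry[i]) << i
    return s | (g[width - 1] << width)   # append carry-out

print(prefix_add(200, 100))  # → 300
```

Saving one logic level, as the Ling formulation does, means one fewer such combining pass on the critical path.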
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021 - Deepak Shankar
The document discusses comparing the performance and power of ARM Cortex and RISC-V processors for AI applications. It outlines a methodology for modeling systems from the microarchitecture to SoC level using different instruction sets. Examples are provided to demonstrate how the methodology can be used to improve the accuracy of comparisons between architectures.
Similar to TiReX: Tiled Regular eXpression matching architecture
Marco D. Santambrogio, head of the #NECSTLab, in this talk gives guidance on how to start taking part in our research activities and on the opportunities for students interested in the #NECSTCamp project.
This document appears to be a presentation for the NECST Summer Workshop 2017. It discusses the NECST laboratory's spirit of collaboration and innovation. It provides information on getting involved with NECST through programs for 1st and 2nd year students, including internships and research opportunities. It also lists the research areas of NECST, including reconfigurable computing, computer architecture, and smart technologies. Finally, it includes names of NECST people and links to papers and resources.
The document announces the NECST Summer Workshop 2017. It provides information on research areas at NECSTLab including reconfigurable computing, computer architecture and operating systems, and smart technologies. It also lists people involved in NECSTLab and provides links to access abstracts and papers. The workshop will discuss how involvement in NECSTLab's research can impact students in their first year.
- Silvia Brembati, Product Designer
- Benedetta Bolis, Engineering Physics Student
Due to the recent COVID-19 outbreak, everybody had to quickly rearrange their lifestyle and learn how to get through isolation.
Keeping in touch has never been more compelling and challenging at the same time.
A recent survey conducted in Italy states that 80% of the population felt they needed psychological support to get through quarantine. We believe that if people had a way to feel surrounded by their friends and were able to share activities, this number would be significantly lower. This is where our new app TreeHouse comes in handy, as it guides the user in contributing to the life of the community: a virtual tree comes to life and thrives thanks to both real-life and online interactions. Sharing content, chatting with friends, or drinking a cup of tea together will make a leaf or a branch grow, but if the user is missing for too long, the tree will suffer from their absence, in complete symbiosis.
Meanwhile, checking how the tree develops helps the members feel the actual presence of the community and makes them able to support each other, letting the tree flourish again.
- Filippo Carloni, M.Sc. student in Computer Science and Engineering
Regular Expressions (REs) are widely used to find patterns in data, as in genomic marker research for DNA analysis, signature-based detection for network intrusion detection systems, or search engines. TiReX is a novel and efficient RE matching architecture for FPGAs, based on the concept of a matching core. An RE passes through a compilation and optimization phase to be efficiently translated into a sequence of basic matching instructions that a matching core runs on input data; the sequence can be replaced to change the RE being matched.
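The matching-core idea, an RE compiled into instructions that a small core executes over the input, can be illustrated with a toy interpreter; the three-instruction set below (CHAR, ANY, MATCH) is purely hypothetical and far simpler than TiReX's actual ISA:

```python
# A toy "matching core": runs a compiled instruction sequence against
# the input text starting at a given position.
def run_core(program, text, start):
    pos = start
    for op, arg in program:
        if op == "MATCH":               # end of program: RE matched
            return True
        if pos >= len(text):
            return False
        if op == "CHAR" and text[pos] != arg:
            return False
        pos += 1                        # CHAR and ANY both consume a char
    return True

# "Compiled" program for the RE "a.c": literal 'a', any char, literal 'c'.
program = [("CHAR", "a"), ("ANY", None), ("CHAR", "c"), ("MATCH", None)]
hits = [i for i in range(len("xabcax")) if run_core(program, "xabcax", i)]
print(hits)  # → [1]
```

Swapping in a different program changes the matched RE without touching the core, which is the reconfigurability the architecture exploits.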
A professor reviewed drugs being tested against COVID-19 that were repurposed from other uses. Researchers generated a knowledge graph embedding from 183k triples connecting proteins, genes, drugs and diseases. Their preliminary link prediction model achieved 50% accuracy at ranking potential interactions, higher than random chance, and integrated machine learning with biological insights on drug development.
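As a concrete illustration of link prediction over such an embedding, the TransE-style sketch below ranks candidate tails by how close head + relation lands to each tail vector; the tiny hand-made vectors and entity names are invented, not from the 183k-triple model:

```python
# TransE-style triple scoring: a triple (h, r, t) scores well when the
# head embedding plus the relation embedding is close to the tail.
def score(h, r, t):
    """Negative Euclidean distance of h + r from t (higher is better)."""
    return -sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)) ** 0.5

emb = {"aspirin": [1.0, 0.0], "cox1": [1.0, 1.0], "insulin": [5.0, 5.0]}
treats = [0.0, 1.0]  # illustrative relation vector

# Rank candidate tails for (aspirin, treats, ?).
ranked = sorted(["cox1", "insulin"],
                key=lambda e: score(emb["aspirin"], treats, emb[e]),
                reverse=True)
print(ranked)  # → ['cox1', 'insulin']
```

Ranking all candidate tails this way, and checking where the known true tail lands, is the standard way such link-prediction accuracy figures are measured.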
- Daniele Valentino de Vincenti, B.Sc. graduate in Biomedical Engineering @Politecnico di Milano
- Lorenzo Farinelli, B.Sc. graduate in Computer Science and Engineering @Politecnico di Milano
Plaster is a multi-layered infrastructure (based on C++) aimed at supporting the development of multi-FPGA systems and the management of large data flows between the nodes. In particular, the goal of the project is to provide the end user with a set of tools (by means of a Python library and a C++ service) to easily assign bitstreams to nodes and route data between them, in the context of a PYNQ-based cluster suitable for distributed acceleration of computation-intensive tasks. Using this platform, an abandoned-object detection tool is implemented, designed as a multi-FPGA distributed system exploiting a hardware-accelerated version of the YOLO neural network for image detection.
- Jessica Leoni, PhD student in Data Analysis and Decision Science @Politecnico di Milano
- Luca Stornaiuolo, PhD student in Computer Science @Politecnico di Milano
- Irene Canavesi, B.Sc. student in Biomedical Engineering
- Sara Caramaschi, B.Sc. student in Biomedical Engineering
Lung cancer is one of the most frequently diagnosed cancer forms, with a mortality of 84.2% in 2018. Our project focuses on shortening diagnosis time and improving accuracy in the overall detection of this disease. We implemented a convolutional neural network capable of automatically identifying lungs on a CT image. Segmentation is a necessary first step for the development of an algorithm capable of identifying and classifying the tumor mass since errors in the ROI identification can lead to errors in the tumor mass recognition. The network architecture follows the structure of a preexisting network, the U-Net that performs well on medical images. We reached a very good test accuracy of 99.63%: the strength of our work lies in the large number of CT images of both healthy and sick patients, used for the training and validation of the network.
BlastFunction is a serverless platform that brings FPGA acceleration capabilities for specific functions through heterogeneous computing. It enables resource sharing across multiple users to maximize FPGA utilization and minimize costs for cloud providers. BlastFunction manages functions and machines, redistributing workloads across nodes equipped with FPGAs. Initial results show BlastFunction improved FPGA utilization and increased performance per watt for benchmark applications compared to native CPU execution. Future work includes porting BlastFunction to AWS and automating cluster management.
- Sofia Breschi, B.Sc. student in Biomedical Engineering
- Beatrice Branchini, B.Sc. student in Biomedical Engineering
In the last few years, the use of Next Generation Sequencing technology in medicine has become more and more common, in particular for the diagnosis of genetic diseases and the production of personalized drugs. In this context, the identification of characteristic patterns in the human genome plays an important role, and exact pattern matching algorithms are an efficient way to identify those sequences. However, this process represents a bottleneck in the genomic field, as it is very computationally intensive and time-consuming; moreover, general-purpose architectures are not optimized to handle the huge amount of data and operations used in a genomics context. For these reasons, we propose an implementation of the Knuth-Morris-Pratt (KMP) algorithm on FPGA, a family of integrated circuits that can be reconfigured an arbitrary number of times. The KMP algorithm is very fast and efficient, as it avoids unnecessary comparisons of characters that have already been matched. Furthermore, the FPGA implementation yields an even faster and more efficient solution, speeding up the overall alignment process and providing the patient with a quicker response.
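The core of KMP is the failure table, which records how far the matcher can fall back without re-comparing characters already known to match; here is a minimal Python version of the algorithm the project maps to FPGA:

```python
# Knuth-Morris-Pratt: linear-time exact pattern matching.
def kmp_search(text, pattern):
    # Failure table: longest proper prefix of pattern[:i+1] that is
    # also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan: the text pointer never moves backwards.
    hits, k = [], 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

print(kmp_search("GATTACAGATTACA", "GATTACA"))  # → [0, 7]
```

Because the text is consumed strictly left to right, one character per step, the scan loop maps naturally onto a streaming hardware pipeline.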
In the global energy equation, the IT industry is not yet a major contributor to global warming, but it is increasingly significant. From an engineering standpoint we can achieve huge energy savings by replacing electronic signal processing with optical techniques for routing and switching, whilst longer fibre spans in the local loop offer further reductions. The mobile industry, on the other hand, has engineered 5G systems demanding ~10kW/tower due to signal processing and beam steering technologies. This sees some countries (e.g. China) closing cell sites at night to save money. So, what of 6G? The assumption that all surfaces can be smart signal regenerators with beam steering looks to be a step too far, and it may be time for a rethink!
On the extreme end of the scale we have AWS planning to colocate their latest AI data centre (at 1GW power consumption) alongside two nuclear reactors because it needs 40% of their joint output. Google and Microsoft are following the AWS approach and are reportedly in negotiation with nuclear plant owners. Needless to say, AI training sessions and usage have risen to dominate the top of the IT demand curve. At this time, there appears to be no limit to the projected energy demands of AI, but there is a further contender in this technology race, and that is the IoT. In order to satisfy the ecological demands of Industry 4.0/Society 5.0 we need to instrument and tag ‘Things’ by the Trillion, and not ~100 Billion as previously thought!
Now let’s see, Trillions of devices connected to the internet with 5G, 4G, WiFi, BlueTooth, LoRaWan et al using >100mW demands more power plants…
Good Energy Haus: PHN Presents Building Electrification, A Passive House Symp...TE Studio
Tim Eian's contribution to the Passive House Network's Building Electrification Symposium on July 25, 2024.
Topics covered:
- Our Motivation to Electrify
- The Context of the Project
- The Process of Electrification
- Considerations for Electrification
- Data
- Challenges of Electrification
- Successes
- Opportunities
Computer Vision and GenAI for Geoscientists.pptxYohanes Nuwara
Presentation in a webinar hosted by Petroleum Engineers Association (PEA) in 28 July 2023. The topic of the webinar is computer vision for petroleum geoscience.
This unit explains cartesian coordinate system. This unit also explains different types of coordinate systems like one dimensional, two dimensional and three dimensional system
Manufacturing is the process of converting raw materials into finished goods through various production methods. Historically, manufacturing occurred on a small scale through apprenticeships or putting-out systems, but the Industrial Revolution led to large-scale manufacturing using machines powered by steam engines
- Earlier Induction motors were used in applications requiring a
constant speed because variable speed applications have been
dominated by DC drives
- Conventional methods for speed control of Induction motors were
either expensive or highly inefficient
- Later the availability of thyristors, power transistors, IGBT and GTO
have allowed the development of variable speed induction motor
- Later the availability of thyristors, power transistors, IGBT and GTO
have allowed the development of variable speed induction motor
drives
- DC motors require frequent maintenance due to the presence of
commutators & brushes. Also they cannot be used in explosive & dirty
environment
- On the other hand, induction motors particularly squirrel cage are
rugged, cheaper, lighter, smaller, more efficient, requires less
maintenance and can be operated in dirty & explosive environment .
maintenance and can be operated in dirty & explosive environment .
- Due to these advantages, Three-phase induction motors are the most
common machines in industry now & more than 90% of mechanical
power used in industry is supplied by 3 phase induction motors.
- Variable speed induction motor drives are expensive than DC drives
- Application
include
fans,
blowers,
cranes,
conveyors,
traction,
underground & under water installations etc
Predicting damage in notched functionally graded materials plates thr...Barhm Mohamad
Presently, Functionally Graded Materials (FGMs) are extensively utilised in several industrial sectors, and the modelling of their mechanical behaviour is consistently advancing. Most studies investigate the impact of layers on the mechanical characteristics, resulting in a discontinuity in the material. In the present study, the extended Finite Element Method (XFEM) technique is used to analyse the damage in a Metal/Ceramic plate (FGM-Al/SiC) with a circular central notch. The plate is subjected to a uniaxial tensile force. The maximum stress criterion was employed for fracture initiation and the energy criterion for its propagation and evolution. The FGM (Al/SiC) structure is graded based on its thickness using a modified power law. The plastic characteristics of the structure were estimated using the Tamura-Tomota-Ozawa (TTO) model in a user-defined field variables (USDFLD) subroutine. Validation of the numerical model in the form of a stress-strain curve with the findings of the experimental tests was established following a mesh sensitivity investigation and demonstrated good convergence. The influence of the notch dimensions and gradation exponent on the structural response and damage development was also explored. Additionally, force-displacement curves were employed to display the data, highlighting the fracture propagation pattern within the FGM structure.
### A Brief History of Artificial Intelligence
Artificial Intelligence (AI) stands as one of the most transformative technologies of the modern era, promising to reshape industries, societies, and even the nature of work itself. Its evolution spans decades of research, innovation, and breakthroughs that have captured the imagination of scientists, entrepreneurs, and the general public alike. This comprehensive exploration delves into the key milestones, developments, and ethical implications that have shaped the history of AI.
#### Early Beginnings: The Birth of Artificial Intelligence
The roots of AI can be traced back to the mid-20th century, with foundational contributions from pioneers such as Alan Turing and John McCarthy. Turing's concept of a universal machine capable of computing any problem laid the groundwork for the theoretical underpinnings of AI. McCarthy, along with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, organized the Dartmouth Conference in 1956, which is often regarded as the birth of AI as an academic field.
During the 1950s and 1960s, the focus was on symbolic AI, also known as "good old-fashioned AI" (GOFAI). Researchers aimed to develop intelligent systems that could reason and solve problems using symbolic logic and algorithms. Early successes included programs like the Logic Theorist and the General Problem Solver, which demonstrated AI's potential for logical reasoning and problem-solving tasks.
#### The AI Winter and the Rise of Expert Systems
Despite initial enthusiasm, the field encountered significant challenges in the 1970s and 1980s, leading to what became known as the "AI winter." Funding and interest in AI research waned as early expectations failed to materialize, and practical applications remained elusive.
During this period, a new approach emerged with the development of expert systems. These systems aimed to capture human expertise in specific domains through rules and knowledge bases. Expert systems like MYCIN, used for diagnosing infectious blood diseases, showcased AI's potential in specialized tasks and revived interest in the field.
#### Neural Networks and Machine Learning: Revitalizing AI
The late 20th century witnessed a resurgence of interest in AI, driven by advances in neural networks and machine learning. Neural networks, inspired by the human brain's structure and function, proved effective in pattern recognition tasks such as handwriting recognition and speech understanding.
Key milestones during this period include the development of backpropagation algorithms for training neural networks and the emergence of deep learning techniques capable of handling increasingly complex data. The success of deep learning in areas like image and speech recognition, bolstered by large datasets and powerful computing hardware, propelled AI into the mainstream.
#### AI in the 21st Century: Applications and Challenges
The 21st century has seen AI integrated into diverse applications ac
Artificial Intelligence Imaging - medical imagingNeeluPari
10 stages of Artificial Intelligence,
Artificial intelligence (AI) has made significant advancements in the field of medical imaging, offering valuable tools and capabilities to improve diagnostics, treatment planning, and patient care. Here are several ways AI is used in medical imaging
Numerical comaprison of various order explicit runge kutta methods with matla...DrAzizulHasan1
Numerical analysis is the area of mathematics and computer science that creates, analyzes andimplements numerical methods for solving numerically the problems of continuous mathematics. Such problems originates from real-world applications of algebra, geometry and calculus and they involve variables that vary continuously, such problems occur throughout the natural sciences, social science, engineering, medicine.
3. Current issues
• The trade-off between performance and flexibility
• Current approaches lack flexibility
– FPGA-based approaches require embedding the regex into
the architecture (= re-synthesis for every new regex)
– ASIC technology offers no flexibility at all
4. Our solution and claims
Based on previous work [1] proposing Regular Expressions
as a high-level language driving a custom processor
The improvements with respect to ReCPU are:
• A better preprocessing mechanism for the RegExp and a
renewed single-core design
• A scalable multi-core architecture for parallelized
computations, reaching a 100x speedup over Flex
• A cross-platform design, easily integrable with
heterogeneous architectures
[1] M. Paolieri et al., "ReCPU: A parallel and pipelined architecture for regular expression matching," in VLSI-SoC, Springer, 2009
5. Outline
• Related work
• TiReX design and implementation
• Evaluation
• Conclusions and future work
6. Related Work (1)
Most works use DFAs (Deterministic Finite Automata) and address DFA
limitations, offering high matching speed at the cost of a fixed structure
Memory usage grows along with RegExp complexity:
• [1], [2] cluster states and group transitions
Others focus on achieving an efficient lookup process:
• Hash-based encoding schemes are another way to solve the problem [3]
• Bitmap index structures [4]
[1] L. Jiang et al., "A fast regular expression matching engine for NIDS applying prediction scheme," in Computers and Communication (ISCC), 2014
[2] J. van Lunteren and A. Guanella, "Hardware-accelerated regular expression matching at multiple tens of Gb/s," in INFOCOM, 2012
[3] K. Agarwal and R. Polig, "A high-speed and large-scale dictionary matching engine for information extraction systems," in Application-Specific Systems, Architectures and Processors (ASAP), 2013 IEEE 24th International Conference on, IEEE, 2013
[4] X.-T. Nguyen, H.-T. Nguyen, K. Inoue, O. Shimojo, and C.-K. Pham, "Highly parallel bitmap-based regular expression matching for text analytics," in Circuits and Systems (ISCAS), 2017
7. Related Work (2)
A DFA encodes a single RegExp and matches one character at a time, so it is
intrinsically sequential:
• [5] Ternary Content Addressable Memories (TCAMs)
• [6], [7] precomputation of transitions
Some works leverage hardware parallelism to match the input against multiple RegExps:
• [8] uses a GPU to activate a new DFA for every initial character
(single-character analysis in the basic version)
[5] C. R. Meiners et al., "Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems," 2010
[6] J. Yang et al., "PiDFA: A practical multi-stride regular expression matching engine based on FPGA," ICC 2016
[7] K. Atasu et al., "Hardware-accelerated regular expression matching for high-throughput text analytics," in FPL 2013
[8] G. Vasiliadis, M. Polychronakis, S. Antonatos, E. P. Markatos, and S. Ioannidis, "Regular expression matching on graphics hardware for intrusion detection," in International Workshop on Recent Advances in Intrusion Detection, Springer, 2009
8. Our Approach
As in ReCPU, RegExps are translated into program instructions:
the RegExp is software-compiled into a sequence of TiReX
instructions, and the TiReX matching core runs these instructions
on the input data, based on a dedicated Instruction Set
Architecture (ISA)
9. Flow
[Flow diagram: a Regular Expression (RE) is fed to the Compiler, which emits the Instruction Set, e.g.:
1 & ACGT
2 JIM offset
3 (
4 |)* AC
5 & TT
The instructions then run over the input Data (a genomic stream such as ACGTCGGGGCGTGCAAATGCCCCGTGCGA…) to produce the Match results.]
10. TiReX ISA

Opcode    RegExp  Description
0 00 000  NOP     No Operation
1 00 000  (       Enter subroutine
0 10 000  AND     And of cluster matches
0 01 000  OR      Or of cluster matches
0 11 000  .       Match any character
0 00 001  )*      Match any number of sub-RE
0 00 010  )+      Match one or more sub-RE
0 00 011  )|      Match previous sub-RE or next one
0 00 100  )       End of subroutine
0 00 101  OKP     Open Kleene Parenthesis
0 00 111  JIM     Jump If Match

Each instruction carries a Reference field of 32 bits, holding at most 4 characters.
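The table's encoding can be made concrete with a small sketch: a 6-bit opcode (the three bit groups shown in the Opcode column) packed next to the 32-bit Reference field. The exact bit layout below is an illustrative assumption, not the documented TiReX instruction format.

```python
# Illustrative sketch of a TiReX-style instruction word: a 6-bit opcode
# (values taken from the ISA table) plus a 32-bit Reference holding at
# most 4 characters. The bit layout is assumed for illustration only.
OPCODES = {
    "NOP": 0b000000, "(": 0b100000, "AND": 0b010000, "OR": 0b001000,
    ".": 0b011000, ")*": 0b000001, ")+": 0b000010, ")|": 0b000011,
    ")": 0b000100, "OKP": 0b000101, "JIM": 0b000111,
}

def encode(mnemonic, reference=""):
    """Pack mnemonic + reference characters into a 38-bit word."""
    assert len(reference) <= 4, "Reference holds at most 4 characters"
    ref_bits = 0
    for ch in reference:
        ref_bits = (ref_bits << 8) | ord(ch)  # 8 bits per character
    return (OPCODES[mnemonic] << 32) | ref_bits

word = encode("AND", "ACGT")
print(f"{word:038b}")  # opcode in the top 6 bits, characters below
```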
11. TiReX ISA
All characters in the Reference must be equal to the input data to have a match.
RegExp: ACCGTGGA
Input 1: TGGA GACCTACACCG
Input 2: ACCA TGGACTAGAGG
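The cluster-match rule above can be sketched in a few lines: an exact pattern is split into AND instructions referencing at most 4 characters each, and every Reference character must equal the corresponding input character. The function names and the anchored-at-position-0 matching are illustrative assumptions, not the hardware implementation.

```python
def compile_exact(pattern):
    # Split an exact pattern into AND instructions, each referencing
    # at most 4 characters (the 32-bit Reference field).
    return [("AND", pattern[i:i + 4]) for i in range(0, len(pattern), 4)]

def run(program, data):
    # Execute the AND clusters sequentially: every Reference character
    # must equal the input character at the same position.
    pos = 0
    for _, ref in program:
        if data[pos:pos + len(ref)] != ref:
            return False  # one differing character fails the cluster
        pos += len(ref)
    return True

prog = compile_exact("ACCGTGGA")     # [('AND', 'ACCG'), ('AND', 'TGGA')]
print(run(prog, "ACCGTGGACTAGAGG"))  # True: both clusters match
print(run(prog, "ACCATGGACTAGAGG"))  # False: 'ACCA' != 'ACCG'
```

The hypothetical inputs mirror the slide's example, where a single differing character ('A' vs 'G') is enough to reject the cluster.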
12. TiReX ISA
A special instruction directs the jump backward in the program, like in a «for loop», to implement the Kleene operators.
RegExp: (ACGT)+
Input 1: ACGT ACGT GACC
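The backward-jump behaviour can be mimicked in software: after a successful cluster match, control jumps back to re-run the cluster, exactly like a for loop. This is a behavioural sketch under assumed semantics, not the hardware FSM.

```python
def match_plus(cluster, data):
    """Behavioural sketch of (cluster)+ : an AND compare followed by a
    backward jump while the cluster keeps matching (assumed semantics)."""
    pos, repeats = 0, 0
    while data[pos:pos + len(cluster)] == cluster:  # AND on the cluster
        repeats += 1
        pos += len(cluster)                         # jump back and retry
    return repeats >= 1, pos  # matched at least once, chars consumed

print(match_plus("ACGT", "ACGTACGTGACC"))  # (True, 8)
```

On the slide's input, the cluster matches twice and the third compare (against GACC) fails, ending the loop.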
13. TiReX ISA
A special instruction directs the jump forward in the program, like in an «if else» statement, to implement chained ORs.
RegExp: (TTTT)|(GCAT)|(CTGA)
Input 1: GCAT GACCTAC
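The forward-jump behaviour for chained ORs can be sketched the same way: when one alternative fails, control skips forward to the next one, and a success skips past the rest of the chain. Again, the semantics below are assumed for illustration.

```python
def match_alt(alternatives, data):
    """Behavioural sketch of (A)|(B)|(C): a failed cluster triggers a
    forward jump to the next alternative; a success skips the remaining
    branches (assumed semantics, illustration only)."""
    for alt in alternatives:
        if data[:len(alt)] == alt:  # AND on this alternative
            return True             # forward jump past remaining branches
    return False                    # every branch failed

print(match_alt(["TTTT", "GCAT", "CTGA"], "GCATGACCTAC"))  # True
```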
14. Single Core Architecture: Overview
[Block diagram: the Fetch & Decode stage reads instructions from the Instruction Memory and passes the Opcode and Reference to the Execution stage, which reads input from the Data Buffer; the Control Path coordinates both stages and collects the Match signal.]
18. Single Core Architecture: Details
Data Buffer:
• Addressable buffer
• Intermediate registers to:
– Back up
– Hold data
– Shift by 1-4 characters
19. Single Core Architecture: Details
Control Path:
• Status register of the computation
• Stack for nested parentheses
• Completely redesigned FSM
20. Multi core
Since the recognition process is highly parallelizable, we adopt a
multi-core architecture.
[Diagram: n TiReX cores (core1 … coren), each fed by its own BRAM, match different RegExps (e.g. AGCT(A|C)*TT, AGCT, AG*(TTAC), GTTTG(AC)*) against the same Data stream.]
21. Multi core
[Diagram: the same RegExp AGCT(A|C)*TT is matched by n TiReX cores, each with its own BRAM, over different data chunks (Data1, Data2, …, Datan-1, Datan).]
22. Multi core: Boundary conditions
Customizable conditions avoid losing matches at chunk boundaries.
[Diagram: the Data stream split into Chunk 0-3; a match of length N spanning two adjacent chunks must still be fully seen by one core.]
23. Experimental setup and results
Evaluation environment:
• VC707 evaluation platform, powered by a Virtex-7 FPGA
• Digilent PYNQ-Z1 board, powered by a Zynq SoC
comprising an ARM CPU and a Xilinx FPGA
We compare against:
• A Flex program compiled with -O3 optimizations, running
on an Intel i7 with a peak frequency of 2.8GHz
26. Comparisons with Related works

Solution        Clock Frequency [MHz]  Bitrate [Gb/s]  Flexibility
VC707 16-core   130                    16.64 – 66.54
PYNQ 8-core     70                     4.48 – 17.92
[1] ASIC        318.47                 10.19 – 18.18
[2] FPGA        150                    230 – 430
[3] FPGA        100                    3.2
[3] ASIC        1000                   256

[1] M. Paolieri et al., "ReCPU: A parallel and pipelined architecture for regular expression matching," in VLSI-SoC: Advanced Topics on Systems on a Chip, Springer, 2009
[2] L. Jiang et al., "A fast regular expression matching engine for NIDS applying prediction scheme," in Computers and Communication (ISCC), 2014 IEEE Symposium on
[3] V. Gogte et al., "HARE: Hardware accelerator for regular expressions," in Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on
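The reported TiReX bitrate ranges are consistent with a back-of-envelope calculation: cores × clock frequency × 1-4 characters per cycle × 8 bits per character, where the 1-4 range reflects the up-to-4-character Reference field. Treat this as a sanity check, not the paper's exact throughput model.

```python
def bitrate_gbps(freq_mhz, cores, chars_per_cycle):
    # Aggregate bitrate: each core consumes chars_per_cycle characters
    # (8 bits each) per clock cycle.
    return freq_mhz * 1e6 * cores * chars_per_cycle * 8 / 1e9

# PYNQ 8-core at 70 MHz: matches the reported 4.48 – 17.92 Gb/s range
print(bitrate_gbps(70, 8, 1), bitrate_gbps(70, 8, 4))    # 4.48 17.92
# VC707 16-core at 130 MHz: 16.64 Gb/s lower bound, ~66.5 upper bound
print(bitrate_gbps(130, 16, 1), bitrate_gbps(130, 16, 4))
```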
30. Conclusions and future work
• We have presented a multi-core pattern matching
architecture implemented on an FPGA
• It outperforms the Flex solution, gaining a 100x speedup
with remarkable flexibility
• Future work
– Performance improvements
• Exploration of different memory hierarchies
• Multi-core interconnection studies
Thank you for your attention… Questions?
Alessandro Comodi, Davide Conficconi {alessandro.comodi, davide.conficconi}@mail.polimi.it
Alberto Scolari, Marco Santambrogio {alberto.scolari, marco.santambrogio}@polimi.it
NECST: www.necst.it
Slideshare NECST: www.slideshare.net/necstlab
RAW FB Group: facebook.com/groups/ReconfigurableArchitecturesWorkshop