Efficient Weight Reuse for Large LSTMs. Zhiqiang Que, Thomas Nugent, ...

Abstract—Long Short-Term Memory (LSTM) networks have been deployed in speech recognition, natural language processing and financial calculations in recent years ... We propose a stall-free hardware architecture that reorganises the order of operations in an LSTM system to overcome the recurrent data dependency (addressing C1), together with a unique blocking-batching strategy that reuses the LSTM weights fetched from external memory. Evaluation results show that our architecture can achieve up to 20.8 GOPS/W, which would be among the highest for FPGA designs targeting LSTM systems with ...
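The stall-free reorganisation can be illustrated in software. Below is a minimal NumPy sketch, our own illustration rather than the paper's hardware design; the gate layout and function names are assumptions. The input-dependent half of each gate pre-activation, Wx @ x_t, carries no recurrent dependency, so it is hoisted out of the time loop, leaving only the Wh @ h_{t-1} product waiting on the previous step.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def run_lstm_reordered(Wx, Wh, b, xs, h0, c0):
    """Run an LSTM with the operations reordered so that the
    input-dependent work never waits on the recurrence.

    Pre-activations split as z_t = Wx @ x_t + Wh @ h_{t-1} + b.
    The Wx @ x_t terms have no dependency across timesteps, so they
    are hoisted out of the sequential loop -- in hardware this keeps
    the matrix-vector units busy while the element-wise stage of the
    previous timestep completes, avoiding stalls."""
    zx = [Wx @ x + b for x in xs]   # dependency-free, done up front
    h, c = h0, c0
    hs = []
    for t in range(len(xs)):
        z = zx[t] + Wh @ h          # only this product waits on h_{t-1}
        i, f, g, o = np.split(z, 4) # assumed gate order: input, forget, cell, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        hs.append(h)
    return hs, c
```

In a hardware pipeline the hoisted products would be streamed one step ahead rather than precomputed for the whole sequence, but the dependency structure being exploited is the same.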
Some works store weights in off-chip memory and reduce bandwidth requirements through data reuse [16, 19]. The authors of [19] split the weight matrix into ...
Building on this, a novel hardware architecture is proposed to overcome the data dependency, together with a new blocking-batching strategy that reuses the LSTM weights fetched from external memory across a batch of input sequences (see the sketch below).
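As a rough software model of such a blocking-batching scheme (the tile sizes, function name, and batch-as-columns layout are illustrative assumptions, not the paper's exact design), each weight block is fetched once and reused across every sequence in the batch before the next block is loaded:

```python
import numpy as np

def blocked_batched_matvec(W, X, block_rows=256, block_cols=256):
    """Compute Y = W @ X for a batch of inputs (the columns of X),
    loading W one block at a time and reusing each block across the
    whole batch before fetching the next.

    On an FPGA each block would be copied from external memory into
    on-chip buffers once per batch; this loop nest models that
    access order."""
    R, C = W.shape
    Y = np.zeros((R, X.shape[1]))
    for r0 in range(0, R, block_rows):
        for c0 in range(0, C, block_cols):
            block = W[r0:r0 + block_rows, c0:c0 + block_cols]       # one external fetch
            Y[r0:r0 + block_rows] += block @ X[c0:c0 + block_cols]  # reused for all batched inputs
    return Y
```

With a batch of B sequences, each weight block crosses the external-memory interface once instead of B times, so off-chip weight traffic drops by roughly a factor of B.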
Our design achieves 1.65 times higher performance-per-watt efficiency and 2.48 times higher performance-per-DSP efficiency when compared with the current state-of-the-art.