A Quality-assured Approximate Hardware Accelerators–based on Machine Learning and Dynamic Partial Reconfiguration

Published: 20 August 2021

    Abstract

    Machine learning is widely used to extract meaningful information from the zettabytes of sensor data collected daily. Applications that analyze and interpret such data to identify trends, e.g., surveillance, exhibit some error tolerance. Approximate computing has emerged as an energy-efficient design paradigm that exploits the intrinsic error resilience of a wide set of error-tolerant applications: accepting inexact results can reduce power consumption, delay, area, and execution time. To increase the energy efficiency of machine learning on FPGAs, we consider approximation at the hardware level, e.g., approximate multipliers. However, errors in approximate computing depend heavily on the application, the applied inputs, and user preferences, so the approximation has to adapt at runtime. Meanwhile, dynamic partial reconfiguration has been introduced as a key differentiating capability of recent FPGAs; it significantly reduces design area, power consumption, and reconfiguration time by adaptively changing a selected part of the FPGA design without interrupting the rest of the system. Thus, integrating dynamic partial reconfiguration (DPR) with approximate computing (AC) will significantly improve the efficiency of FPGA-based design approximation. In this article, we propose hardware-efficient, quality-controlled approximate accelerators that are suitable for FPGA-based machine learning algorithms as well as other error-resilient applications. Experimental results from three case studies (image blending, audio blending, and image filtering) demonstrate that the proposed adaptive approximate accelerator satisfies the required quality with an accuracy of 81.82%, 80.4%, and 89.4%, respectively. On average, the partial bitstream was found to be 28.6× smaller than the full bitstream.
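    The trade-off the abstract describes can be illustrated with a small software sketch. It is not the article's design: the operand-truncation multiplier, the PSNR quality metric, the function names, and the brute-force search below are all assumptions chosen for illustration. The abstract indicates that the proposed accelerator selects designs with machine learning and swaps them in through DPR; here the "design" is just a `drop_bits` parameter, and the selection is an exhaustive search standing in for a learned predictor.

```python
import math

def exact_mul(a: int, b: int) -> int:
    """Exact multiplier, used as the quality reference."""
    return a * b

def approx_mul8(a: int, b: int, drop_bits: int = 4) -> int:
    """Hypothetical approximate 8-bit multiplier.

    Zeroes the `drop_bits` least significant bits of each operand before
    multiplying. In hardware this would remove partial products; here it
    only models the resulting error.
    """
    mask = 0xFF & ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)

def blend(img1, img2, alpha, mul):
    """Blend two 8-bit images with an 8-bit weight `alpha` (0..255)."""
    return [(mul(p, alpha) + mul(q, 255 - alpha)) >> 8
            for p, q in zip(img1, img2)]

def psnr(ref, test):
    """Peak signal-to-noise ratio in dB for 8-bit samples."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)

def pick_drop_bits(img1, img2, alpha, target_db):
    """Return the most aggressive truncation that still meets `target_db`.

    Brute-force stand-in for an input-aware selector: it tries every
    setting on the actual inputs, which a real system would avoid.
    """
    ref = blend(img1, img2, alpha, exact_mul)
    for k in range(7, -1, -1):  # most approximate first
        out = blend(img1, img2, alpha,
                    lambda a, b, k=k: approx_mul8(a, b, k))
        if psnr(ref, out) >= target_db:
            return k
    return 0  # k = 0 is exact, so this line is only a safeguard

if __name__ == "__main__":
    a = list(range(0, 256, 5))
    b = a[::-1]
    for target in (20.0, 30.0, 40.0):
        print(target, pick_drop_bits(a, b, 128, target))
```

    Different inputs and different quality targets yield different settings, which is the motivation for choosing the approximate design at runtime rather than fixing it at design time.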

    Published In

    ACM Journal on Emerging Technologies in Computing Systems  Volume 17, Issue 4
    October 2021
    446 pages
    ISSN:1550-4832
    EISSN:1550-4840
    DOI:10.1145/3472280
    Editor: Ramesh Karri

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 August 2021
    Accepted: 01 April 2021
    Revised: 01 October 2020
    Received: 01 May 2020
    Published in JETC Volume 17, Issue 4

    Author Tags

    1. Approximate computing
    2. approximate hardware accelerator
    3. decision tree
    4. input-aware approximation
    5. dynamic partial reconfiguration
    6. adaptive design
    7. FPGA

    Qualifiers

    • Research-article
    • Refereed
