research-article

Design and analysis of adaptive processor

Published: 23 March 2012

Abstract

A new computation model called CACHE (Cache Architecture for Configurable Hardware Engine) is proposed in this paper. The model does not require a dedicated host processor, or software running on one, to manage reconfiguration. Instead, reconfiguration is performed autonomously within a working set of application datapaths. The CACHE model obtains several functions as side effects of its operation: caching, resource allocation and assignment, placement and routing, and defragmentation. These are realized by the processing array itself together with a special register file called the working-set register file. The model aims to reduce three major workloads: (1) the processor and application design workload, (2) the runtime resource management and scheduling workload, and (3) the reconfiguration workload. To reduce these workloads, the processor architecture departs from the traditional computing model and its microprocessor architecture. Three major ideas underlie the computing system: (1) an on-chip working-set model, mainly to control the loading and storing of streams, that is, to control the traffic that introduces overhead; (2) an on-chip deadlock properties model, mainly to manage resources and to continuously configure the datapaths corresponding to a working-set window; and (3) a cache memory technique that serves both models: its mechanism is equivalent to the working-set window, and its procedure is equivalent to the resource request, acquisition, and release of the deadlock properties model. The first model targets streaming applications such as vector and matrix operations and filters, which use coarser-grained operations such as the integer operations of the C language. Its performance relative to DSPs comes from throughput that stays constant across applications of different scales. In addition, an extended model, which we call the Instant model and which automatically generates an instance of a datapath, outperforms the DSPs.
This paper presents the computation model, its architecture, and its low-level design, and analyzes the basic characteristics of its execution.
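
The correspondence the abstract draws between a working-set window, a cache, and the request, acquisition, and release of resources can be illustrated with a small software analogy. The sketch below is not the paper's hardware design. It assumes a least-recently-used replacement policy, and the names `working_set`, `DatapathCache`, and the datapath labels in the usage note are hypothetical, introduced only for illustration.

```python
from collections import OrderedDict


def working_set(trace, t, window):
    """Return the distinct items among the `window` references ending at index t."""
    start = max(0, t - window + 1)
    return set(trace[start:t + 1])


class DatapathCache:
    """Toy LRU cache of datapath configurations (an assumed policy, not the paper's).

    The capacity plays the role of the working-set window: a datapath that is
    still resident is reused, and a miss triggers request, release, acquire.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # least recently used entry first
        self.log = []                  # sequence of (event, datapath) pairs

    def reference(self, name):
        """Reference a datapath; return True on a hit, False on a miss."""
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            self.log.append(("hit", name))
            return True
        self.log.append(("request", name))
        if len(self.resident) >= self.capacity:
            # Free the resources held by the least recently used datapath.
            victim, _ = self.resident.popitem(last=False)
            self.log.append(("release", victim))
        self.resident[name] = True
        self.log.append(("acquire", name))
        return False
```

For example, with a capacity of 2 and the reference trace `["fir", "fft", "fir", "mac", "fft"]`, the second reference to `fir` is a hit, while the reference to `mac` releases `fft` before acquiring the resources for `mac`.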

Cited By

  • (2017) Performance Scalability of Adaptive Processor Architecture. ACM Transactions on Reconfigurable Technology and Systems 10:2 (1-22). DOI: 10.1145/3007902. Online publication date: 11-Apr-2017
  • (2013) Very Large-Scale Integrated Processor. International Journal of Networking and Computing 3:1 (2-14). DOI: 10.15803/ijnc.3.1_2. Online publication date: 2013
  • (2012) Very Large-Scale Integrated Processor. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (821-828). DOI: 10.1109/IPDPSW.2012.101. Online publication date: 21-May-2012

    Published In

    ACM Transactions on Reconfigurable Technology and Systems  Volume 5, Issue 1
    March 2012
    148 pages
    ISSN:1936-7406
    EISSN:1936-7414
    DOI:10.1145/2133352
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 March 2012
    Accepted: 01 July 2011
    Revised: 01 June 2011
    Received: 01 September 2010
    Published in TRETS Volume 5, Issue 1

    Author Tags

    1. Reconfigurable architecture
    2. Stream Processing
    3. deadlock properties model on chip
    4. design and analysis
    5. runtime management
    6. runtime reconfiguration
    7. stack structure
    8. working set model on chip

    Qualifiers

    • Research-article
    • Research
    • Refereed
