DOI: 10.1145/339647.339702

Allowing for ILP in an embedded Java processor

Published: 01 May 2000

Abstract

Java processors are ideal for embedded and network computing applications such as Internet TVs, set-top boxes, smart phones, and other consumer electronics. In this paper, we investigate cost-effective microarchitectural techniques to exploit parallelism in Java bytecode streams. First, we propose the use of a fill unit that stores decoded bytecodes into a decoded bytecode cache. This mechanism improves the fetch and decode bandwidth of Java processors by 2 to 3 times. These additional hardware units can also be used to perform optimizations such as instruction folding. This is particularly significant because experiments with the Verilog model of Sun Microsystems' picoJava-II core demonstrate that instruction folding lies on the critical path. Moving the folding logic off the critical path of the processor and into the fill unit improves the clock frequency by 25%. Out-of-order ILP exploitation is not investigated because of its prohibitive cost, but in-order dual issue with a 64-entry decoded bytecode cache yields a 10% to 14% improvement in execution cycles. Another contribution of the paper is a stack disambiguation technique that eliminates false dependencies between different types of stack accesses. Stack disambiguation exposes further parallelism: a dual in-order issue microengine with a 64-entry bytecode cache yields an additional 10% reduction in cycles, for an aggregate reduction of 17% to 24% in execution cycles.
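The fill-unit idea can be illustrated with a small Python model. This is a hypothetical sketch, not the paper's design: it assumes bytecodes are pre-split into `(mnemonic, operand)` pairs addressed by list index, a direct-mapped 64-entry decoded bytecode cache tagged by that index, and a simplified picoJava-style folding rule set (load-load-op-store, load-load-op, load-op). The names `fold`, `fill_unit`, `fetch`, `DecodedOp`, and `DecodedBytecodeCache` are illustrative only.

```python
# Hypothetical sketch of a fill unit that folds bytecodes and stores the result
# in a decoded bytecode cache; not the paper's actual microarchitecture.
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed encoding: (mnemonic, operand) pairs; a "PC" is a list index.
Bytecode = Tuple[str, Optional[int]]

# Simplified classes: LV = push local, OP = binary op, ST = store local.
KIND = {"iload": "LV", "iadd": "OP", "isub": "OP", "imul": "OP", "istore": "ST"}


@dataclass
class DecodedOp:
    op: str                  # operation performed by this decoded entry
    srcs: Tuple[str, ...]    # "L<n>" = local variable n, "S" = top of stack
    dst: Optional[str]       # destination, same naming as srcs
    length: int              # number of bytecodes this entry replaces


def fold(code: List[Bytecode], pc: int) -> DecodedOp:
    """Fold the longest simplified pattern that starts at pc."""
    w = code[pc:pc + 4]
    k = [KIND.get(m, "X") for m, _ in w]
    if k == ["LV", "LV", "OP", "ST"]:
        return DecodedOp(w[2][0], (f"L{w[0][1]}", f"L{w[1][1]}"), f"L{w[3][1]}", 4)
    if k[:3] == ["LV", "LV", "OP"]:
        return DecodedOp(w[2][0], (f"L{w[0][1]}", f"L{w[1][1]}"), "S", 3)
    if k[:2] == ["LV", "OP"]:
        # The stack top is the left operand; the loaded local is the right one.
        return DecodedOp(w[1][0], ("S", f"L{w[0][1]}"), "S", 2)
    mnemonic, arg = w[0]
    if k[0] == "LV":
        return DecodedOp(mnemonic, (f"L{arg}",), "S", 1)
    if k[0] == "ST":
        return DecodedOp(mnemonic, ("S",), f"L{arg}", 1)
    if k[0] == "OP":
        return DecodedOp(mnemonic, ("S", "S"), "S", 1)
    return DecodedOp(mnemonic, (), None, 1)


class DecodedBytecodeCache:
    """Direct-mapped cache of decoded groups, indexed and tagged by PC."""

    def __init__(self, entries: int = 64):
        self.entries = entries
        self.tags: List[Optional[int]] = [None] * entries
        self.groups: List[List[DecodedOp]] = [[] for _ in range(entries)]
        self.hits = 0
        self.misses = 0

    def lookup(self, pc: int) -> Optional[List[DecodedOp]]:
        idx = pc % self.entries
        if self.tags[idx] == pc:
            self.hits += 1
            return self.groups[idx]
        self.misses += 1
        return None

    def install(self, pc: int, group: List[DecodedOp]) -> None:
        idx = pc % self.entries
        self.tags[idx] = pc
        self.groups[idx] = group


def fill_unit(code: List[Bytecode], pc: int, width: int = 2) -> List[DecodedOp]:
    """Decode and fold up to `width` entries (2 models a dual-issue engine)."""
    group: List[DecodedOp] = []
    while pc < len(code) and len(group) < width:
        op = fold(code, pc)
        group.append(op)
        pc += op.length
    return group


def fetch(cache: DecodedBytecodeCache, code: List[Bytecode], pc: int) -> List[DecodedOp]:
    group = cache.lookup(pc)
    if group is None:              # miss: the fill unit decodes and folds
        group = fill_unit(code, pc)
        cache.install(pc, group)
    return group                   # hit: folding work is not repeated


if __name__ == "__main__":
    code = [("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3),
            ("iload", 3), ("iload", 1), ("isub", None)]
    cache = DecodedBytecodeCache()
    for _ in range(2):
        print(fetch(cache, code, 0))
    print("hits", cache.hits, "misses", cache.misses)
```

In this model, folding runs only when the fill unit builds a cache entry on a miss; a hit returns already-folded operations (here, seven bytecodes delivered as two decoded entries) without re-running the pattern match. That loosely mirrors the abstract's argument that moving folding out of the fetch/decode path can shorten the critical path.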

Cited By

  • (2010) Application requirements and efficiency of embedded Java bytecode multi-cores. Proceedings of the 8th International Workshop on Java Technologies for Real-Time and Embedded Systems, pp. 46-52. DOI: 10.1145/1850771.1850777. Online publication date: 19-Aug-2010.
  • (2009) An accelerator design for speedup of Java execution in consumer mobile devices. Computers and Electrical Engineering, 35(6), pp. 904-919. DOI: 10.1016/j.compeleceng.2008.11.021. Online publication date: 1-Nov-2009.
  • (2008) An instruction set extension for java bytecodes translation acceleration. 2008 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 116-123. DOI: 10.1109/ICSAMOS.2008.4664854. Online publication date: Jul-2008.

Information

Published In

ACM Conferences
ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
June 2000
327 pages
ISBN:1581132328
DOI:10.1145/339647
  • ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
    Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
    May 2000
    325 pages
    ISSN:0163-5964
    DOI:10.1145/342001
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ISCA '00

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 91
  • Downloads (last 6 weeks): 22

Reflects downloads up to 13 Nov 2024.

Citations

  • (2008) A predecoding technique for ILP exploitation in Java processors. Journal of Systems Architecture: the EUROMICRO Journal, 54(7), pp. 707-728. DOI: 10.1016/j.sysarc.2008.01.008. Online publication date: 1-Jul-2008.
  • (2006) Exploiting dataflow to extract java instruction level parallelism on a tag-based multi-issue semi in-order (TMSI) processor. Proceedings of the 20th international conference on Parallel and distributed processing, p. 52. DOI: 10.5555/1898953.1898986. Online publication date: 25-Apr-2006.
  • (2006) Instruction folding in a hardware-translation based java virtual machine. Proceedings of the 3rd conference on Computing frontiers, pp. 139-146. DOI: 10.1145/1128022.1128041. Online publication date: 3-May-2006.
  • (2006) Exploiting dataflow to extract Java instruction level parallelism on a tag-based multi-issue semi in-order (TMSI) processor. Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 9 pp. DOI: 10.1109/IPDPS.2006.1639289. Online publication date: 2006.
  • (2006) On the design of a dual-execution modes processor. Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking, pp. 37-46. DOI: 10.1007/11942634_5. Online publication date: 4-Dec-2006.
  • (2005) Parallel Queue Processor Architecture Based on Produced Order Computation Model. The Journal of Supercomputing, 32(3), pp. 217-229. DOI: 10.1007/s11227-005-0160-z. Online publication date: 1-Jun-2005.
  • (2004) Queue processor architecture for novel queue computing paradigm based on produced order scheme. Proceedings. Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, 2004, pp. 169-177. DOI: 10.1109/HPCASIA.2004.1324032. Online publication date: 2004.