DOI: 10.1145/74925.74934

A dynamic storage scheme for conflict-free vector access

Published: 01 April 1989

Abstract

Previous investigations into data storage schemes have focused on finding a single storage scheme that permits conflict-free access for a set of frequently encountered access patterns. This paper considers an alternative approach. Rather than forcing a single storage scheme to be used for all access patterns, conflict-free accesses of any constant stride can be made by selecting a storage scheme for each vector based on the access patterns used with that vector.
By factoring the stride into two components, one a power of 2 and the other relatively prime to 2 (that is, odd), a storage scheme can be synthesized that allows conflict-free access to the vector using the specified stride. All such schemes are based on a variation of the row rotation mechanism proposed by Budnik and Kuck [1]. Each storage scheme is defined by two parameters: one describes the type of rotation to perform, and the other describes the amount of memory to be rotated as a single block. The hardware required to implement this storage scheme is efficient.
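As a minimal sketch of the stride factoring described above, the Python below splits a stride into its power-of-2 exponent and its odd factor, then applies a generic row-rotation mapping. The names `factor_stride`, `rotated_module`, and `modules_hit`, the mapping `(addr + addr // block) % num_modules`, and the block-size choice `max(num_modules, 2**x)` are illustrative assumptions rather than the paper's exact scheme. The demonstration covers only pure power-of-2 strides starting at address 0 and does not handle the odd factor.

```python
from typing import List, Tuple


def factor_stride(stride: int) -> Tuple[int, int]:
    """Return (x, sigma) such that stride == sigma * 2**x and sigma is odd."""
    if stride <= 0:
        raise ValueError("stride must be a positive integer")
    x = 0
    while stride % 2 == 0:
        stride //= 2
        x += 1
    return x, stride


def rotated_module(addr: int, num_modules: int, block: int) -> int:
    """Hypothetical row-rotation mapping: shift by one module per `block` addresses."""
    return (addr + addr // block) % num_modules


def modules_hit(stride: int, num_modules: int, block: int) -> List[int]:
    """Modules touched by `num_modules` consecutive accesses starting at address 0."""
    return [rotated_module(i * stride, num_modules, block)
            for i in range(num_modules)]


if __name__ == "__main__":
    num_modules = 8
    for stride in (2, 4, 8, 16):
        x, sigma = factor_stride(stride)
        # Assumed block-size rule for this sketch only (not taken from the paper).
        block = max(num_modules, 2 ** x)
        hit = modules_hit(stride, num_modules, block)
        print(stride, (x, sigma), hit, len(set(hit)) == num_modules)
```

In this sketch the second argument to the mapping plays the role of the "amount of memory to be rotated as a single block"; the paper's other parameter, the type of rotation, is fixed here to a shift of one module per block.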
The performance of the memory under access strides other than the one used to specify the storage scheme is also considered. This models a vector accessed with multiple strides, in particular the row and column accesses of a matrix, as well as situations in which the stride cannot be determined before the vector is initialized. Simulation results show that if a single buffer is added to each memory port, the average performance of the dynamic scheme surpasses that of the interleaved scheme for arbitrary stride accesses.
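To make the interleaved baseline concrete, the following sketch assumes plain low-order interleaving (module = address mod N) and counts how many distinct modules N consecutive accesses of a given stride touch. The function names are hypothetical, and the code does not model port buffers or reproduce the paper's simulation. For this mapping the count equals N / gcd(stride, N).

```python
from math import gcd


def interleaved_module(addr: int, num_modules: int) -> int:
    """Low-order interleaving (assumed baseline): module = address mod N."""
    return addr % num_modules


def distinct_modules(stride: int, num_modules: int, base: int = 0) -> int:
    """Count distinct modules touched by `num_modules` consecutive accesses."""
    return len({interleaved_module(base + i * stride, num_modules)
                for i in range(num_modules)})


if __name__ == "__main__":
    num_modules = 8
    # Row-major matrix with 8 columns: row access has stride 1,
    # column access has stride 8 (the row length).
    for label, stride in (("row", 1), ("column", 8), ("stride 6", 6)):
        hit = distinct_modules(stride, num_modules)
        # Closed form for this mapping: N / gcd(stride, N).
        assert hit == num_modules // gcd(stride, num_modules)
        print(label, stride, hit)
```

Under these assumptions, the column access of an 8-column row-major matrix on 8 modules sends every reference to the same module, which is the kind of row/column mismatch the abstract refers to.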
For dynamic storage schemes to be effective, the compiler must be able to determine the stride of vector accesses. In general, this is within the capabilities of current vectorizing compilers. Dynamic storage schemes may also allow more flexibility in the program transformations performed by vectorizing compilers during optimization.

References

[1] P. Budnik and D. Kuck, "The organization and use of parallel memories," IEEE Trans. Computers, vol. C-20, no. 12, pp. 1566-1569, December 1971.
[2] D. Lawrie, "Access and alignment of data in an array processor," IEEE Trans. Computers, vol. C-24, no. 12, pp. 1145-1155, December 1975.
[3] K. Batcher, "The multidimensional access memory in STARAN," IEEE Trans. Computers, vol. C-26, pp. 174-177, February 1977.
[4] R. Swanson, "Interconnections for parallel memories to unscramble p-ordered vectors," IEEE Trans. Computers, vol. C-23, pp. 1105-1115, November 1974.
[5] W. Oed and O. Lange, "On the effective bandwidth of interleaved memories in vector processing systems," IEEE Trans. Computers, vol. C-34, no. 10, pp. 949-957, October 1985.
[6] H. Shapiro, "Theoretical limitations on the efficient use of parallel memories," IEEE Trans. Computers, vol. C-27, no. 5, pp. 421-428, May 1978.
[7] H. Wijshoff and J. van Leeuwen, "The structure of periodic storage schemes for parallel memories," IEEE Trans. Computers, vol. C-34, no. 6, pp. 501-505, June 1985.
[8] H. Wijshoff and J. van Leeuwen, "On linear skewing schemes and d-ordered vectors," IEEE Trans. Computers, vol. C-36, no. 2, pp. 233-239, February 1987.
[9] D. Lawrie and C. Vora, "The prime memory system for array access," IEEE Trans. Computers, vol. C-31, no. 5, pp. 435-442, May 1982.
[10] D. T. Harper III and J. R. Jump, "Vector access performance in parallel memories using a skewed storage scheme," IEEE Trans. Computers, vol. C-36, no. 12, pp. 1440-1449, 1987.
[11] A. Ranade, "Interconnection networks and parallel memory organizations for array processing," Int. Conf. on Parallel Proc., pp. 41-47, August 1985.
[12] A. Norton and E. Melton, "A class of boolean linear transformations for conflict-free power-of-two stride access," Int. Conf. on Parallel Proc., pp. 247-254, 1987.
[13] W. R. Cowell and C. P. Thompson, "Transforming Fortran DO Loops to Improve Performance on Vector Architectures," ACM Transactions on Mathematical Software, vol. 12, pp. 324-353, December 1986.

Published In

ISCA '89: Proceedings of the 16th annual international symposium on Computer architecture
April 1989
426 pages
ISBN: 0897913191
DOI: 10.1145/74925
  • ACM SIGARCH Computer Architecture News, Volume 17, Issue 3
    Special Issue: Proceedings of the 16th annual international symposium on Computer Architecture
    June 1989
    400 pages
    ISSN: 0163-5964
    DOI: 10.1145/74926

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2014) A data cache with multiple caching strategies tuned to different types of locality. ACM International Conference on Supercomputing 25th Anniversary Volume, pp. 217-226. DOI: 10.1145/2591635.2667170. Online publication date: 10-Jun-2014.
  • (1997) Eliminating cache conflict misses through XOR-based placement functions. Proceedings of the 11th international conference on Supercomputing, pp. 76-83. DOI: 10.1145/263580.263599. Online publication date: 11-Jul-1997.
  • (1995) A data cache with multiple caching strategies tuned to different types of locality. Proceedings of the 9th international conference on Supercomputing, pp. 338-347. DOI: 10.1145/224538.224622. Online publication date: 3-Jul-1995.
  • (1994) PSIM. Proceedings of the 1994 International Conference on Parallel Processing - Volume 01, pp. 220-223. DOI: 10.1109/ICPP.1994.171. Online publication date: 15-Aug-1994.
  • (1992) On storage schemes for parallel array access. Proceedings of the 6th international conference on Supercomputing, pp. 282-291. DOI: 10.1145/143369.143421. Online publication date: 1-Aug-1992.
  • (2010) Interleaving granularity on high bandwidth memory architecture for CMPs. 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 250-257. DOI: 10.1109/ICSAMOS.2010.5642060. Online publication date: Jul-2010.
