Article, DOI: 10.5555/1884795.1884803

Towards metaprogramming for parallel systems on a chip

Published: 25 August 2009

    Abstract

    We demonstrate that the performance of commodity parallel systems depends significantly on low-level details, such as storage layout and iteration space mapping. This motivates the need for tools and techniques that separate a high-level algorithm description from low-level mapping and tuning. We propose to build a tool based on the concept of decoupled Access/Execute metadata, which allow the programmer to specify both the execution constraints and the memory access pattern of a computation kernel.
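
    As a rough illustration of the separation the abstract argues for, the C++ sketch below writes a kernel body once and takes the storage layout as a separate parameter. This is not the tool the paper proposes, and it does not use decoupled Access/Execute metadata as such. The names `AosLayout`, `SoaLayout`, `make_points` and `squared_norms` are hypothetical and were chosen only for this example, which assumes three `float` fields per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: these names are illustrative, not from the paper.
constexpr std::size_t kFields = 3;  // assumed: x, y, z per element

// "Access" side, variant 1: array of structures (xyz xyz xyz ...).
struct AosLayout {
    std::size_t n;  // number of elements
    std::size_t offset(std::size_t i, std::size_t f) const {
        return i * kFields + f;
    }
};

// "Access" side, variant 2: structure of arrays (xxx... yyy... zzz...).
struct SoaLayout {
    std::size_t n;  // number of elements
    std::size_t offset(std::size_t i, std::size_t f) const {
        return f * n + i;
    }
};

// Fill a buffer so that logical field f of element i holds 10*i + f,
// whatever the physical layout is.
template <typename Layout>
std::vector<float> make_points(const Layout& layout) {
    std::vector<float> data(layout.n * kFields);
    for (std::size_t i = 0; i < layout.n; ++i)
        for (std::size_t f = 0; f < kFields; ++f)
            data[layout.offset(i, f)] = static_cast<float>(10 * i + f);
    return data;
}

// "Execute" side: the kernel is written once against logical indices.
// Only the layout argument decides which memory locations are touched.
template <typename Layout>
std::vector<float> squared_norms(const std::vector<float>& data,
                                 const Layout& layout) {
    assert(data.size() == layout.n * kFields);
    std::vector<float> out(layout.n);
    for (std::size_t i = 0; i < layout.n; ++i) {
        float acc = 0.0f;
        for (std::size_t f = 0; f < kFields; ++f) {
            const float v = data[layout.offset(i, f)];
            acc += v * v;
        }
        out[i] = acc;
    }
    return out;
}
```

    Calling `squared_norms(make_points(layout), layout)` with either layout type yields the same logical result, while the order of memory accesses differs. That difference in access order is the kind of low-level detail the abstract says can dominate performance on a given parallel system.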


    Cited By

    • Reduction Drawing. Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp. 87-97 (2016). DOI: 10.1145/2967938.2967950. Online publication date: 11-Sep-2016
    • Generating GPU code from a high-level representation for image processing kernels. Proceedings of the 2011 international conference on Parallel Processing, pp. 270-280 (2011). DOI: 10.1007/978-3-642-29737-3_31. Online publication date: 29-Aug-2011

    Published In

    Euro-Par'09: Proceedings of the 2009 international conference on Parallel processing
    August 2009
    468 pages
    ISBN:3642141218

    Publisher

    Springer-Verlag, Berlin, Heidelberg
