DOI: 10.1145/2612669.2612673
research-article

Executing dynamic data-graph computations deterministically using chromatic scheduling

Published: 21 June 2014

Abstract

A data-graph computation — popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi — is an algorithm that performs local updates on the vertices of a graph. During each round of a data-graph computation, an update function atomically modifies the data associated with a vertex as a function of the vertex's prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round.
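As a rough illustration of the definition above, the following sketch runs a dynamic data-graph computation serially. The names `run_dynamic` and `min_label_update` are hypothetical and do not come from the paper or from any of the systems it mentions; the update rule (propagating the minimum label to find connected components) is only an assumed example of a local update that reads a vertex's neighbors and activates them when its own data changes.

```python
def run_dynamic(adj, data, update, active):
    """Run rounds until no vertex is active; return the number of rounds."""
    rounds = 0
    while active:
        next_active = set()
        # Serial stand-in for one round: each active vertex is updated once.
        for v in sorted(active):
            next_active |= update(v, adj, data)
        active = next_active
        rounds += 1
    return rounds


def min_label_update(v, adj, data):
    """Set v's label to the minimum over v and its neighbors.

    Returns the set of vertices to activate for the next round:
    v's neighbors if v's label changed, otherwise nothing.
    """
    best = min([data[v]] + [data[u] for u in adj[v]])
    if best < data[v]:
        data[v] = best
        return set(adj[v])
    return set()


# Two components: the path 0-1-2 and the edge 3-4.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
labels = {v: v for v in adj}
rounds = run_dynamic(adj, labels, min_label_update, set(adj))
# labels is now {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

Only the vertices whose neighborhoods changed are revisited in the second round, which is the sense in which the active set is dynamic.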
This paper introduces PRISM, a chromatic-scheduling algorithm for executing dynamic data-graph computations. PRISM uses a vertex-coloring of the graph to coordinate updates performed in a round, precluding the need for mutual-exclusion locks or other nondeterministic data synchronization. A multibag data structure is used by PRISM to maintain a dynamic set of active vertices as an unordered set partitioned by color. We analyze PRISM using work-span analysis. Let $G=(V,E)$ be a degree-$\Delta$ graph colored with $\chi$ colors, and suppose that $Q \subseteq V$ is the set of active vertices in a round. Define $\mathrm{size}(Q) = |Q| + \sum_{v \in Q} \deg(v)$, which is proportional to the space required to store the vertices of $Q$ using a sparse-graph layout. We show that a $P$-processor execution of PRISM performs updates in $Q$ using $O(\chi(\lg(Q/\chi) + \lg \Delta) + \lg P)$ span and $\Theta(\mathrm{size}(Q) + \chi + P)$ work. These theoretical guarantees are matched by good empirical performance. We modified GraphLab to incorporate PRISM and studied seven application benchmarks on a 12-core multicore machine. PRISM executes the benchmarks 1.2–2.1 times faster than GraphLab's nondeterministic lock-based scheduler while providing deterministic behavior.
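The idea of color-by-color execution can be sketched as follows. This is not the PRISM implementation: a plain dictionary of lists stands in for the multibag, a shuffled serial loop stands in for parallel execution within a color, and the greedy coloring and the `mix_update` rule are assumptions chosen for illustration. All function names are hypothetical.

```python
import random


def greedy_coloring(adj):
    """Give each vertex the smallest color unused by its already-colored neighbors."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def size_of(active, adj):
    """size(Q) = |Q| + sum of deg(v) over v in Q."""
    return len(active) + sum(len(adj[v]) for v in active)


def mix_update(v, adj, data):
    """An order-sensitive update: v reads its neighbors and writes only itself."""
    data[v] = (data[v] + sum(data[u] for u in adj[v])) % 97
    return set(adj[v])  # always reactivate the neighbors


def chromatic_round(adj, color, data, update, active, rng):
    """Process one round color by color; return the next round's active set."""
    bags = {}
    for v in active:
        bags.setdefault(color[v], []).append(v)
    next_active = set()
    for c in sorted(bags):
        bag = bags[c]
        # Same-colored vertices are never adjacent, so any order (or a
        # parallel execution) within a color yields the same result.
        rng.shuffle(bag)
        for v in bag:
            next_active |= update(v, adj, data)
    return next_active


def run(adj, rounds, seed):
    color = greedy_coloring(adj)
    data = {v: v + 1 for v in adj}
    active = set(adj)
    rng = random.Random(seed)
    for _ in range(rounds):
        active = chromatic_round(adj, color, data, mix_update, active, rng)
    return data


adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
results = [run(adj, 5, seed) for seed in (1, 2, 3)]
```

Each seed shuffles the order within every color differently, yet all three runs end with identical vertex data, because an update only reads vertices of other colors and writes its own vertex.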
This paper also presents PRISM-R, a variation of PRISM that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations. PRISM-R satisfies the same theoretical bounds as PRISM, but its implementation is more involved, incorporating a multivector data structure to maintain an ordered set of vertices partitioned by color.
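To see why an ordered set matters when updates feed an associative operation on a global variable, consider the sketch below. It does not reproduce the multivector or the reducer mechanism of PRISM-R; it only assumes that the vertices of one color are kept in a fixed order (here, sorted) and that each update yields a contribution to be combined. The name `ordered_color_step` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def ordered_color_step(vertices, contribution, combine, identity):
    """Compute contributions concurrently, then combine them in vertex order."""
    ordered = sorted(vertices)  # stand-in for an ordered set of one color
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map yields results in input order, regardless of
        # which worker thread finishes first.
        parts = list(pool.map(contribution, ordered))
    total = identity
    for part in parts:
        total = combine(total, part)
    return total


# List concatenation is associative but not commutative, so the
# combined value depends on the order in which parts are folded.
result = ordered_color_step({3, 1, 2}, lambda v: [v * v], lambda a, b: a + b, [])
# result is [1, 4, 9]
```

With an unordered set, the same contributions could be combined in a different order on each run, and a non-commutative operation would then give run-to-run variation.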


Published In

SPAA '14: Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures
June 2014
356 pages
ISBN:9781450328210
DOI:10.1145/2612669
  • General Chair: Guy Blelloch
  • Program Chair: Peter Sanders
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. chromatic scheduling
  2. data-graph computations
  3. determinism
  4. multicore
  5. multithreading
  6. parallel programming
  7. reducers
  8. work stealing

Qualifiers

  • Research-article

Conference

SPAA '14

Acceptance Rates

SPAA '14 Paper Acceptance Rate 30 of 122 submissions, 25%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Oct 2024

Cited By

  • (2022) High-performance and balanced parallel graph coloring on multicore platforms. The Journal of Supercomputing 79:6 (6373-6421). DOI: 10.1007/s11227-022-04894-6. Online publication date: 7-Nov-2022
  • (2020) High-Throughput Image Alignment for Connectomics using Frugal Snap Judgments. 2020 IEEE High Performance Extreme Computing Conference (HPEC) (1-9). DOI: 10.1109/HPEC43674.2020.9286243. Online publication date: 22-Sep-2020
  • (2019) PowerLyra. ACM Transactions on Parallel Computing 5:3 (1-39). DOI: 10.1145/3298989. Online publication date: 22-Jan-2019
  • (2018) Laika. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (415-426). DOI: 10.1145/3210377.3210395. Online publication date: 11-Jul-2018
  • (2018) Combining HTM with RCU to Speed Up Graph Coloring on Multicore Platforms. High Performance Computing (350-369). DOI: 10.1007/978-3-319-92040-5_18. Online publication date: 29-May-2018
  • (2017) A Multicore Path to Connectomics-on-Demand. ACM SIGPLAN Notices 52:8 (267-281). DOI: 10.1145/3155284.3018766. Online publication date: 26-Jan-2017
  • (2017) A Multicore Path to Connectomics-on-Demand. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (267-281). DOI: 10.1145/3018743.3018766. Online publication date: 26-Jan-2017
  • (2017) Shared-Memory Parallelism Can Be Simple, Fast, and Scalable. Online publication date: 9-Jun-2017
  • (2016) Graph Analytics Through Fine-Grained Parallelism. Proceedings of the 2016 International Conference on Management of Data (463-478). DOI: 10.1145/2882903.2915238. Online publication date: 26-Jun-2016
  • (2016) High Performance Parallel Graph Coloring on GPGPUs. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (845-854). DOI: 10.1109/IPDPSW.2016.11. Online publication date: May-2016
