DOI: 10.1145/1278177.1278191
Article

Probabilistic certification of divide & conquer algorithms on global computing platforms: application to fault-tolerant exact matrix-vector product

Published: 27 July 2007

Abstract

In [6], a new approach for certifying the correctness of program executions in hostile environments was proposed. The authors presented probabilistic certification by massive attack detection through two algorithms, MCT and EMCT. The execution to certify is represented by a macro-data flow graph, from which some tasks are randomly extracted and re-executed on safe resources to determine whether the execution is correct or faulty. Bounds associated with certification were provided for general graphs and for tasks with out-tree dependencies.
In this paper, we extend those results with a cost analysis of parallel certification based on on-line scheduling by work-stealing. In particular, we focus on Divide & Conquer algorithms, which are commonly used in symbolic computation, and demonstrate the efficiency of EMCT for certifying the resulting Fork-Join graph. Finally, we show how to combine EMCT with BCH codes to make a matrix-vector product tolerant to both falsifications and massive attacks.
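
The random re-execution idea described in the abstract can be illustrated with a minimal Python sketch. It is not the MCT or EMCT algorithm of [6]: it assumes independent tasks (no dependency graph), uniform sampling with replacement, and a hypothetical attack model in which a faulty execution contains at least a fraction q of forged task results. Under those assumptions, N checks that all pass leave a miss probability of at most (1 - q)^N. The names checks_needed, certify, and reexecute are illustrative only.

```python
import math
import random


def checks_needed(q, eps):
    """Smallest N with (1 - q)**N <= eps, for 0 < q < 1 and 0 < eps < 1.

    q   : assumed minimum fraction of forged tasks in an attacked execution
    eps : tolerated probability of certifying an attacked execution
    """
    return math.ceil(math.log(eps) / math.log(1.0 - q))


def certify(tasks, reexecute, q, eps, rng=random):
    """Sample tasks uniformly (with replacement) and re-run them on a trusted resource.

    tasks     : non-empty list of (inputs, claimed_output) pairs
    reexecute : trusted function computing the correct output from inputs
    Returns False as soon as one claimed output disagrees with its
    re-execution, and True if every sampled task agrees.
    """
    for _ in range(checks_needed(q, eps)):
        inputs, claimed = rng.choice(tasks)
        if reexecute(inputs) != claimed:
            return False  # a forged result was detected
    return True  # no forgery found among the sampled tasks


if __name__ == "__main__":
    square = lambda x: x * x
    honest = [(i, i * i) for i in range(100)]
    print(checks_needed(0.1, 0.01))
    print(certify(honest, square, q=0.1, eps=0.01))
```

The number of checks depends only on q and eps, not on the number of tasks, which is what makes sampling attractive for very large executions. Handling task dependencies, as the graph-based bounds in the abstract do, is outside the scope of this sketch.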

References

[1]
M. A. Bender and M. O. Rabin. Online scheduling of parallel programs on heterogeneous systems with applications to Cilk. Theory Comput. Syst., 35(3):289--304, 2002.
[2]
Z. Chen and J. J. Dongarra. Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources. Rhodes Island, Greece, April 2006.
[3]
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In SIGPLAN Conference on Programming Language Design and Implementation, pages 212--223, 1998.
[4]
G. A. Reis, J. Chang, N. Vachharajani, R. Rangan, and D. I. August. SWIFT: Software Implemented Fault Tolerance. In Proceedings of the Third International Symposium on Code Generation and Optimization (CGO), March 2005.
[5]
S. Jafar, T. Gautier, A. W. Krings, and J.-L. Roch. A checkpoint/recovery model for heterogeneous data flow computations using work-stealing. In EUROPAR'2005, August 2005.
[6]
A. Krings, J.-L. Roch, S. Jafar, and S. Varrette. A Probabilistic Approach for Task and Result Certification of Large-scale Distributed Applications in Hostile Environments. In EGC2005, LNCS 3470. Springer Verlag, February 14-16 2005.
[7]
A. W. Krings, J.-L. Roch, and S. Jafar. Certification of large distributed computations with task dependencies in hostile environments. In EIT 2005, May 2005.
[8]
A. Li and B. Hong. A low-cost correction algorithm for transient data errors. In Ubiquity, volume 7, May 2006.
[9]
MOAIS Team. KAAPI. http://kaapi.gforge.inria.fr/, 2005.
[10]
J. S. Plank, Y. Kim, and J. Dongarra. Fault tolerant matrix operations for networks of workstations using diskless checkpointing. Journal of Parallel and Distributed Computing, 43(2):125--138, June 1997.
[11]
V. Pless. Introduction to the Theory of Error Correcting Codes. John Wiley & Sons, 1990.
[12]
J.-L. Roch, D. Traore, and J. Bernard. On-line adaptive parallel prefix computation. In LNCS 4128, EUROPAR'2006, pages 843--850, August 2006.
[13]
G. K. Saha. Software based fault tolerance: a survey. Ubiquity, 7(25):1--1, 2006.
[14]
L. F. G. Sarmenta. Volunteer Computing. PhD thesis, Dept. of Electrical Engineering and Computer Science, MIT, March 2001.
[15]
S. Varrette, J.-L. Roch, J. Montagnat, L. Seitz, J.-M. Pierson, and F. Leprévost. Safe Distributed Architecture for Image-based Computer Assisted Diagnosis. In IEEE 1st International Workshop on Health Pervasive Systems (HPS'06), Lyon, France, June 2006.

      Published In

      PASCO '07: Proceedings of the 2007 international workshop on Parallel symbolic computation
      July 2007
      116 pages
      ISBN: 9781595937414
      DOI: 10.1145/1278177

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. divide & conquer algorithms
      2. fork-join macro-data flow graph
      3. global computing
      4. result-checking

      Conference

      ISSAC07

      Cited By

      • (2015) A MapReduce-based algorithm for parallelizing collusion detection in Hadoop. 2015 7th Conference on Information and Knowledge Technology (IKT), pages 1-5. DOI: 10.1109/IKT.2015.7288760. Online publication date: May-2015
      • (2013) Using data-flow analysis in MAS for power-aware HPC runs. 2013 International Conference on High Performance Computing & Simulation (HPCS), pages 158-160. DOI: 10.1109/HPCSim.2013.6641407. Online publication date: Jul-2013
      • (2011) A signature scheme for distributed executions based on control flow analysis. Proceedings of the 2011 international conference on Security and Intelligent Information Systems, pages 85-102. DOI: 10.1007/978-3-642-25261-7_7. Online publication date: 13-Jun-2011
      • (2010) Output-sensitive decoding for redundant residue systems. Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation, pages 265-272. DOI: 10.1145/1837934.1837985. Online publication date: 25-Jul-2010
      • (2009) Dynamic Adaptation Applied to Sabotage Tolerance. Proceedings of the 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pages 237-244. DOI: 10.1109/PDP.2009.19. Online publication date: 18-Feb-2009
      • (2009) Collusion Detection for Grid Computing. Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, pages 412-419. DOI: 10.1109/CCGRID.2009.12. Online publication date: 18-May-2009
      • (2007) Multithreaded parallel implementation of arithmetic operations modulo a triangular set. Proceedings of the 2007 international workshop on Parallel symbolic computation, pages 53-59. DOI: 10.1145/1278177.1278187. Online publication date: 27-Jul-2007
