Hadoop at home: large-scale computing at a small college

Published: 04 March 2009

Abstract

The potential benefits of data-intensive scalable computing (DISC) in CS education are considered in the context of a small college with an active student-operated Beowulf cluster initiative. The map-reduce computational model, of great importance in industry, is reviewed, and the Hadoop implementation of that model is connected to specific courses throughout the undergraduate CS curriculum. Concerns when running a local Hadoop-capable cluster at a small college are identified.
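As a rough illustration of the map-reduce computational model the abstract refers to, the sketch below counts words in three phases (map, shuffle, reduce) on a single machine. It is plain Python and does not use Hadoop's API; the function names map_words, shuffle, reduce_counts, and word_count are hypothetical and chosen only for this example. On a cluster, a framework such as Hadoop would run the map and reduce steps in parallel across nodes and perform the grouping itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially (no parallelism here)."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not data from the paper.
    sample = ["the cluster runs hadoop", "the students run the cluster"]
    print(word_count(sample))
```

Because each call to map_words depends only on its own line, and each call to reduce_counts only on its own key, both steps can be distributed across machines without changing the program's logic.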

Reviews

Arthur Gittleman

One of the interesting challenges of computer science (CS) education is preparing students for the large-scale computing of today's world. This paper presents a model for what can be achieved at even a small college. Google uses a map-reduce model (MapReduce) for large-scale data intensive computing. Hadoop is an open-source implementation of the map-reduce model.

The author uses Hadoop on a Beowulf cluster, enhancing the curriculum and student experience in several ways. As part of an undergraduate research course, a team of three students constructed the first Beowulf cluster from retired computers. Students do the system administration. Nineteen students and 11 faculty members in five departments have been involved in research projects using the cluster.

The author explains that the map-reduce model provides a good example in hardware design, programming languages, algorithms, and operating systems courses, even if a cluster implementation is not available. Hands-on experience, of course, provides many increased benefits. Two examples of map-reduce programming have been introduced at St. Olaf, in the CS1 course and in a parallel computing systems course. In CS1, students used Hadoop with Wikipedia as a data source. In the parallel computing systems seminar, students used and understood map-reduce computation and parallel computing technology.

In order to get a larger cluster for production computing, St. Olaf used virtualization to share new laboratory equipment. A section on the costs of clustering mentions cooling, space, and system administration. Student administrators have other priorities that need to be considered. Renting cluster resources was mentioned as a possibility.

Brown shows how Hadoop makes it feasible for a small college to explore data-intensive computing. It is a very impressive contribution that can be emulated at many institutions.

Online Computing Reviews Service

Published In

ACM SIGCSE Bulletin, Volume 41, Issue 1
SIGCSE '09
March 2009
553 pages
ISSN: 0097-8418
DOI: 10.1145/1539024

SIGCSE '09: Proceedings of the 40th ACM technical symposium on Computer science education
March 2009
612 pages
ISBN: 9781605581835
DOI: 10.1145/1508865
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2009
Published in SIGCSE Volume 41, Issue 1

Author Tags

  1. beowulf
  2. cluster computing
  3. data-intensive scalable computing
  4. examples in education
  5. hadoop
  6. map-reduce
  7. student system management

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 30 Aug 2024

Cited By

  • (2018) High Performance Storage for Big Data Analytics and Visualization. Handbook of Research on Big Data Storage and Visualization Techniques, pages 254-275. DOI: 10.4018/978-1-5225-3142-5.ch010. Online publication date: 2018
  • (2017) Using Phoenix++ MapReduce to introduce undergraduate students to parallel computing. Journal of Computing Sciences in Colleges 32:6, pages 165-174. DOI: 10.5555/3069658.3069682. Online publication date: 1-Jun-2017
  • (2017) Developing content-based recommender system using Hadoop Map Reduce. Journal of Intelligent & Fuzzy Systems 32:4, pages 2997-3008. DOI: 10.3233/JIFS-169243. Online publication date: 29-Mar-2017
  • (2017) One step at a time. Journal of Parallel and Distributed Computing 105:C, pages 4-17. DOI: 10.1016/j.jpdc.2016.12.024. Online publication date: 1-Jul-2017
  • (2016) Teaching Big Data with a Virtual Cluster. Proceedings of the 47th ACM Technical Symposium on Computing Science Education, pages 175-180. DOI: 10.1145/2839509.2844651. Online publication date: 17-Feb-2016
  • (2016) Efficient top-k similarity document search utilizing distributed file systems and cosine similarity. Cluster Computing 19:1, pages 109-126. DOI: 10.1007/s10586-015-0506-0. Online publication date: 1-Mar-2016
  • (2015) Speed-based Load Balancer for Scheduling Reduce Tasks to Process Intermediate Data of MapReduce Applications on Cloud Computing. Proceedings of the ASE BigData & SocialInformatics 2015, pages 1-6. DOI: 10.1145/2818869.2818880. Online publication date: 7-Oct-2015
  • (2015) Secure Sketch Search for Document Similarity. Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 01, pages 1102-1107. DOI: 10.1109/Trustcom.2015.489. Online publication date: 20-Aug-2015
  • (2015) NoSQL Databases: A Software Engineering Perspective. New Contributions in Information Systems and Technologies, pages 741-750. DOI: 10.1007/978-3-319-16486-1_73. Online publication date: 2015
  • (2014) Taking a walk on the wild side. Proceedings of the 45th ACM technical symposium on Computer science education, pages 535-540. DOI: 10.1145/2538862.2538931. Online publication date: 5-Mar-2014
