research-article

Hadoop at home: large-scale computing at a small college

Author:

Richard A. Brown

ACM SIGCSE Bulletin, Volume 41, Issue 1

Pages 106 - 110

https://doi.org/10.1145/1539024.1508904

Published: 04 March 2009

Abstract

The potential benefits of data-intensive scalable computing (DISC) in CS education are considered in the context of a small college with an active student-operated Beowulf cluster initiative. The map-reduce computational model, of great importance in industry, is reviewed, and the Hadoop implementation of that model is connected to specific courses throughout the undergraduate CS curriculum. Concerns when running a local Hadoop-capable cluster at a small college are identified.
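The map-reduce model mentioned above can be illustrated with a minimal, single-process sketch. This is not Hadoop code and is not taken from the paper; the function names (mapper, shuffle, reducer, word_count) are hypothetical, chosen only to show how a word count splits into a map phase, a grouping step, and a reduce phase. A real Hadoop job would distribute these phases across cluster nodes.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce phase: combine all counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat", "The dog the"]))
```

Because each call to mapper depends only on its own line, and each call to reducer depends only on one key, both phases could in principle run in parallel on separate machines, which is the property that makes the model attractive for clusters.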

Reviews

Reviewer: Arthur Gittleman

One of the interesting challenges of computer science (CS) education is preparing students for the large-scale computing of today's world. This paper presents a model for what can be achieved at even a small college. Google uses a map-reduce model (MapReduce) for large-scale data intensive computing. Hadoop is an open-source implementation of the map-reduce model.

The author uses Hadoop on a Beowulf cluster, enhancing the curriculum and student experience in several ways. As part of an undergraduate research course, a team of three students constructed the first Beowulf cluster from retired computers. Students do the system administration. Nineteen students and 11 faculty members in five departments have been involved in research projects using the cluster.

The author explains that the map-reduce model provides a good example in hardware design, programming languages, algorithms, and operating systems courses, even if a cluster implementation is not available. Hands-on experience, of course, provides many increased benefits. Two examples of map-reduce programming have been introduced at St. Olaf, in the CS1 course and in a parallel computing systems course. In CS1, students used Hadoop with Wikipedia as a data source. In the parallel computing systems seminar, students used and understood map-reduce computation and parallel computing technology.

In order to get a larger cluster for production computing, St. Olaf used virtualization to share new laboratory equipment. A section on the costs of clustering mentions cooling, space, and system administration. Student administrators have other priorities that need to be considered. Renting cluster resources was mentioned as a possibility.

Brown shows how Hadoop makes it feasible for a small college to explore data-intensive computing. It is a very impressive contribution that can be emulated at many institutions.

Online Computing Reviews Service

Information & Contributors

Information

Published In

ACM SIGCSE Bulletin Volume 41, Issue 1

SIGCSE '09

March 2009

553 pages

ISSN:0097-8418

DOI:10.1145/1539024

SIGCSE '09: Proceedings of the 40th ACM technical symposium on Computer science education
March 2009
612 pages
ISBN:9781605581835
DOI:10.1145/1508865
General Chairs: Sue Fitzgerald (Metropolitan State University), Mark Guzdial (Georgia Institute of Technology)
Program Chairs: Gary Lewandowski (Xavier University), Steven Wolfman (University of British Columbia)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Bibliometrics

Total Citations: 38
Total Downloads: 2,643
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

Cited By

Fandango A, Rivera W (2018). High Performance Storage for Big Data Analytics and Visualization. Handbook of Research on Big Data Storage and Visualization Techniques, pages 254-275. Online publication date: 2018. https://doi.org/10.4018/978-1-5225-3142-5.ch010

Matthews S (2017). Using Phoenix++ MapReduce to introduce undergraduate students to parallel computing. Journal of Computing Sciences in Colleges 32:6, pages 165-174. Online publication date: 1-Jun-2017. https://dl.acm.org/doi/10.5555/3069658.3069682

Gautam A, Bedi P (2017). Developing content-based recommender system using Hadoop Map Reduce. Journal of Intelligent & Fuzzy Systems 32:4, pages 2997-3008. Online publication date: 29-Mar-2017. https://doi.org/10.3233/JIFS-169243

Bogaerts S (2017). One step at a time. Journal of Parallel and Distributed Computing 105:C, pages 4-17. Online publication date: 1-Jul-2017. https://dl.acm.org/doi/10.1016/j.jpdc.2016.12.024

Eckroth J, Alphonce C, Tims J, Caspersen M, Edwards S (2016). Teaching Big Data with a Virtual Cluster. Proceedings of the 47th ACM Technical Symposium on Computing Science Education, pages 175-180. Online publication date: 17-Feb-2016. https://dl.acm.org/doi/10.1145/2839509.2844651

Alewiwi M, Orencik C, Savaş E (2016). Efficient top-k similarity document search utilizing distributed file systems and cosine similarity. Cluster Computing 19:1, pages 109-126. Online publication date: 1-Mar-2016. https://dl.acm.org/doi/10.1007/s10586-015-0506-0

Huang T, Chu K, Shieh C, Tsai M (2015). Speed-based Load Balancer for Scheduling Reduce Tasks to Process Intermediate Data of MapReduce Applications on Cloud Computing. Proceedings of the ASE BigData & SocialInformatics 2015, pages 1-6. Online publication date: 7-Oct-2015. https://dl.acm.org/doi/10.1145/2818869.2818880

Orencik C, Alewiwi M, Savas E (2015). Secure Sketch Search for Document Similarity. Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 01, pages 1102-1107. Online publication date: 20-Aug-2015. https://dl.acm.org/doi/10.1109/Trustcom.2015.489

Lourenço J, Abramova V, Vieira M, Cabral B, Bernardino J (2015). NoSQL Databases: A Software Engineering Perspective. New Contributions in Information Systems and Technologies, pages 741-750. Online publication date: 2015. https://doi.org/10.1007/978-3-319-16486-1_73

Zhuang Y, Matthews C, Tredger S, Ness S, Short-Gershman J, Ji L, Rebenich N, French A, Erickson J, Clarkson K, Coady Y, McGeer R, Dougherty J, Nagel K, Decker A, Eiselt K (2014). Taking a walk on the wild side. Proceedings of the 45th ACM technical symposium on Computer science education, pages 535-540. Online publication date: 5-Mar-2014. https://dl.acm.org/doi/10.1145/2538862.2538931
