Open access

Learning Big Data Systems via Emulation

Author:

Wensheng Wu

SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1

Pages 1449 - 1455

https://doi.org/10.1145/3626252.3630888

Published: 07 March 2024

Abstract

Big data systems are becoming an integral part of computing and data science curricula. However, current curricula focus largely on how to use these systems. An effective approach to learning the internals of big data systems is through emulation. In this paper, we report on a study in which students in a graduate database course completed a course project on emulating big data systems such as Hadoop and Spark. We present the design of the emulation projects and examine their impact on students' learning. Our key finding is that the emulation projects can greatly improve students' self-efficacy in completing tasks that require in-depth knowledge of, and skills with, big data systems.

Index Terms

Learning Big Data Systems via Emulation
1. Information systems
  1. Data management systems
2. Social and professional topics
  1. Professional topics
    1. Computing education

Published In

SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1

March 2024

1583 pages

ISBN: 9798400704239

DOI: 10.1145/3626252

General Chairs: Ben Stephenson (University of Calgary, Canada), Jeffrey A. Stone (Penn State University)

Program Chairs: Lina Battestilli (North Carolina State University, USA), Samuel A. Rebelsky (Grinnell College), Libby Shoop (Macalester College)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGCSE: ACM Special Interest Group on Computer Science Education

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGCSE 2024

Sponsor: SIGCSE

SIGCSE 2024: The 55th ACM Technical Symposium on Computer Science Education

March 20 - 23, 2024

Portland, OR, USA

Acceptance Rates

Overall Acceptance Rate 1,595 of 4,542 submissions, 35%
