DOI:10.1145/3626252.3630888
research-article
Open access

Learning Big Data Systems via Emulation

Published: 07 March 2024

Abstract

Big data systems are becoming an integral part of computing and data science curricula. However, current curricula focus largely on how to use these systems. An effective approach to learning the internals of big data systems is through emulation. In this paper, we report on a study in which students in a graduate database course were asked to complete a course project on emulating big data systems such as Hadoop and Spark. We present the design of the emulation projects and examine the impact of the projects on students' learning. Our key finding is that the emulation projects can greatly improve students' self-efficacy in completing tasks that require in-depth knowledge of, and skills in, big data systems.

Published In

SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1
March 2024
1583 pages
ISBN:9798400704239
DOI:10.1145/3626252
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data systems
  2. emulation
  3. hadoop
  4. mapreduce
  5. nosql
  6. spark

Qualifiers

  • Research-article

Conference

SIGCSE 2024

Acceptance Rates

Overall Acceptance Rate 1,595 of 4,542 submissions, 35%
