Abstract
Understanding phenomena based on the facts, that is, on the data, is a touchstone of data science. The power of evidence-based, inductive reasoning distinguishes data science from science. Hence, this chapter argues that, in its initial stages, data science applications and the data science discipline itself should be developed inductively and deductively in a virtuous cycle.
The twentieth-century Virtuous Cycle (also known as the virtuous hardware-software cycle or the Intel-Microsoft virtuous cycle) that built the personal computer industry (National Research Council 2012) had two virtues: it was grounded in reality and it was self-perpetuating. More powerful hardware enabled more powerful software, which required more powerful hardware, enabling yet more powerful software, and so forth. Being grounded in reality, that is, solving genuine problems at scale, was critical to its success, as it will be for data science. While it lasted, the cycle was self-perpetuating because of a constant flow of innovation and because it benefited all participants: producers, consumers, the industry, the economy, and society. It is a wonderful success story for twentieth-century applied science. Given the success of virtuous cycles in developing modern technology, virtuous cycles grounded in reality should be used to develop data science, driven by the wisdom of the sixteenth-century proverb "Necessity is the mother of invention."
This chapter explores this hypothesis using the example of the evolution of database management systems over the last 40 years. For the application of data science to be successful and virtuous, it should be grounded in a cycle that encompasses industry (i.e., real problems), research, development, and delivery. This chapter proposes applying the principles and lessons of the virtuous cycle to the development of data science applications; to the development of the data science discipline itself, for example, a data science method; and to the development of data science education. All three focus on the critical role of collaboration in data science research and management, thereby addressing the development challenges faced by the more than 150 Data Science Research Institutes (DSRIs) worldwide. A companion chapter (Brodie 2019a) addresses essential questions that DSRIs should answer in preparation for the developments proposed here: What is data science? What is world-class data science research?
References
ACM. (2015). Michael Stonebraker, 2014 Turing Award Citation, Association for Computing Machinery, April 2015. http://amturing.acm.org/award_winners/stonebraker_1172121.cfm
AJTR. (2018). American Journal of Translational Research, e-Century Publishing Corporation. http://www.ajtr.org
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Braschler, M., Stadelmann, T., & Stockinger, K. (Eds.). (2019). Applied data science – Lessons learned for the data-driven business. Berlin: Springer.
Brodie, M. L. (2015). Understanding data science: An emerging discipline for data-intensive discovery. In S. Cutt (Ed.), Getting data right: Tackling the challenges of big data volume and variety. Sebastopol, CA: O’Reilly Media.
Brodie, M. L. (2019a). What is data science? In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.
Brodie, M. L. (Ed.). (2019b, January). Making databases work: The pragmatic wisdom of Michael Stonebraker. ACM Books series (Vol. 22). San Rafael, CA: Morgan & Claypool.
Chipman, I. (2016). How data analytics is going to transform all industries. Stanford Engineering Magazine, February 13, 2016.
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.
Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.
Demirkan, H., & Dal, B. (2014). The data economy: Why do so many analytics projects fail? Analytics Magazine, July/August 2014.
Dohzen, T., Pamuk, M., Seong, S. W., Hammer, J., & Stonebraker, M. (2006). Data integration through transform reuse in the Morpheus project (pp. 736–738). ACM SIGMOD International Conference on Management of Data, Chicago, IL, June 27–29, 2006.
Economist. (2017). Who’s afraid of disruption? The business world is obsessed with digital disruption, but it has had little impact on profits, The Economist, September 30, 2017.
Economist. (2018a). GrAIt expectations, Special report AI in business, The Economist, March 31, 2018.
Economist. (2018b). External providers: Leave it to the experts, Special report AI in business, The Economist, March 31, 2018.
Economist. (2018c). The future: Two-faced, Special report AI in business, The Economist, March 31, 2018.
Economist. (2018d). Supply chains: In algorithms we trust, Special report AI in business, The Economist, March 31, 2018.
Economist. (2018e). America v China: The battle for digital supremacy: America’s technological hegemony is under threat from China, The Economist, March 15, 2018.
Economist. (2018f). A study finds nearly half of jobs are vulnerable to automation, The Economist, April 24, 2018.
Fang, F. C., & Casadevall, A. (2010). Lost in translation-basic science in the era of translational research. Infection and Immunity, 78(2), 563–566.
Forrester. (2015a). Brief: Why data-driven aspirations fail. Forrester Research, Inc., October 7, 2015.
Forrester. (2015b). Predictions 2016: The path from data to action for marketers: How marketers will elevate systems of insight. Forrester Research, November 9, 2015.
Forrester. (2017). The Forrester Wave™: Data preparation tools, Q1 2017, Forrester, March 13, 2017.
Gartner G00310700. (2016). Survey analysis: Big data investments begin tapering in 2016, Gartner, September 19, 2016.
Gartner G00316349. (2016). Predicts 2017: Analytics strategy and technology, Gartner, report G00316349, November 30, 2016.
Gartner G00301536. (2017). 2017 Magic quadrant for data science platforms, 14 February 2017.
Gartner G00315888. (2017). Market guide for data preparation, Gartner, 14 December 2017.
Gartner G00326671. (2017). Critical capabilities for data science platforms, Gartner, June 7, 2017.
Gartner G00326456. (2018). Magic quadrant for data science and machine-learning platforms, 22 February 2018.
Gartner G00326555. (2018). Magic quadrant for analytics and business intelligence platforms, 26 February 2018.
Gartner G00335261. (2018). Critical capabilities for data science and machine learning platforms, 4 April 2018.
Harari, Y. N. (2016). Homo Deus: A brief history of tomorrow. Random House.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.
Lee, K-F. (2017). The real threat of artificial intelligence. New York Times, June 24, 2017.
Lohr, S., & Singer, N. (2016). How data failed us in calling an election. New York Times, November 10, 2016.
Marr, B. (2017). How big data is transforming every business, in every industry. Forbes.com, November 21, 2017.
Meierhofer, J., Stadelmann, T., & Cieliebak, M. (2019). Data products. In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.
Nagarajan, M., et al. (2015). Predicting future scientific discoveries based on a networked analysis of the past literature. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15) (pp. 2019–2028). New York, NY: ACM.
National Research Council. (2012). The new global ecosystem in advanced computing: Implications for U.S. competitiveness and national security. Washington, DC: The National Academies Press.
Naumann, F. (2018). Genealogy of relational database management systems. Hasso-Plattner-Institut, Universität Potsdam. https://hpi.de/naumann/projects/rdbms-genealogy.html
Nedelkoska, L., & Quintini, G. (2018). Automation, skills use and training. OECD Social, Employment and Migration Working Papers, No. 202, OECD Publishing, Paris. https://doi.org/10.1787/2e2f4eea-en
New York Times. (2018). H&M, a Fashion Giant, has a problem: $4.3 Billion in unsold clothes. New York Times, March 27, 2018.
O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. New York, NY: Crown Publishing Group.
Olson, M. (2019). Stonebraker and open source, to appear in (Brodie 2019b).
Palmer, A. (2019). How to create & run a Stonebraker Startup – The Real Story, to appear in (Brodie 2019b).
Piatetsky, G. (2016). Trump, failure of prediction, and lessons for data scientists, KDnuggets, November 2016.
Ramanathan, A. (2016). The data science delusion, Medium.com, November 18, 2016.
Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Boston, MA: Pearson Education.
Spangler, S., et al. (2014). Automated hypothesis generation based on mining scientific literature. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14) (pp. 1877–1886). New York, NY: ACM.
STM. (2018). Science Translational Medicine, a journal of the American Association for the Advancement of Science.
Stonebraker, M. (2019a). How to start a company in 5 (not so) easy steps, to appear in (Brodie 2019b).
Stonebraker, M. (2019b). Where do good ideas come from and how to exploit them? to appear in (Brodie 2019b).
Stonebraker, M., & Kemnitz, G. (1991). The postgres next generation database management system. Communications of the ACM, 34(10), 78–92.
Stonebraker, M., Wong, E., Kreps, P., & Held, G. (1976). The design and implementation of INGRES. ACM Transactions on Database Systems, 1(3), 189–222.
Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., et al. (2005). C-store: A column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases, 2005.
Stonebraker, M., Castro Fernandez, R., Deng, D., & Brodie, M. L. (2016a). Database decay and what to do about it. Communications of the ACM, 60(1), 10–11.
Stonebraker, M., Deng, D., & Brodie, M. L. (2016b). Database decay and how to avoid it. In Proceedings of the IEEE International Conference on Big Data (pp. 1–10), Washington, DC.
Stonebraker, M., Deng, D., & Brodie, M. L. (2017). Application-database co-evolution: A new design and development paradigm. In New England Database Day (pp. 1–3).
van der Aalst, W. M. P. (2014). Data scientist: The engineer of the future. In K. Mertins, F. Bénaben, R. Poler, & J.-P. Bourrières (Eds.) Presented at the Enterprise Interoperability VI (pp. 13–26). Cham: Springer International Publishing.
Veeramachaneni, K. (2016). Why you’re not getting value from your data science. Harvard Business Review, December 7, 2016.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.
Acknowledgments
Thanks to Dr. Thilo Stadelmann, Zurich University of Applied Sciences, Institute for Applied Information Technology in the Swiss Fachhochschule system, for insights into these ideas; and to Dr. He H. (Anne) Ngu, Texas State University, for insights into applying these principles and pragmatics to the development of Texas State University’s Twenty-First Century Applied PhD Program in Computer Science.
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Brodie, M.L. (2019). On Developing Data Science. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_9
DOI: https://doi.org/10.1007/978-3-030-11821-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11820-4
Online ISBN: 978-3-030-11821-1
eBook Packages: Computer Science, Computer Science (R0)