The R Language: A Powerful Tool for Taming Big Data

  • Living reference work entry
Encyclopedia of Big Data Technologies

Definition

The R language (R Core Team 2017; Chambers 2008; Matloff 2011) is currently the most popular tool in the general data science field. It features outstanding graphics capabilities and a rich set of more than 10,000 library packages to draw upon, and its interfaces to SQL databases and to C/C++ are first rate. (Other notable languages in data science are Python and Julia. Python is popular among those trained in computer science; Julia, a newer language, makes producing fast code its top priority.) All of this, along with recent developments regarding memory issues, leaves R well poised to be a highly effective tool in Big Data applications. This chapter presents the use of R in Big Data settings.

Note that Big Data can be “big” in either of two ways, phrased in terms of the classical n × p matrix representing a dataset, with one row per data point and one column per variable:

  • Big-n: Large number of data points.

  • Big-p: Large number of variables/features.
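
The two senses can be made concrete with a small sketch. The snippet below is written in Python purely for illustration; the function names `dataset_shape` and `bigness` and the cutoff values are hypothetical choices for this example and do not come from the chapter, which gives no numeric thresholds for "big".

```python
def dataset_shape(rows):
    """Return (n, p): n data points (rows) and p variables (columns)."""
    n = len(rows)
    p = len(rows[0]) if rows else 0
    # Every row must have the same number of variables to form an n x p matrix.
    if any(len(r) != p for r in rows):
        raise ValueError("rows must all have the same length")
    return n, p


def bigness(n, p, n_cutoff=1_000_000, p_cutoff=1_000):
    """Label a dataset shape; the cutoffs are arbitrary illustrative values."""
    labels = []
    if n >= n_cutoff:
        labels.append("big-n")
    if p >= p_cutoff:
        labels.append("big-p")
    return labels


# A tiny 3 x 2 dataset: three data points, two variables.
toy = [[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]]
print(dataset_shape(toy))        # (3, 2)
print(bigness(50_000_000, 12))   # ['big-n']
print(bigness(200, 20_000))      # ['big-p']
```

A dataset can carry both labels at once, which is why the function returns a list rather than a single value.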

Both senses will come into play later. For now, though,...

References

  • Breshears C (2009) The art of concurrency: a thread monkey’s guide to writing parallel applications. O’Reilly Media, Sebastopol

  • Bühlmann P, Drineas P, Kane M, van der Laan M (2016) Handbook of big data. Chapman & Hall/CRC handbooks of modern statistical methods. CRC Press, Boca Raton

  • Chambers J (2008) Software for data analysis: programming with R. Statistics and computing. Springer, New York

  • Chang W (2013) R graphics cookbook. O’Reilly and Associates series. O’Reilly Media, Sebastopol, CA

  • Dowle M (2017) Data analysis using data.table. https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

  • Eddelbuettel D (2013) Seamless R and C++ integration with Rcpp. Use R! Springer, New York

  • Inselberg A (2009) Parallel coordinates: visual multidimensional geometry and its applications. Springer, New York

  • Kane MJ, Emerson J, Weston S (2013) Scalable strategies for computing with massive data. J Stat Softw 55(14):1–19

  • Lichman M (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Luraschi J, Ushey K, Allaire J (2017) Sparklyr: R interface to Apache Spark. https://CRAN.R-project.org/package=sparklyr

  • Matloff N (2011) The art of R programming: a tour of statistical software design. No Starch Press series. No Starch Press, San Francisco

  • Matloff N (2015) Parallel computing for data science: with examples in R, C++ and CUDA. Chapman & Hall/CRC the R series. CRC Press, Boca Raton

  • Matloff N (2016) Software Alchemy: turning complex statistical computations into embarrassingly-parallel ones. J Stat Softw 71(4):1–15

  • Matloff N, Fitzgerald C, Davis R, Yancey R, Huang S (2017a) partools: tools for the ‘Parallel’ package. https://github.com/matloff/partools

  • Matloff N, Yang V, Nguyen H (2017b) cdparcoord: top frequency-based parallel coordinates. https://CRAN.R-project.org/package=cdparcoord

  • Murrell P (2011) R graphics, 2nd edn. Chapman & Hall/CRC the R series. Taylor & Francis, Boca Raton, FL

  • Nielsen F (2016) Introduction to HPC with MPI for data science. Undergraduate topics in computer science. Springer International Publishing, Cham

  • Plotly Technologies Inc (2015) Collaborative data science. https://plot.ly

  • R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/

  • Reinders J (2007) Intel threading building blocks: outfitting C++ for multi-core processor parallelism. O’Reilly series. O’Reilly Media, Sebastopol

  • Sarkar D (2008) Lattice: multivariate data visualization with R. Use R! Springer, New York

  • Unwin A, Theus M, Hofmann H (2007) Graphics of large datasets: visualizing a million. Statistics and computing. Springer, New York

  • Weston S (2017) foreach: provides foreach looping construct for R. https://CRAN.R-project.org/package=foreach

  • Wickham H (2016) ggplot2: elegant graphics for data analysis. Use R! Springer International Publishing, New York

  • Yang V, Nguyen H, Matloff N, Xie Y (2017) Top-frequency parallel coordinates plots. arXiv:1709.00665

  • Yu H (2014) [Rmpi] news. http://www.stats.uwo.ca/faculty/yu/Rmpi/

Author information

Correspondence to Norman Matloff.

Copyright information

© 2018 Springer International Publishing AG

About this entry

Cite this entry

Matloff, N., Fitzgerald, C., Yancey, R. (2018). The R Language: A Powerful Tool for Taming Big Data. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_294-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_294-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Living Reference Mathematics; Reference Module Computer Science and Engineering
