R Programming Fundamentals: Deal with data using various modeling techniques
About this ebook
Study data analysis and visualization to successfully analyze data with R
Key Features
- Get to grips with data cleaning methods
- Explore statistical concepts and programming in R, including best practices
- Build a data science project with real-world examples
Book Description
R Programming Fundamentals, focused on R and the R ecosystem, introduces you to the tools for working with data. To start with, you'll learn how to set up R and RStudio, and then explore R packages, functions, data structures, control flow, and loops.
Once you have grasped the basics, you'll move on to studying data visualization and graphics. You'll learn how to build statistical and advanced plots using the powerful ggplot2 library. In addition to this, you'll discover data management concepts such as factoring, pivoting, aggregating, merging, and dealing with missing values.
By the end of this book, you'll have completed an entire data science project of your own for your portfolio or blog.
What you will learn
- Use basic programming concepts of R such as loading packages, arithmetic functions, data structures, and flow control
- Import data to R from various formats such as CSV, Excel, and SQL
- Clean data by handling missing values and standardizing fields
- Perform univariate and bivariate analysis using ggplot2
- Create statistical summary plots and advanced plots such as histograms, scatter plots, box plots, and interaction plots
- Apply data management techniques, such as factoring, pivoting, aggregating, merging, and dealing with missing values, on the example datasets
Who this book is for
R Programming Fundamentals is for you if you are an analyst who wants to grow in the field of data science and explore the latest tools.
R Programming Fundamentals - Kaelen Medeiros
R Programming Fundamentals
Deal with data using various modeling techniques
Kaelen Medeiros
BIRMINGHAM - MUMBAI
R Programming Fundamentals
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Acquisitions Editors: Aditya Date, Bridget Neale
Content Development Editor: Madhura Bal
Production Coordinator: Ratan Pote
First published: September 2018
Production reference: 1270918
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78961-299-8
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why Subscribe?
- Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
- Improve your learning with Skill Plans built especially for you
- Get a free eBook or video every month
- Mapt is fully searchable
- Copy and paste, print, and bookmark content
Packt.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the Author
Kaelen Medeiros is a content quality developer at DataCamp, where she works to improve course content and tracks quality metrics across the company. She also works as a data scientist/developer for HealthLabs, who develop automated methods for analyzing large amounts of medical data. She received her MS in biostatistics from Louisiana State University Health Sciences Center in 2016. Outside of work, she has one cat, listens to way too many podcasts, and enjoys running.
Packt is Searching for Authors Like You
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Table of Contents
Title Page
Copyright and Credits
R Programming Fundamentals
Packt Upsell
Why Subscribe?
Packt.com
Contributors
About the Author
Packt is Searching for Authors Like You
Preface
Who This Book is for
What This Book Covers
To Get the Most Out of This Book
Download the Example Code Files
Conventions Used
Get in Touch
Reviews
Introduction to R
Using R and RStudio, and Installing Useful Packages
Using R and RStudio
Executing Basic Functions in the R Console
Setting up a New Project
Installing Packages
Activity: Installing the Tidyverse Packages
Variable Types and Data Structures
Variable Types
Numeric and Integers
Character
Dates
Activity: Identifying Variable Classes and Types
Data Structures
Vectors
Lists
Matrices
Dataframes
Activity: Creating Vectors, Lists, Matrices, and Dataframes
Basic Flow Control
If/else
For loop
While loop
Activity: Building Basic Loops
Data Import and Export
Excel Spreadsheets
Activity: Exporting and Importing the mtcars Dataset
Getting Help with R
Package Documentation and Vignettes
Activity: Exploring the Introduction to dplyr Vignette
RStudio Community, Stack Overflow, and the Rest of the Web
Summary
Data Visualization and Graphics
Creating Base Plots
The plot() Function
Factor Variables
Model Objects
Plotting More Than One Plot at a Time
Creating and Plotting a Linear Model Object
Titles and Axis Labels
Changing the Color of Base Plots
Activity: Recreating Plots with Base Plot Methods
ggplot2
ggplot2 Basics
Histogram
Creating Histograms using ggplot2
Bar Chart
Creating a Bar Chart with ggplot2 using Two Different Methods
Scatterplot
Creating a Scatterplot of Two Continuous Variables
Boxplot
Creating Boxplots using ggplot2
Activity: Recreating Plots Using ggplot2
Digging into aes()
Bar Chart
Using Different Bar Chart Aesthetic Options
Facet Wrapping and Gridding
Utilizing Facet Wrapping and Gridding to Visualize Data Effectively
Boxplot + coord_flip()
Scatterplot
Utilizing Different Aesthetics for Scatterplots, Including Shapes, Colors, and Transparencies
Activity: Utilizing ggplot2 Aesthetics
Extending the Plots with Titles, Axis Labels, and Themes
Interactive Plots
Plotly
Shiny
Exploring Shiny and Plotly
Summary
Data Management
Factor Variables
Creating Factor Variables in a Dataset
Using ifelse() Statements
Using the recode() Function
Examining and Changing the Levels of Pre-existing Factor Variables
Creating an Ordered Factor Variable
Activity: Creating and Manipulating Factor Variables
Summarizing Data
Data Summarization Tables
Tables in R
Creating Different Tables Using the table() Function
Using dplyr Methods to Create Data Summary Tables
Activity: Creating Data Summarization Tables
Summarizing Data with the Apply Family
Using the apply() Function to Create Numeric Data Summaries
Activity: Implementing Data Summary
Splitting, Combining, Merging, and Joining Datasets
Splitting and Combining Data and Datasets
Splitting and Unsplitting Data with Base R and the dplyr Methods
Splitting Datasets into Lists and Then Back Again
Combining Data
Combining Data with rbind()
Combining Matrices of Objects into Dataframes
Splitting Strings
Using stringr Package to Manipulate a Vector of Names
Combining Strings Using Base R Methods
Activity: Demonstrating Splitting and Combining Data
Merging and Joining Data
Demonstrating Merges and Joins in R
Activity: Merging and Joining Data
Summary
Solutions
Chapter 1: Introduction to R
Activity: Installing the Tidyverse Packages
Activity: Identifying Variable Classes and Types
Activity: Creating Vectors, Lists, Matrices, and Dataframes
Activity: Building Basic Loops
Activity: Exporting and Importing the mtcars Dataset
Activity: Exploring the Introduction to dplyr Vignette
Chapter 2: Data Visualization and Graphics
Activity: Recreating Plots with Base Plot Methods
Activity: Recreating Plots Using ggplot2
Activity: Utilizing ggplot2 Aesthetics
Chapter 3: Data Management
Activity: Creating and Manipulating Factor Variables
Activity: Creating Data Summarization Tables
Activity: Implementing Data Summary
Activity: Demonstrating Splitting and Combining Data
Activity: Merging and Joining Data
Other Books You May Enjoy
Leave a Review - Let Other Readers Know What You Think
Preface
Demand for data scientists is growing exponentially and demand in the US is expected to increase by 28 percent by the year 2020, with this trend reflected across the world. R is a tool often used by data scientists to clean, examine, analyze, and report on data. It is a great starting point for those familiar with analysis in Excel or MS SQL and is an excellent place to begin to learn programming fundamentals.
This book begins with setting up R and RStudio on your machine and progresses from there, demonstrating how to import datasets, clean them, and explore their contents. It balances theory and exercises, and contains multiple open-ended activities that use real-life business scenarios for you to practice and apply your newly acquired skills in a highly relevant context. We have included over 50 practical activities and exercises across 11 topics, along with a mini project that will allow you to begin your data science project portfolio. With this book, we have created a definitive guide to beginning data science in R.
Who This Book is for
This book is for analysts who are looking to grow their data science skills beyond the tools they have used before, such as MS Excel and other statistical tools.
What This Book Covers
Chapter 1, Introduction to R, covers the installation of R, RStudio, and useful packages, and discusses variable types and data structures. The chapter then introduces the different kinds of loops that can be used in R, explains how to import and export data, and describes how to get help with R programming.
Chapter 2, Data Visualization and Graphics, covers the basic plots built into R and how to create them, and then introduces ggplot2, a popular graphics package in R. Finally, the chapter briefly discusses two tools, Shiny and Plotly, that can be used to design interactive plots.
Chapter 3, Data Management, discusses how to create and manipulate factor variables, examine data using tables, use the apply family of functions to generate summaries, and split, combine, merge, or join datasets in R.
The Appendix contains the solutions to all the activities within the chapters.
To Get the Most Out of This Book
You will need a computer system with at least an i3 processor, 2 GB RAM, 10 GB of storage space, and an internet connection. You will also need the following software:
- Operating System: Windows 8 64-bit
- R and RStudio
- Browsers (Google Chrome and Mozilla Firefox - latest versions)
Download the Example Code Files
You can download the