Introduction To Excel For Data Science
Introductory
Big Data
College of Engineering
Chapter -2-
Introduction to Excel for
Data Science
Dr Heba Ismail
Learning Objectives
Introduction to Spreadsheets
• What is a Spreadsheet?
• A spreadsheet is a tool for organizing, calculating, and analyzing information arranged in rows and columns.
Spreadsheet Applications
• Microsoft Excel
• The most common spreadsheet application by far
• Part of the paid Microsoft Office bundle
• Online version known as Excel Online
• Google Sheets
• Very common for basic usage of spreadsheets
• Free with a Google account
• Available online only
• Apple Numbers
Advantages of Spreadsheets
• Accurate calculations
• Automatic calculations
• Organize and access data
• Format, filter, and sort data
• Edit, undo, and error-check
• Analyze data
• Create charts, graphs, and reports
Personal Usage of Spreadsheets
Business Usage of Spreadsheets
• Financial forecasting
• Data entry and storage
• Statistical analysis
• Comparing large datasets
• Profit and loss accounting
• Modelling and planning
• Budgeting
• Charting
• Identifying trends
• Forensic auditing
• Flowcharts for business processes
• Payroll and tax reporting
• Invoicing
• Tracking business sales
• Scheduling
Data Analyst Usage of Spreadsheets
Excel Basic Terminology
• Cell: The basic building block of a worksheet
• Can store data such as numbers or text or the results of a formula
• Range: A collection of cells, such as C11:D13
• Workbook: An Excel file, which holds one or more worksheets
• Files that end with “.xlsx” are typically saved workbooks
• Worksheet: A single sheet within a workbook, where Excel stores your text, numbers, and formulas
• Each workbook contains at least one worksheet
• A worksheet can contain 1,048,576 rows and 16,384 columns, or about 17 billion cells
• Ribbon: Excel's graphical menu interface for the commands you can perform
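The worksheet limits quoted above can be checked with a little arithmetic. The sketch below uses Python rather than Excel; the helper name `column_letters` is hypothetical and only illustrates how column numbers map to Excel-style column letters.

```python
# Worksheet limits quoted in the terminology list above.
MAX_ROWS = 1_048_576   # 2**20
MAX_COLS = 16_384      # 2**14

def column_letters(index: int) -> str:
    """Convert a 1-based column number to Excel-style letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Columns behave like a base-26 number without a zero digit.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

total_cells = MAX_ROWS * MAX_COLS
print(f"{total_cells:,} cells")      # 17,179,869,184 (about 17 billion)
print(column_letters(3))             # C
print(column_letters(MAX_COLS))      # XFD, the last column
```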
Opening Excel
• New workbook
• Recent workbooks
• Other workbooks
A New Excel Sheet
• Tabs
• Ribbon
• Workbook
• Column C
• Row 5
• Cell C5
• Range C11:D13
• Worksheets
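To make the range notation concrete, here is a minimal Python sketch that lists the cells covered by a range such as C11:D13. The function name `expand_range` is hypothetical, and the sketch assumes single-letter columns (A to Z) only.

```python
import re

def expand_range(ref: str) -> list[str]:
    """List the cell addresses in a rectangular range such as 'C11:D13'.

    Simplification: only single-letter columns (A to Z) are handled.
    """
    match = re.fullmatch(r"([A-Z])(\d+):([A-Z])(\d+)", ref)
    if match is None:
        raise ValueError(f"unsupported range: {ref!r}")
    first_col, first_row, last_col, last_row = match.groups()
    cells = []
    # Walk the rectangle row by row, left to right.
    for row in range(int(first_row), int(last_row) + 1):
        for col in range(ord(first_col), ord(last_col) + 1):
            cells.append(f"{chr(col)}{row}")
    return cells

print(expand_range("C11:D13"))
# ['C11', 'D11', 'C12', 'D12', 'C13', 'D13']
```

The range C11:D13 therefore spans 2 columns and 3 rows, or 6 cells.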
Ribbon Menu
• Tabs: File, Home, Insert, Draw, Page Layout, Formulas, Data, Review, View, Help
Ribbon Menu: File
• Save workbook
• Print worksheet
• Open a new workbook
• Open an already saved workbook
• Access settings
Ribbon Menu: Home
• Clipboard (copy, cut, paste, …)
• Font (bold, italic, underline, …)
• Text alignment (merge or split cells, wrap, align, …)
• Cells (insert, delete, format)
• Editing (find, sort, fill, …)
Ribbon Menu: Insert
• Insert charts (bar, line, scatter plots, …)
• Insert symbols (mathematical symbols or other characters)
• Insert illustrations (squares, circles, lines, …)
• Insert text (text box, header, footer, …)
Ribbon Menu: Page Layout
• Themes
• Page setup (margins)
• Scaling (width and height)
• Sheet options (view or print headings and gridlines)
Ribbon Menu: Formulas
• Insert new function
• Pick a predefined function
• Define new functions
• Calculation options (automatic or manual)
Ribbon Menu: Data
• Get data from (files, tables, online, offline, …)
• Sort (A to Z or Z to A)
• Filter (show only the rows that match chosen criteria; other rows are hidden, not deleted)
• Remove duplicates, text to columns, forecasting
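For readers who think in code, the Data tab operations above (remove duplicates, sort, filter) can be sketched in Python. This is an illustration of the ideas only, not how Excel implements them, and the rows are hypothetical sample data.

```python
# Hypothetical rows: (last name, score). Not taken from the slides.
rows = [("Khan", 72), ("Ali", 55), ("Khan", 72), ("Omar", 91)]

# Remove Duplicates: keep the first occurrence of each row, preserving order.
unique_rows = list(dict.fromkeys(rows))

# Sort A to Z by last name (reverse=True would give Z to A).
sorted_rows = sorted(unique_rows, key=lambda row: row[0])

# Filter: keep only rows that match a condition; the source list is unchanged.
high_scores = [row for row in sorted_rows if row[1] >= 60]

print(unique_rows)   # [('Khan', 72), ('Ali', 55), ('Omar', 91)]
print(sorted_rows)   # [('Ali', 55), ('Khan', 72), ('Omar', 91)]
print(high_scores)   # [('Khan', 72), ('Omar', 91)]
```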
Ribbon Menu: Review
• Protect sheet or workbook
• Insert, delete, or show comments
• Translate, check spelling
• Share workbook
Ribbon Menu: View
• Zoom in or out
• Worksheet layout (work on multiple worksheets at once)
• Freeze some parts of the sheet
• Create macros
Ribbon Menu: Help
• Customer center
• Look up a feature
• Community (forum)
• Suggest a feature
Excel Shortcuts
Open/Save
• Save / Save As
• Saves the workbook to a chosen location on the computer
• By default, Save stores the file with the ".xlsx" extension
• Other file formats can be chosen instead, such as
• Tab-separated values (TSV)
• Comma-separated values (CSV)
• PDF
• Open
• Similarly, you can import data from CSV, TSV, and other files
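CSV and TSV files hold the same kind of table and differ only in the separator character. As a rough illustration outside Excel, the Python sketch below reads CSV text and writes it back out as TSV using the standard `csv` module; the sample content is hypothetical.

```python
import csv
import io

# Hypothetical CSV content standing in for a file exported from Excel.
csv_text = "Last,First,Score\nKhan,Sara,72\nAli,Omar,55\n"

# Read: csv.reader splits each line on commas (the default delimiter).
rows = list(csv.reader(io.StringIO(csv_text)))

# Write: the same rows, now separated by tabs instead of commas.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
tsv_text = buffer.getvalue()

print(rows[0])          # ['Last', 'First', 'Score']
print(repr(tsv_text))
```

With real files, `open("data.csv", newline="")` would take the place of `io.StringIO`; the file name here is only a placeholder.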
Class Activity 1 - Data
1. Open the following link (enter the link for your workbook here)
2. Pick a free row and do the following:
   1. Enter your last name in the first cell
   2. Enter your first name in the second cell
   3. Enter a number between 0 and 100 in the third cell
   4. Enter a "2" in the fourth cell
3. Select all the data
4. Copy the data using Ctrl + C
5. Open Excel on your computer and create a new workbook
6. Click on cell A1 and paste the data using Ctrl + V
7. Save the file as a TSV file
Excel Functions
Top Functions in Excel
• AVERAGE(number1, [number2], ...)
• Returns the average (arithmetic mean) of its arguments, which can be numbers or names, arrays, or references that contain numbers
• SUM(number1, [number2], ...)
• Adds all the numbers in a range of cells
• MAX(number1, [number2], ...)
• Returns the largest number in a range of cells
• MIN(number1, [number2], ...)
• Returns the smallest number in a range of cells
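The four functions above have direct counterparts in most programming languages. A minimal Python sketch, using a hypothetical list of values in place of a range of cells:

```python
# Hypothetical values standing in for a range of cells such as A1:A5.
values = [10, 20, 30, 40, 50]

average = sum(values) / len(values)   # like AVERAGE(A1:A5)
total = sum(values)                   # like SUM(A1:A5)
largest = max(values)                 # like MAX(A1:A5)
smallest = min(values)                # like MIN(A1:A5)

print(average, total, largest, smallest)   # 30.0 150 50 10
```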
Top Functions in Excel
• SUMIF(range, criteria, [sum_range])
• Adds the cells that meet a given condition or criteria
• COUNT(value1, [value2], ...)
• Counts the number of cells in a range that contain numbers
• COUNTA(value1, [value2], ...)
• Counts the number of cells in a range that are not empty
• COUNTBLANK(range)
• Counts the number of cells in a range that are empty
• COUNTIF(range, criteria)
• Counts the number of cells in a range that meet a given condition or criteria
• IF(logical_test, value_if_true, value_if_false)
• Checks whether a condition is met, and returns one value if TRUE and another value if FALSE
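The counting and conditional functions above can be mirrored in Python to show what each one counts. This is a sketch with hypothetical data, where `None` stands in for an empty cell; it is not a description of how Excel evaluates these functions internally.

```python
# Hypothetical column of cells; None stands in for an empty cell.
cells = [12, "n/a", None, 45, 7, None, 30]

numbers = [c for c in cells if isinstance(c, (int, float))]

count = len(numbers)                               # COUNT: cells holding numbers
counta = sum(1 for c in cells if c is not None)    # COUNTA: non-empty cells
countblank = sum(1 for c in cells if c is None)    # COUNTBLANK: empty cells
countif = sum(1 for n in numbers if n > 10)        # COUNTIF(range, ">10")
sumif = sum(n for n in numbers if n > 10)          # SUMIF(range, ">10")
label = "pass" if sumif >= 50 else "fail"          # IF(test, "pass", "fail")

print(count, counta, countblank, countif, sumif, label)
# 4 5 2 3 87 pass
```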
Top Functions in Excel
• CONCATENATE(T1, T2, ...)
• Concatenates a list or range of text strings
• T1: A string, or array of strings, such as a range of cells
• T2: Additional text items to be joined
• VLOOKUP(V1, V2, V3, V4)
• Finds things in a table or a range by row
• V1: what you want to look up
• V2: the range where you want to look for it (the lookup value must be in the first column of this range)
• V3: the column number in the range containing the value to return
• V4: whether to return an approximate match (1/TRUE) or an exact match (0/FALSE)
• Drop down list
• Video link
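To show the logic behind an exact-match VLOOKUP and a simple concatenation, here is a small Python sketch. The table contents and the helper name `vlookup_exact` are hypothetical; the sketch covers only the exact-match case (V4 = FALSE), not approximate matching.

```python
# Hypothetical lookup table: each row is (student ID, last name, score).
table = [
    (101, "Khan", 72),
    (102, "Ali", 55),
    (103, "Omar", 91),
]

def vlookup_exact(lookup_value, rows, col_index):
    """Mimic VLOOKUP(V1, V2, V3, FALSE): search the first column for an
    exact match and return the value in the 1-based column col_index."""
    for row in rows:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel itself would show #N/A here

full_label = "".join(["Khan", ", ", "Sara"])   # like CONCATENATE("Khan", ", ", "Sara")

print(full_label)                     # Khan, Sara
print(vlookup_exact(102, table, 3))   # 55
print(vlookup_exact(999, table, 2))   # None
```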
Class Activity 2 - Functions
• Let's now compute the sum of all the numbers in the row we previously created
• Open Excel and create a new workbook
• Now open the TSV file you previously saved
• Let's now try to recreate this table below
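The row sum in this activity can also be cross-checked outside Excel. The Python sketch below assumes the TSV file from Activity 1 contains a single row (last name, first name, a number between 0 and 100, and a 2); the row content shown is hypothetical.

```python
import csv
import io

# Hypothetical content of the TSV file saved in Activity 1.
tsv_text = "Khan\tSara\t72\t2\n"

# Split the first (and only) line on tabs.
row = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

def to_number(text):
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

# Text cells are skipped, as SUM does for text in a range of cells.
numbers = [n for n in map(to_number, row) if n is not None]
row_total = sum(numbers)

print(row)         # ['Khan', 'Sara', '72', '2']
print(row_total)   # 74.0
```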
Charting
Class Activity 3 - Charts
Pros and Cons of Data Analysis on Excel
• Pros
• You can clearly see the data in front of you
• Easy to determine the type and format of the data
• Quick and easy to apply functions and see results
• Great tool for datasets under 20K rows
• Cons
• Hard to reproduce steps that were previously performed on the data
• Too many functions, so it is hard to find the one you are looking for
• Slow and tricky for more than 20K rows of data (Excel could crash)
• Less flexibility for high level analysis and presentation
Conclusion
• There are several spreadsheet applications available in the marketplace; the most commonly used and fully featured spreadsheet application is Microsoft Excel
• Spreadsheets provide several advantages over manual calculation methods, and they help you keep data
organized and easily accessible
• As a Data Analyst, you can use spreadsheets as a tool for your data analysis tasks
• The ribbon provides access to all the features and tools required to view, enter, edit, manipulate, clean,
and analyze data in Excel
• There are several ways to navigate around a worksheet and workbook in Excel