Data Science - UNIT-1- Notes
Workbook: It's the entire Excel file; a workbook is simply another word for your Excel file. By default, a new workbook is named Book1, and Excel automatically creates a blank workbook when you open it. A worksheet is a collection of cells where you keep and manipulate the data, and every workbook contains at least one worksheet. When you open Excel, it automatically selects Sheet1 for you. The name of the worksheet appears on its sheet tab at the bottom of the document window.
Worksheet: A workbook can have multiple worksheets, and each sheet can be renamed. In Excel 2010, three sheets named Sheet1, Sheet2 and Sheet3 are present by default, but in Excel 2016 only one sheet, named Sheet1, is present. We can create as many sheets as we want. To give a worksheet a more specific name, execute the following steps.
1. Right-click the sheet tab of Sheet1.
2. Choose Rename.
3. Type the new name and press Enter.
Cell name: A cell is named by its column letter followed by its row number, so the cell in column A, row 1 is A1. A cell can also be given a custom name. Each cell may contain one of the following data types:
1. Labels (text with no numerical value)
2. Number data (constant values)
3. Formulas (equations used to calculate a value)
Formula Bar: Here we build our own formulas, always beginning with an equal sign (=). We can also use the predefined functions built into Excel; clicking the fx button next to the formula bar lists the available functions.
Ribbon: It contains Excel's commands, grouped into tabs. These commands can be used directly by clicking them or through the appropriate keyboard shortcuts.
Creating Excel tables:
The data in the worksheet looks like it is already in a table simply because it's organized in rows and columns.
However, the data in a tabular format is not a true "table" unless you've specifically made it such. Excel table is a
special object that works as a whole and allows you to manage the table's contents independently from the rest of the
worksheet data. The most obvious difference is that the table is styled. However, an Excel table is far more than a
range of formatted data with headings. There are many powerful features inside:
Excel tables are dynamic by nature, meaning they expand and contract automatically as you add or remove
rows and columns.
Integrated sort and filter options; visual filtering with slicers.
Easy formatting with inbuilt table styles.
Column headings remain visible while scrolling.
Quick totals allow you to sum and count data as well as find average, min or max value in a click.
Calculated columns allow you to compute an entire column by entering a formula in one cell.
Easy-to-read formulas due to a special syntax that uses table and column names rather than cell references.
Dynamic charts adjust automatically as you add or remove data in a table.
How to create a table in Excel
With the source data organized in rows and columns, carry out the steps below to convert a range of cells into a table:
1. Select any cell within your data set.
2. On the Insert tab, in the Tables group, click the Table button or press the Ctrl + T shortcut. Alternatively, on the Home tab, in the Styles group, click Format as Table and select one of the predefined table styles.
3. The Create Table dialog box appears with all the data selected for you automatically; you can adjust the
range if needed. If you want the first row of data to become the table headers, make sure the My table has
headers box is selected.
4. Click OK.
As a result, Excel converts your range of data into a true table with the default style:
Apart from changing table styles, the Design tab lets you turn the following table elements on or off:
Header row - displays column headers that remain visible when you scroll the table data.
Total row - adds the totals row at the end of the table with a number of predefined functions to choose from.
Banded rows and banded columns - display alternate row or column shading, respectively.
First column and last column - display special formatting for the first and last column of the table.
Filter button - shows or hides filter arrows in the header row.
The screenshot below shows the default Table Style Options:
When you type data in a row or column adjacent to the table, the table expands automatically to accommodate the new values. To undo the table expansion, click the Undo button on the Quick Access Toolbar, or press Ctrl+Z as you usually do to revert the latest changes.
5. Quick totals (total row)
To quickly total the data in your table, display the totals row at the end of the table, and then select the required
function from the drop-down list. To add a total row to your table, right click any cell within the table, point
to Table, and click Totals Row. Or, go to the Design tab > Table Style Options group, and select the Total Row box:
Either way, the total row appears at the end of your table. You choose the desired function for each total row cell,
and a corresponding formula is entered in the cell automatically:
Total Row tips: Excel table functions are not limited to the functions in the drop-down list. You can enter any function you want in any total row cell by clicking More Functions in the drop-down list or by entering a formula directly in the cell. The total row inserts the SUBTOTAL function, which calculates values only in visible cells and leaves out hidden (filtered-out) cells. If you want to total data in both visible and hidden rows, enter a corresponding formula such as SUM, COUNT or AVERAGE manually.
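The difference between the total row and a manually entered SUM can be sketched in Python. This is a minimal model, not Excel's implementation: each row is a hypothetical (value, visible) pair, and a filter hides a row by making it invisible. The total row's Sum option uses SUBTOTAL with function number 109, which skips hidden rows, while SUM adds everything.

```python
def subtotal_sum(rows):
    """Model of the total row's SUBTOTAL(109, ...): add only the visible rows."""
    return sum(value for value, visible in rows if visible)


def plain_sum(rows):
    """Model of SUM: add every row, whether it is visible or hidden."""
    return sum(value for value, _visible in rows)


# Hypothetical Sales column after a filter has hidden two rows.
sales = [(120, True), (80, False), (200, True), (50, False)]

print(subtotal_sum(sales))  # 320
print(plain_sum(sales))     # 450
```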
6. Calculated columns
Another great benefit of an Excel table is that it lets you calculate the entire column by entering a formula in a single
cell. For example, to create a calculated column in our sample table, enter an Average formula in cell E2:
As soon as you press Enter, the formula is immediately copied to the other cells in the column and properly adjusted for each row in the table:
If a calculated column is not created in your table, make sure the Fill formulas in tables to create calculated columns option is turned on in your Excel. To check this, click File > Options, select Proofing in the left pane, click the AutoCorrect Options button, and switch to the AutoFormat As You Type tab.
Entering a formula in a cell that already contains data does not create a calculated column. In this case,
the AutoCorrect Options button appears (like in the screenshot below) and lets you overwrite the data in the entire
column so that a calculated column is created.
You can quickly undo a calculated column by clicking the Undo Calculated Column in AutoCorrect Options, or
clicking the Undo button on the Quick Access toolbar.
7. Easy-to-understand table formulas (structured references)
An indisputable advantage of tables is the ability to create dynamic and easy-to-read formulas with structured
references, which use table and column names instead of regular cell addresses.
For example, this formula finds the average of the values in columns Jan through Mar in the current row of Sales_table (the @ symbol means "this row"):
=AVERAGE(Sales_table[@[Jan]:[Mar]])
The beauty of structured references is that, firstly, they are created automatically by Excel without you having to learn their special syntax, and secondly, they adjust automatically when data is added to or removed from a table, so you don't have to worry about updating the references manually.
You can select cells and ranges in a table with the mouse like you normally do. You can also select table rows and
columns in a click.
9. Dynamic charts
When you create a chart based on a table, the chart updates automatically as you edit the table data. Once a new row
or column is added to the table, the graph dynamically expands to take the new data in. When you delete some data
in the table, Excel removes it from the chart straight away. Automatic adjustment of a chart source range is an
extremely useful feature when working with data sets that frequently expand or contract.
If you want to print just the table and leave out other stuff on the worksheet, select any cell within your table and
press Ctrl+P or click File > Print. The Print Selected Table option will get selected automatically without you
having to adjust any print settings:
Understand how to Add, Subtract, Multiply, Divide in Excel
Types of operators
Excel provides four different types of calculation operators: arithmetic, comparison, text concatenation, and
reference.
When you enter a formula, Excel expects specific types of values for each operator. If you enter a different kind of
value than is expected, Excel may convert the value.
1. In arguments of the logical functions, you can use cell references, numeric and text values, Boolean values,
comparison operators, and other Excel functions. However, all arguments must evaluate to the Boolean
values of TRUE or FALSE, or references or arrays containing logical values.
2. If an argument of a logical function contains any empty cells, such values are ignored. If all of the
arguments are empty cells, the formula returns the #VALUE! error.
3. If an argument of a logical function contains numbers, then zero evaluates to FALSE, and all other numbers
including negative numbers evaluate to TRUE. For example, if cells A1:A5 contain numbers, the formula
=AND(A1:A5) will return TRUE if none of the cells contains 0, FALSE otherwise.
4. A logical function returns the #VALUE! error if none of the arguments evaluate to logical values.
5. A logical function returns the #NAME? error if you've misspelled the function's name or attempted to use the
function in an earlier Excel version that does not support it. For example, the XOR function is available only in
Excel 2013 and later versions.
6. In Excel 2016, 2013, 2010 and 2007, you can include up to 255 arguments in a logical function, provided
that the total length of the formula does not exceed 8,192 characters. In Excel 2003 and lower, you can
supply up to 30 arguments and the total length of your formula shall not exceed 1,024 characters.
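Rules 2 to 4 can be summarised in a small Python model of AND and OR applied to a range of cells. It is only a sketch under simplifying assumptions: None stands for an empty cell, text found in a referenced range is skipped, and the #VALUE! error is represented by a raised ValueError.

```python
def _logical_values(cells):
    """Turn cell values into booleans following rules 2-4 above."""
    values = []
    for cell in cells:
        if cell is None:                      # empty cell: ignored (rule 2)
            continue
        if isinstance(cell, bool):            # TRUE/FALSE are used as-is
            values.append(cell)
        elif isinstance(cell, (int, float)):  # 0 -> FALSE, any other number -> TRUE (rule 3)
            values.append(cell != 0)
        # text in a referenced range is skipped in this simplified model
    if not values:                            # nothing logical left to test (rules 2 and 4)
        raise ValueError("#VALUE!")
    return values


def excel_and(*cells):
    return all(_logical_values(cells))


def excel_or(*cells):
    return any(_logical_values(cells))


# =AND(A1:A5) with A1:A5 holding 3, -1, 7, (empty), 2
print(excel_and(3, -1, 7, None, 2))  # True: no zero in the range
print(excel_and(3, 0, 7))            # False: 0 counts as FALSE
print(excel_or(0, None, 0.5))        # True: 0.5 counts as TRUE
```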
The AND function is the most popular member of the logic functions family. It comes in handy when you have to
test several conditions and make sure that all of them are met. Technically, the AND function tests the conditions
you specify and returns TRUE if all of the conditions evaluate to TRUE, FALSE otherwise. The syntax for the Excel
AND function is as follows:
AND(logical1, [logical2], …)
Where logical is the condition you want to test that can evaluate to either TRUE or FALSE. The first condition
(logical1) is required, subsequent conditions are optional. Now let's look at some formula examples that
demonstrate how to use the AND function in Excel formulas.
=AND(A2="Bananas", B2>C2) - Returns TRUE if A2 contains "Bananas" and B2 is greater than C2, FALSE otherwise.
=AND(B2>20, B2=C2) - Returns TRUE if B2 is greater than 20 and B2 is equal to C2, FALSE otherwise.
=AND(A2="Bananas", B2>=30, B2>C2) - Returns TRUE if A2 contains "Bananas", B2 is greater than or equal to 30 and B2 is greater than C2, FALSE otherwise.
By itself, the Excel AND function is not very exciting and has narrow usefulness. But in combination with other
Excel functions, AND can significantly extend the capabilities of worksheets. One of the most common uses of the
Excel AND function is found in the logical_test argument of the IF function to test several conditions instead of just
one. For example, you can nest any of the AND functions above inside the IF function and get a result similar to
this: =IF(AND(A2="Bananas", B2>C2), "Good", "Bad")
An Excel formula for the BETWEEN condition: If you need to create a between formula in Excel that picks all
values between the given two values, a common approach is to use the IF function with AND in the logical test. For
example, you have 3 values in columns A, B and C and you want to know if a value in column A falls between B
and C values. To make such a formula, all it takes is the IF function with nested AND and a couple of comparison
operators: Formula to check if X is between Y and Z, inclusive:
=IF(AND(A2>=B2,A2<=C2),"Yes", "No")
As demonstrated in the screenshot above, the formula works perfectly for all data types - numbers, dates and text
values. When comparing text values, the formula checks them character by character in alphabetical order. For
example, it states that Apples is not between Apricot and Bananas because the second "p" in Apples comes before
"r" in Apricot. Please see Using Excel comparison operators with text values for more details. As you see, the IF
/AND formula is simple, fast and almost universal. We say "almost" because it does not cover one scenario. The
above formula implies that a value in column B is smaller than in column C, i.e. column B always contains the
lower bound value and C - the upper bound value. This is the reason why the formula returns "No" for row 6, where
A6 has 12, B6 - 15 and C6 - 3 as well as for row 8 where A8 is 24-Nov, B8 is 26-Dec and C8 is 21-Oct. But what if
you want the between formula to work correctly regardless of where the lower-bound and upper-bound values reside?
In this case, use the Excel MEDIAN function that returns the median of the given numbers (i.e. the number in the
middle of a set of numbers). So, if you replace AND in the logical test of the IF function with MEDIAN, the formula
will go like:
=IF(A2=MEDIAN(A2:C2),"Yes","No")
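The two approaches can be compared with a short Python sketch for numeric values, using row 6 from the example above (12, 15 and 3). Here statistics.median plays the role of Excel's MEDIAN. Note that Excel's MEDIAN works with numbers (and dates, which are stored as numbers) and ignores text, so the MEDIAN version does not replace the AND version for text comparisons.

```python
from statistics import median


def between_and(x, low, high):
    """=IF(AND(A2>=B2, A2<=C2), "Yes", "No"): assumes B holds the lower bound."""
    return "Yes" if low <= x <= high else "No"


def between_median(x, bound1, bound2):
    """=IF(A2=MEDIAN(A2:C2), "Yes", "No"): the order of the bounds does not matter."""
    return "Yes" if x == median([x, bound1, bound2]) else "No"


print(between_and(12, 15, 3))     # No  (bounds are reversed)
print(between_median(12, 15, 3))  # Yes (12 is the middle value)
```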
As well as AND, the Excel OR function is a basic logical function that is used to compare two values or statements.
The difference is that the OR function returns TRUE if at least one of the arguments evaluates to TRUE, and returns
FALSE if all arguments are FALSE. The OR function is available in all versions of Excel from 2000 to 2016. The syntax
of the Excel OR function is very similar to AND:
OR(logical1, [logical2], …)
Where logical is something you want to test that can be either TRUE or FALSE. The first logical is required,
additional conditions (up to 255 in modern Excel versions) are optional.
=OR(A2="Bananas", A2="Oranges") - Returns TRUE if A2 contains "Bananas" or "Oranges", FALSE otherwise.
=OR(B2>=40, C2>=20) - Returns TRUE if B2 is greater than or equal to 40 or C2 is greater than or equal to 20, FALSE otherwise.
=OR(B2="", C2="") - Returns TRUE if either B2 or C2 is blank or both, FALSE otherwise.
As with the Excel AND function, OR is widely used to expand the usefulness of other Excel functions that perform logical tests, e.g. the IF function. For example:
=IF(OR(B2>30, C2>20), "Good", "Bad")
The formula returns "Good" if the number in cell B2 is greater than 30 or the number in C2 is greater than 20, "Bad" otherwise.
We can use both functions, AND & OR, in a single formula if the business logic requires it. There are countless variations of such formulas, but they boil down to the following basic patterns:
=AND(OR(Cond1, Cond2), Cond3)
=AND(OR(Cond1, Cond2), OR(Cond3, Cond4))
=OR(AND(Cond1, Cond2), Cond3)
=OR(AND(Cond1, Cond2), AND(Cond3, Cond4))
For example, if you wanted to know which consignments of bananas and oranges are sold out, i.e. the "In stock" number (column B) is equal to the "Sold" number (column C), the following OR/AND formula could quickly show this to you:
=OR(AND(A2="Bananas", B2=C2), AND(A2="Oranges", B2=C2))
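The sold-out check can be sketched in Python. The formula it mirrors is an assumption about the one the example describes, namely =OR(AND(A2="Bananas", B2=C2), AND(A2="Oranges", B2=C2)), with the item in column A, "In stock" in B and "Sold" in C. Excel compares text without regard to letter case, so the sketch uses casefold(). The consignment data is hypothetical.

```python
def is_sold_out(item, in_stock, sold):
    """Mirrors =OR(AND(A2="Bananas", B2=C2), AND(A2="Oranges", B2=C2))."""
    name = item.casefold()  # Excel's "=" ignores letter case for text
    return (name == "bananas" and in_stock == sold) or (
        name == "oranges" and in_stock == sold
    )


# Hypothetical consignments: (item, in stock, sold)
consignments = [("Bananas", 40, 40), ("oranges", 25, 20), ("Apples", 10, 10)]
for item, in_stock, sold in consignments:
    print(item, is_sold_out(item, in_stock, sold))
# prints: Bananas True, then oranges False, then Apples False
```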
1. Generate the factorials of the numbers 1 to 10, once using a function and once using arithmetic operations.
2. Find the larger of two given numbers using logical functions.
3. Find the largest of three given numbers using logical functions.
4. Generate the first 25 numbers of the Fibonacci series using arithmetic operations.
5. Given the student’s marks, write a formula that displays the student’s grade. If the student’s percentage is
a. <40 Display “F”
b. <60 Display “C”
c. <75 Display “B”
d. <90 Display “A”
e. <=100 Display “O”
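The logic behind exercises 1, 4 and 5 can be checked with a Python sketch before building the worksheet. It assumes the Fibonacci series starts with 0 and 1, and it mirrors two Excel approaches: FACT(n) (here math.factorial) versus a running product in which each cell multiplies the cell above by n, and a nested IF such as =IF(D2<40,"F",IF(D2<60,"C",IF(D2<75,"B",IF(D2<90,"A","O")))), where D2 is a hypothetical cell holding the percentage. Unlike that formula, the sketch also rejects percentages above 100.

```python
from math import factorial


def factorials(n_max=10):
    """Running product, like B1 = 1 and then B2 = B1*A2, B3 = B2*A3, ... in a worksheet."""
    results, running = [], 1
    for n in range(1, n_max + 1):
        running *= n
        results.append(running)
    return results


def fibonacci(count=25):
    """First two cells hold 0 and 1; each later cell adds the two cells above it."""
    series = [0, 1]
    while len(series) < count:
        series.append(series[-1] + series[-2])
    return series[:count]


def grade(percentage):
    """Same order of tests as the nested IF: the first condition that holds wins."""
    if percentage < 40:
        return "F"
    if percentage < 60:
        return "C"
    if percentage < 75:
        return "B"
    if percentage < 90:
        return "A"
    if percentage <= 100:
        return "O"
    raise ValueError("percentage must not exceed 100")


# The arithmetic approach agrees with FACT (math.factorial) for 1..10.
print(factorials() == [factorial(n) for n in range(1, 11)])  # True
print(fibonacci()[-1])                                        # 46368
print([grade(p) for p in (35, 59, 74, 89, 100)])              # ['F', 'C', 'B', 'A', 'O']
```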
There are 400+ functions in Excel, and the number grows from version to version. Of course, it's next to impossible to memorize all of them, and you actually don't need to. The Function Wizard will help you find the function best suited for a particular task, while Excel's Formula IntelliSense will prompt the function's syntax and arguments as soon as you type the function's name preceded by an equal sign in a cell:
The basic functions we must know
What follows below is a list of simple helpful functions that are a necessary skill for everyone who wishes to turn
from an Excel novice to an Excel professional.
SUM
The first Excel function you should be familiar with is the one that performs the basic arithmetic operation
of addition:
SUM(number1, [number2], …)
In the syntax of all Excel functions, an argument enclosed in [square brackets] is optional, other arguments are
required. Meaning, your Sum formula should include at least 1 number, reference to a cell or a range of cells. For
example:
=SUM(B2:B6) - adds up values in cells B2 through B6.
=SUM(B2, B6) - adds up values in cells B2 and B6.
If necessary, you can perform other calculations within a single formula, for example, add up values in cells B2
through B6, and then divide the sum by 5:
=SUM(B2:B6)/5
To sum with conditions, use the SUMIF function: in the 1st argument, you enter the range of cells to be tested
against the criteria (A2:A6), in the 2nd argument - the criteria itself (D2), and in the last argument - the cells to sum
(B2:B6):
=SUMIF(A2:A6, D2, B2:B6)
In your Excel worksheets, the formulas may look something similar to this:
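How SUM and SUMIF treat a range can be modelled in Python. This is a simplified sketch: None stands for an empty cell, only exact-match criteria such as the value in D2 are handled (Excel also accepts criteria like ">5" and wildcards), and the column data is hypothetical.

```python
def is_number(cell):
    # TRUE/FALSE are bool (a subclass of int in Python) and are not numbers here.
    return isinstance(cell, (int, float)) and not isinstance(cell, bool)


def excel_sum(cells):
    """SUM over a range: text, logical values and empty cells are skipped."""
    return sum(cell for cell in cells if is_number(cell))


def excel_sumif(criteria_range, criterion, sum_range):
    """=SUMIF(A2:A6, D2, B2:B6) with an exact-match criterion."""
    def matches(cell):
        if isinstance(cell, str) and isinstance(criterion, str):
            return cell.casefold() == criterion.casefold()  # text match ignores case
        return cell == criterion

    return excel_sum(v for k, v in zip(criteria_range, sum_range) if matches(k))


items = ["Apples", "Bananas", "apples", "Oranges", "Apples"]  # A2:A6
amounts = [10, 20, 30, None, 5]                                # B2:B6 (B5 is empty)

print(excel_sum(amounts))                     # 65
print(excel_sum(amounts) / 5)                 # 13.0, i.e. =SUM(B2:B6)/5
print(excel_sumif(items, "Apples", amounts))  # 45
```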
AVERAGE
The Excel AVERAGE function does exactly what its name suggests, i.e. finds an average, or arithmetic mean, of
numbers. Its syntax is similar to SUM's:
AVERAGE(number1, [number2], …)
Having a closer look at the formula from the previous section (=SUM(B2:B6)/5), what does it actually do? Sums
values in cells B2 through B6, and then divides the result by 5. And what do you call adding up a group of numbers
and then dividing the sum by the count of those numbers? Yep, an average! The Excel AVERAGE function
performs these calculations behind the scenes. So, instead of dividing sum by count, you can simply put this formula
in a cell:
=AVERAGE(B2:B6)
To average cells based on a condition, use the following AVERAGEIF formula, where A2:A6 is the criteria range, D3
is the criterion, and B2:B6 are the cells to average:
=AVERAGEIF(A2:A6, D3, B2:B6)
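A Python sketch makes one practical difference visible: =SUM(B2:B6)/5 always divides by 5, while AVERAGE divides by the number of cells that actually contain numbers, because empty cells and text are skipped. As in a plain model of SUM, None stands for an empty cell, the data is hypothetical, and only exact-match criteria are modelled for AVERAGEIF.

```python
def numbers_only(cells):
    return [c for c in cells if isinstance(c, (int, float)) and not isinstance(c, bool)]


def excel_average(cells):
    """AVERAGE: sum of the numeric cells divided by how many there are."""
    nums = numbers_only(cells)
    if not nums:
        raise ZeroDivisionError("#DIV/0!")  # Excel shows #DIV/0! when nothing is numeric
    return sum(nums) / len(nums)


def excel_averageif(criteria_range, criterion, average_range):
    """=AVERAGEIF(A2:A6, D3, B2:B6) with an exact, case-insensitive text match."""
    picked = [
        v for k, v in zip(criteria_range, average_range)
        if str(k).casefold() == str(criterion).casefold()
    ]
    return excel_average(picked)


values = [10, 20, None, 30, 40]  # B2:B6 with B4 empty
items = ["Apples", "Bananas", "Apples", "Oranges", "apples"]

print(sum(numbers_only(values)) / 5)             # 20.0  (=SUM(B2:B6)/5)
print(excel_average(values))                     # 25.0  (=AVERAGE(B2:B6))
print(excel_averageif(items, "Apples", values))  # 25.0
```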
MAX & MIN
The MAX and MIN formulas in Excel get the largest and smallest value in a set of numbers, respectively. For our
sample data set, the formulas will be as simple as:
=MAX(B2:B6)
=MIN(B2:B6)
COUNT & COUNTA
If you are curious to know how many cells in a given range contain numeric values (numbers or dates), don't waste
your time counting them by hand. The Excel COUNT function will bring you the count in a heartbeat:
COUNT(value1, [value2], …)
While the COUNT function deals only with those cells that contain numbers, the COUNTA function counts all cells
that are not blank, whether they contain numbers, dates, times, text, logical values of TRUE and FALSE, errors or
empty text strings (""):
COUNTA(value1, [value2], …)
For example, to find out how many cells in column B contain numbers, use this formula:
=COUNT(B:B)
To count all non-empty cells in column B, go with this one:
=COUNTA(B:B)
In both formulas, you use the so-called "whole column reference" (B:B) that refers to all the cells within column B.
The following screenshot shows the difference: while COUNT processes only numbers, COUNTA outputs the total
number of non-blank cells in column B, including the text value in the column header.
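The difference between COUNT and COUNTA can be reproduced with a small Python model. Assumptions of the sketch: None is a truly empty cell, "" is an empty text string returned by a formula, a datetime.date stands for a date (Excel stores dates as numbers), and the column contents are hypothetical.

```python
from datetime import date


def is_number(cell):
    """Numbers and dates count; TRUE/FALSE in a referenced range do not."""
    return isinstance(cell, (int, float, date)) and not isinstance(cell, bool)


def excel_count(cells):
    return sum(1 for cell in cells if is_number(cell))


def excel_counta(cells):
    """Everything except a truly empty cell, including "" from a formula."""
    return sum(1 for cell in cells if cell is not None)


# Hypothetical column B: header, number, empty cell, date, text, logical, empty string
column_b = ["Amount", 10, None, date(2024, 1, 5), "n/a", True, ""]

print(excel_count(column_b))   # 2
print(excel_counta(column_b))  # 6
```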
IF
Judging by the number of IF-related comments on our blog, it's the most popular function in Excel. In simple terms,
you use an IF formula to ask Excel to test a certain condition and return one value or perform one calculation if the
condition is met, and another value or calculation if the condition is not met:
IF(logical_test, [value_if_true], [value_if_false])
For example, the following IF statement checks whether the order is completed (i.e. there is a value in column C) or not. To test if a cell is not blank, you use the "not equal to" operator (<>) in combination with an empty string (""). As a result, if cell C2 is not empty, the formula returns "Yes", otherwise "No":
=IF(C2<>"", "Yes", "No")
TRIM
If your obviously correct Excel formulas return just a bunch of errors, one of the first things to check is extra spaces
in the referenced cells (You may be surprised to know how many leading, trailing and in-between spaces lurk
unnoticed in your sheets just until something goes wrong!). There are several ways to remove unwanted spaces in
Excel, with the TRIM function being the easiest one:
TRIM(text)
For example, to trim extra spaces in column A, enter the following formula in cell B1 (a helper column; entering it in A1 itself would create a circular reference), and then copy it down the column:
=TRIM(A1)
It removes all leading and trailing spaces and reduces every run of spaces between words to a single space character:
LEN
Whenever you want to know the number of characters in a certain cell, LEN is the function to use:
LEN(text)
Wish to find out how many characters are in cell A2? Just type the below formula into another cell:
=LEN(A2)
Please keep in mind that the Excel LEN function counts absolutely all characters including spaces:
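TRIM and LEN are easy to model in Python, which also shows how trimming changes the character count. This is a sketch, not Excel's code: like Excel's TRIM, it removes only the ordinary space character (code 32), so non-breaking spaces (code 160) copied from web pages survive it.

```python
def excel_trim(text):
    """TRIM: remove leading and trailing spaces and collapse inner runs to one space."""
    # Splitting on " " yields empty strings for repeated spaces; dropping them
    # and re-joining with single spaces gives the trimmed result.
    return " ".join(part for part in text.split(" ") if part)


def excel_len(text):
    """LEN: count every character, spaces included."""
    return len(text)


raw = "  New   York  "
print(excel_len(raw))              # 14
print(excel_trim(raw))             # New York
print(excel_len(excel_trim(raw)))  # 8
```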
AND & OR
These are the two most popular logical functions to check multiple criteria. The difference is how they do this:
AND returns TRUE if all conditions are met, FALSE otherwise.
OR returns TRUE if any condition is met, FALSE otherwise.
While rarely used on their own, these functions come in very handy as part of bigger formulas. For example, to
check the test results in columns B and C and return "Pass" if both are greater than 60, "Fail" otherwise, use the
following IF formula with an embedded AND statement:
=IF(AND(B2>60, C2>60), "Pass", "Fail")
If it's sufficient to have just one test score greater than 60 (either test 1 or test 2), embed the OR statement:
=IF(OR(B2>60, C2>60), "Pass", "Fail")
CONCATENATE
In case you want to take values from two or more cells and combine them into one cell, use the concatenate operator
(&) or the CONCATENATE function:
CONCATENATE(text1, [text2], …)
For example, to combine the values from cells A2 and B2, just enter the following formula in a different cell:
=CONCATENATE(A2, B2)
To separate the combined values with a space, type the space character (" ") in the arguments list:
=CONCATENATE(A2, " ", B2)
TODAY & NOW
To see the current date and time whenever you open your worksheet, without having to update it manually every day, use either:
=TODAY() to insert today's date in a cell.
=NOW() to insert the current date and time in a cell.
The beauty of these functions is that they don't require any arguments at all, you type the formulas exactly as written
above.
Excel Data Validation
Excel Data Validation is a feature that restricts (validates) user input to a worksheet. Technically, you create a
validation rule that controls what kind of data can be entered into a certain cell. Here are just a few examples of what
Excel's data validation can do:
Allow only numeric or text values in a cell.
Allow only numbers within a specified range.
Allow data entries of a specific length.
Restrict dates and times outside a given time frame.
Restrict entries to a selection from a drop-down list.
Validate an entry based on another cell.
Show an input message when the user selects a cell.
Show a warning message when incorrect data has been entered.
Find incorrect entries in validated cells.
For instance, you can set up a rule that limits data entry to 4-digit numbers between 1000 and 9999. If the user types
something different, Excel will show an error alert explaining what they have done wrong:
1. Open the Data Validation dialog box
Select one or more cells to validate, go to the Data tab > Data Tools group, and click the Data Validation button.
You can also open the Data Validation dialog box by pressing Alt > D > L, with each key pressed separately.
2. Create an Excel validation rule
On the Settings tab, define the validation criteria according to your needs. In the criteria, you can supply any of the
following:
Values - type numbers in the criteria boxes like shown in the screenshot below.
Cell references - make a rule based on a value or formula in another cell.
Formulas - allow you to express more complex conditions.
As an example, let's make a rule that restricts users to entering a whole number between 1000 and 9999:
With the validation rule configured, either click OK to close the Data Validation window or switch to another tab to
add an input message and/or an error alert.
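The rule configured above (a whole number between 1000 and 9999) can be expressed as a Python check. This only models the logic Excel applies when the user confirms an entry; the function name and the choice to reject text and TRUE/FALSE outright are assumptions of the sketch. Excel's Between criterion includes both limits.

```python
def is_valid_whole_number(value, minimum=1000, maximum=9999):
    """Whole number between minimum and maximum, both limits included."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False  # text and logical values fail a Whole number rule
    if isinstance(value, float) and not value.is_integer():
        return False  # 1234.5 is not a whole number
    return minimum <= value <= maximum


for entry in (1000, 9999, 1234.0, 999, 10000, 1234.5, "abcd"):
    print(entry, is_valid_whole_number(entry))
```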
If you want to display a message that explains to the user what data is allowed in a given cell, open the Input
Message tab and do the following:
Make sure the Show input message when cell is selected box is checked.
Enter the title and text of your message into the corresponding fields.
Click OK to close the dialog window.
As soon as the user selects the validated cell, the following message will show up:
In addition to the input message, you can show one of the following error alerts when invalid data is entered in a
cell.
Stop (default)
The strictest alert type that prevents users from entering invalid data.
You click Retry to type a different value or Cancel to remove the entry.
Warning
Warns users that the data is invalid, but does not prevent entering it.
You click Yes to input the invalid entry, No to edit it, or Cancel to remove the entry.
Information
The most permissive alert type that only informs users about an invalid data entry.
You click OK to enter the invalid value or Cancel to remove it from the cell.
To configure a custom error message, go to the Error Alert tab and define the following parameters:
Check the Show error alert after invalid data is entered box (usually selected by default).
In the Style box, select the desired alert type.
Enter the title and text of the error message into the corresponding boxes.
Click OK.
And now, if the user enters invalid data, Excel will display a special alert explaining the error.
Note: If you do not type your own message, the default Stop alert with the following text will show up: This value
does not match the data validation restrictions defined for this cell.
When adding data validation rules in Excel, you can choose one of the predefined settings or specify custom criteria based on your own validation formula. Below we discuss each of the built-in options. As you already know, the validation criteria are defined on the Settings tab of the Data Validation dialog box (Data tab > Data Validation).
To restrict data entry to a whole number or decimal, select the corresponding item in the Allow box. And then,
choose one of the following criteria in the Data box:
Equal to or not equal to the specified number
Greater than or less than the specified number
Between the two numbers or not between to exclude that range of numbers
For example, this is how you create an Excel validation rule that allows any whole number greater than 0:
Date and time validation in Excel
To validate dates, select Date in the Allow box, and then pick an appropriate criterion in the Data box. There are quite
a lot of predefined options to choose from: allow only dates between two dates, equal to, greater than or less than a
specific date, and more. Similarly, to validate times, select Time in the Allow box, and then define the required
criteria. For example, to allow only dates between Start date in B1 and End date in B2, apply this Excel date
validation rule:
To validate data based on the current time, use the predefined Time rule with your own data validation formula:
In the Allow box, select Time.
In the Data box, pick either less than to allow only times before the current time, or greater than to allow
times after the current time.
In the End time or Start time box (depending on which criteria you selected on the previous step), enter one
of the following formulas:
o To validate dates and times based on the current date and time:
=NOW()
o To validate times based on the current time:
=TIME( HOUR(NOW()), MINUTE(NOW()), SECOND(NOW()))
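Why not simply compare with NOW()? In Excel, a time entered on its own is stored as a fraction of a day, while NOW() also carries today's date, so a plain time would always be smaller than NOW(). The TIME(HOUR(NOW()), MINUTE(NOW()), SECOND(NOW())) formula keeps only the clock part. The Python sketch below mirrors that idea with datetime.time; passing a fixed now value is an assumption made so the result is reproducible.

```python
from datetime import datetime, time


def current_time_only(now):
    """Like TIME(HOUR(NOW()), MINUTE(NOW()), SECOND(NOW())): keep the clock time, drop the date."""
    return time(now.hour, now.minute, now.second)


def is_later_than_now(entered, now=None):
    """Time rule with 'greater than' and the formula above as the start time."""
    if now is None:
        now = datetime.now()
    return entered > current_time_only(now)


fixed_now = datetime(2024, 3, 1, 14, 30, 15)
print(is_later_than_now(time(15, 0), fixed_now))  # True
print(is_later_than_now(time(9, 0), fixed_now))   # False
```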
The screenshot below shows a rule that allows only times greater than the current time:
Text length
To allow data entry of a specific length, select Text length in the Allow box, and choose the validation criteria in
accordance with your business logic.
For example, to limit the input to 10 characters, create this rule:
Note. The Text length option limits the number of characters but not the data type, meaning the above rule will allow both text of up to 10 characters and numbers of up to 10 digits.
To add a drop-down list of items to a cell or a group of cells, select the target cells and do the following:
Open the Data Validation dialog box (Data tab > Data Validation).
On the Settings tab, select List in the Allow box.
In the Source box, type the items of your Excel validation list, separated by commas. For example, to limit
the user input to three choices, type Yes, No, N/A.
Make sure the In-cell dropdown box is selected in order for the drop-down arrow to appear next to the
cell.
Click OK.
The resulting Excel data validation list will look similar to this:
To find the invalid data that made its way into your worksheets before you added data validation, go to
the Data tab, and click Data Validation > Circle Invalid Data.
This will highlight all cells that don't meet the validation criteria:
As soon as you correct an invalid entry, the circle will be gone automatically.
To remove all circles, go to the Data tab, and click Data Validation > Clear Validation Circles.
Sorting
Sorting lists is a common spreadsheet task that allows you to easily reorder your data. The most common type of
sorting is alphabetical ordering, which you can do in ascending or descending order. Follow these steps to sort in alphabetical order:
Select a cell in the column you want to sort (In this example, we choose a cell in column A).
Click the Sort & Filter command in the Editing group on the Home tab.
Select Sort A to Z. Now the information in the Category column is organized in alphabetical order.
We can sort in reverse alphabetical order by choosing Sort Z to A in the list. To sort from smallest to largest, follow these steps:
Select a cell in the column you want to sort (a column with numbers).
Click the Sort & Filter command in the Editing group on the Home tab.
Select From Smallest to Largest. Now the information is organized from the smallest to largest amount.
Similarly, we can sort in reverse numerical order by choosing From Largest to Smallest in the list.
To sort by more than one column, use a custom sort:
Click the Sort & Filter command in the Editing group on the Home tab.
Select Custom Sort from the list to open the dialog box. OR
Select the Data tab.
Locate the Sort and Filter group.
Click the Sort command to open the Custom Sort dialog box. From here, you can sort by one item or
multiple items.
Click the drop-down arrow in the Column Sort by field, then choose one of the options—in this example,
Category.
Choose what to sort on. In this example, we'll leave the default as Value.
Choose how to order the results. Leave it as A to Z so it is organized alphabetically.
Click Add Level to add another item to sort by.
Select an option in the Column Then by field. In this example, we chose Unit Cost.
Choose what to sort on. In this example, we'll leave the default as Value.
Choose how to order the results. Leave it as smallest to largest.
Click OK.
The spreadsheet has been sorted. All of the categories are organized in alphabetical order, and within each category
the unit cost is arranged from smallest to largest. Remember that all of the information and data is still here—it's just
in a different order.
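The two-level sort above (Category A to Z, then Unit Cost smallest to largest) corresponds to sorting by a compound key. The Python sketch below uses hypothetical rows; casefold() is used because Excel's default sort ignores letter case.

```python
# Hypothetical rows: (Category, Item, Unit Cost)
rows = [
    ("Flavors", "Vanilla Extract", 4.50),
    ("Baking", "Flour", 2.25),
    ("Flavors", "Almond Extract", 3.75),
    ("Baking", "Sugar", 1.80),
]

# First level ("Sort by"): Category, A to Z.
# Second level ("Then by"): Unit Cost, smallest to largest.
sorted_rows = sorted(rows, key=lambda row: (row[0].casefold(), row[2]))

for row in sorted_rows:
    print(row)
```

As in Excel, nothing is lost: the same four rows come back, only in a different order.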
Grouping cells using the Subtotal command
Grouping is a useful Excel feature that gives you control over how the information is displayed. You must sort
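before you can group. In this section, we will learn how to create groups using the Subtotal command.
Select any cell in the sorted data, then click the Subtotal command in the Outline group on the Data tab. The Subtotal dialog box appears.
Decide how you want things grouped (the At each change in field). In this example, we will organize by Category.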
Select a function. In this example, we will leave the SUM function selected.
Select the column where you want the Subtotal to appear. In this example, Total Cost is selected by default.
Click OK. The selected cells are organized into groups with subtotals.
To collapse or display the group:
Click the black minus sign, which is the hide detail icon, to collapse the group.
Click the black plus sign, which is the show detail icon, to expand the group.
Use the Show Detail and Hide Detail commands in the Outline group on the Data tab to expand and collapse the group as well.
Filtering cells
Filtering, or temporarily hiding, data in a spreadsheet is simple. This allows you to focus on specific spreadsheet
entries.
To filter data:
Click the Filter command on the Data tab. Drop-down arrows will appear beside each column heading.
Click the drop-down arrow next to the heading you would like to filter. For example, if you would like to view only the data in the Flavors category, click the drop-down arrow next to Category, clear the Select All check box, check Flavors, and click OK.
Filtering may look a little like grouping, but unlike grouping, you can add a filter on another column to narrow the results further.
For example, let’s say you want to see only the vanilla-related flavors. Just click the drop-down arrow next to Item,
then select Text Filters. From the menu, choose Contains because you want to find any entry that has the
word vanilla in it. A dialog box appears. Type vanilla, then click OK. Now we can see that the data has been filtered
again and that only the vanilla-related flavors appear.
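The stacked filters above can be sketched in Python with hypothetical rows. Each condition is applied to the output of the previous one, which is why a second filter narrows the result rather than replacing it. The case-insensitive match is an assumption meant to mirror the Contains text filter.

```python
# Hypothetical rows: (item, category)
rows = [
    ("Vanilla Bean", "Flavors"),
    ("French Vanilla Syrup", "Flavors"),
    ("Strawberry Puree", "Flavors"),
    ("Vanilla Wafers", "Toppings"),
    ("Sprinkles", "Toppings"),
]


def apply_filters(data, category=None, item_contains=None):
    """Each filter narrows the result of the previous one, like stacked AutoFilters."""
    result = data
    if category is not None:
        # Category drop-down: keep only the checked value
        result = [row for row in result if row[1] == category]
    if item_contains is not None:
        # Item drop-down > Text Filters > Contains (case-insensitive)
        needle = item_contains.casefold()
        result = [row for row in result if needle in row[0].casefold()]
    return result


print(apply_filters(rows, category="Flavors", item_contains="vanilla"))
```

Note that "Vanilla Wafers" is excluded even though it contains "vanilla", because the Category filter removed it first.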
You can display your data analysis reports in a number of ways in Excel. However, if your results are visualized as charts that highlight the notable points in the data, your audience can quickly grasp what you want to convey. Charts also make your presentation more effective.
In this chapter, you will get to know how to use Excel charts and Excel formatting features on charts that enable you
to present your data analysis results with emphasis.
In Excel, a chart is a graphical representation of a set of data, in which the values are represented by symbols such as bars in a Bar Chart or lines in a Line Chart. Excel provides many chart types. You can choose the one that suits your data, or use the Recommended Charts option on the Insert tab to view charts tailored to your data and select one of those. Here you will learn different techniques that you can use with Excel charts to highlight your data analysis results more effectively.
Creating Combination Charts
Suppose you have the target and actual profits for the fiscal year 2015-2016 that you obtained from different
regions.
As you can see, it is difficult to compare the targets and the actual values quickly in this chart, so it does not convey your results effectively.
A better way to distinguish the two types of data and compare their values is to use a Combination Chart. In Excel 2013 and later versions, you can use the Combo chart type for this purpose.
Use Vertical Columns for the target values and a Line with Markers for the actual values.
Click the DESIGN tab under the CHART TOOLS tab on the Ribbon.
Click Change Chart Type in the Type group. The Change Chart Type dialog box appears.
Click Combo.
Change the Chart Type for the series Actual to Line with Markers. The preview appears under Custom
Combination.
Click OK.
Suppose you have data on the number of units of your product that were shipped and the actual profits for the fiscal year 2015-2016 that you obtained from different regions.
If you use the same combination chart as before, you will get the following −
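In this chart, the No. of Units data is barely visible because the two data ranges differ significantly.
In such cases, you can create a combination chart with a secondary axis, so that the primary axis displays one range and the secondary axis displays the other. In the Combo option of the Change Chart Type dialog box, select the Secondary Axis check box for the No. of Units series.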
You can observe the values for Actual Profits on the primary axis and the values for No. of Units on the secondary
axis.
A significant observation in the above chart is Quarter 3, where the No. of Units sold is higher but the Actual Profits made are lower. This could probably be attributed to the promotion costs incurred to increase sales. The situation improves in Quarter 4, with a slight decrease in sales and a significant rise in the Actual Profits made.
Suppose you want to display the Actual Profits made in the Years 2013-2016.
As you can see, the visualization is not effective because the years are not displayed on the axis: the Year column contains numbers, so Excel plots it as a data series instead of using it as categories. You can overcome this by changing Year to a category.
Delete the header Year from the data range, leaving that cell empty.
Now, Year is treated as a category and not a series. Your chart looks as follows −
Chart Elements and Chart Styles
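Chart Elements add descriptive details, such as titles, axis labels and legends, to your charts, helping you visualize your data more meaningfully. In Excel 2013 and later, three buttons appear beside a selected chart: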
Chart Elements
Chart Styles
Chart Filters
For a detailed explanation of these, refer to the Excel Charts tutorial.
Data Labels
Excel 2013 and later versions provide you with various options to display Data Labels. You can choose one Data
Label, format it as you like, and then use Clone Current Label to copy the formatting to the rest of the Data Labels in
the chart.
Data Labels in a chart can be given effects and different shapes and sizes.
It is also possible to display the content of a cell as part of the Data Label with Insert Data Label Field.
Quick Layout
You can use Quick Layout to change the overall layout of the chart quickly by choosing one of the predefined layout options. Select the chart, then click Quick Layout in the Chart Layouts group on the DESIGN tab.
Select the layout you like. The chart will be displayed with the chosen layout.
You can add more emphasis to your data presentation by using a picture in place of columns.
Right-click a column in the Column Chart and select Format Data Series.
In the Format Data Series pane, click Fill.
Select Picture or texture fill.
Under Insert picture from, click File and browse to the image, or click Clipboard if you copied an image earlier.
The picture you have chosen will appear in place of columns in the chart.
Band Chart
You might have to present customer survey results for a product from different regions. A Band Chart is suitable for this purpose. A Band Chart is a Line Chart with added shaded areas that display the upper and lower boundaries of groups of data.
Suppose your month-wise customer survey results from the East and West regions are −
Here, a result below 50% is Low, 50% to 80% is Medium, and above 80% is High.
With Band Chart, you can display your survey results as follows −
Create a Line Chart from your data.
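The band rule above can be written as a small Python function. The monthly scores are hypothetical, and treating exactly 50% and exactly 80% as Medium is an assumption read from "50% - 80%". The `band_heights` helper assumes one common way of drawing the shaded bands (as stacked areas behind the line); the notes do not show that step, so treat it as a sketch.

```python
# Band boundaries from the notes: below 50% Low, 50% to 80% Medium, above 80% High
BOUNDS = [0, 50, 80, 100]
BAND_NAMES = ["Low", "Medium", "High"]


def survey_band(score_pct):
    """Classify one survey result (in percent) into its band."""
    if score_pct < 50:
        return "Low"
    if score_pct <= 80:
        return "Medium"
    return "High"


def band_heights(bounds=BOUNDS):
    """Height of each shaded band if the bands are drawn as stacked areas."""
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


east = [45, 62, 85]  # hypothetical monthly results for one region
print([survey_band(s) for s in east])
print(dict(zip(BAND_NAMES, band_heights())))
```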
Thermometer Chart
When you have to represent a target value and an actual value, you can easily create a Thermometer Chart in Excel
that emphatically shows these values.
With Thermometer chart, you can display your data as follows −
As you can see, the Primary Axis and the Secondary Axis have different ranges.
Set both the Primary Axis and the Secondary Axis to 0% - 100%. The Target column now hides the Actual column.
You now have your thermometer chart, showing the actual value against the target value. You can make this thermometer chart more impressive with some formatting.
Insert a rectangle shape over the blue rectangular part of the chart.
In Format Shape options, select −
o Gradient fill for FILL
o Linear for Type
o 180° for Angle
Set the Gradient stops at 0%, 50% and 100%.
For the Gradient stops at 0% and 100%, choose the color black.
For the Gradient stop at 50%, choose the color white.
Gantt Chart
A Gantt chart is a chart in which a series of horizontal lines shows the amount of work done in certain periods of
time in relation to the amount of work planned for those periods. In Excel, you can create a Gantt chart by
customizing a Stacked Bar chart type so that it depicts tasks, task duration, and hierarchy. An Excel Gantt chart
typically uses days as the unit of time along the horizontal axis.
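A Stacked Bar Gantt chart is usually built from two series per task: an invisible offset from the project start and a visible duration bar. The Python sketch below computes both from hypothetical tasks. Counting both the start and end days in the duration is an assumption; drop the `+ 1` if your schedule counts differently.

```python
from datetime import date

# Hypothetical tasks: (name, start date, end date)
tasks = [
    ("Plan", date(2024, 1, 1), date(2024, 1, 5)),
    ("Build", date(2024, 1, 6), date(2024, 1, 20)),
    ("Test", date(2024, 1, 15), date(2024, 1, 25)),
]


def gantt_series(task_list):
    """Return (task, offset_days, duration_days) for a Stacked Bar Gantt chart."""
    project_start = min(start for _, start, _ in task_list)
    series = []
    for name, start, end in task_list:
        offset = (start - project_start).days  # first series, later formatted with No Fill
        duration = (end - start).days + 1      # second series: the visible bar, both end days counted
        series.append((name, offset, duration))
    return series


for row in gantt_series(tasks):
    print(row)
```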
Consider the following data where the column −
Waterfall Chart
The Waterfall Chart is one of the most popular visualization tools in small and large businesses. Waterfall charts are ideal for showing how you arrived at a net value, such as net income, by breaking down the cumulative effect of positive and negative contributions. Excel 2016 provides a built-in Waterfall chart type. If you are using an earlier version of Excel, you can still create a Waterfall Chart using a Stacked Column Chart. The columns are color coded so that you can quickly tell positive from negative numbers. The initial and final value columns start on the horizontal axis, while the intermediate values are floating columns. Because of this look, Waterfall Charts are also called Bridge Charts.
Consider the following data.
Prepare the data for Waterfall Chart
Ensure the Net Cash Flow column is to the left of the Months column (because you will not include this column when creating the chart)
Add 2 columns – Increase and Decrease for positive and negative cash flows respectively
Add a column Start - the first column in the chart with the start value in the Net Cash Flow
Add a column End - the last column in the chart with the end value in the Net Cash Flow
Add a column Float – that supports the intermediate columns
Compute the values for these columns as follows
In the Float column, insert a row at the beginning and at the end, and place an arbitrary value of 50000 in each. This is just to leave some space to the left and right of the chart
The data will be as follows.
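The notes' formulas for these columns are not reproduced here, so the Python below is only one common way to derive them, under stated assumptions: the first value is treated as the opening balance, the remaining values are monthly changes (hypothetical numbers), and the running total never goes below zero. Each intermediate month's Float is the lower of the totals before and after that month, so the Increase or Decrease bar sits on top of it. The 50000 padding rows mentioned above are not included.

```python
def waterfall_table(start_value, monthly_changes):
    """Build Start, Float, Increase, Decrease and End columns for a stacked waterfall.

    Assumes the running total never drops below zero, because a floating
    column cannot straddle the horizontal axis in this simple layout.
    """
    blank = {"start": 0, "float": 0, "increase": 0, "decrease": 0, "end": 0}
    table = [dict(blank, label="Start", start=start_value)]
    running = start_value
    for label, change in monthly_changes:
        new_total = running + change
        if new_total < 0:
            raise ValueError("running total below zero; this layout cannot show it")
        table.append(dict(
            blank,
            label=label,
            float=min(running, new_total),  # invisible base the colored bar sits on
            increase=max(change, 0),        # positive cash flow
            decrease=max(-change, 0),       # negative cash flow
        ))
        running = new_total
    table.append(dict(blank, label="End", end=running))
    return table


months = [("Jan", 50), ("Feb", -30), ("Mar", 20)]  # hypothetical monthly changes
for row in waterfall_table(100, months):
    print(row)
```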
Sparklines are tiny charts placed in single cells, each representing a row of data in your selection. They provide a
quick way to see trends.
You can add Sparklines with the Quick Analysis tool.
Select the data. The Quick Analysis button appears at the bottom right of your selection.
Click on the Quick Analysis button. The Quick Analysis Toolbar appears with various options.
Click SPARKLINES. The chart options displayed are based on the data and may vary.
Click Line. A Line Chart for each row is displayed in the column to the right of the data.
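As a rough text analogue of the idea (not how Excel draws sparklines), the Python below scales each row of hypothetical data between its own minimum and maximum and maps the values to eight block characters. The per-row scaling is why sparklines show the trend within a row but not comparisons between rows.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map one row of numbers to block characters scaled to that row's own range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat row: nothing to scale
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)


data = {"East": [3, 5, 4, 8, 6], "West": [7, 6, 5, 3, 2]}  # hypothetical rows
for region, row in data.items():
    print(region, sparkline(row))
```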
Pivot Charts
Pivot Charts are used to summarize data graphically and to explore complex data. A PivotChart shows Data Series, Categories, and Chart Axes the same way a standard chart does. In addition, it gives you interactive filtering controls right on the chart so that you can quickly analyze a subset of your data. Pivot Charts are useful when you have data in a huge PivotTable, or a large amount of complex worksheet data that includes text and numbers. A PivotChart can help you make sense of this data.
You can create a PivotChart from
A PivotTable.
A data table on its own, without an existing PivotTable (Excel creates a PivotTable along with the chart).
When you filter a PivotChart, the filtered data appears on both the PivotChart and the PivotTable.
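The summary behind a PivotTable and its PivotChart can be sketched in Python with hypothetical sales rows: values are summed by one field for rows and another for columns, after an optional filter. Because a single summary feeds both views, a filter changes the table and the chart together.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, category, amount)
sales = [
    ("East", "Flavors", 120),
    ("East", "Toppings", 80),
    ("West", "Flavors", 200),
    ("West", "Toppings", 50),
    ("East", "Flavors", 30),
]


def pivot(rows, keep=lambda row: True):
    """Sum amount by region (rows) and category (columns), keeping only filtered rows."""
    table = defaultdict(lambda: defaultdict(int))
    for region, category, amount in rows:
        if keep((region, category, amount)):
            table[region][category] += amount
    return {region: dict(cols) for region, cols in table.items()}


print(pivot(sales))
print(pivot(sales, keep=lambda row: row[1] == "Flavors"))  # like a Category filter
```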
In this example, we will assume we are trading the Euro currency and would like to get the exchange rates from the European Central Bank web service. When you use a web data source, the system must have internet connectivity to access the web data. The currency exchange rate API link is
http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml
Click on OK button
You will get the following data
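To see what the imported data contains, the Python sketch below parses a trimmed sample. The sample's layout (nested Cube elements carrying currency and rate attributes, plus the namespace URIs) is assumed to match the ECB daily feed, and its rates are made up. The parser ignores namespaces and simply collects every element that has a currency attribute. `fetch_rates` downloads the live URL from the notes and needs internet access, like the Excel import.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

ECB_URL = "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"

# Trimmed sample in the assumed layout of the feed; the rates are made up
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01"
    xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
  <gesmes:subject>Reference rates</gesmes:subject>
  <Cube>
    <Cube time="2024-01-05">
      <Cube currency="USD" rate="1.0921"/>
      <Cube currency="JPY" rate="158.08"/>
    </Cube>
  </Cube>
</gesmes:Envelope>"""


def parse_rates(xml_data):
    """Return {currency: units per 1 EUR} from elements carrying currency/rate attributes."""
    root = ET.fromstring(xml_data)
    rates = {}
    for element in root.iter():
        currency = element.get("currency")
        if currency is not None:
            rates[currency] = float(element.get("rate"))
    return rates


def fetch_rates(url=ECB_URL):
    """Download the live feed (requires internet connectivity) and parse it."""
    with urlopen(url, timeout=10) as response:
        return parse_rates(response.read())


rates = parse_rates(SAMPLE)
print(rates)
```

For a local XML file, as in the next example, `parse_rates(open(path, "rb").read())` works the same way.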
Let's take another example: this time you have a local XML file rather than a web link. Here we import an XML file from the source system, not from the web. Here we have to make sure that
Click on OK button
You will get the following data
Handling CSV Data in Excel
CSV, or comma separated values, is a common format for storing and transmitting content including contacts,
calendar appointments and statistical data. Excel can open CSV files directly, but when a file's delimiter or character encoding differs from what Excel expects, you may see scrambled data that is hard to read. Here we see how to convert an Excel file into a CSV file, and also how to import CSV files into Excel and view them correctly without converting anything.
Comma-separated values (CSV) is a widely used file format that stores tabular data (numbers and text) as plain text.
Its popularity and viability are due to the fact that a great many programs and applications support CSV files, at least as an alternative import/export format. Moreover, the CSV format allows users to glance at the file and immediately
diagnose the problems with data, if any, change the CSV delimiter, quoting rules, etc. All this is possible because a
CSV file is plain text and an average user or even a novice can easily understand it without any learning curve.
Here we see how to export data from Excel to CSV and how to convert Excel to CSV keeping all special characters and foreign symbols intact. The method below works in Excel 2016, 2013, 2010 and 2007.
Excel offers several CSV formats, including:
CSV (comma delimited). This format saves an Excel file as a comma-separated text that can be used in another
Windows program or another version of Windows operating system.
CSV (Macintosh). This format saves your Excel workbook as a comma-separated file for use on Mac operating
system.
Note: All of the above mentioned formats save only the active Excel sheet.
In your workbook, click File > Save As, choose CSV (Comma delimited) (*.csv) in the Save as type list, choose the destination folder where you want to save the file, and then click Save.
After you click Save, Excel will display two dialogs. Don't worry, these are not error messages and everything is
going right. The first dialog reminds you that only the active Excel spreadsheet will be saved to the CSV file format.
If this is what you are looking for, click OK.
If you need to save the contents of all the sheets your workbook contains, click Cancel and then save each
spreadsheet individually as a separate Excel file (workbook).
After that save each Excel file as CSV.
Clicking OK in the first dialog displays a second message informing you that your worksheet may contain features that are not supported by the CSV format. This is OK, so simply click Yes.
This is how you convert Excel to CSV. The process is quick and straightforward, and you are unlikely to run into
any hurdles along the way.
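If you need to produce a CSV file for Excel from a script, the Python sketch below shows the two details that usually keep special characters and embedded commas intact. It writes UTF-8 with a byte-order mark (the "utf-8-sig" codec), which Excel commonly uses to recognize UTF-8. It also lets the csv module quote values that contain commas. The sheet contents are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical sheet contents with non-ASCII characters and a comma inside a value
rows = [
    ["Name", "City", "Amount"],
    ["José", "Zürich", "1,250.50"],
    ["Anna", "Köln", "980"],
]


def export_csv(path, table):
    # newline="" lets the csv module write its own line endings;
    # "utf-8-sig" prepends a byte-order mark before the UTF-8 text
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(table)


def round_trip(table):
    """Write the table to a temporary CSV file, then return (raw bytes, parsed rows)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "export.csv")
        export_csv(path, table)
        with open(path, "rb") as f:
            raw = f.read()
        with open(path, newline="", encoding="utf-8-sig") as f:
            parsed = list(csv.reader(f))
    return raw, parsed


raw, parsed = round_trip(rows)
print(raw[:40])
print(parsed == rows)
```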
On the Data tab, in the Get External Data group, click From Text. Navigate to the CSV file you wish to open and click “Import”.
From the newly-opened window, choose “Delimited”. Then click “Next”.
Check the box next to the type of delimiter – in most cases this is either a semicolon or a comma. Then click “Next”.
Click “Finish”.
That’s it; you have just imported a CSV file to Excel!
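The delimiter step of the wizard can be imitated in Python with the standard library's `csv.Sniffer`, which guesses the delimiter from a sample of the text and restricts its guesses to the candidates you pass. The semicolon-delimited sample is hypothetical.

```python
import csv
import io


def read_delimited(text, candidates=",;\t"):
    """Guess the delimiter from a sample of the text, then split the rows with it."""
    dialect = csv.Sniffer().sniff(text[:2048], delimiters=candidates)
    return dialect.delimiter, list(csv.reader(io.StringIO(text), dialect))


sample = "Name;City\nJosé;Zürich\nAnna;Köln\n"  # hypothetical semicolon-delimited file
delimiter, table = read_delimited(sample)
print(repr(delimiter), table)
```

`Sniffer` is heuristic. For unusual files, passing the delimiter explicitly (`csv.reader(f, delimiter=";")`) is the safer choice, just as you would tick the box yourself in the wizard.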
1. On the Data tab, in the Get & Transform Data group, click Get Data.
4. Click Import.
5. Select a table on the left side of the Navigator window and click Load.
Result. Your database records in Excel.
6. When your Access data changes, you can easily refresh the data in Excel. First, select a cell inside the table.
Next, on the Design tab, in the External Table Data group, click Refresh.
4. The "Get External Data - Excel Spreadsheet" wizard appears. In the File name field, browse to the Excel file.
Select the "Import the source data into a new table in the current database" option and click OK.
5. Select the worksheet to import. Click Next.
6. If the first row contains headers, mark the "First Row Contains Column Headings" checkbox. Click Next.
7. Select the options for each column or just leave it at the default and click Next.
8. Accept the default of "Let Access add primary key." Click Next.
9. The Import to Table field defaults to the worksheet name. Update it if needed. Click Finish. The worksheet
imports into a table.
Follow these steps to import an Excel spreadsheet into an existing table in Access:
1. Ensure that the spreadsheet uses exactly the same column headers as the field names of the existing Access table.
2. Open the Access database.
3. If you receive a security warning, click the Enable Content button.
4. On the Office ribbon, select the External Data tab and click Excel.
5. The "Get External Data - Excel Spreadsheet" wizard appears. In the File name field, browse to the Excel
file. Select the "Append a copy of records to the table" option. In the drop-down, select the appropriate table
and click OK.
6. Select the worksheet to import. Click Next.
7. Click Finish.
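The append rules above (matching headers, plus an automatically numbered primary key like the one Access offers to add) can be illustrated with Python's built-in SQLite module. SQLite is a stand-in for Access here, not Access's own import mechanism. The `Customers` table and its rows are hypothetical, and the table name is inserted into the SQL text directly, so only use trusted names.

```python
import csv
import io
import sqlite3


def append_rows(conn, table, csv_text):
    """Append spreadsheet rows (as CSV text) to an existing table.

    Refuses the import unless the headers match the table's non-key field
    names; the auto-numbered primary key is filled in by the database.
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    fields = [col[1] for col in info if not col[5]]  # skip primary-key columns
    if headers != fields:
        raise ValueError(f"headers {headers} do not match table fields {fields}")
    placeholders = ", ".join("?" for _ in headers)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(headers)}) VALUES ({placeholders})",
        reader,
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY plays the role of the primary key Access adds for you
    conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
    append_rows(conn, "Customers", "Name,City\nJosé,Zürich\nAnna,Köln\n")
    rows = conn.execute("SELECT id, Name, City FROM Customers ORDER BY id").fetchall()
    conn.close()
    return rows


print(demo())
```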
Working with multiple Work Sheets
Every workbook contains at least one worksheet by default. When working with a large amount of data, you can
create multiple worksheets to help organize your workbook and make it easier to find content. You can
also group worksheets to quickly add information to multiple worksheets at the same time.
To insert a new worksheet:
Locate and select the New sheet button (the plus sign to the right of the sheet tabs) near the bottom-left corner of the Excel window.
By default, any new workbook you create in Excel 2013 or later will contain one worksheet, called Sheet1. To change the default number of worksheets, click File > Options, and on the General page set Include this many sheets to the number of worksheets you want in each new workbook.
To copy a worksheet:
If you need to duplicate the content of one worksheet to another, Excel allows you to copy an existing worksheet.
Right-click the worksheet you want to copy, then select Move or Copy from the worksheet menu.
The Move or Copy dialog box will appear. Choose where the sheet will appear in the Before sheet: field. In our
example, we'll choose (move to end) to place the worksheet to the right of the existing worksheet.
Check the box next to Create a copy, then click OK.
The worksheet will be copied. It will have the same title as the original worksheet, as well as a version number. In
our example, we copied the November worksheet, so our new worksheet is named November (2). All content from
the November worksheet has also been copied to the new worksheet.
You can also copy a worksheet to an entirely different workbook. You can select any workbook that is currently
open from the To book: drop-down menu.
To rename a worksheet:
Right-click the worksheet you want to rename, then select Rename from the worksheet menu. Type the new name and press Enter.
To delete a worksheet:
Right-click the worksheet you want to delete, then select Delete from the worksheet menu.
If you want to prevent specific worksheets from being edited or deleted, you can protect them by right-clicking the
desired worksheet and selecting Protect Sheet from the worksheet menu.
Switching between worksheets
If you want to view a different worksheet, you can simply click the tab to switch to that worksheet. However, with
larger workbooks this can sometimes become tedious, as it may require scrolling through all of the tabs to find the
one you want. Instead, you can simply right-click the scroll arrows in the lower-left corner, as shown below.
A dialog box will appear with a list of all of the sheets in your workbook. You can then double-click the sheet you
want to jump to.
To group worksheets:
Select the first worksheet you want to group.
Press and hold the Ctrl key on your keyboard. Select the next worksheet you want in the group.
Continue to select worksheets until all of the worksheets you want to group are selected, then release the Ctrl key.
The worksheets are now grouped.
While worksheets are grouped, you can navigate to any worksheet within the group. Any changes made to one
worksheet will appear on every worksheet in the group. However, if you select a worksheet that is not in the group,
all of your worksheets will become ungrouped.
To ungroup worksheets:
Right-click a worksheet in the group, then select Ungroup Sheets from the worksheet menu.
The worksheets will be ungrouped. Alternatively, you can simply click any worksheet not included in the group
to ungroup all worksheets.