Lesson 2 - Data Connections
Data Connections
What’s In It For Me
It is important to format data before it can be analyzed using Tableau; this helps to save time and prevent
errors.
Tableau offers the following tools to help prep data for analysis:
• A Join combines related data from two tables on the fields they have in common.
• A Join results in a virtual table that is typically extended horizontally by adding columns.
Preparing Data for Tableau
DATA JOINS: EXAMPLE
Shown here is the analysis of data on product sales with two files:
The product ID field serves as the primary key to join the data from the two sets.
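As a rough illustration of the idea (not Tableau's own implementation), the sketch below joins two small, made-up data sets on a shared product ID. The field names and values are hypothetical, and the sketch assumes each product ID appears only once in the second set.

```python
# Hypothetical sample data standing in for the two product sales files.
products = [
    {"product_id": 1, "product_name": "Laptop"},
    {"product_id": 2, "product_name": "Monitor"},
]
sales = [
    {"product_id": 1, "units_sold": 30},
    {"product_id": 2, "units_sold": 12},
]


def join_on_key(left, right, key):
    """Match rows on a shared key and widen each left row with the right row's columns."""
    # Index the right-hand rows by key (assumes the key is unique on that side).
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # The joined row carries the columns of both inputs.
            joined.append({**row, **match})
    return joined


joined_rows = join_on_key(products, sales, "product_id")
for row in joined_rows:
    print(row)
```

Each output row carries the columns of both inputs, which is the horizontal extension described earlier.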
TYPES OF JOINS
Inner join: The resulting table includes data that is present in BOTH data sets.
Left join: The resulting table contains all values from the LEFT table and any matches from the RIGHT table.
When a value in the LEFT table doesn’t have a corresponding match in the RIGHT table, you see a null
value in the data grid.
Right join: The resulting table contains all values from the RIGHT table and any matches from the LEFT table.
When a value in the RIGHT table doesn’t have a corresponding match in the LEFT table, you see a null
value in the data grid.
Full outer join: The resulting table contains all values from BOTH tables. When a value from EITHER table
doesn't have a match with the other table, you see a null value in the data grid.
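The four join types can be compared side by side with a small sketch. This is only an illustration in plain Python, not how Tableau executes joins. The sample data is made up, each key is assumed to appear once per table, and None plays the role of the null value shown in the data grid.

```python
# Hypothetical tables keyed by product ID (one row per key on each side).
left = {1: "Laptop", 2: "Monitor", 3: "Keyboard"}  # product_id -> product_name
right = {2: 12, 3: 40, 4: 7}                       # product_id -> units_sold


def join(left, right, how):
    """Return (key, left_value, right_value) rows for the requested join type."""
    if how == "inner":
        keys = left.keys() & right.keys()   # keys present in BOTH tables
    elif how == "left":
        keys = left.keys()                  # every key from the LEFT table
    elif how == "right":
        keys = right.keys()                 # every key from the RIGHT table
    elif how == "full":
        keys = left.keys() | right.keys()   # every key from EITHER table
    else:
        raise ValueError(f"unknown join type: {how}")
    # dict.get returns None when a key has no match on that side.
    return [(k, left.get(k), right.get(k)) for k in sorted(keys)]


for how in ("inner", "left", "right", "full"):
    print(how, join(left, right, how))
```

Product 1 has no sales row and product 4 has no name, so those are the rows that pick up None in the left, right, and full outer results.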
JOINS FROM DATABASE
Joining tables from the same database requires only a single connection in the data source.
[Figure: a single Database connected to the Tableau Data Engine]
Demo—Perform a Single Database Join
CROSS DATABASE JOINS
• Cross-database Joins require setting up a multi-connection data source by creating a new connection
to each database.
• Multi-connection data sources are helpful when different internal systems are used.
[Figure: Database 1 and Database 2, each connected to the Tableau Data Engine]
Demo—Perform a Cross Database Join
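To make the multi-connection idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Two in-memory databases stand in for two internal systems. The table names, columns, and values are hypothetical, and combining the rows in Python only approximates the role a data engine plays when it joins across connections.

```python
import sqlite3

# Two separate in-memory SQLite databases stand in for two internal systems.
crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 250.0), (1, 100.0), (2, 75.0)]
)

# Each database is queried through its own connection.
customers = dict(crm_db.execute("SELECT customer_id, name FROM customers"))
orders = orders_db.execute(
    "SELECT customer_id, amount FROM orders ORDER BY rowid"
).fetchall()

# The rows are then joined outside either database, on the shared customer_id.
cross_db_rows = [(customers[cid], amount) for cid, amount in orders if cid in customers]
print(cross_db_rows)

crm_db.close()
orders_db.close()
```

Neither database can see the other's tables, which is why one connection per database is needed before the join can happen.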
Data Blending
• Blending is a method of combining related data from multiple sources in a single view in order to analyze it.
• There is always one primary data source, while the rest become secondary data sources.
Tableau identifies common dimensions within the data sources so that the user knows which fields can be used.
Blending does not join the data at the row level; instead, each source is queried separately and the
results are combined in the view, which behaves like a left outer join from the primary source to
the secondary source(s).
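One way to picture a blend is sketched below, under simplifying assumptions: the secondary source is first aggregated to the level of the common dimension, and the aggregate is then attached to each row of the primary source. The data and names are invented for illustration, and this is not Tableau's actual implementation.

```python
from collections import defaultdict

# Primary source: one row per region (hypothetical sales targets).
primary_targets = [("East", 500), ("West", 300), ("North", 200)]
# Secondary source: several order rows per region, a finer level of detail.
secondary_orders = [("East", 120), ("East", 90), ("West", 310)]

# Step 1: aggregate the secondary source up to the linking dimension (region).
sales_by_region = defaultdict(int)
for region, amount in secondary_orders:
    sales_by_region[region] += amount

# Step 2: keep every primary row and attach the aggregate where one exists.
# .get() returns None for regions with no secondary data (a null in the view).
blended = [
    (region, target, sales_by_region.get(region))
    for region, target in primary_targets
]
print(blended)
```

Because the secondary rows are aggregated before they are combined, the primary rows are never duplicated, which is the main practical difference from a row-level join.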
BLEND VS JOIN
Use a Blend when:
• You have to combine data from different databases that do not support cross-table joins.
• Data within the different sources are at different levels of detail.
• Using a Join causes duplicate rows.
• You are working with large amounts of data.
Use a Join when:
• Your data format is consistent across all sources.
• You are working with relatively small amounts of data.
• Data within each source is at the same level of detail.
Splits
Splitting data from one field into multiple columns is used often in data preparation.
Example:
City/State: Denver CO  ->  City: Denver, State: CO
This was typically remedied in Excel with the “Text to Columns” function.
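The same kind of split can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the state is always the last space-separated token, so that multi-word city names stay intact.

```python
def split_city_state(value):
    """Split a 'City ST' string into (city, state) on the last space."""
    city, state = value.rsplit(" ", 1)
    return city, state


print(split_city_state("Denver CO"))
print(split_city_state("Salt Lake City UT"))
```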
AUTOMATIC SPLITS
• A string field can be split automatically based on a common separator that Tableau detects (space or
underscore).
• This split can be used to automatically separate a field’s value into a maximum of ten new fields,
depending on the type of data connection.
CUSTOM SPLITS
The custom split can also separate a string field into a maximum of ten new fields based on a separator
within the original field.
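A custom split with a cap on the number of new fields might look like the sketch below. The function and constant names are hypothetical. Dropping any parts beyond the tenth is an assumption made for the sketch, not documented Tableau behavior.

```python
MAX_SPLIT_FIELDS = 10  # the limit stated in the text above


def custom_split(value, separator, max_fields=MAX_SPLIT_FIELDS):
    """Split `value` on `separator` and keep at most `max_fields` parts."""
    # Assumption: parts beyond the limit are simply discarded.
    return value.split(separator)[:max_fields]


print(custom_split("2023-Q1-East-Retail", "-"))
print(len(custom_split(",".join(str(n) for n in range(12)), ",")))
```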
• The Metadata Grid view in Tableau allows you to quickly perform actions, such as rename, hide, and
others, on multiple fields with a single command.
Pivot
Data is often not organized as a typical data set, which has field names along the columns and members along the rows.
Example:
The Pivot function in Tableau allows you to select the columns you want to manipulate and format them into
a typical data set ready for analysis.
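The reshaping that a pivot performs can be sketched as follows. This is a plain-Python illustration with invented data in which yearly figures are spread across columns. The function and field names are hypothetical.

```python
# Hypothetical "wide" data: one column per year instead of a single Year field.
wide_rows = [
    {"product": "Laptop", "2021": 30, "2022": 45},
    {"product": "Monitor", "2021": 12, "2022": 20},
]


def pivot_columns(rows, columns, name_field, value_field):
    """Turn the selected columns into (name, value) pairs, one output row per pair."""
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are repeated on every output row.
        kept = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            long_rows.append({**kept, name_field: col, value_field: row[col]})
    return long_rows


long_rows = pivot_columns(wide_rows, ["2021", "2022"], "year", "units_sold")
for row in long_rows:
    print(row)
```

After the pivot, "year" and "units_sold" are ordinary fields, so the data has the column-per-field shape described above.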
Union
• Data often also resides in multiple, separate files and may need to be combined into a “master file.”
• Tableau’s “Union” feature helps you assemble data from multiple small files into one large file.
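Conceptually, a union stacks the rows of files that share the same columns. The sketch below shows the idea with two small in-memory CSV texts. The file names and contents are made up, and the extra source_file column is an addition for this sketch so that each row stays traceable.

```python
import csv
import io

# Hypothetical monthly files with identical columns.
january_csv = "order_id,amount\n1,100\n2,250\n"
february_csv = "order_id,amount\n3,75\n"


def union_files(named_files):
    """Append the rows of every file into one list, tagging each row with its source."""
    master = []
    for name, text in named_files:
        for row in csv.DictReader(io.StringIO(text)):
            row["source_file"] = name
            master.append(row)
    return master


master_rows = union_files([("january.csv", january_csv), ("february.csv", february_csv)])
for row in master_rows:
    print(row)
```

Unlike a join, the result grows vertically: the column set stays the same and only the number of rows increases.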
Data Interpreter
• This function automatically “cleans” your data and preps it for analysis.
• Examples of items that need to be cleaned prior to analysis:
o Merged cells
o Titles
o Footnotes
o Blank rows or columns
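The kind of cleanup involved can be sketched with a few simple rules. This is not the Data Interpreter's algorithm, only an illustration under stated assumptions: the header is taken to be the first fully populated row, and any row that is not fully populated is treated as a title, footnote, or blank row. Merged cells are not handled here.

```python
# Hypothetical raw spreadsheet rows, as lists of cell strings.
raw_rows = [
    ["Quarterly Sales Report", "", ""],     # title
    ["", "", ""],                           # blank row
    ["Region", "Quarter", "Sales"],         # header
    ["East", "Q1", "100"],
    ["", "", ""],                           # blank row
    ["West", "Q1", "80"],
    ["* Figures are preliminary", "", ""],  # footnote
]


def clean_sheet(rows):
    """Return (header, data_rows) with titles, footnotes, and blank rows removed."""
    # Drop rows in which every cell is empty.
    non_blank = [r for r in rows if any(cell.strip() for cell in r)]
    # Assumption: the first fully populated row is the header; rows above it are titles.
    header_index = next(
        i for i, r in enumerate(non_blank) if all(cell.strip() for cell in r)
    )
    header = non_blank[header_index]
    # Keep only fully populated rows below the header, which drops footnotes.
    data = [r for r in non_blank[header_index + 1:] if all(cell.strip() for cell in r)]
    return header, data


header, data = clean_sheet(raw_rows)
print(header)
print(data)
```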
Topic 2: Connecting Data to Tableau
Connecting Data to Tableau
You can create a connection to an SQL database such as MySQL or Microsoft SQL
Server.
• With Tableau, you can connect to applications such as Google Analytics, Marketo, Salesforce, and others.
• For other applications, Tableau includes a wizard for building custom web data connectors.
Tableau can also connect to files. These can be:
• Text files
• Spreadsheets
• Statistical files such as SAS, tab/character delimited
• Tableau extracts/workbooks/data sources
Methods of Performance Optimization
Methods: Running Queries in Parallel, Data Engine Vectorization, External Query Caching, Query Fusion
DATA ENGINE VECTORIZATION
Tableau’s data engine takes advantage of vector instructions on current processors.
The data engine uses SIMD instructions to perform low-level operations such as plus,
minus, divide, min, max, and sum on multiple data values in parallel.
This means that basic computations can be performed more quickly.
EXTERNAL QUERY CACHING
• Tableau saves query results from the previous time the dashboard was opened.
• A single short query is run to fetch the cache data when the workbook is opened.
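The general idea of reusing saved query results can be sketched with a small cache. This is a generic illustration, not Tableau's implementation. The class and function names are hypothetical, and the cache here lives only in memory, whereas the text describes results that persist between sessions.

```python
class QueryCache:
    """Remember the result of each query text so repeated queries skip the database."""

    def __init__(self, run_query):
        self._run_query = run_query  # function that actually hits the database
        self._results = {}
        self.hits = 0

    def fetch(self, query):
        if query in self._results:
            self.hits += 1
            return self._results[query]
        result = self._run_query(query)
        self._results[query] = result
        return result


executed = []


def slow_database(query):
    """Stand-in for an expensive database call; records every query it receives."""
    executed.append(query)
    return [("East", 210), ("West", 310)]


cache = QueryCache(slow_database)
query_text = "SELECT region, SUM(sales) FROM orders GROUP BY region"
first = cache.fetch(query_text)   # runs against the database
second = cache.fetch(query_text)  # served from the cache
print(len(executed), cache.hits)
```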
QUERY FUSION
This is a technology for database connections that looks at all of the queries in the dashboard
and finds ways to consolidate them into fewer queries.
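A toy version of the consolidation idea is sketched below. It assumes each dashboard query can be described as a table, a grouping field, and a list of measures, and it merges queries that share the same table and grouping. That representation and the function name are assumptions made for this sketch. Tableau's real query fusion logic is not described in this lesson.

```python
def fuse_queries(queries):
    """Merge queries with the same table and grouping into one query with all measures."""
    fused = {}
    for table, group_by, measures in queries:
        merged = fused.setdefault((table, group_by), [])
        for measure in measures:
            if measure not in merged:
                merged.append(measure)
    return [
        f"SELECT {group_by}, {', '.join(measures)} FROM {table} GROUP BY {group_by}"
        for (table, group_by), measures in fused.items()
    ]


# Hypothetical queries issued by three sheets on one dashboard.
dashboard_queries = [
    ("orders", "region", ["SUM(sales)"]),
    ("orders", "region", ["SUM(profit)"]),
    ("orders", "category", ["SUM(sales)"]),
]
fused_sql = fuse_queries(dashboard_queries)
for sql in fused_sql:
    print(sql)
```

The first two sheets differ only in the measure they request, so they collapse into one query; the third groups by a different field and stays separate.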
Topic 4: Tableau Data Extract Capabilities
Tableau Data Extracts
Data from your data source can be “extracted” into a file called a Tableau data extract; it
transforms data into a Tableau-friendly format, improving query efficiency.
c. You need to combine data from sources that don’t support cross-table joins
In each of these situations, you should consider using a Blend instead of a Join to combine data from
different databases.
QUIZ 2
The Tableau _______ function separates a string field into multiple string fields.
a. Union
b. Join
c. Split
d. Data Interpreter
Answer: c. The Split function is used to separate a string field into multiple string fields.
QUIZ 3
Technology for database connections that will look at all of the queries in your dashboard and find
ways to consolidate them into fewer queries is called ________.
a. Query fusion
b. Parallel aggregation
d. Query optimization
Answer: a. Query fusion is a technology that examines all queries in your dashboard and consolidates them
to reduce the number of queries hitting your processors.
Genelia needs to analyze Sales data for her company, and the data she needs is
coming from multiple databases. The data sources include Excel spreadsheet files,
an SQL database, and Salesforce.com. Additionally, the data is at different levels of
detail and forms an extremely large data set.
• Which tools should she consider to prepare the data for analysis?
• What steps should she consider to ensure that queries are executed efficiently?
Solution
• Which tools should she consider to prepare the data for analysis?
Genelia should consider the Union functionality to combine the Excel-based data.
She should use the Pivot functionality to align the levels of detail within the different databases.
Given that she is working with a large set of data, and the data sources are at different levels of
detail, she should use a Blend to combine the data.
• What steps should she consider to ensure that queries are executed efficiently?
Genelia should consider creating a Tableau data extract and filter out data that is not necessary
for this analysis. This will improve the performance of her queries.
Key Takeaways