
CBDA Domain-II Source Data v0.1

CBDA prep material
Copyright © All Rights Reserved

Business Data Analytics

Business Analysis Viewpoint

Domain – II – Source Data

Domain II of IIBA®-CBDA certification

Sudhakar Velagada, MBA, IIBA®-CBDA, CBAP®


Agenda

1 IIBA-CBDA – Domain – II – Source Data

2 Case Study

3 Quiz

5 Practice Domains

1. Identify the research questions
Formulate the research question(s) that business analytics will be used to answer.

2. Source Data
Determine what type of data must be used for the formulated research question and assess the quality of the data. This domain typically involves the most effort.

3. Analyze Data
Decide how data analysis will be performed, including which models and mathematical or statistical techniques will be used. This includes performing the data analysis and adjusting the approach when the analysis results do not help answer the research question.

4. Interpret and Report Results
Draw business insights from the analysis results and determine the best possible way to communicate and report the outcome to intended stakeholders. Think like a designer and build the data story with the right visuals to explain the insights in a way that effectively aids decision-making.

5. Use Results to Influence Business Decision-Making
Translate the analytics outcomes into business recommendations that the stakeholders and decision-makers can consume. Any business action that is influenced by the result of an analytics initiative may result in an organizational change that affects people, processes, offerings, or technology.

2. Source Data
The Source Data domain is a top-down exercise to determine the right data needed for a given research question. The tasks in this domain are performed by individuals who possess strong technical skills related to the data architecture of the organization and the skills required to extract or make the relevant data available from different data sources.

This task is usually done by a data analyst, data scientist, or a business analysis professional. While data scientists see datasets as a set of variables, it is the business analysis professional who brings the insight to determine whether a dataset might be useful to explore within a business context.

Meaning of Some Terms

• Data – collections of any number of related observations. Information before it is arranged and analyzed is called raw data; it is raw because it has not been processed by statistical methods.
For example, yards produced yesterday by each of the 10 carpet looms in a carpet production company:

16.2 15.8 16.7 16.2 15.9
16.4 19.9 16.0 15.6 10.1

• Data set – a collection of data is called a data set, and a single observation is a data point.

• Data source – where the data being used to run a report or gain information originates. For a database management system, the source is the database. For computer programs, the data source may be a spreadsheet, XML file, data sheet, or hard-coded data within the program. Data sources differ depending on the computer system or program.
Examples include operations, ERP systems, legacy systems, point of sale, RFID systems, web usage, external sources, and suppliers.

• Data can be structured (e.g., data residing in a database management system (DBMS)) and/or unstructured (e.g., text from word processing documents, emails, social media sites, and image, audio, or video files).
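
As a small illustration of the terms above, the carpet-loom figures can be held as one data set of ten data points. The sketch below is not part of the original material; it assumes Python, and the function name `summarize` is hypothetical. It only shows how raw observations become summarized information.

```python
from statistics import mean, median

# Raw data: yards produced yesterday by each of the 10 carpet looms.
yards_per_loom = [16.2, 15.8, 16.7, 16.2, 15.9, 16.4, 19.9, 16.0, 15.6, 10.1]

def summarize(data_set):
    """Return basic summaries of a data set (a list of data points)."""
    return {
        "count": len(data_set),
        "mean": round(mean(data_set), 2),
        "median": round(median(data_set), 2),
        "min": min(data_set),
        "max": max(data_set),
    }

summary = summarize(yards_per_loom)
print(summary)
```

The minimum (10.1) and maximum (19.9) stand out from the other looms, which is the kind of observation that raw, unsummarized data makes harder to spot.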
Meaning of Some Terms

Tests for data – Before relying on any interpreted data, we need to test the data by asking the following questions –

1) Where did the data come from? Is the source biased – that is, is it likely to have an interest in supplying data points that will lead to one conclusion over the other?

2) Do the data support or contradict other evidence that we have?

3) Is any evidence missing that might cause us to come to a different conclusion?

4) How many observations do we have? Do they represent all groups that we wish to study?

5) Is the conclusion logical? Have we made any conclusions that the data do not support?
Tasks in “Source Data” Domain

Plan Data Collection
What data is needed, the availability of the data, the need for historical data, determining when and how the data will be collected, and how the data will be validated once collected.

Determine the Data Sets
Performing a review of the data expected from the data sources and determining specifics such as data types, data dimensions, sample size, and relationships between different data elements.

Collect Data
Activities performed to support data professionals with data setup, preparation, and collection; includes profiling of data.

Validate Data
Objective is high-level validation – can include business validation and technical validation. Assessing the quality of the data on the basis of Accuracy, Completeness, Consistency, Uniqueness, Timeliness.

Select Techniques for Source Data
• Acceptance and evaluation criteria
• Data Dictionary
• Interface Analysis
• Survey or Questionnaire
• Data Mapping

Task – Plan Data Collection

Areas to focus upon to Plan Data Collection –
• Short-term/long-term effect on business decision-making
• Aim of the initiative
• Non-functional requirements
• Availability of the data
• Need for historical data
• Structured vs unstructured data

Techniques –
1) Elicitation techniques
2) Brainstorming
3) Document Analysis

Skills & Competencies -
Business Knowledge
Organizational Knowledge
Solution Knowledge
Problem Solving & Decision Making

Output
A Data Collection Plan that would
• Specify the frequency of data collection
• Specify which sources to use
• Specify how the data will be validated once collected

Plan Data Collection - Additional notes
• The aim of the initiative helps us understand two very important aspects of our planning activity -

• What type of data is to be focused on - quantitative, qualitative, or both?

• Which data collection methods to choose? The sample table below can be used as a reference –

Method | When to use | How to collect data
Survey | To understand the general characteristics or opinions of a group of people. | Distribute a list of questions to a sample online, in person, or over the phone.
Experiment | To test a causal relationship. | Manipulate variables and measure their effects on others.
Interview/focus group | To gain an in-depth understanding of perceptions or opinions on a topic. | Verbally ask participants open-ended questions in individual interviews or focus group discussions.
Secondary data collection | To analyze data from populations that you can’t access first-hand. | Find existing datasets that have already been collected, from sources such as government agencies or research organizations.
Observation | To understand something in its natural setting. | Measure or survey a sample without trying to affect them.
Ethnography | To study the culture of a community or organization first-hand. | Join and participate in a community and record your observations and reflections.

Task – Determine Data Sets

Areas to focus upon to determine the data sets –
• Review of the expected data from the various sources in terms of which data types, sample size, data dimensions, and relationships between different data elements can be used.

Techniques –
1) Visualization
2) Data Modeling (e.g., ERD diagrams)
3) Concept Modeling
4) Process Modeling
5) Data Profiling & Sampling
6) Data Dictionary

Skills & Competencies -
Creative thinking
Conceptual thinking
Business Acumen

Output
• Decision on which whole/partial data sets are to be collected
• Identifying data gaps

Determine Data Sets - Additional notes
Data Gap Analysis - relates specifically to cases of missing or non-existent data, which may have occurred due to an inefficient data collection process.
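
One simple form of data gap analysis is to compare the fields a research question needs against the fields the available sources can supply. The following is a minimal sketch under stated assumptions: the field names, source names, and the helper `find_data_gaps` are all hypothetical and do not come from the original material.

```python
# Hypothetical fields the research question needs (names are illustrative).
required_fields = {"employee_id", "department", "location", "rating"}

# Hypothetical fields each data source can supply.
available_by_source = {
    "hr_system": {"employee_id", "department"},
    "survey_tool": {"employee_id", "rating"},
}

def find_data_gaps(required, available):
    """Return required fields that no source provides, sorted for stable output."""
    covered = set().union(*available.values())
    return sorted(required - covered)

gaps = find_data_gaps(required_fields, available_by_source)
print(gaps)  # ['location']
```

A non-empty result points to data that must be sourced elsewhere or collected actively before the analysis can proceed.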

Use a Five Vs assessment to determine which data sets to consider -

• Volume: determined by the amount of data being produced and the size of the data sets that we need to process.
• Velocity: determined by the speed at which data is generated and the frequency with which the data needs to be collected and processed.
• Variety: determined by the variety of data sources, formats, and types that need to be processed.
• Veracity: implies the trustworthiness of the data and also represents the uncertainties and inconsistencies existing in the data. It is the ability of “managing the reliability and predictability of inherently imprecise data types”.
• Value: indicates the need to ground the time and effort put into any analytics initiative in real, valuable business goals.

• Data discovery is a term used to describe the process of collecting data from various sources by detecting patterns and outliers with the help of guided advanced analytics and visual navigation of data, thus enabling consolidation of all business information. Some commonly used tools include Looker, Qlik Sense, and Tableau.
An Example of Incomplete or Biased Data
An advertisement claim by a Truck Association –
“75% of everything you use travels by truck”

What is your interpretation?

An easy interpretation could be –
“Cars, railways, airplanes, ships and other forms of transportation carry only 25% of what we use”

Is this interpretation correct?

Missing part – the question of double counting. What did they do when something was carried to your city by rail and then delivered by truck? How were packages treated when they went by airmail and then by truck?

When the issue of double counting is resolved, it turns out that, although trucks are involved in delivering a relatively high proportion of what you use, railways and ships still carry more goods for more total kilometers.
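
The effect of double counting can be made concrete with a small sketch. The shipment figures below are invented purely for illustration (they are not real transport statistics), and the function names are hypothetical. The sketch contrasts "share of shipments that touch a truck" with "share of total kilometers carried by truck".

```python
# Hypothetical shipments: each is a list of (mode, kilometers) legs.
shipments = [
    [("rail", 900), ("truck", 50)],
    [("ship", 5000), ("truck", 100)],
    [("truck", 300)],
    [("rail", 1200)],
]

def share_touched_by(mode, all_shipments):
    """Fraction of shipments with at least one leg on the given mode."""
    touched = sum(1 for legs in all_shipments if any(m == mode for m, _ in legs))
    return touched / len(all_shipments)

def share_of_distance(mode, all_shipments):
    """Fraction of total kilometers travelled on the given mode."""
    total = sum(km for legs in all_shipments for _, km in legs)
    on_mode = sum(km for legs in all_shipments for m, km in legs if m == mode)
    return on_mode / total

truck_touch = share_touched_by("truck", shipments)
truck_distance = share_of_distance("truck", shipments)
touch_total = sum(share_touched_by(m, shipments) for m in ("truck", "rail", "ship"))
print(truck_touch, round(truck_distance, 3), touch_total)
```

With these made-up numbers, 75% of shipments involve a truck, yet trucks account for only about 6% of the distance, and the "touched by" shares across modes add up to more than 100% because multi-leg shipments are counted once per mode.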
Visualization Technique Used in Determining Data Sets
Visualization provides a unique perspective on the dataset. You can visualize data in lots of different ways.
• Tables are very powerful when you are dealing with a relatively small number of data points. They are good at showing one-dimensional outliers but poor at comparing multiple dimensions at the same time (e.g., population per country over time).
• Charts are helpful in displaying data across multiple dimensions.
• State/region/country maps - the power of maps is to re-connect the data to our very physical world.

(Source: Data insights: a visualization, Gregor Aisch)


Task – Collect Data

Areas to focus upon while Collecting Data –
• Inputs from the data collection plan (task)
  • Determine if data is originating from different sources
  • What are those sources (DB; Excel…)
  • Transformations needed, if any
• Data management plan

Techniques –
1) Interface Analysis
2) Surveys
3) Acceptance and evaluation criteria
4) Data Dictionary
5) Data flow diagrams

Skills & Competencies -
Trustworthiness
Ethics
Business Acumen

Output
• Ensure disparate sources represent the same data in the same way
• Data formatting
• Identify problems with the data collection approach

Collect Data - Additional notes
• Collecting data covers all activities required to support -
  • Data setup
  • Data preparation
  • Data collection

• Two main approaches to data collection -
  • Passive data collection – e.g., use of automated systems to record daily transaction data.
  • Active data collection – used when data is not readily available; special efforts such as surveys are used to actively collect the data, and analysts can contribute to designing the survey.
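
Collecting from several sources usually requires transformations so that disparate sources represent the same data in the same way. The sketch below assumes two hypothetical sources (a point-of-sale export and an ERP extract) with different field names and date formats; every field name and value is invented for illustration.

```python
from datetime import datetime

# Hypothetical records describing the same sale in two source formats.
pos_record = {"sale_date": "03/15/2024", "amount": "19.90", "store": "NYC-01"}
erp_record = {"date": "2024-03-15", "total": 19.9, "store_code": "nyc-01"}

def from_pos(rec):
    """Map a point-of-sale record to the common format (ISO date, float, upper-case store)."""
    return {
        "date": datetime.strptime(rec["sale_date"], "%m/%d/%Y").date().isoformat(),
        "amount": float(rec["amount"]),
        "store": rec["store"].upper(),
    }

def from_erp(rec):
    """Map an ERP record to the same common format."""
    return {
        "date": datetime.strptime(rec["date"], "%Y-%m-%d").date().isoformat(),
        "amount": float(rec["total"]),
        "store": rec["store_code"].upper(),
    }

pos_common = from_pos(pos_record)
erp_common = from_erp(erp_record)
print(pos_common == erp_common)
```

Keeping one small mapping function per source makes the agreed target format explicit, which is also what a data dictionary or data mapping document would record.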
Data Dictionary – (technique) - Sample
Task – Validate Data

Areas to focus upon to Validate Data –
• Understand business context
• Quality aspects of the data

Techniques –
1) Data Mapping
2) Business Rules Analysis
3) Acceptance and evaluation criteria

Skills & Competencies -
Conceptual thinking
Systems thinking
Business Knowledge

Output
• Business Validation of the data
• Technical Validation of the data

Validate Data - Additional notes
• Data validation ensures the fitness and consistency of data in an application or automated system and is typically performed prior to importing and processing.

• Data validation may be performed by a data analyst, data scientist, or business analysis practitioner with sufficient skills to use the necessary tools to access data and the underlying competencies to analyze the results.

Business Validation – e.g., if the outcome of data analysis is expected to be a report, business validation involves validating the format and data elements to be included in the report.

Technical Validation – involves technical testing and validation to assess data quality.

Characteristics of High-Quality Data

Accuracy – data is not misleading; it is correct and represents what was intended by the source.

Completeness – data is comprehensive, includes what is expected, and nothing is missing.

Consistency – how reliable the data is; implies that the value of a data element is the same across sources.

Uniqueness – no duplicates exist.

Timeliness – data is current and not out of date.
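
Some of these characteristics lend themselves to simple automated checks during technical validation. The sketch below covers completeness, uniqueness, and timeliness only; accuracy and cross-source consistency generally need a trusted reference to compare against. The records, field names, thresholds, and function names are all hypothetical.

```python
from datetime import date

# Hypothetical customer records; fields and values are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 3, 1)},
    {"id": 2, "email": None, "updated": date(2024, 3, 2)},
    {"id": 2, "email": "b@example.com", "updated": date(2021, 1, 5)},
]

def completeness(rows, field):
    """Share of rows where the field is present and not empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def duplicate_keys(rows, key):
    """Key values that appear more than once (a uniqueness violation)."""
    seen, dupes = set(), set()
    for r in rows:
        if r[key] in seen:
            dupes.add(r[key])
        else:
            seen.add(r[key])
    return sorted(dupes)

def stale_rows(rows, field, as_of, max_age_days):
    """Rows whose date field is older than max_age_days relative to as_of."""
    return [r for r in rows if (as_of - r[field]).days > max_age_days]

email_completeness = completeness(records, "email")
dupes = duplicate_keys(records, "id")
stale = stale_rows(records, "updated", date(2024, 3, 31), 365)
print(round(email_completeness, 2), dupes, len(stale))
```

What counts as "complete enough" or "too old" is a business decision, so thresholds such as the 365-day cutoff used here would normally come from the acceptance and evaluation criteria.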
Case Study – Data Collection
Let’s say we are researching employee perceptions of their direct managers in a large organization. How do we go about collecting data?
• Our first aim is to assess whether there are significant differences in perceptions of managers across different departments and office locations.
• Our second aim is to gather meaningful feedback from employees to explore new ideas for how managers can improve.

Using our Plan Data Collection task, we determine which type of data to focus on: quantitative, qualitative, or both. We decide to use a mixed-methods approach to collect both quantitative and qualitative data. We also decide to use the survey method.

We operationalize the plan by transforming the conceptual understanding of what we want to study into an operational definition of what we will actually measure.

Operationalizing the conceptual definition results in the following -

• We ask managers to rate their own leadership skills on 5-point scales assessing the ability to delegate, decisiveness, and dependability.
• We ask their direct employees to provide anonymous feedback on the managers regarding the same topics.
Case Study – Data Collection (continued)
Before collecting data in full, we decide to test the data collection approach using a small number of observations. We run the test survey on a small population and study the results.

Using multiple ratings of a single concept can help us cross-check our data and assess the test validity of our measures.

We make use of our sampling plan, agreed standardization procedures, and data management plan (all part of our data collection plan) to actually collect and store the data.

Implementing the data collection plan –

1) We administer a survey with closed- and open-ended questions to a sample of 300 company employees across
different departments and locations.

2) The closed-ended questions ask participants to rate their manager’s leadership skills on scales from 1–5. The data
produced is numerical and can be statistically analyzed for averages and patterns.

3) The open-ended questions ask participants for examples of what the manager is doing well now and what they
can do better in the future. The data produced is qualitative and can be categorized through content analysis
[categorize or “code” words, themes, and concepts within the texts and then analyze the results] for further insights.
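
The two kinds of survey data can be handled along the following lines. This is only a sketch: the responses, department names, and codebook are invented, and the keyword matching is a crude stand-in for content analysis, which in practice relies on human judgment when coding text.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical survey responses; all values are invented for illustration.
responses = [
    {"department": "Sales", "rating": 4, "comment": "Delegates well but slow to decide"},
    {"department": "Sales", "rating": 2, "comment": "Should delegate more tasks"},
    {"department": "IT", "rating": 5, "comment": "Very dependable and decisive"},
]

# Hypothetical coding scheme: theme -> keywords that signal it.
codebook = {
    "delegation": ["delegate"],
    "decisiveness": ["decide", "decisive"],
    "dependability": ["dependable"],
}

def average_rating_by_department(rows):
    """Quantitative side: mean 1-5 rating per department."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["department"]].append(r["rating"])
    return {dept: mean(vals) for dept, vals in grouped.items()}

def code_comments(rows, codes):
    """Qualitative side: count comments that mention each theme."""
    counts = Counter()
    for r in rows:
        text = r["comment"].lower()
        for theme, keywords in codes.items():
            if any(k in text for k in keywords):
                counts[theme] += 1
    return dict(counts)

averages = average_rating_by_department(responses)
themes = code_comments(responses, codebook)
print(averages)
print(themes)
```

Grouping by department addresses the first aim of the case study (differences across departments), while the theme counts give a starting point for the second aim (ideas for how managers can improve).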
Quiz

Q.1 The degree of potential harm caused by faulty data collection depends on the nature of the investigation and whether it is used to support public policy.

(a) False
(b) True

Q.1 Answer – The correct answer is True. Faulty data collection compromises the validity of the results, regardless of the analytic procedure used. You also risk potentially serious implications (economic and public safety) when such an initiative is for public policy.

Q.2 Quality control identifies the actions necessary to correct faulty data collection practices

(a) but fails to address future occurrences
(b) and also minimizes future occurrences
(c) and also maximizes future occurrences
(d) but is unrelated to future occurrences

Q.2 Answer – (b) The correct answer is that it also minimizes future occurrences.

Q.3 __________ refers to activities that take place before data collection begins.

(a) Quality preparation
(b) Quality control
(c) Quality assurance
(d) Quality indexing

Q.3 Answer – (c) Quality assurance refers to activities that take place before data collection begins.

Q.4 The selection of appropriate data collection instruments and provision of clearly delineated instructions for their correct use ___________ the likelihood of errors occurring.

(a) compromises
(b) has no impact on
(c) decreases
(d) increases

Q.4 Answer – (c) It decreases the likelihood of errors occurring.

Q.5 The primary rationale for preserving data integrity is to:

(a) determine statistical significance
(b) compare data collection instruments
(c) encourage higher rates of participant recruitment
(d) support the detection of errors

Q.5 Answer – (d) The correct answer is to support the detection of errors.
Business Data Analytics Domains - Relationships
