CompTIA Data+ Practice Test
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form, electronic, mechanical,
photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States
Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be
addressed to the Permissions Department, ExamsDigest, LLC., Holzmarktstraße 73, Berlin, Germany or online at
https://www.examsdigest.com/contact.
Trademarks: ExamsDigest, examsdigest.com and related trade dress are trademarks or registered trademarks of ExamsDigest
LLC. and may not be used without written permission. Amazon is a registered trademark of Amazon, Inc. All other trademarks
are the property of their respective owners. ExamsDigest, LLC. is not associated with any product or vendor mentioned in this
book.
Examsdigest publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard
print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or
DVD that is not included in the version you purchased, you may find this material at https://examsdigest.com
CONTENTS AT A GLANCE
Introduction
Chapter 1 Data Concepts And Environments
Questions 1-10
Answers 1-10
Chapter 2 Data Mining
Questions 11-30
Answers 11-30
Questions 31-50
Answers 31-50
Chapter 4 Visualizations
Questions 51-65
Answers 51-65
Questions 66-80
Answers 66-80
BONUSES & DISCOUNTS
INTRODUCTION
CompTIA Data+ is an early-career data analytics certification for professionals tasked with
developing and promoting data-driven business decision-making.
This book has been designed to help you prepare for the style of questions you will receive on
the CompTIA Data+ DA0-001 exam. It also helps you understand the topics you can expect to
be tested on.
In order to properly prepare for the CompTIA Data+ DA0-001, I recommend that you:
✓ Do practice test questions: After you review a reference book and perform some hands-on
work, attack the questions in this book to get “exam ready”! Also claim your free one-month
access to our platform to dive into more questions, flashcards, and much more.
The online practice that comes free with this book offers you the same questions and answers
that are available here and more.
The beauty of the online questions is that you can customize your online practice to focus on the
topic areas that give you the most trouble.
So if you need help with the domain Data Mining, then select questions related to this topic
online and start practicing.
Whether you practice a few hundred problems in one sitting or a couple dozen, and whether you
focus on a few types of problems or practice every type, the online program keeps track of the
questions you get right and wrong so that you can monitor your progress and spend time
studying exactly what you need.
You can access these online tools by sending an email to info@examsdigest.com to claim
access to our platform. Once we confirm your purchase, you can enjoy your free access.
Exam Content
Content Outline
Candidates are encouraged to use this document to help prepare for the CompTIA Data+
(DA0-001) certification exam. This exam certifies that the successful candidate has the
knowledge and skills required to transform business requirements in support of data-driven
decisions by analyzing complex datasets while adhering to governance and quality standards
throughout the entire data life cycle. This is equivalent to 18–24 months of hands-on experience
working in a business intelligence report/data analyst job role. These content examples are
meant to clarify the test objectives and should not be construed as a comprehensive listing of
all the content of this examination.
The table below lists the domains measured by this examination and the extent to which they are
represented:
CHAPTER 1
DATA CONCEPTS AND ENVIRONMENTS
Questions 1-10
Question 2. A web developer is developing a new application using React.js for the front-end
and Python for the backend. He wants to store data in JavaScript Object Notation (JSON) format
in order to transmit the data between the web application and the server.
Question 3. You have been tasked to create the login form for your client’s website. You
design the database for the login form, but you forgot to add the column that stores a boolean
value (0/1) indicating whether a user has an account. If a user has an account, the database
should store the value 1; otherwise, the value 0.
Which of the following data types should you use to complete the database?
(A) Date
(B) Numeric
(C) Alphanumeric
(D) Currency
Question 4. A web developer who is building an e-commerce marketplace designs its database
to capture, store, and process transaction data in real time. The e-commerce platform will
deal with many standard and straightforward queries, such as insert, delete, and update, and the
data will be stored in 3NF (third normal form).
In which of the following databases would the developer store the data?
Question 5. Maria, a senior database manager, wants to create a new DB for the sales
department. She needs to create 2 tables, Employees and Sales respectively. The Employees table
contains a single row representing an employee with each employee assigned a unique id
(primary key). The second table, Sales, contains individual sales records associated with the
employee who made the sale.
Question 6. Which of the following are common examples of structured data? (Choose TWO)
Question 7. Which of the following are common examples of unstructured data? (Choose TWO)
Question 8. A junior web developer is developing a new application where users can upload
short videos. The first task is to create a homepage that shows the headline “Upload Your Short
Videos” and a clickable button that says “upload now”.
Which of the following HTML commands would help the developer to complete the task
successfully?
Question 9. Which of the following statements BEST describes the difference between discrete
and continuous data types?
(A) Continuous data is a numerical type of data that includes numbers with fixed data values
determined by counting. Discrete data includes complex numbers and varying data values that
are measured over a specific time interval
(B) Discrete data is a numerical type of data that includes numbers with fixed data values
determined by counting. Continuous data includes complex numbers and varying data values that
are measured over a specific time interval
(C) Continuous data is an alphanumeric type of data that includes characters with fixed data
values determined by counting. Discrete data includes complex characters and varying data
values that are measured over a specific time interval
(D) Discrete data is an alphanumeric type of data that includes characters with fixed data values
determined by counting. Continuous data includes complex characters and varying data values
that are measured over a specific time interval
Question 10. A business analyst requests an analysis of data to display a table of all of a
company’s cloth products that were sold in the UK in June, compare the sales figures with those
in September, and then compare them with other product sales in the UK over the same period.
Which of the following methods enables the data scientist to extract and query data in order to
analyze it from different angles?
Answers 1-10
Question 1. Drag and drop the data file formats into their respective use cases.
(A) XML
(B) Flat
(C) JSON
(D) HTML
Explanation 1.
JSON transmits data in web applications.
XML provides a standard method to access information.
HTML creates web pages and web applications.
Flat imports data in data warehousing projects.
Question 2. A web developer is developing a new application using React.js for the front-end
and Python for the backend. He wants to store data in JavaScript Object Notation (JSON) format
in order to transmit the data between the web application and the server.
A JSON file is a file that stores simple data structures and objects in JavaScript Object Notation
(JSON) format, which is a standard data interchange format. It is primarily used for transmitting
data between a web application and a server.
This JSON is an array of objects; each object contains some information about a person.
[
  {"name": "Nick", "age": 29, "email": "info@examsdigest.com"},
  {"name": "Milanie", "age": 25, "email": "info@examsdigest.com"}
]
Question 3. You have been tasked to create the login form for your client’s website. You
design the database for the login form, but you forgot to add the column that stores a boolean
value (0/1) indicating whether a user has an account. If a user has an account, the database
should store the value 1; otherwise, the value 0.
Which of the following data types should you use to complete the database?
(A) Date
(B) Numeric
(C) Alphanumeric
(D) Currency
Explanation 3. The correct answer is: Numeric. Numeric covers any intrinsic numeric data type
(Byte, Boolean, Integer, Long, Currency, Single, Double). Numeric data types are numbers
stored in database columns. These data types are typically grouped as exact numeric types,
where the precision and scale of the values need to be preserved.
Date is incorrect. The Date type is used for storing a date or a date/time value in the database.
Question 4. A web developer who is building an e-commerce marketplace designs its database
to capture, store, and process transaction data in real time. The e-commerce platform will
deal with many standard and straightforward queries, such as insert, delete, and update, and the
data will be stored in 3NF (third normal form).
In which of the following databases would the developer store the data?
(A) Online analytical processing
(B) Online query processing
(C) Online standard processing
(D) Online transactional processing
Explanation 4. The correct answer is: Online transactional processing. Online transaction
processing (OLTP) captures, stores, and processes data from transactions in real time. An OLTP
database stores and manages data related to everyday operations within a system or a company;
OLTP is focused on transaction-oriented tasks.
OLTP typically deals with query processing (inserting, updating, deleting data in a database),
and maintaining data integrity and effectiveness when dealing with numerous transactions
simultaneously. Each transaction involves individual database records made up of multiple fields
or columns. Examples include banking and credit card activity or retail checkout scanning.
Online analytical processing is incorrect. Online analytical processing (OLAP) uses complex
queries to analyze aggregated historical data from OLTP systems. OLTP and OLAP are two
systems that complement each other. While OLTP deals with processing day-to-day transactions,
OLAP helps analyze the processed data.
Online query processing and Online standard processing are incorrect as these are
imaginary terms.
Question 5. Maria, a senior database manager, wants to create a new DB for the sales
department. She needs to create 2 tables, Employees and Sales respectively. The Employees table
contains a single row representing an employee with each employee assigned a unique id
(primary key). The second table, Sales, contains individual sales records associated with the
employee who made the sale.
Explanation 5. The correct answer is: Relational. A relational database, also called a Relational
Database Management System (RDBMS) or SQL database, stores data in tables and rows, also
referred to as records. A relational database works by linking information from multiple tables
through the use of “keys.”
A key is a unique identifier that can be assigned to a row of data contained within a table. This
unique identifier called a “primary key,” can then be included in a record located in another table
when that record has a relationship to the primary record in the main table. When this unique
primary key is added to a record in another table, it is called a “foreign key” in the associated
table. The connection between the primary and foreign keys then creates the “relationship”
between records contained across multiple tables.
NoSQL is incorrect. Some of the more popular NoSQL databases are MongoDB, Apache
Cassandra, Redis, Couchbase, and Apache HBase. There are four popular non-relational types:
document data store, column-oriented database, key-value store, and graph database.
Combinations of these types are often used in a single application.
Star is incorrect. A star schema is a data warehouse design in which the center of the star holds
one fact table connected to a number of associated dimension tables; it is known as a star
schema because its structure resembles a star. The star schema is the simplest type of data
warehouse schema.
Question 6. Which of the following are common examples of structured data? (Choose TWO)
Structured data is data that adheres to a pre-defined data model and is therefore straightforward
to analyze. Structured data conforms to a tabular format with relationships between the different
rows and columns. Common examples of structured data are Excel files or SQL databases. Each
of these has structured rows and columns that can be sorted.
Structured data depends on the existence of a data model – a model of how data can be stored,
processed, and accessed. Because of a data model, each field is discrete and can be accessed
separately or jointly along with data from other fields. This makes structured data extremely
powerful: it is possible to quickly aggregate data from various locations in the database.
Structured data is considered the most traditional form of data storage since the earliest versions
of database management systems (DBMS) were able to store, process and access structured data.
Question 7. Which of the following are common examples of unstructured data? (Choose TWO)
Unstructured data is more or less all the data that is not structured. Even though unstructured
data may have a native, internal structure, it’s not structured in a predefined way. There is no
data model; the data is stored in its native format. Common examples of unstructured data
include audio files, video files, and NoSQL databases.
Question 8. A junior web developer is developing a new application where users can upload
short videos. The first task is to create a homepage that shows the headline “Upload Your Short
Videos” and a clickable button that says “upload now”.
Which of the following HTML commands would help the developer to complete the task
successfully?
The <h1> to <h6> tags are used to define HTML headings. <h1> defines the most important
heading. <h6> defines the least important heading.
Note: Only use one <h1> per page – this should represent the main heading/subject for the whole
page.
Question 9. Which of the following statements BEST describes the difference between discrete
and continuous data types?
(A) Continuous data is a numerical type of data that includes numbers with fixed data values
determined by counting. Discrete data includes complex numbers and varying data values that
are measured over a specific time interval
(B) Discrete data is a numerical type of data that includes numbers with fixed data values
determined by counting. Continuous data includes complex numbers and varying data values that
are measured over a specific time interval
(C) Continuous data is an alphanumeric type of data that includes characters with fixed data
values determined by counting. Discrete data includes complex characters and varying data
values that are measured over a specific time interval
(D) Discrete data is an alphanumeric type of data that includes characters with fixed data values
determined by counting. Continuous data includes complex characters and varying data values
that are measured over a specific time interval
Explanation 9. The correct answer is: Discrete data is a numerical type of data that
includes numbers with fixed data values determined by counting. Continuous data includes
complex numbers and varying data values that are measured over a specific time interval.
Question 10. A business analyst requests an analysis of data to display a table of all of a
company’s cloth products that were sold in the UK in June, compare the sales figures with those
in September, and then compare them with other product sales in the UK over the same period.
Which of the following methods enables the data scientist to extract and query data in order to
analyze it from different angles?
Online Analytical Processing (OLAP) is a method that enables users to easily and selectively
extract and query data in order to analyze it from different angles. OLAP queries help with trend
analysis, financial reporting, sales forecasting, budgeting, and other planning purposes, among
other things.
OLAP is used for business analyses, including planning, budgeting, forecasting, data mining, and
deals with few queries, but they are complex and involve a lot of data (for example, aggregate
queries). Mainly uses the select statement.
OLTP typically deals with query processing (inserting, updating, deleting data in a database),
and maintaining data integrity and effectiveness when dealing with numerous transactions
simultaneously. Each transaction involves individual database records made up of multiple fields
or columns. Examples include banking and credit card activity or retail checkout scanning.
Online query processing and Online standard processing are incorrect as these are
imaginary terms.
CHAPTER 2
DATA MINING
Questions 11-30
Question 11. A database administrator designs a new database with two tables.
1. Employee
2. Department
The Employee table has the following columns:
1. Employee_birthdate
2. Employee_Year_Of_Registration_In_Company
3. Employee_Total_Years_Of_Registration
Which of the following refers to the situation in which there is data in the database that can be
removed without losing information?
(A) Non-parametric data
(B) Redundant data
(C) Duplicate data
(D) Invalid data
Question 12. An online shop ships orders based on the zip code customers type during the
checkout process. Many orders never reached the customers because customers mistyped the
zip code in the input field.
Which of the following solutions SHOULD the online shop implement to solve this issue?
(A) Non-parametric data
(B) Data outliers
(C) Data type validation
(D) Specification mismatch
Question 13. A web analyst wants to use automated bots to crawl through the internet and
extract data from targeted sites. He wants the collected data to be delivered in a readable form
such as CSV for further analysis.
Which of the following data collection methods is the MOST appropriate method to employ?
Question 14. A web developer wants to ensure that malicious users can’t type SQL statements
when they are asked for input, such as their username/userid. Which of the following query
optimization techniques would effectively prevent SQL injection attacks?
(A) Indexing
(B) Temporary table in the query set
(C) Parametrization
(D) Subset of records
Question 15. Which of the following SQL commands will be used to prevent SQL injection
attacks?
Question 16. Consider the following dataset which contains information about houses that are
for sale.
Which of the following string manipulation commands will combine the address and region
name columns to create a full address?
full_address
-----------------------------------------
85 Turner St, Northern Metropolitan
25 Bloomburg St, Northern Metropolitan
5 Charles St, Northern Metropolitan
40 Federation La, Northern Metropolitan
55a Park St, Northern Metropolitan
Question 17. An online shop wants to expand its customer base using SMS marketing
campaigns. The online shop already has a customer database with the following columns.
1. First_Name
2. Last_Name
3. Email
The head of marketing wants to add one more column titled “Phone_Numbers”. Also, he wants
to match the customer’s names with the respective phone numbers.
Which of the following data manipulation techniques would the head of marketing use to add the
phone numbers into the database without affecting the existing data?
(A) Imputation
(B) Data append
(C) Transpose
(D) Normalize data
Question 18. Consider the following data sets.
Question 19. Consider the following dataset which contains information about houses that are
for sale.
Which of the following string manipulation commands will replace the “St” characters in the
address column with the word “Street”?
Question 20. Which of the following data integration processes combines data from multiple
data sources into a single, consistent data store that is loaded into a data warehouse?
Question 21. A data analyst wants to create “Income Categories” that would be calculated
based on the existing variable “Income”. Which of the following data manipulation techniques
should the data analyst use to create “Income Categories”?
Question 22. Which of the following Date function commands would effectively calculate the
number of days, months, or years between two dates?
(A) MONTH()
(B) DATEDIF()
(C) DATEVALUE()
(D) DAYS()
(A) CURDATE()
(B) Filtering
(C) Sorting
(D) IsEmpty()
Question 24. A data analyst wants to show and hide information on his sheet based on selected
criteria. Which of the following data manipulation techniques is most appropriate in this case?
(A) Sorting
(B) Parametrization
(C) Filtering
(D) Indexing
Question 25. A web analyst wants to display a chart of daily hits on the keyword “it
certifications” for four different states with the help of Google Trends and Python.
Which of the following data collection methods will effectively request data from Google
Trends and display it in a chart?
Question 27. A SQL database administrator wants to display the total number of employees.
Which of the following SQL commands should the administrator use to display the total
number?
Question 28. A data scientist wants to see which products make the most money and which
products attract the most customer purchasing interest in their company.
Which of the following data manipulation techniques would he use to obtain this information?
Question 30. A database administrator is responsible for optimizing data queries from 800 ms
to 300 ms by using indexing.
Which of the following techniques does indexing use to make columns faster to query?
(A) Duplicate the table where data is stored with less data
(B) Sort the data alphabetically
(C) Create pointers where data is stored within a database
(D) Filter the data using logical functions
Answers 11-30
Question 11. A database administrator designs a new database with two tables.
1. Employee
2. Department
The Employee table has the following columns:
1. Employee_birthdate
2. Employee_Year_Of_Registration_In_Company
3. Employee_Total_Years_Of_Registration
Which of the following refers to the situation in which there is data in the database that can be
removed without losing information?
(A) Non-parametric data
(B) Redundant data
(C) Duplicate data
(D) Invalid data
In relational database design, the term data redundancy refers to the situation in which there is
data in the database that can be removed without losing information.
Consider our case where for each employee there is a birthdate, a year of registration in the
company, and the total number of years in the company. The last piece of information can be
derived from the other two, and so is redundant.
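A small Python sketch makes the redundancy concrete (the column values and the reference year are invented for illustration): the total number of years in the company can always be recomputed from the year of registration, so storing it adds no information.

```python
from datetime import date

# Illustrative employee record; values are invented for the example.
CURRENT_YEAR = 2023  # assumed reference year

employee = {"birthdate": date(1990, 5, 1), "year_of_registration": 2015}

# Employee_Total_Years_Of_Registration is derivable from the other
# columns, so keeping it as a stored column is redundant.
total_years = CURRENT_YEAR - employee["year_of_registration"]
```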
Question 12. An online shop ships orders based on the zip code customers type during the
checkout process. Many orders never reached the customers because customers mistyped the
zip code in the input field.
Which of the following solutions SHOULD the online shop implement to solve this issue?
(A) Non-parametric data
(B) Data outliers
(C) Data type validation
(D) Specification mismatch
Data validation refers to the process of ensuring the accuracy and quality of data. It is
implemented by building several checks into a system or report to ensure the logical consistency
of input and stored data. In automated systems, data is entered with minimal or no human
supervision. Therefore, it is necessary to ensure that the data that enters the system is correct and
meets the desired quality standards. The data will be of little use if it is not entered properly and
can create bigger downstream reporting issues.
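As a minimal sketch of such a check (assuming US-style five-digit zip codes; the function name is illustrative), the shop could validate the input before accepting the order:

```python
import re

# Data type validation sketch: accept only a five-digit ZIP code.
ZIP_PATTERN = re.compile(r"^\d{5}$")

def is_valid_zip(value: str) -> bool:
    """Return True only when the trimmed input is exactly five digits."""
    return bool(ZIP_PATTERN.match(value.strip()))

# A misspelled code (letter 'o' instead of zero) is rejected at entry time.
checks = [is_valid_zip("10001"), is_valid_zip("1o001"), is_valid_zip("100011")]
```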
Question 13. A web analyst wants to use automated bots to crawl through the internet and
extract data from targeted sites. He wants the collected data to be delivered in a readable form
such as CSV for further analysis.
Which of the following data collection methods is the MOST appropriate method to employ?
Web scraping is a process of using automated bots to crawl through the internet and extract data.
The bots collect information by first breaking down the targeted site to its most basic form,
HTML text, then scan through to gather data according to some preset parameters. After that, the
collected data is delivered in CSV or Excel format, so it is readable for whoever wants to use it.
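The pipeline described above can be sketched with nothing but Python’s standard library (the sample HTML and the `product` class name are invented; a real bot would first download pages, e.g. with urllib): parse the HTML down to the data of interest, then emit CSV.

```python
import csv
import io
from html.parser import HTMLParser

# Already-downloaded HTML standing in for a fetched page (illustrative).
SAMPLE_HTML = '<ul><li class="product">Chair</li><li class="product">Desk</li></ul>'

class ProductParser(HTMLParser):
    """Collect the text of every <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "product") in attrs:
            self.in_product = True

    def handle_data(self, data):
        if self.in_product:
            self.products.append(data)
            self.in_product = False

parser = ProductParser()
parser.feed(SAMPLE_HTML)

# Deliver the scraped values in readable CSV form.
buffer = io.StringIO()
csv.writer(buffer).writerows([[p] for p in parser.products])
csv_output = buffer.getvalue()
```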
Question 14. A web developer wants to ensure that malicious users can’t type SQL statements
when they are asked for input, such as their username/userid. Which of the following query
optimization techniques would effectively prevent SQL injection attacks?
(A) Indexing
(B) Temporary table in the query set
(C) Parametrization
(D) Subset of records
Explanation 14. The correct answer is: Parametrization. Parameterized SQL queries allow you
to place parameters in an SQL query instead of a constant value. A parameter takes a value only
when the query is executed, allowing the query to be reused with different values and purposes.
Explanation 15. The correct answer is: SELECT * FROM ExamsDigest_Courses WHERE
certName= ? ORDER BY date
Parameterized SQL queries allow you to place parameters in an SQL query instead of a constant
value. A parameter takes a value only when the query is executed, which allows the query to be
reused with different values and for different purposes. Parameterized SQL statements are
available in some analysis clients, and are also available through the Historian SDK.
The benefit of parameterized SQL queries is that you can prepare them ahead of time and reuse
them for similar applications without having to create distinct SQL queries for each case.
SQL Injection is best prevented through the use of parameterized queries as well.
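A minimal sketch of a parameterized query, using Python’s built-in sqlite3 module (the table, column, and input values are invented for the example): the `?` placeholder keeps user input out of the SQL text, so an injection string is treated as a plain literal.

```python
import sqlite3

# In-memory database with an illustrative users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

# A classic injection attempt, passed as a parameter value.
malicious_input = "alice' OR '1'='1"

# The whole string is compared as a literal username, not parsed as SQL,
# so the injection finds nothing.
rows = conn.execute(
    "SELECT role FROM users WHERE username = ?", (malicious_input,)
).fetchall()

# The same prepared statement is reusable with a legitimate value.
safe_rows = conn.execute(
    "SELECT role FROM users WHERE username = ?", ("alice",)
).fetchall()
```

The same statement text serves both calls, which is also why parameterized queries can be prepared once and reused.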
Question 16. Consider the following dataset which contains information about houses that are
for sale.
Which of the following string manipulation commands will combine the address and region
name columns to create a full address?
full_address
-----------------------------------------
85 Turner St, Northern Metropolitan
25 Bloomburg St, Northern Metropolitan
5 Charles St, Northern Metropolitan
40 Federation La, Northern Metropolitan
55a Park St, Northern Metropolitan
Explanation 16. The correct answer uses the CONCAT() function, which adds two or more
strings together. String manipulation (or string handling) is the process of changing, parsing,
splicing, pasting, or analyzing strings. SQL is used for managing data in a relational database.
Syntax
CONCAT(string1, string2, ..., string_n)
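A runnable sketch of the concatenation, executed against SQLite through Python’s sqlite3 module (the table and column names are assumed from the sample output; note that SQLite spells concatenation with the `||` operator, where MySQL would write `CONCAT(address, ', ', regionname)`):

```python
import sqlite3

# In-memory table mirroring the housing example (names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE melb (address TEXT, regionname TEXT)")
conn.execute("INSERT INTO melb VALUES ('85 Turner St', 'Northern Metropolitan')")

# SQLite's || operator joins the two columns with a separator.
rows = conn.execute(
    "SELECT address || ', ' || regionname AS full_address FROM melb"
).fetchall()
```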
Question 17. An online shop wants to expand its customer base using SMS marketing
campaigns. The online shop has already a database with customers with the following columns.
1. First_Name
2. Last_Name
3. Email
The head of marketing wants to add one more column titled “Phone_Numbers”. Also, he wants
to match the customer’s names with the respective phone numbers.
Which of the following data manipulation techniques would the head of marketing use to add the
phone numbers into the database without affecting the existing data?
(A) Imputation
(B) Data append
(C) Transpose
(D) Normalize data
Data append is a process that involves adding new data elements to an existing database. A
common example of a data append is the enhancement of a company’s customer files. A data
append takes the information the company has and matches it against a larger database of
business data, allowing the desired missing data fields to be added.
Imputation is incorrect. Imputation is the process of replacing missing data with substituted
values.
Transpose is incorrect. Transposing data is where the data in the rows are turned into columns,
and the data in the columns is turned into rows.
Example:
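A small Python illustration of the idea (the sample values are invented): each row of the original table becomes a column of the result.

```python
# Transposing: rows become columns and columns become rows.
table = [
    ["Name", "Q1", "Q2"],
    ["Alice", 10, 20],
    ["Bob", 30, 40],
]

# zip(*table) pairs up the i-th element of every row.
transposed = [list(row) for row in zip(*table)]
```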
Normalize data is incorrect. Data normalization is the process of structuring your relational
customer database, following a series of normal forms. This improves the accuracy and integrity
of your data while ensuring that your database is easier to navigate.
Microsoft Excel provides four logical functions for working with logical values: AND, OR,
XOR, and NOT. You use these functions when you want to carry out more than one comparison
in your formula or test multiple conditions at once. Like logical operators, Excel’s logical
functions return either TRUE or FALSE when their arguments are evaluated. In short: AND
returns TRUE only if all of its arguments are TRUE; OR returns TRUE if any argument is
TRUE; XOR returns TRUE when an odd number of arguments are TRUE; NOT reverses the
logical value of its argument.
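The same four operations can be sketched in Python for comparison (the two input values are arbitrary); like the Excel functions, each expression evaluates to a logical value.

```python
# Python equivalents of Excel's AND, OR, XOR, and NOT for two inputs.
a, b = True, False

results = {
    "AND": a and b,  # True only when every argument is true
    "OR": a or b,    # True when at least one argument is true
    "XOR": a != b,   # True when exactly one of two arguments is true
    "NOT": not a,    # reverses the logical value of its argument
}
```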
Question 19. Consider the following dataset which contains information about houses that are
for sale.
Which of the following string manipulation commands will replace the “St” characters in the
address column with the word “Street”?
Explanation 19. The correct answer is: SELECT address, REPLACE(address, 'St',
'Street') AS new_address
FROM melb
LIMIT 5;
String manipulation (or string handling) is the process of changing, parsing, splicing, pasting, or
analyzing strings. SQL is used for managing data in a relational database.
The REPLACE() function replaces all occurrences of a substring within a string, with a new
substring.
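The REPLACE() call from the answer can be run as-is against SQLite via Python’s sqlite3 module (the table and column names are assumed from the example):

```python
import sqlite3

# In-memory table standing in for the housing dataset (names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE melb (address TEXT)")
conn.execute("INSERT INTO melb VALUES ('85 Turner St')")

# REPLACE swaps every occurrence of 'St' for 'Street'.
rows = conn.execute(
    "SELECT REPLACE(address, 'St', 'Street') AS new_address FROM melb"
).fetchall()
```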
Question 20. Which of the following data integration processes combines data from multiple
data sources into a single, consistent data store that is loaded into a data warehouse?
Explanation 20. The correct answer is: Extract, transform, load (ETL).
ETL, which stands for extract, transform and load, is a data integration process that combines
data from multiple data sources into a single, consistent data store that is loaded into a data
warehouse or other target system.
Extract, load, transform (ELT) is incorrect. ELT stands for “Extract, Load, and Transform.”
In this process, data gets leveraged via a data warehouse in order to do basic transformations.
That means there’s no need for data staging. ELT uses cloud-based data warehousing solutions
for all different types of data – including structured, unstructured, semi-structured, and even raw
data types.
Delta load is incorrect. Delta is the incremental load between the last data load and now.
For example, if yesterday’s load inserted 50 records into your target table and today 80 new
records have arrived in your source system, you insert only those 80 records into the target
after checking against the target table. These 80 records are the delta records.
Application programming interfaces (APIs) is incorrect. API is the acronym for Application
Programming Interface, which is a software intermediary that allows two applications to talk to
each other. Each time you use an app like Facebook, send an instant message, or check the
weather on your phone, you’re using an API.
When you use an application on your mobile phone, the application connects to the Internet and
sends data to a server. The server then retrieves that data, interprets it, performs the necessary
actions and sends it back to your phone. The application then interprets that data and presents
you with the information you wanted in a readable way. This is what an API is – all of this
happens via API.
Question 21. A data analyst wants to create “Income Categories” that would be calculated based
on the existing variable “Income”.
Which of the following data manipulation techniques should the data analyst use to create
“Income Categories”?
Explanation 21. The correct answer is: Derived variables
Derived variables are variables that you create by calculating or categorizing variables that
already exist in your data set.
Data merge is incorrect. Data merging is the process of combining two or more data sets into a
single data set.
Data blending is incorrect. Data blending involves pulling data from different sources and
creating a single, unique, dataset for visualization and analysis.
Data append is incorrect. A data append is a process that involves adding new data elements to
an existing database.
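A derived variable like “Income Categories” can be sketched in Python. The band thresholds below are invented purely for illustration:

```python
# Derive a category from an existing "Income" value (hypothetical bands)
def income_category(income):
    if income < 30000:
        return "Low"
    if income < 70000:
        return "Middle"
    return "High"

incomes = [25000, 45000, 90000]
income_categories = [income_category(i) for i in incomes]  # derived variable
print(income_categories)  # ['Low', 'Middle', 'High']
```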
Question 22. Which of the following date functions would effectively calculate the
number of days, months, or years between two dates?
(A) MONTH()
(B) DATEDIF()
(C) DATEVALUE()
(D) DAYS()
Explanation 22. The correct answer is: DATEDIF()
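DATEDIF() is a spreadsheet function, but its “d” (days) unit can be roughly reproduced with Python's datetime module; the start and end dates below are made-up examples:

```python
from datetime import date

# Rough analogue of DATEDIF(start, end, "d"): whole days between two dates
start, end = date(2020, 1, 1), date(2020, 3, 1)
days_between = (end - start).days
print(days_between)  # 60
```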
(A) CURDATE()
(B) Filtering
(C) Sorting
(D) IsEmpty()
Explanation 23. The correct answer is: Sorting
Sorting lets you organize all or part of your data in ascending or descending order. Note that you
cannot undo a sort after it has been saved so you’ll want to make sure that all of your rows in
your sheet, including parent rows in a hierarchy, are ordered the way you want before saving.
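Sorting in ascending or descending order can be sketched in Python's built-in sorted() function; the ages are sample values only:

```python
# sorted() returns a new list; reverse=True gives descending order
ages = [57, 54, 60, 55]
ascending = sorted(ages)
descending = sorted(ages, reverse=True)
print(ascending)   # [54, 55, 57, 60]
print(descending)  # [60, 57, 55, 54]
```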
Question 24. A data analyst wants to show and hide information on his sheet based on selected
criteria. Which of the following data manipulation techniques is most appropriate in this case?
(A) Sorting
(B) Parametrization
(C) Filtering
(D) Indexing
Explanation 24. The correct answer is: Filtering
Question 25. A web analyst wants to display a chart of daily hits on the keyword “it
certifications” for four different states with the help of Google Trends and Python.
Which of the following data collection methods will effectively request data from Google Trends
and display them in a chart?
Explanation 25. The correct answer is: Application programming interface (API)
An increasingly popular method for collecting data online is via a Representational State
Transfer Application Program Interface REST API or simply API. Google makes many APIs
available for public use. One such API is Google Trends. This provides data on query activity of
any keyword you can think of. Basically, this data tells you what topics people are interested in
getting information on.
Python can be used for API calls. For example, to access the Google Trends API we can use the
Python library pytrends. Here we plot trends for the keywords “CompTIA” and “ExamsDigest”:
from pytrends.request import TrendReq
import pandas as pd
from matplotlib import pyplot
import time

startTime = time.time()
pytrend = TrendReq(hl='en-US', tz=360)
keywords = ['CompTIA', 'ExamsDigest']
pytrend.build_payload(
    kw_list=keywords,
    cat=0,
    timeframe='2020-02-01 2020-07-01',
    geo='US',
    gprop='')
data = pytrend.interest_over_time()
data.plot(title="Google Search Trends")
pyplot.show()
Question 26. The human resources (HR) department of the ACME Corporation needs to monitor
changes in employee satisfaction over time. Which of the following data collection methods will
BEST help the HR department to monitor the changes?
Question 27. A SQL database administrator wants to display the total number of employees.
Which of the following SQL commands should the administrator use to display the total
number?
Explanation 27. The correct answer is: COUNT
An aggregate function performs a calculation on a set of values and returns a single value.
APPROX_COUNT_DISTINCT
AVG
CHECKSUM_AGG
COUNT
COUNT_BIG
GROUPING
GROUPING_ID
MAX
MIN
STDEV
STDEVP
STRING_AGG
SUM
VAR
VARP
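A minimal sketch of COUNT(*) using SQLite through Python's sqlite3 module; the employees table and its contents are invented for illustration:

```python
import sqlite3

# COUNT(*) returns the total number of rows in the table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [("Ann",), ("Ben",), ("Cho",)])
total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(total)  # 3
```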
Question 28. A data scientist wants to see which products make the most money and which
products attract the most customer purchasing interest in their company.
Which of the following data manipulation techniques would he use to obtain this information?
Explanation 28. The correct answer is: Data blending
Data blending is combining multiple data sources to create a single, new dataset, which can be
presented visually in a dashboard or other visualization and can then be processed or analyzed.
Enterprises get their data from a variety of sources, and users may want to temporarily bring
together different datasets to compare data relationships or answer a specific question.
Data append is incorrect. Data append is a process that involves adding new data elements to
an existing database. An example of a common data append would be the enhancement of a
company’s customer files. A data append takes the information they have, matches it against a
larger database of business data, allowing the desired missing data fields to be added.
Normalize data is incorrect. Data normalization is the process of structuring your relational
customer database, following a series of normal forms. This improves the accuracy and integrity
of your data while ensuring that your database is easier to navigate.
Data merge is incorrect. Data merging is the process of combining two or more data sets into a
single data set.
Question 29. Which of the following data manipulation techniques improves the accuracy and
integrity of your data while ensuring that your database is easier to navigate?
Explanation 29. The correct answer is: Normalize data
Data normalization is the process of structuring your relational customer database, following a
series of normal forms. This improves the accuracy and integrity of your data while ensuring that
your database is easier to navigate.
Data append is incorrect. Data append is a process that involves adding new data elements to
an existing database. An example of a common data append would be the enhancement of a
company’s customer files. A data append takes the information they have, matches it against a
larger database of business data, allowing the desired missing data fields to be added.
Data blending is incorrect. Data blending is combining multiple data sources to create a single,
new dataset, which can be presented visually in a dashboard or other visualization and can then
be processed or analyzed.
Enterprises get their data from a variety of sources, and users may want to temporarily bring
together different datasets to compare data relationships or answer a specific question.
Data merge is incorrect. Data merging is the process of combining two or more data sets into a
single data set.
Question 30. A database administrator is responsible for optimizing data queries from 800ms to
300ms by using indexing.
Which of the following techniques does indexing use to make columns faster to query?
(A) Duplicate the table where data is stored with less data
(B) Sort the data alphabetically
(C) Create pointers where data is stored within a database
(D) Filter the data using logical functions
Explanation 30. The correct answer is: Create pointers where data is stored within a
database
Indexing makes columns faster to query by creating pointers to where data is stored within a
database.
Indexes allow us to create sorted lists without having to create all new sorted tables, which
would take up a lot of storage space.
An index is a structure that holds the field the index is sorting and a pointer from each record to
their corresponding record in the original table where the data is actually stored. Indexes are used
in things like a contact list where the data may be physically stored in the order you add people’s
contact information but it is easier to find people when listed out in alphabetical order.
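A minimal sketch of creating and using an index, again via SQLite in Python; the table, column, and index names are hypothetical:

```python
import sqlite3

# An index is a sorted structure with pointers back to the table rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.execute("CREATE INDEX idx_contacts_name ON contacts (name)")
# The query planner can now search the index instead of scanning the table
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE name = 'Ann'"
).fetchall()
print(plan)
```

EXPLAIN QUERY PLAN shows the lookup going through idx_contacts_name rather than a full table scan.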
CHAPTER 3
DATA ANALYSIS
Questions 31-50
Question 31. A junior web developer wants to develop a new eCommerce app using Python for
the back end, MongoDB for the database, and Vue.js for the front end. He wants to create a
simple list called apparel with three list items (“t-shirts”, “hoodies”, “jackets”).
Which of the following commands does he need to type to create the list in Python?
(A) apparel = ["t-shirts", "hoodies", "jackets"]
(B) apparel = {"t-shirts", "hoodies", "jackets"}
(C) apparel = "t-shirts", "hoodies", "jackets"
(D) apparel = ("t-shirts", "hoodies", “jackets”)
Question 32. A data analyst wants to make a rough comparison of two graphs of variability,
considering only the most extreme cases.
Which of the following measures of dispersion would effectively make the comparison?
(A) Distribution
(B) Range
(C) Variance
(D) Standard deviation
Question 33. Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
Which of the following values is the measure of central tendency “median”?
(A) 54
(B) 55
(C) 56
(D) 57
Question 34. Which of the following measures of dispersion would help analysts to measure
market and security volatility — and predict performance trends?
(A) Range
(B) Distribution
(C) Variance
(D) Standard deviation
Question 35. Which of the following inferential statistical methods determine if there is a
significant difference between the means of two groups?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Question 36. A forecasting analyst wants to predict sales for a company based on weather,
previous sales, and GDP growth.
Which of the following inferential statistical methods is MOST suitable for this case?
(A) Regression
(B) t-tests
(C) Correlation
(D) Chi-squared
Question 37. Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age data.
Age Frequency
54 3
55 1
56 1
57 2
58 2
60 2
Which of the following values is the measure of central tendency “mode”?
(A) 54
(B) 55
(C) 56
(D) 57
Question 38. Which of the following inferential statistical methods is a statistical test used to
compare observed results with expected results?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Question 39. Suppose that we have observed the following n = 5 resting pulse rates: 64, 68, 74,
76, 78
Which of the following values is the variance of the resting pulse rates?
(A) 27.1
(B) 27.2
(C) 27.3
(D) 27.4
Question 40. An online payment company wants to develop a new fraud detection system to
protect customers against disputes and fraudulent payments.
Which of the following data analytics tools is MOST suitable for developing the fraud detection
system?
Question 41. Which of the following inferential statistical methods is a number describing how
likely it is that your data would have occurred by random chance?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Question 42. A data analyst is looking for software that includes accounting and budgeting
templates for easy use and built-in calculating and formula features to organize and synthesize
results.
Which of the following data analytics tools is MOST suitable in this case?
(A) R
(B) Microsoft Excel
(C) IBM Cognos
(D) Minitab
Question 43. One of the benefits of using Amazon QuickSight is:
(A) QuickSight dashboards can be accessed only from mobile devices
(B) Scale from one to one of thousands of users
(C) Subscription-based charges
(D) Embed BI dashboards in your applications
Question 44. Type I error is the mistaken rejection of the null hypothesis, also known as a “false
positive”, while a type II error is the mistaken acceptance of the null hypothesis, also known as a
“false negative”. (TRUE/FALSE)
(A) TRUE
(B) FALSE
Question 45. Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
Which of the following values is the measure of central tendency “mean”?
(A) 56.3
(B) 56.6
(C) 56
(D) 56.9
Question 46. Which of the following data analytics tools is a cloud-based platform designed to
provide direct, simplified, real-time access to business data for decision-makers across the
company with minimal IT involvement?
(A) Snowflake
(B) OLTP
(C) Domo
(D) Delta load
Question 47. Which of the following values is the measure of dispersion “range” for the
scores of ten students in a test?
The scores of ten students in a test are 17, 23, 30, 36, 45, 51, 58, 66, 72, 77.
(A) 60
(B) 70
(C) 80
(D) 90
Question 48. Which of the following inferential statistical methods is a numerical measurement
that describes a value’s relationship to the mean of a group of values?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Question 49. Which of the following statistical methods refers to the probability that a
population parameter will fall between a set of values for a certain proportion of times?
(A) Confidence intervals
(B) Percent difference
(C) Percent change
(D) Frequencies/percentages
Question 50. Which of the following data analytics tools can execute the following command?
(A) Python
(B) Microsoft Excel
(C) Structured Query Language (SQL)
(D) R
Answers 31-50
Question 31. A junior web developer wants to develop a new eCommerce app using Python for
the back end, MongoDB for the database, and Vue.js for the front end. He wants to create a
simple list called apparel with three list items (“t-shirts”, “hoodies”, “jackets”).
Which of the following commands does he need to type to create the list in Python?
(A) apparel = ["t-shirts", "hoodies", "jackets"]
(B) apparel = {"t-shirts", "hoodies", "jackets"}
(C) apparel = "t-shirts", "hoodies", "jackets"
(D) apparel = ("t-shirts", "hoodies", “jackets")
Explanation 31. The correct answer is: apparel = ["t-shirts", "hoodies", "jackets"]
Lists are one of 4 built-in data types in Python used to store collections of data, the other 3 are
Tuple, Set, and Dictionary, all with different qualities and usage.
List items are indexed, the first item has index [0], the second item has index [1] etc.
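The correct answer in action, showing zero-based indexing:

```python
# Lists are ordered and indexed from 0
apparel = ["t-shirts", "hoodies", "jackets"]
print(apparel[0])    # t-shirts
print(len(apparel))  # 3
```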
Question 32. A data analyst wants to make a rough comparison of two graphs of variability,
considering only the most extreme cases.
Which of the following measures of dispersion would effectively make the comparison?
(A) Distribution
(B) Range
(C) Variance
(D) Standard deviation
Explanation 32. The correct answer is: Range
Range is the interval between the highest and the lowest score. Range is a measure of variability
or scatteredness of the varieties or observations among themselves and does not give an idea
about the spread of the observations around some central value.
Computation of Range:
Example 1:
The scores of seven students in a test are:
10, 20, 25, 50, 80, 85, 90
In the example, the highest score is 90 and the lowest score is 10.
So the range is the difference between these two scores:
Range = 90 – 10 = 80
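The same computation, sketched in Python:

```python
# Range = highest score minus lowest score
scores = [10, 20, 25, 50, 80, 85, 90]
score_range = max(scores) - min(scores)
print(score_range)  # 80
```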
Question 33. Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
Which of the following values is the measure of central tendency “median”?
(A) 54
(B) 55
(C) 56
(D) 57
Explanation 33. The correct answer is: 57
There are three main measures of central tendency: the mode, the median and the mean. Each
of these measures describes a different indication of the typical or central value in the
distribution.
The median divides the distribution in half (there are 50% of observations on either side of the
median value). In a distribution with an odd number of observations, the median value is the
middle value.
Looking at the retirement age distribution (which has 11 observations), the median is the middle
value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value is the mean of the
two middle values. In the following distribution, the two middle values are 56 and 57, therefore
the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
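Python's standard statistics module applies the same rule, averaging the two middle values when the count is even:

```python
import statistics

# Odd number of observations: the median is the middle value
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
median_age = statistics.median(ages)
print(median_age)  # 57
```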
Question 34. Which of the following measures of dispersion would help analysts to measure
market and security volatility — and predict performance trends?
(A) Range
(B) Distribution
(C) Variance
(D) Standard deviation
Explanation 34. The correct answer is: Standard deviation
Standard deviation is a statistical measurement in finance that, when applied to the annual rate of
return of an investment, sheds light on that investment’s historical volatility. The greater the
standard deviation of securities, the greater the variance between each price and the mean, which
shows a larger price range. For example, a volatile stock has a high standard deviation, while the
deviation of a stable blue-chip stock is usually rather low.
Standard deviation is an especially useful tool in investing and trading strategies as it helps
measure market and security volatility—and predict performance trends.
Question 35. Which of the following inferential statistical methods determine if there is a
significant difference between the means of two groups?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Explanation 35. The correct answer is: t-tests
A t-test is an inferential statistic used to determine if there is a significant difference between
the means of two groups.
p-value is incorrect. A p-value, or probability value, is a number describing how likely it is that
your data would have occurred by random chance (i.e. that the null hypothesis is true). The level
of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-
value, the stronger the evidence that you should reject the null hypothesis.
Chi-squared is incorrect. A chi-square test is a statistical test used to compare observed results
with expected results. The purpose of this test is to determine if a difference between observed
data and expected data is due to chance, or if it is due to a relationship between the variables you
are studying. Therefore, a chi-square test is an excellent choice to help us better understand and
interpret the relationship between our two categorical variables.
Question 36. A forecasting analyst wants to predict sales for a company based on weather,
previous sales, and GDP growth.
Which of the following inferential statistical methods is MOST suitable for this case?
(A) Regression
(B) t-tests
(C) Correlation
(D) Chi-squared
Explanation 36. The correct answer is: Regression
Regression is a statistical method used in finance, investing, and other disciplines that attempt to
determine the strength and character of the relationship between one dependent variable (usually
denoted by Y) and a series of other variables (known as independent variables).
Regression helps investment and financial managers to value assets and understand the
relationships between variables, such as commodity prices and the stocks of businesses dealing
in those commodities.
Regression can help finance and investment professionals as well as professionals in other
businesses. Regression can also help predict sales for a company based on weather, previous
sales, GDP growth, or other types of conditions. The capital asset pricing model (CAPM) is an
often-used regression model in finance for pricing assets and discovering costs of capital.
Correlation is incorrect. Correlation, in the finance and investment industries, is a statistic that
measures the degree to which two securities move in relation to each other. Correlations are used
in advanced portfolio management, computed as the correlation coefficient, which has a value
that must fall between -1.0 and +1.0.
Chi-squared is incorrect. A chi-square test is a statistical test used to compare observed results
with expected results. The purpose of this test is to determine if a difference between observed
data and expected data is due to chance, or if it is due to a relationship between the variables you
are studying. Therefore, a chi-square test is an excellent choice to help us better understand and
interpret the relationship between our two categorical variables.
Question 37. Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age data.
Age Frequency
54 3
55 1
56 1
57 2
58 2
60 2
Which of the following values is the measure of central tendency “mode”?
(A) 54
(B) 55
(C) 56
(D) 57
Explanation 37. The correct answer is: 54
There are three main measures of central tendency: the mode, the median and the mean. Each
of these measures describes a different indication of the typical or central value in the
distribution.
The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
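The same result from Python's standard statistics module:

```python
import statistics

# The mode is the most frequently occurring value
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
mode_age = statistics.mode(ages)
print(mode_age)  # 54
```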
Question 38. Which of the following inferential statistical methods is a statistical test used to
compare observed results with expected results?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Explanation 38. The correct answer is: Chi-squared
A chi-square test is a statistical test used to compare observed results with expected results. The
purpose of this test is to determine if a difference between observed data and expected data is due
to chance, or if it is due to a relationship between the variables you are studying. Therefore, a
chi-square test is an excellent choice to help us better understand and interpret the relationship
between our two categorical variables.
p-value is incorrect. A p-value, or probability value, is a number describing how likely it is that
your data would have occurred by random chance (i.e. that the null hypothesis is true). The level
of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-
value, the stronger the evidence that you should reject the null hypothesis.
Question 39. Suppose that we have observed the following n = 5 resting pulse rates: 64, 68, 74,
76, 78
Which of the following values is the variance of the resting pulse rates?
(A) 27.1
(B) 27.2
(C) 27.3
(D) 27.4
Explanation 39. The correct answer is: 27.2
Variance tells you the degree of spread in your data set. The more spread the data, the larger the
variance is in relation to the mean.
Here the mean pulse rate is (64 + 68 + 74 + 76 + 78) / 5 = 72. The squared deviations from the
mean are 64, 16, 4, 16 and 36, which sum to 136, so the population variance is 136 / 5 = 27.2.
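The same population variance from Python's standard statistics module:

```python
import statistics

# Population variance: mean squared deviation from the mean
pulses = [64, 68, 74, 76, 78]
variance = statistics.pvariance(pulses)
print(variance)  # 27.2
```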
Question 40. An online payment company wants to develop a new fraud detection system to
protect customers against disputes and fraudulent payments.
Which of the following data analytics tools is MOST suitable for developing the fraud detection
system?
Explanation 40. The correct answer is: IBM SPSS Modeler
IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used
to build predictive models and conduct other analytic tasks.
Question 41. Which of the following inferential statistical methods is a number describing how
likely it is that your data would have occurred by random chance?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Explanation 41. The correct answer is: p-values
A p-value, or probability value, is a number describing how likely it is that your data would have
occurred by random chance (i.e. that the null hypothesis is true). The level of statistical
significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the
stronger the evidence that you should reject the null hypothesis.
Question 42. A data analyst is looking for software that includes accounting and budgeting
templates for easy use and built-in calculating and formula features to organize and synthesize
results.
Which of the following data analytics tools is MOST suitable in this case?
(A) R
(B) Microsoft Excel
(C) IBM Cognos
(D) Minitab
Explanation 42. The correct answer is: Microsoft Excel
Microsoft Excel gives businesses the tools they need to make the most of their data. At its most
basic level, Excel is an excellent tool for both data entry and storage. Excel even includes
accounting and budgeting templates for easy use. From there the software’s built-in calculating
and formula features are available to help you organize and synthesize results. Businesses often
employ multiple systems (e.g., CRM, inventory), each with its own database and logs.
Minitab is incorrect. Minitab empowers all parts of an organization to predict better outcomes,
design better products, and improve processes to generate higher revenues and reduce costs.
Explanation 43. The correct answer is: Embed BI dashboards in your applications
QuickSight is serverless and can automatically scale to tens of thousands of users without any
infrastructure to manage or capacity to plan for. It is also the first BI service to offer pay-per-
session pricing, where you only pay when your users access their dashboards or reports, making
it cost-effective for large-scale deployments.
Benefits
Scale from tens to tens of thousands of users
Embed BI dashboards in your applications
Access deeper insights with Machine Learning
Ask questions of your data, receive answers
Question 44. Type I error is the mistaken rejection of the null hypothesis, also known as a “false
positive”, while a type II error is the mistaken acceptance of the null hypothesis, also known as a
“false negative”. (TRUE/FALSE)
(A) TRUE
(B) FALSE
Explanation 44. The correct answer is: TRUE
Just like a judge’s conclusion, an investigator’s conclusion may be wrong. Sometimes, by chance
alone, a sample is not representative of the population. Thus the results in the sample do not
reflect reality in the population, and the random error leads to an erroneous inference.
A type I error (false-positive) occurs if an investigator rejects a null hypothesis that is actually
true in the population; a type II error (false-negative) occurs if the investigator fails to reject a
null hypothesis that is actually false in the population. Although type I and type II errors can
never be avoided entirely, the investigator can reduce their likelihood by increasing the sample
size (the larger the sample, the lesser is the likelihood that it will differ substantially from the
population).
False-positive and false-negative results can also occur because of bias (observer, instrument,
recall, etc.). (Errors due to bias, however, are not referred to as type I and type II errors.) Such
errors are troublesome, since they may be difficult to detect and cannot usually be quantified.
Question 45. Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
Which of the following values is the measure of central tendency “mean”?
(A) 56.3
(B) 56.6
(C) 56
(D) 56.9
Explanation 45. The correct answer is: 56.6
There are three main measures of central tendency: the mode, the median and the mean. Each
of these measures describes a different indication of the typical or central value in the
distribution.
The mean is the sum of the value of each observation in a dataset divided by the number of
observations. This is also known as the arithmetic average.
For the retirement ages 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60, the sum of the observations is
623 and there are 11 observations, so the mean is 623 / 11 ≈ 56.6 years.
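The same arithmetic average from Python's standard statistics module:

```python
import statistics

# Mean = sum of observations divided by their count
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
mean_age = round(statistics.mean(ages), 1)
print(mean_age)  # 56.6
```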
Question 46. Which of the following data analytics tools is a cloud-based platform designed to
provide direct, simplified, real-time access to business data for decision-makers across the
company with minimal IT involvement?
(A) Snowflake
(B) OLTP
(C) Domo
(D) Delta load
Explanation 46. The correct answer is: Domo
2. Enhance your existing data warehouse and BI tools or build custom apps, automate data
pipelines, and make data science accessible with automated insights and augmented analytics.
3. Publish your data and analytics content dynamically to customers and partners, enable them to
integrate their data with yours, and build customized data experiences to commercialize data.
Question 47. Which of the following values is the measure of dispersion “range” for the
scores of ten students in a test?
The scores of ten students in a test are 17, 23, 30, 36, 45, 51, 58, 66, 72, 77.
(A) 60
(B) 70
(C) 80
(D) 90
Explanation 47. The correct answer is: 60
Range is the interval between the highest and the lowest score. Range is a measure of variability
or scatteredness of the varieties or observations among themselves and does not give an idea
about the spread of the observations around some central value.
Range = 77 – 17 = 60
Question 48. Which of the following inferential statistical methods is a numerical measurement
that describes a value’s relationship to the mean of a group of values?
(A) t-tests
(B) Z-score
(C) p-values
(D) Chi-squared
Explanation 48. The correct answer is: Z-score
A Z-score is a numerical measurement that describes a value’s relationship to the mean of a
group of values. It is measured in terms of standard deviations from the mean: z = (x − μ) / σ.
p-value is incorrect. A p-value, or probability value, is a number describing how likely it is that
your data would have occurred by random chance (i.e. that the null hypothesis is true). The level
of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-
value, the stronger the evidence that you should reject the null hypothesis.
Chi-squared is incorrect. A chi-square test is a statistical test used to compare observed results
with expected results. The purpose of this test is to determine if a difference between observed
data and expected data is due to chance, or if it is due to a relationship between the variables you
are studying. Therefore, a chi-square test is an excellent choice to help us better understand and
interpret the relationship between our two categorical variables.
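A Z-score computation can be sketched in Python; the sample values below are invented for illustration:

```python
import statistics

# z = (value - mean) / standard deviation
values = [10, 12, 14, 16, 18]
mu = statistics.mean(values)       # 14
sigma = statistics.pstdev(values)  # population standard deviation
z = (18 - mu) / sigma
print(round(z, 2))  # 1.41
```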
Question 49. Which of the following statistical methods refers to the probability that a
population parameter will fall between a set of values for a certain proportion of times?
(A) Confidence intervals
(B) Percent difference
(C) Percent change
(D) Frequencies/percentages
Explanation 49. The correct answer is: Confidence intervals
The confidence interval (CI) is a range of values that’s likely to include a population value with a
certain degree of confidence. It is often expressed as a percentage whereby a population mean
lies between an upper and lower interval.
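An approximate 95% confidence interval for a mean can be sketched with the normal critical value 1.96; the sample data below is made up for illustration:

```python
import math
import statistics

# Approximate 95% CI for a sample mean: mean ± 1.96 * standard error
sample = [12, 15, 14, 10, 13, 14, 16, 12]
mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / math.sqrt(len(sample))
low, high = mean - 1.96 * std_err, mean + 1.96 * std_err
print(low < mean < high)  # True
```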
Question 50. Which of the following data analytics tools can execute the following command?
(A) Python
(B) Microsoft Excel
(C) Structured Query Language (SQL)
(D) R
Explanation 50. The correct answer is: Structured Query Language (SQL)
SQL is a standard language for storing, manipulating and retrieving data in databases.
What is SQL?
1. SQL stands for Structured Query Language
2. SQL lets you access and manipulate databases
3. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of
the International Organization for Standardization (ISO) in 1987
Although SQL is an ANSI/ISO standard, there are different versions of the SQL language.
However, to be compliant with the ANSI standard, they all support at least the major commands
(such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner.
CHAPTER 4
VISUALIZATIONS
Questions 51-65
Question 51. An Amazon seller wants to generate a revenue report that includes the number of
sales and the number of refunds between June and August.
Which of the following should the seller use to generate the report?
Question 52. A web analyst has a Microsoft Access report, listing all employees. He wants to
limit the report to employees whose first names start with “N” and print the report with just that
data.
Question 53. Which of the following recurring reports helps companies maintain and prove
compliance with the General Data Protection Regulation (GDPR)?
Question 54. The ACME Corporation is using 2 different font families and 2 different color
codes for its logo, as shown below.
Font families:
1. Roboto Regular
2. Lato
Which of the following best practices SHOULD they follow for clean dashboard development?
(A) The dashboard should be designed with different font families and colors
(B) The dashboard should be designed with different font families but with the same colors
(C) The dashboard should be designed with the same font families and the same colors
(D) The dashboard should be designed with the same font families but with different colors
Question 55. A company wants to create a chart that presents the growth in online sales broken
down by customer type, based on the mix of channels they used similar to the one you see below.
Question 57. Static reporting requires pulling up various reports from different sources and
analyzing insights from a longer time period in order to provide a snapshot of data, while
dynamic reports provide deep insights and allow users to interact with the data rather than just
view it. (TRUE/FALSE)
(A) TRUE
(B) FALSE
Question 58. What’s the difference between ad-hoc and structured reports? (Select TWO)
(A) Structured reports use a large volume of data and are produced using a formalized
reporting template
(B) Ad hoc reports are generated as needed for a one-time-use, in a visual format relevant to
the audience
(C) Ad hoc reports use a large volume of data and are produced using a formalized reporting
template
(D) Structured reports are generated as needed for a one-time-use, in a visual format relevant
to the audience
Question 59. Sort the steps to develop a clean and well-structured dashboard according to
CompTIA dashboard development process. Starting from the top (first step) to the bottom (last
step).
1. Mockup/wireframe design
2. Develop dashboard
3. Deploy to production
4. Approval granted
Question 60. A developer wants to develop a new dashboard. On the main page, he wants to
show data variables and trends to help companies to make predictions about the results of data
not yet recorded.
Question 61. Which of the following types of visualization represents the chart below?
(A) Infographic
(B) Heat map
(C) Histogram
(D) Waterfall
Question 62. Which of the following recurring reports helps companies reach business goals
and identify strengths, weaknesses, and trends?
(A) Compliance reports
(B) Risk and regulatory reports
(C) Operational reports
(D) Business goal reports
Question 63. Which of the following types of visualization is a graphical representation of word
frequency that gives greater prominence to words that appear more frequently in a source text?
(A) Word counter
(B) Word frequency
(C) Word rate
(D) Word cloud
Question 65. Which of the following is the MOST appropriate Report Cover Page?
(A)
(B)
(C)
(D)
Answers 51-65
Question 51. An Amazon seller wants to generate a revenue report that includes the number of
sales and the number of refunds between June and August.
Which of the following should the seller use to generate the report?
Explanation 51. The correct answer is: Date range report
A date range report is a custom report that allows you either to select a month to include in the
report or to choose specific start and end dates for the data included in the report.
Question 52. A web analyst has a Microsoft Access report, listing all employees. He wants to
limit the report to employees whose first names start with “N” and print the report with just that
data.
When you view an Access report on the screen, you can apply filters to zero in on the data you
want to see, and then print the report with just that data.
To filter data in a report, open it in Report view (right-click it in the Navigation pane and
click Report View). Then, right-click the data you want to filter.
For example, in a report listing all employees, you might want to limit the report to employees
whose first names start with “N”:
1. Right-click any first name, and click Text Filters > Begins With.
2. Enter “N” in the box that appears, and click OK.
Question 53. Which of the following recurring reports helps organizations maintain and prove
compliance with the General Data Protection Regulation (GDPR)?
Explanation 53. The correct answer is: Compliance reports
Compliance reporting is the process of presenting information to auditors that shows that your
company is adhering to all the requirements set by the government and regulatory agencies under
a particular standard. It is often the IT department’s responsibility to generate these reports.
Compliance reports typically include information on how customer/company data is dealt with –
how it is controlled or protected, obtained and stored, and how it is secured and distributed
internally and externally.
Some regulations and the industries to which they apply are as follows:
Question 54. The ACME Corporation is using 2 different font families and 2 different color
codes for its logo, as shown below.
Font families:
1. Roboto Regular
2. Lato
Colors codes in hex:
1. #fa1d53
2. #1dfac4
The Corporation wants to develop a new dashboard for visualizing information and KPIs.
Which of the following best practices SHOULD they follow for clean dashboard development?
(A) The dashboard should be designed with different font families and colors
(B) The dashboard should be designed with different font families but with the same colors
(C) The dashboard should be designed with the same font families and the same colors
(D) The dashboard should be designed with the same font families but with different colors
Explanation 54. The correct answer is: The dashboard should be designed with the same
font families and the same colors
When it comes to color, you can choose to stay true to your company identity (same colors,
fonts). The important thing here is to stay consistent and not use too many different colors – an
essential consideration when learning how to design a dashboard. This is one of the most
important of all dashboard design best practices.
Question 55. A company wants to create a chart that presents the growth in online sales broken
down by customer type, based on the mix of channels they used, similar to the one shown below.
Which of the following types of visualization is MOST suitable?
Explanation 55. The correct answer is: Waterfall
A waterfall visualization shows how an initial value is increased and decreased by a series of
intermediate values, leading to a final cumulative value shown in the far right column. The
intermediate values can either be time-based or category-based.
A waterfall chart is a specific type of bar chart that reveals the story behind the net change in
something’s value between two points. Instead of just showing a beginning value in one bar and
an ending value in a second bar, a waterfall chart disaggregates all the unique components that
contributed to that net change and visualizes them individually.
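The running totals behind a waterfall chart can be sketched in a few lines. This is only an illustration of the arithmetic; the starting value, channel names, and figures below are invented:

```python
# Sketch: compute the cumulative positions behind a waterfall chart.
# The starting value and the per-channel deltas are invented example figures.
def waterfall_positions(start, deltas):
    """Return (label, bar base, bar height, cumulative value) for each step."""
    bars = []
    running = start
    for label, delta in deltas:
        # Negative deltas hang downward, so their bar starts lower.
        base = running if delta >= 0 else running + delta
        running += delta
        bars.append((label, base, abs(delta), running))
    return bars

deltas = [("Web", 120), ("Mobile app", 80), ("Refunds", -40), ("Marketplace", 60)]
for label, base, height, cum in waterfall_positions(1000, deltas):
    print(f"{label:12s} bar from {base} height {height} -> cumulative {cum}")
# The final cumulative value is what the far-right column of the chart shows.
```

Each intermediate bar is drawn floating at its `base`, which is how the chart "reveals the story" behind the net change rather than showing only the two endpoints.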
Concerning dashboard best practices in design, your audience is one of the most important
principles you have to take into account. You need to know who’s going to use the dashboard.
To do so successfully, you need to put yourself in your audience’s shoes. The context and device
on which users will regularly access their dashboards will have direct consequences on the style
in which the information is displayed.
Question 57. Static reporting requires pulling up various reports from different sources and
analyzing insights from a longer time period in order to provide a snapshot of data, while
dynamic reports provide deep insights and allow users to interact with the data rather than just
view it. (TRUE/FALSE)
(A) TRUE
(B) FALSE
Static reporting works on data that only has significance for a specific period of time. Static
reports include data about inventories, such as resources, and data that is generated periodically.
Static reports are generated in Excel, Word, or PowerPoint and exported in HTML or PDF format.
Static reporting works for data that has a very short life span. This means that this source of
information cannot drill down to future insights. Static reports are easy to use as tools for
reviewing behavior, patterns, and outcomes.
Dynamic reporting is also called live or real-time reporting. It is typically delivered as a
web-based application that can be accessed from anywhere and from any device with internet
connectivity. Dynamic reports provide up-to-date information at all times and give users
real-time interaction with the dashboard according to their needs.
A dynamic reporting approach can support business meetings where executive-level members
gather to review business goals. The dynamic dashboard provides a clear picture of the
business and allows executives to reason about it and make decisions quickly.
Question 58. What’s the difference between ad-hoc and structured reports? (Select TWO)
(A) Structured reports use a large volume of data and are produced using a formalized
reporting template
(B) Ad hoc reports are generated as needed for a one-time-use, in a visual format relevant to
the audience
(C) Ad hoc reports use a large volume of data and are produced using a formalized reporting
template
(D) Structured reports are generated as needed for a one-time-use, in a visual format relevant
to the audience
An ad hoc report is a report created for one-time use. A BI tool can make it possible for anyone
in an organization to answer a specific business question and present that data in a visual format,
without burdening IT staff.
Ad hoc reporting differs from structured reporting in many ways. Structured reports use a large
volume of data and are produced using a formalized reporting template. Ad hoc reports are
generated as needed, in a visual format relevant to the audience.
Structured reports are produced by people who have a high degree of technical experience
working with business intelligence tools to mine and aggregate large amounts of data. Ad hoc
reporting relies on much smaller amounts of data. This makes it easier for people in an enterprise
to report on a specific data point that answers a specific business question.
Question 59. Sort the steps to develop a clean and well-structured dashboard according to the
CompTIA dashboard development process, starting from the top (first step) to the bottom (last
step).
1. Mockup/wireframe design
2. Develop dashboard
3. Deploy to production
4. Approval granted
The steps to develop a clean and well-structured dashboard according to the CompTIA dashboard
development process are:
1. Mockup/wireframe design
2. Approval granted
3. Develop dashboard
4. Deploy to production
Question 60. A developer wants to develop a new dashboard. On the main page, he wants to
show data variables and trends to help companies make predictions about the results of data
not yet recorded.
Explanation 60. The correct answer is: Line graph
Line graphs are useful in that they show data variables and trends very clearly and can help to
make predictions about the results of data not yet recorded. If seeing the trend of your data is the
goal, then this is the chart to use.
Line charts show time-series relationships using continuous data. They allow a quick assessment
of acceleration (lines curving upward), deceleration (lines curving downward), and volatility
(up/down frequency). They are excellent for tracking multiple data sets on the same chart to see
any correlation in trends.
They can also be used to display several dependent variables against one independent variable.
Pie chart is incorrect. Pie charts can be used to show percentages of a whole, and represents
percentages at a set point in time. Unlike bar graphs and line graphs, pie charts do not show
changes over time.
Bubble chart is incorrect. A bubble chart is a variation of a scatter chart in which the data
points are replaced with bubbles, and an additional dimension of the data is represented in the
size of the bubbles. Just like a scatter chart, a bubble chart does not use a category axis — both
horizontal and vertical axes are value axes. In addition to the x values and y values that are
plotted in a scatter chart, a bubble chart plots x values, y values, and z (size) values.
Question 61. Which of the following types of visualization represents the chart below?
(A) Infographic
(B) Heat map
(C) Histogram
(D) Waterfall
Explanation 61. The correct answer is: Histogram
A histogram is a chart that shows frequencies for intervals of values of a metric variable.
Why are histograms so useful? Well, first of all, charts are much more visual than tables; after
looking at a chart for 10 seconds, you can tell much more about your data than after inspecting
the corresponding table for 10 seconds. Generally, charts convey information about our data
faster than tables, albeit less accurately.
On top of that, histograms also give us much more complete information about our data. Keep in
mind that you can reasonably estimate a variable’s mean, standard deviation, skewness, and
kurtosis from a histogram. However, you can’t estimate a variable’s histogram from the
aforementioned statistics.
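The frequencies-per-interval computation that underlies a histogram can be sketched with the standard library. The values and bin width below are invented for illustration:

```python
# Sketch: count frequencies for fixed-width intervals of a metric variable.
# The values and the bin width are invented for illustration.
from collections import Counter

values = [3, 7, 8, 12, 13, 14, 21, 22, 28, 34]
bin_width = 10

# Map each value to the lower edge of its interval, then count per interval.
bins = Counter((v // bin_width) * bin_width for v in values)

# A crude text rendering of the histogram's bars.
for edge in sorted(bins):
    print(f"[{edge}, {edge + bin_width}): {'#' * bins[edge]}")
```

Each bar's height is simply the count for its interval, which is why a histogram conveys the shape of a distribution that summary statistics alone cannot.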
Heat map is incorrect. Heat Maps are graphical representations of data that utilize color-coded
systems. The primary purpose of Heat Maps is to better visualize the volume of locations/events
within a dataset and assist in directing viewers towards areas on data visualizations that matter
most.
Waterfall is incorrect. A waterfall visualization shows how an initial value is increased and
decreased by a series of intermediate values, leading to a final cumulative value shown in the far
right column. The intermediate values can either be time-based or category-based.
Question 62. Which of the following recurring reports helps companies reach business goals
and identify strengths, weaknesses, and trends?
(A) Compliance reports
(B) Risk and regulatory reports
(C) Operational reports
(D) Business goal reports
Explanation 62. The correct answer is: Operational reports
A KPI report, which is an operational report, is a management tool that facilitates the
measurement, organization, and analysis of the most important business key performance
indicators. These reports help companies reach business goals and identify strengths,
weaknesses, and trends.
Typically presented in the form of an interactive dashboard, this kind of report provides a visual
representation of the data associated with your predetermined set of key performance indicators.
Question 63. Which of the following types of visualization is a graphical representation of word
frequency that gives greater prominence to words that appear more frequently in a source text?
(A) Word counter
(B) Word frequency
(C) Word rate
(D) Word cloud
Explanation 63. The correct answer is: Word cloud
Word clouds, or tag clouds, are graphical representations of word frequency that give greater
prominence to words that appear more frequently in a source text. The larger the word in the
visual, the more common the word was in the document(s).
This type of visualization can assist evaluators with exploratory textual analysis by identifying
words that frequently appear in a set of interviews, documents, or other text. It can also be used
for communicating the most salient points or themes in the reporting stage.
The remaining options are incorrect because they are fictitious terms.
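The word-frequency counting that drives a word cloud's sizing can be sketched with the standard library. The sample text below is invented for illustration:

```python
# Sketch: count word frequencies, the input a word cloud scales its words by.
# The sample text is invented for illustration.
from collections import Counter
import re

text = "Data drives decisions. Good data beats opinion, and data wins."

# Lowercase and extract word tokens so "Data" and "data" count together.
words = re.findall(r"[a-z']+", text.lower())
freq = Counter(words)

# In a word cloud, a word's display size scales with its frequency.
for word, count in freq.most_common(3):
    print(word, count)
```

A word cloud generator takes exactly this kind of frequency table and renders each word at a size proportional to its count.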
Explanation 64. The correct answer is: Support the analysis of geospatial data through the
use of interactive visualization
The purpose of the Geographic map is to support the analysis of geospatial data through the use
of interactive visualization. Map visualization is used to analyze and display geographically
related data and present it in the form of maps. This kind of data expression is clearer and more
intuitive. We can visually see the distribution or proportion of data in each region. It is
convenient for everyone to mine deeper information and make better decisions.
There are many types in map visualization, such as administrative maps, heatmaps, statistical
maps, trajectory maps, bubble maps, etc. And maps can be divided into 2D maps, 3D maps or
static maps, dynamic maps, interactive maps… They are often used in combination with points,
lines, bubbles, and more.
Question 65. Which of the following is the MOST appropriate Report Cover Page?
(A)
(B)
(C)
(D)
A cover page basically gives a reflection of the whole document and what is contained in it. The
cover page helps the reader decide whether the document is of interest to him or not. In
addition, the cover page is important because it sets the first impression on whoever glances at
the document.
The cover page of the report gives the ‘Big Idea’ of what the report is about as it states the
report’s title. It should be clear, professional, formal and appropriate for the topic or area covered
in the report. The cover page of the report varies slightly based on the formatting style (such as
APA, MLA, Harvard, etc.) that is being used by the report. However, the main information
included is:
Question 66. A company uses AWS cloud technologies for its scalability and high-performance
features. The company wants to give software engineers access to control the AWS
infrastructure and deny access to the rest of the employees.
Which of the following access control methods SHOULD the company use to fulfill the
requirement?
(A) Developer-based
(B) AWS-based
(C) Role-based
(D) Software-based
Question 67. Which of the following policies is shown in the given image?
(A) Privacy Policy
(B) Acceptable use policy
(C) Terms of Services
(D) Cookies Policy
Question 68. In which of the following data quality dimensions does the data follow a set of
standard data definitions, such as data type, size, and format (e.g., a customer’s date of birth is
in the format “mm/dd/yyyy”)?
(A) Completeness
(B) Consistency
(C) Integrity
(D) Conformity
Question 69. Cardinality refers to the maximum number of times an instance in one entity can
relate to instances of another entity while ordinality is the minimum number of times an instance
in one entity can be associated with an instance in the related entity. (TRUE/FALSE)
(A) TRUE
(B) FALSE
Question 70. Which of the following categories would contain information about an individual’s
biometric data, genetic information, and sexual orientation?
Question 71. Which of the following types of vulnerability scan should an organization perform
if it stores credit card information governed by the Payment Card Industry Data Security
Standard (PCI DSS)?
Question 72. A financial analyst gathers information, assembles spreadsheets, and writes
reports. He wants his changes to the files to be synced and updated across all of his devices.
Which of the following storage environments does the analyst need to save his files?
(A) Share drive
(B) Local storage
(C) Cloud storage
(D) Sync storage
Question 73. The Acme Corporation is working on a new data warehouse and business
intelligence (DW/BI) project. They need to uncover data quality issues in data sources and
determine what needs to be corrected in the extract, transform, load (ETL) process.
Which of the following methods should they use to validate the data?
(A) Cross-validation
(B) Data auditing
(C) Data profiling
(D) Data correction
Question 74. A data analyst wants to measure how well a piece of information reflects reality.
Which of the following data quality dimensions does the data analyst need to assess?
(A) Data consistency
(B) Data accuracy
(C) Data completeness
(D) Data integrity
Question 75. Records from governmental agencies, student records information, and existing
human research subjects’ data are examples of:
(A) Release approvals
(B) Data transmission
(C) Data use agreements
(D) Data constraints
Question 76. Acme Corporation wants to reduce the probability of a data breach in order to
reduce the risk of fines in the future.
Question 77. Which of the following ways can be used to achieve the desired quality output for
standardized names in a master data management (MDM) architecture? (Select TWO)
(A) Use different locales to standardize names properly
(B) Define and apply customized schemes to standardize differently spelled words to
common words (e.g. Assoc, Assocn. and Assn. to Association)
(C) Don't define and apply customized schemes to standardize differently spelled words to
common words (e.g. Assoc, Assocn. and Assn. to Association)
(D) Use the same locales to standardize names properly
Question 78. The ACME Corporation hired an analyst to detect data quality issues in their Excel
documents. Which of the following are the most common issues? (Select TWO)
(A) Duplicates
(B) Commas
(C) Symbols
(D) Misspellings
(E) Apostrophe
Question 79. Which of the following policies is a set of guidelines that helps organizations keep
track of how long information must be kept and how to dispose of the information when it’s no
longer needed?
(A) Data processing policy
(B) Data deletion policy
(C) Data retention policy
(D) Acceptable use policy
Question 80. Which of the following categories would contain information about an individual’s
demographic information, medical histories, laboratory results, and mental health conditions?
(A) Personally identifiable information
(B) Personal health information
(C) Sensitive personal information
(D) Intellectual property
Answers 66-80
Question 66. A company uses AWS cloud technologies for its scalability and high-performance
features. The company wants to give software engineers access to control the AWS
infrastructure and deny access to the rest of the employees.
Which of the following access control methods SHOULD the company use to fulfill the
requirement?
(A) Developer-based
(B) AWS-based
(C) Role-based
(D) Software-based
Explanation 66. The correct answer is: Role-based
Role-based access control (RBAC), also known as role-based security, is an access control
method that assigns permissions to end users based on their role within your organization.
RBAC provides fine-grained control, offering a simple, manageable approach to access
management that is less error-prone than individually assigning permissions.
By adding a user to a role group, the user has access to all the permissions of that group. If they
are removed, access becomes restricted. Users can also be assigned temporary access to certain
data or programs to complete a task and be removed after.
In each of these roles, there may be a management tier and an individual contributor tier that has
different levels of permission inside the individual applications granted to each role.
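The idea above can be sketched as a role-to-permissions mapping. The role names and permission strings below are invented for illustration, not actual AWS identifiers:

```python
# Sketch: role-based access control as a role -> permissions mapping.
# The role names and permission strings are invented for illustration.
ROLE_PERMISSIONS = {
    "software_engineer": {"infra:manage", "infra:view_logs"},
    "employee": set(),  # no infrastructure access
}

def is_allowed(user_roles, permission):
    """A user may act if any of their roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in user_roles)

print(is_allowed(["software_engineer"], "infra:manage"))  # engineers allowed
print(is_allowed(["employee"], "infra:manage"))           # others denied
```

Adding a user to the `software_engineer` role group grants every permission in that set at once, which is what makes RBAC less error-prone than assigning permissions individually.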
Question 67. Which of the following policies is shown in the given image?
Explanation 67. The correct answer is: Acceptable use policy
An acceptable use policy (AUP) is a document stipulating constraints and practices that a user
must agree to in order to access a corporate network or the Internet.
Many businesses and educational facilities require that employees or students sign an acceptable
use policy before being granted a network ID.
When you sign up with an Internet service provider (ISP), you will usually be presented with an
AUP, which states that you agree to adhere to stipulations such as:
1. Not using the service as part of violating any law
2. Not attempting to break the security of any computer network or user
3. Not posting commercial messages to Usenet groups without prior permission
4. Not attempting to send junk e-mail or spam to anyone who doesn’t want to receive it
5. Not attempting to mail bomb a site with mass amounts of e-mail in order to flood their server
Users also typically agree to report any attempt to break into their accounts.
Question 68. In which of the following data quality dimensions does the data follow a set of
standard data definitions, such as data type, size, and format (e.g., a customer’s date of birth is
in the format “mm/dd/yyyy”)?
(A) Completeness
(B) Consistency
(C) Integrity
(D) Conformity
Explanation 68. The correct answer is: Conformity
Conformity means the data follows a set of standard data definitions, such as data type, size,
and format; for example, a customer’s date of birth is in the format “mm/dd/yyyy”.
Completeness means all required data is present. For example, a customer’s first name and last
name are mandatory but the middle name is optional, so a record can be considered complete
even if a middle name is not available.
Consistency means data across all systems reflects the same information and is in sync across
the enterprise. Examples:
1. A business unit’s status is closed but there are sales for that business unit.
2. An employee’s status is terminated but their pay status is active.
Integrity means related data can be traced and connected. For example, in a customer database,
there should be a valid customer, addresses, and a relationship between them. If there is address
relationship data without a customer, then that data is not valid and is considered an orphaned
record.
Question 69. Cardinality refers to the maximum number of times an instance in one entity can
relate to instances of another entity while ordinality is the minimum number of times an instance
in one entity can be associated with an instance in the related entity. (TRUE/FALSE)
(A) TRUE
(B) FALSE
Explanation 69. The correct answer is: TRUE
Cardinality refers to the maximum number of times an instance in one entity can relate to
instances of another entity while ordinality is the minimum number of times an instance in one
entity can be associated with an instance in the related entity.
Cardinality specifies how many instances of an entity relate to one instance of another entity.
Ordinality is also closely linked to cardinality. While cardinality specifies the occurrences of a
relationship, ordinality describes the relationship as either mandatory or optional. In other words,
cardinality specifies the maximum number of relationships and ordinality specifies the absolute
minimum number of relationships.
Question 70. Which of the following categories would contain information about an individual’s
biometric data, genetic information, and sexual orientation?
Explanation 70. The correct answer is: Sensitive personal information
Sensitive Personal Information (SPI) refers to information that does not by itself identify an
individual but is related to an individual and communicates information that is private or could
potentially harm the individual should it be made public. This includes things like biometric
data, genetic information, sex, trade union membership, and sexual orientation.
Protected health information is incorrect. Protected health information (PHI), also referred to
as personal health information, generally refers to demographic information, medical histories,
test and laboratory results, mental health conditions, insurance information, and other data that a
healthcare professional collects to identify an individual and determine appropriate care.
Intellectual property is incorrect. Intellectual property (IP) is a term for any intangible asset —
something proprietary that doesn’t exist as a physical object but has value. Examples of
intellectual property include designs, concepts, software, inventions, trade secrets, formulas, and
brand names, as well as works of art. Intellectual property can be protected by copyright,
trademark, patent, or other legal measures.
Question 71. Which of the following types of vulnerability scan should an organization perform
if it stores credit card information governed by the Payment Card Industry Data Security
Standard (PCI DSS)?
Discovery scan is incorrect. A discovery scan, as its name implies, is a type of vulnerability scan
that is used to discover systems on the network by performing a ping scan and then a port scan
on those targets to find open ports. A discovery scan is not a full vulnerability scan that looks
for vulnerabilities; it is used to find systems on the network.
Full scan is incorrect. A full scan performs many different tests to identify vulnerabilities in the
system.
Stealth scan is incorrect. A stealth scan (sometimes known as a half-open scan) is much like a
full open scan with a minor difference that makes it less suspicious on the victim’s device. The
primary difference is that a full TCP three-way handshake does not occur. Looking at the
following diagram, the initiator (device A) would send a TCP SYN packet to device B for the
purpose of determining whether a port is open. Device B will respond with a SYN/ACK packet
to the initiator (device A) if the port is open. Next, device A will send an RST to terminate the
connection. If the port is closed, device B will send an RST packet:
Stealth scan showing open and closed service ports.
The benefit of using this type of scan is that it reduces the chances of being detected.
Question 72. A financial analyst gathers information, assembles spreadsheets, and writes
reports. He wants his changes to the files to be synced and updated across all of his devices.
Which of the following storage environments does the analyst need to save his files?
(A) Share drive
(B) Local storage
(C) Cloud storage
(D) Sync storage
Explanation 72. The correct answer is: Cloud storage
Cloud services are popular because they enable many businesses to access application software
without investing in computer software and hardware. Other benefits include scalability,
reliability, and efficiency. All these advantages allow organizations to focus on other relevant
aspects, such as product development and innovation.
Pros of cloud storage:
Cost
Buying physical storage or hardware can be expensive. Cloud storage is cheaper per GB than
using external drives.
Accessibility
Cloud storage gives you access to your files from anywhere. All you need is an internet
connection.
Security
Cloud storage is safer than local storage because providers add additional layers of security to
their services. Thanks to the use of encryption algorithms, only authorized personnel, such as
you and your employees, have access to the documents and files stored in the cloud.
Recovery
In case of a hard drive failure or other hardware malfunction, you can access your files on the
cloud, which acts as a backup solution for your local storage on physical drives.
Question 73. The Acme Corporation is working on a new data warehouse and business
intelligence (DW/BI) project. They need to uncover data quality issues in data sources and
determine what needs to be corrected in the extract, transform, load (ETL) process.
Which of the following methods should they use to validate the data?
(A) Cross-validation
(B) Data auditing
(C) Data profiling
(D) Data correction
Explanation 73. The correct answer is: Data profiling
Data profiling is the process of reviewing source data; understanding its structure, content, and
interrelationships; and identifying its potential for data projects.
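A minimal profiling pass can be sketched in a few lines. The records and field names below are invented for illustration:

```python
# Sketch: a minimal data-profiling pass over tabular source data.
# The records and field names are invented for illustration.
rows = [
    {"id": "1", "email": "a@example.com", "country": "DE"},
    {"id": "2", "email": "",              "country": "DE"},
    {"id": "3", "email": "c@example.com", "country": "de"},
]

def profile(rows):
    """Per column: count of missing values and of distinct non-missing values."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        missing = sum(1 for v in values if v == "")
        distinct = len({v for v in values if v != ""})
        report[col] = {"missing": missing, "distinct": distinct}
    return report

print(profile(rows))
# Surfaces e.g. the blank email and the inconsistent "DE"/"de" country codes,
# exactly the kind of issue an ETL process would need to correct.
```

Real profiling tools compute far richer statistics (patterns, value distributions, cross-column dependencies), but the purpose is the same: surface quality issues before the ETL stage.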
Question 74. A data analyst wants to measure how well a piece of information reflects reality.
Which of the following data quality dimensions does the data analyst need to assess?
(A) Data consistency
(B) Data accuracy
(C) Data completeness
(D) Data integrity
Explanation 74. The correct answer is: Data accuracy
Accuracy measures how well a piece of information reflects reality. How can you assess your
data quality more broadly? Data quality is assessed along six dimensions:
Accuracy
Completeness
Consistency
Timeliness
Validity
Uniqueness
Six data quality dimensions to assess
Question 75. Records from governmental agencies, student records information, and existing
human research subjects’ data are examples of:
(A) Release approvals
(B) Data transmission
(C) Data use agreements
(D) Data constraints
Explanation 75. The correct answer is: Data use agreements
A Data Use Agreement (DUA) is a contractual document used for transferring non-public or
restricted-use data. Examples include records from governmental agencies, institutions, or
corporations; student records information; and existing human research subjects’ data.
A data use agreement (DUA) is an agreement that is required under the Privacy Rule and must
be entered into before there is any use or disclosure of a limited data set (defined below) to an
outside institution or party. A limited data set is still protected health information (PHI), and for
that reason, covered entities like Stanford must enter into a data use agreement with any recipient
of a limited data set from Stanford.
At a minimum, any DUA must contain provisions that address the following:
1. Establish the permitted uses and disclosures of the limited data set;
2. Identify who may use or receive the information;
3. Prohibit the recipient from using or further disclosing the information, except as permitted by
the agreement or as otherwise permitted by law;
4. Require the recipient to use appropriate safeguards to prevent an unauthorized use or
disclosure not contemplated by the agreement;
5. Require the recipient to report to the covered entity any use or disclosure of which it becomes
aware;
6. Require the recipients to ensure that any agents (including any subcontractors) to whom it
discloses the information will agree to the same restrictions as provided in the agreement; and
7. Prohibit the recipient from identifying the information or contacting the individuals.
Question 76. Acme Corporation wants to reduce the probability of a data breach in order to
reduce the risk of fines in the future.
Explanation 76. The correct answer is: Encryption of personal data
Companies can reduce the probability of a data breach, and thus reduce the risk of fines in the
future, if they choose to encrypt personal data. The processing of personal data is naturally
associated with a certain degree of risk, especially nowadays, when cyber-attacks are nearly
unavoidable for companies above a given size. Therefore, risk management plays an ever-larger
role in IT security, and data encryption is suited, among other means, to these companies.
In general, encryption refers to the procedure that converts plaintext into ciphertext using a
key, where the information only becomes readable again by using the correct key. This
minimizes the risk of an incident during data processing, as encrypted contents are essentially
unreadable to third parties who do not have the correct key. Encryption is the best way to
protect data during transfer and one way to secure stored personal data. It also reduces the risk
of abuse within a company, as access is limited to authorized people with the right key.
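The key property described above (only the correct key makes the data readable again) can be illustrated with a toy symmetric cipher. This XOR sketch is purely illustrative; real systems use vetted algorithms such as AES, never anything like this:

```python
# Toy sketch of symmetric encryption: the same key that scrambles the
# plaintext is needed to recover it. XOR is used ONLY to illustrate the
# idea; production systems must use vetted algorithms such as AES.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Applying the same key twice returns the original bytes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"secret-key"
ciphertext = xor_cipher(b"customer record", key)

print(ciphertext != b"customer record")  # unreadable without the key
print(xor_cipher(ciphertext, key))       # the correct key recovers the text
```

Without `key`, a third party holds only the scrambled bytes, which is why encryption limits the damage of a breach to those who possess the key.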
Question 77. Which of the following ways can be used to achieve the desired quality output for
standardized names in a master data management (MDM) architecture? (Select TWO)
(A) Use different locales to standardize names properly
(B) Define and apply customized schemes to standardize differently spelled words to
common words (e.g. Assoc, Assocn. and Assn. to Association)
(C) Don't define and apply customized schemes to standardize differently spelled words to
common words (e.g. Assoc, Assocn. and Assn. to Association)
(D) Use the same locales to standardize names properly
Master data management (MDM) arose out of the necessity for businesses to improve the
consistency and quality of their key data assets, such as product data, asset data, customer data,
location data, etc.
There can be many problems while standardizing name and address information. Two common
approaches are to use the same locales consistently and to define and apply customized schemes
that map differently spelled variants to a common word (e.g. Assoc., Assocn. and Assn. to
Association).
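A customized standardization scheme of the kind described in option (B) can be sketched as a simple token-by-token lookup. The mapping below is an illustrative example, not an exhaustive rule set, and the function name is ours, not part of any MDM product.

```python
# Illustrative scheme: map spelling variants to one canonical word.
SCHEME = {
    "assoc": "Association",
    "assocn": "Association",
    "assn": "Association",
}

def standardize_name(name: str) -> str:
    """Replace each token that matches the scheme with its canonical
    form; matching is case-insensitive and ignores trailing periods."""
    tokens = []
    for token in name.split():
        key = token.lower().rstrip(".")
        tokens.append(SCHEME.get(key, token))
    return " ".join(tokens)

print(standardize_name("National Rifle Assn."))  # → National Rifle Association
print(standardize_name("Bar Assoc of Texas"))    # → Bar Association of Texas
```

In a real MDM architecture the scheme would be maintained as reference data and applied consistently across all source systems, which is what produces the desired quality output.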
Question 78. The ACME Corporation hired an analyst to detect data quality issues in its Excel
documents. Which of the following are the most common issues? (Select TWO)
(A) Duplicates
(B) Commas
(C) Symbols
(D) Misspellings
(E) Apostrophe
The most common data quality issues are difficult to resolve in Excel because of its rigidity,
which forces analysts to do a great deal of manual work and so raises the probability of an error
being introduced into the data set.
Those common issues include:
1. Blanks
2. Nulls
3. Outliers
4. Duplicates
5. Extra spaces
6. Misspellings
7. Abbreviations and domain-specific variations
8. Formula error codes
When introduced, these errors can skew or even invalidate the resulting analysis. A smart tool
would minimize the possibility of error by automating the manual work.
In Excel, you might look for data quality issues in one of two ways: applying auto filters to
specific columns to scan for anomalies and blanks, or using a pivot table to find gaps and
discrepancies.
In either case, you’re scanning for the anomalies yourself. Suffice it to say that’s not a very
efficient process. It also means accuracy is only as good as the analyst’s eye, so the probability
of error varies throughout the day.
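The kind of automated scan that a smart tool performs can be sketched in plain Python. The records, field names, and rules below are illustrative; a real tool would cover the full list of issues above, including outliers and formula error codes.

```python
from collections import Counter

# Illustrative records, as they might come out of a spreadsheet export.
rows = [
    {"name": "Acme Corp",  "city": "Berlin"},
    {"name": "Acme Corp",  "city": "Berlin"},   # duplicate of the row above
    {"name": " Widget Co", "city": ""},          # extra space and a blank
    {"name": None,         "city": "Munich"},    # null value
]

def scan_quality(rows):
    """Count common data quality issues: duplicates, nulls, blanks,
    and values with leading/trailing whitespace."""
    issues = Counter()
    seen = set()
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            issues["duplicates"] += 1
        seen.add(fingerprint)
        for value in row.values():
            if value is None:
                issues["nulls"] += 1
            elif value == "":
                issues["blanks"] += 1
            elif value != value.strip():
                issues["extra_spaces"] += 1
    return dict(issues)

print(scan_quality(rows))  # reports one each of duplicates, nulls, blanks, extra_spaces
```

Because the checks run the same way on every row, the result does not depend on the analyst's eye, which is exactly the advantage over manual filtering.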
Question 79. Which of the following policies is a set of guidelines that helps organizations keep
track of how long information must be kept and how to dispose of the information when it’s no
longer needed?
(A) Data processing policy
(B) Data deletion policy
(C) Data retention policy
(D) Acceptable use policy
Explanation 79. The correct answer is: Data retention policy
A data retention policy is a set of guidelines that helps organizations keep track of how long
information must be kept and how to dispose of the information when it’s no longer needed.
The policy should also outline the purpose of processing personal data. This ensures that you
have documented proof that justifies your data retention and disposal periods.
Question 80. Which of the following categories would contain information about an individual’s
demographic information, medical histories, laboratory results, and mental health conditions?
(A) Personally identifiable information
(B) Personal health information
(C) Sensitive personal information
(D) Intellectual property
Protected health information (PHI), also referred to as personal health information, generally
refers to demographic information, medical histories, test and laboratory results, mental health
conditions, insurance information, and other data that a healthcare professional collects to
identify an individual and determine appropriate care.
Intellectual property is incorrect. Intellectual property (IP) is a term for any intangible asset —
something proprietary that doesn’t exist as a physical object but has value. Examples of
intellectual property include designs, concepts, software, inventions, trade secrets, formulas, and
brand names, as well as works of art. Intellectual property can be protected by copyright,
trademark, patent, or other legal measures.
BONUSES & DISCOUNTS
About ExamsDigest.
ExamsDigest started in 2019 and hasn’t stopped smashing it since. ExamsDigest is a global,
education-tech-oriented company that doesn’t sleep. Its mission is to be a part of your life
transformation by providing you with the training you need to hit your career goals.