Data and Databases
Learning Objectives
Upon successful completion of this chapter,
you will be able to:
• Describe the differences between data,
information, and knowledge;
• Describe why database technology must be
used for data resource management.
Introduction
You have already been introduced to the first two
components of information systems: hardware and
software. However, those two components by themselves
do not make a computer useful. Imagine if you turned on
a computer, started the word processor, but could not
save a document. Imagine if you opened a music player
but there was no music to play. Imagine opening a web
browser but there were no web pages. Without data,
hardware and software are not very useful! Data is the
third component of an information system.
Data, Information, and Knowledge
There have been many definitions and theories about data,
information, and knowledge. The three terms are often used
interchangeably, although they are distinct in nature. We define
and illustrate the three terms from the perspective of information
systems.
Data are the raw facts, and may be devoid of
context or intent. For example, a sales order of
computers is a piece of data. Data can be
quantitative or qualitative. Quantitative data is
numeric, the result of a measurement, count, or
some other mathematical calculation. Qualitative data
is descriptive. “Ruby Red,” the color of a 2013 Ford
Focus, is an example of qualitative data. A number
can be qualitative too: if I tell you my favorite
number is 5, that is qualitative data because it is
descriptive, not the result of a measurement or
mathematical calculation.
Information is processed data that possesses context,
relevance, and purpose. For example, monthly sales
calculated from the collected daily sales data for the past
year are information. Information typically involves the
manipulation of raw data to obtain an indication of
magnitude, trends, or patterns in the data for a purpose.
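The monthly sales example can be sketched in a few lines of Python. This is only an illustration: the dates and amounts in `daily_sales` are invented, and `monthly_totals` is a hypothetical helper name, not part of any particular system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records (date, amount); the figures are invented.
daily_sales = [
    (date(2023, 1, 5), 1200.0),
    (date(2023, 1, 20), 800.0),
    (date(2023, 2, 3), 1500.0),
    (date(2023, 2, 17), 500.0),
    (date(2023, 3, 9), 900.0),
]

def monthly_totals(records):
    """Aggregate raw daily amounts (data) into per-month totals (information)."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month)] += amount
    return dict(totals)

totals = monthly_totals(daily_sales)
for (year, month), total in sorted(totals.items()):
    print(f"{year}-{month:02d}: {total:.2f}")
```

Each daily record on its own is raw data. The per-month totals add context and purpose: they show at a glance how sales moved from one month to the next.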
Knowledge in a certain area consists of human beliefs or
perceptions about relationships among facts or concepts
relevant to that area. For example, a perceived
relationship between the quality of goods and their sales
is knowledge. Knowledge can be viewed as information
that facilitates action.
Big Data
Almost all software programs require data to do anything
useful. For example, if you are editing a document in a word
processor such as Microsoft Word, the document you are working
on is the data. The word-processing software can manipulate the
data: create a new document, duplicate a document, or modify a
document. Some other examples of data are: an MP3 music file,
a video file, a spreadsheet, a web page, a social media post, and
an e-book.
Recently, big data has been capturing the attention of all
types of organizations. The term refers to data sets so large
that conventional data processing technologies do not
have sufficient power to analyze them. For example, Walmart
must process millions of customer transactions every hour across
the world. Storing and analyzing that much data is beyond the
power of traditional data management tools. Understanding and
developing the best tools and techniques to manage and analyze
these large data sets is a problem that governments and
businesses alike are trying to solve.
Databases
The goal of many information systems is to
transform data into information in order to
generate knowledge that can be used for decision
making. In order to do this, the system must be
able to take data, allow the user to put the data
into context, and provide tools for aggregation and
analysis. A database is designed for just such a
purpose.
Why Databases?
Data is a valuable resource in the organization. However,
many people do not know much about database technology and
instead use non-database tools, such as Excel spreadsheets or Word
documents, to store and manipulate business data, or use poorly
designed databases for business processes. As a result, the data
can become redundant, inconsistent, inaccurate, and corrupted. For a
small data set, the use of non-database tools such as
spreadsheets may not cause serious problems. However, for a large
organization, corrupted data could lead to serious errors and
destructive consequences. The common defects in data resource
management are explained as follows.
(1) No control of redundant data
People often keep redundant data for convenience.
Redundant data could make the data set inconsistent. We
use an illustrative example to explain why redundant data
are harmful. Suppose the registrar’s office has two separate
files that store student data: one is the registered student
roster, which records all students who have registered and
paid tuition, and the other is the student grade roster, which
records all students who have received grades.
As you can see from the two spreadsheets, this data
management system has problems. The fact that “Student 4567
is Mary Brown, and her major is Finance” is stored more than
once. Such occurrences are called data redundancy. Redundant
data often make data access convenient but can be harmful. For
example, if Mary Brown changes her name or her major, then every
copy of her name and major stored in the system must be changed
together. For small data systems, such a problem looks trivial.
However, when the data system is huge, making changes to all
redundant data is difficult if not impossible. As a result of data
redundancy, the entire data set can be corrupted.
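A small Python sketch shows how a redundant copy goes stale. It assumes the two rosters are simple lists of records. Only student 4567 (Mary Brown, Finance) comes from the text; the course code and grade on her row, and the helper names, are invented for illustration.

```python
# Two separate "files" that both store Mary Brown's name and major.
# The course code and grade below are hypothetical.
registered_roster = [
    {"student_id": 4567, "name": "Mary Brown", "major": "Finance"},
]
grade_roster = [
    {"student_id": 4567, "name": "Mary Brown", "major": "Finance",
     "course": "ACT211", "grade": "A"},
]

def change_major(roster, student_id, new_major):
    """Update the major for one student in a single roster."""
    for row in roster:
        if row["student_id"] == student_id:
            row["major"] = new_major

# The change is applied to one file only; the other copy is forgotten.
change_major(registered_roster, 4567, "Marketing")

def majors_on_record(student_id):
    """Collect every major stored for a student across both files."""
    rows = registered_roster + grade_roster
    return {row["major"] for row in rows if row["student_id"] == student_id}

conflicting = majors_on_record(4567)
print(sorted(conflicting))
```

After the update, the two files disagree about the same student. With only two files the fix is easy, but every additional copy is one more place where an update can be missed.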
(2) Violation of data integrity
Data integrity means consistency among the stored data.
We use the above illustrative example to explain the concept of
data integrity and how data integrity can be violated if the data
system is flawed. In the grade roster, Alex Wilson received a grade
in MKT211; however, Alex Wilson does not appear in the registered
student roster. That is, the two rosters are not consistent. If a
data integrity control enforced a rule such as “no student can
receive a grade unless she/he has registered and paid tuition,”
such a violation of data integrity could not happen.
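One way a database can enforce such a rule is a foreign key constraint. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table and column names, the student ID 9999 standing in for the unregistered Alex Wilson, and the grades are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE registered_student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        major      TEXT NOT NULL
    )
""")
# A grade row must point at a student who exists in registered_student.
conn.execute("""
    CREATE TABLE grade (
        student_id INTEGER NOT NULL REFERENCES registered_student(student_id),
        course     TEXT NOT NULL,
        grade      TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO registered_student VALUES (4567, 'Mary Brown', 'Finance')")

def record_grade(student_id, course, grade):
    """Try to store a grade; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO grade VALUES (?, ?, ?)",
                         (student_id, course, grade))
        return True
    except sqlite3.IntegrityError:
        return False

mary_ok = record_grade(4567, "MKT211", "A")  # registered student (hypothetical grade)
alex_ok = record_grade(9999, "MKT211", "B")  # 9999: hypothetical ID, not registered
print(mary_ok, alex_ok)
```

The database itself rejects the second insert, so the rule holds no matter which program or person tries to enter the grade. The grade table also stores only the student ID, not the name or major, so the redundancy described earlier does not arise.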
(3) Relying on human memory to store
and to search needed data
The third common mistake in data resource management is
the overuse of human memory for data search. A human can
remember what data are stored and where the data are stored,
but can also make mistakes. If a piece of data is stored in a place
that no one remembers, it is effectively lost. As a result of relying
on human memory to store and to search for needed data, the
entire data set eventually becomes disorganized.
To avoid the above common flaws in data resource
management, database technology must be applied. A database
is an organized collection of related data. It is
an organized collection, because in a database, all data is
described and associated with other data. For the purposes of
this text, we will only consider computerized databases.
Although spreadsheets are not a good replacement for databases,
they can be ideal tools for analyzing the data stored in a database. A
spreadsheet package can be connected to a specific table or
query in a database and used to create charts or perform
analysis on that data.
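As a rough illustration of this division of labor, the Python sketch below runs a query against a small database and writes the result as CSV text, a format any spreadsheet package can open. Real spreadsheet packages usually connect to a database directly; the CSV step here is a simplified stand-in, and the table, rows, and column names are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (student_id INTEGER, course TEXT, points REAL)")
# Hypothetical rows, invented for illustration.
conn.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [(4567, "MKT211", 4.0), (4567, "FIN101", 3.0), (1234, "MKT211", 2.0)],
)

# The database does the aggregation; the spreadsheet only charts the result.
cursor = conn.execute(
    "SELECT course, COUNT(*) AS students, AVG(points) AS avg_points "
    "FROM grade GROUP BY course ORDER BY course"
)

buffer = io.StringIO()  # stands in for a .csv file a spreadsheet could open
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([col[0] for col in cursor.description])  # header row
writer.writerows(cursor)                                  # one line per course
csv_text = buffer.getvalue()
print(csv_text)
```

The data stays in the database, where it is stored once and kept consistent, while the spreadsheet receives only the summarized rows it needs for charts or further analysis.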
Write an essay that answers these questions.
• How would you describe big data in your own words?
• Why is it important to manage data?
• What is the purpose of a database?
Please submit your essay through the given Google Drive link. Use your
surname as the file name. You can use Microsoft Word or Google Docs.
Thank you and God bless.