Datamining and Data Warehouse: By, M.E.Paar Rivanan
By M.E. Paar Rivanan
Abstract:
The purpose of this paper is to give a short introduction to the
concepts of data mining and data warehousing, to explain their
general possibilities, and to describe briefly their uses in the field
of Enterprise System Integration. We also describe the concept of
data mining by comparing traditional marketing research with
relationship marketing. The background of data mining is discussed,
with special emphasis on related terms such as data warehouses
and data marts, knowledge discovery in databases (KDD), and
customer relationship management (CRM). The steps necessary for
companies to implement successful data mining projects are
enumerated, and there remains much scope for future research. An
enterprise website has become one of the most important channels
between the enterprise and its existing and potential customers
(visitors). We envision that better management of visitor
relationships will bring about loyalty from existing customers and
stimulate interest in the enterprise among potential customers. In
this paper, we apply the concept of CRM to the management of an
enterprise website, that is, to visitor relationship management. In
other words, customers are differentiated according to their value
and, based on an understanding of the visitors, served with
different relationship-strengthening practices.
I. INTRODUCTION:
a) Data warehousing:
The volume of data that a company collects may be very large, and its databases may be
numerous. In such a case, a system that makes information retrieval easier and faster is
needed. This instrument is a data warehouse.
A common definition of a data warehouse is: “A Data Warehouse is a repository of integrated
information, available for queries and analysis. Data and information are extracted from
heterogeneous sources as they are generated. This makes it much easier and more
efficient to run queries over data that originally came from different sources.” A data
warehouse is a database that stores data from the company's other databases after
these data have been pre-processed to make them more accessible.
b) Data mining:
Data mining is a method for data processing; nowadays it could be considered the
most powerful one. Data mining is also known as Knowledge Discovery in Databases (KDD),
and it can be defined as a method for retrieving information from data. Information and
data are not the same thing: data is just something stored somewhere; information is
something richer.
Data mining has become a hot topic in recent years thanks to the increase in computing
power: older data, which had been compiled but never analysed, has now been analysed,
and data mining techniques have improved.
The power of data mining is its ability to uncover information that is stored in the data
but not directly visible. Data mining finds patterns that turn data into information. No other
traditional data processing method is so independent of the human way of thinking: data
mining doesn’t need a “guide” to reach information. There’s no need to tell it what to search
for, and that’s why it can find valuable, previously unknown information.
a) Data warehousing:
In a company with several databases, each organized according to the needs of a single
department or unit of the enterprise, retrieving the information needed for strategy or
other “high level” decisions, such as marketing or customer service decisions, may be a
difficult and slow process. In addition, the databases of an enterprise are often based on
different systems, such as mainframes and other “old” systems, called legacy systems, and
“newer” systems such as client-server architectures. So, in order to provide an instrument
that can support high-level decisions and give the right information at the right time,
integration of the databases and pre-processing of the large amount of data are needed.
These are the functions that a data warehouse implements.
There is another task that a data warehouse can perform. It can be useful not only for
retrieving information, but also for “creating” new knowledge from the available data. In
fact, a data warehouse is often used as a support for data mining.
b) Data mining:
There are many methods for processing data, but most of them are closely tied to the
ideas and way of thinking of the people who use them: they need to be guided in some
way by human intelligence. Data mining can also work in this way, but it can work more
independently of the human mind as well. This is very useful when there is no concrete
idea of the information to be found. In some fields of research this feature can be very
important: discovering a previously unconsidered relation between certain diseases and
other factors, for example, can lead to a new approach to the study of these diseases.
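As an illustration of searching without a predefined hypothesis, the sketch below counts which attributes tend to occur together in a set of records. It is a very simple form of frequent-pair analysis, not a full data mining algorithm; the function name `frequent_pairs`, the `min_support` threshold and the sample records are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(records, min_support):
    """Return attribute pairs that co-occur in at least min_support of the records."""
    if not records:
        return {}
    pair_counts = Counter()
    for record in records:
        # sorted() gives every pair one canonical order, so (a, b) and (b, a) merge
        for pair in combinations(sorted(set(record)), 2):
            pair_counts[pair] += 1
    total = len(records)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }

# Hypothetical records: each one is the set of attributes observed for a patient.
records = [
    {"smoker", "age_over_60", "disease_x"},
    {"smoker", "disease_x"},
    {"age_over_60"},
    {"smoker", "disease_x", "urban"},
]

patterns = frequent_pairs(records, min_support=0.5)
for pair, support in patterns.items():
    print(pair, support)
```

No one told the function to look at smoking or at a particular disease; the pair surfaces only because it appears in enough records. A co-occurrence like this is a starting point for further study, not proof of a causal relation.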
a) Data warehousing:
Information is often split across different databases according to the needs of the different
parts of the company. The marketing division has its own database, with a structure that
fulfils its needs, and the same holds for the sales division, the product development
division, and so on. Data stored in this way is not very helpful for management purposes
or for obtaining a complete overview of the company. Through data warehousing it is
possible to process and combine data in an automated way in order to fulfil needs that
were previously unsatisfied. This is needed for developing a decision support system.
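A minimal sketch of such a combination is shown here, assuming two departmental extracts that describe the same customers under different field names. The field names, the sample rows and the function `build_customer_view` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical departmental extracts; each department uses its own field names.
marketing_rows = [
    {"cust_id": 1, "segment": "retail"},
    {"cust_id": 2, "segment": "wholesale"},
]
sales_rows = [
    {"customer": 1, "amount": 120.0},
    {"customer": 1, "amount": 80.0},
    {"customer": 2, "amount": 500.0},
    {"customer": 3, "amount": 40.0},
]

def build_customer_view(marketing, sales):
    """Combine both sources into one record per customer."""
    view = defaultdict(lambda: {"segment": None, "total_sales": 0.0, "orders": 0})
    for row in marketing:
        view[row["cust_id"]]["segment"] = row["segment"]
    for row in sales:
        record = view[row["customer"]]
        record["total_sales"] += row["amount"]
        record["orders"] += 1
    return dict(view)

customer_view = build_customer_view(marketing_rows, sales_rows)
for customer_id, record in sorted(customer_view.items()):
    print(customer_id, record)
```

Customer 3 appears in the sales data but not in the marketing data, so its segment stays empty. Gaps of this kind become visible only once the sources are combined.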
b) Data mining:
This instrument can be a very important aid in discovering new information that can
support the planning of new strategies for the company, the analysis of current strategies,
the development of new products, and so on. One of the most important fields related
to ESI in which data mining is used is CRM (Customer Relationship Management).
CRM is a process that manages the interactions between a company and its customers.
The primary users of CRM software applications are database marketers who are looking
to automate the process of interacting with customers. Data mining applications automate
the process of searching the mountains of data to find patterns that are good predictors of
purchasing behaviors. After mining the data, marketers must feed the results into
campaign management software that, as the name implies, manages the campaign
directed at the defined market segments.
Data mining helps marketing users to target marketing campaigns more accurately and
to align campaigns more closely with the needs, wants, and attitudes of customers
and prospects. If the necessary information exists in a database, the data mining process
can model virtually any customer activity. The key is to find patterns relevant to current
business problems.
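One simple way to look for a predictor of purchasing behaviour is to compare the purchase rate of each customer group with the overall rate, a ratio often called lift. The sketch below assumes a small table of past customers with a hypothetical `region` attribute and a 0/1 `purchased` flag; it illustrates the idea and is not the method of any particular CRM product.

```python
from collections import defaultdict

def segment_lift(customers, attribute):
    """Purchase rate of each attribute value divided by the overall purchase rate."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for customer in customers:
        value = customer[attribute]
        totals[value] += 1
        buyers[value] += customer["purchased"]
    if not customers or sum(buyers.values()) == 0:
        return {}  # no purchases at all, so no lift can be computed
    base_rate = sum(buyers.values()) / len(customers)
    return {value: (buyers[value] / totals[value]) / base_rate for value in totals}

# Hypothetical history: 1 means the customer bought after the last campaign.
customers = (
    [{"region": "north", "purchased": 1}] * 3
    + [{"region": "north", "purchased": 0}]
    + [{"region": "south", "purchased": 1}]
    + [{"region": "south", "purchased": 0}] * 3
)

lift = segment_lift(customers, "region")
# Segments with lift above 1 buy more often than average and are campaign candidates.
targets = sorted(value for value, score in lift.items() if score > 1.0)
print(lift, targets)
```

The list of target segments is the kind of result a marketer would pass on to campaign management software.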
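[Figure: data warehouse architecture. Data from the operational database and external
data sources is extracted, cleaned, transformed, loaded and refreshed into the data
warehouse, which has a metadata repository and serves OLAP, data mining and the
decision support system.]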
External data source: a source that lies outside the organization but can be accessed by
the data warehouse system.
Clean: data that is no longer usable after a certain period of time is cleaned out, that is,
older data is removed.
LOAD: Once the data has been extracted from the source system, it is typically loaded
into a temporary data store so that it can be cleaned up and made consistent. The
consistency checks can be quite complex, and they identify issues that arise when
integrating data from a number of data sources. In addition, as data changes over time,
errors become apparent that had gone unnoticed because the day-to-day discrepancies
were too small to detect.
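The kind of check applied in the temporary store can be sketched as follows. The example assumes that staged rows are dictionaries and that two rules apply: required fields must be present, and the key must be unique. Real consistency checks are usually far more extensive; the function name `check_staged_rows` and the sample rows are hypothetical.

```python
def check_staged_rows(rows, key, required_fields):
    """Split staged rows into loadable rows and (row, reason) rejections."""
    clean, rejected = [], []
    seen_keys = set()
    for row in rows:
        # A field counts as missing when it is absent, None or an empty string.
        missing = [f for f in [key, *required_fields] if row.get(f) in (None, "")]
        if missing:
            rejected.append((row, "missing: " + ", ".join(missing)))
        elif row[key] in seen_keys:
            rejected.append((row, "duplicate " + key))
        else:
            seen_keys.add(row[key])
            clean.append(row)
    return clean, rejected

# Hypothetical rows sitting in the temporary data store.
staged = [
    {"id": 1, "name": "Acme", "country": "IN"},
    {"id": 2, "name": "", "country": "US"},
    {"id": 1, "name": "Acme Ltd", "country": "IN"},
]

clean, rejected = check_staged_rows(staged, key="id", required_fields=["name", "country"])
for row, reason in rejected:
    print(row, "->", reason)
```

Only the rows in `clean` would go on to the warehouse; the rejected rows keep their reason so that the source system can be corrected.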
OLAP: The term OLAP is an acronym for online analytical processing. Much has
been written about the subject in the computer literature, and readers who want a detailed
discussion should consult some of that work. For our purposes it is sufficient to
understand the term in outline.
OLAP is primarily about being able to access live data online and analyze it, and about
the methods, structures and tools required to perform this analysis rapidly. OLAP tools
are designed to allow reasonably large quantities of data to be analyzed online. An OLAP
tool allows a user to quickly perform standard analytical functions on the data and to
present both the data and the results graphically. The idea is to allow the user to
manipulate and visualize the data easily.
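A typical standard analytical function is the roll-up, which sums a measure along a chosen set of dimensions. The sketch below shows the idea on a handful of in-memory fact rows; the dimension names, the `sales` measure and the function `roll_up` are assumptions for illustration, and a real OLAP tool would work on much larger, pre-organized data.

```python
from collections import defaultdict

def roll_up(facts, dimensions, measure):
    """Sum the measure over all facts that share the same dimension values."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)

# Hypothetical fact rows: one row per region, quarter and product.
facts = [
    {"region": "north", "quarter": "Q1", "product": "A", "sales": 100.0},
    {"region": "north", "quarter": "Q2", "product": "A", "sales": 150.0},
    {"region": "south", "quarter": "Q1", "product": "B", "sales": 200.0},
    {"region": "south", "quarter": "Q1", "product": "A", "sales": 50.0},
]

by_region = roll_up(facts, ["region"], "sales")
by_region_quarter = roll_up(facts, ["region", "quarter"], "sales")
print(by_region)
print(by_region_quarter)
```

Changing the list of dimensions moves between a coarse view (by region) and a finer one (by region and quarter) without touching the underlying data.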
Relational technology has been around for many years, and is fairly well understood
these days.
Decision support system: A DSS (decision support system), also known as an EIS
(executive information system), supports an organization’s leading decision makers
with higher-level data for complex and important decisions.
a) Data warehousing
Data warehousing is more than keeping a second copy of the data; otherwise it would
simply be a backup database. Creating and maintaining a data warehouse involves further
operations, which can be classified as extraction, consolidation, filtering, cleansing,
transformation, aggregation and updating.
Extraction: periodical download of new data from various databases.
Consolidation: combination of data from different databases in order to
perform data analysis.
Filtering: elimination of data not needed for analysis.
Cleansing: finding and repairing errors due to data manipulations.
Transformation: modification of data in order to make them consistent.
Aggregation: summarization of data into appropriate units for analysis.
Updating: adding new data.
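The seven operations listed above can be sketched as a chain of small functions. Everything in the example is an assumption made for illustration: the sources are in-memory lists, the `is_test` flag marks rows not needed for analysis, a missing or negative amount counts as an error, and the warehouse is a plain dictionary of totals per region. Cleansing here simply drops the faulty rows rather than repairing them.

```python
def extract(sources):
    """Extraction: take a copy of the rows from every source."""
    return {name: list(rows) for name, rows in sources.items()}

def consolidate(extracted):
    """Consolidation: merge all rows into one list, tagging their origin."""
    return [dict(row, source=name) for name, rows in extracted.items() for row in rows]

def filter_rows(rows):
    """Filtering: drop rows not needed for analysis (here, test orders)."""
    return [r for r in rows if not r.get("is_test", False)]

def cleanse(rows):
    """Cleansing: drop rows whose amount is missing or negative."""
    return [r for r in rows if r.get("amount") is not None and r["amount"] >= 0]

def transform(rows):
    """Transformation: make region labels consistent across sources."""
    return [dict(r, region=r["region"].strip().lower()) for r in rows]

def aggregate(rows):
    """Aggregation: summarize amounts per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

def update(warehouse, totals):
    """Updating: add the new totals to what the warehouse already holds."""
    for region, amount in totals.items():
        warehouse[region] = warehouse.get(region, 0.0) + amount
    return warehouse

# Hypothetical operational sources with inconsistent labels and one bad entry.
sources = {
    "sales_db": [
        {"region": "North ", "amount": 100.0},
        {"region": "south", "amount": -5.0},
    ],
    "web_db": [
        {"region": "NORTH", "amount": 40.0},
        {"region": "south", "amount": 60.0, "is_test": True},
        {"region": "South", "amount": 25.0},
    ],
}

warehouse = {"north": 10.0}
rows = transform(cleanse(filter_rows(consolidate(extract(sources)))))
update(warehouse, aggregate(rows))
print(warehouse)
```

Keeping each operation in its own function mirrors the classification in the text and lets a single step be changed without affecting the others.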
b) Data mining:
There are many techniques related to data mining, but the general process can be
described using the following steps:
[Figure: data from the databases is cleaned and reformatted, with metadata, into a data
store that feeds OLAP, data mining, DSS and EIS.]
Customers are not new, relationships are as old as buying and selling, and management
is not new either. The concepts of CRM have existed since buying and selling came into
being. What, then, is creating waves in today's CRM industry? Is that small electronic
'e' changing the trend?
CRM is often regarded as a software tool and a technology solution within the information
technology industry. In fact, CRM is a strategy for achieving a holistic view of every
partner engagement. CRM, which is a combination of marketing and business processes,
rests on a basic understanding of customers and of how organizations measure them. The
mantra behind CRM is catering to customized needs "centrally".
As defined by the "gurus" of CRM, customer relationship management is a business
strategy to select and manage the most valuable customer relationships. CRM requires a
customer-centric business philosophy and culture to support effective marketing, sales
and service processes. CRM applications can enable effective customer relationship
management, provided that an enterprise has the right leadership, strategy and culture.
USE OF CRM: Given the pace at which technology is changing today, any company that is
a step ahead of others because of some web product or service will not be able to hold on
to that advantage for long. The key to stability in today's dynamic marketplace is forging
long-term relationships with customers.
Customers can be divided into three zones:
1. Zone of defection where customers are extremely hostile and have the lowest level
of satisfaction.
2. Zone of indifference where customers are not sure. They have a medium level of
satisfaction and loyalty towards the company.
3. Zone of affection where customers are described as "Apostles".
CRM focuses on bringing customers from zone 1 to zone 3 and on retaining apostle
customers.
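The three zones can be expressed as a simple classification over a satisfaction score. The text gives no numeric boundaries, so the 0 to 1 scale and the thresholds 0.4 and 0.8 below are purely assumed values chosen for illustration, as is the function name `customer_zone`.

```python
def customer_zone(satisfaction, low=0.4, high=0.8):
    """Map a satisfaction score between 0 and 1 to one of the three zones."""
    if not 0.0 <= satisfaction <= 1.0:
        raise ValueError("satisfaction must lie between 0 and 1")
    if satisfaction < low:
        return "defection"
    if satisfaction < high:
        return "indifference"
    return "affection"

# Hypothetical survey scores per customer.
scores = {"customer_a": 0.1, "customer_b": 0.5, "customer_c": 0.9}
zones = {name: customer_zone(score) for name, score in scores.items()}
print(zones)
```

Tracking how many customers move from one zone to the next over time gives a rough measure of whether relationship-building efforts are working.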
philosophy a company must change the entire business operation so that research and
development and marketing work seamlessly and financial resources are allocated to the
"right" places.
The producers and suppliers must be able to put together the right mix of service and
information surrounding the differentiated or personalized products of the future. This
mix will be customized by creating distinct portraits of individual customers.
The technology to develop these portraits exists in today's data mining tools.
Companies are able to take information from their own databases, augment it with
enhancement information provided by a data compiler, and then apply a predictive
model to the augmented data set using sophisticated data mining techniques. In this way
we can understand some of the things that individuals in the year 2020 will want to
achieve as customers.