Data mining, the extraction of hidden predictive information from
large databases, is a powerful new technology with great potential to help
companies focus on the most important information in their data
warehouses. Data mining tools predict future trends and behaviors, allowing
businesses to make proactive, knowledge-driven decisions.
Data mining tools can answer business questions that traditionally
were too time-consuming to resolve. They scour databases for hidden
patterns, finding predictive information that experts may miss because it lies
outside their expectations. Data mining techniques can be implemented
rapidly on existing software and hardware platforms to enhance the value of
existing information resources, and can be integrated with new products and
systems as they are brought on-line.
This white paper provides an introduction to the basic technologies
of data mining. Examples of profitable applications illustrate its
relevance to today's business environment, and a basic description shows
how data warehouse architectures can evolve to deliver the value of data
mining to end users.
Data mining techniques are the result of a long process
of research and product development. This evolution began when business
data was first stored on computers, continued with improvements in data
access, and more recently, generated technologies that allow users to
navigate through their data in real time.
Data mining takes this evolutionary process beyond
retrospective data access and navigation to prospective and proactive
information delivery. Data mining is ready for application in the business
community because it is supported by three technologies that are now
sufficiently mature:
Massive data collection
Powerful multiprocessor computers
Data mining algorithms
In the evolution from business data to business
information, each new step has built upon the previous one. For example,
dynamic data access is critical for drill-through in data navigation
applications, and the ability to store large databases is critical to data mining.
From the user's point of view, the four steps listed in the table below were
revolutionary because they allowed new business questions to be answered
accurately and quickly. Today, the maturity of data mining algorithms,
coupled with high-performance relational database engines and broad data
integration efforts, makes these technologies practical for current data
warehouse environments.
Data mining can tell you important things that you
didn't know, or what is going to happen next. The technique used to
perform these feats in data mining is called modeling. Modeling is simply
the act of building a model in one situation where you know the answer and
then applying it to another situation where you don't.
For instance, if you were looking for a sunken Spanish
galleon on the high seas, the first thing you might do is research the times
when Spanish treasure had been found by others in the past. You might note
that these ships often tend to be found off the coast of Bermuda, that
there are certain characteristics to the ocean currents, and that certain routes
have likely been taken by the ships' captains in that era. You note these
similarities and build a model that includes the characteristics that are
common to the locations of these sunken treasures. With these models in
hand, you sail off looking for treasure where your model indicates it is most
likely to be, given similar situations in the past. Hopefully, if you've got
a good model, you find your treasure. This act of model building is thus
something that people have been doing for a long time, certainly before the
advent of computers or data mining technology.
What happens on computers, however, is not much different
from the way people build models. Computers are loaded up with lots of
information about a variety of situations where an answer is known, and then
the data mining software on the computer must run through that data and
distill the characteristics of the data that should go into the model. Once the
model is built, it can then be used in similar situations where you don't know
the answer.
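The process just described can be sketched with one of the simplest possible learners, a one-rule "decision stump", chosen here only for brevity (real data mining software uses much richer algorithms). It scans the known cases for the single feature threshold that reproduces the most known answers, then applies that rule to a case whose answer is unknown. The records, the field names `age` and `income`, and the helpers `Stump` and `fit_stump` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A one-rule model: answer "yes" when one feature exceeds a threshold."""
    feature: str
    threshold: float

    def predict(self, record: dict) -> bool:
        return record[self.feature] > self.threshold


def fit_stump(records: list[dict], labels: list[bool], features: list[str]) -> Stump:
    """Try every feature and observed value; keep the split that gets the most known answers right."""
    best, best_correct = None, -1
    for feature in features:
        # Candidate thresholds are the values actually seen in the known cases.
        for threshold in sorted({r[feature] for r in records}):
            candidate = Stump(feature, threshold)
            correct = sum(candidate.predict(r) == y for r, y in zip(records, labels))
            if correct > best_correct:
                best, best_correct = candidate, correct
    return best


# Known situations: hypothetical customers whose long distance usage is already known.
customers = [
    {"age": 25, "income": 30000},
    {"age": 40, "income": 80000},
    {"age": 35, "income": 90000},
    {"age": 50, "income": 40000},
]
heavy_users = [False, True, True, False]

model = fit_stump(customers, heavy_users, ["age", "income"])
# Apply the model to a situation where the answer is not known.
prospect_is_heavy_user = model.predict({"age": 30, "income": 85000})
```

Here the learner settles on an income threshold because it separates the four known customers perfectly, which no age threshold does.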
For example, say that you are the director of marketing for a
telecommunications company and you'd like to acquire some new long
distance phone customers. You could just randomly go out and mail coupons
to the general population, just as you could randomly sail the seas looking
for sunken treasure. In neither case would you achieve the results you
desired, and of course you have the opportunity to do much better than
random: you could use your business experience stored in your database to
build a model.
As the marketing director you have access to a lot of
information about all of your customers: their age, sex, credit history and
long distance calling usage. The good news is that you also have a lot of
information about your prospective customers: their age, sex, credit history,
etc. Your problem is that you don't know the long distance calling usage of
these prospects (since they are most likely now customers of your
competition). You'd like to concentrate on those prospects who have large
amounts of long distance usage. You can accomplish this by building a
model. The table below illustrates the data used for building a model for new
customer prospecting in a data warehouse.
The goal in prospecting is to make some calculated guesses about
the information in the lower right-hand quadrant based on the model that we
build going from Customer General Information to Customer Proprietary
Information. For instance, a simple model for a telecommunications
company might be:
98% of my customers who make more than $60,000/year spend more than
$80/month on long distance
This model could then be applied to the prospect data to try to tell
something about the proprietary information that this telecommunications
company does not currently have access to. With this model in hand, new
customers can be selectively targeted.
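A rule like the one above is commonly scored by its confidence: among customers who satisfy the condition (income above $60,000/year), the share who also satisfy the conclusion (spend above $80/month). The sketch below, using invented records and hypothetical field names (`income`, `monthly_ld_spend`, `id`), measures that share on known customers and then applies only the condition to prospects, whose usage is unknown.

```python
def rule_confidence(customers, income_min=60_000, spend_min=80):
    """Fraction of customers above income_min who also spend more than spend_min per month."""
    covered = [c for c in customers if c["income"] > income_min]
    if not covered:
        return 0.0
    hits = sum(c["monthly_ld_spend"] > spend_min for c in covered)
    return hits / len(covered)


def select_prospects(prospects, income_min=60_000):
    """Prospects have no usage data, so only the rule's income condition can be checked."""
    return [p["id"] for p in prospects if p["income"] > income_min]


# Hypothetical known customers (usage known) and prospects (usage unknown).
customers = [
    {"income": 75000, "monthly_ld_spend": 95},
    {"income": 90000, "monthly_ld_spend": 120},
    {"income": 65000, "monthly_ld_spend": 40},
    {"income": 30000, "monthly_ld_spend": 20},
]
prospects = [{"id": "P1", "income": 82000}, {"id": "P2", "income": 45000}]

confidence = rule_confidence(customers)
targets = select_prospects(prospects)
```

On these invented records the rule holds for two of the three covered customers, so its confidence is about 67%, well below the 98% of the example rule.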
Test marketing is an excellent source of data for this kind of
modeling. Mining the results of a test market representing a broad but
relatively small sample of prospects can provide a foundation for identifying
good prospects in the overall market. The table below shows another common
scenario for building models: predicting what is going to happen in the future.
If someone told you that he had a model that could predict customer usage,
how would you know if he really had a good model? The first thing you
might try would be to ask him to apply his model to your customer base,
where you already knew the answer. With data mining, the best way to
accomplish this is by setting aside some of your data in a vault to isolate it
from the mining process. Once the mining is complete, the results can be
tested against the data held in the vault to confirm the model's validity. If
the model works, its observations should hold for the vaulted data.
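The "vault" described above is what is now usually called a holdout set. A minimal sketch, assuming records are Python dicts with the known answer stored under a hypothetical `heavy_user` key: it shuffles the known cases, locks a fraction away before mining, and afterwards scores the model only on the vaulted records. The `model` function is a hard-coded stand-in for whatever the mining step would actually produce.

```python
import random


def split_vault(records, vault_fraction=0.2, seed=0):
    """Shuffle the known cases and lock a fraction of them away before mining."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_vault = round(len(shuffled) * vault_fraction)
    cut = len(shuffled) - n_vault
    return shuffled[:cut], shuffled[cut:]


def accuracy(model, records, label_key):
    """Share of records whose prediction matches the known answer."""
    return sum(model(r) == r[label_key] for r in records) / len(records)


# Hypothetical known customers: income and whether they turned out to be heavy users.
records = [
    {"id": i, "income": 20000 + 10000 * i, "heavy_user": i >= 5}
    for i in range(10)
]
mining_set, vault = split_vault(records)


# Stand-in for the model the mining step would build from mining_set alone.
def model(record):
    return record["income"] > 60000


# Only now is the vault opened, to check the model on data it never saw.
vault_accuracy = accuracy(model, vault, "heavy_user")
```

The fixed `seed` keeps the split reproducible; the essential design point is simply that nothing in `vault` influences how the model is built.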
Architecture for Data Mining
To best apply these advanced techniques, they must be
fully integrated with a data warehouse as well as flexible interactive
business analysis tools. Many data mining tools currently operate outside of
the warehouse, requiring extra steps for extracting, importing, and analyzing
the data.
Furthermore, when new insights require operational
implementation, integration with the warehouse simplifies the application of
results from data mining. The resulting analytic data warehouse can be
applied to improve business processes throughout the organization, in areas
such as promotional campaign management, fraud detection, new product
rollout, and so on. The figure below illustrates an architecture for advanced
analysis in a large data warehouse.
Figure: Integrated Data Mining Architecture
The ideal starting point is a data warehouse containing a
combination of internal data tracking all customer contact coupled with
external market data about competitor activity. Background information on
potential customers also provides an excellent basis for prospecting. This
warehouse can be implemented in a variety of relational database systems
(Sybase, Oracle, Redbrick, and so on) and should be optimized for flexible
and fast data access.
An OLAP (On-Line Analytical Processing) server enables a
more sophisticated end-user business model to be applied when navigating
the data warehouse. The multidimensional structures allow users to
analyze the data as they want to view their business, summarizing by
product line, region, and other key perspectives of their business.
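The kind of summarizing described above can be illustrated, very loosely, as grouping fact rows by one or more dimensions and totalling a measure. This in-memory sketch is not how an OLAP server stores or precomputes its multidimensional structures; the dimension names (`product_line`, `region`), the `revenue` measure, and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Total a measure for every combination of values along the chosen dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact rows from the warehouse.
sales = [
    {"product_line": "Voice", "region": "East", "revenue": 100.0},
    {"product_line": "Voice", "region": "West", "revenue": 50.0},
    {"product_line": "Data", "region": "East", "revenue": 70.0},
]
by_product_line = roll_up(sales, ["product_line"], "revenue")
by_line_and_region = roll_up(sales, ["product_line", "region"], "revenue")
```

Passing a different list of dimensions re-slices the same facts, which is the kind of flexible, business-oriented view the OLAP layer is meant to provide.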
The Data Mining Server must be integrated with the data
warehouse and the OLAP server to embed ROI-focused business analysis
directly into this infrastructure. An advanced, process-centric metadata
template defines the data mining objectives for specific business issues like
campaign management, prospecting, and promotion optimization.
Integration with the data warehouse enables operational decisions to be
directly implemented and tracked. As the warehouse grows with new
decisions and results, the organization can continually mine the best
practices and apply them to future decisions.
This design represents a fundamental shift from conventional
decision support systems. Rather than simply delivering data to the end user
through query and reporting software, the Advanced Analysis Server applies
users' business models directly to the warehouse and returns a proactive
analysis of the most relevant information. These results enhance the
metadata in the OLAP Server by providing a dynamic metadata layer that
represents a distilled view of the data. Reporting, visualization, and other
analysis tools can then be applied to plan future actions and confirm the
impact of those plans.
Profitable Applications
A wide range of companies have deployed successful
applications of data mining. While early adopters of this technology have
tended to be in information-intensive industries such as financial services
and direct mail marketing, the technology is applicable to any company
looking to leverage a large data warehouse to better manage its customer
relationships. Some successful application areas include:
A pharmaceutical company can analyze its recent sales force activity
and its results to improve targeting of high-value physicians and
determine which marketing activities will have the greatest impact in
the next few months. The data needs to include competitor market
activity as well as information about the local health care systems.
The results can be distributed to the sales force via a wide-area
network that enables the representatives to review the
recommendations from the perspective of the key attributes in the
decision process. The ongoing, dynamic analysis of the data
warehouse allows best practices from throughout the organization to
be applied in specific sales situations.
A credit card company can leverage its vast warehouse of customer
transaction data to identify customers most likely to be interested in a
new credit product. Using a small test mailing, the attributes of
customers with an affinity for the product can be identified. Recent
projects have indicated more than a 20-fold decrease in costs for
targeted mailing campaigns over conventional approaches.
A diversified transportation company with a large direct sales force
can apply data mining to identify the best prospects for its services.
Using data mining to analyze its own customer experience, this
company can build a unique segmentation identifying the attributes of
high-value prospects. Applying this segmentation to a general
business database such as those provided by Dun & Bradstreet can
yield a prioritized list of prospects by region.
A large consumer packaged goods company can apply data mining to
improve its sales process to retailers. Data from consumer panels,
shipments, and competitor activity can be applied to understand the
reasons for brand and store switching. Through this analysis, the
manufacturer can select promotional strategies that best reach its
target customer segments.
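The credit card example above rests on identifying, from a small test mailing, which customer attributes go with responding. One simple way to do this, offered as a sketch rather than as the method behind the quoted results, is to compute lift: the response rate within each attribute value divided by the overall response rate, so values above 1 mark segments worth targeting in the full mailing. The attribute `frequent_traveler`, the `responded` flag, and the data are invented for illustration.

```python
from collections import defaultdict


def response_lift(mailing, attribute):
    """Response rate for each value of an attribute, divided by the overall response rate."""
    overall = sum(m["responded"] for m in mailing) / len(mailing)
    if overall == 0:
        raise ValueError("test mailing produced no responses")
    sent, responded = defaultdict(int), defaultdict(int)
    for m in mailing:
        sent[m[attribute]] += 1
        responded[m[attribute]] += m["responded"]
    return {value: (responded[value] / sent[value]) / overall for value in sent}


# Hypothetical test-mailing results for 8 customers, split by travel habits.
mailing = (
    [{"frequent_traveler": True, "responded": r} for r in (True, True, False, False)]
    + [{"frequent_traveler": False, "responded": False} for _ in range(4)]
)
lift = response_lift(mailing, "frequent_traveler")
```

In this invented mailing, frequent travelers respond at twice the overall rate, so they would be the first segment to target.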
Each of these examples has a clear common ground. They leverage the
knowledge about customers implicit in a data warehouse to reduce costs and
improve the value of customer relationships. These organizations can now
focus their efforts on the most important (profitable) customers and
prospects, and design targeted marketing strategies to best reach them.
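The transportation company example above, building a segmentation from existing customers and applying it to an external business database to produce a prioritized prospect list by region, can be sketched under heavy simplifying assumptions: a segment is a hypothetical pair of industry and a large/small size band, each segment is valued by the average annual value of existing customers in it, and prospects are ranked within their region by that value. All field names and helpers are invented and do not reflect the layout of any real Dun & Bradstreet product.

```python
from collections import defaultdict


def segment_of(company):
    # Hypothetical segmentation: industry plus a large/small size band.
    return (company["industry"], company["employees"] >= 500)


def segment_values(customers):
    """Average annual value of existing customers in each segment."""
    totals, counts = defaultdict(float), defaultdict(int)
    for c in customers:
        seg = segment_of(c)
        totals[seg] += c["annual_value"]
        counts[seg] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals}


def prioritize_by_region(prospects, values, top_n=3):
    """Rank prospects within each region by their segment's value (unseen segments score 0)."""
    by_region = defaultdict(list)
    for p in prospects:
        by_region[p["region"]].append((values.get(segment_of(p), 0.0), p["name"]))
    return {
        region: [name for _, name in sorted(items, key=lambda item: item[0], reverse=True)[:top_n]]
        for region, items in by_region.items()
    }


customers = [
    {"industry": "retail", "employees": 800, "annual_value": 500.0},
    {"industry": "retail", "employees": 50, "annual_value": 100.0},
    {"industry": "chemicals", "employees": 1200, "annual_value": 900.0},
]
prospects = [
    {"name": "A", "region": "East", "industry": "retail", "employees": 600},
    {"name": "B", "region": "East", "industry": "chemicals", "employees": 2000},
    {"name": "C", "region": "West", "industry": "retail", "employees": 20},
    {"name": "D", "region": "East", "industry": "mining", "employees": 100},
]
ranked = prioritize_by_region(prospects, segment_values(customers))
```

Prospects in segments never seen among existing customers receive a score of 0 here; a real segmentation would need a more deliberate policy for them.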
Comprehensive data warehouses that
integrate operational data with customer, supplier, and market information
have resulted in an explosion of information. Competition requires timely
and sophisticated analysis on an integrated view of the data. However, there
is a growing gap between more powerful storage and retrieval systems and
the users' ability to effectively analyze and act on the information they
contain. Both relational and OLAP technologies have tremendous
capabilities for navigating massive data warehouses, but brute force
navigation of data is not enough. A new technological leap is needed to
structure and prioritize information for specific end-user problems. Data
mining tools can make this leap. Quantifiable business benefits have been
proven through the integration of data mining with current information
systems, and new products are on the horizon that will bring this integration
to an even wider audience of users.
