Data Integration:
Creating a Trustworthy Data
Foundation for Business Intelligence
Author: MAS Strategies
Contributors: Darren Cunningham, MaryLouise Meckler, Jennifer Meegan, David Nguyen, Philip On
Business Objects • Data Integration: Creating a Trustworthy Data Foundation for Business Intelligence
Contents
Introduction
Conclusion
Appendix
Executive Summary
To make sound decisions and comply with governmental reporting requirements, an organization
must first establish a solid data foundation. This foundation must combine historical data with
current values from operational systems in order to provide a single version of the truth that can
then be used to identify trends and predict future outcomes. Data integration technology is the
key to consolidating this data and delivering an information infrastructure that will support strategic
business intelligence (BI) initiatives and meet tactical and governmental reporting requirements. Data
integration is the enabling technology for providing trustworthy information, enhancing IT and
end-user productivity, and helping organizations achieve and maintain a competitive edge. Data
integration enables mid-size and large organizations to effectively and efficiently leverage their
data resources in order to satisfy their analysis and reporting requirements.
While a homegrown data integration effort frequently yields a quick and dirty solution that may
initially appear inexpensive, any upfront savings are often soon lost as demands on resources and
personnel change. Vendor-supported packaged solutions, on the other hand, have withstood the
test of time. Since they include capabilities such as metadata integration, ongoing updates and
maintenance, access to a wider variety of data sources and types, and design and debugging
options rarely offered by in-house solutions, they serve to increase the productivity of the IT
organization. This is an important advantage as few organizations have unlimited resources and
most are under constant pressure to do more with less. Additionally, homegrown data
integration solutions are rarely integrated with an organization’s BI tools. Such
integration is, however, available with commercial offerings, through adherence to industry
standards, through integration with the BI tools in the data integration vendor’s total
product portfolio, or both.
This paper will discuss the importance of data integration and help you identify the key
challenges of integrating data. It will also provide you with an overview of data warehousing and
its variations, as well as summarize the benefits and approaches to integrating data.
Introduction
Imagine you work with one of your organization’s mission-critical operational systems. Your
organization considers you the go-to person for any query or reporting request associated with
this system.
The chief marketing officer asks you to identify the 100 customers who produced the most
revenue for your company last year. These companies would be placed on a “preferred customer
list” and their requests given special handling and top priority. Because your company’s sales and
service departments each track customer revenues in their own departmental systems, you first
have to match customers and add these revenues together. When you proudly present the list to
the CMO, he gives you a strange look and asks why a company he expected to rank among the
top 25 was not even on the list.
Why Is Data Integration Important?
Without the entire picture, it’s difficult to make sound and dependable business decisions. That’s
because good decision-making requires a complete and accurate view of data. The ability to access
and integrate all of your data sources is the first step toward getting the complete picture, and
the key to not compromising your decision-making process.
To draw valid conclusions, an organization needs to be able to analyze both current and historical
data from multiple disparate sources.
Though your organization needs a complete view of operations, the data you need often resides in a
variety of application systems that do not necessarily all use the same database management system.
Furthermore, these application systems may contain only current data values. They may not store the
prior data values needed to provide historical context and to discover trends.
With a bit of luck, the organization can consolidate the data from these disparate sources without
resorting to “desperate measures.”
Approaches for Data Integration
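An organization can integrate its data through a variety of methods, including data warehouses,
data marts, operational data stores, and enterprise information integration (EII).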
Hybrid approaches are common and include, for example, departmental data marts populated
from enterprise data warehouses, or an EII deployment that accesses a data warehouse for
historical data and operational systems for the latest values. Most organizations use a
combination of methods as part of their overall information architecture. Whatever the form
(see Appendix for additional details), the intent is to create a data platform for analytical
purposes. By consolidating, standardizing, and, in many cases, summarizing the data contained in
multiple operational systems, an organization can analyze the combined data to achieve a
“single and trustworthy version of the truth.”
By integrating data, organizations can more effectively use this data for analytical purposes.
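As a minimal sketch of what consolidating, standardizing, and summarizing can mean in practice, the Python below takes two hypothetical extracts whose dates and amounts follow different conventions, converts both to one layout, and totals revenue by customer and month. The record layouts, field names, and values are illustrative assumptions, not those of any particular system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sales-system extract: US-style dates, amounts in dollars.
SALES_ROWS = [
    {"cust": "C001", "date": "01/15/2005", "amount": 1200.00},
    {"cust": "C002", "date": "02/03/2005", "amount": 300.50},
]
# Hypothetical service-system extract: ISO dates, amounts in cents.
SERVICE_ROWS = [
    {"customer_id": "C001", "svc_date": "2005-01-20", "amount_cents": 45000},
    {"customer_id": "C002", "svc_date": "2005-02-28", "amount_cents": 9950},
]

def standardize_sales(row):
    """Map a sales row onto a common (customer, month, dollars) layout."""
    month = datetime.strptime(row["date"], "%m/%d/%Y").strftime("%Y-%m")
    return row["cust"], month, row["amount"]

def standardize_service(row):
    """Map a service row onto the same layout, converting cents to dollars."""
    month = datetime.strptime(row["svc_date"], "%Y-%m-%d").strftime("%Y-%m")
    return row["customer_id"], month, row["amount_cents"] / 100

def consolidate(sales, service):
    """Standardize both extracts, then summarize revenue by customer and month."""
    rows = [standardize_sales(r) for r in sales] + [standardize_service(r) for r in service]
    totals = defaultdict(float)
    for cust, month, amount in rows:
        totals[(cust, month)] += amount
    return dict(totals)

summary = consolidate(SALES_ROWS, SERVICE_ROWS)
for (cust, month), total in sorted(summary.items()):
    print(cust, month, f"{total:.2f}")
```

Keeping each source’s conversion in its own function means a format change in one source touches one function rather than the summary logic.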
Data Warehouse
There are many benefits to integrating operational data within a data
warehouse or data mart. You can build these to:
• Facilitate the adoption of corporate data standards without having to modify existing
operational systems
• Provide historical breadth and enable trend analysis
Consider, for example, the analogy of someone with two checking accounts. An operational data
store can be used to determine the total current balance while a data warehouse can be used to
track a given expenditure over the last several years. An EII solution would allow you to do both,
provided it could access both the data warehouse and the operational data store. In the
business world, EII can be used to simultaneously query multiple inventory sites to see if there is
sufficient stock on hand to immediately satisfy an incoming order. It could also be used to identify
products with excess inventory and be linked to a system that would send email offers, with
special price incentives, to targeted prospects to encourage additional purchases of these items.
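The two-checking-account analogy can be sketched in Python under some simplifying assumptions: an in-memory dictionary stands in for the operational data store, a list of time-stamped rows stands in for the data warehouse, and a hypothetical `eii_view` function queries both at request time. All account names, payees, and amounts are made up for illustration.

```python
from collections import defaultdict

# Stand-in for an operational data store: current balance per account (hypothetical).
CURRENT_BALANCES = {"checking_a": 1250.00, "checking_b": 830.00}

# Stand-in for a data warehouse: time-stamped payment history (hypothetical).
PAYMENT_HISTORY = [
    {"account": "checking_a", "year": 2003, "payee": "Electric Co", "amount": 600.00},
    {"account": "checking_b", "year": 2004, "payee": "Electric Co", "amount": 640.00},
    {"account": "checking_a", "year": 2005, "payee": "Electric Co", "amount": 700.00},
    {"account": "checking_b", "year": 2005, "payee": "Grocer", "amount": 200.00},
]

def total_current_balance(balances):
    """ODS-style question: what is the combined balance right now?"""
    return sum(balances.values())

def yearly_spend(history, payee):
    """Warehouse-style question: how has spending with one payee changed by year?"""
    by_year = defaultdict(float)
    for row in history:
        if row["payee"] == payee:
            by_year[row["year"]] += row["amount"]
    return dict(sorted(by_year.items()))

def eii_view(balances, history, payee):
    """EII-style answer: query both sources at request time and return one result."""
    return {
        "current_balance": total_current_balance(balances),
        "yearly_spend": yearly_spend(history, payee),
    }

result = eii_view(CURRENT_BALANCES, PAYMENT_HISTORY, "Electric Co")
print(result)
```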
You can use enterprise information integration to:
• Provide an integrated view across all sources: production systems, operational data stores,
data warehouses, and data marts
• Obtain a real-time view of data spread across federated operational systems (perhaps one at
each manufacturing location)
• Enable operational, or real-time, business intelligence by accessing historical values in data
warehouses or data marts and the real-time values in operational systems
• Jump-start data integration efforts by first deploying an EII solution, perhaps to quickly
satisfy an important user requirement, and then deciding if the data should ultimately be
extracted to a data warehouse, data mart, or operational data store
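A federated, real-time inventory check can be sketched along the same lines. In this hypothetical setup each manufacturing site keeps its stock in a different layout, small adapter functions translate each one into a common lookup, and the combined answer is computed when the order arrives rather than copied into a warehouse beforehand. Site names, part numbers, and quantities are assumptions.

```python
# Hypothetical per-site inventory systems, each with its own data layout.
PLANT_EAST = {"P-100": 40, "P-200": 5}                 # part -> quantity
PLANT_WEST = [("P-100", 25), ("P-300", 12)]            # (part, quantity) tuples
PLANT_SOUTH = [{"part_no": "P-100", "on_hand": 10}]    # list of records

# Adapters translate each site's layout into a common lookup: part -> quantity.
def east_qty(part):
    return PLANT_EAST.get(part, 0)

def west_qty(part):
    return sum(qty for p, qty in PLANT_WEST if p == part)

def south_qty(part):
    return sum(r["on_hand"] for r in PLANT_SOUTH if r["part_no"] == part)

SITES = {"east": east_qty, "west": west_qty, "south": south_qty}

def stock_by_site(part):
    """Query every site at request time and return a per-site breakdown."""
    return {site: lookup(part) for site, lookup in SITES.items()}

def can_fill(part, quantity):
    """True if combined stock across all sites covers the order."""
    return sum(stock_by_site(part).values()) >= quantity

print(stock_by_site("P-100"))
print(can_fill("P-100", 70), can_fill("P-200", 10))
```

Adding a site means adding one adapter; the ordering logic in `can_fill` does not change.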
WARNING SIGNS: Does Your Organization Suffer from Poor Data Integration?
• No single version of the truth. Managers argue over why analysis results
differ, even though the data came from the same operational system.
• Inability to comply with governmental reporting requirements. The CEO and CFO are
uncomfortable signing off on the company’s financial statements because there is no way
to trace the numbers back to their original sources. The Sarbanes-Oxley Act requires that isolated
financial data be integrated and that the CEO and CFO certify, subject to penalties that
include imprisonment, the accuracy of their company’s financial statements.
• Incomplete data foundation. Presentations that include an analysis prefaced by a
statement such as, “…except for the data that we were unable to obtain from…” Or
worse, a presentation that begins with, “Due to the discovery of data not included in last
period’s analysis, we are reversing our decision…”
• Poor audit trail and data lineage. An analyst alerts management to a potential problem
discovered while running a query against the data in an operational system. The analyst
cannot, however, answer the follow-up question, “How long has this problem existed?”
• Inability to consolidate data from multiple sources. As a result of an out-of-stock
condition for a critical part, an organization must expedite an order and purchase the
item at a premium price. Once the order arrives, the organization discovers another
division had an excess quantity of the same part and was trying to sell it at a discount to
balance its inventory.
• Poorly integrated, stovepipe operational systems. While analysts use a variety of
business intelligence tools to generate reports from application systems, they re-enter
relevant summary values into a spreadsheet for any analyses requiring data from more
than one application.
• Lack of common data definitions. With a series of very convincing charts and graphs, an
executive presents what appears to be a thorough analysis of the cause of a particular
problem. However, while the format of the presentation qualifies as a work of art, the
executive’s credibility suffers greatly when someone says, “That’s not what that data
means, where in the world did you get that?”
• Historical values not retained in a data warehouse or data mart. An analyst runs the
same report each week against an application system. However, in order to see period-to-
period comparisons, the analyst maintains a spreadsheet. Each week, he must manually
add a new column and enter that week’s report values.
• Lack of an integrated 360° view. The CEO of one of your largest customers has called
your company’s CEO to complain that when his people contact your call center for
support, they are not receiving the attention he believes they deserve.
• High cost of maintaining in-house “one-time” code. Your company is in the process of
developing a new order entry system that, when deployed, promises to provide a
significant advantage over your competitors. Things are going smoothly until, six months
into the project, the lead programmer is called away to “patch some extract code” that no
longer seems to work with the latest version of the ERP system from which the data is
sourced. Because she last modified her code three years earlier and she has not kept up
with the new version of the ERP system, this task takes much longer than anyone
anticipated and the deployment of the new order entry system is now behind schedule.
The Benefits of Data Integration
An organization can reap many benefits from data integration. These include the ability to:
For data to be trustworthy, its lineage (i.e., where it originated and how it was
transformed) must also be known and auditable. It is also important to be able to perform impact
analysis to see which reports and processes depend on a given data element.
the most current values, these values may not be appropriate for tracking and analyzing how
something has changed over time. A data warehouse or data mart is usually needed if access to
historical values is required.
Using data integration to consolidate the data from the various operational systems serves to
create a “single version of truth” so you can treat data as the enormous asset it is. To do this
effectively, the lineage of the data, including its origin and/or derivation, must be readily
available and not lost.
Because it can now determine the total amount it purchases from the vendor, it would likely
receive a higher percentage discount than each division would have received by negotiating
independently.
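The purchasing arithmetic in the paragraph above can be made concrete with a hypothetical tiered discount schedule. The tier thresholds, rates, and division spend figures are illustrative assumptions, not real vendor terms.

```python
# Hypothetical tiered discount schedule: (minimum annual spend, discount rate).
TIERS = [(0, 0.00), (100_000, 0.03), (250_000, 0.06), (500_000, 0.10)]

def discount_rate(annual_spend):
    """Return the rate of the highest tier whose threshold the spend meets."""
    rate = 0.0
    for threshold, tier_rate in TIERS:
        if annual_spend >= threshold:
            rate = tier_rate
    return rate

# Hypothetical annual spend with one vendor by three divisions.
division_spend = {"north": 180_000, "central": 150_000, "south": 220_000}

# Each division negotiates on its own volume.
separate = sum(spend * discount_rate(spend) for spend in division_spend.values())

# The organization negotiates once, on its combined volume.
combined_total = sum(division_spend.values())
combined = combined_total * discount_rate(combined_total)

print(f"negotiated separately: {separate:,.0f}")
print(f"negotiated together:   {combined:,.0f}")
```

With these made-up numbers, each division on its own qualifies only for the 3% tier, while the combined $550,000 reaches the 10% tier.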
A great benefit of any data integration effort is the discovery that different parts of the same
organization do not necessarily speak a common language or use the same business processes.
When ultimately standardized, the long-term benefits of having a common set of business rules
and common set of definitions and terms can greatly improve efficiency and effectiveness.
For example, when determining departmental productivity using “cost per employee” as a metric,
do two part-time employees, each working a four-hour day, count as one employee or two? The
answer is likely to differ by department and unless an organization-wide definition is established,
departmental comparisons are not meaningful.
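A short worked example shows why the definition matters. The staffing and cost figures are hypothetical, and counting part-timers as full-time equivalents (FTEs) is just one possible organization-wide definition, used here for illustration.

```python
# Hypothetical department: hours worked per day by each person, and monthly cost.
FULL_TIME_HOURS = 8
staff_hours = [8, 8, 8, 4, 4]   # three full-time staff, two part-timers at four hours
monthly_cost = 40_000

# Definition 1: every person counts as one employee.
headcount = len(staff_hours)
cost_per_head = monthly_cost / headcount

# Definition 2: part-timers count in proportion to hours worked (FTE).
fte = sum(staff_hours) / FULL_TIME_HOURS
cost_per_fte = monthly_cost / fte

print(f"headcount={headcount}, cost per employee={cost_per_head:,.0f}")
print(f"FTE={fte}, cost per employee={cost_per_fte:,.0f}")
```

Identical staffing yields $8,000 per employee under a headcount definition and $10,000 under an FTE definition, so two departments using different definitions cannot be compared directly.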
integration (EII) can be used to access and combine real-time data residing in multiple operational
systems; once the data warehouse is deployed, EII can be used to access it as well and thus provide a
historical perspective. Some organizations have used EII to provide a quick view of operational
data in order to decide if a more formal effort should then be undertaken to add this data to an
existing data warehouse.
Don’t try to boil the ocean. A phased, incremental approach to an overall enterprise information
management architecture can begin with a data mart or an enterprise information integration (EII)
solution.
Create and Maintain Organization-Wide Reference Files
All organizations have data used across several departments. Examples of these “reference data”
files include customer data, product data, employee data, vendor data, and even financial data such
as the company’s chart of accounts. In many organizations, individual departments maintain their
own reference files, and problems frequently arise when different departments use different
identifiers or keys for the same customer, making their data difficult, if not impossible, to combine
accurately. For example, if a customer’s revenues from both the sales and the service departments
can’t be accurately combined, the total value of that customer’s account would be understated.
While the term “Master Data Management” is receiving a tremendous amount of attention, it is simply
an extension of the reference file concept, a concept behind the use of centralized Rolodex files
even before computers were in common business use. Data integration technology, combined with data
quality software, is the underlying technology for creating organization-wide reference files and
master data management solutions.
Reference files are a subset of metadata management; for example, the definition and allowable
values of the data elements collected for each customer or product are examples of metadata.
Every organization has data, such as customer and product files, that are used across the
organization. These reference files facilitate the organization’s ability to create a “360-degree
view” of the subject they reference.
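The understated account total described above, and how an organization-wide reference file addresses it, can be sketched as follows. The department keys, master IDs, and revenue figures are hypothetical, and the `CUSTOMER_XREF` dictionary is a stand-in for whatever master data mechanism an organization actually uses.

```python
from collections import defaultdict

# Each department keys the same customers differently (hypothetical data).
SALES_REVENUE = {"S-0042": 900_000, "S-0077": 400_000}
SERVICE_REVENUE = {"ACME-CORP": 350_000, "GLOBEX": 120_000}

# Organization-wide reference file: maps each local key to one master customer ID.
CUSTOMER_XREF = {
    ("sales", "S-0042"): "CUST-1",
    ("service", "ACME-CORP"): "CUST-1",
    ("sales", "S-0077"): "CUST-2",
    ("service", "GLOBEX"): "CUST-2",
}

def naive_totals():
    """Combine on local keys: the same customer never matches across systems."""
    totals = defaultdict(int)
    for revenue in (SALES_REVENUE, SERVICE_REVENUE):
        for key, amount in revenue.items():
            totals[key] += amount
    return dict(totals)

def master_totals():
    """Translate each local key through the reference file before summing."""
    totals = defaultdict(int)
    for system, revenue in (("sales", SALES_REVENUE), ("service", SERVICE_REVENUE)):
        for key, amount in revenue.items():
            totals[CUSTOMER_XREF[(system, key)]] += amount
    return dict(totals)

print(max(naive_totals().values()))   # largest single "customer" without the cross-reference
print(master_totals())
```

Joined on local keys, the largest single “customer” shows $900,000; translated through the cross-reference, the same customer’s combined revenue is $1,250,000.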
Maintain the Response and Performance of Operational Systems
The days of having to “submit queries and run reports against the production databases only
between noon and 1 pm or after 6 pm” are hopefully long past. Yet running queries or reports
against the database used by an online application can still negatively impact the performance
and user response time of that application. Performance counts! If an analysis request negatively
impacts the response of an operational system, the analysis request will be deferred, perhaps
permanently! With a data warehouse or data mart, you offload the query to an environment
that can be optimized for this purpose.
Combine current and past values from disparate sources in order to see the big picture
In-House Development
Organizations that develop their own data integration solutions frequently do so in a somewhat
piecemeal fashion, without any overall data integration strategy. They generally assign an analysis
request that requires access to data from multiple sources to the IT department. A programmer
then writes the code necessary to access and integrate all of the data.
If the programmer is fortunate, the source systems are well documented, the content of the data
fields conforms to the documentation, and each of the individual systems uses the same value lists
and code sets to represent the individual values of common data elements. If this is not the case,
the programmer’s task quickly expands to include data value transformations. This frequently
causes the schedule to slip, especially if the data mappings are not simple one-to-one
transformations.
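The data value transformations mentioned above can be illustrated with two hypothetical order-status code sets. Most codes map one-to-one onto a standard set, but one source code has to be resolved using a second field, which is the kind of mapping that makes hand-written integration code grow. The code values and field names are assumptions.

```python
# Hypothetical standard code set for order status.
STANDARD = {"OPEN", "SHIPPED", "CANCELLED", "BACKORDERED"}

# System A uses single-letter codes that map one-to-one.
SYSTEM_A_MAP = {"O": "OPEN", "S": "SHIPPED", "C": "CANCELLED", "B": "BACKORDERED"}

# System B uses numeric codes; code 2 ("held") is not one-to-one and
# is resolved below using a second field.
SYSTEM_B_MAP = {1: "OPEN", 3: "SHIPPED", 9: "CANCELLED"}

def standardize_a(record):
    return SYSTEM_A_MAP[record["status"]]

def standardize_b(record):
    code = record["status_cd"]
    if code == 2:
        # A held order is backordered only when no stock is available.
        return "BACKORDERED" if record["qty_available"] == 0 else "OPEN"
    return SYSTEM_B_MAP[code]

orders_a = [{"status": "O"}, {"status": "B"}]
orders_b = [
    {"status_cd": 2, "qty_available": 0},
    {"status_cd": 2, "qty_available": 5},
    {"status_cd": 3, "qty_available": 0},
]

results = [standardize_a(r) for r in orders_a] + [standardize_b(r) for r in orders_b]
assert all(status in STANDARD for status in results)
print(results)
```

An unmapped code raises `KeyError` here rather than passing through silently, a deliberate choice in this sketch so that unexpected source values surface instead of polluting the target.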
Satisfying the initial consolidation requirement is only the beginning of the overall integration
effort. As any experienced programmer knows, the initial coding effort is followed by ongoing
support and maintenance, especially if a new analysis request requires additional data fields or the
file structure of the source systems changes. One of the givens in any applications environment is
the ongoing need to respond to change; another is that “quick and dirty” one-time coding efforts
frequently evolve into scheduled production jobs.
Moreover, a series of uncoordinated, individual integration tasks, even if each one were
successfully accomplished, ultimately results in an assortment of uncoordinated (and usually
undocumented) solutions that collectively and quickly become unmanageable. The problem is further
compounded if a different programmer is responsible for each individual data integration
solution, as most programmers have their own individual programming idiosyncrasies and may
even have used different programming languages.
Programmer turnover is another factor to consider. While programming the initial extract
program may involve creativity, future maintenance of these programs is often a thankless task. In
general, programmers prefer new challenges, and the original authors of the extract programs may
no longer be available to maintain them. And even if they are, they may not go out of their way to
mention their initial involvement in the creation of the extract programs.
Integration with Commercial Application Software Packages
A commercial data integration solution that can work directly with third-party packaged software
applications minimizes, or even avoids, many of the problems associated with continually
modifying and retesting homegrown integration programs. This retesting of an in-house
developed solution is required whenever there are changes to the packaged application software.
Commercial data integration solutions usually do this as part of their normal maintenance. Even
if your organization is currently using homegrown applications software, it is likely to use
enterprise application software sometime in the future as it grows and expands. A good
commercial data integration software offering should be able to integrate data from these
packaged applications and facilitate the population of a data warehouse or data mart. Some data
integration vendors also offer easy-to-deploy yet highly customizable data marts, designed to
quickly integrate with a wide variety of enterprise application software packages.
Data Lineage Tracking and Impact Analysis
Impact analysis, or the ability to determine how a change to a source system data field can affect a
business intelligence report or analysis, is only possible through metadata integration and the
resultant ability to track end-to-end data lineage. Data lineage is especially important when a
target field is derived from multiple source system fields. A good commercial data integration
solution facilitates change data management by providing strong impact analysis capabilities
including “what-if” developer scenarios.
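At its core, impact analysis is a traversal of lineage metadata. The sketch below records hypothetical source-to-target mappings as a directed graph, walks downstream from a changed source field to find the affected warehouse columns and reports, and walks upstream from a report to recover its lineage. The field, table, and report names are assumptions; this shows only the graph walk, not how any particular product stores its metadata.

```python
from collections import deque

# Hypothetical lineage metadata: each upstream element maps to the elements
# derived from it (source field -> warehouse column -> report).
LINEAGE = {
    "erp.orders.unit_price": ["dw.sales.revenue"],
    "erp.orders.quantity": ["dw.sales.revenue", "dw.sales.units"],
    "crm.accounts.region": ["dw.customer.region"],
    "dw.sales.revenue": ["report.top_100_customers", "report.quarterly_revenue"],
    "dw.sales.units": ["report.inventory_turns"],
    "dw.customer.region": ["report.quarterly_revenue"],
}

def impact_of(changed):
    """Breadth-first walk downstream: everything that depends on `changed`."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in LINEAGE.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

def lineage_of(target):
    """Walk upstream: every element that `target` is ultimately derived from."""
    parents = {}
    for src, children in LINEAGE.items():
        for child in children:
            parents.setdefault(child, []).append(src)
    seen, stack = set(), [target]
    while stack:
        for src in parents.get(stack.pop(), []):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return sorted(seen)

print(impact_of("erp.orders.quantity"))
print(lineage_of("report.quarterly_revenue"))
```

Note that `dw.sales.revenue` is derived from two source fields, the case the text singles out: a change to either one reaches both revenue reports.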
Data Integration Build Versus Buy—the Bottom Line
As a general rule, unless the data integration task is truly a “one-time” effort, organizations
should strongly consider a packaged data integration solution. The short-term initial costs
associated with an in-house programming effort are likely to be less than the acquisition cost of a
packaged product. But ongoing maintenance costs and the indirect costs associated with an
inability to respond quickly to change will just as quickly consume the initial cost savings.
As organizations grow, their data integration needs tend to multiply and a commercial data
integration solution is usually acquired. Organizations anticipating this should consider
deploying a commercial data integration solution early on. While it may be tempting to try to
solve each data integration challenge with an in-house band-aid approach, the deployment of a
commercial data integration solution is an investment that will yield both immediate and future
benefits for both IT and the user communities.
Conclusion
Reliable data is the basis for sound decision making, and data integration is the key to
delivering trusted information. Do users of business intelligence tools feel they are basing their
decisions on trustworthy data? The best tools are of little value if the data they analyze is not
complete, accurate, and trustworthy.
Operational and analytical systems complement each other. Organizations must effectively deploy
both in order to succeed. For analytic purposes such as trend analysis and forecasting, it’s
necessary to collect time-stamped data values from multiple sources in a data warehouse or data
mart. For operational purposes, it’s frequently necessary to have real-time access to data resident
in operational systems. Organizations can use an operational data store to consolidate current
data values from multiple operational systems. They can use an enterprise information integration
solution to combine current operational and historical data warehouse data and/or to directly
access data spread across several operational systems.
Data integration technology is used to bring this data together. In fact, data integration and data
quality solutions are the keys to achieving trusted information. While some organizations choose
to develop their own in-house data integration solutions, those that use packaged software
solutions can benefit from the vendor’s expertise and experience in working with multiple, and
sometimes esoteric, data sources. This also frees up their staffs for more productive tasks that help
gain a competitive advantage. Additionally, commercial data integration products usually provide
metadata interoperability with other tools and track data lineage and provide impact analysis.
Regardless of how it is obtained, data integration enables data warehouses, data marts, and
operational data stores, all of which provide organizations with the means to make reliable
business decisions and comply with government reporting requirements. Successful data
integration is a key factor for an organization’s ultimate business intelligence success. It is the
cornerstone of any successful enterprise information management architecture.
Appendix
Many consider Bill Inmon the father of data warehousing. In his book Building the Data
Warehouse, he defined a data warehouse as “a subject-oriented, integrated, nonvolatile, time-variant
collection of data in support of management’s decisions.”
Source: Updated from "Data Warehouse—Concepts and Implementation Strategies" presentation, M. Schiff.
While these characteristics are not meant as absolutes for each environment, they represent
general statements as to what is typical of each environment. For example, although data
warehouse content is obviously updated with new values each time a new snapshot is added, the
general use of the data in the warehouse is for read-only analysis purposes. A data warehouse
typically adds new, time-stamped values of existing data elements; a production system usually
modifies existing values. For example, a production application for payroll might contain the
salary of each employee; a data warehouse might contain the salary history for each employee.
When an employee receives a salary change, the new value would replace the old value in the
payroll system while an additional record, containing the new salary and effective date, would be
added to the data warehouse content where it would reside along with the previous salary and
quite likely all past salary amounts (or at least a reasonable history) for each employee as well.
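The payroll example can be expressed directly in Python. The operational side overwrites a single value, while the warehouse side appends a time-stamped row and can therefore answer an “as of” question. The employee ID, dates, and salaries are hypothetical.

```python
from datetime import date

payroll = {}          # operational system: employee -> current salary only
salary_history = []   # data warehouse: (employee, effective_date, salary) rows

def record_salary(emp, effective, salary):
    payroll[emp] = salary                             # replace the old value
    salary_history.append((emp, effective, salary))   # keep every value

def salary_as_of(emp, when):
    """Most recent salary effective on or before `when`, from the history."""
    rows = [(eff, sal) for e, eff, sal in salary_history if e == emp and eff <= when]
    return max(rows)[1] if rows else None

record_salary("E17", date(2003, 1, 1), 52_000)
record_salary("E17", date(2004, 7, 1), 55_000)
record_salary("E17", date(2005, 4, 1), 58_500)

print(payroll["E17"])                            # only the current value survives
print(salary_as_of("E17", date(2004, 12, 31)))   # answered from the history
print(len(salary_history))
```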
There are also times when an organization needs to collect data from several operational systems
for additional operational purposes such as determining current part quantities across all of its
inventory control systems. This data warehouse variant is commonly referred to as an operational
data store. While it differs from the classic data warehouse as it stores relatively current values
and minimal history, the process of bringing this data together is another classic example of data
integration.
Bill Inmon and Claudia Imhoff highlighted this difference in their book, Building the Operational
Data Store, when they defined an operational data store as a “subject oriented, integrated, current
valued data store, containing only corporate detailed data.”
Enterprise Information Integration or EII is a somewhat hybrid approach that directly accesses
data contained in several operational systems in order to provide a transparent view that makes it
appear as if the data resided in a single source. Assuming that the data values are compatible, EII
can be of value in operational or real-time BI environments or to enable quick analysis of data that
has not yet been incorporated into a data warehouse. An EII solution is especially useful when it
can access both operational systems and a data warehouse, as it can then provide both real-time
and historical values.
1 W.H. Inmon and Claudia Imhoff, 1996, John Wiley & Sons, Inc.
About MAS Strategies
Michael A. Schiff is the founder and principal analyst of MAS Strategies. MAS Strategies
specializes in helping vendors market and position their business intelligence and data
warehousing products in today's highly competitive market. Typical engagements include SWOT
analyses, market research, due diligence support, technology white papers, public presentations,
and helping organizations evaluate tactical and strategic product and marketing decisions. MAS
Strategies also assists user organizations in data warehouse procurement evaluations, needs
analysis, and project implementations.
With over 30 years of industry experience as a developer, consultant, vendor, industry analyst,
and end-user, Michael is an expert in developing, marketing, and implementing solutions that
transform operational data into useful decision-enabling information. Michael was the Vice
President of the Data Warehousing and Business Intelligence service at Current Analysis, Inc., an
industry analyst firm where he provided tactical market intelligence and analysis while managing
the company’s E-Business analyst team.
Michael was the Executive Director - Data Warehousing and Advanced Decision Support for
Oracle Corporation's Public Sector Group and Director of Software AG's Data Management
program where he was one of the industry's earliest proponents of the data mart concept. In 1984,
while at Digital Equipment Corporation, he formulated the architecture for one of the first
successful data warehouse implementations. In previous positions as IT Director and Systems and
Programming Manager, he acquired practical, first-hand knowledge of the technical, business,
and political realities that must be addressed for any successful systems implementation or
product launch.
Michael earned his Bachelor and Master of Science degrees from MIT's Sloan School of
Management where he specialized in operations research as an undergraduate, and in
information systems as a graduate.
For further information about MAS Strategies, visit its web site at: www.mas-strategies.com.
About Business Objects
Business Objects is the world's leading business intelligence software company. Business
intelligence enables organizations to track, understand, and manage enterprise performance. The
company's solutions leverage the information that is stored in an array of corporate databases,
enterprise resource planning (ERP), and customer relationship management (CRM) systems.
In December 2003, Business Objects completed the acquisition of Crystal Decisions, the leader in
enterprise reporting. The combined product line includes software for reporting, query and
analysis, performance management, analytic applications, and data integration. In addition,
Business Objects offers consulting and education services to help customers effectively deploy
their business intelligence projects.
Business Objects has more than 24,000 customers in over 80 countries. The company’s stock is
traded under the ticker symbols NASDAQ: BOBJ and Euronext Paris (ISIN: FR0004026250 - BOB).
It is included in the SBF 120 and IT CAC 50 French stock market indexes. Business Objects can be
reached at +1 800 877 2340 and www.businessobjects.com.
Printed in the United States – December 2005 PT# WP2084-D.
www.businessobjects.com
For a complete listing of our sales offices, please visit our web site.
Business Objects owns the following U.S. patents, which may cover products that are offered and licensed by Business Objects:
5,555,403; 6,247,008 B1; 6,578,027 B2; 6,490,593; and 6,289,352. Business Objects and the Business Objects logo,
BusinessObjects, Crystal Reports, Crystal Enterprise, Crystal Analysis, WebIntelligence, RapidMarts, and BusinessQuery are
trademarks or registered trademarks of Business Objects SA or its affiliated companies in the United States and other countries.
All other names mentioned herein may be trademarks of their respective owners. Copyright © 2005 Business Objects. All rights reserved.