Business Intelligence Article
Executive
BI: Prescription for Business Advantage
TRENDS, STRATEGIES AND RESOURCES
COMPUTERWORLD
BI for the Masses
Hurdles to BI Implementations
Spreadsheet Overload?
Users Speed Feeds to Data Warehouses
Case Study: BI Dashboards
Data Mining
Text Mining
Predictive Analytics Grows Up
Outsourcing Predictive Analytics
Business intelligence software and data-analysis technologies are entrenched in the enterprise, but IT managers need to take special care in planning and managing these sophisticated tools.
ONLINE PROJECTS EDITOR: Ian Lamont EXECUTIVE DESIGN DIRECTOR: Stephanie Faucher MANAGING EDITOR/PRODUCTION: Michele Lee DeFilippo COPY EDITORS: Bob Rawson, Eugene Dematre, Mike Parent, Monica Sambataro
BULLETIN EDITOR: David Ramel
Overview
Highmark Inc. noticed data analysis that indicated cataracts were a predictor of future heart attacks. That didn't seem logical, so they discounted the information as a spurious anomaly. Subsequent medical research found that heart disease affects oxygenation of the blood and that cataracts reflected that condition. Highmark now offers services to prevent heart attacks among people thus afflicted. Welcome to the world of business intelligence.
Companies are awash in data. Think of all the databases, spreadsheets, e-mail messages, reports, research, meeting notes and interactions with customers. It's organizing, analyzing and using these disparate pools of information (the basics of BI) that can help you gain an edge over your competitors.

Granted, most BI applications don't deal with life-and-death situations. But today's tools are resulting in more innovative strategies that companies use to gain a business advantage. Consider the examples you will read about in this report:

• A financial services company helps its clients predict which of their customers will be late with payments, which will be lying when they say the check is in the mail and which will be likely to default altogether.
• A company in the education field can analyze its ability to attract students and the success of new campus locations and acquisitions.
• A public radio and TV broadcaster hopes to analyze promotional campaigns and precisely target its programming and focus on viewers who are most likely to make financial contributions.
• A mortgage company hopes to convert recordings of phone calls into a text format that can be used to study the behavioral patterns of delinquent borrowers and try to identify customers who plan to file lawsuits against the company.
• A recreational equipment retailer identifies good locations for new stores by finding places with high concentrations of online and catalog customers. It also tailors its stores' product mixes to local market preferences and uncovers patterns that suggest future purchases by customers.

But implementing a successful BI program can be daunting. Industry experts warn that it can take years to get the right people and systems in place and operating efficiently enough to generate ROI. Besides technical, business and process obstacles, there are softer considerations such as cultural and political issues standing in your way. And it's the people issues that can be the most perplexing.

Workers can get set in their ways and become resistant to change. They might not want to give up their trusty spreadsheets for standardized BI reports. Some may have become experts in spreadsheet analysis and feel that their jobs are threatened by the intrusion of "BI for everybody." Managers might not want to give up control of information they have traditionally kept to themselves. Staffers might not want to even use any BI tools, especially if they aren't championed by a strong and influential executive or are introduced because of a merger or acquisition and don't help them do their individual jobs better.

Several strategies can be used to solve these human issues, such as finding a strong executive sponsor, reaching out to users to sell them on the concept and making sure the BI system is based on business and user needs rather than on technical considerations. And sometimes these strategies are more extreme. One example: This report will detail how a chemical company actually tied employees' pay to their level of cooperation on a BI project.

But first come the technical problems. Moving to enterprisewide BI can be likened to painting a house. You might want to just grab a brush and start painting so you can quickly see the results of your labor. But if you don't spend enough time in preparation (scraping, sanding and mixing the paint), your project will fail. In BI, you might want to start generating those interesting reports and quickly capitalize on the information you find. But again, your project will fail without proper preparation.

First and foremost in BI preparation comes organizing the raw data. Knightsbridge Solutions LLC in a January 2005 report listed "Taking Data Quality Seriously" as the No. 1 trend for the year in BI and data warehousing. Besides ensuring that structured data culled from myriad databases is scrubbed and standardized and stored correctly in a data warehouse, more and more companies today are delving into unstructured data from other sources. Cutting-edge BI tools are now using data from printed documents, e-mail messages and even voice mail and recorded telephone conversations.

While data from these far-flung sources can provide valuable insights, it can also pose new problems. For example, one executive warns of "vampire data" that can come back to bite you in the neck, such as old e-mail messages that could be subpoenaed during litigation.

Other risks might arise from providing BI information to many more users throughout an organization. While this proliferation can be beneficial, it opens up new security risks, especially with widespread dissemination via the Internet only a few clicks away. One municipality moved to a BI system to consolidate information from thousands of spreadsheets, so it made all its employees read and sign a document detailing proper security procedures, such as not sharing passwords.

But even with the associated increased risks, moving to enterprisewide BI is clearly a growing trend, according to many research firms. They say BI is growing in market size and is near the top of many CIOs' to-do lists. This report will help you use BI to keep up with them and hone your competitive edge. It may not prevent a heart attack, but it just might help prevent heartburn.

David Ramel

Categories of BI tools
Different types of users require different types of BI tools. Here are descriptions of some common types of tools used in corporate environments:
■ Users can create their own reports with these tools, which require no programming.
■ Dashboard/scorecard tools:
SOURCE: THE BRAIN BEHIND THE BIG, BAD BURGER AND OTHER TALES OF BUSINESS INTELLIGENCE, CIO MAGAZINE, MARCH 15, 2005

■ CONSIDER what information executives need in order to facilitate quick, accurate decisions.

Computerworld Executive Bulletin
BI and the Data Matrix
The potential gains include business process enhancements, increased customer satisfaction and cost reductions in areas such as sales and marketing. Organizations that have broadly deployed BI are realizing some of these benefits.

Sara Lee Household and Body Care, a division of Sara Lee Corp., began using QlikView BI software from QlikTech International AB three years ago to create a repository for sales data. Today, field salespeople, marketers and managers use the product to access a variety of information about customer interactions, buying trends, products and other data that drives sales. The software has helped the division improve the accuracy and timeliness of demand forecasts for specific products in different locations, says Gary Kahler, director of sales and operations planning at Sara Lee in Exton, Pa.

Workers use the product to download BI data into Excel spreadsheets on their PCs, Kahler says. Managers at headquarters use the application, which runs on a Dell server, to compare regions by customer and product brands over different time periods, he adds. "It's simple enough for anyone in the company to use but powerful enough to answer any questions they can ask," he says.

Kahler adds that the benefits of BI are too nebulous to measure. "We know inherently that we have efficiency improvements throughout the organization," he says. "People work much faster and more accurately, and they are able to do more than they did before. But it's very difficult to try to quantify the benefit."

Get Smart
HERE ARE SOME TIPS
■ TRAIN employees not only how to use BI applications correctly, but also how to report and analyze data accurately.
■ PROVIDE BI capabilities only to those workers who stand to benefit from the information gleaned.
■ ASSESS vendor BI products thoroughly to ensure that nonstatisticians will be able to benefit from their use.
analytics, he says. In some cases, the IT department can't keep up with the requests. Vesset says there might also be technical issues to deal with, such as integrating BI applications with existing business systems.

Organizations must also guard against giving BI tools to too many people, says Shaku Atre, president of Atre Group Inc., a consultancy in Santa Cruz, Calif. If too many people use the system, companies can run into problems maintaining resources and controlling usage and application performance.

There are also security concerns to consider, including the possibility that customer data could be compromised. "Once information is made available to the masses, if the proper controls are not in place, there's a risk of data falling into the wrong hands," says Atre. "Because of the Internet, information could be misused. This is something you have to be very careful about."

For example, health care organizations must secure patient information to comply with the Health Insurance Portability and Accountability Act. Companies must also guard against critical information getting into the hands of competitors because so many people have access to it, says Dipendra Malhotra, Atre Group's chief technology officer.

BI will grow in popularity as vendors link its capabilities with familiar tools such as spreadsheets, Atre says. If you want to provide something for the masses, look at what is
Hurdles to BI Implementations
COMPANIES CAN USE business intelligence tools to make big improvements in their operations, but numerous technical, cultural and internal-process challenges must be overcome first, according to some IT managers.

The people and process issues can be even more daunting than the technical ones, says Bubba Tyler, CIO at Quaker Chemical Corp. in Conshohocken, Pa. For the past 10 years, Quaker Chemical has used software from SAS Institute Inc. to do data analysis and reporting. But the project wasn't a simple matter of installing the applications and giving workers access to them, he notes.

"It was a nine- to 10-year process and a heck of a big investment for a company of our size," says Tyler. He notes that the chemical company moved ahead slowly and had to continually re-examine the BI model it was putting in place. The addition of the SAS 8 software also required Quaker Chemical to collaborate and share information on a global basis, prompting it to tie employees' pay to their level of cooperation on the project.

In addition, the company had to create a common BI language (a time-consuming task) and speed up the collection of data, Tyler says. It also developed a homegrown query tool to make SAS 8 palatable for widespread use, although Tyler has said he might replace that with a set of simplified user interfaces built into a SAS 9 upgrade.

Andy George, senior vice president of technology at ProfitLine Inc. in San Diego, recommends that companies phase in their BI implementations. ProfitLine, which manages billing and other administrative functions for telecommunications companies, uses Business Objects SA's WebIntelligence software to analyze and audit customer bills. George says that during ProfitLine's rollout, ensuring the validity of data was a big challenge because so many people were accessing information and inadvertently corrupting it. That prompted the company to put a "data czar" in charge of maintain-
built-in rules for automating the process of parsing the information in claims.

H&R Block Inc.'s Option One Mortgage Corp. subsidiary plans to custom-develop a system that can convert inbound and outbound calls into searchable data, said Matt Slonaker, director of business information at the Irvine, Calif.-based lender. Option One already uses a mix of software from Oracle Corp. and Microsoft Corp. to help employees assess and manage the risks on loans, Slonaker said. Now, he added, it hopes to convert recordings of phone calls into a text format that can be used to study the behavioral patterns of delinquent borrowers and try to identify customers who plan to file lawsuits against the company.

For instance, the application will be able to track how many times a borrower says the word "litigate" during a call and then help Option One employees score the likelihood that he will take the company to court, Slonaker said.
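To make the idea of counting a word in a call transcript and turning the count into a score concrete, here is a minimal Python sketch. It is not Option One's system, which the article does not describe in detail. The watch list, the weights and the function names are assumptions made up for illustration; only the word "litigate" comes from the article.

```python
import re
from collections import Counter

# Hypothetical watch list and weights. Only "litigate" is named in the
# article; the other terms and all weights are illustrative assumptions.
WATCH_TERMS = {"litigate": 3.0, "lawsuit": 2.0, "attorney": 1.0}


def count_terms(transcript: str) -> Counter:
    """Count whole-word, case-insensitive occurrences of each watch term."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(words)
    # Keep only the watch terms that actually occur in the transcript.
    return Counter({t: counts[t] for t in WATCH_TERMS if counts[t]})


def litigation_score(transcript: str) -> float:
    """Weighted sum of watch-term counts; a higher score flags a call for review."""
    return sum(WATCH_TERMS[t] * n for t, n in count_terms(transcript).items())
```

Because the match is on whole words, a variant such as "litigated" is not counted. A real system would likely need stemming and a model calibrated against actual outcomes before the number could be read as a likelihood.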
Spreadsheet Overload?
for the PC. Lotus 1-2-3 subsequently took over, before yielding the throne to Microsoft Corp.'s Excel. Today, spreadsheets are so easy to use and ubiquitous that they've sprouted like weeds throughout most companies. And they often hold important financial data.

But what if Mary's sales spreadsheet differs from Tom's and has faulty data or a modeling error? What if Tom hoards his spreadsheet data (it's a form of power, after all) and won't let go? How do you get the data from dozens of far-flung spreadsheets into a companywide planning or budgeting system that meets the latest accounting standards?

Various studies report that 47% to 64% of companies use stand-alone spreadsheets for planning and budgeting, for example. But critics say spreadsheets, invented as a personal productivity tool, aren't well suited to collaboration, data quality or regulatory compliance. "Excel is a tool of information mavericks," says Eleanor Taylor, manager of business intelligence strategy at software vendor SAS Institute Inc. in Cary, N.C.

Besides being extremely unwieldy for processes involving large volumes of data and multiple users, spreadsheets often contain substantial, material errors, according to academic research, notes Paul Hamerman, a Forrester Research Inc. analyst.

Companies are just starting to look at the problems caused by spreadsheet proliferation, says Gartner Inc. analyst Michael Silver. "Some enterprises are addressing it, but most aren't," he says.

No one is suggesting that the spreadsheet is going away anytime soon or that it's a top-of-mind IT issue. "The subject is certainly of interest and has potential for improvement, but in the scheme of things, it's not high on the list of priorities," says Joe Iannello, CIO at watchmaker Movado Group Inc. in Paramus, N.J.

ing, spreadsheets pose challenges not dreamed of when they first began popping up on PCs across the land. Here are three of the more significant spreadsheet issues that companies have to address:

DECENTRALIZATION. Mentor Graphics Corp. in Wilsonville, Ore., had a central 25MB Excel spreadsheet and 1,200 budget spreadsheets across the enterprise, one for every cost center. But having numerous spreadsheets makes it difficult to collect important data. "Spreadsheets are great analysis tools, but at some point you start using them as a planning system, and that's where Excel starts breaking down," says Jan-Willem Beldman, Mentor's enterprise data architect. So Mentor decided to use SAP AG software as a centralized database of accounting transactions and Hyperion Solutions Corp. software as a budget-planning tool. The Hyperion system allows Mentor to quickly do a what-if analysis of, say, changing employee benefits in various countries. "These are things you might be able to model in Excel, but if you have a lot of details, it's much more than you could have in a spreadsheet," says Beldman.

COMPLIANCE. Having financial data in a hodgepodge of spreadsheets also makes it hard to maintain "one version of the truth," which is important for complying with the law. For example, the Sarbanes-Oxley Act requires companies to maintain a good audit trail, and generating such a trail is difficult to do with Excel,

spreadsheets is poor data quality. "As you make changes or add information, your spreadsheet will have errors or mismatched formulas," says Ed Chen, director of IT at KQED Inc., which operates public television and radio stations in San Francisco. That's why some users are moving from decentralized data held in spreadsheets to a centralized database. "The quality of data improves greatly because you have much more control of the different calculations," Beldman says.

Spreadsheet incompatibilities can even cause conflicts within a company. "If I have developed a spreadsheet, I trust my spreadsheet more than yours, even if yours [is really] more accurate. That creates political problems," observes Shaku Atre, president of Atre Group Inc., a database and BI consultancy in Santa Cruz, Calif.

Reality Check
To some extent, the criticism (it's been called "the demonization of spreadsheets") comes from vendors pushing their own, more expensive financial software, such as business performance management software. Vendors put out press releases with headlines like "Spreadsheets Out, Hyperion In" and "Extensive Reliance on Spreadsheets Dulls CFOs' Strategic Edge," while arguing that spreadsheets won't help companies comply with the Sarbanes-Oxley Act.

"Only to a degree is that true," says Chris Iervolino, head of ITEC Consulting Inc. in White Plains, N.Y. He says it's true that spreadsheets aren't a good corporate data store, and they aren't good for managing processes like planning and budgeting because there's too much error-prone manual work involved. For Sarbanes-Oxley compliance, it's easier for executives to sign off on the integrity of a financial process if it's fully automated, without manual steps like in spreadsheets, Iervolino says.

But that doesn't mean spreadsheets are down and out, he continues. Iervolino and other observers say the future of the spreadsheet is as a user interface for manipulating data extracted from a central, back-end database. "[Spreadsheets] are a great manipulation and analysis tool; they're not such a great database," says Beldman at Mentor Graphics.

Besides, it would be hard to snatch spreadsheets away from the power users. "You'd have to pull the spreadsheets from the cold, dead hands of the analysts," Iervolino quips. That's why the vendors of even the most sophisticated business performance management tools have interfaces for connecting to spreadsheets; it's a market requirement.

"People can quickly become computer-literate [with spreadsheets]. They feel empowered; their confidence is boosted," Atre says. So be prepared for resistance when moving to a centralized system. "Trying to get people not to save data locally and not to do their own spreadsheets is a cultural problem based on 15 years of PC use," Gartner's Silver says.

Although spreadsheets have significant shortcomings, they provide enough benefits (usability, what-if analysis and presentation graphics) that most observers say they'll be around for the foreseeable future. "They will persist as an interface that people will continue to use to manipulate and store data," says Herbert A. Edelstein, president of Two Crows Corp., a data mining consultancy in Potomac, Md. "I can't envision a world where the spreadsheet will disappear."

Prashant Dholakia, senior vice president at FreeMarkets Inc., a procurement services provider in Pittsburgh, isn't so sure. Someday, large corporations may have to consider a "post-spreadsheet" world, Dholakia says. "Spreadsheets can go only so far," he says. "Something will have to replace it, but there's no consensus of what that is."
enue that would not normally have happened, Garcella says. "You can't wait until the next day or three hours later to get that data." He declined to specify how much Overstock is spending on the warehousing project, other than to say the cost is in the millions of dollars.

Harrah's Entertainment Inc. is testing a real-time data warehouse that combines operational and historical customer data, says Tim Stanley, the Las Vegas-based gaming company's CIO. The new setup is based on an architecture that Harrah's developed in mid-2002. The company is using adapters from Tibco Software Inc. to feed information from transactional systems into its Teradata warehouse to help workers interact with customers at Harrah's properties, on the phone or on the Harrah's Web site. "It uses Teradata's transactional database and also has direct access to all the historical data," Stanley says. "You don't have to have two databases talk to each other."

Changing Needs
Eric Rogge, an analyst at Ventana Research Inc. in San Mateo, Calif., says that because BI tools are being used more often for operational decision-making, many companies are finding that they need to refresh their data warehouses more frequently than on a nightly basis. "It's not about loading a data warehouse so a small department of business analysts can forecast two years out; it's for daily decisions," he says.

For 18 months, Avnet Electronics Marketing has been using a near-real-time data warehouse that captures orders and updates of logistics data from its back-end system every 15 minutes, says Kevin Harrington, director of IT delivery for global information solutions at the Phoenix-based electronics distributor. Avnet uses tools from Informatica Corp. to move the data into the warehouse. Because of the integration infrastructure, it took only 24 hours in late July to begin populating the warehouse with order and customer information from a company that Avnet recently acquired, Harrington says.

Warehouse Challenges
Implementers of data warehouses most often cite these issues as challenges:
1. Data quality
2. Security
3. Availability
4. Data standards/consistency
5. Web-based access
6. Performance/scalability
SOURCE: IDC, FRAMINGHAM, MASS.
CASE STUDY:
BI Dashboards
MANAGERS AT Blue Cross Blue Shield of Massachusetts used to show up at their monthly meetings armed with several pounds of paper documents: departmental performance reports, printouts of e-mail and PowerPoint slides and lots and lots of spreadsheets. The managers eventually agreed to lighten their load by regularly tracking a total of 45 business performance measures, which were printed out in eight-point type to fit on a single sheet of paper.
Early Questions
As the BCBS cross-functional team trolled for meaningful metrics, these were the questions they asked:
■ Where are we currently getting data?
■ What are the most important pieces of information executives look to in order to run the business?
■ What frequency of information do we need: daily, weekly, monthly, quarterly? Does frequency vary depending on the information?
■ Where do we want the information to go in the organization? (The audience will determine the level of detail we need.)

After watching the group bounce between the two extremes, the CIO stepped in and showed several corporate vice presidents and the chief operating officer a demo of a digital dashboard, which pulls data from multiple sources to graphically present select performance metrics on a single screen. The executive group took to it almost immediately. They ultimately decided to track 10 key performance measures, which now are accessible by all 3,000 of the health insurer's employees via a Web-based dashboard that aggregates data from 20 systems.

"The report enables us to focus on whether we're carrying out our strategies and how we're performing against our business goals," says Karen Thompson-Yancey, senior director, strategic business integration.

What happened at BCBS is a textbook example of how to do dashboards right. Top executives drove the effort. They kept the dashboard simple and made it ubiquitous throughout the enterprise. Users even willingly parted with their beloved paper spreadsheets. Here's how they did it.

Information Overload
"We used to have all these reports and we spent too much time on 'Where did you get that info?' as opposed to 'How is the business running?'" says Thompson-Yancey. After the dashboard demo, "Our CFO and COO made it very clear that it was critical for us to collaborate and have the same information and not confuse the organization with various sources and instead to focus on how we're doing."

The managers formed a small, cross-functional team representing each area of the company. "It was important to us, as part of our culture, to ensure that we had sponsorship of our executives and to make sure it was a cross-functional, collaborative group," says Thompson-Yancey. "We knew the reports we had were not getting us what we needed. So we went off to get recommendations about what information we needed to look at."
They are close enough to day-to-day operations to distinguish between tactical and strategic levels of information, and because they're the ones giving updates to the executives in the staff meetings, they have an eye into which information executives were interested in seeing. Moreover, she says, getting more people involved leads to more buy-in at every level. "When you open up a dialogue and ask people for their opinions, it helps get everyone working together." The group came back with a long list of metrics, then they worked with the executives to fine-tune it.
Success Factors
ASSURE executive sponsorship and
Time to Test
The next step was to test the relevance of various kinds of information with a few different audiences. "Give yourself enough time to get the right information," Thompson-Yancey suggests. "Sometimes you put things together too quickly in order to meet a date, but unless you get it right, you're not doing yourself or the company any service."

The entire project took BCBS about six months. At the end of that time, they were looking at top-level business performance metrics such as member satisfaction and retention, sales and financials, staff retention and IT system uptime and availability.

Because BCBS placed such a high value on collaboration, Thompson-Yancey says, trust in the system was never a problem. "They [the business people] were involved in finding the metrics that were most important and they reviewed and signed off on the reports we used prior to the release of information. So we didn't have to talk about the accuracy of the information, just the impact."

"This was a group that all agreed they'd drive this effort right from the top," recalls former IT director Jim Humphrey, who then headed information delivery and knowledge management. "We had no problem getting people to part with paper reports. They liked the idea of using better technology."

Today, they use the metrics to drive the agenda of their monthly meetings. Thompson-Yancey says the BCBS
Data Mining
EVERY INTERACTION your company has with a customer or supplier likely generates a data trail, and that data provides a wealth of information for marketers. Extracting that information and getting it into usable shape, however, requires sophisticated data mining tools. The same technology that police departments use to identify patterns in crime data and to deploy officers accordingly can help chief marketing officers uncover customer trends and better focus their marketing resources.
future purchases by customers. "We know people are involved in lots of different activities, even though they might not have bought all the gear at REI," Polenz says. "So we'll send our cycling catalog to someone who might not have bought cycling equipment but who probably is interested in cycling, based on their other activities associated with cycling." Camping is one such tip-off, she says.
such as retailers, are ideal candidates for data mining technology. Wal-Mart Stores Inc., for example, is famed for its use of data mining to analyze "market baskets," the combinations of items consumers group together in one purchase. Pharmaceutical makers rely heavily on data mining technology to track their drugs' effects, while financial companies use it for identifying new customer opportunities.
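For readers who want to see what analyzing market baskets means at the simplest level, here is a small Python sketch that counts how often two items are bought together. It is an illustrative assumption, not a description of Wal-Mart's or any vendor's method. The function name `pair_support` and the sample baskets are made up for the example.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return {(item_a, item_b): share of baskets that contain both items}."""
    pair_counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so each pair has one canonical ordering.
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    total = len(baskets)
    return {pair: n / total for pair, n in pair_counts.items()}


# Hypothetical purchases, invented purely for illustration.
baskets = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["bike", "helmet"],
    ["tent", "lantern"],
]
support = pair_support(baskets)
```

In this made-up data, tents and stoves appear together in half of the baskets. Production tools add measures such as confidence and lift, and they prune rare pairs so the counting scales to millions of transactions.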
Text Mining
UNSTRUCTURED DATA, most of it in the form of text files, typically accounts for 85% of an organization's knowledge stores, but it's not always easy to find, access, analyze or use. "We are drowning in information but are starving for knowledge," says Mani Shabrang, technical leader in research and development at Dow Chemical Co.'s business intelligence center in Midland, Mich. "Information is only useful when it can be located and synthesized into knowledge."

But a new generation of text mining tools allows companies to extract key elements from large unstructured data sets, discover relationships and summarize the information. Many organizations are deploying or considering such software to deal with their mountains of text, despite the need for specialized skills to make implementations work.

For example, since 2000 Dow's research staff has been using ClearResearch software from ClearForest Corp. in New York to extract data from a century's worth of chemical patent abstracts, published research papers and the company's own files. "By managing the information better and eliminating the irrelevant, we've been able to reduce the time it takes for [researchers] to find what they need to read," says Shabrang.

Text mining tools take a variety of approaches. ClearResearch uses a proprietary pattern-matching methodology to search for information, categorize it and graphically show its relationship to other data. "The software can see, discover and extract concepts, not just words," says Shabrang. "It gives us a pictorial representation of the text in the documents in an easy-to-understand chart."
Adoption Roadblocks
The text mining software available now doesn't yet match the accuracy of data mining tools, but vendors are improving their products' ability to understand context, which is key to making text mining tools effective.

"Understanding linguistics and overcoming its challenges is a horizon that has not been dealt with well," says William McKnight, president of McKnight Associates Inc., a data warehousing consulting firm in Plano, Texas. "Basic text mining is possible, but the performance needs to be improved and the tools don't scale well."

Because of these limitations, text mining tools are still niche products generally restricted to specific parts of an organization. But they are starting to catch on. "Over the last 12 to 18 months, I have seen a lot of interest in using these tools for regulatory compliance," says Brian Babineau, a research analyst at Enterprise Storage Group Inc. in Milford, Mass. "But once that seems to be under control, people will retrofit these applications for other purposes, like data warehousing and CRM."

While there are software systems that analyze both structured and unstructured data, many companies use traditional BI software on their structured data and then turn to separate tools to analyze text-based data. Electronic Data Systems Corp., for example, has all of its 130,000 employees fill out an online questionnaire about their jobs once a year. Another three times a year, 20,000 employees answer an additional survey.

Some of the survey questions are multiple choice, making it easy for EDS to plug the answers into BI software from SAS Institute Inc. in Cary, N.C., and SPSS Inc. in Chicago, where it's aggregated, dissected and analyzed. Some of the most important feedback, however, comes in the responses to open-ended questions. In the past, those responses were forwarded to the line managers to draw conclusions, since they didn't fit into any easy-to-manage structure.
PREDICTION
"We are getting an increasing understanding of what things are possible with text mining. But there is a huge skills problem in this area, which is why it hasn't gotten much traction so far."
ALEXANDER LINDEN, ANALYST, GARTNER INC.
tens of thousands of categories, standard categorical analysis simply will not work," Cerrito says. "But by treating it as unstructured data, I can then get some very useful information from it." By examining thousands of patient outcomes with Text Miner, she has found useful information that prescribing certain medications can
undifferentiated mass.
CLUSTERING - Grouping similar documents based on their content.
EXTRACTION - Extracting relevant information from a document, for example,
industry-standard or customized. Some tools can automatically generate a taxonomy based on analysis of the data store.
VISUALIZATION - Graphically presenting the mined data so relationships are easier
relies on visions from precogs, people who can predict crimes, to catch criminals before they can act. While the film takes place in the future, the predictive analytics tool sets available to businesses today are bringing similar scenarios to life.
For example, LoanPerformance uses such tools to help its clients predict which of their customers will be late with payments, which will be lying when they say the check is in the mail and which will be likely to default altogether. The San Francisco-based firm operates a cooperative database of loan payment information for financial institutions. Richard Harmon, senior vice president of scoring and analytic services at LoanPerformance, says its customers, which include mortgage servicers, use the data to encourage on-time payments or to put delinquent accounts on the fast track to foreclosure.
Predictive analytic tools are also used to predict outright fraud. For example, at health insurer Highmark Inc. in Pittsburgh, such systems are set to anticipate and block fraudulent claims.
The adoption of predictive analytics systems is on an upswing, driven by technology advances and the potential for large bottom-line benefits. The number of preconfigured and proven models available for specific industries and applications is increasing, while the model-creation process is more automated than it once was. That means analysts can build models faster and refresh them more frequently in response to changing business needs.
Successful models can pay off big. At LoanPerformance, a model that predicts which accounts that are 90 days in arrears will default saved one client $2 million in six months. The total cost of deployment was $400,000. Those types of returns are one reason why IDC research shows the sale of predictive analytics tools growing to $3 billion by 2008, which would be a nearly 40% increase from 2004. Such tools make up 25% of the business intelligence market.
As the volumes of business data have increased, the desire to extract value from that information has intensified. Fortunately, predictive analytics tools have become easier to use, says Harmon, allowing more streamlined model-building workflows and enabling analysts steeped in business issues to do more without the involvement of statisticians. "This is where the future lies," he says. "The tools are being automated."
The biggest benefits, however, are coming on two fronts: the inclusion of unstructured data into the predictive modeling process to improve accuracy and a push to execute pre-
into three categories: enabling real-time scoring on the front end when, say, a new loan application comes in; updating the back-end databases; and accelerating the pace at which models can be refreshed to deal with changing scenarios, which can be helpful because criminals are constantly devising new ways to commit fraud, for example.
Texting It Up
Harmon says he was surprised at how much text mining increased the accuracy of his predictive models. The previous model included structured information such as loan histories, credit reports and demo
Toolbox
MOVING AVERAGE:
Each new point in the time series is the average of some number of earlier consecutive data points, sometimes chosen to eliminate seasonal factors or other irregularities.
EXPONENTIAL SMOOTHING:
Similar to the moving average, except more recent data points are given more weight.
MEMORY-BASED REASONING:
Sometimes called the nearest neighbor method, it's an artificial intelligence technique that can forecast something by identifying the most similar past cases and applying that information to a new case.
REGRESSION:
Fits a line to a set of historical data points to minimize the sum of the squares of the distances of the data points to the line. For example, if the line expresses the relationship between independent variables such as age, sex and income and a dependent variable such as sales, then it defines an equation that can be used to forecast sales.
ARTIFICIAL NEURAL NETWORKS:
Patterned after the human brain, they're composed of a large number of processing elements (neurons) tied together with weighted connections (synapses). They're trained by looking at real-world examples, such as historical sales data and the past values of variables that may influence sales. The training adjusts the weights, which store the data needed to solve specific problems, such as sales forecasting.
DECISION TREES:
Sequential decisions are drawn as branches of a tree, stemming from an initial decision point and branching out to multiple possible outcomes. The trees can be used to predict the most likely outcome and to forecast financial outcomes by multiplying costs or returns at each branch by the probability of that branch being taken.
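To make several of the Toolbox techniques concrete, here is a minimal Python sketch of a moving average, exponential smoothing, nearest-neighbor lookup, a one-variable least-squares line and a decision-tree expected value. It is an illustration only, not the method used by any vendor or company named in this bulletin: the function names, the smoothing weight `alpha` and all sample figures are hypothetical, and commercial tools handle many variables and far larger data sets.

```python
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Each output point is the mean of `window` consecutive input points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series: Sequence[float], alpha: float) -> List[float]:
    """Like a moving average, but recent points get more weight (0 < alpha <= 1)."""
    if not series:
        return []
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def nearest_neighbor(cases: Sequence[Tuple[Sequence[float], str]],
                     query: Sequence[float]) -> str:
    """Memory-based reasoning: reuse the outcome of the most similar past case."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, outcome = min(cases, key=lambda case: squared_distance(case[0], query))
    return outcome


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for a single independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_value(branches: Sequence[Tuple[float, float]]) -> float:
    """Decision tree payoff: sum of probability * return over the branches."""
    return sum(probability * payoff for probability, payoff in branches)


if __name__ == "__main__":
    # Made-up monthly sales figures, for illustration only.
    sales = [10.0, 12.0, 14.0, 16.0, 18.0]
    months = [1.0, 2.0, 3.0, 4.0, 5.0]

    print("moving average:", moving_average(sales, window=3))
    print("exponential smoothing:", exponential_smoothing(sales, alpha=0.5))

    slope, intercept = fit_line(months, sales)
    print("forecast for month 6:", slope * 6 + intercept)

    # Hypothetical past loans: (months in arrears, loan-to-value ratio) -> outcome.
    past_loans = [((1.0, 1.0), "paid"), ((5.0, 5.0), "default")]
    print("nearest neighbor:", nearest_neighbor(past_loans, (4.0, 4.5)))

    # Two branches: 25% chance of a 400 return, 75% chance of a 100 loss.
    print("expected value:", expected_value([(0.25, 400.0), (0.75, -100.0)]))
```

A neural network is omitted because even a toy version needs a training loop that would not fit in a short sketch.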
Real Deal
A critical difference in using predictive analytics is the speed at which models can be refreshed, Keithly says. While the mainframe systems he used years ago allowed model development only every two years, his current tool set allows him to refresh the model every 90 days. But that's still not real time. For most applications, the ability to refresh the model every quarter is adequate, says Keithly. However, he sees areas in which real-time models would be useful, such as fraud, where assumptions must be changed in response to changing perpetrator tactics.
Keithly expects to see real-time modeling in the next decade. "It will be worth it as long as it doesn't take a massive investment to make it work," he says.
But a massive investment is often required for organizations to provide real-time access to data. I4 is relatively small and built its IT systems from the ground up in 2001 using state-of-the-art technology, including Solaris servers and Oracle databases. For large companies with older equipment and databases, that's more of a challenge. "If data is divergent across multiple sources and you need to bring a data warehouse together, that's considerably more money," says Christopher Scheib, manager of decision support at Highmark.
Peter Heijt, vice president of marketing and sales at Fortis Banque SA/NV in Utrecht, Netherlands, wants to provide real-time access to data for predictive analytics applications that will improve the success rate of sales campaigns. "The investment is more or less double the cost of the data structure we have now in data warehouse, data mart and CRM. So the payoff has to be big. We're looking for a 40% increase in sales effectiveness," he says. Heijt is experimenting with a small part of his CRM database to see if the investment is justified.
Scheib says he needs access to outside data in real time to facilitate decisions on how to price policies. "Prescription information we can get in very close to real time, and we can use that to make predictions about health risks," he says. "That's useful for actuaries who are trying to price clients in as near to real time as they can get."
While predictive analytics tools have gotten easier to use, successful enterprise implementations still require collaboration among business analysts, statistics experts and database administrators, say users. "Data preparation can be 60% of the effort," says Lou Agosta, an independent technology analyst in Chicago.
But the biggest challenge may be in learning how to take full advantage of the opportunities that predictive analytics can provide. "Developing the right responses is what takes the most time," says Harmon. "Having better predictive models has allowed everyone to re-evaluate their strategies. That's where the intellectual capital is spent," he says.
Farness wants to analyze the newspaper's market of 1.6 million people and target segments with promotions that would have an improved likelihood of success. "For example," she says, "if we know a certain segment uses our news online during the week but wants printed products on weekends, we know what to offer them."
In addition to using its own subscriber database, the Times purchased demographic data from two sources to apply to its survey subjects. To address the complexity of integrating that data and building a successful model, in August 2004, Farness turned to Apollo Data Technologies LLC. "They asked us to look at five or six demographic questions . . . and predict who's likely to subscribe and what category they fall into," says Jeff Kaplan, principal of data mining technology at Chicago-based Apollo.
To do the work, Apollo used a beta version of SQL Server 2005, which includes new data mining features, and developed a model that uses neural network algorithms to predict outcomes. Using the embedded algorithms available in SQL Server was a good fit for the Times, according to Howard Mendel, director of systems development. "The use of SQL Server will enable easy integration with other SQL Server databases and .Net applications that are already in use," he says.
While Apollo did the initial data integration and model development, the newspaper's IT organization will run the algorithms against the full 1.6 million prospect list using its own SQL Server database. The data will be processed locally and then uploaded into a marketing database hosted by Astech Intermedia Inc. in Denver. From there, targeted marketing campaigns will be launched.
"We build repeatable processes and code on our back end. Our model is to build these predictive models and turn them over to the business user to play with," says Kaplan. He says The Seattle Times' IT group played a great skeptic role as the initial test on 60,000 subjects rolled out.
Mendel says the challenge is to come up to speed quickly on SQL Server 2005. "We have paired together experts in both database administration and application development," he says. His staff is creating scripts to automate the movement and processing of data, which will occur weekly. "Once started, we expect the entire process of data transfers and execution of the model to be fully automated," Mendel says.