BA - Topic1 - Introduction To Business Analytics PDF
Business analytics
• is a set of statistical and operations research techniques, artificial
intelligence, information technology and management strategies
used for framing a business problem, collecting data, and
analyzing the data to create value to organizations
• 3 components of Business Analytics:
– Business Context
– Technology
– Data Science
• Business Context: the ability to ask the right questions
• Technology: automation of actionable items derived from
analytical models, usually achieved using IT
• Data Science: identify the most appropriate statistical
model/machine learning algorithm that can be used
bigbasket.com
• Fernandes et al. (2013) reported that on average, customers
forget 30% of the items they intend to buy
• Forgetfulness can have significant cost impact for the online
grocery stores
– customers may buy the forgotten items from a nearby store
where they live
– customer may place another order for forgotten items
• The ability to predict the items that a customer may have
forgotten to order can have a significant impact on the profits of
online grocers such as bigbasket.com
• The “Did you forget?” feature is offered by the Indian online grocery store
bigbasket.com
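A minimal sketch of the idea, assuming a very simple baseline: flag items that a customer bought in most past orders but that are missing from the current basket. This is not bigbasket.com's actual method, which the slides do not describe. The function name `likely_forgotten`, the 0.5 threshold and the sample orders are illustrative assumptions.

```python
from collections import Counter

def likely_forgotten(past_orders, current_basket, min_share=0.5):
    """Return items bought in at least `min_share` of past orders
    that are absent from the current basket, most frequent first."""
    if not past_orders:
        return []
    # Count in how many past orders each item appears.
    counts = Counter(item for order in past_orders for item in set(order))
    n = len(past_orders)
    candidates = [
        (count / n, item)
        for item, count in counts.items()
        if count / n >= min_share and item not in current_basket
    ]
    # Sort by share descending, then by name for a stable order.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates]

# Made-up order history for one customer.
past_orders = [
    {"milk", "bread", "eggs"},
    {"milk", "bread", "rice"},
    {"milk", "eggs", "tea"},
    {"milk", "bread"},
]
suggestions = likely_forgotten(past_orders, current_basket={"milk", "rice"})
print(suggestions)
```

A production system would likely use a trained predictive model rather than a fixed frequency threshold; the sketch only shows the shape of the input and output.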
Akshaya Patra Foundation
• From the Vasanthapura kitchen in Bangalore, approximately 84,000
school children from 650 schools in South Bangalore were
provided mid-day meals
• The Vasanthapura kitchen used 35 vehicles to distribute the
cooked food. To minimize the cost of distribution, they need to
solve a complex vehicle routing problem (VRP).
• To simplify this problem, assume that they divide the number
of schools equally among the vehicles; each vehicle would
then have to deliver food to approximately 20 schools (few
vehicles are kept as standby). For each vehicle, we need to
find the best route.
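Finding the best route for one vehicle can be sketched with a greedy nearest-neighbour heuristic. This is only one simple heuristic and does not guarantee the cheapest route; the slides do not say which method the kitchen uses. The coordinates, straight-line distances and function names below are illustrative assumptions.

```python
import math

def nearest_neighbour_route(depot, stops):
    """Greedy route: start at the depot, always drive to the closest
    unvisited stop, and return to the depot at the end."""
    route = [depot]
    remaining = list(stops)
    current = depot
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    route.append(depot)
    return route

def route_length(route):
    """Total straight-line length of a route."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))

# Made-up coordinates: one kitchen and four schools on a line.
kitchen = (0.0, 0.0)
schools = [(2.0, 0.0), (5.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
route = nearest_neighbour_route(kitchen, schools)
print(route, route_length(route))
```

Real vehicle routing would use road distances or travel times and would also respect vehicle capacity and delivery time windows.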
Changing Business Environments – 1/2
• Traditional applications
– Payroll
– Bookkeeping
• Current applications
– Complex managerial areas, such as design & automated
factories
– Evaluation of proposed mergers and acquisitions
– Nearly all executives use IT as it is vital to their business
• Analytics are used to
– Report what is happening
– Predict what is likely to happen
– Make decisions that make the best use of the situation
Changing Business Environments – 2/2
• Transition from processing and monitoring activities to problem
analysis and solution
• Transition from standalone applications to cloud-based
technologies and mobile devices
• Analytics and BI tools for modern management
– Data warehousing
– Data mining
– Online Analytical Processing (OLAP)
– Dashboards
• High speed network systems
– Wireline/Wireless
• Automation of routine decisions – eliminate need for managerial
interventions
Factors That Facilitated the Growth of BI & Analytics – 1/2
• Increased hardware, software, and network capabilities
• Group communication and collaboration
– Helps supply chains react to marketplace changes faster
• Improved data management
– Complex computation using multiple databases, multiple media
– Systems can search, store, transmit data quickly from distinct
locations, economically, securely & transparently
• Managing giant data warehouses and Big Data
– Costs of big data storage and mining are declining rapidly
– Special methods to organize, search and mine
▪ Parallel computing, Hadoop/Spark
Hadoop and Spark
• Hadoop and Spark, both maintained by the Apache Software Foundation, are
widely used open-source frameworks for big data architectures
• The Hadoop ecosystem
– Enables big data analytics processing tasks to be split into smaller tasks.
– The small tasks are distributed across a Hadoop cluster (i.e., nodes that perform
parallel computations on big data sets)
– They are then performed in parallel using an algorithm (e.g., MapReduce)
– It is a highly scalable, cost-effective solution that stores and processes structured,
semi-structured and unstructured data
• Hadoop is ideal for batch processing and linear data processing. Spark is ideal
for real-time processing and processing live unstructured data streams
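The split, map, group and reduce steps can be illustrated with a word count written in plain Python. This is a single-process sketch of the MapReduce pattern, not the Hadoop or Spark API; on a real cluster each chunk would be mapped on a different node. The function names and sample chunks are assumptions made for the illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

# Each string stands for one chunk of a larger data set.
chunks = ["big data big", "data analytics", "big analytics"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each call to `map_phase` depends only on its own chunk, the calls can run independently, which is what makes the pattern easy to parallelize.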
Factors That Facilitated the Growth of BI & Analytics – 2/2
• Analytical support
– Perform complex simulations, check many possible
scenarios
– Assess diverse impacts quickly & economically
• Overcoming cognitive limits in processing and storing information
– Quickly accessing and processing vast amounts of stored
information
• Knowledge management
– Text Analytics and IBM Watson derive value from
unstructured communication between stakeholders
• Anywhere, anytime support
– Perhaps the biggest change
Evolution of Computerized Decision
Support to Analytics/Data Science
Decision Support Systems (DSS)
• Interactive computer-based systems, which help decision makers
utilize data and models to solve unstructured problems (Gorry &
Scott-Morton 1971)
• Decision support systems couple the intellectual resources of
individuals with the capabilities of the computer to improve the
quality of decisions. It is a computer-based support system for
management decision makers who deal with semi-structured
problems (Keen & Scott-Morton 1978)
Enterprise Resource Planning (ERP)
• Integrated enterprise-level information systems
• Sequential and non-standardised data representation schemas
are replaced with relational database management (RDBM) systems
• Improves the capture and storage of data and the relationships
between data fields, thus reducing replication significantly
• Improves data integrity and consistency, and the effectiveness of
business practices
• Data from different functions is connected and integrated into
consistent schema – single version available organization wide
• Decision makers could decide when they needed to or wanted
to create specialized reports to investigate organizational
problems and opportunities.
Executive Information Systems (EIS)
• Need for more versatile reporting led to development of EIS
• Graphical dashboards and scorecards to keep track of KPIs
• A middle data tier (data warehouse, DW) is created to keep the
transactional integrity of business information systems intact
• Dashboards and scorecards got data from DW. This helped in
keeping the efficiency of ERP systems intact.
• DW driven DSSs began to be called BI systems
Data Warehousing (DW)
• Data in DW is updated periodically, hence does not reflect the
latest information
• Real-time data warehousing (right-time data warehousing)
overcomes the information latency problem through a need-based
data refreshing policy
• DWs are very large and feature rich
– Data mining & text mining are required to discover new and
useful knowledge to improve business processes &
practices
• More storage & more processing power are needed to handle
increasing volumes and variety of data
– Service oriented architecture
– Software & Infrastructure as-a-service
Big Data
• In the 2010s, new data generation mediums emerged due to the
widespread use of the Internet
– Radio-frequency identification (RFID) tags, digital meters,
clickstream weblogs, smart home devices, wearable health
monitoring, social media
• Analysis of such unstructured data rich in information content
poses significant challenges – software and hardware
• Storage: Store data in chunks on different machines connected
by a network, both logically and physically. This approach was originally
used by Google (Google File System) and later released as an Apache
project, the Hadoop Distributed File System (HDFS)
• Processing: Push computation to the data. This approach is known as
MapReduce programming and was later released as an Apache project,
Hadoop MapReduce.
Business Intelligence (BI)
• BI is an umbrella term that combines architectures, tools,
databases, analytical tools, applications and methodologies
• BI's major objectives
– Enable interactive access to data (often in real time)
– Enable manipulation of data
– Give managers and analysts the ability to conduct appropriate analyses
– Enable them to make more informed & better decisions
• BI process – transformation of data to information then to
decisions and finally to actions
• By 2005, BI systems started including artificial intelligence
capabilities as well as powerful analytical capabilities
• Managers need the right information at the right time and in the
right place
Evolution of Business Intelligence (BI)
Four Components of BI Architecture
Transaction Processing Vs Analytic
Processing
• Online Transaction Processing (OLTP)
– Constantly involved in handling updates to operational
databases (ERP)
– Handles a company’s routine ongoing business
– Inefficient for end-user ad hoc reports, queries, analysis
• Online Analytical Processing (OLAP)
– DWs contain data from OLTP in a reorganized and
structured way that is fast, efficient for querying, analysis
and decision support
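The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the `sales` table and its rows are made up, and a single in-memory database stands in for both systems, whereas in practice the analytical query would run against a separate data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# OLTP-style work: many small writes, one per business transaction.
transactions = [
    ("South", "rice", 120.0),
    ("South", "milk", 40.0),
    ("North", "rice", 200.0),
    ("North", "tea", 60.0),
]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transactions)

# OLAP-style work: a read-only query that aggregates over many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

Running heavy aggregate queries on the operational database would compete with the transactional writes, which is one reason the data is copied into a warehouse organized for querying.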
Appropriate Planning and Alignment with
the Business Strategy
Prediction
Association
Segmentation
• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities
Text Mining Concepts
• Vast majority of business data is stored in text documents that
are unstructured
• 85% of all corporate data is in some kind of unstructured form
(e.g., text)
• Unstructured corporate data is doubling in size every 18
months
• Businesses that tap into these data & information sources will have the
necessary knowledge to make better decisions, leading to a
competitive advantage over businesses that lag behind
• Goal of both text analytics & text mining is to turn unstructured
textual data into actionable information through application of
Natural Language Processing (NLP)
Text Analytics and Text Mining
Text Analytics vs Text Mining
• Text Analytics is a broader concept, which includes:
– Information retrieval based on a set of key terms
– Information extraction
– Data mining
– Web mining
• Text Mining is primarily focused on discovering new and useful
knowledge from the textual data sources
• Put simply, Text Analytics combines:
– Information retrieval
– Text mining
• Text Analytics is more commonly used in business application
context and Text Mining is frequently used in academic
research circles.
• In practice, Text Analytics and Text Mining are often used synonymously
Text Mining
• Is the semi-automated process of extracting patterns (useful
information & knowledge) from large amounts of unstructured data
sources, such as Word documents, PDF files, text files, XML files, etc.
• Text mining process
– Impose structure on the text-based data sources
– Extract relevant information & knowledge using data mining
tools and techniques
• Text mining is useful in areas such as
– Law (court orders), Academic research, Finance (quarterly
reports), Medicine (discharge summaries), Biology (molecular
interactions), Technology (patents), Marketing (customer
comments)
– Electronic communication
▪ Spam filtering, prioritization and categorization
▪ Automatic response generation
Text Mining – 1/2
1. Text pre-processing: transforms a raw text file into a well-defined
sequence of linguistically meaningful units
– Text clean-up: ranges from removing advertisements from web pages to
cutting out tables and figures, etc.
– Tokenization: segmenting sentences into words by splitting on
spaces, commas, etc.
– Filtering: removes words that carry little content, including
articles, conjunctions, prepositions, etc. Words that are repeated
very frequently are also removed.
– Stemming: reducing words to their stem. For example, the
word “go” is the stem of “goes”, “going” and “gone”.
– Lemmatization: reduces the word to its linguistically correct
root
– Linguistic processing: involves part-of-speech (POS) tagging,
word sense disambiguation (WSD) and semantic structure
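The tokenization, filtering and stemming steps above can be sketched in a few lines of Python. The stop-word list and the suffix-stripping rule are deliberately tiny, hypothetical stand-ins: a real pipeline would use an established stemmer (for example the Porter algorithm) and a fuller stop-word list.

```python
import re

# Small illustrative stop-word list and suffix list (assumptions).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are"}
SUFFIXES = ("ing", "es", "s")  # checked in this order

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Filtering: drop words that carry little content."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]

tokens = preprocess("The customer is going to the store, and she goes daily.")
print(tokens)
```

Note that this naive rule cannot map irregular forms such as "gone" or "went" to "go"; handling those needs dictionary-based lemmatization.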
Text Mining – 2/2
2. Text transformation: choosing the subset of significant features
that are used in creating a model. It reduces
dimensionality by excluding redundant and unnecessary
features
3. Text mining methods: methods such as classification, clustering,
and summarization are applied
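As a rough sketch of text transformation, the code below builds term-count vectors and keeps only terms whose document frequency falls within chosen bounds. Document-frequency filtering is just one simple feature-selection rule, picked here for illustration; the thresholds, function names and sample documents are assumptions.

```python
from collections import Counter

# Made-up, already pre-processed documents (lists of tokens).
docs = [
    ["refund", "late", "delivery"],
    ["late", "delivery", "complaint"],
    ["great", "service", "delivery"],
]

def select_features(docs, min_df=2, max_df_share=0.9):
    """Keep terms that occur in at least `min_df` documents but in no
    more than `max_df_share` of all documents."""
    df = Counter(term for doc in docs for term in set(doc))
    n = len(docs)
    return sorted(
        term for term, count in df.items()
        if count >= min_df and count / n <= max_df_share
    )

def vectorize(doc, vocabulary):
    """Represent one document as term counts over the vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]

vocabulary = select_features(docs)
matrix = [vectorize(doc, vocabulary) for doc in docs]
print(vocabulary, matrix)
```

Here "delivery" is dropped because it appears in every document and so cannot distinguish between them, while terms seen only once are dropped as too rare. The resulting matrix is what a classification or clustering method would take as input.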
Text Mining Applications
• Information extraction: identification of key phrases &
relationships within text by looking for predefined objects and
sequences in text by way of pattern matching
• Topic tracking: predict documents of interest based on user
profile & past history
• Summarization: summarize a document to save the reader time
• Categorization: identify the main theme & place the document in a predefined category
• Clustering: grouping similar documents without predefined
categories
• Concept linking: connects related documents by identifying shared
concepts (help find information not found using search methods)
• Question answering: Finding best answer through knowledge
driven pattern matching
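Information extraction by pattern matching can be illustrated with a regular expression that looks for one predefined relationship. The pattern, the notion of a company name as a run of capitalised words, and the company names in the sample text are all made up for this sketch.

```python
import re

# Hypothetical pattern: "<Company> acquired <Company>", where a company
# name is one or more capitalised words separated by single spaces.
NAME = r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*"
ACQUISITION = re.compile(rf"({NAME}) acquired ({NAME})")

def extract_acquisitions(text):
    """Return (acquirer, target) pairs found in the text."""
    return ACQUISITION.findall(text)

text = ("In 2020, Alpha Corp acquired Beta Labs; "
        "a year later, Gamma acquired Delta.")
relations = extract_acquisitions(text)
print(relations)
```

Hand-written patterns like this are brittle (a capitalised word directly before the company name would be swallowed into it), which is why practical systems combine them with linguistic processing such as part-of-speech tagging.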
Data Visualization vs Visual Analytics
• Data visualization: “the use of visual representations to explore,
make sense of, and communicate data.”
• Data visualization presents information following the
aggregation, summarization, and contextualization of data
• Data visualization is aimed at answering (associated with BI)
– What happened?
– What is happening?
• Visual analytics is the combination of visualization and
predictive analytics. Visual analytics is aimed at answering
– Why is it happening?
– What is more likely to happen?
• Visual analytics is usually associated with business analytics
Visual Analytics by SAS Institute
Information Dashboards
• Dashboards provide visual displays of important information that
is consolidated and arranged on a single screen so that
information can be digested at a single glance and easily drilled
into and further explored
• The fundamental challenge of dashboard design is to display all
the required information on a single screen, clearly and without
distraction, in a manner that can be assimilated quickly
• Three layers of information
– Monitoring: to monitor key performance metrics
– Analysis: to find root cause of problems
– Management: Identify what actions to take to resolve problem
Performance Dashboards
What to look for in a dashboard
• Use of visual components to highlight data and exceptions that
require action
• Transparent to the user, meaning that they require minimal
training and are extremely easy to use
• Combine data from a variety of systems into a single,
summarized, unified view of the business
• Enable drill-down or drill-through to underlying data sources or
reports
• Present a dynamic, real-world view with timely data
• Require little coding to implement, deploy, and maintain
Web Mining Overview
• Customers are expecting companies to offer their
products/services over the internet
• Customers are using the internet for
– Buying products/services
– Talking about companies
– Sharing transactional/usage experience with others
• Delays in service, manufacturing, shipping, delivery and customer
inquiries are no longer private incidents
• Successful companies are embracing the internet to
– Improve business processes
– Communicate better with customers
– Understand their needs and wants
– Serve them thoroughly and expeditiously
Challenges for Knowledge Discovery @Web
• Because of its sheer size and complexity, mining the web is not an
easy undertaking by any means
• Search engines constantly search web and index web pages
under certain keywords.
• Simple keyword-based search engines suffer from deficiencies
– A topic of any breadth contains hundreds/thousands of pages
– Many documents that are highly relevant may not contain
exact key words defining them
• Web mining can identify authoritative web pages, classify web
documents and resolve many ambiguities and subtleties raised in
keyword-based web search engines
Web Mining vs Web Analytics
• Web mining is the process of discovering intrinsic relationships
(interesting & useful information) from web data, which are
expressed in the form of textual, linkage, or usage information
• Web mining is inclusive of all the data generated via internet
including transaction, social, and usage data.
• Web mining aims to discover previously unknown patterns and
relationships (using novel predictive or prescriptive analytics)
• Web mining relies heavily on data mining and text mining and
their enabling tools and techniques
• Web analytics focuses primarily on website usage data
• Web analytics aims to describe what has happened on the web
site (metric-driven descriptive analytics)
Types of Web Mining
[Figure: web mining draws on data mining and text mining. Topics shown:
Page Rank, Information Retrieval, Graph Mining, Social Analytics,
Clickstream Analysis, Search Engine Optimization, Social Network Analysis,
Social Media Analytics, Weblog Analysis, Marketing Attribution,
Customer Analytics, 360 Customer View, Voice of the Customer]
Web Content/Structure Mining
• Mining the textual content on the Web
• Data collection via Web crawlers
• Web pages include hyperlinks
– Authoritative pages
– Hubs
– Hyperlink-induced topic search (HITS) algorithm
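A compact sketch of the HITS iteration follows. A page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to; both are normalized after each round. The four-page link graph, the iteration count and the function name are illustrative assumptions.

```python
import math

def hits(links, iterations=50):
    """Iteratively compute hub and authority scores.
    `links` maps each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors to unit length.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Made-up link graph: A links to C and D, B links to C.
links = {"A": ["C", "D"], "B": ["C"], "C": [], "D": []}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this small graph, C receives the most links and comes out as the top authority, while A links to the most authorities and comes out as the top hub.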
Web Usage Mining (Web Analytics)
• Extraction of information from data generated through Web page
visits and transactions (clickstream data)
– Data stored in server access logs, referrer logs, agent logs,
and client-side cookies
• Web analytics holds promise of revolutionizing how business is
done on the web
– Tool for e-business market research to improve e-commerce
• Two categories of web analytics
– Off-site: measurement takes place outside your website
– Onsite: on-site visitor measurement in commercial context
• Website data is compared against KPIs and used to improve a
marketing campaign’s audience response
Data Collection – Web Analytics
• Traditional method: server log files
– Web server records file requests made by browsers
• Page tagging: mouse clicks are captured by JavaScript
embedded in site pages and the data is sent to a dedicated third-party
analytics server
• Other Sources:
– Email, direct mail campaign data,
– sales and lead history
– social media originated data
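The traditional server-log method can be sketched as follows: parse each line of an access log and count successful page requests. The sketch assumes the log is in Common Log Format; the sample lines, IP addresses and paths are made up.

```python
import re
from collections import Counter

# Pattern for the Common Log Format (an assumption about the log layout).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def page_views(lines):
    """Count successful GET requests per path."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["method"] == "GET" and m["status"] == "200":
            views[m["path"]] += 1
    return views

# Made-up log lines for illustration.
log = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /cart HTTP/1.1" 200 734',
    '10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.3 - - [01/Jan/2024:10:00:12 +0000] "GET /missing HTTP/1.1" 404 0',
]
views = page_views(log)
print(views)
```

Grouping the same parsed records by host and ordering them by time would give each visitor's clickstream, the raw material for web usage mining.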
Web Usage Mining Applications
• Determine the lifetime value of clients
• Design cross-marketing strategies across products.
• Evaluate promotional campaigns
• Target electronic ads and coupons at user groups based on user
access patterns
• Predict user behavior based on previously learned rules and
users' profiles
• Present dynamic information to users based on their interests and
profiles
Web Usage Mining (Clickstream Analysis)