MODULE 3

BUSINESS DATA
ANALYSIS
INTRODUCTION
• Business analytics begins with a data set, or commonly with a database.
• As databases grow, they need to be stored somewhere, such as on computers or in data warehouses.
• As these storage areas become very large, a new term is used to describe them: big data.
• Big data describes collections of data sets that are so large and complex that software systems are hardly able to process them.
• Analytics is the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain insight into their business operations and make better decisions.
FEATURES OF NEW GENERATION
COMPUTERS
• Speed of operation
• Accuracy
• Storage
• Versatility
• Automatic operation
• Diligence
• Complexity
• Reliability
FEATURES (CONTINUED)
• Speed – computers are fast and can perform complex calculations very quickly; a powerful computer is capable of performing about 3-4 million simple instructions per second, and speed is expressed in terms of microseconds.
• Accuracy – computers are very accurate in processing data; if correct data is entered with a correct program, the result will also be correct (GIGO: Garbage In, Garbage Out).
• Storage – computers have the capacity to store large amounts of data and instructions and can retrieve them when required. They have a large memory capacity; since the memory of the CPU is limited, secondary storage devices can be used. Storage is measured in terms of bytes, kilobytes, megabytes, gigabytes, and terabytes.
• Versatility – computers are versatile machines and can perform any task as long as it can be broken down into a series of logical steps; they are used for a wide variety of purposes at the same time, and digital computers are used for scientific as well as business purposes.
FEATURES (CONTINUED)
• Automatic operation – a computer can carry out a particular task on the basis of the instructions given to it until a stop instruction is executed.
• Diligence – computers are highly consistent and free from tiredness, boredom, and the resulting lack of concentration; they can work for hours without creating errors.
• Complexity – even a complex mathematical model can be analysed easily.
• Reliability – computers are reliable; reliability is measured in terms of performance against a predetermined standard of operations. If a program is executed any number of times with the same set of data, the result will be the same every time.
CONCEPT OF DATA ANALYSIS
• World is becoming more and more data driven.
• Data analysis is the process of evaluating data using analytical or statistical
tools to discover useful information
• Used by all types of businesses.
• Data is collected and sorted, the results are interpreted to make decisions.
• End results are delivered as a summary. Or as a visual like a chart or graph.
• Data analysis is a process of collecting, transforming, cleansing and modeling
data with the goal of discovering required information
• Used in different business, science, and social science domains.
• In today's business world, data analysis plays a role in making decisions more
scientific and helping businesses operate more effectively
DATA
• Data is used to describe things by assigning a value to them.
• These values are then organized, processed and presented within a given
context so that it becomes useful.
• Webster's New Collegiate Dictionary (1973) defines data as “factual information (as measurements or statistics) used as a basis for reasoning, discussion or calculation.”
• Data are numbers, characters, images, or other methods of recording, in a form which can be assessed to make a determination or decision about a specific action.
DIFFERENT FORMS OF DATA
• Qualitative data
• Quantitative data
 - Categorical data
 - Continuous data
DIFFERENT FORMS OF DATA: QUALITATIVE AND QUANTITATIVE
• Qualitative data
 Data that uses words and descriptions.
 Qualitative data can be observed but is subjective, and therefore difficult to use for the purpose of making comparisons.
 Descriptions of texture, taste, or an experience are examples of qualitative data.
 These types of data are collected through focus groups, interviews, open-ended questionnaire items, and other structured situations.
• Quantitative data
 Data that is expressed with numbers; data which can be put into categories, measured, or ranked.
 Length, weight, and rating scales are examples of quantitative data.
 They can be represented visually in graphs and tables and can be statistically analysed.
TYPES OF QUANTITATIVE DATA
• Two types: categorical and continuous.
• Categorical data
Data that has been placed into groups.
An item cannot belong to more than one group at a time.
Categorical variables represent types of data which may be divided into groups, e.g., race, sex, etc.
Analysis of categorical data involves the use of data tables.
• Continuous data
Numerical data measured on a continuous range or scale.
All values are possible with no gaps in between, e.g., weight, height, etc.
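To make the distinction concrete, here is a minimal Python sketch (hypothetical values; pandas assumed as the tool) showing categorical data summarised with a frequency table and continuous data summarised on a numeric scale:

```python
import pandas as pd

# Hypothetical records: one categorical and one continuous variable.
df = pd.DataFrame({
    "blood_group": ["A", "B", "A", "O", "AB", "O", "A"],              # categorical
    "height_cm": [162.0, 175.5, 158.2, 180.1, 170.0, 165.3, 172.8],   # continuous
})

# Categorical data: analysis involves data/frequency tables.
print(df["blood_group"].value_counts())

# Continuous data: any value in the range is possible, so use numeric summaries.
print(df["height_cm"].describe())
```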
BUSINESS DATA ANALYSIS
• Data driven businesses make decisions based on data, which means they can
be more confident that their actions will bring success since there is data to
support them.
• Business analytics is the process of collecting, sorting, processing, and studying business data, and using statistical models and iterative methodologies to transform data into business insights.
• The goal of business analytics is to determine which data sets are useful and
how they can be leveraged to solve problems and increase efficiency,
productivity and revenue.
• Business analytics is a subset of business intelligence; business intelligence is typically descriptive, focusing on the strategies and tools utilised to acquire, identify, and categorise raw data and report on past or current events.
• Sophisticated data, quantitative analysis, and mathematical models are all
employed by business analysts to find solutions for data driven issues.
• Data analytics is the science of analysing raw data in order to draw conclusions from that information.
• In simple words data analysis is the process of collecting and organising
data in order to draw helpful conclusions from it.
• The main purpose of data analysis is to find meaning in data so that derived
knowledge can be used to make informed decisions
• Data analytics is used in business to help organisations make better decisions and to support market research, product research, customer reviews, etc.
• Analysing data will provide insights that organisation need in order to make
the right decisions.
IMPORTANCE OF BUSINESS ANALYTICS
• A methodology for commercial decision making
• Operational efficiency
• Competitive advantage
• Valuable information
TYPES OF DATA ANALYSIS
Four types:
• Descriptive analysis
• Diagnostic analysis
• Predictive analysis
• Prescriptive analysis
DESCRIPTIVE ANALYSIS
• Descriptive data analysis looks at past data and tells what happened.
• Summarizes a business’s existing data to get a picture of what has happened in
the past or is happening currently.
• Simplest form of analytics
• Applies descriptive statistics to existing data, making it more accessible to members of an organization.
• Often used when tracking key performance indicators (KPIs) like revenue, sales leads, and more.
• Can help identify strengths and weaknesses and provide insight into customer behavior.
• Most common physical product of descriptive analysis is a report with visual
statistical aids.
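As an illustration of descriptive analysis, the sketch below (hypothetical KPI figures; pandas assumed) applies descriptive statistics to existing data, the way a KPI report might:

```python
import pandas as pd

# Hypothetical monthly KPI data.
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [120000, 135000, 128000, 142000],
    "leads": [310, 350, 330, 400],
})

# Descriptive analysis: summarise what has already happened.
print(sales[["revenue", "leads"]].describe())   # mean, spread, min/max
print("Total revenue:", sales["revenue"].sum())
print("Month-on-month growth:")
print(sales["revenue"].pct_change())
```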
DIAGNOSTIC ANALYSIS
• Aims to determine why something happened.
• Can be done to find out the reason why something negative or positive
happened.
• It shifts from the “what” of past and current events to “how” and “why”,
focusing on past performance to determine which factors influence
trends.
• Employs techniques such as drill-down, data discovery, data mining, and
correlations to uncover the root causes of events.
PREDICTIVE ANALYSIS
• It predicts what is likely to happen in the future.
• Trends are derived from the past data which are then used to form predictions about the future.
• Can also be applied to more complicated issues such as risk assessment, sales forecasting, etc.
• It forecasts the possibility of future events using statistical models and machine learning techniques.
• Machine learning experts and data scientists are typically employed to run predictive analysis using
learning algorithms and statistical models, enabling a higher level of predictive accuracy.
• A common application of predictive analysis is sentiment analysis.
• Existing text data can be collected from social media to provide a comprehensive picture of opinions
held by a user.
• This data can be analysed to predict their sentiment towards a new subject.
• A common physical product of predictive analysis is a detailed report used to support complex
forecasts in sales and marketing.
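A minimal sketch of deriving a trend from past data and using it to form a prediction (hypothetical quarterly sales; scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical quarterly sales history (units sold).
quarters = np.array([[1], [2], [3], [4], [5], [6]])
sales = np.array([100, 110, 125, 130, 146, 158])

# Fit a trend to the past data, then forecast the next two quarters.
model = LinearRegression().fit(quarters, sales)
print(model.predict(np.array([[7], [8]])))
```

A real predictive model would validate against held-out data and quantify its uncertainty; this only shows the trend-to-forecast step.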
PRESCRIPTIVE ANALYSIS
• Combines the information found from the above-mentioned types and forms a plan of action for the organization to address the issue or decision.
• This is where data-driven choices are made.
• It goes a step beyond predictive analysis, providing recommendations for next
best actions and allowing potential manipulation of events to drive better
outcomes.
• It recommends specific actions to deliver the most desired result.
• It relies on a strong feedback system and constant iterative analysis and testing to
continually learn more about the relationships between different actions and
outcomes.
• A common physical product of prescriptive analysis is a focused recommendation
for next best actions, which can be applied to clearly identified business goals.
PHASES OF DATA ANALYSIS
DATA REQUIREMENTS SPECIFICATION
The first step is to determine the data requirements and how the data is grouped.
The data required for analysis is based on a question or an experiment.
Based on the requirements of those directing the analysis, the data
necessary as inputs to the analysis is identified (e.g., Population of
people).
 Specific variables regarding a population (e.g., Age and Income) may be
specified and obtained.
Data may be numerical or categorical.
DATA COLLECTION
• Data Collection is the process of gathering information on targeted variables identified as data requirements.
• The emphasis is on ensuring accurate and honest collection of data.
• Data Collection ensures that data gathered is accurate such that the related
decisions are valid.
• Data Collection provides both a baseline to measure and a target to improve.
• Data is collected from various sources such as computers, online sources, etc.
• The data thus obtained, may not be structured and may contain irrelevant
information.
• Hence, the collected data is required to be subjected to Data Processing and
Data Cleaning.
DATA PROCESSING
• The data that is collected must be processed or organized for analysis.
• Organizing can be done on a spreadsheet or other form of software that
can take statistical data.
• This includes structuring the data as required for the relevant Analysis
Tools. For example, the data might have to be placed into rows and
columns in a table within a Spreadsheet or Statistical Application.
• A Data Model might have to be created.
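A minimal sketch of this processing step (hypothetical raw records; pandas assumed), placing the data into rows and columns and giving each column a proper type:

```python
import pandas as pd

# Hypothetical raw records gathered during collection.
raw = [
    {"customer": "Asha", "age": "34", "income": "52000"},
    {"customer": "Binu", "age": "41", "income": "61000"},
    {"customer": "Chris", "age": "29", "income": "47000"},
]

# Structure the data as a table and convert columns to suitable types.
df = pd.DataFrame(raw)
df["age"] = df["age"].astype(int)
df["income"] = df["income"].astype(float)
print(df)
```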
DATA CLEANING
• The processed and organized data may be incomplete, contain duplicates, or contain errors.
• Data Cleaning is the process of preventing and correcting these errors.
• There are several types of Data Cleaning that depend on the type of data.
For example, while cleaning the financial data, certain totals might be
compared against reliable published numbers or defined thresholds.
• Likewise, quantitative data methods can be used for outlier detection that
would be subsequently excluded in analysis.
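A minimal cleaning sketch (hypothetical amounts; pandas assumed) that removes duplicates and uses a simple quantitative method to flag an outlier for exclusion:

```python
import pandas as pd

# Hypothetical data containing a duplicate row and an obvious outlier.
df = pd.DataFrame({"amount": [120, 95, 120, 110, 105, 98, 115, 9999]})

# Remove exact duplicate rows.
df = df.drop_duplicates()

# Quantitative outlier detection: flag values far from the mean (z-score).
z = (df["amount"] - df["amount"].mean()) / df["amount"].std()
cleaned = df[z.abs() < 2]   # exclude the outlier from later analysis
print(cleaned)
```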
DATA ANALYSIS

• Data that is processed, organized and cleaned would be ready for the
analysis.
• Various data analysis techniques are available to understand, interpret, and
derive conclusions based on the requirements.
• Data Visualization may also be used to examine the data in graphical format,
to obtain additional insight regarding the messages within the data.
• Statistical data models such as correlation and regression analysis can be used to identify the relations among the data variables.
• These models, being descriptive of the data, are helpful in simplifying analysis and communicating results.
• The process might require additional Data Cleaning or additional Data
Collection, and hence these activities are iterative in nature.
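A minimal sketch of this analysis step (hypothetical figures; pandas and NumPy assumed), using correlation and a simple regression fit to describe the relation between two variables:

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data: advertising spend vs. sales.
df = pd.DataFrame({
    "ad_spend": [10, 15, 20, 25, 30, 35],
    "sales": [110, 135, 160, 170, 200, 220],
})

# Correlation: how strongly the two variables move together.
print(df.corr())

# Simple least-squares regression describing the relationship.
slope, intercept = np.polyfit(df["ad_spend"], df["sales"], deg=1)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
```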
COMMUNICATION
• The results of the data analysis are to be reported in a format as required by the users to support their decisions and further action.
• The feedback from the users might result in additional analysis.
• The data analysts can choose data visualization techniques, such as
tables and charts, which help in communicating the message clearly and
efficiently to the users.
• The analysis tools provide facility to highlight the required information
with color codes and formatting in tables and charts.
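As a small example of the visualization choices described here (hypothetical regional figures; matplotlib assumed), a chart can highlight the information that matters with colour:

```python
import matplotlib.pyplot as plt

# Hypothetical regional sales for a report.
regions = ["North", "South", "East", "West"]
sales = [420, 380, 510, 290]

# Colour-code the weakest region so it stands out to the reader.
colors = ["tomato" if s == min(sales) else "steelblue" for s in sales]
plt.bar(regions, sales, color=colors)
plt.title("Quarterly sales by region")
plt.ylabel("Sales (units)")
plt.show()
```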
COMPONENTS OF BUSINESS
ANALYTICS
• Data Aggregation- Before data can be analyzed, it must be collected, centralized, cleaned to avoid duplication, and filtered to remove inaccurate, incomplete, and unusable data.
• Data Mining- In the search to reveal and identify previously unrecognized trends and
patterns, models can be created by mining through vast amounts of data. Data mining
employs several statistical techniques like Classification, Regression, Clustering.
• Association and Sequence Identification- In many cases, consumers perform similar
actions at the same time or perform predictable actions sequentially. This data can
reveal patterns such as:
 Association: For example, two different items frequently being purchased in the same
transaction, such as multiple books in a series or a toothbrush and toothpaste.
 Sequencing: For example, a consumer requesting a credit report followed by asking for
a loan or booking an airline ticket, followed by booking a hotel room or reserving a car.
COMPONENTS
• Text Mining- Companies can also collect textual information from social media sites, blog comments, and call center scripts to extract meaningful relationship indicators. This data can be used to develop new products that are in demand, improve customer service, and review competitor performance.
• Forecasting- A forecast of future events or behaviors based on historical data can be created by analyzing processes that occur during a specific period or season. For example, the demand for gas in a city with a static population in any given month or quarter.
• Predictive Analytics- Companies can create, deploy, and manage predictive scoring models, proactively addressing events such as customer behavior narrowed down to customer age bracket, income level, lifetime of the existing account, and availability of promotions.
COMPONENTS
• Optimization- Companies can identify best-case scenarios and next best actions by developing and engaging simulation techniques. For example, inventory stocking and shipping options that optimize delivery schedules and customer satisfaction without sacrificing warehouse space.
• Data Visualization- Information and insights drawn from data can be
presented with highly interactive graphics to show Exploratory data
analysis, Modeling output and Statistical predictions. These data
visualization components allow organizations to use their data to inform
and drive new goals for the business, increase revenues, and improve
consumer relations.
CHALLENGES FACED BY
BUSINESS ANALYTICS
• Success of business analytics often depends on whether or not all parties of an
organization fully support adoption and execution. There are certain challenges:
 Executive Distrust
• It is not practical to get the consent of everyone in upper management to implement
business analytics.
• Although most organizations have adopted some form of BI, analytics is still an area
viewed with distrust by many top-level executives. Therefore trust must be built to
effectively leverage data analysis.
 Poor Collaboration
• Lack of teamwork among some departments can slow the evaluation and
implementation of analytics-driven initiatives. Business and IT personnel must come
together for an analytics strategy to succeed. Poor collaboration creates the risk that
analytics won’t provide the information needed, leading to further distrust and potential
abandonment of the new system.
CHALLENGES
 Lack of Commitment
• Many analytics software packages provide solutions that are easy to implement.
However, cost of implementation is high and return on investments is often not
immediate.
• Although analytical models develop over time and predictions will improve, dedication is required during the initial days of an analytics initiative for it to become a success. During this period, executives may lose trust in the solution and refuse to believe the models, eventually abandoning the whole concept.
 Slow Information Maturity
• BA implementations often fail due to a lack of available data or its low quality. A proper
assessment should always be performed on the company’s information architecture and
data sources based on analytical requirements.
• The quality of data and sources should be evaluated. The time required to acquire,
clean, and analyze new data must be built into the adjustment period.
BUSINESS ANALYTICS TOOLS
• Business analytics tools include many methodologies and open source solutions that can be
used to help analysts perform tasks and generate reports that are easy for people to
understand.
• Business analytics tools are types of application software that retrieve data from one or
more business systems and combine it in a repository, such as a data warehouse, to be
reviewed and analysed.
• Most organisations use more than one Analytics tool, including spreadsheets with statistical
functions, statistical software packages, sophisticated data mining tools, and predictive
modelling tools.
• Together, these business analytics tools give the organisation a complete overview of the company, providing insights and understanding of the business so that smarter decisions may be made regarding business operations and customer conversions.
• Tools of business analytics help ensure that organisations can identify, document, verify,
and meet the needs and expectations of employees and existing customers. Below are
some of the popular Data Analytics tools, both open source and paid version.
TOOLS
• R programming-
 The leading analytics tool in the industry, widely used for statistics and data modelling.
 It can be used to manipulate data and present it in different ways.
 R compiles and runs on a wide variety of platforms, viz. Unix, Windows, and Mac OS.
 It has 11,556 packages and allows users to browse the packages by category.
 It provides tools to automatically install all packages as per user requirements.
• Tableau Public-
 Software that connects to any data source, be it a corporate data warehouse, Microsoft Excel, or web-based data, and creates data visualisations, maps, dashboards, etc. with real-time updates presented on the web.
 Visualisations can also be shared through social media or with the client.
 It allows files to be downloaded in different formats.
TOOLS
• Python
Python is an object-oriented scripting language which is easy to read, write, and maintain, and is a free open-source tool.
It was developed by Guido van Rossum in the late 1980s and supports both functional and structured programming methods.
 Python is easy to learn as it is very similar to JavaScript.
 Another important feature of Python is that it can work with data from many platforms, such as a SQL server, a MongoDB database, or JSON.
Python can also handle text data very well.
TOOLS
• SAS
 SAS ("Statistical Analysis System") is a statistical software suite developed by
SAS Institute for data management, advanced analytics, multivariate analysis,
business intelligence, criminal investigation, and predictive analytics.
 SAS is a programming environment and language for data manipulation and a leader in analytics, developed by the SAS Institute in 1966 and further developed in the 1980s and 1990s.
 SAS is easily accessible and manageable and can analyse data from any source.
 SAS introduced a large set of products in 2011 for customer intelligence, and numerous SAS modules for web, social media, and marketing analytics are widely used for profiling customers and prospects.
 It can also predict their behaviours, and manage and optimise communications.
TOOLS
• Excel
A basic, popular, and widely used analytical tool in almost all industries.
Excel is important when there is a requirement for analytics on the client's internal data.
It analyzes complex tasks and summarizes the data with a preview of pivot tables that help in filtering the data as per client requirements.
Excel has advanced business analytics options which help with modelling capabilities, including features like automatic relationship detection.
TOOLS
• RapidMiner
 A powerful integrated data science platform that performs predictive analysis and other advanced analytics like data mining, text analytics, and machine learning without any programming.
 RapidMiner can incorporate almost any data source type, including Access, Excel, Microsoft SQL, etc.
• KNIME
 Developed in January 2004 by a team of software engineers at the University of Konstanz.
 A leading open-source reporting and integrated analytics tool that allows users to analyse and model data through visual programming; it integrates various components for data mining and machine learning.
TOOLS
• BIRT- A basic open-source analytics tool for reports, dashboards, and visualizations, but it requires working knowledge of Java, scripts, and formatting.
• Zeppelin by Apache- A completely open web-based notebook that enables interactive data analytics. This is a multipurpose notebook for data visualization, analytics, and data discovery, and it readily takes data from related Apache-based technologies including Spark, Hadoop, Hive, and Python.
• OmniSci- OmniSci is an American software company whose platform uses graphics processing units and central processing units to query and visualize big data. It is a high-performance analytics tool ideal for business analysts and data scientists, capable of querying big data, with interactive dashboards and an SQL engine designed to manage extremely large workloads.
TRENDS IN BUSINESS ANALYTICS
• Big data
• AI
• Micro-segmentation
• IoT
• Deep learning
• Neural networks
TRENDS IN BUSINESS ANALYTICS
• Big Data- With an increasing emphasis on digitization in every aspect of life, datasets continue to expand at an extraordinary rate. This expansion is both an advantage and a disadvantage: more data means more potential insights, but the total volume can be challenging.
• Artificial Intelligence- AI systems are becoming smarter and able to teach themselves. AI solutions are being developed and launched in industry verticals such as banking, financial services, insurance, retail, hospitality, engineering, manufacturing, and more.
• Deep Learning - The next step up from machine learning, deep learning
leverages the advantages of vast computing power to manage enormous data
sets, identifying patterns and delivering predictive results that were formerly
impossible.
TRENDS
• Neural Networks- Data scientists can now create “brains” which have the
computing power of thousands of human minds. Data can be processed
and sorted, patterns can be identified along a historical timeline, and
future predictions are delivered with an unprecedented level of accuracy.
• The Internet of Things- IoT-driven devices number in the millions,
delivering real-time data to organizations worldwide and allowing
intimate entry into the lives of consumers around the globe
• Micro-Segmentation- As data becomes bigger, the ability to separate it
into smaller and smaller slices enables organizations to accurately define
their “ideal” customer and create funnels that lead them directly to the
desired action. This granular segmentation is the driving force behind
successful digital transformation initiatives.
ADVANTAGES OF BUSINESS
ANALYTICS
• Increase efficiency
- efficiency of businesses has been improving since the advent of BA.
- Companies can formulate decisions to help achieve specified goals
- BA encourages a company culture of efficiency and teamwork
• Insights through data visualization
- comprehensive charts and graphs can be used to make sure that decision
making is more interesting.
- Relevant and useful insights can be extracted in a much clearer way.
• Keep updated
- modern consumers change their minds easily and go for better offers.
- Analytics give insight into how the target market thinks and acts.
ADVANTAGES
• Better decision making
- help to avoid guesswork from planning market campaigns, help the
company to make informed business decisions.
• More effective marketing- data analytics gives useful insights and can use
this information to adjust the targeting criteria or use it to develop different
messaging and creative for different segments.
• Better customer service- give more insights about customers’
communication preferences, their interests, their concerns and more.
• More efficient operations- help companies streamline their processes and save money.
• Plan for the future- predictive analytics help businesses plan for the future.
DISADVANTAGES OF BUSINESS ANALYTICS

• Lack of alignment, availability and trust- In most organizations, the analysts are organized according to business domains. Unfortunately, the analysis is often shared only with the top executives, and thus the results are not easily communicated to the business users for whom they would provide the greatest value.
• Lack of Commitment- implementation of business analytics can be very costly and the
ROI is not immediate. By nature, these analytics models are prepared to improve
accuracy over time but it is a complex model that requires dedication to implement the
solution. Because the business users do not see the promised results immediately, they
lose interest which results in loss of trust as a result of which the models fail.
• Low quality of underlying transactional data- Implementation of the solutions provided by the business analysts fails because either the data is not available, the data sources are too complex, or they are poorly constructed.
BUSINESS DATA ANALYST
• A data analyst collects, processes, and performs statistical analysis on large data sets.
• They discover how data can be used to answer questions and to solve problems.
• With the development of computers and an ever-increasing move toward technological development, data analysis has evolved.
• In smaller organisations, the titles "data analyst" and "business analyst" may be used interchangeably to describe roles that involve data analysis. Larger organisations, however, often employ both data analysts and business analysts, making the difference between the two careers important to understand.
• While data analysts and business analysts both work with data, the main difference lies in what they do with it. Business analysts use data to help organisations make more effective business decisions, while data analysts are more interested in gathering and analysing data for business development, which is then used to make decisions.
• Most jobs in data analytics involve gathering and cleaning data to uncover trends and business insights, and the work varies depending on the industry or company.
• Most data analysts work with IT teams, management, and data scientists to determine organisational goals. In most cases, they pinpoint trends, correlations, and patterns in complex data sets and identify new opportunities for process improvement.
RESPONSIBILITIES OF DATA
ANALYST
• Producing Reports
- data analysts spend a significant amount of time producing and
maintaining both internal and client-facing reports.
- These reports give management insights about new trends on the horizon
as well as areas the company may need to improve upon.
- Writing up a report is not simple. Successful data analysts understand
how to create narratives with data.
- To remain valuable, the reports, answers and insights that data analysis
provides have to be understood by the next decision-maker, who is not an
analyst but a manager.
• Spotting Patterns
- The most effective data analysts can use data to tell a story.
- In order to produce a meaningful report, a data analyst first has to be
able to see important patterns in the data.
- At the base level, data is used to find trends and insights that we can use
to make recommendations to clients.
- Reporting in regular increments, such as weekly, monthly or quarterly, is
important since it helps an analyst notice significant patterns.
- These all contribute to seeing trends over time.
• Collaborating with others
- The wide variety of data analyst roles and responsibilities means that one
has to collaborate across many other departments in the organization
including marketers, executives and salespeople.
- They are also likely to collaborate closely with those who work in data science, like data architects and database developers.
- Hence it is vital to be able to communicate well as a data analyst.
- An analyst's success depends on the ability to work with people.
• Collecting data and setting infrastructure
- The most technical aspect of an analyst’s job is collecting the data itself.
- This often means working together with web developers to optimize data
collection.
- Streamlining this data collection is key for data analysts.
- They work to develop routines that can be automated and easily modified
to reuse in other areas.
- Analysts keep a handful of specialized software and tools to help them
accomplish this.
SKILLS REQUIRED FOR DATA
ANALYSTS
• Programming Languages : data analysts should be proficient in one language
and have working knowledge of a few more. Data analysts use programming
languages such as R and SAS for data gathering, data cleaning, statistical
analysis, and data visualization.
• Creative and Analytical Thinking: Curiosity and creativity are key attributes of
a good data analyst. It’s important to have a strong grounding in statistical
methods, but even more critical to think through problems with a creative and
analytical lens. This will help the analyst to generate interesting research
questions that will enhance a company’s understanding of the matter at hand.
• Strong and Effective Communication: Data analysts must clearly convey their
findings — whether it’s to an audience of readers or a small team of
executives making business decisions. Strong communication is the key to
success.
SKILLS
• Data Visualization: Effective data visualization takes trial and error. A successful data analyst understands what types of graphs to use, how to scale visualizations, and which charts to use depending on the audience.
• Data Warehousing: Some data analysts work on the back-end. They connect
databases from multiple sources to create a data warehouse and use querying
languages to find and manage data.
• SQL Databases: SQL databases are relational databases with structured data.
Data is stored in tables and a data analyst pulls information from different
tables to perform analysis.
• Database Querying Languages: The most common querying language data analysts use is SQL, and many variations of this language exist, including PostgreSQL, T-SQL, and PL/SQL (Procedural Language/SQL).
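A minimal sketch of pulling information from relational tables with SQL (hypothetical table; Python's built-in sqlite3 module used for illustration):

```python
import sqlite3

# An in-memory relational database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 420), ("South", 380), ("North", 150)])

# A query that pulls and aggregates information for analysis.
for row in conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(row)
conn.close()
```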
SKILLS
 Data Mining and Cleaning: When data isn't neatly stored in a database, data analysts must use other tools to gather unstructured data. Once they have enough data, they clean and process it through programming.
• Advanced Microsoft Excel: Data analysts should have a good handle on Excel and understand advanced modeling and analytics techniques.
• Machine Learning: Data analysts with machine learning skills are incredibly valuable, although machine learning is not an expected skill for typical data analyst jobs.
ORGANISATION AND SOURCE OF
DATA
• All decision makers need three things: data, information, and knowledge.

• These three terms are sometimes used interchangeably and may have
several definitions.
• Data- items about things, events, activities and transactions are recorded,
classified and stored but are not organized to convey any specific
meaning. It can be numeric, alphanumeric, figures, sounds and images.
• Information – data have been organized in a manner that gives them
meaning for the recipient. They confirm something the recipient knows.
Data items are processed so that the results are meaningful for an
intended action or decision.
• Knowledge- consists of data items and information organized and
processed to convey understanding, experience, accumulated learning,
and expertise that are applicable to a current problem or activity.
Knowledge can be the application of data and information in making a
better decision.
SOURCES OF DATA

• Internal data
• External data

Internal data
• Stored in one or more places.
• Data about people, products, services, and processes.
• Private data that the organisation owns, controls, or collects.
• Examples: sales data, financial data.
INTERNAL SOURCES OF DATA
• Accounting resources- give much information which can be used by the marketing researcher; these relate to internal factors.
• Sales force reports- give information about the sales of a product; the information provided is from outside the organisation.
• Internal experts- people who are heading the organisation; they can give an idea about how a particular thing is working.
• Miscellaneous reports- reports generated from operations.
EXTERNAL DATA
• There are many sources of external data- sources outside the organisation.
• They range from commercial databases to data collected by sensors and satellites.
• Government reports and files are a major source.
• Data is also available by using GIS (geographical information systems) and other demographic sources.
EXTERNAL DATA
Classified as Government Publications and Non-Government Publications.
Government publications- Government sources provide an extremely rich pool of data for researchers. In addition, many of these data are available free of cost on internet websites. There are a number of government agencies generating data. These are:
• Registrar General of India- It is an office which generates demographic data. It includes details of gender, age, occupation, etc.
• Central Statistical Organisation- This organization publishes the national
accounts statistics. It contains estimates of national income for several years,
growth rate, and rate of major economic activities. Annual survey of Industries
is also published by the CSO. It gives information about the total number of
workers employed, production units, material used and value added by the
manufacturer.
• Director General of Commercial Intelligence- This office operates from
Kolkata. It gives information about foreign trade i.e. import and export.
These figures are provided region-wise and country-wise.
• Ministry of Commerce and Industries - This ministry through the office of
economic advisor provides information on wholesale price index. These
indices may be related to a number of sectors like food, fuel, power, food
grains, etc. It also generates All India Consumer Price Index numbers for industrial workers, urban non-manual employees, and agricultural labourers.
• Reserve Bank of India- This provides information on banking, savings, and investment. The RBI also prepares currency and finance reports.
• Labour Bureau- It provides information on skilled, unskilled, and white-collar jobs, etc.
• National Sample Survey- This is done by the Ministry of Planning and it
provides social, economic, demographic, industrial and agricultural
statistics.
• Department of Economic Affairs- It conducts economic survey and it also
generates information on income, consumption, expenditure, investment,
savings and foreign trade.
• Non-Government Publications- These include publications of various industrial and trade associations, such as:
• The Indian Cotton Mill Association
• Various chambers of commerce
• The Bombay Stock Exchange (it publishes a directory containing financial accounts, key profitability and other relevant matter)
• Various Associations of Press Media
• Export Promotion Council
• Confederation of Indian Industries (CII)
• Small Industries Development Board of India
• Different mills, e.g., woolen mills, textile mills, etc.
The only disadvantage of the above sources is that the data may be biased, as they are likely to colour their negative points.
DATA COLLECTION
• The need to extract data from internal and external sources makes data collection a complicated task.
• Data collection is one of the first steps of the data life cycle.
• To collect the right data, one needs to know the effort involved in collecting it.
• Methods of data collection
Different methods are used depending upon the type of data.
There are two main types of data:
• Primary data
• Secondary data
PRIMARY DATA
• First-hand information collected by the surveyor.
• The data so collected are pure and original and collected for a specific
purpose.
• They have never undergone any statistical treatment before.
• The collected data may be published as well.
• The Census is an example of primary data.
METHODS OF PRIMARY DATA
COLLECTION
• Personal investigation: The surveyor collects the data himself/herself. The data so collected are reliable but suited only to small projects.
• Collection via investigators: Trained investigators are employed to contact the respondents to collect data.
• Questionnaires: Questionnaires may be used to ask specific questions that suit the study and get responses from the respondents. These questionnaires may be mailed as well.
• Telephonic investigation: The collection of data is done by asking questions over the telephone, to give quick and accurate information.
SECONDARY DATA
• Secondary data are the opposite of primary data.
• They have already been collected and published (by some organization, for instance).
• They can be used as a source of data by surveyors to collect data from and conduct analysis.
• Secondary data are impure in the sense that they have undergone statistical treatment at least once.
METHODS OF SECONDARY DATA COLLECTION
• Official publications, such as those of the Ministry of Finance, Statistical Departments of the government, Federal Bureaus, Agricultural Statistical Boards, etc. Semi-official sources include the State Bank, Boards of Economic Enquiry, etc.
• Data published by Chambers of Commerce and trade associations and boards.
• Articles in newspapers, journals, and technical publications.
DATA QUALITY
• The quality and integrity of data are critical.
• Decisions depend on data because the compiled data that make up information and knowledge are at the heart of any decision-making system.
• The system must include a data acquisition subsystem.
• Data quality is an extremely important issue because quality determines the usefulness of data as well as the quality of the decisions based on them.
IMPORTANCE AND BENEFITS OF
DATA QUALITY
• Good data management is crucial for keeping up with the competition and taking
advantage of opportunities. High-quality data can also provide various concrete benefits
for businesses.
• Some of the potential benefits of good data quality include:
 More Informed Decision-Making- Improved data quality leads to better decision-making across an organization. High-quality data gives decision makers confidence. Good data decreases risk and can result in consistent improvements in results.
 Better Customer Targeting- Data quality also leads to improved customer targeting. In the absence of high-quality data, marketers are forced to target a large number of customers, which is not efficient; in many cases, they have to guess about their target audience. With quality data, targeting can instead be done by collecting data about existing customers and then finding potential new customers with similar attributes. This knowledge can be used to target advertising campaigns and develop products or content that appeal to the right people.
• More Effective Content and Marketing Campaigns-
In addition to improving targeting, data quality can also help to improve the content and marketing campaigns themselves. The more a company knows about its audience, the more reliably it can create content or ads that appeal to them. For instance, a publisher of a sports website can gather data on which sports its users are most interested in. If cricket turns out to be one of the popular categories, the content team can be directed to create more cricket-related articles and videos. A similar technique can be applied to content used as part of a marketing campaign.
• Improved Relationships With Customers
Accurate data will improve relationships with customers, which is crucial for
success in any industry. Gathering data about customers helps you get to know
them better. This information about customers’ preferences, interests and
needs can be used to provide them with content that appeals to them and even
anticipates their needs. This can help you build strong relationships with them.
• Easy to use
High-quality data is also much easier to use than poor-quality data, and it increases a company's efficiency. If information is not complete or consistent, significant amounts of time have to be spent making that data usable. This takes time that could be used for other activities, and it takes longer to implement the insights.
• Competitive Advantage
High-quality data makes it much easier to gain a competitive advantage over competitors. It is one of the most valuable resources that today's companies have. With it, an organisation can better anticipate prospects' needs and, therefore, beat competitors to sales. A lack of good data means missed opportunities and falling behind the competition.
• Increased Profitability
High-quality data can lead to increased profitability. It can help to design
more effective marketing campaigns and increase sales. It also decreases
ad waste, making your marketing campaigns more cost-effective.
COMPONENTS OF DATA QUALITY
 Accuracy
- Accuracy refers to how well the data describes the real-world conditions it aims to describe. Information should be accurate.
- The quality of management decisions depends on the accuracy of information. If
the data is wrong, the decision will also go wrong.
- Inaccurate data creates clear problems, as it can cause you to come to incorrect
conclusions.
- The actions you take based on those conclusions might not have the effects you
expect because they’re based on inaccurate data. For example, data might lead
a marketer to believe that their customers are mostly females in their 20s. If that
data is inaccurate and their customers are actually primarily men in their 40s,
then they will end up targeting the wrong group with their advertisements.
Completeness
- The data should be complete in all respects. It should include all material facts which are necessary for decision making.
- If data is complete, there are no gaps in it.
- There should not be any gap in the data from what was supposed to be
collected and what was actually collected.
- If a customer skipped several questions on a survey, for example, the data they submitted would not be complete. If the data is incomplete, there may be the problem of arriving at inaccurate decisions from it. If someone skips some of the questions on a survey, it may make the rest of the information they provide less useful. For instance, if a respondent doesn't include their age, it will be harder to target content to people based on their age.
Consistency
- When comparing a data item or its counterpart across multiple data sets
or databases, it should be the same.
- This lack of difference between multiple versions of a single data item is
referred to as consistency.
- A data item should be consistent both in its content and its format.
- If the data isn’t consistent, different groups may be operating under
different assumptions about what is true. This can mean that the different
departments within the company will not be well coordinated and may
even unknowingly be working against one another.
Relevancy
- Relevancy is a decisive quality.
- Data gains value if it is relevant to the decision making context.
- Providing irrelevant data to the decision makers may create the problem
of information overload.
- Relevant information is what increases knowledge and reduces
uncertainty surrounding the problem under consideration.
- Even if the information collected has all other characteristics of quality
data, but not relevant to the purpose, it is not useful to a company.
Validity
- Validity refers to how the data is collected rather than the data itself.
- Data is valid if it is in the right format, of the correct type and falls within
the right range.
- If data does not meet these criteria, you might run into trouble organizing
and analyzing it.
 Timeliness
– Data should be available as and when it is needed.
- Appropriate time is one in which the recipient wants to initiate some actions
that are meant to achieve organizational objectives.
- Timeliness refers to how recently the event the data represents occurred.
Generally, data should be recorded as soon after the real world event as
possible.
- Data typically becomes less useful and less accurate as time goes on.
- Data that reflects events that happened more recently is more likely to reflect the current reality.
- Using outdated data can lead to inaccurate results and taking actions that don't reflect the current reality.
HOW TO COLLECT HIGH-
QUALITY DATA
• Implement a data collection plan
• Set data quality standards
• Create a plan for data correction
• Plan for data integration and distribution across departments
• Set goals for ongoing data collection
DATA INTEGRITY
• Data integrity refers to the reliability and trustworthiness of data throughout its
lifecycle.
• A major issue in data quality is data integrity.
• A change made to a file in one place may not be made to the file in another place, resulting in conflicting data.
• Data integrity can be ensured by addressing the following issues:
 Uniformity – data are within specified limits
 Version – version checks are performed
 Completeness check
 Conformity check
 Genealogy check or drill-down
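A minimal sketch of two of these checks (hypothetical records; pandas assumed), testing uniformity (values within specified limits) and completeness:

```python
import pandas as pd

# Hypothetical records to check for integrity issues.
df = pd.DataFrame({"age": [34, 41, -5, 29],
                   "income": [52000, None, 47000, 61000]})

# Uniformity check: flag values outside the specified limits.
print(df[(df["age"] < 0) | (df["age"] > 120)])

# Completeness check: count missing values in a required field.
print(df["income"].isna().sum(), "missing income value(s)")
```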
MISSING DATA OR INCOMPLETE
DATA
• Missing data (or missing values) is defined as the data value that is not stored
for a variable in the observation of interest.
• The problem of missing data is relatively common in almost all research and can
have a significant effect on the conclusions that can be drawn from the data.
• Missing data present various problems.
• First, the absence of data reduces statistical power, which refers to the
probability that the test will reject the null hypothesis when it is false.
• Second, the lost data can cause bias in the estimation of parameters.
• Third, it can reduce the representativeness of the samples.
• Fourth, it may complicate the analysis of the study. Each of these distortions
may threaten the validity of the trials and can lead to invalid conclusions.
TYPES OF MISSING DATA
• In general, there are three types of missing data according to the
mechanisms of missingness.
Missing completely at random(MCAR)
• Missing completely at random (MCAR) is defined as when the probability
that the data are missing is not related to either the specific value which
is supposed to be obtained or the set of observed responses.
• For example, a questionnaire might be lost in the post, or a blood sample might be damaged in the lab. MCAR is an ideal but unreasonable assumption for many studies. However, if data are missing by design, because of an equipment failure, or because the samples are lost in transit or technically unsatisfactory, such data are regarded as being MCAR.
TYPES
 Missing at random(MAR)
• Missing at random (MAR) is a more realistic assumption. Data are regarded to be MAR when the probability that the responses are missing depends on the set of observed responses, but is not related to the specific missing values which are expected to be obtained.
• As we tend to consider randomness as not producing bias, we may think that MAR does not
present a problem.
• However, MAR does not mean that the missing data can be ignored.
• For example, missing blood pressure measurements may be lower than measured blood
pressures but only because younger people may be more likely to have missing blood
pressure measurements.
• MAR means that the missingness has to do with the person but can be predicted from other information about the person; it is not specifically related to the missing information. For example, if a child does not attend an educational assessment because the child is (genuinely) ill, this might be predictable from other data we have about the child's health, but it would not be related to what we would have measured had the child not been ill.
TYPES
Missing not at random(MNAR)
• If the characteristics of the data do not meet those of MCAR or MAR, then they fall into the category of missing not at random (MNAR).
• The cases of MNAR data are problematic. The only way to obtain an
unbiased estimate of the parameters in such a case is to model the missing
data.
• The model may then be incorporated into a more complex one for estimating
the missing values.
• For example, people with high blood pressure may be more likely to miss
clinic appointments because they have headaches.
• The missingness is specifically related to what is missing, e.g., a person does not attend a drug test because the person took drugs the night before.
TECHNIQUES OF HANDLING
MISSING DATA
• The best possible method of handling missing data is to prevent the problem by planning the study well and collecting the data carefully.
• The following are suggested to minimize the amount of missing data
One technique of handling the missing data is to use the data analysis
methods which are robust to the problems caused by the missing data.
 An analysis method is considered robust to the missing data when there
is confidence that mild to moderate violations of the assumptions will
produce little to no bias or distortion in the conclusions drawn on the
population. However, it is not always possible to use such techniques.
Therefore, a number of alternative ways of handling the missing data has
been developed.
TECHNIQUES
Listwise deletion
• Listwise deletion (complete-case analysis) removes all data for an observation that has one or more missing values.
• The most common approach to the missing data is to simply omit those cases
with the missing data and analyze the remaining data. This approach is known as
the complete case (or available case) analysis or listwise deletion.
• Listwise deletion is the most frequently used method in handling missing data,
and thus has become the default option for analysis in most statistical software
packages.
• If the assumption of MCAR is satisfied, listwise deletion is known to produce unbiased estimates and conservative results. When the data do not fulfill the assumption of MCAR, however, listwise deletion may cause bias in the estimates of the parameters.
EXAMPLE
• Researchers using listwise deletion will remove a case completely if it is
missing a value for one of the variables included in the analysis.
• For example, say you are conducting analyses using cumulative high
school GPA, hours of study for first semester, SAT score, and first
semester grade in college algebra. Participant X is missing data for
cumulative high school GPA, therefore, Participant X will be completely
removed from the analyses because the participant does not have
complete data for all the variables.
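A minimal sketch of listwise deletion matching this example (hypothetical values; pandas assumed):

```python
import numpy as np
import pandas as pd

# Hypothetical data: Participant X is missing cumulative high school GPA.
df = pd.DataFrame({
    "hs_gpa": [3.4, np.nan, 3.9],
    "study_hrs": [12, 15, 9],
    "sat": [1200, 1350, 1100],
    "algebra": [82, 88, 75],
}, index=["W", "X", "Y"])

# Listwise deletion: drop every row that has at least one missing value.
print(df.dropna())   # Participant X is removed entirely
```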
Pairwise deletion
• Pairwise deletion analyses all cases in which the variables of interest are present, and thus maximizes all data available on an analysis-by-analysis basis.
• Pairwise deletion eliminates information only when the particular data-
point needed to test a particular assumption is missing.
• In this method, the maximum amount of available data is retained, and
so this method is sometimes referred to as available case analysis (Pigott,
2001). Cases are excluded from only operations in which data are missing
on a variable that is required (Bennett, 2001; Roth, 1994). In a correlation
matrix, for example, a case that was missing data on one variable would
not be used to calculate the correlation coefficient between that variable
and another but would be included in all other correlations.
EXAMPLE
• Pairwise deletion is a method in which data for a variable pertinent to a specific assessment are included, even if values for the same individual on other variables are missing. For example, consider a researcher studying the influence of
age, education level, and current salary on the socioeconomic status of a
sample of employees. If assessing specifically how education and salary
influence socioeconomic status, he or she could include all participants
for whom that data had been recorded even if they were missing
information on age, as the latter variable is not of interest in the current
analysis.
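A minimal sketch of pairwise deletion (hypothetical values; pandas assumed). Note that pandas computes each correlation from the cases available for that particular pair of variables:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one participant is missing 'age' only.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 33],
    "education": [16, 14, 18, 12],
    "salary": [50000, 42000, 65000, 39000],
})

# Pairwise deletion: the education-salary correlation still uses all four
# cases; only correlations involving 'age' drop the incomplete case.
print(df.corr())
```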
Mean substitution
• In a mean substitution, the mean value of a variable is used in place of the
missing data value for that same variable.
• This allows the researchers to utilize the collected data in an incomplete
dataset.
• The theoretical background of the mean substitution is that the mean is a
reasonable estimate for a randomly selected observation from a normal
distribution.
• However, with missing values that are not strictly random, especially in the
presence of a great inequality in the number of missing values for the
different variables, the mean substitution method may lead to inconsistent
bias.
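A minimal sketch of mean substitution (hypothetical values; pandas assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [52000, np.nan, 47000, 61000, np.nan]})

# Mean substitution: replace each missing value with the variable's mean.
df["income"] = df["income"].fillna(df["income"].mean())
print(df)
```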
Regression imputation
• Imputation is the process of replacing the missing data with estimated
values.
• Instead of deleting any case that has any missing value, this approach
preserves all cases by replacing the missing data with a probable value
estimated by other available information.
• After all missing values have been replaced by this approach, the data set
is analyzed using the standard techniques for a complete data.
• In regression imputation, the existing variables are used to make a prediction, and then the predicted value is substituted as if it were an actually obtained value.
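A minimal sketch of regression imputation (hypothetical values; pandas and scikit-learn assumed), fitting a regression on the complete cases and substituting predicted values for the missing ones:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: 'weight' has missing values; 'height' is complete.
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190, 175],
    "weight": [52, 60, np.nan, 78, np.nan, 72],
})

# Fit a regression on the complete cases...
known = df.dropna()
model = LinearRegression().fit(known[["height"]], known["weight"])

# ...then substitute the predicted values as if they were observed.
missing = df["weight"].isna()
df.loc[missing, "weight"] = model.predict(df.loc[missing, ["height"]])
print(df)
```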
Last observation carried forward
• Many studies are performed with the longitudinal or time-series approach, in which the
subjects are repeatedly measured over a series of time-points
• One of the most widely used imputation methods in such a case is the last observation
carried forward (LOCF).
• This method replaces every missing value with the last observed value from the same
subject. Whenever a value is missing, it is replaced with the last observed value
• For example, If a person drops out of a study before it ends, then his or her last observed
score on the dependent variable is used for all subsequent (i.e., missing) observation
points.
• As an example, assume that there are 8 weekly assessments after the baseline observation. If a patient drops out of the study after the third week, then this value is "carried forward" and assumed to be his or her score for the 5 missing data points. The assumption is that the patients improve gradually from the start of the study until the end, so that carrying forward an intermediate value is a conservative estimate.
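A minimal LOCF sketch (hypothetical weekly scores; pandas assumed):

```python
import numpy as np
import pandas as pd

# Hypothetical weekly scores; the patient drops out after week 3.
scores = pd.Series([10, 12, 14, np.nan, np.nan, np.nan],
                   index=["wk1", "wk2", "wk3", "wk4", "wk5", "wk6"])

# LOCF: carry the last observed value forward over the missing weeks.
print(scores.ffill())
```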
Maximum likelihood
• There are a number of strategies using the maximum likelihood method
to handle the missing data. In these, the assumption that the observed
data are a sample drawn from a multivariate normal distribution is
relatively easy to understand.
• After the parameters are estimated using the available data, the missing
data are estimated based on the parameters which have just been
estimated.
• When there are missing but relatively complete data, the statistics
explaining the relationships among the variables may be computed using
the maximum likelihood method. That is, the missing data may be
estimated by using the conditional distribution of the other variables.
• The method of maximum likelihood corresponds to many well-known
estimation methods in statistics. For example, suppose you are interested
in the heights of Americans. You have a sample of some number of
Americans, but not the entire population, and record their heights.
Further, you are willing to assume that heights are normally distributed
with some unknown mean and variance. The sample mean is then the
maximum likelihood estimator of the population mean, and the sample
variance is a close approximation to the maximum likelihood estimator of
the population variance
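A minimal numeric sketch of this heights example (hypothetical sample; NumPy assumed), where the sample mean and the biased sample variance are the maximum likelihood estimates under a normal model:

```python
import numpy as np

# Hypothetical sample of heights (cm), assumed normally distributed.
heights = np.array([162.0, 175.5, 158.2, 180.1, 170.0, 165.3, 172.8])

mu_hat = heights.mean()        # sample mean = MLE of the population mean
var_hat = heights.var(ddof=0)  # ddof=0 gives the MLE of the variance
print(mu_hat, var_hat)
```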
Expectation-Maximization
• Expectation-Maximization (EM) is a type of the maximum likelihood method that can be
used to create a new data set, in which all missing values are imputed with values
estimated by the maximum likelihood methods.
• This approach begins with the expectation step, during which the parameters (e.g.,
variances, covariances, and means) are estimated, perhaps using the listwise deletion.
Those estimates are then used to create a regression equation to predict the missing
data.
• The maximization step uses those equations to fill in the missing data. The expectation
step is then repeated with the new parameters, where the new regression equations are
determined to "fill in" the missing data.
• The expectation and maximization steps are repeated until the system stabilizes, when
the covariance matrix for the subsequent iteration is virtually the same as that for the
preceding iteration.
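The loop below is a deliberately simplified sketch of the EM idea for one missing variable: estimate parameters from the completed data, use a regression equation to fill in the missing values, and repeat until the covariance matrix virtually stops changing. It omits the residual-variance correction a full EM implementation would apply to the sufficient statistics, and the simulated data are invented.

```python
# Simplified EM-style imputation for a variable y with missing entries.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 200)
y = 2 * x + rng.normal(0, 1, 200)
y[:50] = np.nan                            # make the first 50 y values missing

miss = np.isnan(y)
y_work = np.where(miss, np.nanmean(y), y)  # crude starting values

prev_cov = None
for _ in range(100):
    # (Maximization) Re-estimate means and covariances from completed data.
    cov = np.cov(x, y_work)
    beta = cov[0, 1] / cov[0, 0]
    alpha = y_work.mean() - beta * x.mean()
    # (Expectation) Replace missing y by its predicted value given x.
    y_work[miss] = alpha + beta * x[miss]
    # Stop when the covariance matrix is virtually the same as before.
    if prev_cov is not None and np.allclose(cov, prev_cov, atol=1e-8):
        break
    prev_cov = cov
print(np.cov(x, y_work))
```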
Multiple imputation
• Multiple imputation is another useful strategy for handling the missing data. Multiple
imputation involves filling in the missing values multiple times, creating multiple “complete”
datasets.
• In multiple imputation, instead of substituting a single value for each missing value, the
missing values are replaced with a set of plausible values which contain the natural variability
and uncertainty of the right values.
• This approach begins with a prediction of the missing data using the existing data from other
variables.
• The missing values are then replaced with the predicted values, and a full data set called the
imputed data set is created. This process is repeated several times, producing multiple imputed
data sets (hence the term "multiple imputation").
• Each multiple imputed data set produced is then analyzed using the standard statistical
analysis procedures for complete data, and gives multiple analysis results. Subsequently, by
combining these analysis results, a single overall analysis result is produced.
• The first step in MI is to create several imputed data sets. Three to five
imputations are usually adequate. Then, the analyses are carried out on
each data set, with the parameter estimates (e.g., factor loadings, group
mean differences, correlations, regression coefficients) and their standard
errors saved for each data set. Final results are obtained by averaging the
parameter estimates across these multiple analyses, which yields unbiased
parameter estimates.
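A hedged sketch of the multiple-imputation workflow for a single parameter (a mean), pooling the results with Rubin's rules. The data, the number of imputations, and the noisy-draw imputation model are all invented for illustration; real software uses richer imputation models.

```python
# Multiple imputation of a mean, pooled with Rubin's rules.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(50, 10, 100)
data[:20] = np.nan                        # 20 missing values
miss = np.isnan(data)
m = 5                                     # three to five imputations are usual

estimates, variances = [], []
for _ in range(m):
    d = data.copy()
    # Draw plausible values with random noise so each imputed data set
    # reflects the natural variability, unlike single mean substitution.
    d[miss] = rng.normal(np.nanmean(data), np.nanstd(data), miss.sum())
    estimates.append(d.mean())
    variances.append(d.var(ddof=1) / len(d))  # squared standard error

Q = np.mean(estimates)                    # pooled point estimate
W = np.mean(variances)                    # within-imputation variance
B = np.var(estimates, ddof=1)             # between-imputation variance
T = W + (1 + 1 / m) * B                   # total variance (Rubin's rules)
print(Q, T)
```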
Sensitivity analysis
• Sensitivity analysis is defined as the study of how the uncertainty in the
output of a model can be apportioned to the different sources of
uncertainty in its inputs.
• When analyzing the missing data, additional assumptions about the reasons
for the missing data are made, and these assumptions are often
applicable to the primary analysis.
• However, these assumptions cannot be definitively validated, so a
sensitivity analysis examines how robust the results are to alternative
assumptions.
SOCIAL NETWORK ANALYSIS (SNA)
• Social network refers to the expression of a social relationship achieved
among individuals, families, villages, communities and so on.
• Social network analysis (SNA) means analyzing various characteristics of
the pattern of distribution of relational ties as mentioned above and
drawing inferences about the network as a whole or about those
belonging to it considered individually or in groups.
• The aim of social network analysis is to understand a community by
mapping the relationships that connect them as a network, and then
trying to draw out key individuals, groups within the network
(‘components’), and/or associations between the individuals.
SNA
• Social network analysis helps in discovering and uncovering the patterns of
interaction between people.
• Social Network Analysis (SNA) emerged from the Social Sciences branch as a
very useful method for studying why and how social groups operate, behave,
and interact in certain ways.
• A Social Network is composed of individuals or organizations connected by
kinship, friendship, beliefs, common interests, financial exchange, and
many other ties. These social relationships are considered as a graph, where
the entities form the nodes and the edges are the connections between them.
• Social networks operate at many levels, from within the families to nation-
wide and also across nations and play a crucial role in understanding the
behaviour of entities within the network.
BASIC TERMINOLOGY IN SNA
• Centrality: A highly centralized network is dominated by one individual
who controls the information and knowledge flow and may become a single
point of communication failure. A less centralized network has no such hub,
so people can still pass on information even if some channels are
blocked.
• Betweenness: The betweenness of a node measures the number of paths that
pass through it. This identifies the nodes which have the ability to
control the flow of information between different parts of the network;
these can be called gateway nodes. Gateway nodes channel information to
most of the others in the network if many paths run through them. Even if
only a few paths run through them, they still play a powerful
communication role if they sit between different clusters of the network.
• Closeness: Closeness measures the extent to which an individual is near
to all other individuals in a network, either directly or indirectly. It
reflects the ability to access information through the network members.
• Degree: The degree of a node is the number of links it has to other
individuals in the network. The higher the degree of a node, the more
influential it is within the network. (A short NetworkX sketch of these
measures follows below.)
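All four measures can be computed with the NetworkX library. The tiny friendship graph below is an invented example; node D acts as a gateway between E and the A-B-C cluster.

```python
# Computing the basic SNA measures on a toy graph with NetworkX.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("C", "D"), ("D", "E")])

print(nx.degree_centrality(G))       # number of links per node (normalized)
print(nx.betweenness_centrality(G))  # how many shortest paths pass through
print(nx.closeness_centrality(G))    # how near each node is to all others
```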
THE NEED FOR SOCIAL NETWORK ANALYSIS
• Users are dealing with ever-growing data sets, and capabilities that can
help filter network information faster and more efficiently are the need
of the hour.
• Users need to quickly identify the crucial individuals/groups for better
optimization of limited resources, because target networks are dynamic.
• To identify the characteristics of networks (even those that are not
readily apparent) and to analyze how those networks change over time.
GENERAL APPLICATIONS OF
SOCIAL NETWORK ANALYSIS
• Social Network Analysis is being used in a number of fields. Examining people’s
acquaintances and affiliations can be valuable in uncovering various patterns
and anti-patterns, including credit card misuse or theft, false insurance claims,
abuse of health care, insider trading etc. General applications of SNA are:
 For improved customer targeting and potential promotions based on
customers' past purchase history.
 In identifying loyal customers who are vocal, active and passionate and can
be characterized as brand ambassadors.
 In combating terrorist activities by characterizing the network organizations
to determine the likelihood and impact of terrorist activity.
 In detecting health care fraud by detecting patterns, establishing linkage
between individuals, and uncovering non-obvious relationships.
BUSINESS APPLICATIONS OF
SOCIAL NETWORK ANALYSIS
• Analysis for creating usable customer intelligence
 Organizations generally want to use the social media data to understand the
behaviour and needs of their customers or specific targeted groups of people
with respect to the organization’s current or future services or products.
Three major approaches are adopted to observe the social media – tools for
channel reporting, overview score-card systems, and predictive analytic
techniques.
The combination of network mining and text mining reveals new assorted
insights into customer behaviour in social media. A good understanding of
the segments of social media can make a valuable contribution to the
decision about how to shape and invest in the organization’s social media
and marketing plans.
• Social Network Analysis in organizational change
Social Network Analysis helps to reveal strategically important networks
that can’t be found on the formal organizational charts.
 It supports discovering the underlying informal structures that exist in
the organizations.
SNA used within an organization can also be termed as Organizational
Network Analysis. In this approach, the focal point is identifying crucial
networks within the organization’s boundary, understanding the structure
of personal and group relationships within the networks, and using that
knowledge to improve the business performance. Good managers
understand the role played by these networks and how to use them to
maximum advantage.
In order to strengthen the business perspective, business use cases are
grouped into three broad categories as follows:
 Managing human resources in large enterprises by SNA to map and
measure otherwise invisible relationships between people to study how
responsibility, influence and power are disseminated across large groups
of people.
 Business Process Management through SNA, which can give businesses
insight into how their best employees operate.
 Strategic restructuring by using appropriate SNA metrics and tools at
various levels such as the individual, organization and industry. They are
useful for restructuring programs and organizational change.
• Social Network Analysis for understanding health behaviour
SNA provides a perspective of how a society functions.
Social networks have a tremendous influence on the health behaviour of
individuals.
The results from social network analysis can be used by the Government
for designing health plans and benefits, and for taking preventive
measures during disease outbreaks.
Pharmaceutical companies can target demographic groups and specific
markets. Health insurance companies can design their insurance plans in
a better way.
BIG DATA ANALYSIS
• Big Data is often described as extremely large data sets that have grown beyond
the ability to manage and analyze them with traditional data processing tools.
• Big data refers to the large, diverse sets of information that grow at ever-
increasing rates.
• It encompasses the volume of information, the velocity or speed at which it is
created and collected, and the variety or scope of the data points being covered.
• Big data often comes from multiple sources and arrives in multiple formats.
• Definition by Frank Ohlhorst: Big Data defines a situation in which data sets have
grown to such enormous sizes that conventional information technologies can no
longer effectively handle either the size of the data set or the scale and growth of
the data set. In other words, the data set has grown so large that it is difficult to
manage and even harder to garner value out of it. The primary difficulties are the
acquisition, storage, searching, sharing, analytics, and visualization of data
DIMENSIONS OF BIG DATA
• Big Data is multidimensional; four dimensions, the "4 V's", relate to its
primary aspects:
• Volume. Refers to the quantity of data. Big Data comes in one size: large, massive
data sets with measures such as petabytes and zettabytes.
• Variety. Diversified sources and types of data. Big Data extends beyond structured
data to include unstructured data of all varieties: text, audio, video, click streams,
log files, and more.
• Veracity. The massive amounts of data collected for Big Data purposes can lead to
statistical errors and misinterpretation of the collected information. Purity of the
information is critical for value.
• Velocity. Big data velocity deals with the accelerating speed at which data flows in
from sources such as business processes, machines, networks, and human
interaction with things like social media sites and mobile devices. The flow of
data is massive and continuous.
TYPES OF BIG DATA
• Big Data can be found in three forms: Structured, Unstructured and
Semi-structured.
Structured
• Any data that can be stored, accessed and processed in the form of a fixed
format is termed 'structured' data.
• Computer science has achieved great success in developing techniques
for working with such data and deriving value out of it.
• However, nowadays, issues arise when the size of such data
grows to a huge extent.
• Ex- an ‘Employee’ table in a database
 Unstructured
• Any data with unknown form or the structure is classified as unstructured data.
• In addition to the size being huge, un-structured data poses multiple challenges in
terms of its processing for deriving value out of it.
• A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos etc.
• Nowadays organizations have a wealth of data available with them but, unfortunately,
they don't know how to derive value out of it, since this data is in its raw form or
unstructured format.
 Semi-structured
• Semi-structured data can contain both forms of data.
• We can see semi-structured data as structured in form, but it is actually not defined
with a fixed schema such as a table definition in a relational DBMS; an XML or JSON
file is a typical example.
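The invented employee record below illustrates the contrast: it is structured in form (named fields and values) but, unlike a row in an 'Employee' table, it has no fixed schema, since fields can nest and vary in length from record to record.

```python
# Semi-structured data: JSON has structure but no fixed relational schema.
import json

employee = {
    "id": 101,
    "name": "Asha",
    "skills": ["SQL", "Python"],      # variable-length, no fixed column
    "address": {"city": "Kochi"},     # nested value, unlike a flat table row
}
print(json.dumps(employee, indent=2))
```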
TECHNOLOGIES AND CONCEPTS
OF BIG DATA
• The complexity of big data does not end with just four dimensions. It includes the
following concepts:
 Business intelligence
- Business intelligence is a set of processes, architectures and technologies that convert
raw data into meaningful information that drives profitable business actions.
- This consists of a broad category of applications and technologies for gathering,
storing, analysing and providing access to data. It delivers actionable information,
which helps enterprise users make better business decisions using fact-based support
systems.
 Data mining- data mining is the process of finding anomalies, patterns and
correlations within large data sets to predict outcomes. It is defined as a process used
to extract usable data from a larger set of any raw data
Statistical applications- these look at data using algorithms based on statistical
principles and normally concentrate on data sets related to polls, censuses and
other static data sets. Statistical applications ideally deliver sample
observations that can be used to study population data sets for the purpose of
estimating, testing and predictive analysis.
Predictive analysis- predictive analysis is the use of data, statistical
algorithms and machine learning techniques to identify the likelihood of future
outcomes based on historical data. With the help of sophisticated predictive
analytics tools and models, any organisation can now use past and current data
to reliably forecast trends and behaviours. (A brief sketch follows after this list.)
Data modelling- data modelling is a process through which data is stored
structurally, in a defined format, in a database. It is important because it
enables organisations to make data-driven decisions and meet varied business goals.
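As a brief sketch of predictive analysis in practice, the snippet below trains a logistic regression to estimate the likelihood of a future outcome (say, customer churn) from historical data. The data set and the feature meanings are invented assumptions, not from the text.

```python
# Predictive analysis: estimating the likelihood of churn with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))            # e.g. tenure, spend, complaints
y = (X[:, 2] + rng.normal(size=500) > 0).astype(int)  # churned or not

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Probability of the future outcome for unseen customers.
print(model.predict_proba(X_test[:5])[:, 1])
print("accuracy:", model.score(X_test, y_test))
```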
HOW BIG DATA WORKS
• Involves three key actions
Integrate
• Big Data is always collected from many sources and as there
are enormous loads of information, new strategies and technologies to
handle it need to be discovered.
• It is a challenge to integrate such a volume of information into the
system.
• You have to receive the data, process it, and format it in the right form
that your business needs and that your customers can understand.
Manage
• Big data needs a place to be stored.
• Storage solution can be in the cloud, on-premises, or both.
• Company can also choose in what form the data will be stored, and bring
desired processing requirements and necessary engines to those datasets
on a demand basis.
• This is why more and more people are choosing a cloud solution for
storage because it supports the current compute requirements.
Analyse
• Once the data is received and stored, it needs to be analyzed so it can be used.
• Explore the data and use it to make important decisions, such as learning
which features customers research the most, or use it to share research.
• Big data is analyzed using software specifically designed to handle
large, complex data sets.
• Many software-as-a-service (SaaS) offerings specialize in managing such data.
ADVANTAGES OF BIG DATA
• Improved business process
• Businesses can utilize outside intelligence while making decisions
• Improved customer service
• Fraud detection
• Early identification of risk to the product/service
DISADVANTAGES
• Traditional storage can cost a lot of money to store big data
• Lots of big data is unstructured
• Big data analysis violates principles of privacy
• It can be used for manipulation of customer records
• It may increase social stratification
• Big data analysis is not useful in the short run; it needs to be analysed
over a longer duration
• Big data analysis results are misleading sometimes
• Rapid updates in big data can cause mismatches with real figures
DATA SCIENTIST
• A data scientist is normally associated with an employee or business
intelligence consultant who excels at analyzing data, particularly large
amounts of data, to help a business gain a competitive edge.
• The title data scientist is sometimes criticized because it lacks specificity
and clarity.
• A data scientist must possess a combination of analytic, machine
learning, data mining and statistical skills as well as experience with
algorithms and coding.
• Most critical skill is the ability to translate the significance of data in a
way that can be easily understood by others.
THE ROLE OF A DATA SCIENTIST IN BUSINESS AND SOCIETY
• Data scientists help companies interpret and manage data and solve complex
problems using expertise in a variety of data niches.
• They generally have a foundation in computer science, modeling, statistics,
analytics, and math - coupled with a strong business sense. The following
explains the role:
Empowering Management to Make Better Decisions
- A trusted advisor and strategic partner to the organization’s upper
management by ensuring that the staff maximizes their analytics capabilities.
- A data scientist communicates and demonstrates the value of the institution’s
data to facilitate improved decision-making processes across the entire
organization, through measuring, tracking, and recording performance
metrics and other information.
 Directing Actions Based on Trends - Examines and explores the organization’s data,
after which they recommend and prescribe certain actions that will help improve the
institution’s performance, better engage customers, and ultimately increase
profitability.
 Challenging the Staff to Adopt Best Practices- One of the responsibilities of a data
scientist is to ensure that the staff is familiar and well-versed with the organization’s
analytics product. They prepare the staff for success with the demonstration of the
effective use of the system to extract insights and drive action. Once the staff
understands the product capabilities, their focus can shift to addressing key business
challenges.
 Identifying Opportunities- During their interaction with the organization’s current
analytics system, data scientists question the existing processes and assumptions for
the purpose of developing additional methods and analytical algorithms. Their job
requires them to continuously and constantly improve the value that is derived from
the organization’s data.
Decision Making with Quantifiable Data- With the arrival of data
scientists, there is no need to take high risks when gathering and
analyzing data from various channels. Data scientists create models using
existing data that simulate a variety of potential actions. Thus, an
organization can learn which path will bring the best business outcomes.
 Testing the Decisions- Making certain decisions and implementing those
changes is only half the battle; it is crucial to know how those decisions
have affected the organization. This is where a data scientist comes in:
they measure the key metrics that are related to important changes and
quantify their success.
Identification and Refining of Target Audiences- The importance of the data
scientist rests on the ability to take existing data that is not
necessarily useful on its own and combine it with other data points to
generate insights an organization can use to learn more about its
customers and audience. With this in-depth knowledge, organizations
can tailor services and products to customer groups and help
increase profits.
Recruiting the Right Talent for the Organization- The process of recruiting
has changed due to big data. In addition to resumes, a vast amount of
information is available on talent through social media, corporate
databases, and job search websites. Data science specialists can work
their way through all these data points to find the candidates who best fit
the organization’s needs.
ROLE OF AI AND INTELLIGENT
AGENTS IN E-BUSINESS
• Artificial Intelligence is now a common concept.
• In fact, AI is embedding itself in each and every aspect of our life. Now
when you visit a mall you see several self-checkout cash counters, and
even airports are now being equipped with state-of-the-art advanced
security check systems.
• Some examples of AI in E-Commerce
• Chatbots- E-commerce sites now offer 24/7 assistance because of
chatbots. Earlier these chatbots just offered customary replies; now they
have become intelligent assistants which understand the issues that
visitors have to deal with.
• CRM- Previously human resource was the base of Customer Relationship
Management (CRM). Huge efforts were made to gather and evaluate data
so as to offer superb services to the clients. CRM pinpoints buying trends
so that your actions are directed in the right way. With advanced CRM
solutions, predictions are made with enhanced accuracy, thus the sales
team can concentrate on building long-lasting associations with
customers.
• Internet of Things- IoT does exactly what it intends to do: offering
connectivity in all elements of our life. From syncing different devices to
programming your washing machine, lights, household appliances and even
your car, IoT is simply everywhere.
BENEFITS OF AI IN E-BUSINESS
• Sales Forecasting- AI analyzes big volumes of user data and on that basis offers
useful insights about consumer buying patterns.
• Superior services at affordable costs- It automates routine processes thus offering
personalized marketing options.
• Enhance customer satisfaction and promote sales- Besides offering superb
customer service, AI also helps in conversational commerce: basically a real-time,
human-like interaction between a client and a messenger, chatbot, or voice-chat.
• Personalized content- AI technology offers any business a huge competitive
edge regardless of the size of its operation. It allows users to pick any item in any
picture online, and then ask the system to come up with similar things by making
use of the image.
• AI in marketing- even small business firms with modest budgets can benefit from
AI solutions meant for marketing.
• Customer service- AI enables businesses to deliver excellent customer service with perfection.
INTELLIGENT AGENTS
• An intelligent agent is a software program that supports a user with the
accomplishment of some task or activity by collecting information
automatically over the internet and communicating data with other
agents depending on the algorithm of the program.
• Business establishments want to improve sales by pulling the right
shoppers to their websites. Consumers are generally interested in
comparing various products and services to find the best product at best
price. These have led to the emergence of variety of intelligent agents
working for buyers and sellers of products and services over the web.
• Intelligent agents play an important role in e-marketing and in enhancing
the eCommerce aspect of an organization. They enable both buyers and
sellers in eCommerce transactions by providing efficient, precise, and
comprehensive searches on the web and in information repositories.
Intelligent agents are very interactive and can perform multiple tasks at a
variety of locations. They are more powerful and versatile than traditional
search engines, spiders, and web crawlers.
• Features of Intelligent Agents
Mobility: Intelligent agents engaged in eCommerce travel from computer
to computer, across different system architectures and platforms, and
gather information until the search parameters are exhausted.
Goal oriented: An intelligent agent has the ability to accept the user's
statement of goals and carry out the task delegated to it. It can move
around from one machine to another and can act proactively in response
to their environment, and can exhibit goal-directed behavior by taking the
initiative.
 Independent: Intelligent agents function on their own without human
intervention and must have the ability to make decisions and to initiate
action without direct human supervision. They communicate
independently with repositories of information and other agents and
accomplish the objectives and tasks on behalf of the user.
Intelligent: Intelligent agents are able to crawl for data more intelligently.
They have the intelligence to reason things out based on the existing
knowledge of their user and environment, and on past experiences.
Intelligent agents follow preset rules to evaluate conditions in the
external environment.
Reduces net traffic: Agents can communicate and cooperate with other
agents quickly. This enables them to perform tasks, such as information
searches quickly and efficiently and reduces network traffic.
Multiple tasks: An intelligent agent can perform multiple tasks
simultaneously. It relieves the human user from monotonous clerical
work. (A toy sketch of such an agent follows below.)
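To make these features concrete, here is a toy comparison-shopping agent. The merchant URLs and the JSON response field are hypothetical placeholders; a real agent would add authentication, retries, result ranking, and possibly negotiation logic.

```python
# A toy price-comparison agent: queries hypothetical merchant endpoints,
# skips failures on its own, and reports the best offer it found.
import requests

MERCHANTS = {                            # hypothetical price APIs
    "shopA": "https://example.com/a/price",
    "shopB": "https://example.com/b/price",
}

def best_offer(product_id: str):
    offers = {}
    for name, url in MERCHANTS.items():
        try:
            resp = requests.get(url, params={"id": product_id}, timeout=5)
            offers[name] = resp.json()["price"]   # assumed response field
        except (requests.RequestException, KeyError, ValueError):
            continue                     # act independently: skip bad sources
    return min(offers.items(), key=lambda kv: kv[1]) if offers else None

print(best_offer("sku-123"))
```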
INTELLIGENT AGENTS IN E-COMMERCE
• Artificial intelligence (AI) continues to play a significant role in many
leading information systems.
• AI's contribution is now crucial in systems such as workflow, data mining,
production scheduling, supply chain logistics, and most recently,
ecommerce.
• It is useful to explore the roles of agents as mediators in electronic
commerce in the context of a common framework.
Identification: This stage characterizes the buyer becoming aware of
some unmet need, often stimulated by product information. Agents can
play an important role for those purchases that are repetitive (supplies) or
predictable (habits).
Brokering: two types- (a) Product brokering: once a buyer has identified a
need to make a purchase, the buyer has to determine what to buy
through a critical evaluation of retrieved product information. There are
several agent systems that lower consumers’ search cost when deciding
which products best meet their needs. (b) Merchant brokering: this stage
combines the consideration set from the previous stage with merchant-
specific alternatives to help determine who to buy from.
Negotiation: in this stage, price and other terms of the transaction are
settled on. Real-world negotiation increases transaction costs that may be
too high for either consumers or merchants. The majority of business-to-
business transactions involve negotiation.
Payment and Delivery: this stage can either signal the termination of the
negotiation stage or occur sometime afterwards. In some cases, the
available payment or delivery options can influence product and
merchant brokering.
Product Service and Evaluation: this post-purchase stage involves product
service, customer service, and an evaluation of the satisfaction of the
overall buying experience and decision.
ETHICAL AND LEGAL CONSIDERATIONS IN BUSINESS ANALYTICS
• Comply with legal requirements
• Cultural and social norms
• Interest of stake holders
• Accountability
• Data protection
• Due care
• Confidentiality