Introduction To Data Analysis
According to Wikipedia, data analysis is a process of inspecting,
cleansing, transforming, and modeling data with the goal of discovering
useful information, informing conclusions, and supporting decision-making.
Data analysis vs Data analytics
For example, before purchasing a product from an e-commerce site, we often
look at previous customers’ ratings and feedback on that product. Then,
after analyzing the product rating, feedback, and other factors, we decide
whether to buy it.
We review and analyze the available data related to our interest and then
decide. This is nothing but simple data analysis.
Why data analysis
Data analysis helps organizations make better decisions and make their
business more profitable.
So, we can say that data analysis is the backbone of any data-driven
business decision.
Descriptive data analysis
• Descriptive analysis is one of the most common and primary forms of data
analysis. It helps answer “what is happening?” or “what has happened?” in a
business. Usually, we use descriptive data analysis to track Key Performance
Indicators (KPIs), sales profit/loss, etc.
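As a minimal sketch of descriptive analysis, the snippet below summarizes what has already happened in a small set of monthly figures. The numbers and the names `monthly_sales` and `monthly_costs` are made up for illustration and are not taken from any real dataset.

```python
from statistics import mean

# Hypothetical monthly figures (illustrative values only).
monthly_sales = {"Jan": 1200, "Feb": 950, "Mar": 1430, "Apr": 1100}
monthly_costs = {"Jan": 1000, "Feb": 1000, "Mar": 1100, "Apr": 1050}

# Simple KPIs that describe the past: totals, averages, best month.
total_sales = sum(monthly_sales.values())
average_sales = mean(monthly_sales.values())
best_month = max(monthly_sales, key=monthly_sales.get)

# Profit or loss per month, and the months that ended in a loss.
profit_by_month = {m: monthly_sales[m] - monthly_costs[m] for m in monthly_sales}
loss_months = [m for m, profit in profit_by_month.items() if profit < 0]

print("Total sales:", total_sales)
print("Average monthly sales:", average_sales)
print("Best month:", best_month)
print("Months with a loss:", loss_months)
```

Nothing here predicts or recommends anything. It only describes the recorded data, which is the defining trait of descriptive analysis.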
Predictive analysis
• Predictive data analysis forecasts what is likely to happen by analyzing
historical data, which helps us understand “what will probably happen in the
future?” For example, by analyzing past years’ sales reports, it is possible
to predict the coming year’s sales, but this task is not easy. It requires
advanced data analysis and machine learning (ML).
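The sales-forecast idea can be sketched with a straight-line trend fitted by least squares. This is a deliberately simple stand-in, not a full machine learning workflow. The yearly figures are hypothetical, and a linear trend is an assumption that real sales data may not satisfy.

```python
import numpy as np

# Hypothetical yearly sales (illustrative values only).
years = np.array([2019, 2020, 2021, 2022, 2023])
sales = np.array([100.0, 112.0, 119.0, 133.0, 141.0])

# Use "years since the first year" as x to keep the fit numerically stable.
x = years - years[0]

# Fit sales = slope * x + intercept with a degree-1 polynomial.
slope, intercept = np.polyfit(x, sales, deg=1)

# Extrapolate one year past the last observation.
next_year = years[-1] + 1
forecast = slope * (next_year - years[0]) + intercept

print(f"Estimated yearly growth: {slope:.1f}")
print(f"Forecast for {next_year}: {forecast:.1f}")
```

A real forecast would also need to account for seasonality, uncertainty, and validation against held-out data, which is why the text notes that the task is not easy.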
Prescriptive analysis
• Prescriptive data analysis uses the outcomes of all the other data analyses.
It helps answer “what action should be taken?” to counter a current or
predicted problem. So, it prescribes the action(s) as a solution to a specific
situation. It needs advanced machine learning and real-time artificial
intelligence.
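To make the idea of "prescribing an action" concrete, here is a toy rule-based sketch. It is far simpler than the machine learning systems mentioned above. The function `recommend_action`, its parameters, and the stock scenario are all hypothetical and chosen only for illustration.

```python
def recommend_action(forecast_units: int, stock_units: int,
                     safety_margin_units: int = 50) -> str:
    """Turn a demand forecast into a suggested action (toy rule, not a real model)."""
    # Units we want on hand: predicted demand plus a fixed safety margin.
    required_units = forecast_units + safety_margin_units
    if stock_units < required_units:
        shortfall = required_units - stock_units
        return f"reorder {shortfall} units"
    return "no action needed"


# A predicted demand of 500 units against two different stock levels.
print(recommend_action(forecast_units=500, stock_units=400))
print(recommend_action(forecast_units=500, stock_units=600))
```

The input is the outcome of a predictive step (the forecast), and the output is an action, which mirrors how prescriptive analysis builds on the other types.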
PROCESS FLOW OF DATA ANALYSIS
Types of data
Structured data
• Structured data has a fixed, predefined, and consistent structure. This type
of data is the most effective for analysis. For example, relational data is
organized in rows and columns.
Semi-structured data
• Semi-structured data has a partially defined structure. Though it is not
fully relational, its structure is manageable to understand and process. For
example, CSV data, JSON data, XML data, and so on.
Unstructured data
• Unstructured data has no predefined structure. It is more complex to process
and store, and we need advanced capacity, tools, and methods to analyze and
process it. For example, PDF, image, text log, audio/video data, and so on.
Tools for data analysis in Python
• Python has many libraries for data analysis, data visualization, and data
modeling, such as IPython, Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn,
NLTK, Keras, TensorFlow, and so on. Not all of them are in scope for this
book, but some are common and widely used by the data science and data
analytics community.
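Before turning to the individual libraries, the following sketch shows one way semi-structured data can be brought into a row-and-column shape using only Python's standard library. The JSON records and field names are invented for illustration. Note how each record may carry different optional fields, which is what makes the data semi-structured.

```python
import json

# Hypothetical semi-structured records: fields vary from record to record.
raw = """[
  {"id": 1, "name": "Asha", "tags": ["new", "mobile"]},
  {"id": 2, "name": "Ben", "address": {"city": "Pune"}}
]"""

records = json.loads(raw)

# Flatten every record into the same fixed set of columns,
# filling in defaults where an optional field is missing.
rows = [
    {
        "id": record["id"],
        "name": record["name"],
        "city": record.get("address", {}).get("city"),
        "tag_count": len(record.get("tags", [])),
    }
    for record in records
]

for row in rows:
    print(row)
```

Once every row has the same columns, the data behaves like structured data and is much easier to analyze.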
IPython
• IPython is an interactive shell for Python and the Python kernel behind
web-based Jupyter notebooks. It is mainly used to write, test, and execute
Python programs to analyze and visualize data.
Pandas
• Pandas is a popular data analysis and data exploration library that
provides a structured representation of data. It makes data manipulation,
cleansing, aggregation, merging, and so on much easier.
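A short sketch of the operations just listed (cleansing, merging, and aggregation) is shown below. The two tables and their column names are hypothetical. Only well-known Pandas calls are used: `DataFrame.dropna`, `DataFrame.merge`, and `groupby(...).sum()`.

```python
import pandas as pd

# Hypothetical orders table with one missing amount.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [250.0, None, 100.0, 80.0],
})

# Hypothetical customers table.
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["Pune", "Delhi", "Mumbai"],
})

# Cleansing: drop orders whose amount is missing.
clean_orders = orders.dropna(subset=["amount"])

# Merging: attach each customer's city to their orders.
merged = clean_orders.merge(customers, on="customer_id", how="left")

# Aggregation: total order amount per city.
sales_by_city = merged.groupby("city")["amount"].sum()

print(sales_by_city)
```

Dropping incomplete rows is only one possible cleansing choice. Depending on the analysis, filling missing values may be more appropriate.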
NumPy
• NumPy is a fundamental library for array and vector-based mathematical
operations.
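As a small illustration of array and vector-based operations, the sketch below works on whole arrays at once instead of looping over elements. The prices and quantities are made-up values.

```python
import numpy as np

# Hypothetical unit prices and quantities sold for three products.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])

# Element-wise multiplication: revenue per product.
revenue = prices * quantities

# Reduction over the array: total revenue.
total_revenue = revenue.sum()

# The same total expressed as a vector dot product.
total_via_dot = np.dot(prices, quantities)

# Scalar broadcast: apply a 50% discount to every price.
discounted = prices * 0.5

print(revenue, total_revenue, total_via_dot, discounted)
```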
Matplotlib
• Matplotlib is a widely used data visualization library. It helps us
represent data in various visual graphs, such as line plots, bar charts,
histograms, etc.
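The following sketch draws two of the chart types mentioned, a line plot and a bar chart, from the same made-up monthly figures. It selects the non-interactive `Agg` backend and writes the image to an in-memory buffer so that it runs without a display. Both choices are assumptions made for portability. In a notebook you would typically call `plt.show()` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Hypothetical monthly sales (illustrative values only).
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [1200, 950, 1430, 1100]

# One figure with two side-by-side axes.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot of the monthly values.
ax_line.plot(months, sales, marker="o")
ax_line.set_title("Monthly sales (line plot)")

# Bar chart of the same values.
ax_bar.bar(months, sales)
ax_bar.set_title("Monthly sales (bar chart)")

fig.tight_layout()

# Render the figure as PNG bytes in memory instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
print(f"Rendered {len(png_bytes)} bytes of PNG data")
```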