
Module 4

What is data processing in research?


Data processing in research is the process of collecting research data and transforming it into
information usable to multiple stakeholders. While data can be looked at in numerous ways and
through various lenses, data processing aids in proving or disproving theories, helping make business
decisions, or even advancing enhancements in products and services. In research, data processing is also used to understand pricing sentiment and consumer behavior and preferences, and to support competitive analysis.

Through this process, research stakeholders turn qualitative data and quantitative data from a
research study into a readable format in the form of graphs, reports, or anything else that business
stakeholders resonate with. The process also provides context to the data that has been collected and
helps with strategic business decisions.

While it is a critical aspect of a business, data processing is still an underutilized process in research.
With the proliferation of data and the number of research studies conducted, processing and putting
the information into knowledge management repositories like InsightsHub is critical.

Data processing Steps in Research

The data processing cycle in research has six steps. Let's look at each step and why it is an essential component of the research design.

1) Collection of research data


Data collection is the first stage in the research process. It can be carried out through various online and offline research techniques and may combine primary and secondary research methods. The most commonly used form of data collection is the research survey; however, with a mature market research platform, you can also collect qualitative data through focus groups, discussion modules, and more.

2) Preparing research data


The second step in research data management is preparing the data: eliminating inconsistencies, removing bad or incomplete survey responses, and cleaning the data to maintain consistency. This step is critical, since poor-quality data can render a research study wholly useless and waste time and effort.
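
To make this step concrete, here is a minimal, illustrative sketch of survey-data preparation using pandas. The file name survey_raw.csv and its columns (respondent_id, age, satisfaction) are hypothetical assumptions, not taken from any particular study.

# Minimal data-preparation sketch using pandas (illustrative only).
# Assumes a hypothetical file "survey_raw.csv" with columns
# "respondent_id", "age", and "satisfaction" (1-5 Likert scale).
import pandas as pd

def prepare_survey_data(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Remove duplicate submissions from the same respondent.
    df = df.drop_duplicates(subset="respondent_id")

    # Drop incomplete responses (missing age or satisfaction score).
    df = df.dropna(subset=["age", "satisfaction"])

    # Keep only plausible values to eliminate bad survey data.
    df = df[df["age"].between(18, 99)]
    df = df[df["satisfaction"].between(1, 5)]

    return df.reset_index(drop=True)

# Example usage:
# clean = prepare_survey_data("survey_raw.csv")
# print(len(clean), "valid responses")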

3) Inputting research data


The next step is putting the cleaned-up data into a digitally readable format consistent with organizational policies, research needs, and more. This step is critical because the data is then loaded into online systems built for managing research data.

4) Processing research data


Once the data is input into systems, it is critical to process it to make sense of it. The information is processed based on the research needs, the types of data collected, the time available for processing, and multiple other factors. This is one of the most critical components of the research process.
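
As a rough illustration of this step, the sketch below aggregates cleaned survey responses by a hypothetical "segment" column so that patterns become visible; the column names are assumptions carried over from the preparation sketch above, not a prescribed schema.

# Illustrative processing step: summarize cleaned responses per segment.
# The "segment" and "satisfaction" columns are assumed, not prescribed.
import pandas as pd

def process_responses(clean: pd.DataFrame) -> pd.DataFrame:
    summary = (
        clean.groupby("segment")["satisfaction"]
             .agg(["count", "mean", "std"])
             .rename(columns={"count": "responses",
                              "mean": "avg_satisfaction",
                              "std": "std_satisfaction"})
    )
    # Highest-rated segments first, so the output is easy to scan.
    return summary.sort_values("avg_satisfaction", ascending=False)
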
5) Output of research data
This stage of research data processing is where it gets turned into insights. This stage allows
business owners, stakeholders, and other personnel to look at data in graphs, charts, reports,
and other easy-to-consume formats.

6) Storage of the processed research data


The final stage of data processing is storage. Keeping the data in a format that is indexable and searchable, and that creates a single source of truth, is essential. Knowledge management platforms are most commonly used to store processed research data.
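
A simple way to keep processed results indexable and searchable is a small database. The sketch below uses Python's standard sqlite3 module together with pandas; the database file name and table name are illustrative assumptions, not a prescribed setup.

# Illustrative storage step: persist processed results in a small
# SQLite database so they are indexable and searchable later.
import sqlite3
import pandas as pd

def store_results(summary: pd.DataFrame, db_path: str = "research.db") -> None:
    with sqlite3.connect(db_path) as conn:
        # Write the processed table; replace any previous version.
        summary.to_sql("satisfaction_summary", conn, if_exists="replace")

        # Index the segment column so later lookups are fast.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_segment "
            "ON satisfaction_summary (segment)"
        )

# Later, any stakeholder tool can query the single source of truth:
# pd.read_sql("SELECT * FROM satisfaction_summary", sqlite3.connect("research.db"))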

What Is Data Analysis?


Although many groups, organizations, and experts have different ways of approaching data analysis,
most of them can be distilled into a one-size-fits-all definition. Data analysis is the process of cleaning,
changing, and processing raw data and extracting actionable, relevant information that helps
businesses make informed decisions. The procedure helps reduce the risks inherent in decision-
making by providing useful insights and statistics, often presented in charts, images, tables, and
graphs.

A simple example of data analysis can be seen whenever we make a decision in our daily lives by
evaluating what has happened in the past or what will happen if we make that decision. Basically, this
is the process of analyzing the past or future and making a decision based on that analysis.

It’s not uncommon to hear the term “big data” brought up in discussions about data analysis. Data
analysis plays a crucial role in processing big data into useful information. Neophyte data analysts who
want to dig deeper by revisiting big data fundamentals should go back to the basic question, “What is
data?”

What Is the Importance of Data Analysis in Research?

A huge part of a researcher’s job is to sift through data. That is literally the definition of “research.”
However, today’s Information Age routinely produces a tidal wave of data, enough to overwhelm even
the most dedicated researcher. From a bird's-eye view, data analysis:

1) plays a key role in distilling this information into a more accurate and relevant form, making
it easier for researchers to do their job.
2) provides researchers with a vast selection of different tools, such as descriptive statistics,
inferential analysis, and quantitative analysis.
3) offers researchers better data and better ways to analyze and study said data.

What is Data Analysis: Types of Data Analysis

Five popular types of data analysis are commonly employed in the worlds of technology and
business. They are:

1) Diagnostic Analysis: Diagnostic analysis answers the question, “Why did this happen?” Using
insights gained from statistical analysis (more on that later!), analysts use diagnostic analysis
to identify patterns in data. Ideally, they find similar patterns that existed in the past and can
apply the solutions that worked then to the present challenge.
2) Predictive Analysis: Predictive analysis answers the question, “What is most likely to
happen?” By using patterns found in older data as well as current events, analysts predict
future events. While there’s no such thing as 100 percent accurate forecasting, the odds
improve if the analysts have plenty of detailed information and the discipline to research it
thoroughly.
3) Prescriptive Analysis: Mix all the insights gained from the other data analysis types, and you
have prescriptive analysis. Sometimes, an issue can’t be solved solely with one analysis type,
and instead requires multiple insights.
4) Statistical Analysis: Statistical analysis answers the question, “What happened?” This analysis
covers data collection, analysis, modeling, interpretation, and presentation using dashboards.
Statistical analysis breaks down into two sub-categories (a short sketch contrasting them
follows this list):
• Descriptive: Descriptive analysis works with either complete data or summarized selections
of numerical data. It illustrates means and deviations in continuous data, and percentages
and frequencies in categorical data.
• Inferential: Inferential analysis works with samples derived from the complete data. An
analyst can arrive at different conclusions from the same comprehensive data set just by
choosing different samples.
5) Text Analysis: Also called “data mining,” text analysis uses databases and data mining tools to
discover patterns residing in large datasets. It transforms raw data into useful business
information. Text analysis is arguably the most straightforward and most direct method of
data analysis.
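
As a rough sketch of the descriptive/inferential distinction, the snippet below summarizes two made-up samples (descriptive) and then runs a two-sample t-test to draw a conclusion about the underlying populations (inferential). The group sizes and distribution parameters are arbitrary assumptions.

# Sketch contrasting descriptive and inferential analysis on made-up data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=10, size=30)   # e.g. scores under condition A
group_b = rng.normal(loc=55, scale=10, size=30)   # e.g. scores under condition B

# Descriptive: summarize the data we actually have (means and deviations).
print("A: mean=%.1f sd=%.1f" % (group_a.mean(), group_a.std(ddof=1)))
print("B: mean=%.1f sd=%.1f" % (group_b.mean(), group_b.std(ddof=1)))

# Inferential: use the samples to draw a conclusion about the populations.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print("t=%.2f, p=%.3f" % (t_stat, p_value))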

Data Analysis Methods


Some professionals use the terms “data analysis methods” and “data analysis techniques”
interchangeably. To further complicate matters, people sometimes throw the previously discussed
“data analysis types” into the fray as well. Our hope here is to establish a distinction between the
kinds of data analysis that exist and the various ways they are used.

Although there are many data analysis methods available, they all fall into one of two primary types:
qualitative analysis and quantitative analysis.

Qualitative Data Analysis: The qualitative data analysis method derives data via words, symbols,
pictures, and observations. This method doesn’t use statistics. The most common qualitative methods
include:

• Content Analysis, for analyzing behavioral and verbal data.


• Narrative Analysis, for working with data culled from interviews, diaries, and surveys.
• Grounded Theory, for developing causal explanations of a given event by studying and
extrapolating from one or more past cases.

Quantitative Data Analysis: Statistical data analysis methods collect raw data and process it into
numerical data. Quantitative analysis methods include:

• Hypothesis Testing, for assessing the truth of a given hypothesis or theory for a data set or
demographic.
• Mean, or average, which determines a subject’s overall trend by dividing the sum of a list of
numbers by the number of items in the list.
• Sample Size Determination, which analyzes a small sample taken from a larger group of
people; the results are considered representative of the entire body (see the sketch after this
list).
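
The sketch below illustrates the three quantitative methods just listed on made-up data: a mean, a one-sample hypothesis test, and a textbook sample-size calculation for estimating a mean. The ratings data, the assumed sigma, and the margin of error are purely illustrative.

# Sketch of the three quantitative methods above on made-up data.
import math
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ratings = rng.normal(loc=7.2, scale=1.5, size=40)   # hypothetical product ratings

# Mean: overall trend = sum of the values divided by the number of values.
print("mean rating:", ratings.sum() / len(ratings))

# Hypothesis testing: is the true mean rating different from 7.0?
t_stat, p_value = stats.ttest_1samp(ratings, popmean=7.0)
print("t=%.2f p=%.3f" % (t_stat, p_value))

# Sample size determination (for estimating a mean): classic formula
# n = (z * sigma / margin_of_error)^2, assuming sigma is roughly known.
z = stats.norm.ppf(0.975)          # 95% confidence
sigma, margin = 1.5, 0.3
n = math.ceil((z * sigma / margin) ** 2)
print("required sample size:", n)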

We can further expand our discussion of data analysis by showing various techniques, broken down
by different concepts and tools.
What is a Parametric Test?
The basic principle behind parametric tests is that a fixed set of parameters determines a
probabilistic model (one that may also be used in machine learning). Parametric tests are tests
for which we have prior knowledge of the population distribution (typically normal), or for which
the distribution can be approximated as normal with the help of the Central Limit Theorem.
The parameters of the normal distribution are:

• Mean
• Standard Deviation
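
For example, an independent two-sample t-test is a classic parametric test: it assumes the data are roughly normal and is driven by the sample means and standard deviations. The sketch below uses SciPy on simulated data; the group means and sizes are arbitrary assumptions.

# Minimal parametric-test sketch: an independent two-sample t-test,
# which assumes roughly normal data described by a mean and standard deviation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control   = rng.normal(loc=100, scale=15, size=50)
treatment = rng.normal(loc=108, scale=15, size=50)

t_stat, p_value = stats.ttest_ind(control, treatment, equal_var=True)
print("t = %.2f, p = %.4f" % (t_stat, p_value))
# A small p-value suggests rejecting H0 that the two population means are equal.
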
What is a Non-parametric Test?
In non-parametric tests, we do not make any assumptions about the parameters of the
population we are studying; in fact, these tests do not depend on the population distribution
at all. Hence, there is no fixed set of parameters, and no particular distribution (normal or
otherwise) is assumed.
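
A common non-parametric counterpart to the t-test is the Mann-Whitney U test, which compares ranks rather than means and makes no normality assumption. Below is a minimal sketch on simulated, deliberately skewed data; the scale parameters and sample sizes are illustrative assumptions.

# Minimal non-parametric sketch: the Mann-Whitney U test, which works on
# ranks and makes no normality assumption about the underlying populations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.exponential(scale=2.0, size=40)   # clearly non-normal data
group_b = rng.exponential(scale=3.0, size=40)

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print("U = %.1f, p = %.4f" % (u_stat, p_value))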

Differences Between Parametric and Nonparametric Tests

Parameter | Parametric Test | Nonparametric Test
Assumptions | Assumes normal distribution and equal variance | No assumptions about distribution or variance
Data Types | Suitable for continuous data | Suitable for both continuous and categorical data
Test Statistics | Based on population parameters | Based on ranks or frequencies
Power | Generally more powerful when assumptions are met | More robust to violations of assumptions
Sample Size | Requires larger sample size, especially when distributions are non-normal | Requires smaller sample size
Interpretation of Results | Straightforward interpretation of results | Results are based on ranks or frequencies and may require additional interpretation

What is power of a hypothesis test?
Power in a hypothesis test is the ability to correctly reject a false null hypothesis. Generally speaking,
this is a trade-off between increasing our chance of rejecting the null hypothesis when it is false and
decreasing our chance of rejecting the null hypothesis when it is true. People will talk about Type I
error and Type II error — respectively, erroneously rejecting a true null and erroneously failing to
reject a false null — which captures the same idea.

Usually we get more power when we can make valid and correct assumptions about our data. There’s
nothing mysterious about this relationship between power and assumptions. For instance, if we are
measuring the mass of an object by weighing it against a certain volume of water, we can be more
confident in our results if we can assume that the water is pure distilled water; if we cannot assume
that the water is distilled (it might be distilled, or it might be heavily salted, or contain dissolved
solids, any of which could throw off its weight), then we have to make allowances for those
possibilities. Likewise, if we can assume that data is normally distributed we can use a parametric test
like a T-test, Z-test, or Chi-Squared test, which leverage our mathematical understanding of normal
distributions to reduce our uncertainty about the distribution of errors. This means we can reject the
null at smaller values, thus reducing the chance of Type II errors. However, if we cannot validly assume
that our data is normally distributed, we use a non-parametric test instead. Non-parametric tests
make it harder to reject the null hypothesis, creating a larger chance of Type II error, but they make
allowances for different possible distributions that would create Type I errors if we made
assumptions that aren't true.

As a rule, Type I error is something that we decide on as a feature of our test design — when we
determine the significance level of our hypothesis test, we are determining exactly how much chance
of falsely rejecting a true null hypothesis we are willing to risk. Type II error is more of an
imponderable, having to do with the validity of the model we adopt and the assumptions we make
about our data. The more unsure we are of our model or our assumptions, the more allowances we
have to make. We would always rather fail to reject a false null hypothesis than reject a true one
(we want to diligently control our chance of making false assertions), and so we go out of our way
to be conservative about Type I errors.
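
To make the power idea concrete, the sketch below uses the statsmodels power module for a two-sample t-test: first computing the power achieved for a given sample size, then solving for the sample size needed to reach 80% power. The effect size of 0.5 and the other numbers are illustrative assumptions. Note that power rises with larger samples, larger true effects, and a looser alpha, which is exactly the trade-off described above.

# Sketch of a power calculation for a two-sample t-test using statsmodels.
# Power = P(reject H0 | H0 is false) = 1 - beta.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved with 30 observations per group at alpha = 0.05,
# assuming a medium standardized effect size of 0.5.
power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)
print("power with n=30 per group: %.2f" % power)

# Sample size per group needed to reach 80% power under the same assumptions.
n_required = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print("n per group for 80%% power: %.0f" % n_required)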

Decision based on test | Reality: H0 is True | Reality: H0 is False
Accept H0 | Correct Decision (1 – alpha): Confidence Level | Type II error (Beta)
Reject H0 | Type I error (alpha) | Correct Decision (1 – Beta): Power of the Test

If we want the test to pick up a significant effect, then whenever H1 is true the test should conclude
that there is a significant effect.

In other words, whenever H0 is false, the test should conclude that there is a significant effect.

Put yet another way, whenever H0 is false, the test should reject H0. This probability is represented
by (1 – Beta) and, as seen in the table above, is defined as the power of the test.
Thus, if we want greater assurance that the test will pick up a significant effect, it is the power
of the test that needs to be increased.

Types of Errors
There are basically two types of errors:

• Type I
• Type II
Type I Error
A Type I error occurs when the researcher concludes that the relationship assumed in the
research hypothesis exists when, in reality, it does not. In this type of error, the researcher
should reject the research hypothesis and accept the null hypothesis, but the opposite happens.
The probability of committing a Type I error is denoted by alpha (α).
Type II Error
A Type II error is the opposite of a Type I error. It occurs when the researcher concludes that a
relationship does not exist when, in reality, it does. In this type of error, the researcher should
accept the research hypothesis and reject the null hypothesis, but the opposite happens. The
probability of committing a Type II error is denoted by beta (β).
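
Both error rates can be estimated by simulation. In the sketch below, α is fixed at 0.05; data are repeatedly generated under a true null (to estimate the Type I error rate) and under a false null with a modest real difference (to estimate the Type II error rate, β). The sample size, effect size, and number of trials are arbitrary assumptions.

# Simulation sketch: estimate Type I and Type II error rates for a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, trials, n = 0.05, 2000, 30

# Type I error: H0 is actually true (both groups share the same mean),
# but we reject it anyway.
false_rejections = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_rejections += 1
print("estimated Type I error rate:", false_rejections / trials)   # close to 0.05

# Type II error: H0 is actually false (the means differ), but we fail to reject it.
missed = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.5, 1.0, n)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        missed += 1
print("estimated Type II error rate (beta):", missed / trials)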

Null vs. Alternative Hypothesis – Comparative Table

Particulars | Null Hypothesis | Alternative Hypothesis
Meaning | It refers to the statement suggesting the absence of statistical significance and opposing the alternative hypothesis. | It is the proposed theory or postulation in a hypothesis test.
Accept or reject | Accepting the null hypothesis dismisses the alternative hypothesis and indicates no statistical significance. | Accepting the alternative hypothesis involves rejecting the null hypothesis and indicates statistical significance.
Importance | It disapproves of the central idea; researchers work to reject the null hypothesis. | It is the prime statement or argument, so the researcher works to prove the alternative hypothesis.
P-value | The null hypothesis is favored if the p-value is higher than the statistical significance level. | The alternative hypothesis is favored if the p-value is lower than the statistical significance level.
Relationship | States that there is no relationship between the variables. | States that a relationship exists between the variables.
Observation | If results or effects are observed, they are caused by chance. | If results or effects are observed, they are an outcome of a real cause.
Denoted by | H0 | H1

What is Null Hypothesis?

The null hypothesis is denoted by the symbol H0. It implies that there is no effect on the population
and that the dependent variable is not influenced by the independent variable in the study.
According to the null hypothesis, any observed result or effect is caused by chance, and there is no
relationship between the two variables. The null hypothesis is generally based on a previous
analysis or specialized knowledge. The main types of null hypotheses are simple, composite, exact,
and inexact hypotheses.

To justify the research hypothesis, i.e. the argument put forward by the researcher, the null
hypothesis constructed against the alternative hypothesis must be proved wrong. The test can turn
out only one of two ways: the null hypothesis is either rejected or accepted, depending on the
experimental data and the nature of the scenario under observation. The null hypothesis is
accepted if the statistical test provides no satisfactory
evidence proving the anticipated effect on the population. Furthermore, incorrectly rejecting the null
hypothesis points to type I error (false positive conclusion), and incorrectly failing to reject the null
hypothesis results in type II error (false negative conclusion).

What is Alternative Hypothesis?

The symbol H1 or Ha denotes the alternative hypothesis. It can be based on limited evidence or belief.
It implies that there is an effect on the population: the independent variable influences the dependent
variable in the study. The alternative hypothesis can be one-sided (directional) or two-sided
(non-directional).

The alternative hypothesis defines a statistically substantial relationship between the two variables.
From the researcher’s perspective, this statement is assumed to be correct, and the researcher works
to reject the contrasting null hypothesis and replace it with the new or improved theory. The
researcher predicts the distinguishing factors between the two variables, ensuring that the observed
data is not due to chance.
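
The sketch below shows how the same data can be tested against a two-sided (non-directional) and a one-sided (directional) alternative using SciPy's ttest_1samp, which accepts an alternative argument in reasonably recent SciPy versions. The response-time scenario and the hypothesized mean of 200 ms are made-up assumptions.

# Sketch: stating H0 and H1, with a two-sided vs. a one-sided (directional) H1.
# H0: the mean response time is 200 ms; H1: it differs (or, directionally, is greater).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
response_times = rng.normal(loc=207, scale=20, size=35)   # hypothetical data

# Two-sided (non-directional) alternative: mean != 200.
res_two = stats.ttest_1samp(response_times, popmean=200, alternative="two-sided")
print("two-sided: t=%.2f p=%.3f" % (res_two.statistic, res_two.pvalue))

# One-sided (directional) alternative: mean > 200.
res_one = stats.ttest_1samp(response_times, popmean=200, alternative="greater")
print("one-sided: t=%.2f p=%.3f" % (res_one.statistic, res_one.pvalue))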

Definition of Significance Testing

In statistics, it is important to know whether the result of an experiment is significant or not. To
measure significance, there are predefined tests that can be applied. These tests are called tests of
significance, or simply significance tests.

This statistical testing is subject to some degree of error. For some experiments, the researcher is
required to define the probability of sampling error in advance. In any test that does not consider
the entire population, sampling error exists.
A test of significance is a formal procedure for comparing observed data with a claim (also called a
hypothesis), the truth of which is being assessed.

• The claim is a statement about a parameter, like the population proportion p or the population
mean µ.
• The results of a significance test are expressed in terms of a probability that measures how
well the data and the claim agree.

Tests of Significance: The Four-Step Process

1) State the null and alternative hypotheses.


2) Calculate the test statistic.
3) Find the P-value (using a table or statistical software).
4) Compare the P-value with α and decide whether the null hypothesis should be rejected or
accepted (a worked example follows below).
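
A small worked example of the four steps, using a one-sample t-test on made-up scores (the data, the hypothesized mean of 70, and α = 0.05 are all illustrative assumptions):

# Sketch of the four-step process for a one-sample t-test on made-up data.
import numpy as np
from scipy import stats

# Step 1: state the hypotheses.
# H0: the population mean score is 70.   H1: it is not 70.
alpha = 0.05
scores = np.array([72, 68, 75, 71, 69, 74, 77, 70, 73, 76])

# Step 2: calculate the test statistic.
t_stat, p_value = stats.ttest_1samp(scores, popmean=70)

# Step 3: find the P-value (returned by the software above).
print("t = %.2f, P-value = %.3f" % (t_stat, p_value))

# Step 4: compare the P-value with alpha and decide.
if p_value < alpha:
    print("Reject H0: the mean appears to differ from 70.")
else:
    print("Fail to reject H0: no significant evidence of a difference.")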

Tests of Significance in Statistics

Technically speaking, statistical significance refers to the probability that the result of a statistical
test or study occurred by chance. The main purpose of statistical research is to find the truth. In this
process, the researcher has to ensure the quality of the sample, the accuracy of measurement, and
the use of good measures, all of which require a number of steps. The researcher has to determine
whether the findings of the experiment arose from a well-conducted study or merely by fluke.

Significance is a number representing the probability that the result of a study occurred purely by
chance. Statistical significance may be weak or strong, and it does not necessarily indicate practical
significance. Sometimes, when researchers do not use language carefully in reporting their
experiments, significance can be misinterpreted.

Psychologists and statisticians typically look for a probability of 5% or less, meaning there is at most
a 5% chance that the results occurred by chance and, correspondingly, a 95% chance that they did
not. Whenever the result of an experiment is found to be statistically significant at this level, we can
be 95% confident that the results are not due to chance.
