Distributions in Data Science
(1) If a card is chosen from a standard deck of cards, what is the probability of getting a five
or a seven?
(a) 4/52
(b) 1/26
(c) 8/52
(d) 1/169
(a) Each value in the set of possible values has exactly the same probability of occurring.
(b) Has a constant probability of success
(c) Has only two possible outcomes
(d) Must have at least 3 trials
Ans: (a) Each value in the set of possible values has exactly the same probability of
occurring.
(a) Probability
(b) Distribution
(c) Event
(d) Random Experiment
Ans: (c) Event
(a) Continuous
(b) Discrete
(a) A table
(b) A graph
(c) A Mathematical Equation
(d) All of these
Ans: (d) All of these
(7) What is the probability that a ball is drawn at random from a jar?
(a) 0.1
(b) 1
(c) 0.5
(d) 0
Standard Questions
(1) Explain what distributions in data science are, with the help of two examples.
Ans: A distribution in data science shows the possible values of a variable and how often
each of them occurs.
While the concept of probability gives us the mathematical calculations, distributions help
us visualize what is happening underneath. For example, consider a coin which has two
sides, head and tail. The probability of getting a head is 0.5, and the probability of
getting a tail is also 0.5. You can be sure that you have exhausted all the possible values
when the probabilities sum to 1 (that is, 100%). For any value outside this set, the
probability of occurrence is zero.
Every probability distribution is associated with a graph which describes the likelihood of
occurrence of each event. For the coin example, the graph has two bars of equal height
(0.5 each), one for each side. This type of distribution is called a uniform distribution.
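The coin example can be written out as a small Python sketch. It assumes a fair coin, and the names `outcomes`, `distribution`, and `probability` are chosen here only for illustration. It shows the two properties described above: every outcome gets the same probability, and the probabilities sum to 1.

```python
from fractions import Fraction

# Outcomes of a single coin toss (a fair coin is an assumption of this sketch).
outcomes = ["head", "tail"]

# In a uniform distribution every outcome has probability 1 / (number of outcomes).
distribution = {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}


def probability(outcome):
    """Return the probability of an outcome; anything outside the set gets 0."""
    return distribution.get(outcome, Fraction(0))


# The probabilities of all possible outcomes add up to 1.
total = sum(distribution.values())

for outcome in outcomes:
    print(f"P({outcome}) = {float(probability(outcome))}")
print("Sum of probabilities:", total)
```

Exact fractions are used instead of floating-point numbers so that the sum is exactly 1.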
Probability Table for Tossing Two Coins
Now, let us extend our problem statement to tossing two coins. From the table we can see
that the probability of getting a head on both coins is 0.25. Similarly, the probability of
getting a head on the first coin and a tail on the second is 0.25, and the probability of
getting a tail on the first coin and a head on the second is 0.25. Finally, the probability
of getting a tail on both coins is 0.25.
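The two-coin probabilities can be reproduced by listing every ordered outcome. The sketch below assumes two fair, independent coins. The names `sides`, `table`, and `heads_distribution` are illustrative only.

```python
from fractions import Fraction
from itertools import product

sides = ["H", "T"]

# All ordered outcomes of tossing two coins: HH, HT, TH, TT.
outcomes = list(product(sides, repeat=2))

# With fair, independent coins each ordered outcome is equally likely.
table = {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}

for (first, second), p in table.items():
    print(f"{first}{second}: {float(p)}")

# Grouping the same outcomes by the number of heads gives a second distribution.
heads_distribution = {}
for outcome, p in table.items():
    heads = outcome.count("H")
    heads_distribution[heads] = heads_distribution.get(heads, 0) + p

for heads, p in sorted(heads_distribution.items()):
    print(f"{heads} head(s): {float(p)}")
```

The four ordered outcomes are uniform, but the number of heads is not: exactly one head can occur in two ways (HT or TH), so it has probability 0.5.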
Some questions are asked simply to collect data, such as "How tall is the plant?" Many
other data collection questions of this type can be asked in order to answer a statistical
investigative question, such as "Do plants that are exposed to sunlight grow faster?"
Statistical investigative questions have some important features which need to be
understood before predicting the differences. The variables of interest must be clear. The
group or population that the question is focused on must be clear. It should be evident
whether the question asks for a description of data, a comparison of variables across two
or more groups, or an association between two variables. The question should be about the
whole group and not about an individual. The question should be answerable through data
collection or with the data in hand. Finally, the question should be purposeful.
(b) Collect/consider the data: This step is also described as acknowledging variability
while designing for differences.
Data collection designs must acknowledge the differences (variability) in the data.
Statistical process control and random sampling are two methods which help in detecting
variability in the data and reducing it. Design of experiments is a method used for
testing the effect of induced variability.
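Random sampling, one of the methods mentioned above, can be illustrated with a minimal Python sketch. The population of plant heights is generated only for illustration, and the helper `simple_random_sample` is a hypothetical name, not a method prescribed by the text.

```python
import random
import statistics

# A fixed seed keeps the sketch repeatable; real studies would not fix it like this.
rng = random.Random(42)

# Hypothetical population: heights (in cm) of 60 plants, invented for illustration.
population = [round(rng.uniform(10.0, 30.0), 1) for _ in range(60)]


def simple_random_sample(values, k, generator):
    """Draw k items without replacement so each item is equally likely to be picked."""
    return generator.sample(values, k)


sample = simple_random_sample(population, 15, rng)

# Comparing the two means shows how a random sample reflects the whole group.
print("Population mean:", round(statistics.mean(population), 2))
print("Sample mean:", round(statistics.mean(sample), 2))
```

Because every plant has the same chance of selection, differences between the sample and the population come from chance alone rather than from how the plants were chosen.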
The data which is collected, whether first hand (new data) or second hand (collected from
other sources), needs interrogation. For example, we need to ask what type each variable
is, what the possible outcomes of each variable are, and how the data was collected. Such
questions help establish whether the data can answer the statistical investigative
question. The data collection design also affects the scope of generalizability and the
possible limitations in analysis and interpretation.
(c) Analyse the data: This step is also described as accounting for variability by using
distributions. When analysing data we have to understand its variability. Reasoning about
distributions is the key to accounting for and describing variability at all developmental
levels. Graphical displays and numerical summaries are used to describe, explore, and
compare the variability of distributions. For example, box plots or comparative dot plots
can be used to show the batting averages of two teams, say the Indian cricket team and the
Australian cricket team, for a specific year. These graphs help us distinguish the two
teams' distributions of batting averages. We can describe the variability by noting how far
the two distributions are separated or how much they overlap.
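The numerical summaries behind a box plot can be computed directly. The sketch below uses hypothetical batting averages that are invented for illustration and are not real team data. The helpers `five_number_summary` and `range_overlap` are illustrative names.

```python
import statistics

# Hypothetical batting averages, invented purely for illustration (not real data).
team_a = [28.5, 31.0, 35.2, 40.1, 42.7, 45.3, 48.9, 52.4]
team_b = [22.1, 25.4, 29.8, 33.0, 36.5, 38.2, 41.0, 44.6]


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum that a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def range_overlap(a, b):
    """Return the interval covered by both data sets, or None if they are separate."""
    low = max(min(a), min(b))
    high = min(max(a), max(b))
    return (low, high) if low <= high else None


summary_a = five_number_summary(team_a)
summary_b = five_number_summary(team_b)
print("Team A:", summary_a)
print("Team B:", summary_b)
print("Overlap of ranges:", range_overlap(team_a, team_b))
```

A wide overlap suggests that the two distributions are hard to tell apart, while little or no overlap suggests a clear separation between the teams.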
(d) Interpret the data: This step is also described as allowing for variability while
considering the data. Statistical interpretations are almost always made in the presence
of variability, which must be taken into account. For example, when interpreting the
results of a randomized comparative medical experiment, two sources of variability should
be kept in mind: the randomization to treatment groups, and the variability from individual
to individual. We consider such sources of variability when the results are generalized and
when we look back at how the data was collected and studied.
(3) Explain how distributions are broadly categorized, and support your answer with an
appropriate example for each category.
Ans:
Depending on the type of data we use, distributions are grouped into two categories:
discrete distributions for discrete data (a finite or countable set of outcomes) and
continuous distributions for continuous data (infinitely many outcomes within a range).
Continuous data
Continuous data can take any value within a range, and is usually measured on
a scale such as temperature or weight. It can be presented in the form of a
histogram, which allows for easier comparison and understanding between
different sets of data. With continuous data, you are able to gain insights into
trends and relationships that might not ordinarily be seen with other types of
datasets.
Discrete data
Discrete data takes only distinct, countable values, such as the number of
students in a classroom or the number of cars passing through an
intersection. Representing this kind of information with bar graphs allows for
quick understanding at a glance.
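The difference between the two categories shows up in how the data is summarized before plotting. The sketch below is a minimal illustration with invented values. The bin edges and the helper `histogram` are assumptions made for this example.

```python
from collections import Counter

# Hypothetical discrete data: cars passing an intersection in each minute.
cars_per_minute = [3, 5, 3, 4, 5, 5, 2, 3]

# Discrete values can be counted directly; each distinct value becomes one bar.
bar_heights = Counter(cars_per_minute)

# Hypothetical continuous data: temperatures in degrees Celsius.
temperatures = [21.4, 22.9, 23.1, 24.8, 25.0, 26.3, 27.7, 29.2]


def histogram(values, low, width, bins):
    """Count how many values fall into each of `bins` intervals of equal width."""
    counts = [0] * bins
    for value in values:
        index = int((value - low) // width)
        # Values outside the chosen range are placed in the nearest end bin.
        index = min(max(index, 0), bins - 1)
        counts[index] += 1
    return counts


# Continuous values are grouped into intervals: 20-22.5, 22.5-25, 25-27.5, 27.5-30.
bin_counts = histogram(temperatures, low=20.0, width=2.5, bins=4)

print("Bar graph heights:", dict(sorted(bar_heights.items())))
print("Histogram counts:", bin_counts)
```

For the discrete data the bars correspond to exact values, whereas for the continuous data the choice of interval width changes the shape of the histogram.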
Ans: This method involves anticipating (predicting) the differences before starting the
actual process. Framing statistical questions helps us identify the differences, which
leads to productive investigations. Below are some examples of statistical questions that
identify the changes and guide the subsequent collection and analysis of data.
A question such as "How tall is the plant?" is answered with a single height, so it is not
a statistical question. Such questions are, however, asked while collecting data, and many
other data collection questions of this type can be asked in order to answer a statistical
investigative question, such as "Do plants that are exposed to sunlight grow faster?"
Different heights are observed for different exposures to sunlight. This means that plant
growth under exposure to sunlight varies from one measured plant to another. While
statistical investigative questions begin worthwhile studies, questioning is prominent
throughout all four components of the statistical problem-solving process. This pattern of
questioning can be explained in detail with the help of examples at different levels.
Statistical investigative questions have some important features which need to be
understood before predicting the differences. The variables of interest must be clear. The
group or population that the question is focused on must be clear. It should be evident
whether the question asks for a description of data, a comparison of variables across two
or more groups, or an association between two variables. The question should be about the
whole group and not about an individual. The question should be answerable through data
collection or with the data in hand. Finally, the question should be purposeful.
(5) Name five instances where you have observed a uniform distribution.
Ans:
(1) Consider that there are 60 students in your class, out of which 20 are affected by cold
and flu every semester. Note down five statistical investigative questions for determining a
student’s immunity to catching cold and flu.
(2) Consider that you are taking part in an animal welfare campaign. One of the most recent
concerns raised by people is dogs not being able to tolerate the sudden rise in temperature
due to global warming. Note down five statistical investigative questions to understand how
dogs react to changing weather.