Module_2_Answers_Corrected

Uploaded by

gullyboy056

Answers for Module 2 - Fundamentals of Data Science

1. Define Probability.

Probability is a measure of the likelihood of an event occurring, expressed as a number between 0 and 1. When all outcomes are equally likely, it is defined as the ratio of the favorable outcomes to the total number of possible outcomes:

P(E) = Number of favorable outcomes / Total number of outcomes
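
The ratio above can be sketched in Python as follows. This is a minimal illustration that assumes equally likely outcomes; the function name classical_probability and the die example are hypothetical and not part of the original answers.

```python
from fractions import Fraction


def classical_probability(favorable, sample_space):
    """Return P(E) = favorable outcomes / total outcomes.

    Assumes every outcome in the sample space is equally likely.
    """
    favorable = set(favorable)
    sample_space = set(sample_space)
    if not sample_space:
        raise ValueError("sample space must not be empty")
    if not favorable <= sample_space:
        raise ValueError("favorable outcomes must belong to the sample space")
    return Fraction(len(favorable), len(sample_space))


# Hypothetical example: rolling an even number on a fair six-sided die.
die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
p_even = classical_probability(even, die)
print(p_even)  # 1/2
```

Using Fraction keeps the result exact instead of a rounded decimal.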

2. Discuss any two terms in Probability.

1. Experiment: Any process that generates well-defined outcomes (e.g., tossing a coin).

2. Event: A specific outcome or a set of outcomes from an experiment (e.g., getting heads).

3. Mention the types of Descriptive Statistics.

1. Measures of Central Tendency: Mean, Median, Mode.

2. Measures of Dispersion: Range, Variance, Standard Deviation.

3. Measures of Position: Percentiles, Quartiles.

4. List some applications of Conditional Probability.

1. Spam filtering in emails.

2. Fraud detection in banking.

3. Medical diagnosis.

4. Weather forecasting.

5. What is the Questionnaire Method?

A data collection method in which respondents answer a set of pre-designed questions. It is used for surveys and research studies.

6. Define Bayes Theorem.

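Bayes theorem gives the probability of an event A given that a related event B has occurred, using the reverse conditional probability P(B|A) and the individual probabilities P(A) and P(B).

Formula: P(A|B) = [P(B|A) * P(A)] / P(B), where P(B) > 0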
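
The formula can be applied directly in a few lines of Python. The sketch below uses a spam-filtering scenario with made-up probabilities; the function name bayes_posterior and all numbers are assumptions chosen only for illustration.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "email is spam", B = "email contains the word 'offer'".
p_spam = 0.2              # P(A)
p_word_given_spam = 0.5   # P(B|A)
p_word_given_ham = 0.05   # P(B|not A)

# Total probability of seeing the word in any email: P(B).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = bayes_posterior(p_word_given_spam, p_spam, p_word)
print(round(p_spam_given_word, 3))  # 0.714
```

P(B) is computed with the law of total probability because it is rarely known directly.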

7. Define Measure of Variability.

A measure of variability quantifies the spread or dispersion of a dataset. Examples include Range, Variance, and Standard Deviation.

8. Write the formula for Regression.

For a simple linear regression: Y = a + bX

Where: Y = Dependent variable, X = Independent variable, a = Intercept, b = Slope
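
The answer gives only the form of the line, not how a and b are found. One common choice, assumed here, is ordinary least squares: b = Sum((x - mean_x) * (y - mean_y)) / Sum((x - mean_x)^2) and a = mean_y - b * mean_x. The function name and the sample data below are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate intercept a and slope b of Y = a + bX by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data lying exactly on the line Y = 1 + 2X.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_simple_linear_regression(xs, ys)
print(a, b)  # 1.0 2.0
```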

9. Define Data Munging.

The process of cleaning and transforming raw data into a usable format for analysis. It is also called data wrangling.

10. What is Data Enrichment?

Data enrichment involves enhancing raw data by adding context or supplementary information from external sources.

11. Define Data Transformation.

The process of converting data from one format or structure to another to make it more suitable for analysis. Examples include scaling, normalization, and encoding.
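
The three examples named above could look like this in plain Python. This is a sketch under assumptions: "normalization" is read as min-max rescaling to [0, 1], "scaling" as z-score standardization, and "encoding" as one-hot encoding. The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot_encode(labels):
    """Encode each categorical label as a 0/1 vector, one column per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


heights = [150, 160, 170, 180, 190]
print(min_max_normalize(heights))                # [0.0, 0.25, 0.5, 0.75, 1.0]
print(one_hot_encode(["red", "green", "red"]))   # [[0, 1], [1, 0], [0, 1]]
```

The one-hot columns follow the sorted category order, here green then red.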

12. What is Quality Assurance in Data?

It refers to ensuring the accuracy, consistency, and reliability of data by performing checks and validations throughout the data lifecycle.

13. Write the definition of Mean with Formula.

The Mean is the arithmetic average of a dataset: the sum of all observations divided by the number of observations.

Formula: Mean = Sum of all observations / Number of observations

14. Define Mode and Median.

Mode: The value that occurs most frequently in a dataset.

Median: The middle value when data is arranged in ascending or descending order. When the number of observations is even, it is the average of the two middle values.

15. Explain the Types of Correlation.

1. Positive Correlation: Both variables increase or decrease together.

2. Negative Correlation: One variable increases while the other decreases.

3. No Correlation: No relationship between the variables.

16. Difference Between Correlation and Regression.

Correlation measures the strength and direction of a relationship between two variables, while Regression predicts the value of one variable based on the other.
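
The distinction can be made concrete with a small Python sketch. It assumes the Pearson correlation coefficient as the correlation measure and a least-squares slope for regression; neither choice is stated in the original answer, and the data is hypothetical.

```python
from math import sqrt


def _sums(xs, ys):
    """Return the centered sums Sxx, Syy, Sxy for two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Correlation: symmetric and unit-free, always between -1 and 1."""
    sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def slope(predictor, response):
    """Regression slope: depends on which variable is being predicted."""
    sxx, _, sxy = _sums(predictor, response)
    return sxy / sxx


# Hypothetical data with a perfect negative linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]

print(pearson_r(xs, ys), pearson_r(ys, xs))  # -1.0 -1.0
print(slope(xs, ys), slope(ys, xs))          # -2.0 -0.5
```

Swapping the variables leaves the correlation unchanged but changes the regression slope, because regression treats one variable as the predictor and the other as the response.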

17. Compute the Population and Sample Standard Deviation.

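For the given datasets:

a) 1, 3, 7, 2, 0, 4, 3, 7

b) 10, 8, 5, 0, 1, 7, 9, 2, 1

Formulas: Population SD = sqrt(Sum of (x - Mean)^2 / N), Sample SD = sqrt(Sum of (x - Mean)^2 / (N - 1))

a) N = 8, Mean = 27 / 8 = 3.375, Sum of squared deviations = 45.875. Population SD = sqrt(45.875 / 8) ≈ 2.395. Sample SD = sqrt(45.875 / 7) ≈ 2.560.

b) N = 9, Mean = 43 / 9 ≈ 4.778, Sum of squared deviations ≈ 119.556. Population SD = sqrt(119.556 / 9) ≈ 3.645. Sample SD = sqrt(119.556 / 8) ≈ 3.866.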
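
The population and sample standard deviations of the two datasets can be checked with Python's standard statistics module, where pstdev divides by N and stdev divides by N - 1. The variable names are illustrative.

```python
from statistics import pstdev, stdev

data_a = [1, 3, 7, 2, 0, 4, 3, 7]
data_b = [10, 8, 5, 0, 1, 7, 9, 2, 1]

for name, data in (("a", data_a), ("b", data_b)):
    population_sd = pstdev(data)  # divides the squared deviations by N
    sample_sd = stdev(data)       # divides the squared deviations by N - 1
    print(f"{name}) population SD = {population_sd:.3f}, sample SD = {sample_sd:.3f}")
# a) population SD = 2.395, sample SD = 2.560
# b) population SD = 3.645, sample SD = 3.866
```

The sample value is always slightly larger because the same sum of squared deviations is divided by a smaller number.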

18. Compute Mean, Median, and Mode for the Following Data Sets:

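a) 45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70

Mean = 672 / 11 ≈ 61.09. Median = 63 (the 6th of 11 ordered values). Mode = 63 (occurs 4 times).

b) 26.9, 26.3, 28.7, 27.4, 26.6, 27.4, 26.9, 26.9

Ordered: 26.3, 26.6, 26.9, 26.9, 26.9, 27.4, 27.4, 28.7

Mean = 217.1 / 8 ≈ 27.14. Median = (26.9 + 26.9) / 2 = 26.9 (average of the 4th and 5th ordered values). Mode = 26.9 (occurs 3 times).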
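
Mean, median, and mode for both datasets can be computed with Python's standard statistics module. This is a minimal check; the variable names are illustrative.

```python
from statistics import mean, median, mode

data_a = [45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70]
data_b = [26.9, 26.3, 28.7, 27.4, 26.6, 27.4, 26.9, 26.9]

for name, data in (("a", data_a), ("b", data_b)):
    # median() sorts the data itself and averages the two middle
    # values when the number of observations is even.
    print(f"{name}) mean = {mean(data):.2f}, median = {median(data)}, mode = {mode(data)}")
# a) mean = 61.09, median = 63, mode = 63
# b) mean = 27.14, median = 26.9, mode = 26.9
```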

19. Describe Data Cleaning Process.

Data cleaning involves:

1. Removing duplicate entries.

2. Handling missing data (imputation or removal).

3. Correcting inconsistent formatting.

4. Detecting and handling outliers (removing them when they are errors).
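
The four steps above can be sketched on a small in-memory table. Everything here is hypothetical: the records, the field names, median imputation for missing values, and the 1.5 * IQR rule for outliers are assumptions chosen for illustration. Formatting is fixed first so that duplicates differing only in spacing or case can be detected.

```python
from statistics import median, quantiles

# Hypothetical raw records.
raw_rows = [
    {"name": " alice ", "age": 30},
    {"name": "Bob", "age": None},    # missing value
    {"name": "ALICE", "age": 30},    # duplicate once formatting is fixed
    {"name": "Carol", "age": 28},
    {"name": "Dave", "age": 31},
    {"name": "Eve", "age": 29},
    {"name": "Frank", "age": 240},   # implausible value
]


def clean(rows):
    # Correct inconsistent formatting: trim spaces, use title case.
    formatted = [{"name": r["name"].strip().title(), "age": r["age"]} for r in rows]

    # Remove duplicate entries, keeping the first occurrence.
    seen, unique = set(), []
    for r in formatted:
        key = (r["name"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Handle missing data: impute with the median of the known ages.
    fill = median(r["age"] for r in unique if r["age"] is not None)
    imputed = [{**r, "age": fill if r["age"] is None else r["age"]} for r in unique]

    # Remove outliers: keep ages within 1.5 * IQR of the quartiles.
    q1, _, q3 = quantiles([r["age"] for r in imputed], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in imputed if low <= r["age"] <= high]


cleaned = clean(raw_rows)
print([r["name"] for r in cleaned])  # ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
```

Whether an outlier should be dropped, capped, or kept depends on the dataset; dropping is used here only to keep the sketch short.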

20. Crowdsourcing:

a) Definition: Crowdsourcing involves obtaining data, ideas, or services from a large group of people, typically via the internet.

b) Types of Crowdsourcing: 1. Crowdfunding. 2. Open innovation. 3. Microtasking.

21. Primary Data Collection Methods.

1. Surveys.

2. Interviews.

3. Observation.

22. Types of Descriptive Statistics.

Covered under question 3.

23. Measures of Central Tendency.

Mean, Median, and Mode are used to describe the central value of a dataset.

24. Conditional Probability.

The probability of an event A, given that another event B has occurred.

Formula: P(A|B) = P(A intersection B) / P(B), where P(B) > 0
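
The formula can be verified by enumerating a small sample space. The sketch below assumes two fair dice; the events A and B are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

event_a = {o for o in outcomes if o[0] + o[1] == 8}  # A: the sum is 8
event_b = {o for o in outcomes if o[0] % 2 == 0}     # B: the first die is even

p_b = Fraction(len(event_b), len(outcomes))
p_a_and_b = Fraction(len(event_a & event_b), len(outcomes))

# P(A|B) = P(A intersection B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 1/6
```

Knowing that the first die is even raises the probability of a sum of 8 from 5/36 to 1/6.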

25. Bayes Theorem with Example.

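Bayes theorem: P(A|B) = [P(B|A) * P(A)] / P(B), where P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).

Example (illustrative numbers): A disease affects 1% of a population. A test detects the disease in 95% of people who have it and gives a false positive in 5% of people who do not.

P(Positive) = 0.95 * 0.01 + 0.05 * 0.99 = 0.0095 + 0.0495 = 0.059

P(Disease | Positive) = 0.0095 / 0.059 ≈ 0.161

So a person who tests positive has only about a 16% chance of having the disease, because the disease is rare.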

26. Data Cleaning Steps.

The 8 steps include:

1. Removing duplicates.

2. Addressing missing values.

3. Correcting errors.

4. Standardizing formats.

5. Validating data accuracy.

6. Removing irrelevant data.

7. Handling outliers.

8. Finalizing the cleaned dataset.

27. Data Collection Methods.

Primary Methods: Surveys, experiments, interviews.

Secondary Methods: Online databases, published reports.

28. Crowdsourcing in Data Science.

Crowdsourcing allows researchers to gather vast amounts of labeled data quickly. It is often used in machine learning projects for tasks like image labeling or sentiment analysis. Challenges include maintaining quality and consistency.
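
One simple way to address the quality and consistency challenge is to collect several labels per item and keep the majority label only when enough workers agree. The sketch below is an assumption, not a method described in the original answer; the function name, the 0.6 agreement threshold, and the sample votes are hypothetical.

```python
from collections import Counter


def aggregate_labels(votes_by_item, min_agreement=0.6):
    """Return {item: (label or None, agreement)} using majority voting.

    The label is None when the share of workers choosing the most common
    label is below min_agreement, which flags the item for review.
    """
    results = {}
    for item, votes in votes_by_item.items():
        if not votes:
            results[item] = (None, 0.0)
            continue
        label, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        results[item] = (label if agreement >= min_agreement else None, agreement)
    return results


# Hypothetical labels from three workers per image.
votes = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "cat", "bird"],
}
aggregated = aggregate_labels(votes)
print(aggregated["img_001"][0])  # cat
print(aggregated["img_002"][0])  # None
```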
