
Fds Answers


2 MARKS

1. Enumerate the categories of data used in data science.


*Structured
*Unstructured
*Natural language
*Machine-generated
*Graph-based
*Audio, video, and images
*Streaming
2. List the characteristics of a data warehouse
*Subject-oriented
*Time-variant
*Integrated
*Non-volatile
3. Name two common techniques for handling missing data in a
data set
*Mistakes during data entry - Manual overrules.
*Redundant white space - using string functions.
*Impossible values - Manual overrules.
*Missing values - Remove observation or value.
4. Demonstrate how to compute a z-score value.
z-scores are "standard scores".
A z-score states the position of a raw score in relation to the mean of
the distribution, using the standard deviation as the unit of
measurement.

z = (raw score − mean) / standard deviation

For a population: z = (X − μ) / σ
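The z-score formula can be sketched in Python as follows. The scores are hypothetical values invented for illustration, and the whole list is treated as a population (so NumPy's default population standard deviation is used).

```python
import numpy as np

# Hypothetical raw scores (illustration only, not from the question paper)
scores = np.array([60.0, 70.0, 80.0, 90.0, 100.0])

mu = scores.mean()     # population mean
sigma = scores.std()   # population standard deviation (ddof=0 by default)

# z = (X - mu) / sigma, applied element-wise to every raw score
z = (scores - mu) / sigma

print("Mean:", mu)
print("Standard deviation:", sigma)
print("z-scores:", z)
```

After standardizing, the z-scores have mean 0 and standard deviation 1, which is a quick way to check the calculation.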
5. Define single and multiple linear regression.
Single (Simple) Linear Regression: Simple linear regression is a
statistical method used to model the relationship between one
independent variable and one dependent variable by fitting a linear
equation to the observed data. The equation for a simple linear
regression model is typically represented as:
y = mx + b
where m is the slope and b is the intercept.
Multiple Linear Regression: Multiple linear regression is an extension
of simple linear regression that involves modeling the relationship
between two or more independent variables and one dependent
variable. The equation for a multiple linear regression model with
p predictors is:
y = b0 + b1x1 + b2x2 + ... + bpxp
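Both equations can be fitted with NumPy, as in the minimal sketch below. The data are hypothetical and were generated from known coefficients, so the fit should recover them; np.polyfit and np.linalg.lstsq are one possible choice of fitting routine, not the only one.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1 (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Simple linear regression: fit y = m*x + b (degree-1 polynomial)
m, b = np.polyfit(x, y, deg=1)
print("slope m:", m, "intercept b:", b)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y_multi = 1.0 + 2.0 * x1 + 3.0 * x2

# Multiple linear regression: the column of ones estimates b0
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, residuals, rank, _ = np.linalg.lstsq(X, y_multi, rcond=None)
print("b0, b1, b2:", coeffs)
```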
6. Describe the correlation coefficient.
The correlation coefficient (r) is a statistical measure of the strength
and direction of the linear relationship between two quantitative
variables. It ranges from -1 to 1: values near -1 or 1 indicate a strong
negative or positive relationship, and values near 0 indicate a weak
linear relationship.
7. How can a pandas DataFrame be constructed?
* From a dictionary
* From a list of lists or a NumPy array
* From a list of dictionaries
* From an external source (for example, a CSV file)
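The construction routes listed above can be sketched as follows. The names and values are hypothetical, and the "external source" is simulated with an in-memory CSV string so the example runs without any file on disk.

```python
import io

import numpy as np
import pandas as pd

# From a dictionary: keys become column names
df_dict = pd.DataFrame({"name": ["Ann", "Bob"], "age": [28, 35]})

# From a list of lists: column names are supplied separately
df_lists = pd.DataFrame([["Ann", 28], ["Bob", 35]], columns=["name", "age"])

# From a NumPy array
df_numpy = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From a list of dictionaries: each dictionary is one row
df_records = pd.DataFrame([{"name": "Ann", "age": 28},
                           {"name": "Bob", "age": 35}])

# From an external source: a CSV file, simulated here with StringIO
csv_text = io.StringIO("name,age\nAnn,28\nBob,35\n")
df_csv = pd.read_csv(csv_text)

print(df_dict)
print(df_numpy)
print(df_csv)
```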
8. Write a python code to create 1D,2D and 3D numpy arrays.
import numpy as np
# 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
# 2D array
arr_2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# 3D array
arr_3d = np.array([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[10, 11, 12]]])
print("1D Array:")
print(arr_1d)
print("Shape:", arr_1d.shape)
print("\n2D Array:")
print(arr_2d)
print("Shape:", arr_2d.shape)
print("\n3D Array:")
print(arr_3d)
print("Shape:", arr_3d.shape)
9. Explain how to create a 3D wireframe plot.
1. Import the necessary libraries
2. Create data for the plot
3. Create the 3D plot
4. Customize the plot (optional)
5. Display the plot
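The five steps can be sketched with Matplotlib as follows. The surface z = sin(sqrt(x^2 + y^2)), the grid size, and the output file name wireframe.png are assumptions chosen for illustration. The figure is saved to a file so the script also runs without a display; in an interactive session, plt.show() would be used instead.

```python
# Step 1: import the necessary libraries
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

# Step 2: create data for the plot on a 30 x 30 grid
x = np.linspace(-5, 5, 30)
y = np.linspace(-5, 5, 30)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

# Step 3: create the 3D axes and draw the wireframe
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_wireframe(X, Y, Z, rstride=2, cstride=2, color="steelblue")

# Step 4: customize the plot (optional)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.set_title("3D Wireframe Plot")

# Step 5: display the plot (saved to a file here)
fig.savefig("wireframe.png")
```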
10. Indicate the significance of subplots.
Subplots in data visualization libraries like Matplotlib allow for
concise comparison, contextualization, and presentation of multiple
plots within the same figure, fostering exploration, analysis, and
communication of data.
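A minimal sketch of subplots, assuming two related curves (sine and cosine) that are easier to compare side by side in one figure. The output file name subplots.png is hypothetical, and the figure is saved rather than shown so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One figure with 1 row and 2 columns of axes sharing the same x-axis
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharex=True)

axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")

axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")

fig.tight_layout()  # prevent titles and labels from overlapping
fig.savefig("subplots.png")
```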

16 MARKS
11. a) 1. Illustrate the steps in data science process with diagram.
11.a)2. Explain in detail about exploratory data analysis model with
an example.
Exploratory Data analysis:
11.b)1. Describe the architecture of data warehousing.
Architecture of data warehousing
11.b)2. How do you set the research goal, retrieving data and data
preparation process in data science process?
12.a)1. Explain the types of frequency distribution with examples.
12.a)2. Generate the grouped and ungrouped frequency table for
the following data
90,92,87,88,87,92,98,90,90,87,87,88,88,89,90,87,89,92,92,92,98,
0,95,87,87
i) How many people scored 98?
ii) How many people scored 90 or less?
iii) What proportion scored 87?
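The frequency tables and the three sub-questions can be worked out with a short script. This is a sketch only: the scores are copied exactly as printed above (including the value 0, which may be a transcription slip in the source), and the class width of 5 for the grouped table is an assumption.

```python
from collections import Counter

# Scores copied exactly as printed in the question, including the 0
scores = [90, 92, 87, 88, 87, 92, 98, 90, 90, 87, 87, 88, 88, 89, 90,
          87, 89, 92, 92, 92, 98, 0, 95, 87, 87]
n = len(scores)

# Ungrouped frequency table: one row per distinct score
freq = Counter(scores)
print("Score  Frequency")
for value in sorted(freq, reverse=True):
    print(f"{value:5d}  {freq[value]}")

# Grouped frequency table with an assumed class width of 5
width = 5
grouped = Counter((value // width) * width for value in scores)
print("Class    Frequency")
for lower in sorted(grouped, reverse=True):
    print(f"{lower}-{lower + width - 1}  {grouped[lower]}")

# i) How many people scored 98?
count_98 = freq[98]
# ii) How many people scored 90 or less?
count_90_or_less = sum(c for v, c in freq.items() if v <= 90)
# iii) What proportion scored 87?
prop_87 = freq[87] / n

print("Scored 98:", count_98)
print("Scored 90 or less:", count_90_or_less)
print("Proportion scoring 87:", prop_87)
```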
12.b)1. Specify the real limits for the lowest class interval in this
frequency distribution for the given dataset.
91 85 84 79 80
87 96 75 86 104
95 71 105 90 77
123 80 100 93 108
98 69 99 95 90
110 109 94 100 103
112 90 90 98 89
To find the real limits for the lowest class interval in the frequency
distribution, we first need to determine the range of the data set,
which is the difference between the largest and smallest values.
The smallest value in the dataset is 69, and the largest value is 123.
Now, let's calculate the class width:
Class width = Range / Number of desired classes
Let's assume we want to have 5 classes:
Class width = (123 − 69) / 5 = 54 / 5 = 10.8
Rounding up to the nearest whole number, we get a class width of
11.
Now, to find the lower limit of the first class, subtract half of the
class width from the smallest value:
Lower limit = Smallest value − (0.5 × Class width)
Lower limit = 69 − (0.5 × 11) = 69 − 5.5 = 63.5
And to find the upper limit of the first class, add the class width
to the lower limit:
Upper limit = Lower limit + Class width
Upper limit = 63.5 + 11 = 74.5
So, the real limits for the lowest class interval are 63.5 to 74.5.
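The arithmetic above can be reproduced with a short script. It follows the same procedure as the worked answer, including the assumption of 5 classes; it is a check of that calculation, not a general rule for choosing class intervals.

```python
import math

# Dataset from the question
data = [91, 85, 84, 79, 80,
        87, 96, 75, 86, 104,
        95, 71, 105, 90, 77,
        123, 80, 100, 93, 108,
        98, 69, 99, 95, 90,
        110, 109, 94, 100, 103,
        112, 90, 90, 98, 89]

num_classes = 5  # assumed number of classes, as in the worked answer

data_range = max(data) - min(data)                  # 123 - 69
class_width = math.ceil(data_range / num_classes)   # 10.8 rounded up

# Limits of the lowest class interval, following the steps above
lower_limit = min(data) - 0.5 * class_width
upper_limit = lower_limit + class_width

print("Range:", data_range)
print("Class width:", class_width)
print("Lowest class interval:", lower_limit, "to", upper_limit)
```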
12.b)2.Analyze how graphs are used to represent qualitative and
quantitative data.
13.a)1. Calculate the value of r using the computation formula for
the following data:

FRIENDS   SENT   RECEIVED
Dories    13     14
Steve     9      18
Mike      7      12
Andrea    5      10
John      1      6
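A sketch of the calculation for the friends data in this question, assuming the usual computation formula r = SPxy / sqrt(SSx × SSy), where SSx = ΣX² − (ΣX)²/n, SSy = ΣY² − (ΣY)²/n, and SPxy = ΣXY − (ΣX)(ΣY)/n. The variable names are chosen for illustration.

```python
import math

# Data from the question: cards sent (X) and received (Y) per friend
sent = [13, 9, 7, 5, 1]
received = [14, 18, 12, 10, 6]
n = len(sent)

sum_x = sum(sent)
sum_y = sum(received)
sum_x2 = sum(x * x for x in sent)
sum_y2 = sum(y * y for y in received)
sum_xy = sum(x * y for x, y in zip(sent, received))

# Sums of squares and sum of products (computation formulas)
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp_xy = sum_xy - sum_x * sum_y / n

r = sp_xy / math.sqrt(ss_x * ss_y)

print("SSx:", ss_x, "SSy:", ss_y, "SPxy:", sp_xy)
print("r:", r)
```

With these figures SSx = 80, SSy = 80, and SPxy = 64, so r = 64 / 80 = 0.80, a strong positive relationship between cards sent and cards received.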

13.a)2. Discuss about the standard error of estimation.


13.b)1. Demonstrate how correlation coefficient can be calculated
for the quantitative data.
13.b)2. Explain the different types of regression analysis in detail.
14. a) 1. Write a note on the basics of numpy arrays with example
NumPy, short for Numerical Python, is a fundamental package for
numerical computing in Python. It provides support for multi-
dimensional arrays and matrices, along with a variety of
mathematical functions to operate on these arrays efficiently.

Basics of NumPy Arrays:

Creating Arrays:
NumPy arrays can be created using various methods such as
numpy.array(), numpy.zeros(), numpy.ones(), numpy.arange(), etc.

Example:

import numpy as np

# Creating a 1D array
arr1d = np.array([1, 2, 3, 4, 5])

# Creating a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# Creating an array of zeros


zeros_arr = np.zeros((2, 3))

# Creating an array of ones


ones_arr = np.ones((3, 2))

# Creating an array with a range of values


range_arr = np.arange(1, 10, 2)
Array Attributes:
NumPy arrays have attributes like shape, dtype, and ndim to describe
their shape, data type, and number of dimensions, respectively.

Example:

print(arr1d.shape) # Output: (5,)
print(arr2d.dtype) # Output: int64 (platform-dependent; may be int32)
print(arr2d.ndim) # Output: 2
Array Operations:
NumPy arrays support various mathematical operations like addition,
subtraction, multiplication, division, exponentiation, etc., either
element-wise or matrix operations.

Example:
# Element-wise addition
result = arr1d + arr1d

# Matrix multiplication: arr2d is 2x3, so multiply by its 3x2 transpose
mat_mul = np.dot(arr2d, arr2d.T)
Indexing and Slicing:
Similar to Python lists, NumPy arrays support indexing and slicing to
access elements or subarrays.

Example:

print(arr1d[0]) # Output: 1
print(arr2d[1, 1]) # Output: 5
print(arr1d[2:]) # Output: [3 4 5]
Universal Functions (ufuncs):
NumPy provides a wide range of mathematical functions known as
universal functions (ufuncs) that operate element-wise on arrays.

Example:
# Square root of array elements
sqrt_arr = np.sqrt(arr1d)

# Exponential function
exp_arr = np.exp(arr1d)
NumPy arrays form the foundation for numerical computing in
Python and are extensively used in various scientific and engineering
applications.
14.a)2. Illustrate data manipulation with pandas.

import pandas as pd

# Create a sample DataFrame


data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 35, 42, 31],
'Gender': ['M', 'F', 'M', 'F'],
'City': ['New York', 'Paris', 'Tokyo', 'London']}
df = pd.DataFrame(data)

# Display the original DataFrame


print("Original DataFrame:")
print(df)

# Filter the DataFrame based on a condition


filtered_df = df[df['Age'] > 30]

# Display the filtered DataFrame


print("\nFiltered DataFrame (Age > 30):")
print(filtered_df)

# Sort the DataFrame by a column


sorted_df = df.sort_values(by='Age')

# Display the sorted DataFrame


print("\nSorted DataFrame (by Age):")
print(sorted_df)
# Group the DataFrame by a column and calculate statistics
grouped_df = df.groupby('Gender').agg({'Age': ['mean', 'median']})

# Display the grouped DataFrame


print("\nGrouped DataFrame (by Gender, with mean and median Age):")
print(grouped_df)
14.b)1 Describe fancy indexing with an example.
Fancy indexing is a technique in Python's NumPy library that allows
accessing multiple elements of an array at once using an array of
indices or boolean masks.

Here's an example:

import numpy as np

# Creating an array
arr = np.array([10, 20, 30, 40, 50])

# Fancy indexing with an array of indices


indices = np.array([0, 2, 4])
result1 = arr[indices]
print(result1) # Output: [10 30 50]
# Fancy indexing with a boolean mask
mask = np.array([True, False, True, False, True])
result2 = arr[mask]
print(result2) # Output: [10 30 50]
In the example, arr[indices] returns elements at indices 0, 2, and 4,
while arr[mask] returns elements where the corresponding mask
value is True.
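The same idea extends to two-dimensional arrays and to assignment. The sketch below uses a hypothetical 3 x 4 grid; it also shows that fancy indexing returns a copy rather than a view, so changing the result leaves the original array untouched.

```python
import numpy as np

# Hypothetical 3 x 4 array holding the values 0..11
grid = np.arange(12).reshape(3, 4)

# Pair row and column indices: picks elements (0, 1) and (2, 3)
rows = np.array([0, 2])
cols = np.array([1, 3])
picked = grid[rows, cols]
print(picked)  # Output: [ 1 11]

# Select whole rows 0 and 2 in one step
selected_rows = grid[[0, 2]]
print(selected_rows.shape)  # Output: (2, 4)

# Fancy indexing returns a copy: editing it does not change grid
selected_rows[0, 0] = 99
print(grid[0, 0])  # Output: 0

# Fancy indexing can also be used on the left side of an assignment
grid[rows, cols] = -1
print(grid[0, 1], grid[2, 3])  # Output: -1 -1
```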
14.b)2 What is a pivot table? Explain in detail

A pivot table is a data summarization tool used in spreadsheet programs
like Microsoft
Excel, Google Sheets, or other similar software. It allows users to
rearrange and summarize selected columns and rows of data into a
more meaningful format without altering the original data set.
Here's a detailed explanation of how pivot tables work:
Data Source: The pivot table starts with a source data range. This
data can be organized in rows and columns, typically with headings
in the first row and records in subsequent rows.
Selection: Users select the data range they want to analyze using the
pivot table. This range can include multiple columns and rows of
data.
Field Assignment: The user assigns the selected fields to different
areas within the pivot table layout. There are typically four areas
where fields can be placed:
Rows: Fields placed in the rows area form the rows of the pivot table.
Each unique value in these fields will create a separate row in the
pivot table.
Columns: Fields placed in the columns area form the columns of the
pivot table. Each unique value in these fields will create a separate
column in the pivot table.
Values: Fields placed in the values area are used to perform
calculations or summarizations. For example, you can calculate sums,
averages, counts, etc., for the values in these fields.
Filters: Fields placed in the filters area allow users to filter the data
displayed in the pivot table based on specific criteria.
Summarization: Once the fields are assigned, the pivot table
organizes and summarizes the data according to the layout defined
by the user. It can perform various calculations and aggregations
based on the values in the data set.
Dynamic Updating: Pivot tables are dynamic, meaning that they can
be easily updated if the source data changes. Users can refresh the
pivot table to reflect any modifications in the underlying data.
Customization: Pivot tables offer various customization options such
as sorting, filtering, formatting, and changing the calculation type for
values.
Data Analysis: Pivot tables help users analyze large data sets quickly
by providing insights into trends, patterns, and relationships within
the data.
Overall, pivot tables are powerful tools for summarizing, analyzing,
and visualizing data in a flexible and dynamic manner, making them
widely used in data analysis and reporting tasks.
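The rows, columns, and values areas described above map directly onto the index, columns, and values arguments of pandas' pivot_table. The sales data below are hypothetical, and pandas is used here only as one possible tool, since the explanation above is phrased in terms of spreadsheet software.

```python
import pandas as pd

# Hypothetical sales records (illustration only)
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "amount": [100, 150, 200, 50, 25, 75],
})

# Rows = region, Columns = product, Values = sum of amount
pivot = sales.pivot_table(index="region", columns="product",
                          values="amount", aggfunc="sum", fill_value=0)
print(pivot)

# Filters: restrict the source rows before summarizing
east_only = sales[sales["region"] == "East"]
east_pivot = east_only.pivot_table(index="product", values="amount",
                                   aggfunc="sum")
print(east_pivot)
```

The original sales DataFrame is not modified; each pivot is a new summary table, which mirrors how a spreadsheet pivot table leaves its source range unchanged.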
15.a)1. Compare the line plot and scatter plot with example.
Line plot
Scatter plot
15.a) 2 Describe 3D plotting with examples
3D plotting is a visualization technique used to represent data points
or mathematical functions in three-dimensional space. Here is an
example of a 3D surface plot using Python's Matplotlib library.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
# Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
15.b)1. Analyze the different types of data visualization tools with
examples.
Data visualization tools can be categorized into several types based
on their features, functionalities, and intended use cases. Here's an
analysis of some common types along with examples:
Charting Libraries:
These libraries provide a wide range of chart types (e.g., line charts,
bar charts, pie charts) that users can customize and embed into
applications or websites.
Examples:
Chart.js: A flexible JavaScript charting library that offers various chart
types and is easy to use.
D3.js: A powerful JavaScript library for creating interactive data
visualizations using SVG, HTML, and CSS. It provides low-level
building blocks for creating custom visualizations.
Business Intelligence (BI) Tools:
These tools are designed for analyzing and visualizing large datasets
to gain insights and make data-driven decisions.
Examples:
Tableau: A popular BI tool that allows users to create interactive
dashboards and visualizations from various data sources.
Power BI: Microsoft's business analytics service that enables users to
create interactive reports and dashboards with a wide range of data
sources.
Mapping Tools:
These tools specialize in visualizing geographical data and creating
maps with various layers and overlays.
Examples:
Mapbox: A mapping platform that allows users to create custom
maps and add data overlays for interactive visualizations.
Google Maps API: Offers tools for embedding Google Maps into
applications and customizing them with data overlays and markers.
Statistical Software with Visualization Capabilities:
Statistical software packages often include built-in data visualization
tools for analyzing and presenting statistical data.
Examples:
R: A programming language and environment for statistical
computing and graphics. It provides a wide range of packages for
creating static and interactive visualizations.
Python with Matplotlib and Seaborn: Python libraries commonly
used for creating static visualizations, especially in data analysis and
scientific computing.
Dashboarding Platforms:
These platforms allow users to create customizable dashboards with
multiple visualizations to monitor key metrics and KPIs.
Examples:
Kibana: An open-source data visualization dashboard for
Elasticsearch that enables users to explore, visualize, and analyze
data.
Grafana: An open-source analytics and monitoring platform that
allows users to create dynamic dashboards with metrics, logs, and
other data sources.
Specialized Visualization Tools:
These tools focus on specific types of data or industries, providing
tailored solutions for particular needs.
Examples:
Plotly: A Python graphing library that specializes in creating
interactive visualizations for data analysis and presentation.
Sigma.js: A JavaScript library for visualizing complex networks and
graphs, commonly used in network analysis and social network
visualization.
Each type of data visualization tool has its strengths and weaknesses,
and the choice often depends on factors such as data complexity,
user requirements, and integration capabilities.
Example
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Create scatter plot


sns.scatterplot(x=x, y=y)

# Add labels and title


plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot Example')

# Show the plot


plt.show()
15.b)2. How can graphical data be projected using matplotlib? Give an
example.
Matplotlib is a popular Python library for creating static, animated,
and interactive visualizations. Here's a basic example of how you can
use Matplotlib to project graphical data:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Plotting the data
plt.plot(x, y)
# Adding labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sample Graph')
plt.show()
Example
import numpy as np
import matplotlib.pyplot as plt

# Generate x values from 0 to 2*pi


x = np.linspace(0, 2*np.pi, 100) # 100 points between 0 and 2*pi
y = np.sin(x) # Compute corresponding y values (sine of x)

# Plot the sine wave


plt.plot(x, y)

# Add labels and title


plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Sine Wave')

# Show the plot


plt.show()
