
Assignment2 DataViz


1. Convert a continuous variable into quantile bins and plot them using countplots. Use the
diamonds dataset and do this for columns x, y, and z.

To visualize the distribution of the continuous variables x, y, and z in the "diamonds" dataset,
convert each one into quantile bins with pandas' qcut and draw a count plot of the bins with the
Seaborn library. Here's how to do it:

python

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load the diamonds dataset
diamonds = sns.load_dataset("diamonds")

# Specify the columns to be converted into quantile bins
columns = ["x", "y", "z"]

# Create quantile bins for the specified columns
for column in columns:
    # q=5 gives five bins; adjust the number of quantiles as needed
    diamonds[f"{column}_quantile"] = pd.qcut(diamonds[column], q=5)

# Create a count plot for each binned column
plt.figure(figsize=(12, 6))
for i, column in enumerate(columns):
    plt.subplot(1, 3, i + 1)
    sns.countplot(data=diamonds, x=f"{column}_quantile")
    plt.title(f"Distribution of {column} in Quantile Bins")
    plt.xticks(rotation=45)

plt.tight_layout()
plt.show()
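
Because quantile bins are built to hold roughly the same number of rows, the bars in these count plots come out nearly equal in height. The information is in the bin edges, not the counts. The following is a minimal sketch of that behaviour on synthetic data (the exponential sample and the variable names are assumptions for illustration, not part of the assignment), comparing pd.qcut with equal-width pd.cut:

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed data standing in for a continuous column
rng = np.random.default_rng(0)
values = pd.Series(rng.exponential(scale=2.0, size=1000))

# Quantile bins: edges chosen so each bin holds about the same number of rows
quantile_bins = pd.qcut(values, q=5)
quantile_counts = quantile_bins.value_counts().sort_index()

# Equal-width bins: edges evenly spaced, so counts follow the data's shape
width_bins = pd.cut(values, bins=5)
width_counts = width_bins.value_counts().sort_index()

print(quantile_counts)
print(width_counts)
```

If the goal is to see the shape of the distribution itself, equal-width bins (or a histogram) may be the better choice; quantile bins are more useful when each group should be comparably sized.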

2. Plot a bivariate relationship between all the categorical and numeric columns of the housing
dataset to understand the relationship between variables. What approaches did you take?

To plot bivariate relationships between the categorical and numeric columns of the "housing"
dataset, I can use various approaches depending on my goals. Here are a few common approaches to
understanding the relationships between variables:

Pairplots:

I can create pairplots using Seaborn to visualize the relationships between all pairs of numeric
variables. This produces a grid of scatter plots for numeric-numeric relationships, with the
distribution of each variable on the diagonal. Categorical variables do not get panels of their own;
a categorical column can be shown through color instead. Here's an example:

python

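import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load the housing dataset.
# Assumption: the data is in a local file named "housing.csv". Seaborn's sample
# data does not appear to include a "housing" dataset, so
# sns.load_dataset("housing") is expected to fail.
housing = pd.read_csv("housing.csv")

# Create a pairplot to visualize relationships.
# 'categorical_column' is a placeholder; replace it with a real column name.
sns.pairplot(housing, hue='categorical_column')
plt.show()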

In the pairplot, the "hue" parameter is used to differentiate data points based on a categorical
column to help distinguish between categories.

Box Plots and Violin Plots:

I can use box plots or violin plots to visualize the distribution of numeric variables across different
categories in the categorical columns. This helps me understand the spread and central tendency of
numeric data within each category. Here's an example:

python

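import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load the housing dataset.
# Assumption: the data is in a local file named "housing.csv". Seaborn's sample
# data does not appear to include a "housing" dataset.
housing = pd.read_csv("housing.csv")

# Create a box plot and a violin plot for every numeric column.
# 'categorical_column' is a placeholder; replace it with a real column name.
for numeric_column in housing.select_dtypes(include='number').columns:
    # One new figure per numeric column, so earlier plots are not overwritten
    fig, axes = plt.subplots(1, 2, figsize=(12, 6))

    sns.boxplot(x='categorical_column', y=numeric_column, data=housing, ax=axes[0])
    axes[0].set_title(f'Box Plot of {numeric_column} by Category')

    sns.violinplot(x='categorical_column', y=numeric_column, data=housing, ax=axes[1])
    axes[1].set_title(f'Violin Plot of {numeric_column} by Category')

    fig.tight_layout()
    plt.show()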

In this example, I create a box plot and a violin plot for each numeric column to visualize how it
varies by category in the categorical column.
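
The question asks for every categorical column against every numeric column, not a single placeholder. One possible way to enumerate those pairs is sketched below. The small DataFrame and its column names are hypothetical stand-ins for the housing data, and the group medians are printed only to show that each pair is visited:

```python
from itertools import product

import pandas as pd

# Hypothetical housing-like data; column names are illustrative only
demo = pd.DataFrame({
    "price": [250.0, 310.0, 199.5, 420.0],
    "area": [1200, 1500, 950, 2100],
    "furnished": ["yes", "no", "no", "yes"],
    "region": ["north", "south", "north", "south"],
})

# Split the columns by type: numeric versus everything else
numeric_cols = demo.select_dtypes(include="number").columns.tolist()
categorical_cols = demo.select_dtypes(exclude="number").columns.tolist()

# Every (categorical, numeric) combination
pairs = list(product(categorical_cols, numeric_cols))

for cat_col, num_col in pairs:
    # Median of the numeric column within each category
    print(demo.groupby(cat_col)[num_col].median())
```

Inside the loop, each pair could instead be passed to sns.boxplot(x=cat_col, y=num_col, data=demo) or sns.violinplot with the same arguments to draw one plot per pair.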

Heatmap for Correlation:

I can create a heatmap to visualize the correlation between numerical variables. While this approach
doesn't directly show the relationship with categorical variables, it helps identify relationships
between numeric columns. Here's an example:

python

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load the housing dataset.
# Assumption: the data is in a local file named "housing.csv". Seaborn's sample
# data does not appear to include a "housing" dataset.
housing = pd.read_csv("housing.csv")

# Calculate the correlation matrix of the numeric columns
corr_matrix = housing.select_dtypes(include='number').corr()

# Create a heatmap to visualize the correlation
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()

This heatmap shows the correlation between numeric variables in the dataset. High positive or
negative correlations may indicate strong linear relationships between numeric columns.

3. Generate a heatmap between all the continuous variables of the 'housing' dataset. Make sure you
understand which variables are continuous in the given housing dataset.

To generate a heatmap between all the continuous variables in the "housing" dataset, we first need
to identify which columns in the dataset are continuous. Continuous variables are numerical values
that can take any value within a range and are not limited to distinct categories or counts. In
a housing dataset, continuous variables might include features like "square footage," "price," and
"age of the property." A count such as "number of bedrooms" is numeric but discrete, so whether to
include it in the heatmap is a judgment call.

Here's how you can generate a heatmap for the correlation between continuous variables in the
"housing" dataset:

python

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load the housing dataset.
# Assumption: the data is in a local file named "housing.csv". Seaborn's sample
# data does not appear to include a "housing" dataset.
housing = pd.read_csv("housing.csv")

# Select the numerical columns as candidates for continuous variables
continuous_columns = housing.select_dtypes(include='number')

# Calculate the correlation matrix for these columns
correlation_matrix = continuous_columns.corr()

# Create a heatmap to visualize the correlation
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap of Continuous Variables")
plt.show()

In this code, we first select the numerical columns in the "housing" dataset using select_dtypes.
This also keeps discrete integer columns, so drop any that should not be treated as continuous.
Then, we calculate the correlation matrix between these variables and create a heatmap using
Seaborn. The heatmap shows the Pearson correlation coefficients between the variables, which helps
you understand how they move together. Values near +1 or -1 indicate a strong linear relationship
between the corresponding variables, while values close to zero suggest a weak or no linear
relationship.
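
Selecting by dtype alone cannot tell a continuous measurement from a discrete count. The sketch below shows one possible heuristic: treat a numeric column as continuous only if most of its values are distinct. The 0.8 threshold, the DataFrame, and its column names are all assumptions for illustration; on real data the choice should be checked against what each column actually measures.

```python
import pandas as pd

# Hypothetical housing-like data; column names are illustrative only
housing_demo = pd.DataFrame({
    "price": [250000.0, 310000.0, 199500.0, 420000.0, 275250.0, 365000.0],
    "area": [1200.5, 1500.0, 950.25, 2100.0, 1340.75, 1800.0],
    "bedrooms": [2, 3, 2, 4, 3, 3],
    "furnished": ["yes", "no", "no", "yes", "no", "yes"],
})

# Step 1: keep only numeric columns
numeric = housing_demo.select_dtypes(include="number")

# Step 2 (heuristic, an assumption): a column counts as continuous
# when more than 80% of its values are distinct
unique_ratio = numeric.nunique() / len(numeric)
continuous_cols = unique_ratio[unique_ratio > 0.8].index.tolist()

# Step 3: correlation matrix restricted to the continuous columns
corr = numeric[continuous_cols].corr()
print(continuous_cols)
print(corr)
```

The resulting corr matrix can be passed to sns.heatmap in the same way as in the code above.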

4. Find the roots of the equation x^2 - 6x + 5 = 0 and point out the roots on a matplotlib graph.

To find the roots of the quadratic equation x^2 - 6x + 5 = 0, I can use the quadratic formula:

x = (-b ± √(b² - 4ac)) / (2a)

In this equation, a = 1, b = -6, and c = 5. Plug these values into the quadratic formula to find the
roots:

x = (-(-6) ± √((-6)² - 4 * 1 * 5)) / (2 * 1)

x = (6 ± √(36 - 20)) / 2

x = (6 ± √16) / 2

x = (6 ± 4) / 2

So, the roots of the equation x^2 - 6x + 5 are x = 5 and x = 1.
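
The hand calculation can be cross-checked numerically. This is a small sketch using NumPy's roots and polyval functions; the variable names are my own:

```python
import numpy as np

# Coefficients of x^2 - 6x + 5, highest power first
coefficients = [1, -6, 5]

# Numerical roots, sorted in ascending order
roots = np.sort(np.roots(coefficients))
print(roots)

# Substitute each root back into the polynomial; results should be close to 0
residuals = np.polyval(coefficients, roots)
print(residuals)
```

Because the roots are computed in floating point, compare them with a tolerance (for example with np.allclose) instead of testing for exact equality.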


Now, let's create a matplotlib graph to point out these roots on the graph:

python

import matplotlib.pyplot as plt
import numpy as np

# Define the quadratic equation
def quadratic_equation(x):
    return x**2 - 6*x + 5

# Generate x values for the plot
x = np.linspace(-1, 7, 400)
y = quadratic_equation(x)

# Plot the quadratic equation
plt.plot(x, y, label='y = x^2 - 6x + 5')

# Mark the roots on the graph
root1 = 5
root2 = 1
plt.scatter([root1, root2], [0, 0], color='red', label='Roots', marker='o', s=100)

# Label the roots
plt.text(root1, -1, f'Root 1: {root1}', ha='center')
plt.text(root2, -1, f'Root 2: {root2}', ha='center')

# Add labels, axis lines, and legend
plt.xlabel('x')
plt.ylabel('y')
plt.axhline(0, color='black', lw=0.5)
plt.axvline(0, color='black', lw=0.5)
plt.grid(True)
plt.legend()

# Show the plot
plt.show()

This code defines the quadratic equation, generates x values for the plot, and then plots the
equation. The roots are marked on the graph as red points and labeled accordingly. The graph helps
visualize the quadratic equation and the location of its roots.

5. Graph sin(x) and cos(x) in matplotlib. Make sure both curves are drawn in one plot only.

I can plot both the sine (sin(x)) and cosine (cos(x)) functions in one plot using Matplotlib. Here's how
you can do it:

python

import matplotlib.pyplot as plt
import numpy as np

# Generate x values
x = np.linspace(0, 2 * np.pi, 100)  # Range from 0 to 2π

# Calculate y values for sin(x) and cos(x)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Create a plot with both sin(x) and cos(x)
plt.figure(figsize=(8, 6))
plt.plot(x, y_sin, label='sin(x)', color='blue')
plt.plot(x, y_cos, label='cos(x)', color='red')

# Set axis labels, a title, and a legend
plt.xlabel('x')
plt.ylabel('y')
plt.title('Plot of sin(x) and cos(x)')
plt.legend()

# Show the plot
plt.grid(True)
plt.show()

In this code, we generate a range of x values from 0 to 2π, calculate the corresponding y values for
sin(x) and cos(x), and then create a plot that includes both functions. The plt.plot() function is used
to add the sine and cosine curves to the same plot, and plt.legend() is used to provide labels for the
curves. The resulting plot displays both sin(x) and cos(x) on the same set of axes.

6. Generate a 3D plot using numpy meshgrid to represent a peak and a valley in a graph.

I can generate a 3D plot to represent a peak and a valley in a graph using NumPy and Matplotlib.
Here's an example of how to create such a plot:

python

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # needed only by older Matplotlib releases

# Create a grid of x and y values using NumPy's meshgrid
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)

# Define the surfaces.
# You can use any mathematical function that represents the shape you want.
# Here, we use a radial sine ripple and its negation as an example.
Z_peak = np.sin(np.sqrt(X**2 + Y**2))     # Ridge at radius π/2, trough at radius 3π/2
Z_valley = -np.sin(np.sqrt(X**2 + Y**2))  # Same surface flipped upside down

# Create a 3D plot for the peak
fig_peak = plt.figure()
ax_peak = fig_peak.add_subplot(111, projection='3d')
ax_peak.plot_surface(X, Y, Z_peak, cmap='viridis')
ax_peak.set_title("Peak")

# Create a 3D plot for the valley
fig_valley = plt.figure()
ax_valley = fig_valley.add_subplot(111, projection='3d')
ax_valley.plot_surface(X, Y, Z_valley, cmap='inferno')
ax_valley.set_title("Valley")

plt.show()

In this code, we create two separate 3D plots, one labeled as the peak and one as the valley. We use
NumPy's meshgrid to generate a grid of x and y values. Then, we define the surface height as
sin(sqrt(X^2 + Y^2)) and its negation. This function is a radial ripple: it rises to a circular ridge
at radius π/2 and drops to a circular trough at radius 3π/2, so each surface contains both high and
low regions, and negating it swaps them. Finally, we draw each surface with Matplotlib's
plot_surface.

You can modify the functions and the grid range to represent different shapes and landscapes in your
3D plots.
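
If the goal is a single graph with one distinct peak and one distinct valley, a different surface may be easier to read. The sketch below is one possible choice, not the only one: it subtracts one Gaussian bump from another. The centres (2, 0) and (-2, 0) and all variable names are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid of x and y values built with NumPy's meshgrid
x = np.linspace(-5, 5, 101)
y = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(x, y)

# Assumed surface: a Gaussian bump centred at (2, 0) forms the peak,
# and a subtracted Gaussian bump centred at (-2, 0) forms the valley
Z = np.exp(-((X - 2) ** 2 + Y ** 2)) - np.exp(-((X + 2) ** 2 + Y ** 2))

# Locate the highest and lowest grid points
peak_idx = np.unravel_index(np.argmax(Z), Z.shape)
valley_idx = np.unravel_index(np.argmin(Z), Z.shape)
peak_xy = (X[peak_idx], Y[peak_idx])
valley_xy = (X[valley_idx], Y[valley_idx])
print("peak at", peak_xy, "valley at", valley_xy)

# Draw both features on one 3D surface
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, cmap="coolwarm")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
ax.set_title("One peak and one valley on a single surface")
plt.show()
```

A diverging colormap such as coolwarm maps the peak and the valley to opposite ends of the color scale, which makes the two features easy to tell apart.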

7. Why do you think it is important to melt the dataframe before we do some plotting? In what
scenarios will this be important?

Melting a DataFrame, also known as "unpivoting" or "reshaping," is an important data preprocessing
step when working with certain types of datasets and plotting scenarios. The process of melting
essentially involves converting wide-format data into long-format data. Here's why it's important and
in what scenarios it's necessary:

Facilitating Plotting with Seaborn and Other Libraries:

Many plotting libraries, including Seaborn and Plotly, are designed to work with data in long-form
(tidy) format, where each row is one observation and each column is one variable. Melting your
DataFrame reshapes wide data into that format. Seaborn, for instance, needs long-form data whenever
you map columns to roles such as x, y, and hue, although many of its functions also accept wide-form
data directly.

Categorical and Numerical Variables for Faceting:

Melting is useful when you want to create plots with facet grids or when you need to compare
different categories. By melting your DataFrame, you can put categorical variables into a single
column, making it easier to create faceted plots and visualize data across different categories.

Time Series Data:

Time series data often comes in wide-format, where each time point is a separate column. To work
with time series data efficiently, you may need to melt it into a long format where time is a single
column. This allows you to create time series plots and perform time-related analyses.

Aggregating Data for Specific Analyses:

When you have data that needs to be aggregated or summarized in specific ways for analysis and
plotting, melting can be useful. For example, if you have a DataFrame with multiple columns
representing different periods (e.g., months or years) and you want to create a time series plot, you
would melt the data to have one column for the time period and another for the corresponding
values.

Handling Multivariate Data:

Melting can be crucial when dealing with multivariate data, where each variable or measurement is
represented in a separate column. Converting the data into a long format makes it easier to visualize
relationships between variables and conduct multivariate analyses.

Creating Stacked Bar Charts and Heatmaps:

Grouped and stacked bar charts drawn with long-form plotting APIs need one column for the grouping
variable and another for the values, and melting produces exactly that layout. Heatmaps are the
exception: Seaborn's heatmap takes a two-dimensional matrix, so long-form data is usually pivoted
back to wide form before plotting.

In summary, melting a DataFrame is important when you need to prepare your data for specific
plotting and analysis tasks. It's a way to restructure data to make it more amenable to visualization
and to ensure that it's in the appropriate format for the tools and libraries you're using. The exact
scenarios where melting is important will depend on the nature of your data and the specific analysis
and visualization tasks you need to perform.
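
To make the wide-to-long conversion concrete, here is a minimal sketch with a made-up table (the city names, years, and sales figures are invented for illustration):

```python
import pandas as pd

# Wide format: one row per city, one column per year
wide = pd.DataFrame({
    "city": ["A", "B"],
    "2021": [10, 20],
    "2022": [12, 18],
    "2023": [15, 25],
})

# Long format: one row per (city, year) observation
long = wide.melt(id_vars="city", var_name="year", value_name="sales")
print(long)
```

With the long table, a call such as sns.lineplot(data=long, x="year", y="sales", hue="city") can draw one line per city without any further reshaping, because year, sales, and city are now each a single column.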
