Python Interview Prep Doc

The document serves as a Python interview preparation guide, focusing on data visualization libraries Matplotlib and Seaborn, highlighting their differences in syntax, aesthetics, and statistical visualization capabilities. It includes practical questions and answers about customizing plots, handling categorical data, and visualizing distributions and relationships. Additionally, it compares Pandas and NumPy, explaining their functionalities and use cases in data manipulation and analysis.

Uploaded by

deepali

PYTHON INTERVIEW PREP DOC

1) LIBRARIES
2) Matplotlib vs Seaborn (Seaborn is a data visualization library built on top of Matplotlib)
1) Ease of syntax
- Matplotlib - requires more code to create a visualization.
- Seaborn - simplifies the process of creating appealing plots.
import matplotlib.pyplot as plt
import pandas as pd

# Sample data (illustrative values) so the example runs on its own
df = pd.DataFrame({'x': [-2, -1, 0, 1, 2],
                   'y': [3, -1, 0, 2, -2],
                   'category': ['A', 'B', 'A', 'B', 'A']})

# Matplotlib Scatter Plot
plt.figure(figsize=(8, 6)) # Set the figure size
plt.scatter(df['x'], df['y'], color='blue', s=100) # Scatter points
plt.title('Matplotlib Scatter Plot') # Title
plt.xlabel('X-axis') # X-axis label
plt.ylabel('Y-axis') # Y-axis label
plt.grid(True) # Add a grid
plt.axhline(0, color='black', linewidth=0.5, ls='--') # Horizontal line at y = 0
plt.axvline(0, color='black', linewidth=0.5, ls='--') # Vertical line at x = 0
plt.show() # Show plot

import seaborn as sns

# Seaborn Scatter Plot (df needs a 'category' column for hue)
plt.figure(figsize=(8, 6)) # Set the figure size
sns.scatterplot(x='x', y='y', hue='category', data=df, s=100) # Scatter points with hue
plt.title('Seaborn Scatter Plot') # Title
plt.grid(True) # Add a grid
plt.show() # Show plot

Matplotlib: requires explicit commands for setting the figure size, scatter points, title,
labels, grid, and reference lines, so more code and customization steps are involved.
Seaborn: combines multiple steps into a single function call and automatically handles
color coding for the categories through the hue parameter, leading to more
concise and readable code.

2) Default Aesthetics

- Matplotlib - basic default color schemes and styling.
- Seaborn - visually appealing plots out of the box, with built-in themes.

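3) Statistical Visualizations
- Matplotlib - limited built-in statistical plotting functions.
- Seaborn - specializes in statistical visualizations and offers a wide array of built-in
statistical plotting functions.
Matplotlib: plt.boxplot() computes the quartiles for you, but you have to split the values
into one array per category yourself before plotting.
Seaborn: creating a grouped box plot requires just one function call that names the columns,
for example sns.boxplot(x='category', y='y', data=df).
While Matplotlib is powerful for general plotting, it has no direct equivalent of Seaborn's
higher-level statistical functions (such as sns.pairplot() or sns.regplot()), so the same
statistical plot usually takes more manual preparation and code.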
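
The box plot comparison can be sketched roughly as follows. The DataFrame and its column names (group, value) are made up for illustration. Matplotlib's boxplot() works out the quartiles itself, but the values are split per category by hand, whereas sns.boxplot() takes the grouping from the column names.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data: two groups of measurements
df = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 5,
    'value': [1, 2, 3, 4, 10, 4, 5, 6, 7, 8],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the values into one array per category, then plot
groups = [df.loc[df['group'] == g, 'value'].to_numpy() for g in ['A', 'B']]
ax1.boxplot(groups)
ax1.set_xticklabels(['A', 'B'])
ax1.set_title('Matplotlib boxplot')

# Seaborn: one call, the grouping comes from the column names
sns.boxplot(x='group', y='value', data=df, ax=ax2)
ax2.set_title('Seaborn boxplot')

plt.show()
```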

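4) Integration with Pandas

- Both integrate well with Pandas. Seaborn functions take a DataFrame through the data
parameter and refer to columns by name, while Matplotlib plots DataFrame columns as arrays.

5) Customization

- Matplotlib - extensive customization options (axes, titles, ticks, etc.).
- Seaborn - offers customization options, but they are generally more limited than
Matplotlib's and less flexible for detailed adjustments.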

Can you explain a scenario where you would choose Seaborn over Matplotlib?

o Answer: I would choose Seaborn when I need to create statistical plots, such as pair plots or
violin plots, that require quick visualization of relationships and distributions. Seaborn
simplifies the process with built-in themes and better default aesthetics, allowing me to
focus on the analysis rather than customization.

What are some common plot types available in Matplotlib?

o Answer: Common plot types in Matplotlib include line plots, scatter plots, bar charts,
histograms, pie charts, box plots, and error bar plots. These cover a wide range of
visualization needs, from simple trends to complex distributions.

How can you customize the appearance of a plot in Matplotlib?

o Answer: Customization in Matplotlib can be done using various functions. You can change
the color and style of lines, adjust marker types, set titles and labels, customize axes limits,
and modify ticks. For example, using plt.title(), plt.xlabel(), and plt.ylabel() allows you to set
titles and labels for your plots.
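
A minimal sketch of these customization options, using made-up data. It uses the object-oriented calls (ax.set_title() and friends), which are the Axes counterparts of plt.title(), plt.xlabel(), and plt.ylabel().

```python
import matplotlib.pyplot as plt

# Illustrative data
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots(figsize=(6, 4))

# Line color, line style, line width, and marker type
ax.plot(x, y, color='green', linestyle='--', linewidth=2, marker='o', label='squares')

# Title and axis labels
ax.set_title('Customized Line Plot')
ax.set_xlabel('x')
ax.set_ylabel('y')

# Axis limits and tick positions
ax.set_xlim(0, 4)
ax.set_ylim(0, 20)
ax.set_xticks([0, 1, 2, 3, 4])

ax.legend()
plt.show()
```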

Matplotlib-Specific Questions

5. How do you save a plot created with Matplotlib?

o Answer: You can save a plot using the plt.savefig("filename.png") function. You can
specify different file formats such as PNG, JPG, PDF, or SVG by changing the file
extension. Additionally, you can adjust parameters like DPI for better resolution.
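
As a small sketch of saving a figure (the file names are placeholders), the extension picks the output format and dpi sets the resolution of raster output:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 1])
ax.set_title('Saved Plot')

# PNG at 300 dots per inch; bbox_inches='tight' trims surrounding whitespace
fig.savefig('plot_example.png', dpi=300, bbox_inches='tight')

# Same figure as a vector PDF, chosen purely by the file extension
fig.savefig('plot_example.pdf')
```

Calling savefig() before plt.show() is the safer order, since some interactive backends clear the figure once the window is closed.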

6. What are subplots, and how do you create them in Matplotlib?

o Answer: Subplots allow you to create multiple plots in a single figure. You can create
them using plt.subplot(nrows, ncols, index) to specify the layout or
plt.subplots(nrows, ncols) to return a figure and an array of axes. For example,
plt.subplots(2, 2) creates a 2x2 grid of subplots.

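7. How can you display multiple plots in one figure?

o Answer: You can display multiple plots in one figure using subplots. For example:

import numpy as np
import matplotlib.pyplot as plt

# Sample data (illustrative values)
x = np.arange(5)
y1, y2, y3 = x ** 2, x * 3, x + 1
y4 = np.random.default_rng(0).normal(size=100)

fig, axs = plt.subplots(2, 2)
axs[0, 0].plot(x, y1)      # Line plot, top left
axs[0, 1].scatter(x, y2)   # Scatter plot, top right
axs[1, 0].bar(x, y3)       # Bar chart, bottom left
axs[1, 1].hist(y4)         # Histogram, bottom right
plt.show()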

8. What is the purpose of the figure() function in Matplotlib?

o Answer: The figure() function creates a new figure object, allowing you to manage the size,
resolution, and background color of your plots. The figure is the top-level container, so it is
important for organizing multiple plots in one window and for setting figure-wide properties.
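
A short sketch of figure() with the three properties mentioned in the answer. The specific size, dpi, and color values are arbitrary.

```python
import matplotlib.pyplot as plt

# figsize is in inches, dpi is dots per inch, facecolor is the background color
fig = plt.figure(figsize=(8, 4), dpi=100, facecolor='lightgray')

# Add a single Axes to the figure and draw on it
ax = fig.add_subplot(1, 1, 1)
ax.plot([0, 1, 2], [0, 1, 0])
ax.set_title('Figure with custom size and background')

plt.show()
```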

Seaborn-Specific Questions

9. What is a pair plot, and when would you use it?

o Answer: A pair plot is a grid of scatter plots that displays the relationship between
every pair of numeric variables in a dataset, with each variable's own distribution on the
diagonal. It’s useful for visualizing distributions and spotting correlations in
multi-dimensional data.

How do you handle categorical data in Seaborn?

o Answer: Seaborn provides several functions to visualize categorical data, such as
sns.boxplot(), sns.violinplot(), and sns.countplot(). These functions allow for visual
comparisons across different categories, making it easy to understand distributions and
relationships.
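
The three categorical functions named above can be tried side by side. This is a sketch on a small made-up table (columns day and sales), not a recommendation of one plot over another.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: sales figures recorded on three days
df = pd.DataFrame({
    'day': ['Mon', 'Mon', 'Mon', 'Tue', 'Tue', 'Tue', 'Wed', 'Wed'],
    'sales': [10, 12, 9, 15, 14, 18, 7, 11],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

sns.countplot(x='day', data=df, ax=axes[0])               # Rows per category
sns.boxplot(x='day', y='sales', data=df, ax=axes[1])      # Quartiles per category
sns.violinplot(x='day', y='sales', data=df, ax=axes[2])   # Density shape per category

plt.tight_layout()
plt.show()
```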

What is the purpose of the hue parameter in Seaborn?

o Answer: The hue parameter in Seaborn is used to color the data points based on a
categorical variable. This enhances the visualization by allowing you to differentiate between
groups within the same plot, making it easier to observe relationships.

How can you create a heatmap in Seaborn?

o Answer: You can create a heatmap in Seaborn using the sns.heatmap() function. This
function visualizes data in a matrix format with color coding to represent values, making it
useful for displaying correlations or frequencies.
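
A sketch of a correlation heatmap on synthetic data. The columns a to d are invented, and d is deliberately built from a so that one strong correlation shows up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: three independent columns plus one derived from 'a'
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=['a', 'b', 'c'])
df['d'] = df['a'] * 2 + rng.normal(scale=0.1, size=50)

# Pairwise correlation matrix, drawn as a color-coded grid with the values annotated
corr = df.corr()
ax = sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
ax.set_title('Correlation heatmap')
plt.show()
```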

Practical Questions

13. Given a dataset, how would you visualize the distribution of a numeric variable?

o Answer: I would use a histogram to visualize the distribution. In Matplotlib, I would
use plt.hist(data), or in Seaborn, I would use sns.histplot(data) to quickly plot the
distribution, passing kde=True to overlay a density curve if needed.

14. How would you visualize the relationship between two continuous variables?

o Answer: I would use a scatter plot for this purpose. In Matplotlib, I would use
plt.scatter(x, y), or in Seaborn, I could use sns.scatterplot(x='variable1', y='variable2',
data=data) to visualize the relationship and observe patterns.

15. Can you write code to create a bar chart using either library?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Build a DataFrame so Seaborn can look up the columns by name
data = pd.DataFrame({'categories': ['A', 'B', 'C'], 'values': [10, 20, 15]})

sns.barplot(x='categories', y='values', data=data)
plt.title('Bar Chart Example')
plt.show()

Scenario-Based Questions

16. If your plots are cluttered and hard to read, what steps would you take to improve them?

o Answer: I would simplify the plot by reducing the number of elements, using fewer
colors, and ensuring adequate spacing. I would also adjust the size of the plot, add
labels and legends for clarity, and consider using faceting to break down the data
into smaller visualizations.

17. How would you visualize time series data?

o Answer: For time series data, I would typically use a line plot to visualize trends over
time. In Matplotlib, I would use plt.plot(x_dates, y_values), or in Seaborn, I could use
sns.lineplot(x='date', y='value', data=data) to visualize the data and include error
bands if necessary.

PANDAS & NUMPY


Difference between Pandas and NumPy

19. What are the differences between a Python list and a NumPy array?

o Answer: Key differences include:

- NumPy arrays are homogeneous (all elements of the same type), while lists
can contain mixed types.
- NumPy arrays provide more efficient memory usage and faster operations
due to their optimized C implementation.
- NumPy offers a wide range of mathematical operations that are not available
for lists.
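
A few lines illustrate these differences. The variable names are arbitrary.

```python
import numpy as np

# A list can mix types freely
py_list = [1, 2.5, 'three']

# An array has a single dtype; mixing ints and floats upcasts to float
arr = np.array([1, 2, 3])
mixed = np.array([1, 2.5])
print(arr.dtype)    # an integer dtype (exact width depends on the platform)
print(mixed.dtype)  # float64

# The * operator repeats a list but multiplies an array element-wise
doubled_list = [1, 2, 3] * 2
doubled_arr = arr * 2
print(doubled_list)  # [1, 2, 3, 1, 2, 3]
print(doubled_arr)   # [2 4 6]

# Vectorized math is available on arrays without writing a loop
print(np.sqrt(arr))
```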

Intermediate Questions

18. What are some common functions in NumPy?

o Answer: Common functions include:

1. np.mean(): Computes the average.

2. np.median(): Computes the median.

3. np.std(): Computes the standard deviation.

4. np.sum(): Sums the elements.

5. np.dot(): Computes the dot product of two arrays.
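
The functions listed above, applied to two small example arrays:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(np.mean(a))    # 2.5
print(np.median(a))  # 2.5
print(np.std(a))     # population standard deviation, about 1.118
print(np.sum(a))     # 10
print(np.dot(a, b))  # 1*10 + 2*20 + 3*30 + 4*40 = 300
```

np.std() divides by n by default; pass ddof=1 for the sample standard deviation.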

4. What is broadcasting in NumPy?

o Answer: Broadcasting is a mechanism that allows NumPy to perform
arithmetic operations on arrays of different shapes. Shapes are compared dimension by
dimension from the right, and two dimensions are compatible when they are equal or one of
them is 1. NumPy then virtually stretches the smaller array across the larger array to make
their shapes compatible, without copying data.
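
A small worked example of broadcasting with a (2, 3) matrix, a length-3 row, a (2, 1) column, and a scalar:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)
col = np.array([[100], [200]])    # shape (2, 1)

# The row is applied to every row of the matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is applied to every column of the matrix
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]

# A scalar is broadcast to every element
print(matrix * 2)
# [[ 2  4  6]
#  [ 8 10 12]]
```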

Advanced Questions

7. How do you handle missing data in a NumPy array?

o Answer: You can handle missing data in NumPy arrays by using np.nan to represent
missing values. Because np.nan is a float, the array needs a floating-point dtype. Functions
like np.nanmean() can compute the mean while ignoring NaN values.

o Check for missing values: missing_mask = np.isnan(data)
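
Putting these pieces together on a small example array. The fill strategy at the end (replacing NaN with the mean) is just one possible choice, shown for illustration.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Boolean mask marking the missing entries
missing_mask = np.isnan(data)
print(missing_mask.sum())  # 2

# A plain mean is poisoned by NaN; nanmean ignores the missing entries
print(np.mean(data))       # nan
print(np.nanmean(data))    # 3.0

# Drop the missing values, or fill them with the mean of the rest
cleaned = data[~missing_mask]
filled = np.where(missing_mask, np.nanmean(data), data)
print(cleaned)  # [1. 3. 5.]
print(filled)   # [1. 3. 3. 3. 5.]
```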

NumPy vs Pandas

NumPy:

- Primarily uses arrays (ndarray), which are homogeneous (all elements of the same type).
- Primarily used for numerical computations.
- Generally faster for numerical computations.

Pandas:

- Built on top of NumPy.
- Uses Series (1D) and DataFrames (2D). A DataFrame can hold a different data type in each
column (e.g., integers, floats, strings).
- Designed for data manipulation and analysis, particularly for tabular data (like spreadsheets
or SQL tables).
- May be slower than NumPy for purely numerical operations.

SciPy is a library that extends NumPy's capabilities, providing a robust environment for
scientific and numerical computing (optimization, integration, linear algebra, statistics).

Statsmodels is a Python module that provides classes and functions for the estimation of many
different statistical models, as well as for conducting statistical tests, and statistical data exploration.

SERIES

What is a Pandas Series?

o Answer: A Pandas Series is a one-dimensional array-like object that can hold data of any type
(integers, floats, strings, etc.) and is associated with an index. It's similar to a list or a
dictionary but comes with additional features for data manipulation and analysis.
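
A brief sketch of creating and using a Series. The labels and values are arbitrary.

```python
import pandas as pd

# List-like: values plus an explicit index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])  # 20

# Dict-like: keys become the index
from_dict = pd.Series({'x': 1.5, 'y': 2.5})
print(from_dict.index.tolist())  # ['x', 'y']

# Extra features: vectorized arithmetic and boolean filtering
doubled = s * 2
large = s[s > 15]
print(doubled.tolist())  # [20, 40, 60]
print(large.tolist())    # [20, 30]
```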

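If you have a Series with duplicate indices, how would you handle it?

o Answer: You can detect duplicate labels with s.index.duplicated() and keep one entry per
label, or use groupby(level=0) to aggregate the values that share a label. Note that
drop_duplicates() removes duplicate values, not duplicate index labels. You might also
consider replacing the index altogether with reset_index(drop=True).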
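
A sketch of three ways to deal with repeated index labels, on a made-up Series. One caveat worth knowing: drop_duplicates() compares values, so for duplicate labels the check is done on the index with index.duplicated().

```python
import pandas as pd

# Hypothetical Series where the label 'a' appears twice
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'a', 'c'])
print(s.index.is_unique)  # False

# Keep only the first entry for each label
first_only = s[~s.index.duplicated(keep='first')]
print(first_only.to_dict())  # {'a': 1, 'b': 2, 'c': 4}

# Aggregate the values that share a label
summed = s.groupby(level=0).sum()
print(summed.to_dict())  # {'a': 4, 'b': 2, 'c': 4}

# Discard the labels and fall back to a fresh 0..n-1 index
renumbered = s.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```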

How would you handle missing values in a Pandas Series?

o Answer: You can handle missing values using methods like fillna() to fill them with a specific
value, dropna() to remove them, or interpolate() to perform interpolation.
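
The three methods applied to the same small Series, as an illustration. The fill value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

filled = s.fillna(0)            # Replace NaN with a fixed value
dropped = s.dropna()            # Remove the NaN entries
interpolated = s.interpolate()  # Linear interpolation between neighbors

print(filled.tolist())        # [1.0, 0.0, 3.0, 0.0, 5.0]
print(dropped.tolist())       # [1.0, 3.0, 5.0]
print(interpolated.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```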
