1
1. Write a Python program that reads a CSV file and renames one column of the existing file to your name.
2. Write a Python program that shows the first five observations of the renamed column.
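
A possible solution sketch for exercise 1. It makes three assumptions, all hypothetical: a small inline CSV of made-up rows stands in for your real file, the column to rename is `country`, and the new name is the placeholder `your_name`. Swap in your own file path and name.

```python
import io

import pandas as pd

# Made-up rows standing in for your CSV file (hypothetical values).
CSV_TEXT = """country,continent,year,pop
Afghanistan,Asia,1952,1000
Afghanistan,Asia,1957,1100
Albania,Europe,1952,2000
Albania,Europe,1957,2100
Algeria,Africa,1952,3000
Algeria,Africa,1957,3100
"""


def load_and_rename(source, old_name, new_name):
    """Read a CSV and return a DataFrame with one column renamed."""
    df = pd.read_csv(source)
    # rename() returns a new DataFrame; nothing is written back to disk here.
    return df.rename(columns={old_name: new_name})


df = load_and_rename(io.StringIO(CSV_TEXT), "country", "your_name")
first_five = df["your_name"].head()  # head() returns 5 rows by default
print(first_five)
```

To change the file on disk as well, write the result back, e.g. `df.to_csv("data.csv", index=False)` (the file name is hypothetical).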

2
1. Write a Python program that shows the last five observations of the renamed column.
2. Write a Python program that shows the country, continent, and year columns (or any three columns) of your dataset.

3
1. Create a range of integers from 0 to 4 inclusive.
2. Create a range of integers from 0 to 50 inclusive.

4
1. Write a Python program to get the first row of the data.
2. Write a Python program to get (read) the 7th row of the dataset.

5
1. Write a Python program to get the last row of the dataset.
2. Write a Python program to select and read the 4th, 7th, and 10th rows.

6
1. Write a Python program to get the 2nd row of your dataset.
2. Write a Python program to get the 10th row.
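
One way to approach exercises 2 to 6 with pandas, sketched on a made-up 12-row DataFrame. The column names (`country`, `continent`, `year`, `pop`) are assumptions; with a real file you would build `df` with `pd.read_csv`. The "nth row" is at position n - 1 because pandas positions start at 0. Python's `range` excludes its end point, so the stop value is one past the last integer you want.

```python
import pandas as pd

# Made-up stand-in dataset with 12 rows; replace with pd.read_csv(...) on your file.
df = pd.DataFrame({
    "country": [f"C{i}" for i in range(12)],
    "continent": ["Asia", "Europe", "Africa"] * 4,
    "year": list(range(2000, 2012)),
    "pop": [100 + 10 * i for i in range(12)],
})

# Exercise 2: last five observations of one column, and any three columns.
last_five = df["country"].tail()
three_cols = df[["country", "continent", "year"]]

# Exercise 3: range() stops before its end value, so add 1 for an inclusive range.
r4 = list(range(0, 5))    # 0, 1, 2, 3, 4
r50 = list(range(0, 51))  # 0 .. 50

# Exercises 4-6: iloc selects by 0-based position, so the nth row is at n - 1.
first_row = df.iloc[0]
seventh_row = df.iloc[6]
last_row = df.iloc[-1]
rows_4_7_10 = df.iloc[[3, 6, 9]]
second_row = df.iloc[1]
tenth_row = df.iloc[9]

print(last_five, three_cols.head(), seventh_row, rows_4_7_10, sep="\n\n")
```

With the default integer index, `df.loc[6]` returns the same row as `df.iloc[6]`, but `df.loc[-1]` raises a `KeyError`, which is why the last row is taken with `iloc`.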

7
Write a Python program that uses ":" with loc or iloc to read columns. Example:
    df.loc[:, [columns]]
    subset = df.loc[:, ['year', 'pop']]
    print(subset.head())
1. Write a Python program to subset rows and columns using loc.
2. Write a Python program to get the data from the 1st, 3rd, and 5th rows and the 1st, 4th, and 6th columns.

8
1. Write a Python program to get the data from the 1st, 3rd, and 5th rows and the 1st, 4th, and 6th columns.
2. Write a Python program to print the first 10 rows.

9
Write a Python script that calculates and displays descriptive statistics (mean, median, standard deviation, quartiles, etc.) for numerical columns and value counts for categorical columns in a dataset.
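
A sketch for exercises 7 and 8, again on made-up data with assumed column names. It contrasts `loc`, which selects by label, with `iloc`, which selects by 0-based position. With `iloc`, the 1st, 3rd, and 5th rows become positions 0, 2, and 4, and the 1st, 4th, and 6th columns become positions 0, 3, and 5.

```python
import pandas as pd

# Made-up stand-in data: 12 rows, 6 columns (replace with your own CSV).
df = pd.DataFrame({
    "country": [f"C{i}" for i in range(12)],
    "continent": ["Asia", "Europe"] * 6,
    "year": list(range(2000, 2012)),
    "lifeExp": [60.0 + i for i in range(12)],
    "pop": [1000 * (i + 1) for i in range(12)],
    "gdpPercap": [500.0 * (i + 1) for i in range(12)],
})

# ":" selects every row; the list picks columns by label.
subset = df.loc[:, ["year", "pop"]]
print(subset.head())

# loc subsets rows and columns by label (with a RangeIndex, row labels equal positions).
loc_subset = df.loc[[0, 2, 4], ["country", "year"]]

# iloc uses 0-based positions: rows 1st/3rd/5th -> 0, 2, 4; columns 1st/4th/6th -> 0, 3, 5.
iloc_subset = df.iloc[[0, 2, 4], [0, 3, 5]]
print(iloc_subset)

# Exercise 8.2: the first 10 rows.
first_ten = df.head(10)
print(first_ten)
```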

10
Write a Python program that reads a CSV file containing missing values (e.g., represented by NaN or empty strings). The program should:
● Identify the types of missing values present.
● Impute missing values using appropriate techniques (e.g., mean/median for numerical data, mode for categorical data).
● Optionally, handle outliers before imputation (e.g., using capping or winsorization).

11
Develop a Python script that analyzes a numerical column for outliers. It should:
● Calculate descriptive statistics (e.g., mean, standard deviation, quartiles).
● Identify potential outliers using methods such as the IQR (interquartile range) or z-scores.
● Visualize the distribution of the data (e.g., with boxplots) to inspect for outliers visually.
● Provide options for handling outliers (e.g., removal, capping) based on domain knowledge.
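
A minimal sketch for exercises 10 and 11. It assumes a made-up DataFrame with one numeric column (`score`) and one categorical column (`city`). The script counts missing values and flags outliers with the 1.5 × IQR rule. It then caps the outliers at the fences and imputes the median for numeric columns and the mode for categorical ones. Median imputation is a choice made here, not the only option.

```python
import pandas as pd


def report_missing(df):
    """Count missing entries per column (read_csv turns empty fields into NaN)."""
    return df.isna().sum()


def iqr_bounds(s, k=1.5):
    """Return the fences Q1 - k*IQR and Q3 + k*IQR for a numeric Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(s, k=1.5):
    """Clip values outside the IQR fences to the nearest fence (capping); NaN stays NaN."""
    lower, upper = iqr_bounds(s.dropna(), k)
    return s.clip(lower=lower, upper=upper)


def impute(df):
    """Median for numeric columns, mode for everything else."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


df = pd.DataFrame({
    "score": [10, 12, None, 11, 13, 12, 11, 100],
    "city": ["a", "b", None, "a", "a", "b", "a", "a"],
})
print(report_missing(df))

lower, upper = iqr_bounds(df["score"].dropna())
outliers = df["score"][(df["score"] < lower) | (df["score"] > upper)]
print("outliers:", outliers.tolist())

# Cap first so the outlier cannot distort the value used for imputation.
df["score"] = cap_outliers(df["score"])
clean = impute(df)
print(clean)
```

If Matplotlib is installed, `df["score"].plot.box()` draws the boxplot asked for in exercise 11. `pd.read_csv` already reads empty fields as NaN; other placeholders can be declared through its `na_values` parameter.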

12
1. Write a Python program that calculates the harmonic mean of a list of numbers.
2. Create a Python program that computes the combined mean of two datasets. The function should take two lists of numbers as input and return the combined mean using the formula combined mean = (n1 × mean1 + n2 × mean2) / (n1 + n2), where n1 and n2 are the list sizes and mean1 and mean2 are their means.

13
Write a Python program that finds the mode of a list of numbers. If there are multiple modes, the function should return all of them. If no mode exists, the function should return a message saying so.

14
Implement a script that calculates both the harmonic mean and the arithmetic mean of a list of numbers and then compares the two. Print both means along with a message indicating which is greater and what the comparison implies.

15
Create a Python program that takes a list of numbers as input and provides a summary of statistics, including:
• Arithmetic Mean
• Harmonic Mean
• Combined Mean with a second list provided by the user
• Mode
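
A standard-library sketch for exercises 12 to 14. The helper names are hypothetical. The "no mode" rule used here is one common convention and is an assumption: a list has no mode when every distinct value occurs equally often.

```python
import statistics
from collections import Counter


def harmonic_mean(values):
    """n / sum(1/x); only defined here for a non-empty list of positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive numbers")
    return len(values) / sum(1 / v for v in values)


def combined_mean(a, b):
    """(n1 * mean1 + n2 * mean2) / (n1 + n2)."""
    n1, n2 = len(a), len(b)
    return (n1 * statistics.mean(a) + n2 * statistics.mean(b)) / (n1 + n2)


def modes(values):
    """All most-frequent values, or a message when there is no mode."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    top = max(counts.values())
    # Assumed convention: no mode when every distinct value has the same frequency.
    if len(set(counts.values())) == 1 and (top == 1 or len(counts) > 1):
        return "No mode: every value occurs equally often."
    return sorted(v for v, c in counts.items() if c == top)


data = [2, 4, 4, 8]
am, hm = statistics.mean(data), harmonic_mean(data)
print(f"arithmetic mean = {am:.3f}, harmonic mean = {hm:.3f}")
print("AM > HM" if am > hm else "AM == HM (all values equal)")
print("combined mean with [1, 2, 3]:", combined_mean(data, [1, 2, 3]))
print("modes:", modes(data))
```

For positive numbers the arithmetic mean is never smaller than the harmonic mean, and the two are equal only when all values are equal. The harmonic mean is pulled toward the small values, so a large gap between the two signals widely spread data.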

16
Write a Python program that takes a list of numbers as input and returns the variance of those numbers.

17
Create a Python program that calculates the standard deviation of a given list of numbers. The function should return both the variance and the standard deviation. How do you derive the standard deviation from the variance?

18
Implement a Python program that calculates the interquartile range (IQR) of a list of numbers. The IQR is defined as the difference between the 75th percentile (Q3) and the 25th percentile (Q1). Use NumPy for this task.

19
Using Matplotlib and NumPy, write a Python script that generates a boxplot for a dataset of your choice. Explain how the boxplot visualizes dispersion and what insights can be drawn from it.

20
Create a Python function that takes two lists of numbers and compares their dispersion using both variance and standard deviation. Based on the results, explain how to interpret the differences in dispersion between the two datasets.

21
Implement a Python function that normalizes or standardizes a numerical dataset. The function should:
● Distinguish between normalization and standardization.
● Apply appropriate scaling techniques (e.g., Min-Max scaling, z-score normalization).
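
A NumPy sketch for exercises 16 to 18 and 21. It answers the question in exercise 17 directly: the standard deviation is the square root of the variance. `ddof=0` gives the population variance, the same default as `np.var`; pass `ddof=1` for the sample variance. The function names are hypothetical.

```python
import numpy as np


def variance_and_std(values, ddof=0):
    """Return (variance, standard deviation); ddof=1 gives the sample versions."""
    x = np.asarray(values, dtype=float)
    var = np.sum((x - x.mean()) ** 2) / (x.size - ddof)
    return var, np.sqrt(var)  # standard deviation = square root of variance


def iqr(values):
    """Q3 - Q1 using NumPy's default (linear interpolation) percentiles."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1


def min_max_scale(values):
    """Normalization: rescale to [0, 1] (undefined if all values are equal)."""
    x = np.asarray(values, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def z_score(values):
    """Standardization: mean 0 and (population) standard deviation 1."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()


data = [2, 4, 4, 4, 5, 5, 7, 9]
var, std = variance_and_std(data)
print("variance:", var, "std:", std, "IQR:", iqr(data))
print("min-max:", min_max_scale(data))
print("z-scores:", z_score(data))
```

For exercise 20, calling `variance_and_std` on both lists is enough for a comparison. The list with the larger standard deviation is more dispersed, and the standard deviation is easier to interpret than the variance because it is in the data's own units.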

22
Create a Python program that discretizes a continuous numerical column into bins (categories). The function could:
● Use equal-width binning or quantile-based binning.
● Optionally, apply techniques like chi-square testing to determine the optimal number of bins.
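
A sketch for exercise 22 that uses `pd.cut` for equal-width binning and `pd.qcut` for quantile-based binning on made-up ages. The bin labels are hypothetical. The optional chi-square search for the number of bins is not shown.

```python
import pandas as pd

# Made-up ages standing in for a continuous column.
ages = pd.Series([18, 22, 25, 31, 38, 45, 52, 60, 67, 74])

# Equal-width: the range is split into 4 intervals of the same width.
equal_width = pd.cut(ages, bins=4, labels=["low", "mid-low", "mid-high", "high"])

# Quantile-based: edges at the quartiles, so the bins hold roughly equal counts.
quantile_based = pd.qcut(ages, q=4)

print(equal_width.value_counts(sort=False))
print(quantile_based.value_counts(sort=False))
```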

23
Develop a Python program that creates various visualizations for a dataset using libraries like Matplotlib or Seaborn. Examples:
● Histograms for continuous data distribution.
● Scatter plots for relationships between two variables.
● Boxplots to compare group distributions.
● Pie charts for categorical data proportions.

24
Create a Python program that calculates the correlation matrix for a dataset. The function should:
● Handle different data types (numerical vs. categorical).
● Choose appropriate correlation coefficients.
● Visualize the correlation matrix using heatmaps or other techniques.

25
Write a function that takes a DataFrame and returns a new DataFrame with missing values replaced using appropriate methods (e.g., mean, median, mode, or custom logic).
● Implement different strategies for handling missing values based on data type (numerical vs. categorical) and column importance.

26
Create a function to identify outliers in a DataFrame using techniques like the interquartile range (IQR) or standard deviation.
● Provide options to remove outliers, cap them at a specific value, or transform them (e.g., using a log transformation).

27
Write code to clean inconsistent date formats in a DataFrame, converting them to a standard format.
● Handle mixed case (uppercase/lowercase) in categorical columns by converting them to a consistent format (e.g., lowercase).

28
Implement functions to scale numerical features in a DataFrame using methods like standardization (z-score) or normalization (min-max scaling).
● Explain the benefits of scaling and when to use each method.

29
Create functions to encode categorical features in a DataFrame using techniques like one-hot encoding or label encoding.
● Discuss the advantages and disadvantages of each encoding method.

30
Write code to identify and address data inconsistencies, such as negative values in columns that should be positive.
● Perform domain-specific checks (e.g., validating email addresses and phone numbers).
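
A sketch for the case-cleaning part of exercise 27 and for exercise 29, using a made-up `region` column. One-hot encoding adds a 0/1 column per category and implies no order. Label encoding uses a single integer column, which is compact but suggests an ordering that may not exist.

```python
import pandas as pd

df = pd.DataFrame({"region": ["North", "south", "SOUTH", "East", "north "]})

# Exercise 27 (case part): strip stray whitespace and lowercase so "SOUTH" == "south".
df["region"] = df["region"].str.strip().str.lower()

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(df["region"], prefix="region", dtype=int)

# Label encoding: one integer per category; codes follow the sorted category order.
df["region_code"] = df["region"].astype("category").cat.codes

print(one_hot)
print(df)
```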

31
Write code to calculate summary statistics for numerical columns (mean, median, standard deviation) and categorical columns (frequency counts, proportions).
● Use these statistics to understand the central tendency, spread, and distribution of the data.

32
Create Python visualizations (using libraries like Matplotlib or Seaborn) to explore data relationships. Examples:
● Histograms to visualize feature distributions.
● Scatter plots to identify correlations between features.
● Box plots to compare distributions across groups.
Explain the insights gained from each visualization.

33
Generate a dataset in Excel containing the number of hours studied and the corresponding test scores for 10 students. Write a Python program that performs a simple linear regression to predict test scores from hours studied.
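
A standard-library sketch for exercise 33 using the least-squares formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The 10 hours/score pairs are made up. In the exercise they would be read from the Excel file, for example with `pd.read_excel`, which needs an Excel engine such as openpyxl.

```python
def fit_simple_linear(x, y):
    """Least-squares slope and intercept for y ≈ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, hours):
    """Predicted test score for a given number of study hours."""
    return intercept + slope * hours


# Made-up data for 10 students (hypothetical values).
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [52, 55, 61, 64, 68, 72, 75, 80, 84, 88]

slope, intercept = fit_simple_linear(hours, scores)
print(f"score ≈ {intercept:.2f} + {slope:.2f} * hours")
print("predicted score for 6.5 hours:", round(predict(slope, intercept, 6.5), 1))
```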

34
Generate a dataset containing house prices, square footage, number of bedrooms, and age of the house. Then write a Python program that builds a multiple linear regression model to predict house prices.
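
A NumPy sketch for exercise 34. It generates synthetic houses from an assumed linear rule plus Gaussian noise, with coefficients invented for illustration, and fits ordinary least squares with `np.linalg.lstsq`. statsmodels or scikit-learn should give the same coefficients along with extra diagnostics.

```python
import numpy as np


def make_houses(n=50, noise=10_000.0, seed=0):
    """Synthetic houses; the 'true' coefficients below are invented for illustration."""
    rng = np.random.default_rng(seed)
    sqft = rng.uniform(800, 3500, n)
    bedrooms = rng.integers(1, 6, n)  # 1 to 5 bedrooms
    age = rng.uniform(0, 50, n)
    price = 50_000 + 150 * sqft + 10_000 * bedrooms - 1_000 * age + rng.normal(0, noise, n)
    return np.column_stack([sqft, bedrooms, age]), price


def fit_ols(X, y):
    """Ordinary least squares; returns [intercept, b_sqft, b_bedrooms, b_age]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted prices for rows of [sqft, bedrooms, age]."""
    return coef[0] + X @ coef[1:]


X, price = make_houses()
coef = fit_ols(X, price)
print("intercept, sqft, bedrooms, age:", np.round(coef, 1))
new_house = np.array([[2000, 3, 10]])  # hypothetical house: 2000 sq ft, 3 bedrooms, 10 years
print("predicted price:", round(float(predict(coef, new_house)[0])))
```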

35
Generate a dataset with daily temperatures and ice cream sales. Write a Python program to calculate the Pearson correlation coefficient to determine the strength and direction of the relationship between temperature and sales.
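
A standard-library sketch of the Pearson coefficient r = Sxy / sqrt(Sxx * Syy) for exercise 35. The same function applies to exercise 38. The temperature and sales values are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up daily data (hypothetical values).
temps = [22, 24, 27, 30, 31, 33, 35]
sales = [110, 125, 150, 180, 185, 210, 230]

r = pearson_r(temps, sales)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction})")
```

The sign of r gives the direction of the relationship, and its absolute value (from 0 to 1) gives the strength of the linear relationship.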

36
Generate a dataset with 15 responses for six independent variables and one dependent variable. Then fit a regression model to the dataset and check for multicollinearity using the Variance Inflation Factor (VIF). Interpret the results.

37
Use a dataset with five categorical variables (e.g., gender, region) and perform a regression analysis of one dependent variable on these categorical variables.

38
Write a Python program to calculate the Pearson correlation coefficient between two lists of numbers. For example, given two lists:
• List A: [10, 20, 30, 40, 50]
• List B: [15, 25, 35, 45, 55]

39
Using the Pandas library, create a DataFrame from the following data and compute the correlation matrix:
Height (cm): [150, 160, 170, 180, 190]
Weight (kg): [50, 60, 70, 80, 90]
Age (years): [20, 25, 30, 35, 40]
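
A NumPy sketch for exercise 36. It uses synthetic data: 15 responses and six predictors, with one predictor deliberately built from two others so that multicollinearity shows up. Each VIF is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining ones. Values above roughly 5 to 10 are commonly read as problematic. statsmodels offers the same quantity as `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; its design matrix should include the constant column.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R²_j), with R²_j from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    result = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # add intercept
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        result[j] = 1 / (1 - r2)
    return result


# Synthetic data: 15 responses, six predictors; x6 is built from x1 and x2 on purpose.
rng = np.random.default_rng(42)
X = rng.normal(size=(15, 6))
X[:, 5] = X[:, 0] + X[:, 1] + rng.normal(scale=0.05, size=15)
y = X @ np.array([1.0, 2.0, 0.5, -1.0, 0.0, 0.3]) + rng.normal(size=15)

# Fit the regression (intercept + six slopes) by least squares.
design = np.column_stack([np.ones(15), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print("coefficients:", np.round(beta, 2))

for name, value in zip([f"x{i}" for i in range(1, 7)], vif(X)):
    print(f"{name}: VIF = {value:.1f}")
```

Here x1, x2, and x6 should show very large VIFs because x6 is almost an exact sum of x1 and x2. Their individual coefficients are therefore unstable, even if the model as a whole fits well.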

40
Write a Python program that generates a scatter plot to visualize the correlation between two variables using Matplotlib. Use the following data:
• X: [1, 2, 3, 4, 5]
• Y: [2, 4, 6, 8, 10]
Explain how the plot illustrates the correlation.

41
Load a real dataset (e.g., from a CSV file) using Pandas. Calculate and display the correlation between different numerical columns in the dataset. For example, use the famous Iris dataset to find the correlations among sepal length, sepal width, petal length, and petal width.

42
Write a Python program that calculates the Spearman correlation coefficient between two lists of numbers. For example:
• List X: [1, 2, 3, 4, 5]
• List Y: [5, 6, 7, 8, 7]
Explain the difference between Pearson and Spearman correlation.

43
Write a Python program that performs a one-way ANOVA test on three different groups of data. For example:
• Group 1: [5, 7, 8, 6, 9]
• Group 2: [10, 12, 11, 13, 12]
• Group 3: [15, 17, 14, 16, 18]
Use the scipy.stats library to perform the test and interpret the results.

44
Download a real dataset (e.g., the Iris dataset from sklearn) and perform a one-way ANOVA to compare the mean petal length across species. Print the ANOVA table and interpret the p-value.
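
A standard-library sketch for exercises 42 and 43. Spearman's rho is computed as the Pearson coefficient of the ranks. Tied values get their average rank, which is why the two 7s in List Y both receive 3.5. The one-way ANOVA F statistic is the between-group mean square divided by the within-group mean square. The exercise asks for `scipy.stats`; `scipy.stats.f_oneway(g1, g2, g3)` should report the same F together with its p-value.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


def one_way_anova_f(*groups):
    """F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    k, n = len(groups), len(values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print("Spearman rho:", round(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 3))
f_stat = one_way_anova_f([5, 7, 8, 6, 9], [10, 12, 11, 13, 12], [15, 17, 14, 16, 18])
print("ANOVA F:", round(f_stat, 2))
```

Pearson measures the linear association between the raw values. Spearman measures monotonic association via ranks, so it is less sensitive to outliers and to trends that are monotonic but not linear. In the ANOVA, a large F (small p-value) suggests that at least one group mean differs from the others.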

45
Write a Python program to perform a two-way ANOVA. Create a dataset that includes two categorical variables (e.g., Treatment Type and Gender) and a numerical outcome variable (e.g., Test Scores). Use the statsmodels library to conduct the analysis and interpret the results.

46
After performing a one-way ANOVA, visualize the group means and their confidence intervals using a box plot. Use Matplotlib or Seaborn to create the plot based on the following data:
• Group A: [20, 21, 22, 19, 23]
• Group B: [25, 26, 24, 27, 26]
• Group C: [30, 29, 31, 32, 30]
Explain how the visualization helps in understanding the results.

47
Create a dataset for a two-way ANOVA test that includes two categorical independent variables (e.g., "Diet" and "Exercise") and one continuous dependent variable (e.g., "Weight Loss"). Here's an example dataset:
• Diet A (No Exercise): [2, 3, 4, 5]
• Diet A (Exercise): [5, 6, 7, 8]
• Diet B (No Exercise): [1, 2, 1, 3]
• Diet B (Exercise): [6, 7, 5, 6]
Use the statsmodels library to perform the two-way ANOVA. Print the ANOVA table and interpret the results to determine whether either independent variable, or their interaction, has a significant effect.
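
A pandas sketch for exercise 47 that computes the two-way ANOVA sums of squares by hand for a balanced design, meaning equal observations per cell as in the example data. It is meant as a cross-check. The exercise asks for statsmodels, where `smf.ols("loss ~ C(diet) * C(exercise)", data=df).fit()` followed by `sm.stats.anova_lm(model, typ=2)` should give matching sums of squares and F values plus p-values. The column names are hypothetical.

```python
import pandas as pd

# Exercise 47 data in long format: one row per observation.
rows = (
    [("A", "No", v) for v in [2, 3, 4, 5]]
    + [("A", "Yes", v) for v in [5, 6, 7, 8]]
    + [("B", "No", v) for v in [1, 2, 1, 3]]
    + [("B", "Yes", v) for v in [6, 7, 5, 6]]
)
df = pd.DataFrame(rows, columns=["diet", "exercise", "loss"])


def two_way_anova_balanced(data, a, b, y):
    """Two-way ANOVA table with interaction; assumes every (a, b) cell has the same size."""
    grand = data[y].mean()
    n_a, n_b = data[a].nunique(), data[b].nunique()
    reps = len(data) // (n_a * n_b)  # observations per cell

    ss_a = reps * n_b * ((data.groupby(a)[y].mean() - grand) ** 2).sum()
    ss_b = reps * n_a * ((data.groupby(b)[y].mean() - grand) ** 2).sum()
    ss_cells = reps * ((data.groupby([a, b])[y].mean() - grand) ** 2).sum()
    ss_ab = ss_cells - ss_a - ss_b  # interaction
    cell_means = data.groupby([a, b])[y].transform("mean")
    ss_err = ((data[y] - cell_means) ** 2).sum()  # within-cell (residual)

    df_a, df_b = n_a - 1, n_b - 1
    df_err = len(data) - n_a * n_b
    table = pd.DataFrame(
        {"sum_sq": [ss_a, ss_b, ss_ab, ss_err], "df": [df_a, df_b, df_a * df_b, df_err]},
        index=[a, b, f"{a}:{b}", "Residual"],
    )
    table["F"] = (table["sum_sq"] / table["df"]) / (ss_err / df_err)
    table.loc["Residual", "F"] = float("nan")  # no F test for the residual row
    return table


table = two_way_anova_balanced(df, "diet", "exercise", "loss")
print(table)
```

In this data, exercise has by far the largest F and the interaction F is small. To decide significance, compare each F with the critical value of the F distribution for its degrees of freedom, or read the p-values reported by statsmodels.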

48
Write a Python program to calculate the Quartile Deviation (QD) for a given dataset. Use the following data:
• Data: [12, 15, 14, 10, 18, 22, 20, 16, 17, 19]
Print the QD along with the first and third quartiles.

49
Implement a Python program to calculate the Mean Deviation (MD) of a dataset. Given the data:
• Data: [5, 7, 8, 9, 10]
Calculate and print the Mean Deviation from the mean.

50
Implement a Python program to calculate the Mean Deviation (MD) of a dataset. Given the data:
• Data: [5, 7, 8, 9, 10]
Calculate and print the Mean Deviation from the mean.
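
A standard-library sketch for exercises 48 and 49, using QD = (Q3 - Q1) / 2 and MD = the mean of |x - mean|. Quartile conventions differ between tools. `method="inclusive"` in `statistics.quantiles` matches NumPy's default linear interpolation. For the exercise 48 data it gives Q1 = 14.25, Q3 = 18.75, and QD = 2.25. The mean deviation for the exercise 49 data is 1.44.

```python
import statistics


def quartile_deviation(data):
    """Return (Q1, Q3, QD) with QD = (Q3 - Q1) / 2."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q3, (q3 - q1) / 2


def mean_deviation(data):
    """Average absolute deviation from the arithmetic mean."""
    m = statistics.mean(data)
    return sum(abs(x - m) for x in data) / len(data)


q1, q3, qd = quartile_deviation([12, 15, 14, 10, 18, 22, 20, 16, 17, 19])
print(f"Q1 = {q1}, Q3 = {q3}, QD = {qd}")
print("Mean deviation:", round(mean_deviation([5, 7, 8, 9, 10]), 2))
```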

51
Create a Python program that computes the Skewness and Kurtosis of a given dataset. Use the following data:
• Data: [1, 2, 2, 3, 4, 4, 4, 5, 6, 8]
Print the values of skewness and kurtosis and interpret what these values mean in terms of the distribution shape.
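
A standard-library sketch for exercise 51 using moment-based (biased) estimators: skewness m3 / m2^1.5 and excess kurtosis m4 / m2² - 3, where mk is the k-th central moment. These match the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`. pandas' `.skew()` and `.kurt()` apply small-sample corrections and so give somewhat different numbers.

```python
def shape_stats(data):
    """Moment-based skewness and excess kurtosis (population/biased estimators)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


skew, kurt = shape_stats([1, 2, 2, 3, 4, 4, 4, 5, 6, 8])
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
print("right-skewed" if skew > 0 else "left-skewed" if skew < 0 else "symmetric")
print("lighter tails than normal" if kurt < 0 else "heavier tails than normal")
```

For the exercise data the skewness is about 0.53, indicating a longer right tail driven by the value 8. The excess kurtosis is about -0.35, indicating slightly lighter tails than a normal distribution.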

52
Write a Python program that takes a dataset (e.g., from a CSV file) and computes the Quartile Deviation, Mean Deviation, Standard Deviation, Variance, Skewness, and Kurtosis. Print a summary report for the dataset. You can use the following sample data:
• Sample Data: [5, 10, 15, 20, 25, 30, 35, 40]