
7- Introduction to Data Science in Python

This document contains 40 multiple-choice questions on Python programming, focusing on Jupyter Notebooks, Python data structures, and the Pandas, NumPy, and Matplotlib libraries. It covers data manipulation, visualization, and coding best practices. The correct answers, each with a brief explanation, follow the full set of questions.

Uploaded by eyob53834
Copyright © All Rights Reserved

1. When working in a Jupyter Notebook, which key combination is commonly used to run
the current cell and move to the next one?
a) Ctrl + S
b) Shift + Enter
c) Alt + R
d) Ctrl + Enter
2. Which Python data structure is immutable?
a) List
b) Dictionary
c) Tuple
d) Set
3. In Python, the import numpy as np convention is used primarily because:
a) np is shorter and reduces code verbosity
b) np improves computational efficiency
c) np automatically optimizes memory usage
d) np is required by Python syntax for NumPy imports
4. Which function can you use in Pandas to quickly view the first few rows of a DataFrame?
a) df.head()
b) df.tail()
c) df.sample()
d) df.peek()
5. Suppose arr = np.array([10, 20, 30, 40]). What is arr[1:3]?
a) [20, 30]
b) [10, 20, 30]
c) [20, 30, 40]
d) [30, 40]
6. Which Pandas function is used to combine two DataFrames vertically (stacking one on
top of the other)?
a) pd.merge()
b) pd.concat()
c) pd.join()
d) df.append_column()
7. To handle missing values in a Pandas DataFrame, which method can you use to fill them
with a specified value?
a) df.dropna()
b) df.isna()
c) df.fillna()
d) df.replace()
8. If you have a series s = pd.Series([1, 2, np.nan, 4]), what does s.isna() return?
a) A boolean Series: [False, False, True, False]
b) [False, True, True, False]
c) [1, 2, NaN, 4]
d) [True, True, True, True]
9. Which library is most commonly used for basic data visualization in Python?
a) Seaborn
b) Matplotlib
c) Plotly
d) Bokeh
10. In Pandas, the groupby() operation is often followed by:
a) An aggregation function like .mean() or .sum()
b) A sorting function
c) A reshaping operation like .melt()
d) No subsequent operation is possible
11. Which of the following best describes a NumPy array?
a) It can contain mixed data types and automatically changes size
b) It is always one-dimensional
c) It is a fixed-size, multidimensional, homogeneously-typed array
d) It cannot be indexed by integers
12. Suppose df is a DataFrame with a column age. To select only rows where age > 30,
what is a correct Pandas syntax?
a) df[df.age > 30]
b) df.where(age > 30)
c) df.select(age > 30)
d) df.filter(age > 30)
13. When reading a CSV file into a Pandas DataFrame using pd.read_csv('file.csv'),
what is the default delimiter?
a) Tab (\t)
b) Comma (,)
c) Space ( )
d) Semicolon (;)
14. Which method would you use in Pandas to rename columns of a DataFrame?
a) df.rename(columns={'old_name':'new_name'})
b) df.columns = {'old_name':'new_name'}
c) df.relabel({'old_name':'new_name'})
d) df.set_labels({'old_name':'new_name'})
15. In a Jupyter Notebook, which command can be used to display plots inline (within the
notebook) using Matplotlib?
a) %matplotlib inline
b) plt.show_inline()
c) display(matplotlib)
d) %plot inline
16. Which of the following is not a recommended practice for writing clean, readable Python
code?
a) Using descriptive variable names
b) Adding comments to explain complex logic
c) Maintaining consistent indentation
d) Using excessive inline lambda functions for clarity
17. If df is a DataFrame, df.describe() typically returns:
a) A summary of statistical measures (count, mean, std, etc.) for numeric columns
b) A column-by-column list of data types
c) A visualization of histograms
d) The first five rows of the DataFrame
18. Which command in NumPy would you use to create a 2x2 identity matrix?
a) np.eye(2)
b) np.ones((2,2))
c) np.identity_matrix(2)
d) np.unit(2)
19. Given a Pandas DataFrame df with a datetime column named date, which Pandas
function helps in extracting the year directly from df['date'] once converted to
datetime?
a) df['date'].year()
b) df['date'].dt.year
c) df['date'].extract('year')
d) df['date'].to_year()
20. Which Pandas function is used to pivot a DataFrame from long to wide format?
a) df.pivot()
b) df.unstack()
c) df.melt()
d) df.reshape()
21. In Python, slicing a list as lst[2:5] returns:
a) Elements at indices 0, 1, and 2
b) Elements at indices 2, 3, and 4
c) Elements at indices 3, 4, and 5
d) Elements at indices 2 and 5 only
22. If arr = np.array([1, 2, 3, 4]), what is arr * 2?
a) [2, 4, 6, 8]
b) [1, 2, 3, 4, 1, 2, 3, 4]
c) Error: cannot multiply arrays by scalars
d) [1, 4, 9, 16]
23. Which Pandas method can be used to reset the index of a DataFrame to the default
integer index?
a) df.reset_index()
b) df.index_reset()
c) df.reindex()
d) df.reset()
24. If you have a DataFrame df and you run df['col'].unique(), what do you get?
a) A DataFrame of unique rows in col
b) A Series of unique values in col
c) The number of unique values in col
d) A boolean mask of unique values
25. To find the correlation between columns in a Pandas DataFrame, you would typically
use:
a) df.corr()
b) df.correlate()
c) df.compare()
d) pd.correlation(df)
26. If s = pd.Series([10, 20, 30]) and t = pd.Series([1, 2, 3]), what is s + t?
a) A new Series [11, 22, 33]
b) A new Series [10, 20, 30, 1, 2, 3]
c) Throws an error since indices do not match
d) A new Series [9, 18, 27]
27. Which of the following is a correct way to create a DataFrame from the dictionary
data = {'A':[1,2], 'B':[3,4]}?
a) df = pd.DataFrame(data)
b) df = pd.create(data)
c) df = pd.make_dataframe(data)
d) df = DataFrame.create(data)
28. In Matplotlib, which function is used to create a line plot?
a) plt.plot()
b) plt.line()
c) plt.chart()
d) plt.lineplot()
29. When merging two DataFrames on a key column using pd.merge(), the default join type
is:
a) Inner join
b) Outer join
c) Left join
d) Right join
30. If df has a column 'A' with values [1, 2, 3] and you run
df['A'].apply(lambda x: x**2), what do you get?
a) [1, 2, 3]
b) [1, 4, 9]
c) [1, 8, 27]
d) [2, 3, 4]
31. Which file format is commonly used with Pandas for more efficient reading and writing
than CSV, especially for larger datasets?
a) .txt
b) .json
c) .parquet
d) .xlsx
32. What does df.info() provide?
a) Summary statistics for all columns
b) Detailed information about DataFrame including data types, non-null counts, and
memory usage
c) A correlation matrix of all numeric columns
d) A frequency count of all categorical values
33. If you want to visualize the distribution of a single numeric variable, which Matplotlib
function is commonly used?
a) plt.scatter()
b) plt.bar()
c) plt.hist()
d) plt.line()
34. Which of the following best describes df['col'].value_counts() in Pandas?
a) Summarizes how many null values are in col
b) Gives the distinct values and their frequency counts in col
c) Replaces col with a count of each value’s frequency
d) Shows descriptive statistics of col
35. What is a common way to handle categorical data for machine learning models in
Pandas?
a) Convert them to floats by dividing by 100
b) Drop them entirely
c) Use pd.get_dummies() for one-hot encoding
d) Convert them to strings
36. If arr = np.array([[1, 2], [3, 4]]), what is arr.shape?
a) (4,)
b) (2, 2)
c) (2, 4)
d) (1, 4)
37. In Pandas, what does df.sort_values('column_name') do by default?
a) Sorts the DataFrame by column_name in ascending order
b) Sorts the DataFrame by column_name in descending order
c) Sorts the DataFrame by index values
d) Returns a count of unique values in column_name
38. To remove duplicate rows in a DataFrame, you would use:
a) df.removedups()
b) df.drop_duplicates()
c) df.unique()
d) df.clear_duplicates()
39. When converting a column to datetime in Pandas, which function is commonly used?
a) pd.to_datetime()
b) pd.parse_date()
c) pd.convert_to_date()
d) pd.datetime_convert()
40. To create a simple scatter plot of x vs y using Matplotlib, which is correct?
a) plt.scatter(x, y)
b) plt.scatterplot(x, y)
c) plt.scatterplotting(x, y)
d) plt.xyplot(x, y)
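Answers
1. b) Shift + Enter
Explanation: Shift+Enter runs the current cell and moves the selection to the next one. Ctrl+Enter also runs the cell but keeps the selection where it is.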
2. c) Tuple
Explanation: Tuples are immutable in Python.
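A minimal sketch contrasting a tuple with a list (the names point and coords are illustrative only):

```python
# A tuple cannot be changed after creation; a list can.
point = (3, 4)          # hypothetical example tuple
coords = [3, 4]         # equivalent list for comparison

coords[0] = 10          # fine: lists are mutable
try:
    point[0] = 10       # tuples do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

print(point, coords)    # (3, 4) [10, 4]
```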
3. a) np is shorter and reduces code verbosity
Explanation: The alias np is a community convention that keeps code short; it is not required by Python syntax and has no effect on speed or memory use.
4. a) df.head()
Explanation: head() returns the first few rows.
5. a) [20, 30]
Explanation: Slicing from index 1 up to (but not including) 3.
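The slice from question 5 can be checked directly; a minimal sketch using the array given in the question:

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
# Start index 1 is included, stop index 3 is excluded, so indices 1 and 2 remain.
sliced = arr[1:3]
print(sliced)  # [20 30]
```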
6. b) pd.concat()
Explanation: pd.concat() stacks DataFrames vertically by default (axis=0); pass axis=1 to place them side by side.
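A minimal sketch of vertical stacking with pd.concat(); the two small DataFrames are made-up sample data:

```python
import pandas as pd

# Hypothetical data for illustration.
top = pd.DataFrame({"name": ["Ana", "Ben"], "age": [28, 35]})
bottom = pd.DataFrame({"name": ["Cleo"], "age": [41]})

# axis=0 (the default) stacks rows; ignore_index=True renumbers the result 0..n-1.
stacked = pd.concat([top, bottom], ignore_index=True)
print(stacked)
```

Without ignore_index=True the original labels are kept, so the result would have the index 0, 1, 0.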
7. c) df.fillna()
Explanation: fillna() replaces missing values (NaN/None) with a value you supply; dropna() removes them instead, and isna() only detects them.
8. a) [False, False, True, False]
Explanation: isna() returns True for NaN values.
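Questions 7 and 8 use the same kind of Series, so one sketch covers detecting, filling, and dropping missing values (the Series is the one from question 8):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4])   # NaN forces a float64 dtype

mask = s.isna()          # True where the value is missing
print(mask.tolist())     # [False, False, True, False]

filled = s.fillna(0)     # returns a new Series; s still contains NaN
print(filled.tolist())   # [1.0, 2.0, 0.0, 4.0]

dropped = s.dropna()     # removes the missing entry instead of filling it
print(dropped.tolist())  # [1.0, 2.0, 4.0]
```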
9. b) Matplotlib
Explanation: Matplotlib is the foundational Python plotting library.
10. a) An aggregation function like .mean() or .sum()
Explanation: groupby() splits the rows into groups; an aggregation such as .mean() or .sum() then reduces each group to a single value.
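A minimal split-then-aggregate sketch; the sales table and its column names are hypothetical:

```python
import pandas as pd

# Hypothetical sales records, used only for illustration.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100, 200, 300, 400],
})

# Split rows by region, then reduce each group to one number.
totals = sales.groupby("region")["amount"].sum()    # East -> 400, West -> 600
means = sales.groupby("region")["amount"].mean()    # East -> 200.0, West -> 300.0
print(totals)
print(means)
```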
11. c) It is a fixed-size, multidimensional, homogeneously-typed array
Explanation: NumPy arrays are homogeneous and fixed-size.
12. a) df[df.age > 30]
Explanation: df.age > 30 builds a boolean Series, and df[...] keeps only the rows where it is True (boolean indexing).
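A minimal boolean-indexing sketch; the names and ages are made-up sample data:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben", "Cleo"], "age": [28, 35, 41]})

mask = df.age > 30     # boolean Series: [False, True, True]
older = df[mask]       # keeps only the rows where mask is True
print(older)

# df["age"] > 30 is equivalent and also works for column names
# that are not valid Python attributes (e.g. names with spaces).
```

The filtered rows keep their original index labels (1 and 2 here).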
13. b) Comma (,)
Explanation: read_csv() uses a comma as its default delimiter (sep=',').
14. a) df.rename(columns={'old_name':'new_name'})
Explanation: rename(columns={...}) maps old column labels to new ones and returns a new DataFrame unless inplace=True is passed.
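A minimal sketch of rename(); the column names mirror the placeholders in question 14:

```python
import pandas as pd

df = pd.DataFrame({"old_name": [1, 2], "other": [3, 4]})

# Only the labels listed in the mapping change; the original df is left untouched.
renamed = df.rename(columns={"old_name": "new_name"})
print(list(renamed.columns))  # ['new_name', 'other']
print(list(df.columns))       # ['old_name', 'other']
```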
15. a) %matplotlib inline
Explanation: This magic command displays plots inline.
16. d) Using excessive inline lambda functions for clarity
Explanation: Excessive inline lambdas can reduce readability.
17. a) A summary of statistical measures (count, mean, std, etc.) for numeric columns
Explanation: describe() reports count, mean, standard deviation, min, quartiles (25%, 50%, 75%), and max for each numeric column.
18. a) np.eye(2)
Explanation: eye() creates an identity matrix.
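A quick check of answer 18:

```python
import numpy as np

identity = np.eye(2)      # 2x2 float array with ones on the diagonal
print(identity)
# [[1. 0.]
#  [0. 1.]]
```

np.identity(2) builds the same matrix; np.eye additionally accepts non-square shapes and an offset diagonal (k=).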
19. b) df['date'].dt.year
Explanation: .dt accessor extracts datetime components.
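A minimal sketch that converts a text column with pd.to_datetime() (question 39) and then reads the year through the .dt accessor; the dates are made up:

```python
import pandas as pd

# Hypothetical date strings for illustration.
df = pd.DataFrame({"date": ["2021-03-15", "2023-11-02"]})
df["date"] = pd.to_datetime(df["date"])   # .dt only works on datetime columns

years = df["date"].dt.year                # a property, not a method: no parentheses
print(years.tolist())  # [2021, 2023]
```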
20. a) df.pivot()
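Explanation: pivot() reshapes long data into wide data by turning the values of one column into new column headers. unstack() can give a similar layout, but it works on an index level rather than on named columns; melt() goes the other way (wide to long).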
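A minimal long-to-wide sketch with pivot(); the city temperature table is hypothetical:

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year) measurement.
long_df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "year": [2022, 2023, 2022, 2023],
    "temp": [6.1, 6.5, 16.2, 16.8],
})

# Each distinct year becomes a column; each city becomes a row.
wide_df = long_df.pivot(index="city", columns="year", values="temp")
print(wide_df)
```

pivot() raises an error if an (index, columns) pair appears more than once; pivot_table() with an aggregation function handles that case.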
21. b) Elements at indices 2, 3, and 4
Explanation: Slicing includes the start index (2) and excludes the stop index (5).
22. a) [2, 4, 6, 8]
Explanation: Scalar multiplication broadcasts to each element.
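A minimal sketch of scalar broadcasting, with a plain list for contrast (which is where distractor b comes from):

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
doubled = arr * 2        # the scalar is applied to every element
print(doubled)           # [2 4 6 8]

# With a Python list, * repeats the sequence instead.
repeated = [1, 2, 3, 4] * 2
print(repeated)          # [1, 2, 3, 4, 1, 2, 3, 4]
```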
23. a) df.reset_index()
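Explanation: reset_index() replaces the index with the default integer index (0, 1, 2, ...) and, unless drop=True is passed, moves the old index into a column.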
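A minimal sketch showing both forms of reset_index(); the letter labels are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"age": [28, 35, 41]}, index=["a", "b", "c"])

with_col = df.reset_index()            # old labels move into a column named "index"
dropped = df.reset_index(drop=True)    # old labels are discarded

print(with_col.columns.tolist())  # ['index', 'age']
print(dropped.index.tolist())     # [0, 1, 2]
```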
24. b) A Series of unique values in col
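Explanation: Of the choices, b) is the closest. Strictly, unique() returns a NumPy array (or a pandas ExtensionArray for extension dtypes) rather than a Series, listing each distinct value in order of first appearance. Use nunique() to get just the count.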
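The return type in question 24 is easy to check directly; a minimal sketch with a made-up integer column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [3, 1, 3, 2, 1]})

uniq = df["col"].unique()
print(uniq)                          # [3 1 2]  (order of first appearance)
print(isinstance(uniq, np.ndarray))  # True for a plain int64 column
print(df["col"].nunique())           # 3 (just the count)
```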
25. a) df.corr()
Explanation: corr() computes the pairwise correlation between columns (Pearson by default).
26. a) A new Series [11, 22, 33]
Explanation: Series arithmetic aligns values by index label. Both Series have the index 0, 1, 2, so the values are added element-wise.
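A minimal sketch of question 26, plus a second Series u (hypothetical, not from the question) whose labels only partly overlap, to show that alignment is by label rather than position:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
t = pd.Series([1, 2, 3])
total = s + t
print(total.tolist())    # [11, 22, 33]

# Labels present in only one Series produce NaN in the result.
u = pd.Series([1, 2, 3], index=[1, 2, 3])
shifted = s + u
print(shifted.tolist())  # [nan, 21.0, 32.0, nan]
```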
27. a) df = pd.DataFrame(data)
Explanation: DataFrame() constructor is correct.
28. a) plt.plot()
Explanation: plot() creates a line plot by default.
29. a) Inner join
Explanation: The default merge type in pd.merge() is inner.
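A minimal sketch comparing the default inner merge with an explicit outer merge; the key and value columns are made up:

```python
import pandas as pd

# Hypothetical tables sharing a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": ["B", "C", "D"]})

inner = pd.merge(left, right, on="key")               # how="inner" is the default
outer = pd.merge(left, right, on="key", how="outer")  # keeps keys from both sides

print(inner["key"].tolist())  # [2, 3]
print(outer["key"].tolist())  # [1, 2, 3, 4]
```

In the outer result, rows whose key appears on only one side get NaN in the other table's columns.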
30. b) [1, 4, 9]
Explanation: Applying x**2 squares each element.
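A quick check of answer 30, alongside the vectorized form that gives the same values without calling a Python function per element:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

squared = df["A"].apply(lambda x: x**2)   # calls the lambda once per value
vectorized = df["A"] ** 2                 # same result using vectorized arithmetic
print(squared.tolist())     # [1, 4, 9]
print(vectorized.tolist())  # [1, 4, 9]
```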
31. c) .parquet
Explanation: Parquet is a compressed, columnar binary format, so it is usually smaller on disk and faster to read and write than CSV.
32. b) Detailed information about DataFrame including data types, non-null counts, and
memory usage
Explanation: info() provides structure and type info.
33. c) plt.hist()
Explanation: A histogram shows the distribution of one numeric variable.
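A minimal histogram sketch on synthetic normally distributed data (the seed, mean, and bin count are arbitrary choices). The Agg backend is set only so the script also runs outside a notebook; in Jupyter the plot displays inline.

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend for running as a script
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=1_000)   # synthetic data for illustration

# plt.hist returns the bin counts, the bin edges, and the drawn bar artists.
counts, bin_edges, _ = plt.hist(values, bins=20)
plt.xlabel("value")
plt.ylabel("frequency")
plt.title("Distribution of a single numeric variable")
```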
34. b) Gives the distinct values and their frequency counts in col
Explanation: value_counts() counts occurrences of each distinct value.
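A minimal value_counts() sketch on a made-up color column:

```python
import pandas as pd

df = pd.DataFrame({"col": ["red", "blue", "red", "green", "red", "blue"]})

# One entry per distinct value, sorted by frequency (most common first).
counts = df["col"].value_counts()
print(counts)   # red 3, blue 2, green 1
```

Passing normalize=True returns proportions instead of raw counts.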
35. c) Use pd.get_dummies() for one-hot encoding
Explanation: get_dummies() is common for encoding categorical variables.
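A minimal one-hot encoding sketch; the color and size columns are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# Each category of "color" becomes its own indicator column; "size" is kept as-is.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())  # ['size', 'color_blue', 'color_red']
```

Recent pandas versions produce boolean indicator columns; pass dtype=int if a model expects 0/1 integers.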
36. b) (2, 2)
Explanation: The array is 2 by 2.
37. a) Sorts the DataFrame by column_name in ascending order
Explanation: By default, sorting is ascending.
38. b) df.drop_duplicates()
Explanation: drop_duplicates() removes duplicate rows.
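A minimal drop_duplicates() sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", "Ana"], "age": [28, 35, 28]})

# Row 2 repeats row 0 exactly; by default the first occurrence is kept.
deduped = df.drop_duplicates()
print(deduped)
```

The subset= parameter restricts the comparison to selected columns, and keep="last" keeps the final occurrence instead.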
39. a) pd.to_datetime()
Explanation: to_datetime() converts strings to datetime.
40. a) plt.scatter(x, y)
Explanation: scatter() creates a scatter plot.
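Answers 28 and 40 differ only in how the same points are drawn. A minimal sketch with made-up x and y values (Agg backend again only so the script runs outside Jupyter):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.figure()
lines = plt.plot(x, y)          # line plot: points joined in order
plt.title("plt.plot(x, y)")

plt.figure()
points = plt.scatter(x, y)      # scatter plot: unconnected markers
plt.title("plt.scatter(x, y)")
```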
