Python Interview Q&A
Python Basics
EASY:
Q.1 What is Python? List some popular applications of Python in the world of technology.
A. Python is a widely-used general-purpose, high-level programming language. It was created by
Guido van Rossum in 1991 and further developed by the Python Software Foundation. It was
designed with an emphasis on code readability, and its syntax allows programmers to express
their concepts in fewer lines of code.
It is used for:
● Data Science and Analytics
● Web Development
● Game Development
● Software Development
● Machine Learning and Artificial Intelligence
● Automation and Scripting etc.
Q.2 What are the benefits of using Python language as a tool in the present scenario?
A. The following are the benefits of using Python language:
● Object-Oriented Language
● High-Level Language
A. ‘#’ is used to comment out everything that comes after it on the line.
Q.5 What is the difference between a Mutable data type and an Immutable data type?
A. Mutable data types can be edited, i.e., they can change at runtime, e.g., list, dictionary, etc.
Immutable data types cannot be edited, i.e., they cannot change at runtime, e.g., string, tuple, etc.
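A quick illustration of the difference:
lst = [1, 2, 3]
lst[0] = 10          # works: lists are mutable
print(lst)           # [10, 2, 3]

s = "hello"
# s[0] = "H"         # would raise TypeError: strings are immutable
s = "H" + s[1:]      # instead, build a new string
print(s)             # Hello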
Q.6 How are arguments passed by value or by reference in Python?
A. Everything in Python is an object, and all variables hold references to objects. The references themselves are passed by value, so rebinding a parameter inside a function does not affect the caller's variable. However, if the referenced object is mutable, it can be modified in place and the change is visible to the caller.
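A minimal sketch of this behavior (function and variable names are illustrative):
def rebind(x):
    x = [99]          # rebinds the local name only; the caller is unaffected

def mutate(x):
    x.append(99)      # mutates the shared object; the caller sees the change

nums = [1, 2, 3]
rebind(nums)
print(nums)           # [1, 2, 3]
mutate(nums)
print(nums)           # [1, 2, 3, 99]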
A. The set is an unordered collection of data types that is iterable, mutable and has no duplicate
elements.
A dictionary in Python is an unordered collection of data values, used to store data values like a
map.
Q.10 What is a pass in Python?
A. Pass means performing no operation; in other words, it is a placeholder in a compound statement, used where a statement is syntactically required but nothing has to be written there.
Q.11 What is the difference between / and // in Python?
A. // represents floor division, whereas / represents true (float) division. For example:
5//2 = 2
5/2 = 2.5
A. In a statically typed language, the data type of a variable is fixed at the time of its creation or at compile-time. In a dynamically typed language like Python, you don't need to specify the data type of a variable when you declare it. Instead, the data type of a variable is inferred based on the value assigned to it, and this type can change during the program's execution.
Here's why Python is dynamically typed:
1. Type Inference: When you assign a value to a variable, Python automatically determines
the data type of that value and associates it with the variable. For example:
x = 5 # x is an integer
y = "Hello" # y is a string
2. Type Changes: You can change the type of a variable by assigning a new value of a
different type to it. Python allows this flexibility:
x = 5 # x is an integer
x = "Hello" # x is now a string
3. No Explicit Type Declarations: Unlike statically typed languages (e.g., C++ or Java), you
don't need to explicitly declare the data type of a variable before using it. This makes
Python code more concise and easier to write.
4. Dynamic Type Checking: Python performs type checking at runtime, meaning it checks
the compatibility of operations and values when they are executed, not during
compilation. This can lead to errors being discovered at runtime rather than
compile-time.
For example:
x = 5
y = "Hello"
z = x + y
# This would result in a TypeError because you can't add an integer and a string directly.
While dynamic typing offers flexibility and ease of use, it also requires careful coding to
avoid type-related errors at runtime. Static typing languages, on the other hand, require
explicit type declarations and perform type checking at compile-time, which can help
catch certain types of errors before the program runs.
A. The swapcase() method is used to alter the existing case of the string. This method creates a copy of the string in which the case of every character is swapped. For example:
string = "GeeksforGeeks"
string.swapcase() ---> "gEEKSFORgEEKS"
Q.16 What are *args and **kwargs?
A. To pass a variable number of arguments to a function in Python, use the special syntax *args and **kwargs in the function definition. *args is used to pass a variable-length, non-keyword argument list, while **kwargs is used to pass a variable number of keyword arguments. Inside the function, *args is available as a tuple and **kwargs as a dictionary, so you can iterate over them and apply higher-order operations like map and filter.
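A short illustration (the function name summarize is illustrative):
def summarize(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    total = sum(args)
    for key, value in kwargs.items():
        print(f"{key} = {value}")
    return total

print(summarize(1, 2, 3, unit="items", source="demo"))  # prints the kwargs, then 6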
A. The location where we can find a variable and also access it if required is called the scope of a
variable.
● Python Local variable: Local variables are those that are initialized within a function and
are unique to that function. It cannot be accessed outside of the function.
● Python Global variables: Global variables are the ones that are defined and declared outside any function and can be accessed anywhere in the program.
Q.20 What is a dynamically typed language?
A. Typed languages are languages in which the type of data is defined and is known to the machine either at compile time or at runtime. Typed languages can be classified into
two categories:
● Statically typed languages: In this type of language, the data type of a variable is known
at the compile time which means the programmer has to specify the data type of a
variable at the time of its declaration.
● Dynamically typed languages: These are the languages that do not require any
pre-defined data type for any variable as it is interpreted at runtime by the machine
itself. In these languages, the interpreter assigns the data type to a variable at runtime.
Q.21 What is the difference between the break, continue and pass statements in Python?
A. The break statement is used to terminate the loop or statement in which it is present. After
that, the control will pass to the statements that are present after the break statement, if
available.
Continue is also a loop control statement just like the break statement. The continue statement
is opposite to that of the break statement, instead of terminating the loop, it forces the
execution of the next iteration of the loop.
Pass means performing no operation or in other words, it is a placeholder in the compound
statement, where there should be a blank left and nothing has to be written there.
Q.22 What are Built-in data types in Python?
A. The following are the standard or built-in data types in Python:
● Numeric: The numeric data type in Python represents the data that has a numeric value.
A numeric value can be an integer, a floating number, a Boolean, or even a complex
number.
● Sequence Type: The sequence Data Type in Python is the ordered collection of similar or
different data types. There are several sequence types in Python:
● String
● List
● Tuple
● Range
● Mapping Types: In Python, hashable data can be mapped to random objects using a
mapping object. There is currently only one common mapping type, the dictionary, and
mapping objects are mutable.
● Dictionary
● Set Types: In Python, a set is an unordered collection of data types that is iterable,
mutable, and has no duplicate elements. The order of elements in a set is undefined
though it may consist of various elements.
A. The Python math module includes a method that can be used to calculate the floor of a
number.
● floor() method in Python returns the floor of x i.e., the largest integer not greater than
x.
● Also, the method ceil(x) in Python returns the ceiling value of x, i.e., the smallest integer not less than x.
● Factorial of a Number:
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

● Check if a Number is Prime:
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

● Sum of the Digits of a Number:
def sum_of_digits(n):
    return sum(int(digit) for digit in str(n))
● Print the Multiplication Table of a Number:
def multiplication_table(n):
    for i in range(1, 11):
        print(f"{n} x {i} = {n * i}")

● Print Even Numbers up to n:
def print_even_numbers(n):
    for i in range(2, n + 1, 2):
        print(i)

● Check if a Number is Positive, Negative, or Zero:
def check_number(n):
    if n > 0:
        return "Positive"
    elif n < 0:
        return "Negative"
    else:
        return "Zero"
● Calculate the Area of a Circle:
import math

def area_of_circle(radius):
    return math.pi * radius ** 2

● Calculate the Power of a Number:
def power(base, exponent):
    return base ** exponent

● Square of a Number using Lambda:
square = lambda x: x ** 2

● Find the Largest Element in a List:
def find_largest(lst):
    return max(lst)

● Count Occurrences of an Element in a List:
def count_occurrences(lst, element):
    return lst.count(element)

● Remove Duplicates from a List:
def remove_duplicates(lst):
    return list(set(lst))

● Remove Even Numbers from a List:
def remove_even_numbers(lst):
    return [x for x in lst if x % 2 != 0]

● Calculate the Product of All Elements in a List:
def product_of_elements(lst):
    product = 1
    for num in lst:
        product *= num
    return product

● Count Occurrences of a Value in a Tuple:
def count_value_in_tuple(t, value):
    return t.count(value)

● Remove a Key from a Dictionary:
def remove_key(d, key):
    if key in d:
        del d[key]
MEDIUM:
Q.1 What is the difference between xrange and range functions?
A. range() and xrange() are two functions that could be used to iterate a certain number of
times in for loops in Python. In Python 3, there is no xrange, but the range function behaves like
xrange in Python 2.
● range() – This returns a list of numbers created using the range() function.
● xrange() – This function returns the generator object that can be used to display
numbers only by looping. Only the required value is produced on demand, and hence this is called lazy evaluation.
Q.2 What is Dictionary Comprehension? Give an Example
A. Dictionary Comprehension is a syntax construction to ease the creation of a dictionary based
on the existing iterable.
For Example: my_dict = {i: i + 7 for i in range(1, 10)}
Q.3 Is there tuple comprehension in Python? If yes, how, and if not, why?
A. No. A comprehension written with parentheses, e.g. (i for i in (1, 2, 3)), does not produce a tuple; it produces a generator object. Hence tuple comprehension is not possible in Python.
Q.4 What is the difference between a List and a Tuple?
A. List
● Lists are Mutable data types.
Tuple
● Tuples are Immutable data types.
Q.5 What is the difference between a shallow copy and a deep copy?
A. A shallow copy creates a new object but inserts references to the objects found in the original, so changes made to nested mutable objects are visible in both copies. A deep copy creates a new object and recursively copies all nested objects, so the copy is fully independent of the original.
A shallow copy has faster program execution whereas a deep copy makes it slow.
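A quick sketch using the standard copy module:
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, shared inner lists
deep = copy.deepcopy(original)     # fully independent copy

original[0].append(99)
print(shallow[0])  # [1, 2, 99] - inner list is shared
print(deep[0])     # [1, 2]     - deep copy unaffected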
Q.6 What is the difference between sort and sorted ?
A. In Python, both sort and sorted are used for sorting elements in a list, but they have some
key differences:
1. sort Method:
● sort is a method that is available directly on a list object.
● It sorts the elements of the list in place, meaning it modifies the original list and
does not create a new list.
● The sort method does not return a new list. It returns None.
● Example:
my_list = [3, 1, 2]
my_list.sort()
print(my_list) # Output: [1, 2, 3]
2. sorted Function:
● sorted is a built-in function that takes an iterable (e.g., list, tuple, string) as an argument and returns a new, sorted list.
● It does not modify the original iterable.
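For parity with the sort example above, a short sketch:
my_list = [3, 1, 2]
new_list = sorted(my_list)
print(new_list)  # Output: [1, 2, 3]
print(my_list)   # Output: [3, 1, 2] (original unchanged)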
Q.7 What are Decorators?
A. Decorators are a very powerful and useful tool in Python: they let you modify or extend the behavior of functions or methods without changing their source code. A decorator is a callable that takes a function as input and returns a new function.
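A minimal decorator sketch (the decorator name log_call is illustrative):
def log_call(func):
    # Wraps func and prints a message each time it is called
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_call
def add(a, b):
    return a + b

print(add(2, 3))  # prints "Calling add", then 5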
Q.8 How do you debug a Python program?
A. By using this command we can debug a Python program:
$ python -m pdb python-script.py
Q.9 What are Iterators in Python?
A. In Python, iterators are used to iterate over a group of elements, in containers like a list. Iterators are collections of items, and they can be a list, tuple, or a dictionary. A Python iterator implements the __iter__() and __next__() methods to iterate over the stored elements. We generally use loops to iterate over the collections (list, tuple) in Python.
A. In Python, a generator is a way of implementing iterators. It is a normal function except that it contains a yield expression in the function body. It does not implement __iter__() and __next__() explicitly; these are provided automatically, and the values are produced lazily, one at a time.
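A small sketch of a generator function (names are illustrative):
def countdown(n):
    # Yields n, n-1, ..., 1 one value at a time
    while n > 0:
        yield n
        n -= 1

for value in countdown(3):
    print(value)  # 3, 2, 1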
A. Memory in Python is managed in a private heap space; the programmer does not have direct access to this private space as the interpreter takes care of this space. Python also has an inbuilt garbage collector, which recycles all the unused memory, frees it and makes it available to the heap space.
Q.13 How to delete a file using Python?
A. We can delete a file using Python by following approaches:
● os.remove()
● os.unlink()
Q.14 What is slicing in Python?
A. Python slicing is a string operation for extracting a part of the string, or some part of a list.
With this operator, one can specify where to start the slicing, where to end, and specify the
step. List slicing returns a new list from the existing list.
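A short illustration:
s = "Python"
print(s[0:4])     # 'Pyth' (start at 0, stop before index 4)
print(s[::2])     # 'Pto'  (every second character)
nums = [1, 2, 3, 4, 5]
print(nums[1:4])  # [2, 3, 4] (a new list)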
Q.16 Programs:
● Generate Fibonacci Sequence:
def fibonacci(n):
    fib_sequence = [0, 1]
    while len(fib_sequence) < n:
        fib_sequence.append(fib_sequence[-1] + fib_sequence[-2])
    return fib_sequence

● Factorial of a Number (Iterative):
def factorial_iterative(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

● Check if a Character is a Vowel or a Consonant:
def check_vowel_consonant(char):
    vowels = "AEIOUaeiou"
    if char in vowels:
        return "Vowel"
    else:
        return "Consonant"

● Find the GCD of Two Numbers:
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

● Check if a Number is Even using Lambda:
is_even = lambda x: x % 2 == 0

● Calculate the Difference between Two Numbers using Lambda:
subtract = lambda a, b: a - b

● Find the Maximum of Three Numbers using Lambda:
max_of_three = lambda a, b, c: max(a, b, c)

● Check if a List is a Palindrome:
def is_palindrome(lst):
    return lst == lst[::-1]

● Sort a List of Strings by Length:
def sort_by_length(lst):
    return sorted(lst, key=len)

● Find Common Elements in Two Lists:
def common_elements(list1, list2):
    return list(set(list1) & set(list2))

● Count the Words in a Sentence:
def count_words(sentence):
    return len(sentence.split())

● Check if a String Contains Only Digits:
def is_all_digits(s):
    return s.isdigit()

● Get the Unique Elements of a List:
def unique_elements(lst):
    return list(set(lst))

● Find the Union of Two Sets:
def set_union(set1, set2):
    return set1.union(set2)

● Check if a Key Exists in a Dictionary:
def key_exists(d, key):
    return key in d
HARD:
Q.1 What is main() in python?
A. In Python, main() is not a built-in or reserved function like it is in some other programming
languages. However, the term "main()" is often used conventionally to refer to the entry point
of a Python program, where the execution of the program starts.
In many programming languages like C, C++, and Java, there is a special function called main()
that serves as the starting point for the program's execution. In Python, there is no strict
requirement to use a specific function name like main(), but the idea of having a designated
entry point is still applicable.
Conventionally, Python programmers often use the following construct to define the entry point
of their script:
def main():
    # Your main program logic goes here
    print("Hello, world!")

if __name__ == "__main__":
    main()
In this example, main() is just a function name, and you can choose any other name that makes
sense to you. The important part is the if __name__ == "__main__": block. This block ensures
that the code inside it is only executed if the script is run directly, not if it is imported as a
module into another script.
So, while main() is not a special or predefined function in Python, it is a common naming
convention for the entry point of a Python script
A. In Python, the double asterisk (**) is used as an exponentiation operator and also as a syntax
to unpack dictionaries. The specific behavior of the ** operator depends on the context in
which it is used.
Exponentiation Operator:
In mathematical operations, ** is used to raise a number to a power. For example:
result = 2 ** 3 # 2 raised to the power of 3, result = 8
Dictionary Unpacking:
When used in a function call or in dictionary construction, ** is used to unpack the contents of
a dictionary. This is often used to pass multiple keyword arguments to a function or to merge
dictionaries. For example:
def example_function(a, b):
    print(a, b)

kwargs = {'a': 10, 'b': 20}
example_function(**kwargs)  # Unpacks dictionary as keyword arguments

dict1 = {'x': 1, 'y': 2}
dict2 = {'y': 3, 'z': 4}
merged_dict = {**dict1, **dict2}  # Merge dictionaries using unpacking
Keep in mind that the usage of ** in different contexts might have different meanings and
behaviors. The specific behavior is determined by the Python syntax rules and the context in
which it appears.
It takes an iterable, converts it into an iterator and aggregates the elements based on iterables
passed. It returns an iterator of tuples.
Q.5 What is __init__() in Python?
A. Equivalent to constructors in OOP terminology, __init__ is a reserved method in Python classes. The __init__ method is called automatically whenever a new object is created, right after memory has been allocated for it, and it is typically used to initialize the object's attributes.
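A small sketch (the class and attribute names are illustrative):
class Employee:
    def __init__(self, name, salary):
        # Runs automatically when Employee(...) is called
        self.name = name
        self.salary = salary

e = Employee("Asha", 50000)
print(e.name, e.salary)  # Asha 50000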
A. import time
currenttime = time.localtime(time.time())
print("Current time is", currenttime)
Q.7 What are Access Specifiers in Python?
A. Python uses the ‘_’ symbol to determine the access control for a specific data member or a
member function of a class. A Class in Python has three types of python access modifiers:
● Public Access Modifier: The members of a class that are declared public are easily
accessible from any part of the program. All data members and member functions of a
class are public by default.
● Protected Access Modifier: The members of a class that are declared protected are only
accessible to a class derived from it. All data members of a class are declared protected
by adding a single underscore ‘_’ symbol before the data members of that class.
● Private Access Modifier: The members of a class that are declared private are accessible
within the class only, the private access modifier is the most secure access modifier. Data
members of a class are declared private by adding a double underscore ‘__’ symbol
before the data member of that class.
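A short sketch of the three conventions (the class and attribute names are illustrative):
class Account:
    def __init__(self, owner, balance):
        self.owner = owner          # public
        self._branch = "Main"       # protected by convention (single underscore)
        self.__balance = balance    # private (name-mangled to _Account__balance)

acc = Account("Asha", 1000)
print(acc.owner)       # OK
print(acc._branch)     # still accessible, but treated as protected by convention
# print(acc.__balance) # AttributeError; name mangling stores it as acc._Account__balance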
Q.8 Does Python have a switch-case statement? How is it implemented?
A. From version 3.10 upward, Python has implemented a switch case feature called “structural
pattern matching”. You can implement this feature with the match and case keywords. Note
that the underscore symbol is what you use to define a default case for the switch statement in
Python.
Note: Before Python 3.10 Python didn't support match Statements.
● Python3
match term:
    case pattern-1:
        action-1
    case pattern-2:
        action-2
    case pattern-3:
        action-3
    case _:
        action-default
Q.9 Discuss the Global NameSpace and Local Name Space in Python. How does the LEGB (Local,
Enclosing, Global, Built-in) rule work in resolving variable names?
A. Python uses a series of nested namespaces to organize and resolve variable names. The LEGB
rule defines the order in which namespaces are searched: Local, Enclosing, Global, and Built-in.
When you reference a variable, Python searches through these namespaces in order until it
finds the variable or reaches the built-in namespace.
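A compact sketch of the lookup order:
x = "global"          # Global scope

def outer():
    x = "enclosing"   # Enclosing scope
    def inner():
        x = "local"   # Local scope
        print(x)      # 'local' - found first under the LEGB rule
    inner()
    print(x)          # 'enclosing'

outer()
print(x)              # 'global'
print(len("abc"))     # len is resolved in the Built-in namespace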
Q.10 Explain the differences between a generator and an iterator in Python. How does the yield
keyword contribute to the behavior of generators?
A. A generator is a special type of iterator that produces values on-the-fly using the yield
keyword, allowing you to iterate over large sequences without storing them in memory. An
iterator is an object that implements the methods __iter__() and __next__(), allowing you to iterate over its elements. Generators are usually more memory-efficient because they produce values one at a time instead of materializing the whole sequence.
Q.11 What is the difference between a function and a method in Python?
A. Functions: Functions are blocks of code that are defined outside of classes.
They can be thought of as standalone units of code that take some input (arguments), perform
a task, and return an output (return value).
Functions are defined using the def keyword followed by a function name, parentheses for
arguments, and a colon. The function body is indented below.
Example:
def add(a, b):
return a + b
Methods: Methods are functions that are associated with objects and are defined within
classes.
They operate on the data that belongs to the object they are called on.
Methods are defined similarly to functions, but they have an additional parameter called self,
which refers to the instance of the object on which the method is called.
Methods are accessed using dot notation: object.method().
Example:
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

my_circle = Circle(5)
print(my_circle.area())  # Calling the method on the Circle object
In summary, the key difference between methods and functions lies in their association with
classes and objects. Functions are standalone and can be used anywhere in your code, while
methods are tied to objects and operate on their specific data.
Q.12 Why do we need lambda functions if the same can be achieved using loops?
A. While lambda functions are one-liners and powerful tools, they are not suitable for all
situations. Loops are essential for tasks that involve iterating, accumulating, searching, filtering,
and other complex operations. Choosing the right tool for the job ensures that your code is both
effective and maintainable.
Here are some examples where you would need looping:
● Iterating: When you have a collection of items (e.g., a list, tuple, dictionary, etc.) and you
need to perform an operation on each item, you would typically use a loop. Lambda
functions are not well-suited for this scenario because they lack the necessary structure
to handle multiple iterations.
● Accumulation and Aggregation: If you need to accumulate or aggregate values from a
collection, like summing up the elements of a list or finding the maximum value, loops
are essential.
● Searching and Filtering: When you want to find specific elements in a collection that
satisfy certain conditions, you often need to loop through the collection and apply a
filtering function. Lambda functions can be used for simple cases, but more complex
filtering and searching tasks usually require loops to manage the logic effectively.
● Multidimensional Data: When dealing with multi-dimensional data structures (e.g.,
matrices), loops are necessary to traverse the different dimensions and apply operations
or logic at each level.
Q.13 What is the difference between an iterable and an iterator in Python?
A. Both are involved in looping over data, but they serve different purposes. Let's break down the differences between them with examples:
Iterable:
An iterable is an object that can be looped over (iterated) to retrieve its elements one by one. It needs to implement the __iter__() method, which returns an iterator object. Common examples of iterables are lists, tuples, strings, and dictionaries. For example:
my_iterable = [1, 2, 3]
my_iterator = iter(my_iterable)
print(next(my_iterator)) # Output: 1
print(next(my_iterator)) # Output: 2
print(next(my_iterator)) # Output: 3
An important thing to note is that an iterable can be converted to an iterator using the iter()
function. When you use a for loop to iterate over an iterable, Python automatically creates an
iterator for that iterable behind the scenes.
my_list = [1, 2, 3, 4, 5]
for item in my_list:
print(item)
In this loop, Python converts my_list into an iterator and then uses that iterator to loop through
the items.
In summary:
An iterable is an object that can be looped over and provides an iterator when requested.
An iterator is an object that performs the actual iteration and maintains its internal state.
Both concepts work together to allow you to efficiently and effectively work with collections of
data in Python.
Q.14 Difference between printing variable name and using print(variable name) ,why does ' '
appear on calling string variable name?
A. When you simply type the variable name in a Python interpreter or script without using the
print() function, you're actually relying on the interpreter's default behavior of displaying the
result of the last expression in the interactive session. This behavior varies depending on where
you're running your Python code: it's more common in interactive environments like the
standard Python interpreter, Jupyter notebooks, and some integrated development
environments (IDEs).
When you use the print() function, you explicitly instruct Python to output the contents of the
variable as text to the console or output window. This is the recommended way to display
variable values when writing scripts or programs, especially when you want more control over
the formatting and when you want to see the value of a variable at a specific point in your code.
Regarding the ' ' appearing when just calling the variable name, this typically happens when the
variable contains a string. When you type the variable name directly, Python displays the string
representation of the variable, which includes the single quotes ' ' to indicate that it's a string.
For example:
variable = "Hello, world!"
variable # This will display 'Hello, world!' (including the single quotes)
On the other hand, using print(variable) will print the actual contents of the string without the
enclosing single quotes:
variable = "Hello, world!"
print(variable) # This will print: Hello, world!
So, the difference between just typing the variable name and using print(variable) is that the
former displays the string representation of the variable (including quotes for strings), while the latter prints the actual contents of the variable without the enclosing quotes.
Q.15 Programs:
● Write a Python program to find the kth smallest element in a list.
def kth_smallest_el(lst, k):
    return sorted(lst)[k - 1]

nums = [1, 2, 4, 3, 5, 4, 6, 9, 2, 1]
print("Original list:")
print(nums)
k = 1
for i in range(1, 11):
    print("kth smallest element in the said list, when k = ", k)
    print(kth_smallest_el(nums, k))
    k = k + 1
● Write a Python program to find all the pairs in a list whose sum is equal to a given value.
def find_pairs(nums, g_sum):
    pairs = []
    complement_dict = {}
    for num in nums:
        complement = g_sum - num
        if complement in complement_dict:
            pairs.append((complement, num))
        else:
            complement_dict[num] = g_sum - num
    return pairs
● Find the Most Common Character in a String:
from collections import Counter
def most_common_char(s):
    char_count = Counter(s)
    most_common = char_count.most_common(1)
    return most_common[0][0] if most_common else None
● Convert a List into a Nested Dictionary of Keys:
num_list = [1, 2, 3, 4]
new_dict = current = {}
for name in num_list:
    current[name] = {}
    current = current[name]
print(new_dict)
● Remove Special Characters from a String:
import re
def remove_special_characters(s):
    return re.sub(r'[^\w\s]', '', s)
● Find the Length of the Longest Substring Without Repeating Characters:
def longest_substring_without_repeating(s):
    seen = set()
    max_length = start = 0
    for end, char in enumerate(s):
        while char in seen:
            seen.remove(s[start])
            start += 1
        seen.add(char)
        max_length = max(max_length, end - start + 1)
    return max_length
● Find the Longest Common Prefix of a List of Strings:
def longest_common_prefix(strings):
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for i, char in enumerate(shortest):
        if any(s[i] != char for s in strings):
            return shortest[:i]
    return shortest
● Combine Two Tuples Element-Wise:
def combine_tuples(t1, t2):
return tuple(x + y for x, y in zip(t1, t2))
● Find Common Elements in Two Tuples:
def common_elements(t1, t2):
return tuple(set(t1) & set(t2))
● Count the Frequency of Characters in a String:
def char_frequency(s):
    freq_dict = {}
    for char in s:
        freq_dict[char] = freq_dict.get(char, 0) + 1
    return freq_dict
NUMPY
EASY:
Q.1 What is NumPy, and why is it used in data analysis?
A. NumPy is a Python library for numerical computations, particularly array operations. It's
essential for handling large datasets efficiently and performing mathematical operations.
Q.2 What are the advantages of NumPy arrays over Python lists?
A:
● Python lists support storing heterogeneous data types whereas NumPy arrays can store
data types of one nature itself. NumPy provides extra functional capabilities that make
operating on its arrays easier which makes NumPy arrays advantageous in comparison to
Python lists as those functions cannot be operated on heterogeneous data.
● NumPy arrays are treated as objects which results in minimal memory usage. Since
Python keeps track of objects by creating or deleting them based on the requirements,
NumPy objects are also treated the same way. This results in lesser memory wastage.
● NumPy arrays support multi-dimensional arrays.
● NumPy provides various powerful and efficient functions for complex computations on
the arrays.
● NumPy also provides a variety of functions for BitWise Operations, String Operations,
Linear Algebraic operations, Arithmetic operations etc. These are not provided on
Python’s default lists.
Q.3 What are ndarrays in NumPy?
A. ndarrays are NumPy's core multidimensional array objects, which store homogeneous data.
Following are some of the properties of ndarrays:
● When the size of ndarrays is changed, it results in a new array and the original array is
deleted.
● The ndarrays are bound to store homogeneous data.
● They provide functions to perform advanced mathematical operations in an efficient
manner.
Q.4 What are the ways of creating 1D, 2D and 3D arrays in NumPy?
A. Consider you have a normal python list. From this, we can create NumPy arrays by making
use of the array function as follows:
import numpy as np
● One-Dimensional array
arr = [1, 2, 3, 4]
numpy_arr = np.array(arr)
● Two-Dimensional array
arr = [[1, 2, 3, 4], [4, 5, 6, 7]]
numpy_arr = np.array(arr)
● Three-Dimensional array
arr = [[[1,2,3,4],[4,5,6,7],[7,8,9,10]]]
numpy_arr = np.array(arr)
Using the np.array() function, we can create NumPy arrays of any dimensions
Q.5 Programs:
● Write a program for creating an integer array with values belonging to the range 10 and
60
import numpy as np
arr = np.arange(10, 60)
print(arr)
● Write a NumPy program to create a 2-dimensional array of size 2 x 3 (composed of
4-byte integer elements), also print the shape, type and data type of the array.
import numpy as np
Sk
x = np.array([[2, 4, 6], [6, 8, 10]], np.int32)
print(type(x))
print(x.shape) a
print(x.dtype)
● Write a NumPy program to create a new array of 3*5, filled with 2.
import numpy as np
# using np.full
x = np.full((3, 5), 2, dtype=np.uint)
print(x)
# using np.ones
y = np.ones([3, 5], dtype=np.uint) *2
print(y)
● Write a NumPy program to create an array of (3, 4) shapes, multiply every element value
by 3 and display the result array.
import numpy as np
x= np.arange(12).reshape(3, 4)
print("Original array elements:")
print(x)
for a in np.nditer(x, op_flags=['readwrite']):
a[...] = 3 * a
print("New array elements:")
print(x)
● Generate an Array of Random Integers:
Sk
def random_integers_array(low, high, size):
return np.random.randint(low, high, size)
● Reshape an Array:
def reshape_array(arr, rows, cols):
    return arr.reshape(rows, cols)
MEDIUM:
Q.2 What is the difference between np.mean() and np.average() in NumPy?
A. np.mean() method calculates the arithmetic mean and provides additional options for input
and results. For example, it has the option to specify what data types have to be taken, where
the result has to be placed etc.
np.average() computes the weighted average if the weights parameter is specified. In the case
of weighted average, instead of considering that each data point is contributing equally to the
final average, it considers that some data points have more weightage than the others (unequal
contribution).
Q.3 How do you multiply 2 NumPy array matrices?
A. We can make use of the dot() for multiplying matrices represented as NumPy arrays. This is
represented in the code snippet below:
import numpy as np
# NumPy matrices
A = np.arange(15,24).reshape(3,3)
B = np.arange(20,29).reshape(3,3)
print("A: ",A)
print("B: ",B)
# Multiply A and B
result = A.dot(B)
print("Result: ", result)
Output
A: [[15 16 17]
[18 19 20]
[21 22 23]]
B: [[20 21 22]
[23 24 25]
[26 27 28]]
Result: [[1110 1158 1206]
[1317 1374 1431]
[1524 1590 1656]]
Q.4 How is arr[:,0] different from arr[:,[0]]
A. arr[:,0] - Returns 0th index elements of all rows. In other words, return the first column
elements.
import numpy as np
arr = np.array([[1,2,3,4],[5,6,7,8]])
new_arr =arr[:,0]
print(new_arr)
Output:
[1 5]
arr[:,[0]] - This returns the elements of the first column by adding extra dimension to it.
import numpy as np
arr = np.array([[1,2,3,4],[5,6,7,8]])
new_arr =arr[:,[0]]
print(new_arr)
Output:
[[1]
[5]]
Q.5 Explain broadcasting in NumPy with an example.
A. Broadcasting is a feature in NumPy that allows element-wise operations on arrays of different
shapes. It automatically adjusts the shape of smaller arrays to match the shape of larger arrays
during arithmetic operations.
Example:
a = np.array([1, 2, 3])
print("Array a:") a
print(a)
# Adding a scalar to an array
scalar = 2
result = a + scalar
print("\nAdding a scalar to an array:")
print(result)
Output:
Array a:
[1 2 3]
Adding a scalar to an array:
[3 4 5]
● Write a NumPy program to test whether any of the elements of a given array are non-zero.
print(np.any([10, 20, -50]))
● Write a NumPy program to find the indices of the maximum and minimum values along
the given axis of an array.
Original array: [1 2 3 4 5 6]
Maximum Values: 5
Minimum Values: 0
Answer:
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6])
print("Original array: ",x)
D
● Write a NumPy program to find the memory size of a NumPy array.
import numpy as np
n = np.zeros((4,4))
print("%d bytes" % (n.size * n.itemsize))
● Write a NumPy program to flatten a given 2-D array into a new 1-D array.
import numpy as np
x = np.array([[10, 20, 30], [20, 40, 50]])
print("Original array:")
print(x)
Sk
y = np.ravel(x)
print("New flattened array:")
print(y)
● Calculate Dot Product of Two Matrices:
def dot_product(matrix1, matrix2):
return np.dot(matrix1, matrix2)
● Find the Unique Elements of an Array and Their Counts:
def unique_elements_and_counts(arr):
    unique, counts = np.unique(arr, return_counts=True)
    return unique, counts
● Calculate the Standard Deviation of an Array:
def array_std_deviation(arr):
return np.std(arr)
print("Original array:")
w
print(nums)
print("\nFind the missing data of the said array:")
print(np.isnan(nums))
HARD:
Q.1 What do you understand about Vectorization in NumPy?
A. Vectorization in NumPy refers to the practice of performing element-wise operations on
entire arrays or matrices without the need for explicit looping. It is a fundamental concept in
NumPy and is a key reason why NumPy is widely used for numerical and scientific computations
in Python.
When you perform an operation on a NumPy array, that operation is applied to all of its elements at once, taking advantage of low-level optimizations and efficient memory usage. This leads to more concise and efficient code compared to traditional Python loops.
Benefits of vectorization in NumPy:
● Performance: Vectorized operations are implemented using highly optimized C and
Fortran code, making them much faster than equivalent operations using native Python
loops.
● Readability: Vectorized code is often more concise and easier to read than explicit loops,
making the codebase more maintainable.
● Efficiency: NumPy's vectorized operations are optimized for efficient memory usage,
allowing you to process large datasets without consuming excessive memory.
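A small sketch contrasting a vectorized expression with an explicit loop:
import numpy as np

a = np.arange(1_000_000)

# Vectorized: one array expression, executed in optimized compiled code
squares = a ** 2 + 1

# Equivalent explicit Python loop (much slower)
squares_loop = np.array([x ** 2 + 1 for x in a])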
Q.2 How is vstack() different from hstack() in NumPy?
A.Both methods are used for combining the NumPy arrays. The main difference is that the
hstack method combines arrays horizontally whereas the vstack method combines arrays
vertically.
a = np.array([1,2,3])
b = np.array([4,5,6])
# vstack arrays
c = np.vstack((a,b))
print("After vstack: \n",c)
# hstack arrays
d = np.hstack((a,b))
print("After hstack: \n",d)
The output of this code would be:
After vstack:
[[1 2 3]
[4 5 6]]
After hstack:
[1 2 3 4 5 6]
Notice how after the vstack method, the arrays were combined vertically along the column and
how after the hstack method, the arrays were combined horizontally along the row.
A. Broadcasting stretches the smaller array along the larger array so that both arrays have compatible shapes for NumPy operations. Performing broadcasting before vectorization helps to vectorize operations which support arrays of different dimensions.
● Write a NumPy program to swap rows and columns of a given array in reverse order.
ro
import numpy as np
nums = np.array([[[1, 2, 3, 4],
[0, 1, 3, 4],
print(nums2)
print("\nMultiply said arrays of same size element-by-element:")
print(np.multiply(nums1, nums2))
● Write a NumPy program to repeat all the elements three times of a given array of string
Original Array:
['Python' 'PHP' 'Java' 'C++']
New array: a
['PythonPythonPython' 'PHPPHPPHP' 'JavaJavaJava' 'C++C++C++']
Answer:
import numpy as np
x1 = np.array(['Python', 'PHP', 'Java', 'C++'], dtype=str)
print("Original Array:")
print(x1)
new_array = np.char.multiply(x1, 3)
print("New array:")
print(new_array)
● Write a NumPy program to remove the leading whitespaces of all the elements of a
given array.
import numpy as np
x = np.array([' python exercises ', ' PHP ', ' java ', ' C++'], dtype=str)
print("Original Array:")
print(x)
lstripped_char = np.char.lstrip(x)
print("\nRemove the leading whitespaces : ", lstripped_char)
● Write a NumPy program to replace "PHP" with "Python" in the element of a given array.
import numpy as np
x = np.array(['PHP Exercises, Practice, Solution'], dtype=str)
print("\nOriginal Array:")
print(x)
r = np.char.replace(x, "PHP", "Python")
print("\nNew array:")
print(r)
● Write a NumPy program to count a given word in each row of a given array of string
values.
import numpy as np
str1 = np.array([['Python','NumPy','Exercises'],
['Python','Pandas','Exercises'],
['Python','Machine learning','Python']])
print("Original array of string values:")
print(str1)
print("\nCount 'Python' row wise in the above array of string values:")
print(np.char.count(str1, 'Python'))
● Write a NumPy program to split a given text into lines and split the single line into array
values.
Sample output:
Original text:
01 V Debby Pramod
02 V Artemiy Ellie
03 V Baptist Kamal
04 V Lavanya Davide
05 V Fulton Antwan
06 V Euanthe Sandeep
07 V Endzela Sanda
08 V Victoire Waman
09 V Briar Nur
10 V Rose Lykos
Array from the said text:
[['01' 'V' 'Debby Pramod']
['02' 'V' 'Artemiy Ellie']
['03' 'V' 'Baptist Kamal']
['04' 'V' 'Lavanya Davide']
['05' 'V' 'Fulton Antwan']
['06' 'V' 'Euanthe Sandeep']
['07' 'V' 'Endzela Sanda']
['08' 'V' 'Victoire Waman']
['09' 'V' 'Briar Nur']
['10' 'V' 'Rose Lykos']]
import numpy as np
student = """01 V Debby Pramod
02 V Artemiy Ellie
03 V Baptist Kamal
04 V Lavanya Davide
05 V Fulton Antwan
06 V Euanthe Sandeep
07 V Endzela Sanda
08 V Victoire Waman
09 V Briar Nur
10 V Rose Lykos"""
print("Original text:")
print(student)
text_lines = student.splitlines()
text_lines = [r.split('\t') for r in text_lines]
result = np.array(text_lines, dtype=str)
print("\nArray from the said text:")
print(result)
● Write a program to convert a string element to uppercase, lowercase, capitalise the first
letter, title-case and swapcase of a given NumPy array.
import numpy as np
# Create Sample NumPy array
arr = np.array(['i', 'love', 'NumPy', 'AND', 'interviewbit'], dtype=str)
upper_case_arr = np.char.upper(arr)
lower_case_arr = np.char.lower(arr)
capitalize_case_arr = np.char.capitalize(arr)
titlecase_arr = np.char.title(arr)
swapcase_arr = np.char.swapcase(arr)
import numpy as np
# Create Sample NumPy Array
arr = np.array(['i', 'love', 'NumPy', 'AND', 'interviewbit'], dtype=str)
transformed_arr = np.char.join(" ", arr)
print("Transformed Array: ")
print(transformed_arr)
● Write a program to add a border of zeros around the existing array.
import numpy as np
# Create NumPy arrays filled with ones
ones_arr = np.ones((4,4))
print("Transformed array:")
D
PANDAS
EASY:
Q.2 Mention the different types of Data Structures in Pandas?
A. Pandas have three different types of data structures. It is due to these simple and flexible
data structures that it is fast and efficient.
● Series - It is a one-dimensional array-like structure with homogeneous data which means
data of different data types cannot be a part of the same series. It can hold any data
type such as integers, floats, and strings and its values are mutable i.e. it can be changed
but the size of the series is immutable i.e. it cannot be changed.
● DataFrame - It is a two-dimensional array-like structure with heterogeneous data. It can
contain data of different data types and the data is aligned in a tabular manner. Both size
and values of DataFrame are mutable.
● Panel - Pandas also had a third data structure known as Panel, a 3D data structure capable of storing heterogeneous data, but it was rarely used and has been removed in recent versions of pandas.
Q.5 Define DataFrame in Pandas?
A. It is a two-dimensional array-like structure with heterogeneous data. It can contain data of
different data types and the data is aligned in a tabular manner i.e. in rows and columns and the
indexes with respect to these are called row index and column index respectively. Both size and
values of DataFrame are mutable. The columns can be heterogeneous types like int and bool. It
can also be defined as a dictionary of Series.
The syntax for creating a dataframe:
import pandas as pd
dataframe = pd.DataFrame( data, index, columns, dtype)
Here:
● data - It represents various forms like series, map, ndarray, lists, dict, etc.
● index - It is an optional argument that represents an index to row labels.
● columns - Optional argument for column labels.
● Dtype - It represents the data type of each column. It is an optional parameter.
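A brief illustration of this syntax (the column names and values are illustrative):
import pandas as pd

data = {"name": ["Asha", "Ravi"], "score": [85, 92]}
df = pd.DataFrame(data, index=["r1", "r2"])
print(df)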
Q.6 What are the different ways in which a series can be created?
Q.7 What are the different ways in which a dataframe can be created?
A.
● Creating an empty dataframe: A basic DataFrame, which can be created is an Empty
Dataframe. An Empty Dataframe is created just by calling a pandas.DataFrame()
constructor.
● Creating a dataframe using List: DataFrame can be created using a single list or by using
a list of lists.
● Creating DataFrame from dict of ndarray/lists: To create a DataFrame from dict of
narray/list there are a few conditions to be met.
o First, all the arrays must be of the same length.
o Second, if the index is passed then the length index should be equal to the length
of arrays.
o Third, if no index is passed, then by default, the index will be in the range(n)
● Creating DataFrame from Dictionary of series: To create a DataFrame from Dict of series,
a dictionary needs to be passed as an argument to form a DataFrame. The resultant
index is the union of all the series of passed indexed.
● Filter Rows Based on a Condition:
def filter_rows(df, condition_column, condition_value):
return df[df[condition_column] == condition_value]
● Calculate Summary Statistics of a DataFrame:
def summary_statistics(df):
return df.describe()
● Write a Pandas program to create and display a one-dimensional array-like object
containing an array of data using Pandas module.
import pandas as pd
ds = pd.Series([2, 4, 6, 8, 10])
print(ds)
● Write a Pandas program to convert a Pandas Series to a Python list and print its type.
import pandas as pd
ds = pd.Series([2, 4, 6, 8, 10])
print("Pandas Series and type")
print(ds)
print(type(ds))
print("Convert Pandas Series to Python list")
print(ds.tolist())
print(type(ds.tolist()))
● Write a Pandas program to add, subtract, multiply and divide two Pandas Series.
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
ds = ds1 + ds2
print("Add two Series:")
print(ds)
print("Subtract two Series:")
ds = ds1 - ds2
print(ds)
print("Multiply two Series:")
ds = ds1 * ds2
print(ds)
print("Divide Series1 by Series2:")
ds = ds1 / ds2
print(ds)
● Write a Pandas program to compare the elements of the two Pandas Series.
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 10]
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 10])
print("Series1:")
print(ds1)
print("Series2:")
print(ds2)
print("Compare the elements of the said Series:")
print("Equals:")
print(ds1 == ds2)
print("Greater than:")
print(ds1 > ds2)
print("Less than:")
print(ds1 < ds2)
MEDIUM:
Q.1 How can we create a copy of the series in Pandas?
A. We can create a copy of the series by using the following syntax: Series.copy(deep=True)
The default value for the deep parameter is set to True.
When the value of deep=True, the creation of a new object with a copy of the calling object’s
data and indices takes place. Modifications to the data or indices of the copy will not be
reflected in the original object whereas when the value of deep=False, the creation of a new
object will take place without copying the calling object’s data or index i.e. only the references
to the data and index will be copied. Any changes made to the data of the original object will be
reflected in the shallow copy and vice versa.
Q.2 Explain Categorical data in Pandas?
A. Categorical data is a discrete set of values for a particular outcome and has a fixed range.
Also, the data in the category need not be numerical, it can be textual in nature. Examples are
gender, social class, blood type, country affiliation, observation time, etc. There is no hard and
fast rule for how many values a categorical value should have. One should apply one’s domain
knowledge to make that determination on the data sets.
Q.3 Explain Reindexing in pandas along with its parameters?
A.Reindexing as the name suggests is used to alter the rows and columns in a DataFrame. It is
also defined as the process of conforming a dataframe to a new index with optional filling logic.
For missing values in a dataframe, the reindex() method assigns NA/NaN as the value. A new
object is returned unless a new index is produced that is equivalent to the current one. The
copy value is set to False. This is also used for changing the index of rows and columns in the
dataframe.
Q.7 Describe the differences between .loc[] and .iloc[] in pandas.
A. .loc[] and .iloc[] are both used for indexing and selecting data from a pandas DataFrame, but
Sk
they have distinct purposes and ways of specifying the rows and columns you want to access:
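The original example here was lost in extraction; a small hedged sketch of the two accessors (the data values are illustrative):
import pandas as pd

df = pd.DataFrame({"score": [5, 7, 9]}, index=["a", "b", "c"])

print(df.loc["a", "score"])   # label-based: row 'a', column 'score' -> 5
print(df.iloc[0, 0])          # position-based: first row, first column -> 5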
Key Differences:
.loc[] uses labels (row and column names) for indexing, while .iloc[] uses integer positions
(0-based index).
.loc[] is inclusive of both the start and end indices/slices, while .iloc[] is inclusive of the start
index and exclusive of the end index/slice.
When using .loc[], you can select rows and columns using Boolean arrays based on label
conditions, while .iloc[] only allows integer-based indexing.
Q.8 How would you handle a situation where a DataFrame has duplicate rows?
A. To handle a situation where a DataFrame has duplicate rows, you can use the
drop_duplicates() method. This method removes duplicate rows from the DataFrame, keeping
only the first occurrence or a specified subset of columns.
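A short sketch:
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "value": ["x", "x", "y"]})
deduped = df.drop_duplicates()                 # keep the first occurrence of each full row
by_id = df.drop_duplicates(subset=["id"])      # consider only the 'id' column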
Q.9 How do you work with time series data in pandas?
A. Time series is an organized collection of data that depicts the evolution of a quantity through
time. Pandas have a wide range of capabilities and tools for working with time-series data in all
fields.
Supported by pandas:
● Analyzing time-series data from a variety of sources and formats.
● Create time and date sequences with preset frequencies.
● Date and time manipulation and conversion with timezone information.
● A time series is resampled or converted to a specific frequency.
● Calculating dates and times using absolute or relative time increments.
Q.10 Explain MultiIndexing in Pandas.
A.Multiple indexing is defined as essential indexing because it deals with data analysis and
manipulation, especially for working with higher dimensional data. It also enables us to store
and manipulate data with an arbitrary number of dimensions in lower-dimensional data
structures like Series and DataFrame.
Q.11 How can we convert a Series into a DataFrame?
A. The conversion of Series to DataFrame is quite a simple process. All we need to do is to use
the to_frame() function.
Syntax:
Series.to_frame(name=None)
Parameters:
● name: It accepts data objects as input. It is an optional parameter. The value of the
name parameter will be equal to the name of the Series if it has any.
● Return Type: It returns the DataFrame after converting it from Series.
Q.12 How can we convert DataFrame to Numpy Array?
A.In order to convert DataFrame to a Numpy array we need to use DataFrame.to_numpy()
method.
Syntax:
DataFrame.to_numpy(dtype=None, copy=False, na_value=_NoDefault.no_default)
Parameters:
data type will depend on the data type of the column in the dataframe.
● prod() – It returns the product of the values.
A.The function used for sorting in pandas is called DataFrame.sort_values(). It is used to sort a
DataFrame by its column or row values. The function comes with a lot of parameters, but the
most important ones to consider for sort are:
● by: It is used to specify the column/row(s) which are used to determine the sorted order.
It is an optional parameter.
● axis: It specifies whether the sorting is to be performed for a row or column and the
value is 0 and 1 respectively.
● ascending: It specifies whether to sort the dataframe in ascending or descending order.
The default value is set to ascending. If the value is set as ascending=False it will sort in
descending order.
A. fillna(): It fills the NaN values with a given number with which you want to substitute. It gives
you the option to fill according to the index of rows of a pd.DataFrame or on the name of the
columns in the form of a python dict.
interpolate(): It gives you the flexibility to fill the missing values with many kinds of
interpolations between the values like linear, time, etc.
Q.17 Programs:
● Group and Aggregate Data:
def group_and_aggregate(df, group_column, aggregation_column,
aggregation_function):
return df.groupby(group_column)[aggregation_column].agg(aggregation_function)
● Merge Two DataFrames:
def merge_dataframes(df1, df2, merge_column):
return pd.merge(df1, df2, on=merge_column)
● Calculate the Rolling Mean of a Column:
def rolling_mean(df, column, window):
return df[column].rolling(window).mean()
● Fill Missing Values with the Mean:
def fill_missing_with_mean(df):
return df.fillna(df.mean())
● Sort DataFrame Rows by a Column:
def sort_dataframe(df, sort_column):
return df.sort_values(by=sort_column)
HARD:
Q.1 How do you create a pivot table in pandas? Explain with an example.
A.
data = {'Date': ['2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02'],
'Product': ['A', 'B', 'A', 'B'],
'Sales': [100, 200, 150, 250]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(index='Date', columns='Product', values='Sales', aggfunc='sum')
print(pivot_table)
Output:
Product A B
Date
2023-08-01 100 200
2023-08-02 150 250
Q.2 In pandas, how can you efficiently handle a large dataset that doesn't fit into memory?
A. You can efficiently handle a large dataset that doesn't fit into memory in pandas by reading
and processing the data in smaller chunks using the chunksize parameter of functions like
read_csv() or read_sql(). This allows you to work with the data incrementally without loading the entire dataset into memory at once.
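A hedged sketch of chunked reading (the file name and column are illustrative):
import pandas as pd

total = 0
for chunk in pd.read_csv("large_file.csv", chunksize=100_000):
    # process each chunk independently, e.g. accumulate a running sum
    total += chunk["amount"].sum()
print(total)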
print(result)
Output:
Name Age Salary
0 Bob 30 60000
1 David 28 55000
Q.4 What is the purpose of the applymap() function in pandas?
A. The applymap() function in pandas is used to apply a specified function element-wise to
every element in a DataFrame. It is particularly useful when you want to perform a custom
operation on each individual element of the DataFrame. Note that applymap() is defined only on DataFrames; the element-wise equivalent for a Series is map().
Syntax:
df.applymap(func)
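A brief sketch (in recent pandas versions the same operation is also exposed as DataFrame.map):
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(df.applymap(lambda x: x * 10))  # multiplies every element by 10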
Q.5 Describe the concept of "reshaping" in pandas. How can you reshape data using functions
like pivot(), melt(), and stack()?
A. Reshaping in pandas refers to changing the structure or layout of a DataFrame, typically by reorganizing rows and columns. This can help in better understanding
and analysis of the data. Three commonly used functions for reshaping data in pandas are
pivot(), melt(), and stack().
1. pivot() Function:
● The pivot() function is used to reshape data by changing the layout of columns
and rows.
● It takes columns as index, columns, and values arguments to specify the new
arrangement.
● It aggregates data if there are multiple rows with the same index and column
values.
● Example:
pivot_table = df.pivot(index='Date', columns='Product', values='Sales')
2. melt() Function:
● The melt() function is used to transform a wide DataFrame into a long one by
"melting" columns into rows.
● It gathers columns into two new columns: one for variable names and another
for corresponding values.
● Useful when you have multiple columns representing different categories or time
periods.
● Example:
melted_df = df.melt(id_vars='Date', value_vars=['ProductA', 'ProductB'],
var_name='Product', value_name='Sales')
3. stack() Function:
● The stack() function is used to reshape data by "stacking" the specified column labels into a hierarchical (MultiIndex) row index, producing a longer, narrower result.
● Example:
stacked_df = df.set_index('Date').stack()
Q.6 How can you efficiently merge two DataFrames in pandas with a large number of rows and
columns?
A. To efficiently merge two large DataFrames in pandas:
1. Sort and Index: Sort DataFrames by merging columns and set them as indexes.
2. Specify Merge Columns: Use on or left_on and right_on parameters.
3. Choose Merge Type: Decide inner, outer, left, or right merge.
4. Reduce Memory: Downcast data types, convert to categorical, remove unnecessary
columns.
5. Use merge(): Prefer it over join() for more control.
Q.7 What is the use of the cut() function in pandas?
A. The cut() function is used to segment (bin) continuous values into discrete intervals or categories, allowing you to analyze and summarize data more effectively.
Syntax:
pd.cut(x, bins, labels=None)
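A brief sketch (the bin edges and labels are illustrative):
import pandas as pd

ages = pd.Series([5, 17, 25, 42, 70])
groups = pd.cut(ages, bins=[0, 18, 40, 100], labels=["child", "adult", "senior"])
print(groups)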
Q.8 Why do we need groupby in pandas and what is the use of index in it?
A. In pandas, the groupby function is used to group data based on one or more columns in a
DataFrame. When you use groupby and then apply an aggregation function (e.g., sum, mean,
etc.), by default, pandas will use the grouping columns as the new index for the resulting
DataFrame. This can be desirable in some cases, as it provides a clearer representation of the
grouped data.
However, in certain scenarios, you might want to keep the grouping columns as regular columns rather than as the index even after applying the groupby operation. This is where the as_index=False parameter comes into play. When you set as_index=False while performing a groupby and aggregation operation, pandas will reset the index of the resulting DataFrame and use the default integer index instead of the grouping columns as the index.
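A short sketch of the difference (column names and values are illustrative):
import pandas as pd

df = pd.DataFrame({"city": ["A", "A", "B"], "sales": [10, 20, 5]})

print(df.groupby("city")["sales"].sum())                  # 'city' becomes the index
print(df.groupby("city", as_index=False)["sales"].sum())  # 'city' stays a regular column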
Q.9 Problems:
● Write a Pandas program to replace the 'qualify' column containing the values 'yes' and
'no' with True and False.
G
print("\nReplace the 'qualify' column contains the values 'yes' and 'no' with True and
False:")
df['qualify'] = df['qualify'].map({'yes': True, 'no': False})
print(df)
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data , index=labels)
print(list(df.columns.values))
● Write a Pandas program to count city wise number of people from a given of data set
(city, name of the person).
Sample data:
city Number of people
0 California 4
1 Georgia 2
2 Los Angeles 4
Answer:
import pandas as pd
df1 = pd.DataFrame({'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
g1 = df1.groupby(["city"]).size().reset_index(name='Number of people')
print(g1)
● Write a Pandas program to delete DataFrame row(s) based on given column value.
G
Sample data:
Original DataFrame
col1 col2 col3
0 1 4 7
1 4 5 8
2 3 6 9
3 4 7 0
4 5 8 1
New DataFrame
col1 col2 col3
0 1 4 7
2 3 6 9
3 4 7 0
4 5 8 1
Answer:
import pandas as pd
import numpy as np
d = {'col1': [1, 4, 3, 4, 5], 'col2': [4, 5, 6, 7, 8], 'col3': [7, 8, 9, 0, 1]}
df = pd.DataFrame(data=d)
print("Original DataFrame")
print(df)
df = df[df.col2 != 5]
print("New DataFrame")
print(df)
● Write a Pandas program to replace all the NaN values with Zero's in a column of a
dataframe.
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
df = pd.DataFrame(exam_data)
print("Original DataFrame")
print(df)
print("\nAfter converting index in a column:")
df.reset_index(level=0, inplace=True)
print(df)
print("\nHiding index:")
print( df.to_string(index=False))
DATA VISUALIZATION
EASY:
Q:3 What are some best practices for designing effective visualizations?
A: Some best practices for designing effective visualizations include keeping it simple, using
appropriate colors and fonts, labeling axes and titles clearly, providing context and explanations,
and using appropriate scales and axes.
Q:4 What is Matplotlib?
A: Matplotlib is a Python plotting library that provides a wide range of 2D and 3D plots for
visualizing data. It is widely used for creating static, interactive, and animated visualizations in
Python.
Q.5 What is Seaborn?
A: Seaborn is a Python visualization library built on top of Matplotlib that provides a higher-level
interface for creating statistical graphics. It is used for creating visually appealing and
informative statistical graphics such as heatmaps, pair plots, etc.
w
Matplotlib, run "pip install matplotlib" in the command line. To install Seaborn, run "pip install
seaborn" in the command line.
G
MEDIUM:
Q.1 Explain the difference between Matplotlib and Seaborn. When would you use one over the
other?
A. Matplotlib and Seaborn are both popular Python libraries used for data visualization, but
they have different focuses and design philosophies. Here's an explanation of when you might
choose one over the other:
Matplotlib:
● Matplotlib is a versatile and foundational plotting library in Python. It provides a wide
range of functionalities for creating static, interactive, and animated visualizations.
● It offers fine-grained control over plot elements, allowing you to customize every aspect
of your plot.
● Matplotlib follows a low-level approach, meaning you have to write more code to
achieve certain visualizations.
● It's highly customizable and is well-suited for creating complex plots from scratch.
● Matplotlib serves as the foundation for many other data visualization libraries, making it
a fundamental tool for any data scientist or analyst.
● Use Matplotlib when you need precise control over plot customization and want to
create complex or highly customized visualizations.
Seaborn:
● Seaborn is built on top of Matplotlib and provides a high-level interface for creating attractive and informative statistical graphics with less code.
● Use Seaborn when you want to quickly create visually appealing statistical visualizations
without delving into the details of plot customization.
In summary, Matplotlib is a powerful and flexible library that gives you granular control over
plot creation, while Seaborn is focused on simplifying the creation of attractive statistical
visualizations. Depending on your needs, you might choose Matplotlib when you require
extensive customization or Seaborn when you want to create visually pleasing and informative
statistical graphics more quickly. Additionally, you can also use both libraries together,
leveraging Matplotlib for fine-tuned customization and Seaborn for streamlined statistical
visualization.
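As a rough, assumed illustration of the difference in verbosity, the sketch below draws the same per-group summary first with Matplotlib (manual aggregation) and then with Seaborn (aggregation handled by the library); the data values are made up:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Assumed example data
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'value': [3, 4, 7, 6, 5, 8]})
# Matplotlib: compute the summary yourself, then plot it
means = df.groupby('group')['value'].mean()
plt.bar(means.index, means.values)
plt.title('Mean value per group (Matplotlib)')
plt.show()
# Seaborn: the aggregation is done for you
sns.barplot(data=df, x='group', y='value')
plt.title('Mean value per group (Seaborn)')
plt.show()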
Q:2 How do you create a scatter plot in Matplotlib?
A: A scatter plot can be created in Matplotlib using the "scatter" function. For example, the following code will create a scatter plot of x and y coordinates:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.scatter(x, y)
plt.show()
Q:3 How do you create a line plot in Matplotlib?
at
A: A line plot can be created in Matplotlib using the "plot" function. For example, the following
code will create a line plot of x and y coordinates:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.plot(x, y)
plt.show()
Q:4 How do you customize a Matplotlib plot, for example by setting axis limits, adding a legend, and showing a grid?
A: Matplotlib provides functions such as "xlim", "ylim", "legend", and "grid" for customizing a plot. For example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.plot(x, y)
plt.xlim(0, 6)
plt.ylim(0, 6)
plt.legend(["Data"])
plt.grid(True)
plt.show()
Q:5 How do you create a bar plot in Matplotlib?
A: A bar plot can be created in Matplotlib using the "bar" function. For example, the following
code will create a bar plot of x and y values:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.bar(x, y)
plt.show()
Q.6 What is the difference between a histogram and a bar chart?
A. Both histograms and bar charts are graphical tools used to visualize data distributions.
However, they are used for different types of data and serve different purposes. Here's a
breakdown of the differences between a histogram and a bar chart:
Histogram:
● A histogram is used to visualize the distribution of continuous or quantitative data.
● It divides the data range into intervals (bins) and counts the frequency or number of
data points that fall into each interval.
● The x-axis represents the data range (intervals) while the y-axis represents the frequency
or count of data points in each interval.
● Histograms provide insight into the underlying distribution of data, showing patterns like
skewness, central tendency, and spread.
● There are no gaps between the bars in a histogram, as it represents continuous data.
● Histograms are commonly used for analyzing data such as age distribution, income
distribution, exam scores, etc.
Bar Chart:
● A bar chart is used to visualize categorical or qualitative data.
● It displays the frequency, count, or proportion of different categories or groups.
● The x-axis represents the categories or groups, while the y-axis represents the frequency,
count, or proportion associated with each category.
● Bar charts are often used to compare different categories and identify patterns or trends
among them.
● There are gaps between the bars in a bar chart, as it represents distinct categories or
groups.
● Bar charts are commonly used for comparing items like sales by product, population by
region, survey responses, etc.
In summary, the key distinction between a histogram and a bar chart lies in the type of data
they are used to visualize. Histograms are used for continuous data distributions, while bar
charts are used for categorical data comparisons.
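For illustration, the sketch below draws a histogram and a bar chart side by side; the simulated ages and the regional sales figures are assumed purely as examples:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
ages = np.random.normal(35, 10, size=200)        # continuous data -> histogram
regions = ['North', 'South', 'East', 'West']     # categorical data -> bar chart
sales = [120, 90, 150, 80]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(ages, bins=15, edgecolor='black')       # adjacent bars, no gaps
ax1.set_title('Histogram: age distribution')
ax2.bar(regions, sales)                          # separate bars with gaps
ax2.set_title('Bar chart: sales by region')
plt.tight_layout()
plt.show()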
Q.7 How do you decide whether to use a bar chart or a line chart?
A: A bar chart is used to compare categorical or discrete data, while a line chart is used to show
trends or changes over time. When deciding which to use, consider the type of data you have
and the question you are trying to answer.
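As an assumed illustration, the sketch below plots a time trend as a line chart and a category comparison as a bar chart; all values are made up:
import matplotlib.pyplot as plt
# Trend over time -> line chart
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
revenue = [100, 120, 115, 140]
plt.plot(quarters, revenue, marker='o')
plt.title('Revenue trend over the year (line chart)')
plt.show()
# Comparison across discrete categories -> bar chart
regions = ['North', 'South', 'East']
revenue_by_region = [180, 150, 145]
plt.bar(regions, revenue_by_region)
plt.title('Revenue by region (bar chart)')
plt.show()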
Q.8 Programs:
● Create a line plot using Matplotlib or Seaborn to visualize the trend of monthly sales for
a year. Use random data to simulate the sales values.
Sample answer:
#Line Plot - Visualizing Monthly Sales Trend:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
months = np.arange(1, 13)
sales = np.random.randint(1000, 5000, size=12)  # simulated monthly sales
plt.figure(figsize=(8, 5))
plt.plot(months, sales, marker='o')
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.xticks(months)
plt.grid(True)
plt.show()
● Generate a scatter plot using Matplotlib or Seaborn to show the relationship between
two continuous variables, such as age and income, from a given dataset.
Sample Answer:
# Scatter Plot - Relationship between Age and Income:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
age = np.random.randint(18, 65, size=100)
income = np.random.randint(20000, 100000, size=100)
plt.figure(figsize=(8, 5))
plt.scatter(age, income, alpha=0.7)
plt.title('Relationship between Age and Income')
plt.xlabel('Age')
plt.ylabel('Income')
plt.grid(True)
plt.show()
● Create a bar chart using Matplotlib or Seaborn to display the top 10 countries with the
highest GDP. Use a dataset containing country names and GDP values.
Sample Answer:
# Bar Chart - Top 10 Countries by GDP:
import matplotlib.pyplot as plt
# Sample data: country names and GDP values in USD trillions (illustrative figures)
countries = ['USA', 'China', 'Japan', 'Germany', 'India', 'UK', 'France', 'Italy', 'Brazil', 'Canada']
gdp_values = [25.5, 18.0, 4.2, 4.1, 3.4, 3.1, 2.8, 2.0, 1.9, 1.8]
plt.figure(figsize=(10, 6))
plt.bar(countries, gdp_values, color='green')
plt.title('Top 10 Countries by GDP')
plt.xlabel('Country')
plt.ylabel('GDP ($ Trillion)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
● Generate a histogram using Matplotlib or Seaborn to visualize the distribution of exam
scores. Use a dataset containing student names and their scores.
Sample Answer:
# Histogram - Exam Score Distribution:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
scores = np.random.randint(40, 100, size=50)  # simulated exam scores
plt.figure(figsize=(8, 5))
plt.hist(scores, bins=10, edgecolor='black', alpha=0.7)
plt.title('Exam Score Distribution')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
● Create a pie chart using Matplotlib or Seaborn to show the percentage distribution of
different types of expenses in a monthly budget.
Sample Answer:
# Pie Chart - Monthly Expenses Distribution:
import matplotlib.pyplot as plt
# Sample data: Expense categories and percentages
categories = ['Rent', 'Food', 'Transport', 'Entertainment', 'Utilities']
Sk
percentages = [30, 20, 15, 10, 25]
# Create a pie chart
plt.figure(figsize=(8, 8))
plt.pie(percentages, labels=categories, autopct='%1.1f%%', startangle=140,
colors=plt.cm.Paired.colors)
plt.title('Monthly Expenses Distribution')
plt.axis('equal') # Equal aspect ratio ensures a circular pie
plt.show()
● Generate a box plot using Matplotlib or Seaborn to compare the distribution of heights
among different age groups. Use a dataset with age and height values.
Sample Answer:
# Box Plot - Height Distribution by Age Group:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
np.random.seed(0)
heights = np.random.normal(160, 10, size=300)
age_groups = np.repeat(['18-25', '26-35', '36-45'], 100)
sns.boxplot(x=age_groups, y=heights)
plt.title('Height Distribution by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Height (cm)')
plt.show()
● Generate a pair plot using Seaborn to explore the relationships between multiple
numerical variables in a dataset. Include color differentiation based on a categorical
variable.
Sample Answer:
# Pair Plot - Relationships between Numerical Variables:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Generate random data with multiple numerical variables and a categorical variable
np.random.seed(0)
data = pd.DataFrame({
'A': np.random.normal(0, 1, 100),
'B': np.random.normal(1, 2, 100),
'C': np.random.normal(2, 3, 100),
'Category': np.random.choice(['X', 'Y'], size=100)
})
# Create the pair plot, colouring points by the categorical variable
sns.pairplot(data, hue='Category')
plt.show()

● Create a figure with multiple subplots (a 2 x 2 grid) using Matplotlib.
Sample Answer:
import matplotlib.pyplot as plt
# Create a figure with 2 rows and 2 columns of subplots
# The number 221 means 2 rows, 2 columns, and subplot 1 (top-left)
plt.figure(figsize=(10, 6)) # Create a new figure
# Create the first subplot (top-left)
plt.subplot(2, 2, 1) # 2 rows, 2 columns, subplot 1
plt.plot([0, 1], [0, 1]) # Example plot
# Create the second subplot (top-right)
plt.subplot(2, 2, 2) # 2 rows, 2 columns, subplot 2
plt.plot([0, 1], [1, 0])  # Example plot
plt.show()

DATA ANALYSIS
Q.3 Explain the concept of "tidy data" and why it's important in data analysis.
A. Tidy data is a structured format where each variable forms a column, each observation forms
a row, and each type of observational unit forms a table. It simplifies data manipulation and
analysis.
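For illustration, a small sketch of reshaping an assumed 'wide' table into tidy (long) form with pandas:
import pandas as pd
# Wide format: one row per student, one column per subject score
wide = pd.DataFrame({'student': ['Ann', 'Bob'],
                     'math': [90, 75],
                     'science': [85, 80]})
# Tidy format: each variable is a column, each observation is a row
tidy = wide.melt(id_vars='student', var_name='subject', value_name='score')
print(tidy)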
Q.4 Define the term 'Data Wrangling' in Data Analytics.
A. Data Wrangling is the process wherein raw data is cleaned, structured, and enriched into a
desired usable format for better decision making. It involves discovering, structuring, cleaning,
enriching, validating, and analyzing data. This process can turn and map out large amounts of
data extracted from various sources into a more useful format. Techniques such as merging,
grouping, concatenating, joining, and sorting are used to analyze the data. Thereafter, the data is
ready to be used with another dataset.
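For illustration, a small pandas sketch that touches several of these wrangling steps (cleaning, merging, grouping, sorting) on assumed sample data:
import pandas as pd
# Assumed raw inputs
orders = pd.DataFrame({'order_id': [1, 2, 3, 4],
                       'customer_id': [10, 20, 10, 30],
                       'amount': [250.0, None, 125.5, 300.0]})
customers = pd.DataFrame({'customer_id': [10, 20, 30],
                          'region': ['North', 'South', 'North']})
# Cleaning: fill the missing amount, then enrich by joining customer details
orders['amount'] = orders['amount'].fillna(0)
merged = orders.merge(customers, on='customer_id', how='left')
# Structuring: group, aggregate, and sort into an analysis-ready shape
summary = merged.groupby('region')['amount'].sum().sort_values(ascending=False)
print(summary)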
Q.5 What are the various steps involved in any analytics project?
A. The various steps involved in any common analytics projects are as follows:
▪ Understanding the Problem: Understand the business problem, define the organizational
goals, and plan for a solution.
▪ Collecting Data: Gather the right data from various sources and other information based
on your priorities.
▪ Cleaning Data: Clean the data to remove unwanted, redundant, and missing values, and
make it ready for analysis.
▪ Exploring and Analyzing Data: Use data visualization and business intelligence tools to
analyze data.
▪ Interpreting the Results: Interpret the results to find out hidden patterns, future trends,
and gain insights.
Q.6 What are the common problems that data analysts encounter during analysis?
A. The common problems that data analysts encounter during analysis are:
● Handling duplicate values (see the sketch below)
● Collecting meaningful and accurate data at the right time
● Handling data purging and storage problems
● Making data secure and dealing with compliance issues
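As referenced above, a small sketch of handling one of these problems, duplicate records, with pandas; the records are assumed:
import pandas as pd
# Assumed records containing an exact duplicate row
df = pd.DataFrame({'id': [1, 2, 2, 3],
                   'city': ['Pune', 'Delhi', 'Delhi', 'Mumbai']})
print(df.duplicated().sum())   # count duplicate rows
clean = df.drop_duplicates()   # keep the first occurrence of each row
print(clean)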
Q.7 Why is exploratory data analysis (EDA) important?
A. EDA is important for the following reasons (a short sketch follows this list):
● Exploratory data analysis (EDA) helps to understand the data better.
● It helps you obtain confidence in your data to a point where you’re ready to engage a
machine learning algorithm.
● It allows you to refine your selection of feature variables that will be used later for
model building.
● You can discover hidden trends and insights from the data.
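For illustration, a short first-pass EDA sketch on an assumed sample DataFrame:
import pandas as pd
# Assumed sample data standing in for a loaded dataset
df = pd.DataFrame({'age': [23, 35, 31, 52, 46],
                   'income': [28000, 52000, 48000, 90000, 76000],
                   'segment': ['A', 'B', 'B', 'C', 'C']})
df.info()                              # column types and missing values
print(df.describe())                   # summary statistics for numeric columns
print(df['segment'].value_counts())    # distribution of a categorical feature
print(df[['age', 'income']].corr())    # correlation between numeric features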
Q.8 Programs:
● You are given a dataset containing information about monthly sales for different
products. Each row represents a sale with columns 'Product', 'Date', and 'Amount'. Write
code to:
a) Calculate the total sales amount for each product.
b) Create a pivot table with dates as index and products as columns, showing the monthly sales per product.
c) Create a line plot to visualize the sales trend over months for the top three products.
Answer:
import pandas as pd
import matplotlib.pyplot as plt
# Sample data (assumed for illustration)
data = {'Product': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
'Date': ['2023-01', '2023-01', '2023-01', '2023-02', '2023-02', '2023-02',
'2023-03', '2023-03', '2023-03'],
'Amount': [100, 150, 200, 120, 180, 210, 130, 160, 220]}
df = pd.DataFrame(data)
# a) Total sales amount for each product
total_sales = df.groupby('Product')['Amount'].sum()
print(total_sales)
# b) Pivot table of monthly sales per product
monthly_sales = df.pivot_table(index='Date', columns='Product', values='Amount',
aggfunc='sum')
# c) Create line plot for top three products
top_products = total_sales.nlargest(3).index
monthly_sales[top_products].plot(kind='line')
plt.title('Monthly Sales Trend for Top Products')
plt.xlabel('Month')
plt.ylabel('Sales Amount')
plt.legend(title='Product')
plt.grid(True)
plt.show()
● You have a dataset containing information about customer ratings for different
categories. Each row represents a rating with columns 'Category', 'Customer', and
'Rating'. Write code to:
a) Calculate the average rating for each category.
b) Create a pivot table with categories as columns and
customers as index, showing the average rating for each category and customer.
c) Create a grouped box plot to visualize the distribution of ratings across different
categories.
Answer:
import pandas as pd
import matplotlib.pyplot as plt
# Sample data
data = {
'Category': ['A', 'B', 'C', 'A', 'B', 'C'],
'Customer': ['C1', 'C2', 'C3', 'C1', 'C2', 'C3'],
'Rating': [4, 5, 3, 2, 4, 5]
}
# Create DataFrame
df = pd.DataFrame(data)
# a) Calculate average rating for each category
average_rating = df.groupby('Category')['Rating'].mean()
# b) Transform data for average ratings per category per customer
avg_ratings = df.pivot_table(index='Customer', columns='Category', values='Rating',
aggfunc='mean')
print("Average rating per category:")
print(average_rating)

# c) Create a grouped box plot of ratings by category
avg_ratings.boxplot(grid=False)
plt.title('Distribution of Ratings by Category')
plt.ylabel('Rating')
plt.xlabel('Category')
plt.show()