Data Analytics With Python Laboratory - Lab Manual
C022523(022)
1. To be able to use Python for handling various data structures for data representation
and manipulation.
2. To be able to use Numpy for data handling.
3. To be able to use Pandas for data processing.
4. To be able to use Matplotlib for visual representation of data.
List of Experiments:
1. Write programs to understand the use of Python Identifiers, Keywords,
Indentations, Comments in Python, Operators, Membership operator.
2. Write programs to understand the use of Python String, Tuple, List, Set, Dictionary,
File input/output.
3. Write programs to understand the use of Numpy’s Ndarray, Basic Operations,
Indexing, Slicing, and Iterating, Conditions and Boolean Arrays.
4. Write programs to understand the use of Numpy’s Shape Manipulation, Array
Manipulation, Vectorization.
5. Write programs to understand the use of Numpy’s Structured Arrays, Reading and
Writing Array Data on Files.
6. Write programs to understand the use of Pandas Series, Data Frame, Index
Objects, Reindexing, Dropping, Arithmetic and Data Alignment.
7. Write programs to understand the use of Pandas Functions by Element,
Functions by Row or Column,
Statistics Functions, Sorting and Ranking, Correlation and Covariance, “Not a
Number” Data.
8. Write programs to understand the use of Pandas for Reading and Writing Data
using CSV and Textual Files, HTML Files, XML, Microsoft Excel Files.
9. Write programs to understand the use of Matplotlib for Simple Interactive Chart,
Set the Properties of the Plot, matplotlib and NumPy.
10. Write programs to understand the use of Matplotlib for Working with Multiple
Figures and Axes, Adding Text, Adding a Grid, Adding a Legend, Saving the
Charts.
11. Write programs to understand the use of Matplotlib for Working with Line Chart,
Histogram, Bar Chart, Pie Charts.
Course Outcomes [After undergoing the course, students will be able to:]
1. Apply Python for handling various data structures for data representation and
manipulation.
2. Apply Numpy for data handling.
3. Apply Pandas for data processing.
4. Apply Matplotlib for visual representation of data.
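Experiment-01
AIM: Write programs to understand the use of Python Identifiers, Keywords,
Indentations, Comments in Python, Operators, Membership operator.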
An identifier is a name used to identify a variable, function, class, or module. It can consist of
letters (A-Z, a-z), digits (0-9), and underscores (_), but it cannot begin with a digit.
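As a quick check of these rules, the sketch below tests a few candidate names with str.isidentifier() and keyword.iskeyword(). The candidate names are illustrative and do not come from the original listing.

```python
import keyword

# Hypothetical candidate names chosen for illustration
candidates = ["my_var", "_count", "value2", "2value", "my-var", "class"]

results = {}
for name in candidates:
    # A usable identifier follows the naming rules and is not a reserved keyword
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {'valid' if results[name] else 'invalid'}")
```

Names that start with a digit, contain a hyphen, or collide with a keyword are reported as invalid.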
Output:
Keywords are reserved words that cannot be used as identifiers. These words have special
meaning in Python and are predefined in the language.
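A minimal sketch that lists the reserved words using the standard keyword module; the exact list depends on the Python version in use.

```python
import keyword

# keyword.kwlist holds every reserved word of the running interpreter
print("Python Keywords:", keyword.kwlist)
print("Number of keywords:", len(keyword.kwlist))
```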
Output:
Python Keywords: ['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class',
'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is',
'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
Indentation is critical in Python, as it indicates the block of code belonging to a control structure
(such as loops, functions, classes).
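The calls that follow use a greet function whose definition is missing from this listing. A plausible reconstruction, assuming it falls back to "World!" for an empty name, might look like this:

```python
def greet(name):
    # Everything indented under "def" belongs to the function body
    if name:
        # Indented one level further: runs only when name is non-empty
        print(f"Hello, {name}")
    else:
        # Same level as the "if" body: runs when name is empty
        print("Hello, World!")
```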
greet("Alice")
greet("") # This will print "Hello, World!" because the name is empty.
Output:
Hello, Alice
Hello, World!
Comments are used to explain the code, and they are ignored by the Python interpreter.
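The listing for this part is missing. A small sketch consistent with the output shown next could be the following; the variable names and values are assumptions.

```python
# Single-line comment: store two numbers
a = 5
b = 3

total = a + b  # Inline comment: add the two numbers
print("Sum:", total)
```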
Output:
Sum: 8
# Arithmetic Operators
a = 10
b = 5
print("Arithmetic Operators:")
print(f"a + b = {a + b}") # Addition
print(f"a - b = {a - b}") # Subtraction
print(f"a * b = {a * b}") # Multiplication
print(f"a / b = {a / b}") # Division (float)
print(f"a // b = {a // b}") # Floor division (integer)
print(f"a % b = {a % b}") # Modulus (remainder)
print(f"a ** b = {a ** b}") # Exponentiation
print("\nComparison Operators:")
# Comparison Operators
print(f"a == b: {a == b}") # Equal to
print(f"a != b: {a != b}") # Not equal to
print(f"a > b: {a > b}") # Greater than
RSR Rungta College of Engineering & Technology Bhilai
B.Tech 5th CSE
Data Analytics with Python Lab Manual
print(f"a < b: {a < b}") # Less than
print(f"a >= b: {a >= b}") # Greater than or equal to
print(f"a <= b: {a <= b}") # Less than or equal to
print("\nLogical Operators:")
# Logical Operators
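x = True
y = False
print(f"x and y = {x and y}") # Logical AND
print(f"x or y = {x or y}") # Logical OR
print(f"not x = {not x}") # Logical NOT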
print("\nAssignment Operators:")
# Assignment Operators
c = 20
print(f"c = {c}")
c += 5 # c = c + 5
print(f"c += 5 => c = {c}")
c -= 3 # c = c - 3
print(f"c -= 3 => c = {c}")
c *= 2 # c = c * 2
print(f"c *= 2 => c = {c}")
c /= 4 # c = c / 4
print(f"c /= 4 => c = {c}")
print("\nBitwise Operators:")
# Bitwise Operators
x = 5 # Binary: 0101
y = 3 # Binary: 0011
print(f"x & y = {x & y}") # AND
print(f"x | y = {x | y}") # OR
print(f"x ^ y = {x ^ y}") # XOR
print(f"~x = {~x}") # NOT (two's complement)
print(f"x << 1 = {x << 1}") # Left shift
print(f"x >> 1 = {x >> 1}") # Right shift
print("\nMembership Operators:")
# Membership Operators
lst = [1, 2, 3, 4, 5]
print(f"3 in lst: {3 in lst}") # True, 3 is in the list
print(f"6 not in lst: {6 not in lst}") # True, 6 is not in the list
print("\nIdentity Operators:")
# Identity Operators
a = [1, 2, 3]
b = a
c = [1, 2, 3]
print(f"a is b: {a is b}") # True, both refer to the same object
print(f"a is c: {a is c}") # False, a and c are different objects
print(f"a is not c: {a is not c}") # True, a and c are not the same object
Output:
Arithmetic Operators:
a + b = 15
a - b = 5
a * b = 50
a / b = 2.0
a // b = 2
a % b = 0
a ** b = 100000
Comparison Operators:
a == b: False
a != b: True
a > b: True
a < b: False
a >= b: True
a <= b: False
Logical Operators:
x and y = False
x or y = True
not x = False
Assignment Operators:
c = 20
c += 5 => c = 25
c -= 3 => c = 22
c *= 2 => c = 44
c /= 4 => c = 11.0
Bitwise Operators:
x & y = 1
x | y = 7
x ^ y = 6
~x = -6
x << 1 = 10
x >> 1 = 2
Membership Operators:
3 in lst: True
6 not in lst: True
Identity Operators:
a is b: True
a is c: False
a is not c: True
The membership operators in Python are in and not in, which are used to test whether a value is
in a sequence (such as a list, tuple, or string).
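The original listing is missing here. The sketch below uses illustrative values and is chosen so that every test evaluates to True, matching the four lines of output that follow.

```python
# Illustrative sequences; the names and values are assumptions
fruits = ["apple", "banana", "cherry"]
numbers = (1, 2, 3)

results = [
    "apple" in fruits,      # membership in a list
    "mango" not in fruits,  # absence from a list
    3 in numbers,           # membership in a tuple
    "Py" in "Python",       # substring test on a string
]
for result in results:
    print(result)
```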
Output:
True
True
True
True
Experiment-02
AIM: Write programs to understand the use of Python String, Tuple, List, Set, Dictionary,
File input/output.
# Creating a string (illustrative value)
string = "Hello, World!"
# Slicing a string
substring = string[7:13]
print("Sliced String:", substring)
# String length
length = len(string)
print("Length of String:", length)
# String reversal
reversed_string = string[::-1]
print("Reversed String:", reversed_string)
# String concatenation
new_string = string + " How are you?"
print("Concatenated String:", new_string)
# String split
split_string = string.split(",")
print("Split String:", split_string)
Output:
# Creating a tuple (values inferred from the output below)
my_tuple = (1, 2, 3, 4, 5)
# Accessing elements
first_element = my_tuple[0]
print("First Element:", first_element)
# Tuple slicing
sub_tuple = my_tuple[1:4]
print("Sliced Tuple:", sub_tuple)
# Tuple length
tuple_length = len(my_tuple)
print("Length of Tuple:", tuple_length)
# Concatenating tuples
new_tuple = my_tuple + (6, 7)
print("Concatenated Tuple:", new_tuple)
Output:
First Element: 1
Sliced Tuple: (2, 3, 4)
Length of Tuple: 5
Concatenated Tuple: (1, 2, 3, 4, 5, 6, 7)
# Creating a list (illustrative values)
my_list = [10, 20, 30, 40, 50]
# Accessing elements
third_element = my_list[2]
print("Third Element:", third_element)
# List slicing
sub_list = my_list[2:4]
print("Sliced List:", sub_list)
# Adding elements
my_list.append(60)
print("List After Append:", my_list)
# Removing an element
my_list.remove(40)
print("List After Remove:", my_list)
Output:
# Creating a set (illustrative values)
my_set = {1, 2, 3, 4, 5}
# Adding an element
my_set.add(6)
print("Set After Add:", my_set)
# Removing an element
my_set.remove(3)
print("Set After Remove:", my_set)
# Set length
set_length = len(my_set)
print("Length of Set:", set_length)
# Set union
set2 = {4, 5, 6, 7, 8}
union_set = my_set | set2
print("Union of Sets:", union_set)
# Set intersection
intersection_set = my_set & set2
print("Intersection of Sets:", intersection_set)
Output:
# Creating a dictionary (illustrative values)
my_dict = {"name": "Alice", "age": 25}
# Modifying a value
my_dict["age"] = 26
print("Modified Dictionary:", my_dict)
Output:
# Writing to a file
with open("example.txt", "w") as file:
    file.write("Hello, this is a test file.\n")
    file.write("We are writing to it using Python.\n")
print("Data written to file.")
Output:
Experiment-03
AIM: Write programs to understand the use of Numpy’s Ndarray, Basic Operations,
Indexing, Slicing, and Iterating, Conditions and Boolean Arrays.
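import numpy as np
array_1d = np.array([1, 2, 3, 4, 5])
print("1D ndarray:", array_1d)
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D ndarray:\n", array_2d)
# Creating a 3D ndarray
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("3D ndarray:\n", array_3d)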
Output:
1D ndarray: [1 2 3 4 5]
2D ndarray:
[[1 2 3]
[4 5 6]]
3D ndarray:
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
3.2 Basic Operations with Ndarray
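# Example arrays (values inferred from the output below)
array_a = np.array([1, 2, 3, 4])
array_b = np.array([4, 3, 2, 1])
# Addition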
sum_array = array_a + array_b
print("Array addition:", sum_array)
# Subtraction
diff_array = array_a - array_b
print("Array subtraction:", diff_array)
# Multiplication
prod_array = array_a * array_b
print("Array multiplication:", prod_array)
# Division
div_array = array_a / array_b
print("Array division:", div_array)
# Scalar multiplication
scalar_prod = array_a * 3
print("Scalar multiplication:", scalar_prod)
Output:
Array addition: [5 5 5 5]
Array subtraction: [-3 -1 1 3]
Array multiplication: [4 6 6 4]
Array division: [0.25 0.66666667 1.5 4. ]
Scalar multiplication: [3 6 9 12]
Indexing and slicing in NumPy are similar to Python lists, but with more flexibility, such as
indexing several dimensions at once.
# Indexing in 1D ndarray
array_1d = np.array([10, 20, 30, 40, 50])
print("Element at index 2:", array_1d[2])
# Slicing in 1D ndarray
slice_1d = array_1d[1:4] # Get elements from index 1 to 3
print("Sliced 1D array:", slice_1d)
# Indexing in 2D ndarray
array_2d = np.array([[10, 20, 30], [40, 50, 60]])
print("Element at (1,2) in 2D array:", array_2d[1, 2])
# Slicing in 2D ndarray
slice_2d = array_2d[:, 1:3] # Select all rows, columns 1 and 2
print("Sliced 2D array:\n", slice_2d)
Output:
Element at index 2: 30
Sliced 1D array: [20 30 40]
Element at (1,2) in 2D array: 60
Sliced 2D array:
[[20 30]
[50 60]]
We can iterate over elements of a NumPy array just like a list. However, NumPy provides
efficient ways to work with elements.
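The listing for this part is missing. The sketch below is consistent with the output that follows; the array values are inferred from that output, and the final np.nditer step is an addition showing one of NumPy's own iteration tools.

```python
import numpy as np

# Iterating over a 1D array visits each element
array_1d = np.array([1, 2, 3, 4, 5])
for element in array_1d:
    print("Element:", element)

# Iterating over a 2D array visits each row; a nested loop reaches the elements
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
for row in array_2d:
    print("Row:", row)
    for element in row:
        print("Element in row:", element)

# np.nditer walks every element regardless of the number of dimensions
flat_values = [int(value) for value in np.nditer(array_2d)]
print("Values via nditer:", flat_values)
```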
Output:
Element: 1
Element: 2
Element: 3
Element: 4
Element: 5
Row: [1 2 3]
Element in row: 1
Element in row: 2
Element in row: 3
Row: [4 5 6]
Element in row: 4
Element in row: 5
Element in row: 6
NumPy allows efficient element-wise comparisons, and you can use Boolean arrays for filtering
and conditional operations.
# Creating an array
array = np.array([10, 20, 30, 40, 50])
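The listing stops after the array is created. A sketch of the remaining steps, written to match the output that follows, might be:

```python
import numpy as np

array = np.array([10, 20, 30, 40, 50])

# A comparison yields a Boolean array of the same shape
condition = array > 25
print("Condition array (elements > 25):", condition)

# Boolean indexing keeps only the elements where the mask is True
filtered = array[condition]
print("Filtered Array (elements > 25):", filtered)

# Combine conditions with & (and) or | (or); each needs its own parentheses
between = array[(array > 20) & (array < 40)]
print("Filtered Array (elements between 20 and 40):", between)

# np.where picks one of two values per element, depending on the condition
labels = np.where(array > 25, "Greater", "Lesser or Equal")
print("Conditional Replacement:", labels)
```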
Output:
Condition array (elements > 25): [False False True True True]
Filtered Array (elements > 25): [30 40 50]
Filtered Array (elements between 20 and 40): [30]
Conditional Replacement: ['Lesser or Equal' 'Lesser or Equal' 'Greater' 'Greater' 'Great
Experiment-04
AIM: Write programs to understand the use of Numpy’s Shape Manipulation, Array
Manipulation, Vectorization.
Shape manipulation involves reshaping arrays, adding/removing dimensions, and modifying the
shape of arrays without changing their data.
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5, 6])
print("Original 1D array:", array_1d)
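The rest of this listing is missing. The sketch below reproduces the operations named in the output that follows. Resizing a copy (rather than array_1d itself) is an assumption made so that the later steps still see the original data.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5, 6])

# reshape returns a new view with the requested shape
reshaped = array_1d.reshape(2, 3)
print("Reshaped into 2D array:\n", reshaped)

# flatten returns a 1D copy
flattened = reshaped.flatten()
print("Flattened array:", flattened)

# resize changes the shape in place, so it is applied to a copy here
resized = array_1d.copy()
resized.resize((3, 2), refcheck=False)
print("Resized array (in-place):\n", resized)

# .T swaps rows and columns
transposed = reshaped.T
print("Transposed array:\n", transposed)

# expand_dims inserts a new axis of length 1
expanded = np.expand_dims(array_1d, axis=0)
print("Expanded array (added dimension):\n", expanded)
```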
Output:
Original 1D array: [1 2 3 4 5 6]
Reshaped into 2D array:
[[1 2 3]
[4 5 6]]
Flattened array: [1 2 3 4 5 6]
Resized array (in-place):
[[1 2]
[3 4]
[5 6]]
Transposed array:
[[1 4]
[2 5]
[3 6]]
Expanded array (added dimension):
[[1 2 3 4 5 6]]
Array manipulation includes operations like stacking, splitting, and combining arrays.
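The listing for this part is missing. A sketch that matches the output that follows, with the array values and split points inferred from that output:

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array3 = np.array([7, 8, 9])
print("Array 1:", array1)
print("Array 2:", array2)
print("Array 3:", array3)

# vstack places the arrays on top of each other as rows
vertical = np.vstack((array1, array2, array3))
print("Vertical Stack:\n", vertical)

# hstack joins the arrays end to end
horizontal = np.hstack((array1, array2, array3))
print("Horizontal Stack:", horizontal)

# split cuts at the given indices: [0:2], [2:4], [4:]
split_parts = np.split(horizontal, [2, 4])
print("Split array:", split_parts)

# stack joins along a new axis; axis=1 turns each input into a column
combined = np.stack((array1, array2, array3), axis=1)
print("Combined array (along new axis):\n", combined)
```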
Output:
Array 1: [1 2 3]
Array 2: [4 5 6]
Array 3: [7 8 9]
Vertical Stack:
[[1 2 3]
[4 5 6]
[7 8 9]]
Horizontal Stack: [1 2 3 4 5 6 7 8 9]
Split array: [array([1, 2]), array([3, 4]), array([5, 6, 7, 8, 9])]
Combined array (along new axis):
[[1 4 7]
[2 5 8]
[3 6 9]]
Vectorization allows us to perform operations on entire arrays or large datasets without needing
explicit loops. This results in faster code.
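# Example input arrays (illustrative values)
array_x = np.array([1, 2, 3, 4])
array_y = np.array([5, 6, 7, 8])
# Element-wise multiplication
prod_result = array_x * array_y
print("Element-wise multiplication:", prod_result)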
Output
Experiment-05
AIM: Write programs to understand the use of Numpy’s Structured Arrays, Reading and
Writing Array Data on Files.
Structured arrays allow you to store heterogeneous data types in an efficient manner, similar to
a database table or a spreadsheet.
Example:
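import numpy as np
# Define a structured array with fields: name (string), age (int), and height (float)
dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = np.array([('Alice', 25, 5.5), ('Bob', 30, 5.9), ('Charlie', 35, 6.1)], dtype=dtype)
print("Structured Array:")
print(data)
for title, field in [("Names", "name"), ("Ages", "age"), ("Heights", "height")]:
    print(f"\n{title}:")
    print(data[field])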
Output:
Structured Array:
[('Alice', 25, 5.5) ('Bob', 30, 5.9) ('Charlie', 35, 6.1)]
Names:
['Alice' 'Bob' 'Charlie']
Ages:
[25 30 35]
Heights:
[5.5 5.9 6.1]
NumPy provides functions like np.save(), np.load(), np.savetxt(), and np.loadtxt() to read and
write arrays to files. We will show both binary and text file operations.
Example:
import numpy as np
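The example above is cut off after its import. A minimal sketch of both round trips follows; the array values, file names, and the use of a temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

array = np.array([[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Binary format (.npy): preserves dtype and shape exactly
    binary_path = os.path.join(tmp_dir, "array.npy")
    np.save(binary_path, array)
    loaded_binary = np.load(binary_path)

    # Text format: human-readable, one row per line
    text_path = os.path.join(tmp_dir, "array.txt")
    np.savetxt(text_path, array, delimiter=",", fmt="%.2f")
    loaded_text = np.loadtxt(text_path, delimiter=",")

print("Loaded from binary file:\n", loaded_binary)
print("Loaded from text file:\n", loaded_text)
```

The binary format is usually the better choice for exact round trips, while the text format is easier to inspect or share with other tools.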
Experiment-06
AIM: Write programs to understand the use of Pandas Series, Data Frame, Index
Objects, Reindexing, Dropping, Arithmetic and Data Alignment.
Pandas Series is a one-dimensional labeled array that can hold any data type. It can be created
from lists, NumPy arrays, or dictionaries.
Example:
import pandas as pd
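The example is cut off after its import. A sketch consistent with the output that follows, plus a dictionary-based Series added for illustration:

```python
import pandas as pd

# From a list: pandas assigns the default integer index 0..n-1
series = pd.Series([10, 20, 30, 40, 50])
print("Pandas Series:")
print(series)

# From a dictionary: the keys become the index labels (illustrative values)
labelled = pd.Series({"a": 1, "b": 2, "c": 3})
print(labelled["b"])
```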
Output:
Pandas Series:
0 10
1 20
2 30
3 40
4 50
dtype: int64
A DataFrame is a two-dimensional table of data with labeled axes (rows and columns). It is
similar to a database table, Excel spreadsheet, or dictionary of Series objects.
Example:
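import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Height': [5.5, 5.9, 6.1]}
df = pd.DataFrame(data)
print("Pandas DataFrame:", df, sep="\n")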
Output:
Pandas DataFrame:
Name Age Height
0 Alice 25 5.5
1 Bob 30 5.9
2 Charlie 35 6.1
Pandas Index objects are immutable and are used to label the axes of DataFrames and Series.
You can create your own Index objects and assign them to a DataFrame or Series.
Example:
import pandas as pd
df = pd.DataFrame(data, index=index)
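The fragment above does not define data or index. A sketch with values inferred from the output that follows, and with a short immutability check added as an illustration:

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 45],
}
index = pd.Index(["a", "b", "c", "d", "e"])

df = pd.DataFrame(data, index=index)
print("DataFrame with custom Index:")
print(df)
print("Index Object:")
print(df.index)

# Index objects are immutable: item assignment raises TypeError
try:
    df.index[0] = "z"
    index_is_immutable = False
except TypeError:
    index_is_immutable = True
print("Index is immutable:", index_is_immutable)
```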
Output:
DataFrame with custom Index:
Name Age
a Alice 25
b Bob 30
c Charlie 35
d David 40
e Eva 45
Index Object:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Reindexing allows you to conform a DataFrame to a new index with optional filling logic.
Example:
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]}
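df = pd.DataFrame(data, index=['a', 'b', 'c'])  # labels assumed so the rows align with the new index
# New index
new_index = ['a', 'b', 'c', 'd']
# Reindex: existing labels keep their rows; the new label 'd' is filled with NaN
df_reindexed = df.reindex(new_index)
print("Reindexed DataFrame:")
print(df_reindexed)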
Output:
Reindexed DataFrame:
Name Age
a Alice 25.0
b Bob 30.0
c Charlie 35.0
d NaN NaN
Pandas allows you to drop rows or columns from a DataFrame using the drop() method.
Example:
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Height': [5.5, 5.9, 6.1]}
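df = pd.DataFrame(data)
# Drop a row by index label (returns a new DataFrame; df is unchanged)
df_without_row = df.drop(index=1)
print("After dropping row 1:", df_without_row, sep="\n")
# Drop a column by name
df_without_column = df.drop(columns=['Height'])
print("After dropping column 'Height':", df_without_column, sep="\n")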
Output:
Pandas automatically aligns data on index labels when performing arithmetic operations between
Series and DataFrames. Labels present in only one operand produce NaN in the result.
Example:
import pandas as pd
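The example is cut off after its import. A minimal sketch of alignment with two partially overlapping Series follows; the labels and values are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Values are matched by label, not by position; unmatched labels become NaN
aligned_sum = s1 + s2
print("Aligned sum:")
print(aligned_sum)

# add() with fill_value treats a label missing from one side as 0
filled_sum = s1.add(s2, fill_value=0)
print("Sum with fill_value=0:")
print(filled_sum)
```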
Output:
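Experiment-07
AIM: Write programs to understand the use of Pandas Functions by Element,
Functions by Row or Column, Statistics Functions, Sorting and Ranking,
Correlation and Covariance, “Not a Number” Data.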
Pandas provides vectorized operations to apply functions element-wise over Series and
DataFrames.
import pandas as pd
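The listing is cut off after its import. A sketch of element-wise operations follows, with the Series values taken from the output below and the DataFrame values chosen for illustration.

```python
import numpy as np
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])
print("Original Series:")
print(series)

# Arithmetic operators are applied to every element at once
squared = series ** 2

# map() applies an arbitrary Python function element by element
halved = series.map(lambda value: value / 2)

# NumPy universal functions also work element-wise on DataFrames
df = pd.DataFrame({"A": [1, 4, 9], "B": [16, 25, 36]})
roots = np.sqrt(df)

print(squared)
print(halved)
print(roots)
```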
Output:
Original Series:
0 10
1 20
2 30
3 40
4 50
dtype: int64
You can apply functions across rows or columns using the apply() method for DataFrames. This
is useful for aggregating data along a specific axis (rows or columns).
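import pandas as pd
# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)
print("Original DataFrame:", df, sep="\n")
# Apply a function to each column (axis=0) and to each row (axis=1)
print(df.apply(lambda col: col.max() - col.min(), axis=0))
print(df.apply(lambda row: row.sum(), axis=1))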
Output:
Original DataFrame:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
Pandas provides built-in methods to calculate common statistical measures on DataFrames and
Series.
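import pandas as pd
# Create a DataFrame
data = {'A': [10, 20, 30, 40, 50], 'B': [15, 25, 35, 45, 55]}
df = pd.DataFrame(data)
print("Original DataFrame:", df, sep="\n")
# Summary statistics: count, mean, std, min, quartiles, max
print(df.describe())
print("Column means:", df.mean().to_dict())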
Output:
Original DataFrame:
A B
0 10 15
1 20 25
2 30 35
3 40 45
4 50 55
Pandas provides functions to sort and rank data by index or by column values.
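import pandas as pd
# Create a DataFrame
data = {'A': [50, 30, 20, 40, 10], 'B': [55, 35, 25, 45, 15]}
df = pd.DataFrame(data)
print("Original DataFrame:", df, sep="\n")
# Sort rows by column A, then rank the values of column A (1 = smallest)
print(df.sort_values(by='A'))
print(df['A'].rank())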
Output:
Original DataFrame:
A B
0 50 55
1 30 35
2 20 25
3 40 45
4 10 15
Pandas allows you to calculate correlation and covariance between columns of a DataFrame.
import pandas as pd
# Create a DataFrame
data = {'A': [10, 20, 30, 40, 50], 'B': [15, 25, 35, 45, 55]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Calculate correlation
correlation = df.corr()
# Calculate covariance
covariance = df.cov()
print("\nCorrelation Matrix:")
print(correlation)
print("\nCovariance Matrix:")
print(covariance)
Output:
Original DataFrame:
A B
0 10 15
1 20 25
2 30 35
3 40 45
4 50 55
Correlation Matrix:
A B
A 1.0 1.0
B 1.0 1.0
Covariance Matrix:
A B
A 250.0 250.0
B 250.0 250.0
import pandas as pd
import numpy as np
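This listing, which covers "Not a Number" (NaN) data, is cut off after its imports. A minimal sketch of common NaN handling follows; the DataFrame values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})
print("Original DataFrame:")
print(df)

# isna() marks missing entries with True
missing_mask = df.isna()

# dropna() removes every row that contains at least one NaN
dropped = df.dropna()

# fillna() replaces NaN with a chosen value
filled = df.fillna(0)

# Statistics such as mean() skip NaN by default
column_means = df.mean()

print(missing_mask)
print(dropped)
print(filled)
print(column_means)
```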
Output:
Experiment-08
AIM: Write programs to understand the use of Pandas for Reading and Writing Data
using CSV and Textual Files, HTML Files, XML, Microsoft Excel Files.
Pandas provides read_csv() and to_csv() to read and write CSV files. Other delimited text
files can also be read with read_csv() by passing the appropriate separator through the sep argument.
import pandas as pd
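The listing is cut off after its import. A sketch of a CSV and a tab-separated text round trip follows; the data, file names, and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Height": [5.5, 5.9, 6.1],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # CSV: index=False leaves the row labels out of the file
    csv_path = os.path.join(tmp_dir, "data.csv")
    df.to_csv(csv_path, index=False)
    df_csv = pd.read_csv(csv_path)

    # Tab-separated text file: same functions, different separator
    txt_path = os.path.join(tmp_dir, "data.txt")
    df.to_csv(txt_path, sep="\t", index=False)
    df_txt = pd.read_csv(txt_path, sep="\t")

print("Data read from CSV file:")
print(df_csv)
print("Data read from text file:")
print(df_txt)
```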
Output:
Pandas also supports reading and writing HTML files using read_html() and to_html().
import pandas as pd
# Reading HTML file (note: returns a list of DataFrames if multiple tables are present)
df_html = pd.read_html('data.html')[0]
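The fragment above reads data.html but the step that creates it is missing. A sketch of the writing side with to_html() follows; the data is illustrative. Note that read_html() additionally needs an HTML parser such as lxml to be installed.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Height": [5.5, 5.9, 6.1],
})

# With no path argument, to_html() returns the table markup as a string
html_table = df.to_html(index=False)
print(html_table)

# Passing a path instead, e.g. df.to_html("data.html", index=False),
# writes the same markup to a file.
```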
Output:
Pandas can also handle XML files using the read_xml() and to_xml() functions. By default they
use the lxml parser; pass parser='etree' to use the standard library's xml.etree.ElementTree instead.
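The listing for this part is missing. A sketch of an XML round trip follows, assuming pandas 1.3 or later and using parser='etree' so that lxml is not required. The data is inferred from the output below, and an in-memory string stands in for a file.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Height": [5.5, 5.9, 6.1],
})

# With no path argument, to_xml() returns the document as a string
xml_text = df.to_xml(index=False, parser="etree")

# read_xml() accepts a path or a file-like object
df_xml = pd.read_xml(StringIO(xml_text), parser="etree")
print("Data read from XML file:")
print(df_xml)
```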
Output:
Data read from XML file:
Name Age Height
0 Alice 25 5.5
1 Bob 30 5.9
2 Charlie 35 6.1
Pandas supports reading and writing Excel files using read_excel() and to_excel(). These rely on
an external engine: install openpyxl for .xlsx files, or xlrd for legacy .xls files.
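The listing for this part is missing. A sketch of an .xlsx round trip follows. The data is inferred from the output below, the file name is an assumption, and the ImportError guard is only there so the sketch still runs when openpyxl is not installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Height": [5.5, 5.9, 6.1],
})

df_excel = None
with tempfile.TemporaryDirectory() as tmp_dir:
    excel_path = os.path.join(tmp_dir, "data.xlsx")
    try:
        df.to_excel(excel_path, index=False)  # needs the openpyxl engine
        df_excel = pd.read_excel(excel_path)
    except ImportError:
        print("openpyxl is not installed; skipping the Excel round trip")

if df_excel is not None:
    print("Data read from Excel file:")
    print(df_excel)
```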
Output:
Data read from Excel file:
Name Age Height
0 Alice 25 5.5
1 Bob 30 5.9
2 Charlie 35 6.1
Experiment-09
AIM: Write programs to understand the use of Matplotlib for Simple Interactive Chart,
Set the Properties of the Plot, matplotlib and NumPy.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
# Creating a basic line plot
plt.plot(x, y, marker='o') # 'marker' adds points on the line
plt.title("Simple Interactive Line Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
This chart allows you to plot values and provides a starting point for adding more interactive
elements.
9.2 Set the Properties of the Plot
You can set properties such as line style, color, and marker to make the chart more readable.
import matplotlib.pyplot as plt
plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=8, linewidth=2)
plt.title("Customized Line Chart", fontsize=14, color='blue')
plt.xlabel("X-axis", fontsize=12)
plt.ylabel("Y-axis", fontsize=12)
plt.grid(True) # Adds a grid for better readability
plt.show()
Explanation:
color='green': Sets the line color.
linestyle='--': Makes the line dashed.
marker='o': Adds markers at each data point.
markersize=8: Sets marker size.
linewidth=2: Sets line thickness.
fontsize and color adjust the axis and title fonts.
9.3 Matplotlib and NumPy
NumPy integrates seamlessly with matplotlib, enabling mathematical operations directly on
datasets.
import numpy as np
import matplotlib.pyplot as plt
# Generating data with numpy
x = np.linspace(0, 10, 100) # 100 points between 0 and 10
y = np.sin(x)
plt.plot(x, y, label='sin(x)')
plt.title("Sinusoidal Function")
plt.xlabel("X values")
plt.ylabel("sin(X)")
plt.legend()
plt.show()
Experiment-10
AIM: Write programs to understand the use of Matplotlib for Working with Multiple
Figures and Axes, Adding Text, Adding a Grid, Adding a Legend, Saving the Charts.
#First figure
plt.figure(1)
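The listing stops right after the first figure is created. A sketch of working with two figures, the second holding two axes, follows. The data and titles are illustrative, and the Agg backend is an assumption so the code also runs without a display; drop that line and call plt.show() to view the windows.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

# First figure: a single set of axes
plt.figure(1)
plt.plot(x, [value ** 2 for value in x])
plt.title("Figure 1: squares")

# Second figure: two axes side by side
fig2, (ax_left, ax_right) = plt.subplots(1, 2, num=2, figsize=(8, 3))
ax_left.plot(x, x)
ax_left.set_title("Linear")
ax_right.plot(x, [value ** 3 for value in x])
ax_right.set_title("Cubic")
fig2.tight_layout()

figure_numbers = plt.get_fignums()
print("Open figures:", figure_numbers)
```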
10.3 Adding Text
import matplotlib.pyplot as plt
Explanation:
plt.text(x, y, "text"): Adds a text label at the specified (x, y) coordinates.
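The listing for this part is missing. A sketch that annotates one data point and adds a grid follows; the data, label text, and Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y, marker="o")
# Place a label next to the data point (3, 9)
label = plt.text(3, 9, "  (3, 9)")
# Draw grid lines behind the data
plt.grid(True)
plt.title("Line chart with text and grid")
```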
plt.xlabel("X values")
plt.ylabel("Y values")
plt.legend(loc="upper right") # Position the legend
plt.show()
Explanation:
label="text" assigns a label to each line.
plt.legend(loc="upper right") places the legend at the top right.
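The plotting calls that carry the label arguments are missing from the fragment above. A complete sketch follows, with illustrative data and labels and the Agg backend assumed for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

# Each label given here becomes one entry in the legend
plt.plot(x, [value ** 2 for value in x], label="x squared")
plt.plot(x, [value ** 3 for value in x], label="x cubed")
plt.xlabel("X values")
plt.ylabel("Y values")

legend = plt.legend(loc="upper right")
legend_labels = [text.get_text() for text in legend.get_texts()]
print("Legend entries:", legend_labels)
```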
Explanation:
dpi=300 specifies the resolution.
format='png' saves as a PNG file.
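The savefig call these options belong to is missing. A sketch follows; the data, file name, temporary directory, and Agg backend are assumptions for illustration. When a chart is also displayed, call savefig() before plt.show().

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [1, 4, 9])
plt.title("Chart to save")

with tempfile.TemporaryDirectory() as tmp_dir:
    chart_path = os.path.join(tmp_dir, "chart.png")
    # dpi sets the resolution, format sets the file type
    plt.savefig(chart_path, dpi=300, format="png")
    saved_bytes = os.path.getsize(chart_path)

print("Saved chart size in bytes:", saved_bytes)
```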
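Experiment-11
AIM: Write programs to understand the use of Matplotlib for Working with Line Chart,
Histogram, Bar Chart, Pie Charts.
11.2 Histogram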
Histograms are used to show the distribution of a dataset.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000) # Random data
plt.hist(data, bins=30, color='green', alpha=0.7)
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
Explanation:
bins=30: Divides the data range into 30 intervals.
alpha=0.7: Sets transparency.