Working with CSV files is a common task in data analysis and manipulation. Finding the average of each row in a CSV file becomes crucial when dealing with datasets. In this guide, we'll explore how to achieve this using Python, providing a step-by-step approach and practical examples.
dataset.csv
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
25.29,4.71,Male,No,Sun,Dinner,4
8.77,2.0,Male,No,Sun,Dinner,2
26.88,3.12,Male,No,Sun,Dinner,4
15.04,1.96,Male,No,Sun,Dinner,2
14.78,3.23,Male,No,Sun,Dinner,2
10.27,1.71,Male,No,Sun,Dinner,2
35.26,5.0,Female,No,Sun,Dinner,4
15.42,1.57,Male,No,Sun,Dinner,2
18.43,3.0,Male,No,Sun,Dinner,4
Find the Average of Each Row in Python
Below are some of the ways by which we can find the average of each row in a CSV file in Python:
- Using the CSV module
- Using Pandas
- Using NumPy
Average of Each Row Using the CSV module
After converting the data into numerical values, we can find the average of each row. To calculate the average, we use the sum() function to add up all the numerical values in the row, and the len() function to determine the number of values. By dividing the sum by the length, we obtain the average for that particular row.
Python3
import csv
def convert_to_float(value):
try:
return float(value)
except ValueError:
return None
with open('dataset.csv', 'r') as file:
# Create a CSV reader object
csv_reader = csv.reader(file)
# Skip the header if it exists
next(csv_reader, None)
for row in csv_reader:
values = []
# Iterate through each value in the row
for value in row:
converted_value = convert_to_float(value)
# Check if the conversion was successful (not None)
if converted_value is not None:
values.append(converted_value)
# Check if there are numeric values in the row
if values:
# Calculate the row average
row_average = sum(values) / len(values)
# Print the result
print(f"Row: {row}, Average: {row_average}")
Output:
Row: ['16.99', '1.01', 'Female', 'No', 'Sun', 'Dinner', '2'], Average: 6.666666666666667
Row: ['10.34', '1.66', 'Male', 'No', 'Sun', 'Dinner', '3'], Average: 5.0
Row: ['21.01', '3.5', 'Male', 'No', 'Sun', 'Dinner', '3'], Average: 9.17
Row: ['23.68', '3.31', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 9.663333333333332
Row: ['24.59', '3.61', 'Female', 'No', 'Sun', 'Dinner', '4'], Average: 10.733333333333334
Row: ['25.29', '4.71', 'Male', 'No', 'Sun', 'Dinner', '4'], Average: 11.333333333333334
Row: ['8.77', '2.0', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 4.256666666666667
Row: ['26.88', '3.12', 'Male', 'No', 'Sun', 'Dinner', '4'], Average: 11.333333333333334
Row: ['15.04', '1.96', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 6.333333333333333
Row: ['14.78', '3.23', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 6.669999999999999
Row: ['10.27', '1.71', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 4.66
Row: ['35.26', '5.0', 'Female', 'No', 'Sun', 'Dinner', '4'], Average: 14.753333333333332
Row: ['15.42', '1.57', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 6.329999999999999
Row: ['18.43', '3.0', 'Male', 'No', 'Sun', 'Dinner', '4'], Average: 8.476666666666667
Average of Each Row Using Pandas
In this example, the Pandas
library is used to read a CSV file (dataset.csv
) and calculate the average of numeric values for each row in the specified columns ('total_bill', 'tip', 'size'). The function average_of_each_row
reads the CSV file, drops any NaN values from the specified columns, calculates the row-wise mean, and prints the average along with the corresponding row values.
Python3
import pandas as pd
def average_of_each_row(csv_file, numeric_columns=None):
df = pd.read_csv(csv_file, usecols=numeric_columns)
for index, row in df.iterrows():
row = row.dropna() # Drop any NaN values
if not row.empty:
average = row.mean()
print("Average of row {} : {}".format(row.values, average))
csv_file = 'dataset.csv'
# Specify the columns that contain numeric data
numeric_columns = ['total_bill', 'tip', 'size']
average_of_each_row(csv_file, numeric_columns)
Output:
Average of row [16.99 1.01 2. ] : 6.666666666666667
Average of row [10.34 1.66 3. ] : 5.0
Average of row [21.01 3.5 3. ] : 9.17
Average of row [23.68 3.31 2. ] : 9.663333333333332
Average of row [24.59 3.61 4. ] : 10.733333333333334
Average of row [25.29 4.71 4. ] : 11.333333333333334
Average of row [8.77 2. 2. ] : 4.256666666666667
Average of row [26.88 3.12 4. ] : 11.333333333333334
Average of row [15.04 1.96 2. ] : 6.333333333333333
Average of row [14.78 3.23 2. ] : 6.669999999999999
Average of row [10.27 1.71 2. ] : 4.66
Average of row [35.26 5. 4. ] : 14.753333333333332
Average of row [15.42 1.57 2. ] : 6.329999999999999
Average of row [18.43 3. 4. ] : 8.476666666666667
Average of Each Row Using NumPy
In this example, the N
umPy library is used to read a CSV file (dataset.csv
) and calculate the average of each row in the dataset. The function average_of_each_row
employs np.genfromtxt
to load the data, removes NaN values from each row, calculates the row-wise mean using np.mean
, and prints the average along with the corresponding row index.
Python3
import numpy as np
def average_of_each_row(csv_file):
data = np.genfromtxt(csv_file, delimiter=',')
for i, row in enumerate(data):
row_without_nan = row[~np.isnan(row)] # Remove nan values from the row
average = np.mean(row_without_nan)
print("Average of row {}: {}".format(i + 1, average))
csv_file = 'dataset.csv'
average_of_each_row(csv_file)
Output:
Average of row 1: 6.666666666666667
Average of row 2: 5.0
Average of row 3: 9.17
Average of row 4: 9.663333333333332
Average of row 5: 10.733333333333334
Average of row 6: 11.333333333333334
Average of row 7: 4.256666666666667
Average of row 8: 11.333333333333334
Average of row 9: 6.333333333333333
Average of row 10: 6.669999999999999
Average of row 11: 4.66
Average of row 12: 14.753333333333332
Average of row 13: 6.329999999999999
Average of row 14: 8.476666666666667
Conclusion
To sum up, working with CSV files is important for analyzing data. This guide showed an easy way to find the average of each row in a CSV file using Python. We used a module called `csv` to handle the data efficiently. The key steps include turning text values into numbers, dealing with non-number data correctly, and going through the process step by step: opening the file, changing the data, and figuring out the averages using the different methods as discussed above.
Similar Reads
How To Calculate Average For Every Column In A Csv File
We are given a CSV file and our task is to find the average of each column in Python using different approaches. In this article, we will see how we can calculate the average for every column in a CSV file. Example: Input: data.csv Age,Salary 30,50000 25,60000 28,55000 Output: Average Age: 27.67, Av
3 min read
How To Create A Csv File Using Python
CSV stands for comma-separated values, it is a type of text file where information is separated by commas (or any other delimiter), they are commonly used in databases and spreadsheets to store information in an organized manner. In this article, we will see how we can create a CSV file using Python
3 min read
How to Add Numbers in a Csv File Using Python
When working with CSV files in Python, adding numbers in the CSV file is a common requirement. This article will guide you through the process of adding numbers within a CSV file. Whether you're new to data analysis or an experienced practitioner, understanding this skill is vital for efficient data
3 min read
Copy Rows and Columns in Excel Using Python
Manipulating data in Excel often involves tasks such as copying specific rows or columns based on certain conditions. Python, with its powerful libraries like Pandas and openpyxl, provides efficient solutions for handling Excel files. In this article, we will explore how to copy rows and columns in
3 min read
How to Append Data in Excel Using Python
We are given an Excel file and our task is to append the data into this excel file using Python. In this article, we'll explore different approaches to append data to an Excel file using Python. Append Data in Excel Using PythonBelow, are some examples to understand how to append data in excel using
2 min read
How to convert tab-separated file into a dataframe using Python
In this article, we will learn how to convert a TSV file into a data frame using Python and the Pandas library. A TSV (Tab-Separated Values) file is a plain text file where data is organized in rows and columns, with each column separated by a tab character. It is a type of delimiter-separated file,
4 min read
Convert CSV to JSON using Python
Converting CSV to JSON using Python involves reading the CSV file, converting each row into a dictionary and then saving the data as a JSON file. For example, a CSV file containing data like names, ages and cities can be easily transformed into a structured JSON array, where each record is represent
2 min read
How to Convert Tab-Delimited File to Csv in Python?
We are given a tab-delimited file and we need to convert it into a CSV file in Python. In this article, we will see how we can convert tab-delimited files to CSV files in Python. Convert Tab-Delimited Files to CSV in PythonBelow are some of the ways to Convert Tab-Delimited files to CSV in Python: U
2 min read
Applying a Window function to calculate differences in PySpark
Pyspark offers a very useful function, Window which is operated on a group of rows and returns a single value for every input row. Do you know you can even find the difference between two columns using the WIndow function? In this article, we will discuss applying a Window function to calculate diff
3 min read
Column Average in Record List - Python
Given a list of records where each record contains multiple fields, the task is to compute the average of a specific column. Each record is represented as a dictionary or a list, and the goal is to extract values from the chosen column and calculate their average. Letâs explore different methods to
3 min read