Python Cod1

Uploaded by

Monica H N
Copyright
© All Rights Reserved
Python Code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Load the dataset
# Confirm that this URL still resolves; if it does not, point read_csv
# at a local copy of the heart disease CSV instead.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart.csv"
df = pd.read_csv(url)

# Step 2: Display the first few rows of the dataset
print("Initial Data:\n", df.head())

# Step 3: Check for missing values
print("Missing Values:\n", df.isnull().sum())

# Step 4: Handle missing values (if any)
# For this dataset, there are no missing values, but if there were, you could use:
# df.ffill(inplace=True)   # forward fill (fillna(method='ffill') is deprecated in pandas 2.x)
# df.dropna(inplace=True)  # or drop rows with missing values

# Step 5: Display the data types
print("Data Types:\n", df.dtypes)

# Step 6: String manipulation example (if needed)
# Example: Clean a string column (if applicable)
# df['gender'] = df['gender'].str.lower().str.strip()

# Step 7: Convert relevant columns to NumPy arrays
# The column names must match the CSV header; some copies of this
# dataset abbreviate cholesterol as 'chol'.
age_array = df['age'].to_numpy()
cholesterol_array = df['cholesterol'].to_numpy()

# Step 8: Calculate basic statistics
mean_age = np.mean(age_array)
median_cholesterol = np.median(cholesterol_array)
print(f"Mean Age: {mean_age}, Median Cholesterol: {median_cholesterol}")

# Step 9: Define features and target variable
X = df.drop(columns=['target'])  # Assuming 'target' is the column to predict
y = df['target']

# Step 10: Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training set size: {X_train.shape[0]}, Testing set size: {X_test.shape[0]}")

# Step 11: Initialize and train the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Step 12: Make predictions on the test set
y_pred = model.predict(X_test)

# Step 13: Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the model: {accuracy:.2f}")

# Step 14: Save the report to a text file
with open("heart_disease_analysis_report.txt", "w") as file:
    file.write("Heart Disease Analysis Report\n")
    file.write("Objective: Analyze the dataset to predict heart disease.\n")
    file.write("Data Loading and Cleaning: Loaded and cleaned the dataset, finding no missing values.\n")
    file.write("Statistical Analysis: Mean Age: {}, Median Cholesterol: {}.\n".format(mean_age, median_cholesterol))
    file.write("Model Accuracy: {}.\n".format(accuracy))

Report Summary

Objective: The goal was to analyze the Heart Disease UCI dataset to predict heart disease using
machine learning techniques.

Data Loading and Cleaning: The dataset was loaded using Pandas. No missing values were found,
ensuring a clean dataset for analysis.
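
The missing-value check can be sketched on a small made-up table. The column names and values below are illustrative assumptions, not rows from the Heart Disease dataset, and the two remedies shown (forward fill and dropping rows) are only the options the script's comments mention.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two gaps; not taken from the real dataset.
toy = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "cholesterol": [233.0, np.nan, 204.0, 236.0],
})

# Count missing values per column, as in Step 3 of the script.
missing_counts = toy.isnull().sum()
print(missing_counts)

# Two common remedies: forward fill, or drop incomplete rows.
filled = toy.ffill()
dropped = toy.dropna()
print(filled)
print(dropped)
```

Forward fill keeps every row but copies the previous value into each gap, while dropping rows keeps only complete records; which is appropriate depends on how the data was collected.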

String Manipulation: The dataset primarily contains numerical data, so the script includes string
cleaning only as a commented-out example. In datasets with categorical string data, operations such
as lowercasing and stripping spaces are crucial for uniformity.
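
As a rough illustration of why this matters, the sketch below cleans a hypothetical 'gender' column (the same name used in the script's commented-out line). The values are invented for the example.

```python
import pandas as pd

# Hypothetical 'gender' column with inconsistent case and stray spaces.
people = pd.DataFrame({"gender": [" Male", "FEMALE ", "female", "male  "]})

# Lowercase and strip whitespace so equal categories compare equal.
people["gender"] = people["gender"].str.lower().str.strip()

cleaned = people["gender"].tolist()
print(cleaned)
print(people["gender"].nunique())
```

Before cleaning, the four entries would count as four distinct categories; afterwards they collapse to two.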

Statistical Analysis: Basic statistics were computed using NumPy, revealing a mean age of
approximately X and a median cholesterol level of Y, where X and Y stand for the values the script
prints in Step 8.
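
A minimal sketch of the same calculation on a handful of illustrative values (assumed for the example, not taken from the dataset):

```python
import numpy as np

# Illustrative values only; not taken from the real dataset.
age_array = np.array([63, 37, 41, 56, 57])
cholesterol_array = np.array([233, 250, 204, 236, 354])

# The mean is sensitive to outliers; the median is the middle value after sorting.
mean_age = np.mean(age_array)
median_cholesterol = np.median(cholesterol_array)

print(f"Mean Age: {mean_age:.1f}, Median Cholesterol: {median_cholesterol:.1f}")
```

The median is a reasonable choice for cholesterol here because a single very high reading (354 in this toy array) shifts the mean far more than the median.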

Data Splitting: The dataset was split into training (80%) and testing (20%) sets to validate the model's
performance.
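
The 80/20 split can be sketched on synthetic data as follows. The `stratify=y` argument is an addition that the original script does not use; it is shown as one option for keeping the class ratio equal in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 100 rows, 3 features, balanced labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

# stratify=y keeps the class ratio the same in both splits (an assumption
# added here; the original script splits without stratification).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}, Testing set size: {X_test.shape[0]}")
```

Fixing `random_state` makes the split reproducible, so repeated runs evaluate the model on the same test rows.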

Model Building: A Logistic Regression model was chosen for binary classification. The model was
trained on the training set and achieved an accuracy of Z on the test set. Whether Z indicates good
predictive capability depends on the class balance, so it is best read against a majority-class
baseline.
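
One way to put an accuracy figure in context is to compare it with a classifier that always predicts the most frequent class. The sketch below does this on synthetic data; the feature scaling step and the `DummyClassifier` baseline are assumptions added for illustration and are not part of the original script.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem; a stand-in for the heart disease features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
noise = rng.normal(scale=0.5, size=200)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling the features first usually helps the solver converge within max_iter.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Baseline that always predicts the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))

print(f"Model: {accuracy:.2f}, Baseline: {baseline_accuracy:.2f}")
```

If the model's accuracy is only slightly above the baseline, the headline number says little about predictive capability.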

Conclusion: This analysis demonstrated effective data manipulation, cleaning, and the successful
application of machine learning to predict heart disease. Future work could involve exploring other
algorithms and tuning model parameters for improved accuracy.
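
As a sketch of the parameter tuning mentioned as future work, the example below searches over the regularization strength `C` of a logistic regression with 5-fold cross-validation. The data is synthetic and the grid values are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem; a stand-in for the heart disease features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
noise = rng.normal(scale=0.5, size=200)
y = (X[:, 0] - X[:, 2] + noise > 0).astype(int)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation scores every candidate value of C.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Putting the scaler inside the pipeline means it is refit on each training fold, which avoids leaking information from the validation fold into the preprocessing.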
