
Internship Report


VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“Jnana Sangama”, Belagavi-590018, Karnataka

Internship Report on
“Sleep Efficiency”
Submitted in partial fulfillment of the requirements for the award of the
degree of Bachelor of Engineering
in
Computer Science & Engineering

Submitted by
1BI19CS071 Javeeria Muskan F

Internship carried out


At
Prinston Smart Engineers

Under the Guidance of


Ms. Farheen

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


BANGALORE INSTITUTE OF TECHNOLOGY
K.R. Road, V.V. Pura, Bengaluru-560 004

2022-23
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“Jnana Sangama”, Belagavi-590018, Karnataka

BANGALORE INSTITUTE OF TECHNOLOGY


Bengaluru-560 004

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Certificate
This is to certify that the internship project entitled “Sleep Efficiency” carried out
by

USN Name

1BI19CS071 Javeeria Muskan F

a bonafide student of VII semester B.E., is in partial fulfillment of the requirements
for the Bachelor's Degree in Computer Science & Engineering of the VISVESVARAYA
TECHNOLOGICAL UNIVERSITY during the academic year 2022-23.

Dr. M. S. Bhargavi, Associate Professor, Department of CSE, BIT
Dr. J. Girija, Professor and Head, Department of CSE, BIT
Dr. M. U. Aswath, Principal, BIT

External Viva
Name of the examiners, signature with date

1.

2.
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any
task would be incomplete without acknowledging those who made it possible and whose
guidance and encouragement made my efforts successful. My sincere thanks go to all
those who have supported me in completing this internship successfully.

My sincere thanks to Dr. M. U. Aswath, Principal, BIT and Dr. Girija J., HOD,
Department of CS&E, BIT for their encouragement, support and guidance to the student
community in all fields of education. I am grateful to our institution for providing us
with a congenial atmosphere to carry out the internship successfully.

I would also like to thank Dr. Maya B. S, Assistant Professor and
Internship Coordinator, for her encouragement and, moreover, for her timely support and
guidance till the completion of the internship.

I take this opportunity to express my profound gratitude to my
esteemed guide Dr. M. S. Bhargavi, Associate Professor, Department of CS&E, BIT,
for her moral support, encouragement and valuable suggestions throughout the
internship.

I extend my sincere thanks to all the department faculty members and
non-teaching staff for supporting me directly or indirectly in the completion of this
internship.

Javeeria Muskan F
1BI19CS071
ABSTRACT
The Sleep Efficiency dataset is a collection of sleep-related data collected from
wearable fitness devices. The data includes information about sleep duration, sleep
efficiency, and other sleep-related parameters for a group of individuals. Each row in the
dataset represents a single night of sleep for an individual. The columns in the dataset
include the date of the sleep record, the sleep duration in minutes, the sleep efficiency as a
percentage, the number of times the individual woke up during the night, the time spent in
bed, and the time spent asleep.
The dataset provides a unique opportunity to explore the relationship between sleep
duration, sleep efficiency, and other sleep-related parameters. It can be used to identify
factors that may affect sleep quality and to develop interventions to improve sleep health.
Some potential applications of this dataset include analyzing the relationship between sleep
duration and sleep efficiency, investigating the impact of lifestyle factors on sleep quality,
identifying patterns in sleep behavior over time, and developing predictive models to estimate
sleep efficiency based on other sleep-related parameters.
With the help of Machine Learning techniques, knowledge can be extracted from the
sleeping habits of various people. Suitable data pre-processing methods are applied along
with feature selection. Some domain expertise is used for pre-processing as well as
for handling outliers that appear in the dataset. We have used Machine Learning algorithms
such as Logistic Regression and Random Forest.
TABLE OF CONTENTS

CHAPTER 1 - INTRODUCTION

1.1 Problem Statement

1.2 Objective

1.3 Future Scope

CHAPTER 2 – Requirement Specification

2.1 Software Requirements

2.2 Hardware Requirements

CHAPTER 3 – System Definition

3.1 Project Description

3.2 Libraries Used

3.3 Technology Used

3.4 Dataset

3.5 Advantages

3.6 Disadvantages

CHAPTER 4 – Implementation

CHAPTER 5 – Snapshots

CHAPTER 6 - Declaration

CHAPTER 7 - Conclusion/Future Enhancement

CHAPTER 8 - References
1. INTRODUCTION

Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming,
coined the term “Machine Learning”. He defined machine learning as a “Field of study that
gives computers the capability to learn without being explicitly programmed”. The process
starts with feeding good quality data and then training our machines (computers) by building
machine learning models using the data and different algorithms. The choice of algorithm
depends on what type of data we have and what kind of task we are trying to automate.

Figure 1.1: Difference between traditional programming and machine learning

How does ML work?


● Gathering past data in any form suitable for processing. The better the quality of
data, the more suitable it will be for modeling.
● Data Processing – Sometimes, the data collected is in raw form and needs to be
pre-processed. Example: some tuples may have missing values for certain attributes,
and in this case they have to be filled with suitable values in order to perform machine
learning or any form of data mining. Missing values for numerical attributes, such
as the price of a house, may be replaced with the mean value of the attribute,
whereas missing values for categorical attributes may be replaced with the mode
(the most frequent value) of the attribute. This depends on the types of filters we use. If the data
is in the form of text or images, it must be converted to numerical form,
be it a list, array, or matrix.
● Divide the input data into training, cross-validation, and test sets. A commonly used
ratio between the respective sets is 6:2:2.
● Building models with suitable algorithms and techniques on the training set.
● Testing our conceptualized model with data that was not fed to the model at
the time of training and evaluating its performance using metrics such as F1 score,
precision, and recall.

Department of CS&E, BIT 2022-2023 1

Sleep Efficiency 1BI19CS071
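
The data processing step described above (mean for numerical attributes, mode for categorical attributes) can be sketched as follows. This is only a minimal illustration using pandas; the toy table, its column names, and the helper `impute_simple` are hypothetical and are not part of the project code.

```python
import pandas as pd

# Hypothetical toy data: 'price' is numerical, 'city' is categorical.
houses = pd.DataFrame({
    "price": [100.0, None, 300.0, 200.0],
    "city": ["Mysuru", "Bengaluru", None, "Bengaluru"],
})


def impute_simple(df):
    """Fill numeric columns with the mean and all other columns with the mode."""
    filled = df.copy()
    for column in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[column]):
            filled[column] = filled[column].fillna(filled[column].mean())
        else:
            # mode() ignores missing values; take the most frequent entry.
            filled[column] = filled[column].fillna(filled[column].mode().iloc[0])
    return filled


imputed = impute_simple(houses)
print(imputed)
```

The missing price becomes the mean of the known prices, and the missing city becomes the most frequent city.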
Prerequisites to learn ML:
• Linear Algebra
• Statistics and Probability
• Calculus
• Graph theory
• Programming Skills – Languages such as Python, R, MATLAB, C++, or Octave.
How do we split data in Machine Learning?
• Training Data: The part of the data we use to train our model. This is the data that the
model actually sees (both input and output) and learns from.
• Validation Data: The part of the data that is used for frequent evaluation of the model
fit on the training dataset, and for tuning hyperparameters (parameters set
before the model begins learning). This data plays its part while the model
is training.
• Testing Data: Once our model is completely trained, testing data provides an unbiased
evaluation. When we feed in the inputs of the testing data, our model predicts some
values (without seeing the actual output). After prediction, we evaluate our model by
comparing its predictions with the actual output present in the testing data. This is how we evaluate
and see how much our model has learned from the experience fed in as training data.
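
As an illustration of the three-way split, the sketch below produces a 6:2:2 division by calling scikit-learn's `train_test_split` twice. The arrays are synthetic placeholders and the seed is arbitrary; the project itself uses a simpler two-way split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# First keep 60% for training and set 40% aside.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
# Then halve the remainder into validation (20%) and test (20%).
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```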


1.1 Problem Statement


To build a machine learning model that estimates the sleep efficiency of an individual based
on demographic and lifestyle factors. The goal is to identify individuals with poor
sleep efficiency and to develop interventions to improve sleep quality. The dataset is
modelled using classification and regression models, from which sleep efficiency can be
estimated.

1.2 Objective
• Understanding the factors that impact sleep efficiency.
• Developing models to predict sleep efficiency.
• Identifying the risk factors for poor sleep quality.
• Improving sleep quality.

1.3 Future Scope


During this project I realized that feature scaling is an essential aspect of ML models.
The basic concept is to make sure that all of the features are on the same scale. We
can expand the existing system with additional analysis methods, such as text analysis, and
with implementations using other algorithms and improved code.



2. REQUIREMENT SPECIFICATION

2.1 Software Requirements


• Operating system – Windows 7/8/10/11
• Google Colab environment
• Libraries – NumPy and Pandas
• Language used is Python

2.2 Hardware Requirements


• Processor – i3 Processor
• Processor Speed – 1 GHz
• Memory – 2 GB RAM
• 1TB Hard Disk Drive
• Mouse or any other pointing device
• Keyboard
• Display device: Color Monitor



3. SYSTEM DEFINITION

3.1 Project Description


Supervised Machine Learning algorithms can be broadly classified into Regression and
Classification algorithms. Regression algorithms predict continuous output values,
whereas Classification algorithms are needed to predict categorical values.
The Classification algorithm is a Supervised Learning technique that is used to
identify the category of new observations on the basis of training data. In Classification, a
program learns from the given dataset or observations and then classifies new observations
into a number of classes or groups, such as Yes or No, 0 or 1, cat or dog, etc. Classes can
also be called targets, labels, or categories. The main goal of a Classification algorithm is to
identify the category of a given observation, and these algorithms are mainly used to predict the
output for categorical data.
Classification algorithms can be better understood using the diagram below. In the
diagram, there are two classes, Class A and Class B. The members of each class have features
that are similar to each other and dissimilar to those of the other class.

Figure 3.1: Classification


Types of ML Classification Algorithms

Classification algorithms can be further divided into two main categories:
• Linear Models
1. Logistic Regression
2. Support Vector Machines
• Non-linear Models
1. K-Nearest Neighbors
2. Kernel SVM
3. Naïve Bayes
4. Decision Tree Classification
5. Random Forest Classification
Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML.
It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
How does Random Forest algorithm work?
Random Forest works in two phases: the first is to create the random forest by combining N
decision trees, and the second is to make predictions with each tree created in the first phase.
The working process can be explained in the steps below:
Step-1: Select K random data points from the training set.
Step-2: Build the decision tree associated with the selected data points (subset).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2 until N trees have been built.
Step-5: For new data points, find the predictions of each decision tree, and assign the new
data points to the category that wins the majority vote.
Implementation Steps are given below:
• Data Pre-processing step
• Fitting the Random Forest algorithm to the Training set
• Predicting the test result
• Test accuracy of the result (Creation of Confusion matrix) and visualizing the result
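
To make the steps above concrete, here is a minimal sketch of the idea: bootstrap samples, one decision tree per sample, then a majority vote. It is a simplified illustration on synthetic data, not the scikit-learn `RandomForestClassifier` used in the project; the helper names `fit_forest` and `predict_forest` are hypothetical, and class labels are assumed to be non-negative integers.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        # Steps 1 and 2: draw random data points (with replacement), build a tree.
        idx = rng.integers(0, len(X), size=len(X))
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed + i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Step 5: collect each tree's prediction and take the majority vote."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])


# Synthetic data: two well-separated clusters, labelled 0 and 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

trees = fit_forest(X, y)
pred = predict_forest(trees, X)
print("training accuracy:", (pred == y).mean())
```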
The Regression Algorithm is a type of supervised learning algorithm in machine
learning that is used to predict a continuous output variable (also known as a dependent
variable) based on one or more input variables (also known as independent variables or
features). Regression algorithms are commonly used in many different fields, including
finance, healthcare, and social sciences, to make predictions based on historical data.
Here are some popular regression algorithms used in machine learning:
• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression

• Decision Tree Regression


• Random Forest Regression
These are some popular regression algorithms used in machine learning. The choice of
algorithm depends on the problem we are trying to solve, the type of data we have and the
performance we want to achieve.

Figure 3.2: Regression


Linear Regression Algorithm
Linear regression is a type of regression analysis that models the relationship between
the input variables and the output variable as a linear function. In simple linear
regression, there is only one independent variable, whereas in multiple linear regression,
there is more than one independent variable. The goal of linear regression is to fit the
straight line (or, with several inputs, the hyperplane) through the data points that best
represents the relationship between the independent variables and the dependent variable.
How does Linear Regression algorithm work?
Here are the steps involved in the working of the linear regression algorithm:
Step 1: Data Collection- Collect the data for the input variables and output variable from
various sources, such as sensors, surveys, databases, or simulations.
Step 2: Data Preprocessing- Preprocess the data by cleaning, transforming, and scaling it
to make it suitable for the linear regression algorithm.
Step 3: Model Selection- Choose the type of linear regression model, such as simple linear
regression or multiple linear regression, based on the number of input variables and the
nature of the relationship between the input variables and the output variable.
Step 4: Training the Model- Train the linear regression model on the training dataset by
fitting the best-fit line through the data points using a method such as the Ordinary Least
Squares (OLS) method.

Step 5: Model Evaluation- Evaluate the performance of the linear regression model on the
validation dataset by calculating metrics such as the Mean Squared Error
(MSE), Root Mean Squared Error (RMSE), or R-squared.
Step 6: Prediction- Use the trained linear regression model to make predictions on new
input data and validate the accuracy of the predictions.
The linear regression algorithm is widely used in various fields such as finance, marketing,
engineering, and social sciences for predicting the values of the output variable based on
the input variables. It is a simple and interpretable model that provides insights into the
relationship between the input and output variables.
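
The fitting and evaluation steps can be illustrated with a small worked example. The sketch below fits a line by Ordinary Least Squares using NumPy and computes MSE, RMSE and R-squared; the data points are made up and lie exactly on y = 2x + 1, so the fit should recover that line.

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept.
A = np.column_stack([np.ones_like(x), x])

# Ordinary Least Squares: minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# Evaluation metrics on the fitted values.
y_hat = A @ coef
mse = np.mean((y - y_hat) ** 2)
rmse = np.sqrt(mse)
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("slope:", slope, "intercept:", intercept)
print("MSE:", mse, "RMSE:", rmse, "R-squared:", r2)
```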
Data visualization is an essential part of machine learning as it helps to understand
the data, identify patterns, and communicate insights effectively. Here are some ways in
which data visualization is used in machine learning:
• Exploratory Data Analysis (EDA): Data visualization techniques such as
histograms, scatter plots, and box plots are used to explore the data, identify outliers,
and understand the distribution of the input and output variables.
• Feature Selection: Data visualization can help to identify the most important input
variables for the model by visualizing the relationship between the input variables
and the output variable. For example, correlation matrices and heatmaps can be used
to visualize the correlation between the input variables.
• Model Performance Evaluation: Data visualization techniques such as ROC curves,
precision-recall curves, and confusion matrices are used to evaluate the
performance of the machine learning model and identify areas for improvement.
• Interpretability: Data visualization techniques such as decision trees and partial
dependence plots are used to interpret the machine learning model and understand
the relationship between the input and output variables.
• Reporting: Data visualization is used to communicate the insights and findings of
the machine learning model to stakeholders effectively. Visualization techniques
such as bar charts, pie charts, and line charts are used to create easy-to-understand
and visually appealing reports.
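
The correlation matrix behind a heatmap can be computed before anything is plotted. The sketch below uses a small made-up table (the column names and values are illustrative, not taken from the dataset) and ranks the input variables by the strength of their correlation with the output variable.

```python
import pandas as pd

# Made-up nightly records for illustration only.
nights = pd.DataFrame({
    "awakenings": [0, 1, 2, 3, 4],
    "sleep_efficiency": [0.95, 0.90, 0.80, 0.70, 0.60],
    "age": [25, 40, 31, 52, 29],
})

# Pearson correlation between every pair of columns.
corr = nights.corr()

# Rank the inputs by absolute correlation with the target.
ranked = (
    corr["sleep_efficiency"]
    .drop("sleep_efficiency")
    .abs()
    .sort_values(ascending=False)
)

print(corr.round(2))
print(ranked)
```

In this toy table, awakenings is the input most strongly (and negatively) correlated with sleep efficiency, which is the kind of pattern a heatmap makes visible at a glance.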


Working Description
The Sleep Efficiency dataset is a collection of sleep efficiency measurements for various
individuals. We obtained the dataset from Kaggle.
Sleep Efficiency
Sleep efficiency is a measure of the quality of sleep, calculated as the percentage of
time spent asleep compared to the total amount of time spent in bed. It reflects how much
of the time spent in bed is actually spent sleeping. For example, if a person spends 8 hours
in bed and sleeps for 7 hours, their sleep efficiency would be 87.5% (7/8 × 100). This
measure is typically obtained using sleep monitoring methods such as actigraphy, which
measures physical movements during sleep, or polysomnography, which records brain
waves, eye movements, and other physiological signals during sleep.
Sleep efficiency is an important metric for evaluating sleep quality because it reflects both
the duration and continuity of sleep. A low sleep efficiency score may indicate difficulty
falling or staying asleep, frequent awakenings during the night, or other sleep disturbances.
Poor sleep efficiency has been linked to a range of negative health outcomes, including
increased risk for obesity, diabetes, cardiovascular disease, and mental health problems.
Healthy adults typically have a sleep efficiency score of 85-90% or higher, while scores
below 80% may indicate a sleep disorder or other underlying health problem. Improving
sleep habits and addressing underlying medical conditions can help improve sleep
efficiency and overall sleep quality.
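
The definition of sleep efficiency given above can be written as a short function. This is a minimal sketch; the function name and the input checks are illustrative choices and are not taken from the project code.

```python
def sleep_efficiency(time_asleep_hours, time_in_bed_hours):
    """Return sleep efficiency as a percentage of the time spent in bed."""
    if time_in_bed_hours <= 0:
        raise ValueError("time in bed must be positive")
    if not 0 <= time_asleep_hours <= time_in_bed_hours:
        raise ValueError("time asleep must be between 0 and the time in bed")
    return 100.0 * time_asleep_hours / time_in_bed_hours


# The example from the text: 7 hours asleep out of 8 hours in bed.
print(sleep_efficiency(7, 8))  # 87.5
```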
Context of Dataset
The dataset revolves around the sleep efficiency of individuals. It contains various
factors for each individual, such as age and gender, and finally the sleep efficiency
value itself.
Data Preprocessing
Because machine learning models require numeric input, we have to encode the following
non-numeric variables as numbers:
• Gender (mapped to 1 and 0)
• Smoking status (mapped to 1 and 0)
• Bedtime (converted to a numeric timestamp)
• Wakeup time (converted to a numeric timestamp)
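
The encoding can be tried out on a couple of rows. The sketch below uses made-up values in the same column layout and date format as the implementation chapter; the two sample rows are hypothetical.

```python
import pandas as pd

# Two made-up rows in the same layout as the dataset columns.
sample = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Smoking status": ["Yes", "No"],
    "Bedtime": ["06-03-2021 01:00", "05-12-2021 02:00"],
})

encoded = sample.copy()

# Binary categories become 1 and 0.
encoded["Gender"] = encoded["Gender"].map({"Male": 1, "Female": 0})
encoded["Smoking status"] = encoded["Smoking status"].map({"Yes": 1, "No": 0})

# Date-time strings become Unix timestamps (whole seconds).
bedtime = pd.to_datetime(encoded["Bedtime"], format="%d-%m-%Y %H:%M")
encoded["Bedtime"] = bedtime.apply(lambda t: int(t.timestamp()))

print(encoded)
```

A raw timestamp mostly reflects the calendar date; an alternative that may be worth trying is to keep only the hour of day, since that is what relates to sleep habits.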


Training and Testing Split


Before splitting the data for training and testing, we have to assign the response
variable and the predictor variables to Y and X respectively. We then split the data in an
80:20 ratio: 80% of the data is used for training the models and 20% of the data is
used for testing.
3.2 Libraries Used
● NumPy (for Numerical Analysis)
● Pandas (for handling data files)
● Matplotlib (for data visualization)
● Jinja2
3.3 Technology Used
Machine learning (ML) is the study of computer algorithms that improve
automatically through experience and the use of data. It is seen as a part of artificial intelligence.
Machine learning algorithms build a model based on sample data, known as "training data",
in order to make predictions or decisions without being explicitly programmed to do so.
Machine learning algorithms are used in a wide variety of applications, such as medicine,
email filtering, speech recognition, and computer vision, where it is difficult or infeasible
to develop conventional algorithms to perform the needed tasks.
3.4 Dataset
For this project I have used a dataset that has 215 rows and 15 columns. A snapshot of part
of the dataset is given below.


3.5 Advantages
• Provides valuable information about sleep habits and patterns.
• Allows for the evaluation of sleep interventions.
• Can be used to identify risk factors for sleep disorders.
• Large sample sizes.
3.6 Disadvantages
• Limited information on subjective sleep experiences.
• Potential for measurement error.
• Limited generalizability.
• Limited demographic information.



4. IMPLEMENTATION
Executed on Google Colab
Classification model
import jinja2
import pandas as pd
from pycaret.classification import *

# Load the data and drop the identifier column.
dataset = pd.read_csv('/content/Sleep_Efficiency.csv')
dataset = dataset.drop('ID', axis=1)

# Hold out 10% of the rows as unseen test data.
train_data = dataset.sample(frac=0.90, random_state=123)
test_data = dataset.drop(train_data.index)
train_data.reset_index(inplace=True, drop=True)
test_data.reset_index(inplace=True, drop=True)
print('Data used to train the model has ' + str(train_data.shape[0]) + ' rows and ' + str(train_data.shape[1]) + ' columns')
print('Unseen data (test data) has ' + str(test_data.shape[0]) + ' rows and ' + str(test_data.shape[1]) + ' columns')

# Set up the PyCaret experiment with 'Gender' as the target.
s = setup(data=train_data, target='Gender')

# Compare the available classifiers, then build a Random Forest model.
model = compare_models()
forest = create_model('rf')
plot_model(forest, plot='confusion_matrix')
evaluate_model(model)

# Predict on the unseen data.
predictions = predict_model(forest, data=test_data)
predictions

# Save the model, load it again and check that it still predicts.
save_model(forest, 'Random Forest Model')
saved_forest = load_model('Random Forest Model')
saved_model_prediction = predict_model(saved_forest, data=test_data)
saved_model_prediction.head(8)


Random Forest Classification


import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the data and drop the identifier column.
dataset = pd.read_csv('/content/Sleep_Efficiency.csv')
dataset = dataset.drop('ID', axis=1)

# Encode the categorical columns as 1s and 0s.
dataset['Smoking status'] = dataset['Smoking status'].map({'Yes': 1, 'No': 0})
dataset['Gender'] = dataset['Gender'].map({'Male': 1, 'Female': 0})

# Convert the date-time strings to Unix timestamps (seconds).
for col in ['Bedtime', 'Wakeup time']:
    dataset[col] = pd.to_datetime(dataset[col], format='%d-%m-%Y %H:%M')
    dataset[col] = dataset[col].apply(lambda x: int(x.timestamp()))

# Fill missing values ('Alcohol consumption' is filled as in the Linear Regression code).
dataset['Awakenings'] = dataset['Awakenings'].fillna(dataset['Awakenings'].min())
for col in ['Caffeine consumption', 'Alcohol consumption', 'Exercise frequency']:
    dataset[col] = dataset[col].fillna(dataset[col].mean())

# Drop the unused columns before building X, so that the drop affects the features.
dataset = dataset.drop(['Light sleep percentage', 'Smoking status', 'Caffeine consumption', 'Age',
                        'Bedtime', 'Wakeup time', 'Sleep duration', 'Exercise frequency'], axis=1)
X = dataset.drop('Gender', axis=1)
y = dataset['Gender']

# Split 80:20, train the Random Forest and evaluate it on the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
print('Accuracy Score:', accuracy_score(y_test, y_pred))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))

# Plot how much each feature contributes to the forest's decisions.
feat_importances = pd.Series(rfc.feature_importances_, index=X.columns)
feat_importances.plot(kind='bar')
plt.title('Feature Importances')
plt.show()


Regression Model
import jinja2
import pandas as pd
from pycaret.regression import *

# Load the data and drop the identifier column.
dataset = pd.read_csv('/content/Sleep_Efficiency.csv')
dataset = dataset.drop('ID', axis=1)

# Hold out 10% of the rows as unseen test data.
train_data = dataset.sample(frac=0.90, random_state=123)
test_data = dataset.drop(train_data.index)
train_data.reset_index(inplace=True, drop=True)
test_data.reset_index(inplace=True, drop=True)
print('Data used to train the model has ' + str(train_data.shape[0]) + ' rows and ' + str(train_data.shape[1]) + ' columns')
print('Unseen data (test data) has ' + str(test_data.shape[0]) + ' rows and ' + str(test_data.shape[1]) + ' columns')

# Set up the PyCaret experiment with 'Sleep efficiency' as the target.
s = setup(data=train_data, target='Sleep efficiency')

# Compare the available regressors, then build a Linear Regression model.
best_model = compare_models()
model = create_model('lr')
evaluate_model(model)
predict_model(model, data=test_data)

# Save the model and load it again.
save_model(model, 'linear')
loaded_model = load_model('linear')
Linear Regression
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Load the data and drop the identifier column.
df = pd.read_csv('/content/Sleep_Efficiency.csv')
df = df.drop('ID', axis=1)

# Encode the categorical columns as 1s and 0s.
df['Smoking status'] = df['Smoking status'].map({'Yes': 1, 'No': 0})
df['Gender'] = df['Gender'].map({'Male': 1, 'Female': 0})

# Convert the date-time strings to Unix timestamps (seconds).
for col in ['Bedtime', 'Wakeup time']:
    df[col] = pd.to_datetime(df[col], format='%d-%m-%Y %H:%M')
    df[col] = df[col].apply(lambda x: int(x.timestamp()))

# Fill missing values.
df['Awakenings'] = df['Awakenings'].fillna(df['Awakenings'].min())
for col in ['Caffeine consumption', 'Alcohol consumption', 'Exercise frequency']:
    df[col] = df[col].fillna(df[col].mean())

# Remove outliers using the interquartile range (IQR) rule.
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
print("The shape of the dataframe before removing the outliers is " + str(df.shape))
df_1 = df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]
print("The shape of the dataframe after removing the outliers is " + str(df_1.shape))

# Correlation heatmap of the cleaned data.
corrmat = df_1.corr()
plt.figure(figsize=(10, 10))
sns.heatmap(corrmat, annot=True, cmap='coolwarm')
plt.show()

# Drop the unused columns, then build X and y from the cleaned dataframe.
df_1 = df_1.drop(['Light sleep percentage', 'Smoking status', 'Caffeine consumption', 'Age', 'Gender',
                  'Bedtime', 'Wakeup time', 'Sleep duration', 'Exercise frequency'], axis=1)
X = df_1.drop('Sleep efficiency', axis=1)
y = df_1['Sleep efficiency']

# Split first, then scale using statistics from the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=50)
scaler_linear = StandardScaler()
X_train = scaler_linear.fit_transform(X_train)
X_test = scaler_linear.transform(X_test)

# Fit the linear regression model and predict on both sets.
model = LinearRegression()
model.fit(X_train, y_train)
yhat = model.predict(X_train)
yhat_test = model.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, yhat_test))
print('Test MAE:', mean_absolute_error(y_test, yhat_test))
print('Test R-squared:', r2_score(y_test, yhat_test))

# Plot experimental against predicted values for the training data.
plt.figure(figsize=(8, 8))
plt.scatter(x=y_train, y=yhat, alpha=0.3)
plt.ylabel('Predicted efficiency')
plt.xlabel('Experimental efficiency')
plt.show()
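
The outlier rule used in the Linear Regression code (keep only values between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR) can be checked by hand on a small example. The values below are made up for illustration; the project code applies the same rule column by column with pandas.

```python
import numpy as np

# Made-up sleep durations in hours; 20.0 is an obvious outlier.
values = np.array([7.0, 7.5, 8.0, 8.0, 8.5, 9.0, 20.0])

# Quartiles and interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Bounds of the accepted range.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the values inside the bounds.
kept = values[(values >= lower) & (values <= upper)]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("accepted range:", lower, "to", upper)
print("kept:", kept)
```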


Data Visualization (Histogram)


import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv('/content/Sleep_Efficiency.csv')

# Encode the categorical columns as 1s and 0s.
dataset['Smoking status'] = dataset['Smoking status'].map({'Yes': 1, 'No': 0})
dataset['Gender'] = dataset['Gender'].map({'Male': 1, 'Female': 0})

# Convert the date-time strings to Unix timestamps (seconds).
for col in ['Bedtime', 'Wakeup time']:
    dataset[col] = pd.to_datetime(dataset[col], format='%d-%m-%Y %H:%M')
    dataset[col] = dataset[col].apply(lambda x: int(x.timestamp()))

# Splitting feature and target variables first
X = dataset.drop('Sleep efficiency', axis=1)
y = dataset['Sleep efficiency']

# Visualizing in histogram
X.hist(figsize=(10, 10))
plt.show()



5. SNAPSHOTS
Classification Model

Random Forest classification

Regression Model

Linear Regression

Finding correlations

Plotting experimental and predicted values for training data

Plotting experimental and predicted values for testing data

Data Visualization (Histogram)


6. DECLARATION

I, Javeeria Muskan F, a student of 8th semester BE, Computer Science and Engineering
department, Bangalore Institute of Technology, Bengaluru, hereby declare that the internship
project work entitled "SLEEP EFFICIENCY ANALYSIS" has been carried out by me at
Prinston Smart Engineers, Bengaluru, and submitted in partial fulfilment of the course
requirements for the award of the degree of Bachelor of Engineering in Computer Science
and Engineering of Visvesvaraya Technological University, Belagavi, during the academic
year 2022-2023.
I also declare that, to the best of my knowledge and belief, the work reported here does
not form part of any dissertation on the basis of which a degree or award was conferred on
an earlier occasion by any other student.

Place: Bangalore

Javeeria Muskan F
1BI19CS071



7. CONCLUSION/FUTURE ENHANCEMENT
• The dataset includes data on sleep duration, onset latency, and sleep efficiency for each
participant, as well as demographic information such as age and gender.
• The analysis of the dataset suggests that sleep efficiency is correlated with age, with
older individuals tending to have lower sleep efficiency. Additionally, there are
slight differences in sleep efficiency between males and females.
• Overall, this dataset provides valuable insights into sleep patterns and efficiency,
and could be useful for researchers and healthcare professionals interested in
understanding sleep-related issues.
• In the future, sleep efficiency can be measured using more samples and more
demographic information, and by incorporating data on sleep disorders and
additional variables.



8. REFERENCES
• https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency
• https://www.researchgate.net/profile/Maryam-Ravan/publication/337162005_A_machine_learning_approach_using_EEG_signals_to_measure_sleep_quality/links/5dcad606299bf1a47b32f21d/A-machine-learning-approach-using-EEG-signals-to-measure-sleep-quality.pdf
• https://mhealth.jmir.org/2016/4/e125/
• https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8648428
• Wikipedia
