Internship Report
Internship Report on
“Sleep Efficiency”
Submitted in partial fulfillment of the requirements for the award of the
degree of Bachelor of Engineering
in
Computer Science & Engineering
Submitted by
1BI19CS071 Javeeria Muskan F
2022-23
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“Jnana Sangama”, Belagavi-590018, Karnataka
Certificate
This is to certify that the internship project entitled “Sleep Efficiency” was carried out
by
1BI19CS071 Javeeria Muskan F,
a bonafide student of VII semester B.E., in partial fulfillment of the requirements
for the Bachelor's Degree in Computer Science & Engineering of the VISVESVARAYA
TECHNOLOGICAL UNIVERSITY during the academic year 2022-23.
External Viva
Name of the examiners, signature with date
1.
2.
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any
task would be incomplete without acknowledging those who made it possible and whose
guidance and encouragement made my efforts successful. My sincere thanks go to all
those who have supported me in completing this Technical Seminar successfully.
My sincere thanks to Dr. M. U. Aswath, Principal, BIT and Dr. Girija J., HOD,
Department of CS&E, BIT for their encouragement, support and guidance to the student
community in all fields of education. I am grateful to our institution for providing us
with a congenial atmosphere to carry out the Technical Seminar successfully.
I extend my sincere thanks to all the department faculty members and
non-teaching staff for supporting me directly or indirectly in the completion of this
Technical Seminar.
Javeeria Muskan F
1BI19CS071
ABSTRACT
The Sleep Efficiency dataset is a collection of sleep-related data collected from
wearable fitness devices. The data includes information about sleep duration, sleep
efficiency, and other sleep-related parameters for a group of individuals. Each row in the
dataset represents a single night of sleep for an individual. The columns in the dataset
include the date of the sleep record, the sleep duration in minutes, the sleep efficiency as a
percentage, the number of times the individual woke up during the night, the time spent in
bed, and the time spent asleep.
The dataset provides a unique opportunity to explore the relationship between sleep
duration, sleep efficiency, and other sleep-related parameters. It can be used to identify
factors that may affect sleep quality and to develop interventions to improve sleep health.
Some potential applications of this dataset include analyzing the relationship between sleep
duration and sleep efficiency, investigating the impact of lifestyle factors on sleep quality,
identifying patterns in sleep behavior over time, and developing predictive models to
estimate sleep efficiency based on other sleep-related parameters.
With the help of machine learning techniques, knowledge can be extracted from the
sleeping habits of various people. Suitable data pre-processing methods are applied along
with feature selection. Some domain expertise is used for pre-processing as well as
for handling the outliers that appear in the dataset. We have used machine learning
algorithms such as Logistic Regression and Random Forest.
TABLE OF CONTENTS
1.2 Objective
3.4 Dataset
3.5 Advantages
3.6 Disadvantages
CHAPTER 6 - Declaration
CHAPTER 8 - References
1. INTRODUCTION
Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming,
coined the term “Machine Learning”. He defined machine learning as a “Field of study that
gives computers the capability to learn without being explicitly programmed”. The process
starts with feeding good quality data and then training our machines (computers) by building
machine learning models using the data and different algorithms. The choice of algorithm
depends on what type of data we have and what kind of task we are trying to automate.
1.2 Objective
• Understanding the factors that impact sleep efficiency.
• Developing models to predict sleep efficiency.
• Identifying the risk factors for poor sleep quality.
• Improving sleep quality.
Classification algorithms can be broadly divided into two categories:
• Linear Models
1. Logistic Regression
2. Support Vector Machines
• Non-linear Models
1. K-Nearest Neighbors
2. Kernel SVM
3. Naïve Bayes
4. Decision Tree Classification
5. Random Forest Classification
Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs to the family of
supervised learning techniques. It can be used for both Classification and Regression problems in ML.
It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
How does Random Forest algorithm work?
Random Forest works in two phases: the first creates the random forest by combining N
decision trees, and the second makes predictions with each tree created in the first phase.
The working process can be explained in the steps below:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points (Subsets).
Step-3: Choose the number N for decision trees that you want to build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new
data points to the category that wins the majority votes.
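The steps above can be sketched in plain Python. This is a deliberately simplified illustration, not the implementation used in this project: each "tree" is a one-split decision stump on one randomly chosen feature, the K data points are drawn with replacement (an assumption, since the steps do not say), and the toy rows (hours asleep, awakenings) and labels are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Fit a one-split decision tree (a stump) on a single feature.

    Returns (feature, threshold, label_if_at_or_below, label_if_above).
    """
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not right:  # threshold is the maximum value: nothing to separate
            continue
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # all sampled values identical: always predict the majority
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees=25, k=None, seed=0):
    """Build n_trees stumps, each on k randomly selected data points."""
    rng = random.Random(seed)
    k = k or len(rows)
    forest = []
    for _ in range(n_trees):                                  # Steps 3 and 4
        picks = [rng.randrange(len(rows)) for _ in range(k)]  # Step 1
        feature = rng.randrange(len(rows[0]))                 # random feature per tree
        forest.append(fit_stump([rows[i] for i in picks],
                                [labels[i] for i in picks], feature))  # Step 2
    return forest


def predict(forest, row):
    """Step 5: every tree votes and the majority category wins."""
    votes = [low if row[f] <= t else high for f, t, low, high in forest]
    return majority(votes)


# Hypothetical toy data: (hours asleep, number of awakenings) -> sleep quality
rows = [(4.0, 4), (4.5, 5), (5.0, 3), (5.5, 4),
        (7.0, 0), (7.5, 1), (8.0, 0), (8.5, 1)]
labels = ["poor"] * 4 + ["good"] * 4

forest = fit_forest(rows, labels, n_trees=25, seed=0)
print(predict(forest, (9.0, 0)))
print(predict(forest, (3.0, 6)))
```

A full random forest grows deeper trees and re-samples candidate features at every split; the stump version keeps only the two ideas the steps describe, random sampling and majority voting.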
Implementation Steps are given below:
• Data Pre-processing step
• Fitting the Random Forest algorithm to the Training set
• Predicting the test result
• Testing the accuracy of the result (creating a confusion matrix) and visualizing the result
A regression algorithm is a type of supervised learning algorithm in machine
learning that is used to predict a continuous output variable (also known as a dependent
variable) based on one or more input variables (also known as independent variables or
features). Regression algorithms are commonly used in many different fields, including
finance, healthcare, and social sciences, to make predictions based on historical data.
Here are some popular regression algorithms used in machine learning:
• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
Step 5: Model Evaluation- Evaluate the performance of the linear regression model on the
validation dataset by calculating error and goodness-of-fit metrics such as the Mean Squared
Error (MSE), Root Mean Squared Error (RMSE), or R-squared.
Step 6: Prediction- Use the trained linear regression model to make predictions on new
input data and validate the accuracy of the predictions.
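The metrics named in Step 5 can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the sample values are illustrative assumptions, not results from this project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    # Residual sum of squares: squared gaps between actual and predicted values
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    # Total sum of squares: spread of the actual values around their mean
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical sleep efficiency values (as fractions) and model predictions
y_true = [0.80, 0.90, 0.70, 0.60]
y_pred = [0.75, 0.85, 0.75, 0.65]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Lower MSE and RMSE mean smaller prediction errors, while an R-squared closer to 1 means the model explains more of the variation in the output variable.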
The linear regression algorithm is widely used in various fields such as finance, marketing,
engineering, and social sciences for predicting the values of the output variable based on
the input variables. It is a simple and interpretable model that provides insights into the
relationship between the input and output variables.
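To illustrate why the model is considered interpretable, the sketch below fits a one-variable linear regression with the closed-form least-squares formulas. The input and output values are hypothetical (number of awakenings against sleep efficiency), and the function name is an assumption made for this example.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: awakenings per night vs. sleep efficiency (fraction)
awakenings = [0, 1, 2, 3, 4]
efficiency = [0.90, 0.84, 0.81, 0.74, 0.71]

slope, intercept = fit_simple_linear_regression(awakenings, efficiency)
print("slope:", slope)
print("intercept:", intercept)

# Prediction for a new input value
predicted = slope * 5 + intercept
print("predicted efficiency for 5 awakenings:", predicted)
```

The slope reads directly as the change in predicted sleep efficiency per additional awakening, which is the kind of insight the paragraph above refers to.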
Data visualization is an essential part of machine learning as it helps to understand
the data, identify patterns, and communicate insights effectively. Here are some ways in
which data visualization is used in machine learning:
• Exploratory Data Analysis (EDA): Data visualization techniques such as
histograms, scatter plots, and box plots are used to explore the data, identify outliers,
and understand the distribution of the input and output variables.
• Feature Selection: Data visualization can help to identify the most important input
variables for the model by visualizing the relationship between the input variables
and the output variable. For example, correlation matrices and heatmaps can be used
to visualize the correlation between the input variables.
• Model Performance Evaluation: Data visualization techniques such as ROC curves,
precision-recall curves, and confusion matrices are used to evaluate the
performance of the machine learning model and identify areas for improvement.
• Interpretability: Data visualization techniques such as decision trees and partial
dependence plots are used to interpret the machine learning model and understand
the relationship between the input and output variables.
• Reporting: Data visualization is used to communicate the insights and findings of
the machine learning model to stakeholders effectively. Visualization techniques
such as bar charts, pie charts, and line charts are used to create easy-to-understand
and visually appealing reports.
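The correlation matrix mentioned under Feature Selection is the table of numbers that a heatmap colours in. A minimal sketch of how it can be computed without any plotting library is given below; the column values are hypothetical and the helper names are assumptions for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined if a column is constant


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for three numeric columns
columns = {
    "Age": [25, 40, 31, 52, 36],
    "Awakenings": [0, 1, 2, 3, 4],
    "Sleep efficiency": [0.90, 0.85, 0.80, 0.75, 0.70],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

Values near +1 or -1 indicate a strong linear relationship with the output variable, while values near 0 suggest a weak one.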
Working Description
The Sleep Efficiency dataset is a collection of sleep efficiency measurements for various
individuals. We obtained the dataset from Kaggle.
Sleep Efficiency
Sleep efficiency is a measure of the quality of sleep, calculated as the percentage of
time spent asleep compared to the total amount of time spent in bed. It reflects how much
of the time spent in bed is actually spent sleeping. For example, if a person spends 8 hours
in bed and sleeps for 7 hours, their sleep efficiency would be 87.5% (7/8 x 100). This
measure is typically calculated using sleep monitoring methods such as actigraphy, which
measures physical movements during sleep, or polysomnography, which records brain
waves, eye movements, and other physiological signals during sleep.
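The calculation described above can be written as a small function. This is a sketch of the definition only; the function name and the input checks are assumptions added for illustration.

```python
def sleep_efficiency(time_asleep_hours, time_in_bed_hours):
    """Return sleep efficiency as a percentage of the time in bed spent asleep."""
    if time_in_bed_hours <= 0:
        raise ValueError("time in bed must be positive")
    if not 0 <= time_asleep_hours <= time_in_bed_hours:
        raise ValueError("time asleep must be between 0 and the time in bed")
    return time_asleep_hours / time_in_bed_hours * 100


# The example from the text: 7 hours asleep out of 8 hours in bed
print(sleep_efficiency(7, 8))  # 87.5
```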
Sleep efficiency is an important metric for evaluating sleep quality because it reflects both
the duration and continuity of sleep. A low sleep efficiency score may indicate difficulty
falling or staying asleep, frequent awakenings during the night, or other sleep disturbances.
Poor sleep efficiency has been linked to a range of negative health outcomes, including
increased risk for obesity, diabetes, cardiovascular disease, and mental health problems.
Healthy adults typically have a sleep efficiency score of 85-90% or higher, while scores
below 80% may indicate a sleep disorder or other underlying health problem. Improving
sleep habits and addressing underlying medical conditions can help improve sleep
efficiency and overall sleep quality.
Context of Dataset
The dataset revolves around the sleep efficiency of individuals. It records various
attributes of each individual, such as age and gender, and finally the individual's sleep
efficiency.
Data Preprocessing
Because machine learning models require the input to be numeric, categorical variables
such as the following have to be encoded into 1s and 0s:
• Gender
• Smoking status
• Bedtime
• Wakeup time
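One common way to obtain 1s and 0s from categorical columns is one-hot encoding, where every distinct value becomes its own indicator column. The sketch below shows the idea in plain Python; it is not necessarily the encoding applied in this project, and the record values (including the time format) are hypothetical.

```python
def one_hot_encode(records, columns):
    """Replace each listed column with one 0/1 indicator column per distinct value."""
    # Collect the distinct values seen in each categorical column
    categories = {c: sorted({r[c] for r in records}) for c in columns}
    encoded = []
    for r in records:
        # Keep the columns that are not being encoded unchanged
        row = {k: v for k, v in r.items() if k not in columns}
        for c in columns:
            for value in categories[c]:
                row[f"{c}_{value}"] = 1 if r[c] == value else 0
        encoded.append(row)
    return encoded


# Hypothetical records using the column names listed above
records = [
    {"Age": 65, "Gender": "Female", "Smoking status": "Yes",
     "Bedtime": "01:00", "Wakeup time": "07:00"},
    {"Age": 69, "Gender": "Male", "Smoking status": "No",
     "Bedtime": "23:00", "Wakeup time": "07:00"},
]

encoded = one_hot_encode(
    records, ["Gender", "Smoking status", "Bedtime", "Wakeup time"]
)
for row in encoded:
    print(row)
```

For columns with many distinct values, such as exact bedtimes, this produces many indicator columns, so converting the time to a single numeric hour may be a more compact alternative.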
3.5 Advantages
• Provides valuable information on sleep habits and patterns.
• Allows for the evaluation of sleep interventions.
• Can be used to identify risk factors for sleep disorders.
• Large sample sizes.
3.6 Disadvantages
• Limited information on subjective sleep experiences.
• Potential for measurement error.
• Limited generalizability.
• Limited demographic information.
Regression Model
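import jinja2
import pandas as pd

# Load the dataset and drop the ID column, which only identifies the record
dataset = pd.read_csv('/content/Sleep_Efficiency.csv')
dataset
dataset = dataset.drop('ID', axis=1)

# Keep 90% of the rows for training and hold out the remaining 10% as unseen test data
train_data = dataset.sample(frac=0.90, random_state=123)
test_data = dataset.drop(train_data.index)
train_data.reset_index(inplace=True, drop=True)
test_data.reset_index(inplace=True, drop=True)
print('Data used to train the model has ' + str(train_data.shape[0]) + ' rows and ' + str(train_data.shape[1]) + ' columns')
print('Unseen data (test data) has ' + str(test_data.shape[0]) + ' rows and ' + str(test_data.shape[1]) + ' columns')

# Set up the PyCaret regression experiment with 'Sleep efficiency' as the target
from pycaret.regression import *
s = setup(data=train_data, target='Sleep efficiency')
best_model = compare_models()

# Train and evaluate a linear regression model, then predict on the held-out data
model = create_model('lr')
evaluate_model(model)
predict_model(model, data=test_data)

# Save the trained model to disk and load it back
save_model(model, 'linear')
modell = load_model('linear')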
Linear Regression
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
df = pd.read_csv('/content/Sleep_Efficiency.csv')
df = df.drop('ID',axis=1)
Regression Model
Linear Regression
Finding correlations
I, Javeeria Muskan F, a student of 8th semester B.E., Computer Science and Engineering
department, Bangalore Institute of Technology, Bengaluru, hereby declare that the internship
project work entitled "SLEEP EFFICIENCY ANALYSIS" has been carried out by me at
Prinston Smart Engineers, Bengaluru, and submitted in partial fulfilment of the course
requirement for the award of the degree of Bachelor of Engineering in Computer Science
and Engineering of Visvesvaraya Technological University, Belagavi, during the academic
year 2022-2023.
I also declare that, to the best of my knowledge and belief, the work reported here does
not form part of any dissertation on the basis of which a degree or award was conferred on
an earlier occasion by any other student.
Place: Bangalore
Javeeria Muskan F
1BI19CS071