The Applications of Machine Learning and Computer Vision Algorithms To Aid People With Vision Impairment.
Introduction
Computer vision is a field of machine learning¹ that has revolutionised the way machines understand and interpret visual information, and its popularity has soared in recent years. However, individuals with vision impairment face numerous challenges in their daily lives, and the number of blind people worldwide has risen from 34.4 million in 1990 to 49.1 million in 2020 (Bourne, R.R.A., June 2020), an increase of roughly 43%. This highlights the urgent need for innovative solutions to improve their accessibility and quality of life.
One of the most significant ways in which machine learning algorithms can help people with vision impairments is in navigation, where they can provide detailed spatial awareness and orientation. This is particularly useful in crowded or unfamiliar spaces, where traditional navigation aids, such as a guide dog, can be limiting in terms of mobility. According to a study by Baxter and Beresford on the effectiveness of assistance dogs in public buildings, 90% of responses classified guide dogs as a "nuisance" that caused "hygiene issues" (Baxter & Beresford, 2016). These limitations further exacerbate the difficulties that blind people encounter when navigating crowded or unfamiliar spaces. Blind people do learn alternative skills to navigate, such as interpreting echoes, texture changes and auditory cues; however, these skills take substantial time to learn and use confidently, so computer vision technology can be utilized to recognize and warn of hazards such as obstacles, further increasing their ability to navigate safely. This paper explores methods to bridge the gap between visual information and non-visual perception, offering an encouraging path towards creating a more inclusive and empowering society.
Volunteering
Volunteering at a blind school, I was given the incredible opportunity to witness first-hand how hard-working staff accomplished remarkable tasks each day. After closely observing the challenges encountered by visually impaired individuals, I carefully evaluated the technology solutions currently deployed and noted areas for potential improvement. Current assistive technologies are geared mostly towards educational aspects, such as Braille readers and text-to-speech conversion; however, they fail to tackle the basic problem of navigating around a classroom independently. I plan to adapt my program to meet these challenges by including object detection algorithms, robust depth estimation techniques and efficient optical character recognition methods. My goal is to develop a comprehensive solution that allows the visually impaired to be more independent by integrating these components.
The Program
A proposed solution to address these challenges involves developing wearable 'smart glasses', similar to what Google unveiled in 2014, but tailored specifically to help individuals with vision impairment. Ideally, this combination of algorithms would work collaboratively to enhance the wearer's visual perception, helping them identify objects, understand their surroundings, estimate distances, and recognize written text all in one program; however, due to limitations in computing power, simultaneous execution of these machine learning models is not feasible. The program is coded in Python because of its vast community and 3rd-party packages such as OpenCV, which is used for image processing. This paper presents each algorithm separately, analysing its functionality and performance.
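As a rough illustration of how frames would reach these models, the following minimal sketch (an assumption for illustration, not the paper's actual code) uses OpenCV to read frames from a camera one at a time:

```python
import cv2

def frames():
    """Yield frames from the glasses' camera one at a time."""
    cap = cv2.VideoCapture(0)            # assumed default camera index
    while cap.isOpened():
        ok, frame = cap.read()           # frame is a BGR NumPy array
        if not ok:
            break
        # Each model (object detection, depth estimation or OCR) would be
        # run on this frame in turn, since they cannot execute simultaneously.
        yield frame
    cap.release()
```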
Object Detection
¹ Machine Learning is the field of computing related to giving "computers the ability to learn without being explicitly programmed." – Arthur Samuel (1959)
Object detection works by receiving an input in the form of an image from a camera. Next, it analyses the input using a neural network² to extract relevant features, such as shapes, textures and colours, that help identify objects. The algorithm then identifies potential regions of interest (ROIs). Each region passes through a classification algorithm, which compares it against the pretrained images and labels and assigns it a confidence score³. If this confidence score is greater than a certain probability, a bounding box is formed around the region and post-processing takes place⁴ (Ali, S - 2023).
Figure 1 shows the basic object detection model. However, this model only detects whether a frame contains an object or not, giving a binary yes/no output. This is taken further here by attaching the correct label to the detected object.
Figure 1 – Sharma, K.U. and Thakur, N.V. (2017) A review and an approach for object detection in images.
The object detection model does not use a standard convolutional neural network (CNN) followed by a fully connected layer because the length of the output layer is not constant (Kundu, R 2023). There can be multiple instances of the same object in the same frame, e.g., multiple people walking in front of the impaired person. A method to address this challenge is to use a network that analyses multiple ROIs within the frame to detect multiple instances. However, this approach has one major limitation: significant computational complexity. It would require many regions to be selected, leading to a potential overload in computational power (Gandhi, R - 2018). As a result, the program made use of established 3rd-party algorithms such as You Only Look Once (YOLO), R-CNN and SSD300, which have been developed for high accuracy and performance.
² A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain. (Amazon AWS)
³ A value between 0 and 1 giving the probability that a region contains an object.
⁴ Post-processing ensures that the output is accurate and redundant data is removed.
Figure 2 - Graph showing speed vs accuracy of different object detection algorithms, available at https://cv-tricks.com/object-detection/faster-r-cnn-yolo-ssd
The YOLO model was implemented because of its faster processing speed whilst maintaining high accuracy. Although Faster R-CNN has the highest accuracy, its speed is considerably slower. In this specific application, speed was prioritised, as dropped frames could have life-threatening consequences for the impaired user.
The YOLO algorithm takes an image as an input and then uses a convolutional neural network to detect objects in
the frame. The architecture of the deep neural network is shown in Figure 3 (Kundu, R 2023).
The first 20 layers of the model are trained using ImageNet [1], which includes a temporary average pooling and a fully connected layer (Kundu, R 2023). This pre-trained model is then adapted for object detection by adding convolutional and connected layers, which improves accuracy and performance. Finally, the last fully connected layer is responsible for predicting both class probabilities and bounding box coordinates (Redmon, J et al. - 2016).
YOLO divides the input image into a grid. If the centre of an object falls within a grid cell, that cell becomes responsible for detecting the object. Each grid cell predicts bounding boxes and confidence scores (Keita, Z – Sep 2022).
Another benefit of YOLO is that it predicts multiple bounding boxes. A crucial technique which YOLO uses to handle these multiple bounding boxes is non-maximum suppression (NMS), which serves as a post-processing step to enhance the accuracy of object detection (Kundu, R 2023). Bounding boxes may overlap or be located at different positions, yet they all represent the same object, so NMS is utilized to identify and eliminate redundant bounding boxes (Redmon, J, 2016).
The first step divides the original image into equal grid cells. Each cell is used to detect objects as well as assign a
confidence score.
The next step is to determine the bounding boxes. At this stage the algorithm accounts for the confidence score of each cell. In Figure 5, the yellow cells are those where the probability that the cell contains an object is greater than 0.
Figure 5 Grid cell layout with probability greater than 0 (Author’s own - 2023)
YOLO represents each detected object as a vector of the form Y = [PC, BX, BY, BH, BW, C1]. PC is the probability that the grid cell contains an object. BX and BY are the x and y coordinates of the centre of the bounding box, BH and BW are the height and width of the box, and C1 corresponds to the class of the object, which in this case is 'bin' (Keita, Z – Sep 2022).
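As an illustration of this output format, the following sketch decodes a single prediction vector of the form described above; the helper function and the 0.5 threshold are assumptions for demonstration, not code from the actual program:

```python
def decode_prediction(y, conf_threshold=0.5):
    """Interpret one YOLO-style vector Y = [pc, bx, by, bh, bw, c1]."""
    pc, bx, by, bh, bw, c1 = y
    if pc < conf_threshold:
        return None                       # the cell is unlikely to contain an object
    # Convert centre/width/height form into corner coordinates for drawing.
    x1, y1 = bx - bw / 2, by - bh / 2
    x2, y2 = bx + bw / 2, by + bh / 2
    return {"box": (x1, y1, x2, y2), "confidence": pc, "class": int(c1)}

print(decode_prediction([0.91, 0.45, 0.60, 0.30, 0.20, 0]))   # a confident 'bin' detection
```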
A single object can have multiple candidate grid boxes, but not all of them are significant. This is where intersection over union (IOU) comes in. The objective of IOU is to filter out irrelevant grid boxes. YOLO calculates the IOU for each grid cell with the equation (Mohamed, N.A - 2020):

IOU = \frac{\text{area of intersection}}{\text{area of union}}

Finally, it disregards the predictions whose IOU score is less than the defined threshold.
Figure 8 shows the intersection of two grid boxes (marked in yellow). The IOU for Grid Box 1 is smaller than the threshold value, which has been defined as 0.7, so it is disregarded and only the object in Grid Box 2 remains.
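A minimal IOU helper matching the equation above might look as follows (an illustrative sketch, not YOLO's internal implementation):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Area of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    # Area of the union of both boxes.
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - intersection)
    return intersection / union if union > 0 else 0.0
```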
The final step is NMS. NMS keeps only the boxes with the highest confidence scores and discards the remaining, potentially incorrect, detections in the frame (Keita, Z – Sep 2022).
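Reusing the iou() helper from the previous sketch, a compact illustration of NMS (again an assumption rather than the program's own code) is:

```python
def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-confidence boxes and drop overlapping duplicates."""
    detections = sorted(detections, key=lambda d: d["confidence"], reverse=True)
    kept = []
    for det in detections:
        # Discard this box if it overlaps too much with an already-kept box.
        if all(iou(det["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```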
Labelled Data
Training a robust object detection algorithm requires a large amount of accurately labelled data, and building such a dataset is time-consuming and labour-intensive. To address this challenge, I investigated the minimum number of images required per object for effective training. I used dataset sizes of 25, 50, 200 and 500 images to gauge the impact on model accuracy. 150 epochs⁵ were used for every dataset, with the same model. Upon analysing the results, I discovered that the number of training images had a direct correlation with the model's accuracy. With only 25 images, the algorithm struggled to detect objects accurately, and false positives and missed detections occurred often. By increasing the dataset to 50 images, the model's accuracy improved noticeably, although occasional false negatives and misclassifications still occurred. Interestingly, the model's accuracy did not change much between 200 and 500 images. The diversity of the 200-image dataset and the law of diminishing returns⁶ likely played a role in reaching this plateau.
⁵ Used to describe the number of times a learning algorithm has iterated through a dataset (Alibaba Cloud).
Furthermore, the model that used 500 images took almost twice as long to train. Considering the time and computational resources required for training, this becomes an important factor; it was therefore concluded that the 200-image dataset was sufficient to achieve satisfactory performance.
Due to the impracticality of collecting hundreds of images for multiple objects, it was decided to focus on a single object to demonstrate a proof of concept for this specific application. 200 images of bins were downloaded from the internet and bounding boxes were manually drawn around the bins. Images in various settings were used to ensure that the model generalises well. The data was split into a 90:10 ratio, allowing the model to be trained on the majority of the data while reserving a separate portion to evaluate its accuracy and performance. Finally, the model was tested in a real-world scenario by feeding a custom image containing a bin into the program:
Figure 8 shows that the bins were correctly detected by the program. This program can be taken further by adding more objects, or even by using a 3rd-party dataset such as COCO, a large dataset containing over 330,000 images annotated across 80 object classes. However, when this was implemented with YOLO the performance dropped significantly; as a result, creating a custom dataset containing the most common objects a visually impaired person encounters would be more computationally beneficial.
⁶ The law of diminishing marginal returns states that there comes a point when an additional factor of production results in a lessening of output or impact. (Investopedia)
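A minimal sketch of the 90:10 split described above is shown below; the folder layout and file names are assumptions based on common YOLO tooling rather than details from the paper:

```python
import random
from pathlib import Path

# Assumed folder of 200 labelled bin images (one label file per image).
images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(42)          # make the split reproducible
random.shuffle(images)

split = int(0.9 * len(images))          # 90:10 train/validation ratio
train_imgs, val_imgs = images[:split], images[split:]
print(f"{len(train_imgs)} training images, {len(val_imgs)} validation images")
```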
Depth Estimation
The first step of the algorithm, like most computer vision programs, is to receive an input image as RGB (red, green, blue) values, which represent colours. The next step is feature extraction, which involves extracting valuable pieces of information such as textures, shadows and edges. The algorithm is trained on a large dataset of RGB images along with their corresponding depth maps. The depth maps are the ground-truth values; they provide the accurate depth information for each image. During training, the model optimizes its parameters to minimize the difference between the predicted depth and the ground-truth depth (Tan, D - 2020). Once the model has been trained, an unseen image can be input; the model applies the learned mapping to the features extracted from the input image and produces an estimated depth map.
A lidar sensor is a device that generates precise spatial information about the distance to a given target. It works by sending out a laser pulse and recording the time it takes for the pulse to be reflected. Lidar is often classed as the most efficient and accurate method for depth estimation, and many autonomous vehicles and coastal-mapping projects make use of this technology; however, these powerful sensors come with a significant caveat: they cost upwards of $1,000⁷ (Carter, J - 2012). A machine learning based approach, on the other hand, is far more cost-effective. It can be applied to a wide range of scenarios and environments without the need for specific hardware, which significantly reduces the cost of implementing depth estimation systems and makes them accessible to a broader user base.
⁷ Data from https://www.neuvition.com/media/blog/lidar-price.html
Dataset Collection
Instead of going through the time-consuming process of creating a dataset from scratch, the program made use of the DIODE (Dense Indoor and Outdoor DEpth) dataset, which consists of diverse, high-resolution images with accurate and dense depth measurements. The dataset has an 81 GB training set and a separate 2.6 GB validation set (Vasiljevic, I et al - 2019). Once the data was prepared, a data pipeline was built which takes a data frame containing all the image and depth-mask files. The pipeline reads and resizes the input image, processes the depth-mask files and returns the RGB images and depth maps for a batch (Basu, V - 2021).
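The following sketch illustrates the kind of pipeline step described above; the image size, file paths and the .npy depth format are assumptions for illustration:

```python
import numpy as np
import cv2

IMG_SIZE = 256  # assumed resize target

def load_pair(image_path, depth_path):
    # Read and resize the RGB image, scaling pixel values to [0, 1].
    rgb = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (IMG_SIZE, IMG_SIZE)).astype("float32") / 255.0

    # Read and resize the matching dense depth map (stored here as a .npy file).
    depth = np.load(depth_path).squeeze()
    depth = cv2.resize(depth, (IMG_SIZE, IMG_SIZE)).astype("float32")
    depth = depth / depth.max()             # simple normalisation to [0, 1]
    return rgb, depth[..., None]            # add a channel axis for the model
```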
The Model
After building the data pipeline and visualising the samples, the model was built. The basic model is a U-Net, with additive skip-connections implemented in the downscaling block. The U-Net architecture consists of an encoder-decoder structure which is well suited to tasks like depth estimation, as it combines a contracting path (downscaling blocks) and an expanding path (upscaling blocks) to capture both local and global information in the input image (Bharath, K - 2021) & (Tomar, N - 2021).
The downscaling block takes an image tensor⁸ as an input and performs a series of operations to reduce its spatial dimensions while increasing the number of filters⁹. The constructor (the __init__ method) initialises the necessary layers and parameters for the block. The call method is the forward pass of the block: it applies two sets of convolutional layers (convA and convB) with batch normalization¹⁰ and leaky ReLU activation functions¹¹, and performs an element-wise addition (x += d) with the output of the first convolutional layer (d) to create a residual connection. Finally, it applies max pooling to reduce the spatial dimensions of the output. The function returns both the output after the residual connection (x) and the pooled output (p).
The upscaling block takes in the feature map tensor (x) from the previous layer and a skip-connection tensor (skip) from the corresponding downsampling block. The call method performs upsampling (us) on the input tensor. The upsampled tensor is concatenated with the skip tensor along the channel axis and then passed through two sets of convolutional layers. The function returns the output tensor after the convolutional layers.
The program also makes use of a BottleNeckBlock function, which takes a feature map tensor as an input and applies further convolutional layers. The main purpose of the bottleneck block is to compress the representation, allowing for a more efficient encoding of the data: using 1x1, 3x3 and 1x1 convolutions, the 1x1 convolutions are responsible for reducing and then increasing the number of filters. The method returns the output tensor (Kakumani, A.K - 2022).
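A condensed sketch of these three blocks, written as Keras layers, is shown below. The layer names (convA, convB) and the overall structure follow the description above, but the exact filter counts and configuration are assumptions rather than the paper's verbatim code:

```python
import tensorflow as tf
from tensorflow.keras import layers


class DownscaleBlock(layers.Layer):
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.convA = layers.Conv2D(filters, 3, padding="same")
        self.convB = layers.Conv2D(filters, 3, padding="same")
        self.bn_a = layers.BatchNormalization()
        self.bn_b = layers.BatchNormalization()
        self.relu = layers.LeakyReLU(0.2)
        self.pool = layers.MaxPool2D(2)

    def call(self, x):
        d = self.relu(self.bn_a(self.convA(x)))   # first convolution
        x = self.relu(self.bn_b(self.convB(d)))   # second convolution
        x = x + d                                  # residual connection
        return x, self.pool(x)                     # feature map and pooled output


class UpscaleBlock(layers.Layer):
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.us = layers.UpSampling2D(2)
        self.concat = layers.Concatenate()
        self.convA = layers.Conv2D(filters, 3, padding="same", activation="relu")
        self.convB = layers.Conv2D(filters, 3, padding="same", activation="relu")

    def call(self, x, skip):
        x = self.concat([self.us(x), skip])        # merge with the encoder feature map
        return self.convB(self.convA(x))


class BottleNeckBlock(layers.Layer):
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.convA = layers.Conv2D(filters, 3, padding="same", activation="relu")
        self.convB = layers.Conv2D(filters, 3, padding="same", activation="relu")

    def call(self, x):
        return self.convB(self.convA(x))
```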
Loss Functions
After the model has been built, loss functions need to be defined. The model uses three loss functions: structural
similarity index (SSIM), L1 Loss and depth smoothness loss. Out of the three loss functions, SSIM contributes the
most to improving model performance by measuring the similarity between two images (Basu, V - 2021).
The SSIM index is calculated using three key components: luminance, contrast, and structure (Glew, D & Vrscay,
E.R).
The luminance of an image is its overall brightness. SSIM compares the luminance values of the pixels in the reference and distorted images. The luminance component is defined by the equation:

l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}

where the mean luminance values of the reference image (\mu_x) and distorted image (\mu_y) are computed, along with the variances of their luminance values (\sigma_x^2 and \sigma_y^2).
⁸ A tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space (https://en.wikipedia.org/wiki/Tensor).
⁹ A filter acts as a single template or pattern which, when convolved across the input, finds similarities between the stored template and different locations/regions in the input image (https://www.analyticsvidhya.com/blog/2022/01/convolutional-neural-network-an-overview).
¹⁰ A method used to make the training of artificial neural networks faster (https://en.wikipedia.org/wiki/Batch_normalization).
¹¹ An activation function that introduces the property of non-linearity to a deep learning model and solves the vanishing gradients issue (https://builtin.com/machine-learning/relu-activation-function).
The contrast component of the index measures the local standard deviations of pixel intensities within an image. It uses the standard deviations of the reference image (x) and the distorted image (y), and is defined by the equation:

c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}
The structure component captures the correlation between neighbouring pixels in an image. It involves calculating the covariance between corresponding pixels in the reference and distorted images, and is defined by the equation:

s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}
In all the equations, regularisation constants (C_1, C_2 and C_3) are used to prevent instability in image regions where the local mean or standard deviation is near zero; a small non-zero value is chosen for these constants so that the denominators can never reach zero (MathWorks - 2021). The three components are then combined into a single index, SSIM(x, y) = l(x, y)^{\alpha} \cdot c(x, y)^{\beta} \cdot s(x, y)^{\gamma}, where exponent values less than 1 can be used to account for non-linearities. The final SSIM index ranges from -1 to 1, where a value of 1 indicates a perfect match between the two images (MathWorks - 2021).
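A sketch of how the three loss terms might be combined is shown below; the weightings are assumptions, and tf.image.ssim and tf.image.image_gradients provide the SSIM and smoothness terms:

```python
import tensorflow as tf

def depth_loss(y_true, y_pred, max_depth=1.0,
               w_ssim=0.85, w_edges=0.9, w_l1=0.1):
    # Structural similarity term (1 - SSIM, so that lower is better).
    ssim_term = tf.reduce_mean(
        1 - tf.image.ssim(y_true, y_pred, max_val=max_depth))

    # Depth smoothness: penalise large gradients in the predicted map.
    dy_pred, dx_pred = tf.image.image_gradients(y_pred)
    smooth_term = tf.reduce_mean(tf.abs(dy_pred)) + tf.reduce_mean(tf.abs(dx_pred))

    # Point-wise L1 (mean absolute error) term.
    l1_term = tf.reduce_mean(tf.abs(y_true - y_pred))

    return w_ssim * ssim_term + w_edges * smooth_term + w_l1 * l1_term
```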
Tracking loss-function metrics is important in machine learning, as it provides information about how well the model is performing and what needs to be done to optimise it further. When the loss is minimized on the training set, the model is encouraged to capture meaningful patterns in the data rather than memorizing the training examples (Saravanan, P - 2021).
Model Training
After defining the loss functions, the final step was training the model. The program used the following hyperparameters: LR = 0.0002, EPOCHS = 30 and BATCH_SIZE = 32 (Basu, V - 2021).
The learning rate (LR) determines the step size at which the model's parameters are updated during training. It affects the speed of convergence (how quickly the model reaches its limit in accuracy) and the stability of the optimization process (Careerera - 2022). A larger learning rate may lead to faster convergence but can also cause overshooting (Google Machine Learning Crash Course - 2022).
An epoch refers to one complete pass through the entire training dataset during training. Increasing the number of
epochs can potentially improve the model's performance, but too many epochs can lead to overfitting.
The batch size determines the number of samples processed before the model's parameters are updated (Varghese,
R - 2023). A larger batch size can result in faster training, but it may require more memory. Smaller batch sizes
can offer more stochasticity and generalize better but may lead to slower convergence (Google Machine Learning
Crash Course - 2022).
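Putting the stated hyperparameters together, a minimal training sketch (assuming the model, datasets and depth_loss from the earlier sketches) could look like this:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)    # LR = 0.0002
model.compile(optimizer=optimizer, loss=depth_loss)

model.fit(
    train_ds,               # batches of (rgb, depth) pairs, batch size 32
    validation_data=val_ds,
    epochs=30,              # EPOCHS = 30
)
```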
After the model had been trained, it was tested in a real-world scenario where the same image of the bins was inputted into the program.
Figure 11- Input and output of the depth estimation algorithm (Author’s Own - 2023)
In the left-hand image of Figure 11, it is clear that the bin on the left is further away than the bin on the right, and this spatial relationship is accurately shown in heat-map form in the right-hand image. The bin on the right appears brighter, with a lighter shade of green, indicating its closer proximity. The heat-map data can be converted into auditory feedback, with different pitches, volumes or tones assigned to different depth levels; the blind person can listen to this feedback and perceive depth from the sound cues.
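As a simple illustration of this idea, the sketch below maps a depth value onto the pitch of a short tone; the frequency range and the optional sounddevice playback are assumptions, not part of the paper's program:

```python
import numpy as np

def depth_to_tone(depth_m, near=0.5, far=5.0,
                  f_high=1200.0, f_low=200.0, sr=22050, secs=0.2):
    """Return a short sine tone whose pitch rises as the object gets closer."""
    t = np.clip((depth_m - near) / (far - near), 0.0, 1.0)   # 0 = near, 1 = far
    freq = f_high + t * (f_low - f_high)                     # near -> high pitch
    samples = np.arange(int(sr * secs)) / sr
    return np.sin(2 * np.pi * freq * samples).astype("float32")

tone = depth_to_tone(1.2)   # could be played with, e.g., sounddevice.play(tone, 22050)
```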
Optical Character Recognition (OCR)
Since the program relies on a 3rd-party OCR Python package, it makes an application programming interface (API)¹² request. The process begins with an input image, which undergoes pre-processing. This image is then fed into an OCR engine which has been trained on a large dataset and is processed by an open-source library called Leptonica. The engine generates a textual output, which is returned as a response (Figure 11) (Parthasarathy, B - 2018).
To recognise a single character, a CNN is typically used; however, as most use cases involve an arbitrary sequence of characters, these are recognised using recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, a popular form of RNN (Zelic, F & Sable, A - 2023).
The fundamental idea behind LSTM is to address the vanishing gradient problem. The vanishing gradient problem
refers to the issue where gradients, which carry information about how to update the network's parameters during
training, diminish exponentially as they are backpropagated through many time steps. This makes it difficult for
the network to learn long-term patterns in the data (Wang, C-F - 2019).
¹² Enables two software components to communicate with each other using a set of definitions and protocols (Amazon AWS).
LSTM tackles the vanishing gradient problem by introducing a memory cell, which is responsible for storing and
propagating information across time steps (Hochreiter, S & Schmidhuber, J - 1997). During the training process,
the parameters are learned by minimizing a loss function through backpropagation and gradient descent. By utiliz-
ing LSTMs, OCR systems can learn to recognize and interpret the sequential patterns in characters, allowing them
to accurately transcribe text from images (Olah, C - 2015).
After pre-processing the image and inputting it into PyTesseract, the program correctly output the text in the image.
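A minimal sketch of this OCR step is shown below; the pre-processing choices and file name are illustrative assumptions:

```python
import cv2
import pytesseract

image = cv2.imread("sign.jpg")
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu thresholding gives a clean black-and-white image for the OCR engine.
_, binary = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

text = pytesseract.image_to_string(binary)   # OCR via the Tesseract engine
print(text)
```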
This output can then be converted into speech, enabling blind individuals to access and consume written information. Furthermore, OCR can be used to convert printed text into Braille¹³, allowing people with vision impairment to read the content on Braille displays, which provides access to textual information in a format that is already very familiar to these individuals.
¹³ A system of reading and writing a specific language without the use of sight (https://brailleworks.com/braille-resources/what-is-braille/).
Conclusion
This paper outlines the challenges faced by individuals with vision impairments in navigating their environment and proposes a promising solution using computer vision techniques. My volunteering experience at the blind school and the analysis carried out during that time have strengthened my commitment to creating a solution. Witnessing the dedication of the staff in helping children navigate the classroom and in adapting existing learning materials into specialised resources, I recognized the need for innovative methods to enhance accessibility and improve the quality of life of the blind population. Through careful evaluation of existing technological solutions, I identified key areas that required enhancement. To address these challenges, I developed a solution built on machine learning: through audio-based information delivery it opens new possibilities for blind individuals to access and engage with written content, and through object detection and depth estimation it gives them feedback on their environment so that they can navigate their surroundings safely.
However, there is a concern that visually impaired people will become too reliant on this technology and neglect practising important skills. A possible solution is to activate the assistive technology selectively, based on the specific needs and context of the individual. For instance, blind individuals can choose to turn the program off in familiar environments such as their homes, where they have already developed a good sense of spatial awareness. By doing so, they can maintain their proficiency in other important skills such as Braille reading and echolocation. This approach encourages blind individuals to continue practising and honing their abilities, while also utilising the technology as a valuable tool in unfamiliar or challenging public environments where additional support may be necessary. The program should be viewed as a tool that provides additional support rather than a device on which individuals become overly dependent; blind individuals need to recognize it as an aid that complements their existing skills and abilities.
Machine learning and computer vision applications could greatly enhance quality of life, foster independence, and provide equal educational and employment opportunities, offering even greater support to empower and assist this community in the years to come.
Bibliography
(1) Bourne, R.R.A et al. (2020); “Global Prevalence of Blindness and Distance and Near Vision Impairment
in 2020: progress towards the Vision 2020 targets and what the future holds.” [Accessed 3/1/23]
(2) Baxter, K & Beresford B. (2016); “A Review of Methods of Evaluation and Outcome Measurement of a
Complex Intervention in Social Care: The Case of Assistance Dogs”, pp 10-11; [Accessed 3/1/23]
(3) Ali, S. (2023); “Unveiling the Power of Object Detection: Revolutionizing Visual Perception”; [Accessed
15/2/23]
(4) Kundu, R. (2023) “YOLO: Algorithm for Object Detection Explained [+Examples]”. [Accessed 6/6/23]
(5) Gandhi, R. (2018) "R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms". [Accessed 24/5/23]
(6) Redmon, J et al. (2016) “You Only Look Once: Unified, Real-Time Object Detection”; pp 2-5; [Accessed
19/4/23]
(7) Keita, Z. (2022) “YOLO Object Detection Explained. Understand YOLO object detection, its benefits,
how it has evolved over the last couple of years and some real-life applications.” [Accessed 11/5/23]
(8) Mohamed, N.A. (2020) “Moving object detection via TV-L1 optical flow in fall-down videos”. [Accessed
11/5/23]
(9) Tan, D. (2020) “Depth Estimation: Basics and Intuition”. [Accessed 14/5/23]
(10) Carter, J et al. (2012) “Lidar 101: An Introduction to Lidar Technology, Data, and Applications”. pp 1-3
& 9-12 [Accessed 4/6/23]
(11) Vasiljevic, I et al. (2019) “DIODE: A Dense Indoor and Outdoor Depth Dataset”. [Date Accessed
29/4/23]
(12) Basu, V. (2021) “Monocular depth estimation. Implement a depth estimation model with a convnet”. [Ac-
cessed 29/4/23]
(13) Bharath, K. (2021) "U-Net Architecture for Image Segmentation". Link can be found at https://blog.paperspace.com/unet-architecture-image-segmentation/ [Accessed 30/4/23]
(14) Tomar, N. (2021) “What is UNET?”. Link can be found at https://medium.com/analytics-vidhya/what-is-
unet-157314c87634 [Accessed 30/4/23]
(15) Kakumani, A.K (2022) “BRB U-Net: Bottleneck Residual Blocks in U-Net for Light-Weight Semantic
Segmentation”. [Accessed 3/5/2023]
(16) Glew, D & Vrscay, E.R “Max and min values of the structural similarity (SSIM) function S(x, a) on the
L2 sphere SR(a), a ∈ RN” [Accessed 11/5/23]
(17) MathWorks (2021) "(SSIM) index for measuring image quality". Link can be found at https://www.mathworks.com/help/images/ref/ssim.html [Accessed 11/5/23]
(18) Saravanan, P. (2021) “Understanding Loss Functions in Machine Learning”. [Accessed 12/5/23]
(19) Careerera. (2022) "What is convergence theory in Machine Learning | Convergence in gradient descent | Machine Learning". Video link can be found at https://www.youtube.com/watch?v=2QCagwYlVaI [Accessed 10/5/23]
(20) Google Machine Learning Crash Course. (2022) “Reducing Loss” [Accessed 4/4/23]
(21) Varghese, R et al. (2023) "International Journal for Research in Applied Science & Engineering Technology". Volume 11, Issue V. [Accessed 29/5/23]
(22) Imtiaz, H. (2020) “A Beginners Guide to Tesseract OCR Using Pytesseract”. [Accessed 16/5/23]
(23) Parthasarathy, B. (2018) "Build your own OCR (Optical Character Recognition) for free". [Accessed 17/5/23]
(24) Wang, C-F. (2019) “The Vanishing Gradient Problem, The Problem, Its Causes, Its Significance, and Its
Solutions”. [Accessed 22/5/23]
(25) Hochreiter, S & Schmidhuber, J. (1997) “LONG SHORT-TERM MEMORY”. Link can be found at
http://www.bioinf.jku.at/publications/older/2604.pdf [Accessed 22/5/23]
(26) Olah, C. (2015) "Understanding LSTM Networks". Link can be found at http://colah.github.io/posts/2015-08-Understanding-LSTMs/?ref=nanonets.com [Accessed 24/5/23]
(27) Preuhs, E. (2022) “Acquisition and Reconstruction Methods for Multidimensional and Quantitative Mag-
netic Resonance Imaging”. [Accessed 4/6/23]