
Maintenance Required: Updating and Extending Bootstrapped Human Activity Recognition Systems for Smart Homes

Shruthi K. Hiremath School of Interactive Computing
Georgia Institute of Technology
Atlanta, USA
shiremath9@gatech.edu
   Thomas Plötz School of Interactive Computing
Georgia Institute of Technology
Atlanta, USA
thomas.ploetz@gatech.edu
Abstract

Developing human activity recognition (HAR) systems for smart homes is not straightforward due to varied layouts of the homes and their personalized settings, as well as idiosyncratic behaviors of residents. As such, off-the-shelf HAR systems are effective in limited capacity for an individual home, and HAR systems often need to be derived “from scratch”, which comes with substantial efforts and often is burdensome to the resident. Previous work has successfully targeted the initial phase. At the end of this initial phase, we identify seed points. We build on bootstrapped HAR systems and introduce an effective updating and extension procedure for continuous improvement of HAR systems with the aim of keeping up with ever changing life circumstances. Our method makes use of the seed points identified at the end of the initial bootstrapping phase. A contrastive learning framework is trained using these seed points and labels obtained for the same. This model is then used to improve the segmentation accuracy of the identified prominent activities. Improvements in the activity recognition system through this procedure help model the majority of the routine activities in the smart home. We demonstrate the effectiveness of our procedure through experiments on the CASAS datasets that show the practical value of our approach.

Index Terms:
smart homes, self-supervised learning, machine learning
Accepted at The 6th International Conference on Activity and Behavior Computing; in press at IEEE Xplore. Contact shiremath9@gatech.edu for recent updates.

I Introduction

Developing robust and reliable human activity recognition systems for smart homes is essential, for example, to provide automated assistance to residents or to longitudinally monitor daily activities for health and well-being assessments. With countries facing challenges in meeting elderly care requirements [1, 2], such assessments can help track behavior changes over longer periods of time and serve as beneficial ambient assisted living (AAL) systems. To alleviate privacy concerns arising from camera-based monitoring, and owing to advancements in IoT technologies, the use of ambient sensors has been on the rise. Such sensing mechanisms make reliable and inexpensive sensing and computing technology accessible, and instrumenting homes with sensors for everyday activity recognition in real-world living environments is now a realistic option for many. Although such advancements have made the data collection process seamless and straightforward, substantial challenges remain for developing and deploying HAR systems in smart homes [3, 4, 5, 6].

Despite considerable progress in developing such activity recognition systems [7, 8, 9, 10, 11, 12], drawbacks exist. Since smart homes are individualized settings with idiosyncratic behaviors, utilizing an “off-the-shelf” activity recognition system is typically challenging and not straightforward. Thus, a HAR system must be designed for each individual smart home so that it caters to the specific home layout and the activity patterns of its residents. Such a system would require large amounts of data and annotations for building fully-supervised, personalized models for a given home. In practical deployment scenarios, it is unreasonable to assume that a resident is willing to wait extended periods of time, or to provide extensive annotations, until enough data is collected to develop a fully functional system. Also, for real-world deployments, privacy and logistical concerns essentially rule out that third parties will be able to collect the much needed annotated sample data while the resident already lives in their smart home. As such, the focus is often on developing an initial model that provides a limited recognition capability but is available “early on” to the resident. These initial systems aim to capture prominent and frequently occurring activities [13]. Such approaches are needed to derive a functional HAR system quickly and with minimal yet targeted involvement of the residents themselves.

Figure 1: Updating activity models for the HAR system. Activity predictions from the initial bootstrapping procedure (top-right portion) are used as starting points [13]. A self-supervision based module (SimCLR), used to learn representations, is trained on unlabelled data, with sparse annotations obtained through the active learning like procedure. This module is then used to provide predictions in the non-detection regions left by the motif models of the initial bootstrapping procedure. Updated motif models are also learnt in a data-incremental procedure. These modules make up the update and extension procedure (bottom portion). Predictions from both modules lead to improved segmentation accuracy for the prominent activities, which form the majority of routine activities in the home.

The initial system is used for coarse-grained recognition, which serves as the basis for further continuous (and, again, targeted) improvement and extension. In this work, we develop a method that, based on this initial, functional yet imperfect version of a HAR system, extends the capabilities of the activity recognition system by improving the segmentation accuracy for the identified prominent activities. This maintenance and extension approach is made possible by the ‘seed points’ of activity recognition that the initial system provides along a continuous monitoring timeline (for example, over the span of a few weeks). We make use of the seed points identified by the initial model to train a self-supervision module. Annotations provided in a minimal fashion by the residents are used to learn the mapping between the representations learnt through the self-supervision scheme and the labels corresponding to those seed points. A number of methods broadly belonging to the self-supervised paradigm have made use of unlabeled data to learn robust data representations. Recent work in [14] has shown that SimCLR (a simple framework for contrastive learning of representations) is an effective contrastive learning based self-supervision technique for learning the underlying representation space in smart homes. The objectives and contributions of this paper can be summarized as follows:

  • Development of a Maintenance Procedure for the Activity Recognition System for Smart Homes – The update and extension procedure is used to improve the segmentation accuracy for recognized activity segments. This procedure is detailed in Fig. 1.

  • Evaluation of the developed Maintenance Procedure –  We demonstrate the effectiveness of our maintenance and extension approach through an extensive experimental evaluation on real-world smart home scenarios, namely within the context of the CASAS datasets [15]. Improvements in the activity recognition system are characterized by the majority of routine activities being covered well, i.e., exhibiting high classification as well as segmentation accuracy for the analysis of the prominent activities recorded through the continuous sensor readings.

  • Application: Activity Logging –  As such, our approach keeps track of a resident’s life and moves the HAR system closer to fully covering what is going on in a resident’s life. Our contributions get us one step closer to developing an activity recognition system for smart homes in a fully data-driven manner.

II Related Work

With decreasing sensor costs, automating ‘regular’ homes has become a possibility for many. Such sensors can collect data for extended periods of time without concerns regarding battery re-charging (as with wearable sensors) or privacy (as with vision-based sensors). Analyzing such sensor data for human activity recognition purposes typically follows five steps [16]: i) data capture; ii) pre-processing to remove noise; iii) segmenting the data stream into static data points that are assumed to be independent and identically distributed; iv) feature extraction – learning relevant information from data points; and v) classification – identifying the activity label for a given data point. The approach proposed in this work integrates a set of techniques to continually update activity patterns as data is observed in the smart home. The initial activity patterns are obtained from previous work in [13]. We use these as a starting point for our procedure (as described in Fig. 1), where the activity patterns provide ‘seed points’ to update activity segments corresponding to the prominent activities of interest. We make use of a self-supervision based module that passively observes data in the smart home and incrementally builds knowledge of representations. We also update the recognition models from the initial bootstrapping procedure to learn changing activity patterns. In learning these representations, the goal is to improve the segmentation accuracy of the identified activity patterns. Thus, the update and extension procedure is able to identify longer activity segments in comparison to previous work in [13]. In what follows, we summarize the related work and the components that are relevant to our approach.

II-A Activity Recognition in Smart Homes

Activity recognition systems for smart homes typically aid in identifying instances of activities of daily living [3, 17, 18, 19]. This has a direct impact on logging relevant behaviors in health-monitoring scenarios and on identifying deviations from regular routines. In order to track the resident’s activity or behavior, data is collected through networked devices in the home. Ambient sensors that record event-based data are employed to record state changes. These sensor events are recorded in a continuous fashion, resulting in a time-series problem. However, data collected through ambient sensing mechanisms does not have a constant sampling rate, as opposed to data collected through either wearable sensors or videos [20, 21]. A number of algorithms have been proposed for sensor-based human activity recognition, including K-nearest neighbors, random forests [22], Hidden Markov Models, and support vector machines [23]. More recent work in [7] explores sequential models (and variants thereof) for modeling activities of interest. In [8, 9], various language-based encodings, such as ELMo and Word2Vec, are used to represent the observed data. However, most, if not all, contemporary works use the ‘segment-first and then recognize’ processing approach. Such procedures are not ideal for real-world deployments, where the resident would need to provide activity start and end points. Based on the smart home layout and the activities of interest, various change point detection (CPD) algorithms have been applied to segment activities [24, 25, 26]. These procedures identify abrupt changes in the sensor data streams by employing a heuristic measure based on the statistical properties of the signal or likelihood ratios [27, 24].

To relax the requirement of knowing the start and end times of an activity in the ‘segment-first and then recognize’ approach, time-based windowing or (sensor) event-based windowing has been employed. Features learnt over these windows are either i) handcrafted; or ii) learnt automatically. Hand-crafted features encode information such as the time of the last sensor event in the window, the day of the week corresponding to the last sensor event in the window, the dominant sensor in the window, and the last sensor location in the window, to name a few. Automatically learned features are extracted through, for example, i) sequential-modeling procedures [10]; ii) graph-based approaches [28, 29]; iii) self-supervision based approaches [30, 14]; and iv) convolution-based approaches over sensor data represented in the form of images [31, 32, 33].

Although the aforementioned approaches report reasonable scores on the activity recognition task, the metric often used for evaluating the built systems is the (misleading) weighted F1 score. The use of this metric falsely suggests satisfactory performance of the analysis procedure, since it weighs the activity class that occurs most often more heavily than those that occur less frequently. The CASAS datasets used for analysis are imbalanced: the ‘Other’ class, which is not an activity class of interest, occurs more frequently than activities of interest such as ‘Enter Home’. Thus, weighting by the number of instances of an activity biases the classification towards predicting the majority class most of the time, without learning much about the actual activity classes of interest.
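The effect of class imbalance on the weighted F1 score can be illustrated with a small, self-contained sketch. The label counts below are made up for illustration and are not taken from the CASAS datasets; the classifier is a deliberately degenerate one that always predicts the majority class.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy, made-up distribution: 95 'Other' windows and 5 'Enter Home' windows.
y_true = ["Other"] * 95 + ["Enter Home"] * 5
y_pred = ["Other"] * 100  # always predicts the majority class

print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
print(f"macro F1    = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the weighted F1 score is roughly 0.93 although ‘Enter Home’ is never recognized, whereas the macro-averaged F1 score is roughly 0.49.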

II-B Self-supervision based representational learning

The goal of this work is to continually improve the segmentation of the activities identified through the bootstrapping procedure. In order to continually update representations in an unsupervised fashion, we focus on learning these representations without the need for labels. Self-supervised learning is a powerful technique that allows for extracting feature representations from large amounts of unlabelled data. The self-supervision procedure consists of two stages: i) designing the pretext task for learning robust feature representations; and ii) the fine-tuning procedure on downstream tasks aimed at transferring the knowledge learnt from the pretext task to specific tasks by fine-tuning the features. Broadly, there are two approaches to self-supervised learning: i) Contrastive learning: where the aim is to distinguish between similar and dissimilar data points within the input data; and ii) Non-contrastive learning: where only positive samples are used to learn the feature representations – the model is aimed at bringing the original data point and its augmented version close. No negative data points are used during the learning process.

SimCLR [34] is a popular contrastive learning approach that makes use of three major components to learn good representations: i) using appropriate data augmentations; ii) making use of a non-linear transformation layer, also known as a projection head; and iii) optimizing the contrastive learning loss over larger batch sizes and more training steps. This learning procedure does not require a memory bank or specialized architectures in order to learn useful representations. SimCLR has been extensively used in time-series analysis problems and has shown promising results [35, 36, 37, 38]. Recent work [14] has shown promising results for a modified version of SimCLR in ambient settings. Rather than learning representations directly from raw data samples, features are first extracted from the raw ambient data and then used to learn the representation space. The encoder consists of two CNN layers followed by one LSTM layer, and the projection head consists of three fully connected layers.

II-C Active Learning

Developing large-scale activity recognition systems usually requires large amounts of labeled instances of data. To reduce such a reliance on annotation as a resource [39], a semi-supervised learning paradigm–Active Learning–uses a human in the loop to obtain annotations [40]. By providing annotations for limited amounts of relevant and informative data points, the goal is to reduce the requirement of large amounts of annotations and provide comparable performance scores to the fully supervised conditions.

The Active Learning paradigm has two components: i) the sampling strategy, which details the schedule for data points to be picked for annotations from the unlabeled dataset; and ii) the query strategy, which defines a heuristic function that determines the data points to be labeled based on the evaluation of the heuristic function.

Pool based active learning is employed for tasks where large samples of unlabeled data are available. A classifier is first trained on a small labeled dataset [41, 42, 43] and a query strategy utilizing the already trained classifier then identifies data points to be queried from the pool of unlabeled dataset for further annotation. Stream-based active learning is beneficial in online settings where pool-based strategies cannot be applied. A budget is usually defined to identify the number of queries that can be made in the active learning procedure. This is generally limited and tuned based on the performance expected from the system and the burden a human annotator is expected to accept. Different budget spending strategies have been explored in [44].
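As a minimal sketch of a pool-based query step under a fixed budget, the snippet below ranks unlabeled points by the entropy of the classifier's predicted class distribution and selects the most uncertain ones. Entropy-based uncertainty is only one possible heuristic and is an assumption made here for illustration; the function names and the probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, budget):
    """Return the indices of the `budget` most uncertain points in the unlabeled pool."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical class probabilities produced by an already trained classifier.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # almost uniform, most uncertain
]
queries = select_queries(pool_probs, budget=2)
print(queries)
```

With a budget of two, the two least confident points (indices 3 and 1) are passed to the annotator.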

In the bootstrapping procedure, annotations corresponding to the identified motifs are obtained from the residents in an active learning like procedure. We make use of these annotations to train the self-supervision module. Thus the reliance on the resident to provide annotations is kept at a minimum throughout the entire procedure. For the updates to the recognition procedure, through the activity models, the procedure (similar to the initial bootstrapping procedure) of asking for minimal labels from residents every few weeks is retained.

III Updating and Extending Bootstrapped Human Activity Recognition Systems for Smart Homes

In this work, we develop a procedure to update HAR systems in continuously evolving smart home environments. We focus on utilizing the ‘seed points’ of the initial bootstrapping procedure as the starting point for analysis. The ‘seed points’ provide initial segments of activities identified through the activity recognition procedure in the initial bootstrapping stage. The identified segments are aimed at discovering the occurrence of the activity. As such, these segments may not cover the entire length of the activity as it occurs in the smart home. Our ‘update and extension’ procedure is aimed at extending these seed points and bringing the activity recognition system closer to identifying the correct start and end points of the activities in the smart home. We employ a combination of a self-supervision module and updated activity models in order to update the recognition system. A data-incremental procedure is used to update the recognition model, where data collected every few weeks in the smart home is used to train the recognition procedure.

III-A Scope

We develop an update procedure that aims to extend patterns of already identified activities through the initial bootstrapping procedure, and to learn new patterns corresponding to activities the system was unable to identify previously – the HAR model will be updated and extended. For this update procedure, we make some assumptions that help scope out the contribution of this work.

We use the initial bootstrapping procedure in [13] as the starting point and obtain labels corresponding to newly identified motif models from the resident through an active learning like procedure [45]. The proposed procedure assumes that the predictions of the bootstrapping procedure are accurate with regard to the predicted activities but they may not be precise with regard to the actual boundaries of the detected activities, i.e., while the predictions will match with activities, the actual activities might extend those seed points.

Since we utilize [13] as the starting point for the work presented here, we retain the design choices used in the initial work. The design choices used in the previous work are: i) the length of the action unit is determined through observing 20 sensor events, and co-occurrence patterns across these sensor event triggers are learned through the embedding layer; ii) motifs for the prominent activities have to be $\geq 2$ in length; iii) motifs for prominent activities should occur $\geq 5$ times; iv) motifs identified should be homogeneous in the activity label; and v) motifs for prominent activities are identified through majority voting to ensure these models are strong predictors of a given activity.
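Design choices ii) to iv) can be read as filters over candidate sub-sequences of action units. The following is a minimal sketch of such a filter, not the procedure from [13]: the per-action-unit labels, the `max_len` cap, the counting of overlapping occurrences, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

MIN_MOTIF_LEN = 2    # design choice ii): motifs span at least two action units
MIN_OCCURRENCES = 5  # design choice iii): motifs occur at least five times


def candidate_motifs(action_units, labels, max_len=4):
    """Collect the label observed at every occurrence of each sub-sequence."""
    occurrences = defaultdict(list)
    for length in range(MIN_MOTIF_LEN, max_len + 1):
        for start in range(len(action_units) - length + 1):
            motif = tuple(action_units[start:start + length])
            seen = set(labels[start:start + length])
            # None marks an occurrence that spans more than one activity label.
            occurrences[motif].append(seen.pop() if len(seen) == 1 else None)
    return occurrences


def filter_motifs(occurrences):
    """Keep motifs that are frequent and homogeneous in label (choice iv)."""
    kept = {}
    for motif, seen in occurrences.items():
        frequent = len(seen) >= MIN_OCCURRENCES
        homogeneous = None not in seen and len(set(seen)) == 1
        if frequent and homogeneous:
            kept[motif] = seen[0]
    return kept


# Toy stream: the action-unit pair (3, 7) repeats five times during 'Cook'.
action_units = [3, 7] * 5 + [5, 9]
labels = ["Cook"] * 10 + ["Sleep"] * 2
motifs = filter_motifs(candidate_motifs(action_units, labels))
print(motifs)
```

In this toy stream only the pair (3, 7) passes both the frequency and the homogeneity filter; longer or shifted sub-sequences occur fewer than five times.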

Figure 2: Summary of the initial bootstrapping procedure for Phase 1 of the HAR system lifespan (taken with permission from [13])

Both the initial bootstrapping and update procedures aim to provide solutions through a data-driven approach, without requiring too much effort from the resident’s end. Predictions from both these procedures can be refined by utilizing additional information such as an ontology built for the smart home identifying relevant sensor events corresponding to a given activity or using knowledge of activity routines that the resident engages in.

The update procedure starts after the initial n+m weeks of data in the smart home, wherein the first n weeks are utilized during the Cold Phase and the next m weeks during the Warm Phase, as shown in Fig. 2. Previous work in [13] uses n and m equal to two weeks each. However, these are hyper-parameters that can be varied. We hypothesize that the procedure, after the required break-in period, provides a good starting point for the update and extend procedure.

III-B Prerequisites: Initial Bootstrapping Procedure

Action units, which correspond to movement patterns in the home, are learned through an embedding and clustering procedure during the Cold Phase (top portion of Fig. 2). Algorithm 1 details the procedure to obtain these action units, which serve as building blocks for the next steps of the bootstrapping procedure. The action units are learnt over the first two weeks (n=2) of data observed in the home. In the second stage, the Warm Phase (middle portion of Fig. 2), frequently occurring sub-sequences composed of action units are identified through a set of filtering procedures and queried. Activity labels corresponding to these sub-sequences are collected from the resident through the query procedure. Thus, a set of motif models, which represent activity models, is derived for a sub-set of activities of interest. A merge or overlap operation is used to combine motif models of varying lengths into the final models. The length of these final motif models provides initial segmentation boundaries for the detected activity.

Motif models are learnt over the next two weeks (m=2) of data observed in the home, after the Cold Phase. During the last stage, the Hot Phase (bottom portion of Fig. 2), the system is deployed to detect activities occurring in the smart home. After the first n+m weeks (n+m=4) required for the initial bootstrapping procedure, the initial set of motif models is available.

III-C Self-supervision module

Figure 3: Architecture for the self-supervision module. The pre-training module (top) requires only unlabeled data for training. The fine-tuning module (bottom) makes use of representations from the self-supervision module and annotations to train a model for an activity recognition task. See text in Sec. III-C for details of the architecture.

We employ self-supervision to extend the ‘seed points’ obtained during the bootstrapping procedure. The self-supervision module serves to learn good representations from data observed in the smart home. Since the self-supervision technique learns discriminative and meaningful representations from unlabeled data, the resident in the loop is not burdened with providing annotations for each data point observed in the home. The components of the self-supervision module are illustrated in Fig. 3. This module consists of two components: i) pre-training; and ii) fine-tuning.

In the pre-training procedure representative features are learnt from large amounts of unlabeled data. We detail the components of the self-supervision module below:

  1. Representations are learnt over the embeddings of the action units. These embeddings are learnt from the BERT model [13], which uses a 15% masking probability to obtain the encodings over which the action units are obtained. These representations are fed into the self-supervision module and are denoted as $x$ in Fig. 3.

  2. We make use of the noise and scaling data augmentation techniques to obtain augmented pairs of the input data. These augmented pairs are represented as the transformed data points $x_i$ and $x_j$, obtained by applying the noise and scaling transformations to $x$. For the noise-based transformation, random noise is applied to the data point; for the scaling-based transformation, the data point is scaled by a factor drawn from a normal distribution. These transformations were introduced in [46] and have proven effective for time series data.

  3. The encoder consists of two convolutional layers followed by an LSTM layer. The filter sizes for the convolutional layers are 32 and 64. Each convolutional layer is followed by a ReLU and a dropout layer (with a dropout probability of 0.1). The LSTM layer has 64 hidden units. The transformed data points $x_i$ and $x_j$ are passed through this encoder to obtain the embeddings $r_i$ and $r_j$, respectively.

  4. The embeddings obtained ($r_i$ and $r_j$) are then passed on to the projection head, which consists of three linear layers with input feature sizes of 64, 128, and 256. Consecutive linear layers are separated by a ReLU layer. This gives us new embeddings, $z_i$ and $z_j$, which are then used as inputs to the loss function.

  5. A contrastive loss function, NT-Xent [47], is used to maximize the similarity between augmented data points and minimize the similarity to the other data points in a given batch. The loss function is defined in Eq. 1. $z_i$ and $z_j$ correspond to the representations obtained from the self-supervision module for an unlabeled data point $x$ collected in the smart home. $z_i$ and $z_j$ correspond to a positive pair in the batch, whereas $z_i$ and $z_k$ correspond to the negative pairs. A batch size of 64 is used. $k$ indexes the data points in a given batch. Since each unlabeled data point results in two augmented versions, the number of data points is two times the pre-defined batch size ($2N$). The cosine similarity, $\mathrm{sim}(\cdot,\cdot)$, is scaled by a temperature parameter $\tau$.

    $$L_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)} \qquad (1)$$
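The augmentation step and the loss in Eq. 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the noise is assumed to be additive and Gaussian, the values of `sigma` and `tau` are placeholders, the two views of each sample are assumed to be stored at adjacent positions, and the encoder and projection head are omitted so that the loss is computed directly on the augmented vectors.

```python
import math
import random


def noise_transform(x, sigma=0.05, rng=random):
    """Apply random Gaussian noise to each element (additive noise is an assumption)."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def scaling_transform(x, sigma=0.1, rng=random):
    """Scale x by one factor drawn from a normal distribution centred at 1."""
    factor = rng.gauss(1.0, sigma)
    return [v * factor for v in x]


def cosine_sim(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm


def nt_xent_pair(z, i, j, tau=0.5):
    """L_{i,j} from Eq. 1 for the positive pair (i, j) among the 2N vectors in z."""
    numerator = math.exp(cosine_sim(z[i], z[j]) / tau)
    denominator = sum(
        math.exp(cosine_sim(z[i], z[k]) / tau) for k in range(len(z)) if k != i
    )
    return -math.log(numerator / denominator)


def nt_xent_batch(z, tau=0.5):
    """Average L_{i,j} over both orderings of every positive pair.

    The two views of sample n are assumed to sit at positions 2n and 2n + 1.
    """
    n_pairs = len(z) // 2
    total = 0.0
    for n in range(n_pairs):
        i, j = 2 * n, 2 * n + 1
        total += nt_xent_pair(z, i, j, tau) + nt_xent_pair(z, j, i, tau)
    return total / (2 * n_pairs)


# Toy batch of N = 4 random vectors standing in for action-unit embeddings.
rng = random.Random(0)
batch = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
views = []
for x in batch:
    views.append(noise_transform(x, rng=rng))
    views.append(scaling_transform(x, rng=rng))
loss = nt_xent_batch(views)
print(f"NT-Xent loss on the toy batch: {loss:.3f}")
```

Because the positive-pair term also appears in the denominator, each $L_{i,j}$ is non-negative, and the loss decreases as the two views of a sample move closer together relative to the other vectors in the batch.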

In the fine-tuning procedure, we make use of small amounts of labeled data to fine-tune the encoder pretrained in the previous step. The data points and corresponding labels are taken from the predictions of the bootstrapping procedure. Since the bootstrapping procedure identified prominent activities, the self-supervision module learns a model corresponding to only these prominent activities, with the goal of improving segmentation accuracy. The projection head from the pre-training step is discarded and only the encoder is used to obtain the embeddings. Similar to [14], the prediction head consists of two sequential layers with input feature sizes of 64 and 256. A ReLU layer is used for activation between the two sequential layers. Cross-entropy loss is used for the activity recognition task.

III-D Update and Extend the Initial Bootstrapped Procedure

In this work, we design the update and extend procedure (Fig. 1) of the activity recognition system. The update procedure starts after the initial bootstrapping procedure (Fig. 2), which serves as a pre-requisite and starting point for the update procedure.

The motif models obtained provide the initial segmentation corresponding to the prominent activities in the home. Although these segments provide a good initialization point for identifying the activities of interest, they do not provide accurate start and end points for these activities. In the update and extend procedure, we use these initially identified segments to train the self-supervision module. The predictions are used as labels to fine-tune the self-supervision module. Softmax scores obtained from the trained model are used to predict whether or not the identified prominent activities occur.

In parallel, the motif discovery procedure identifies new motifs with every update. Since the data observed during the Cold Phase increases with each update, new motifs capture an increasing number of movement patterns. The predictions from the self-supervision model and the newly discovered motifs are combined to obtain the final segmentation, from which the segmentation accuracy score is computed.
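One way to picture how the two sources of predictions could be combined is sketched below: motif predictions are kept where they exist, and the fine-tuned classifier fills the non-detection regions when its softmax score is high enough. The confidence threshold, the precedence given to motif predictions, and the function names are assumptions made for illustration; the exact combination rule is not specified here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fill_non_detections(motif_preds, ssl_logits, classes, threshold=0.9):
    """Keep motif predictions; elsewhere accept a confident classifier label."""
    combined = []
    for motif_label, logits in zip(motif_preds, ssl_logits):
        if motif_label is not None:
            combined.append(motif_label)
            continue
        probs = softmax(logits)
        best = max(range(len(classes)), key=probs.__getitem__)
        # `threshold` is a hypothetical cut-off on the softmax score.
        combined.append(classes[best] if probs[best] >= threshold else None)
    return combined


classes = ["Cook", "Sleep"]
# Per-window motif predictions; None marks a non-detection region.
motif_preds = [None, "Cook", "Cook", None, None]
# Hypothetical logits from the fine-tuned classifier for the same windows.
ssl_logits = [[4.0, 0.0], [3.0, 0.0], [2.5, 0.1], [0.2, 0.1], [0.0, 5.0]]
combined = fill_non_detections(motif_preds, ssl_logits, classes)
print(combined)
```

In this toy timeline the ‘Cook’ seed segment (windows 1 and 2) is extended backwards to window 0, window 3 stays undetected because the classifier is not confident, and window 4 is labelled ‘Sleep’.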

Figure 4: Data incremental procedure for update and extension of the bootstrapped model.
Algorithm 1 Cold Phase: Predict Action Units (with permission from [13])
Input: Data $D$: $\{\langle sensor_j, value_j \rangle\}$; BERT; k-Means $\triangleright$ BERT and k-Means are trained models as described in Sec. III-B. The input data stream is made up of sensor event triggers ($sensor_j$) and their corresponding values ($value_j$)
Output: Action Unit Predictions $AU$: $\{au_1, au_2, \ldots, au_i\}$
windows = sliding_windows(D)
for $w_i \in windows$ do
    $encoding_i$ = BERT($w_i$)
    $au_i$ = k-Means($encoding_i$)
    $au_i \rightarrow AU$
end for
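The steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: `toy_encoder` (normalized event counts) stands in for the trained BERT model and `nearest_centroid` stands in for the fitted k-Means model; the window size, step, vocabulary and centroids are hypothetical values that are not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Event = Tuple[str, str]  # (sensor, value), e.g. ("M001", "ON")


def sliding_windows(events: Sequence, size: int, step: int) -> List[list]:
    """Cut the event stream into fixed-size windows (size and step are assumed)."""
    return [list(events[i:i + size]) for i in range(0, len(events) - size + 1, step)]


def toy_encoder(window: Sequence[Event], vocab: Sequence[Event]) -> List[float]:
    """Stand-in for BERT: normalized count of each (sensor, value) pair."""
    counts = Counter(window)
    return [counts[token] / len(window) for token in vocab]


def nearest_centroid(encoding: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Stand-in for a fitted k-Means model: index of the closest centroid."""
    def sq_dist(centroid: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(encoding, centroid))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def predict_action_units(events: Sequence[Event],
                         encoder: Callable, cluster: Callable,
                         size: int = 4, step: int = 4) -> List[int]:
    """Algorithm 1: encode every window, then map the encoding to an action unit."""
    return [cluster(encoder(w)) for w in sliding_windows(events, size, step)]


# Hypothetical example data: two sensors, two cluster centroids.
vocab = [("M001", "ON"), ("M002", "ON")]
centroids = [[1.0, 0.0], [0.0, 1.0]]
events = [("M001", "ON")] * 4 + [("M002", "ON")] * 4
action_units = predict_action_units(
    events,
    encoder=lambda w: toy_encoder(w, vocab),
    cluster=lambda e: nearest_centroid(e, centroids),
)
```

Passing the encoder and the clustering step as callables keeps the loop identical to the pseudocode, so the stand-ins could be swapped for trained models without changing `predict_action_units`.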
Algorithm 2 Update & Extend Initial Bootstrapped Procedure
Input: $M_t = \{M_1, M_2, M_3, \ldots, M_j\}$; $\triangleright$ Initial motif models ($M_t$)
Data $D_t$: $\{\langle sensor_j, value_j \rangle\}$; $\triangleright$ Input data stream
$\theta$: self-supervision model
Output: Updated activity segments $A_{updated}$: $A_{t-1} + A_t$ $\triangleright$ $A_{t-1}$: refer Sec. III-B; $A_t$: refer Sec. III-D
windows = sliding_windows(D)
$AU$ = predict action units(windows) $\triangleright$ refer Alg. 1
for $m_i \in M_t$ do $\triangleright$ refer Sec. III-B
    if $m_i$ matches $AU[k:n]$ then
        $detection = detection + AU[k:n]$ $\triangleright$ $k < n$: AU sequence indices
    end if
end for
$non\_detection = AU - detection$
$A_t = \theta(non\_detection) + M_{updated}(D_t)$ $\triangleright$ $\theta(non\_detection)$: refer Sec. III-C; $M_{updated}(D_t)$: refer Sec. III-D
$A_{updated}$: $A_{t-1} + A_t$
return $\langle A_{updated} \rangle$
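A minimal Python sketch of Algorithm 2 follows. It makes several assumptions that the pseudocode leaves open: a motif "matches" only when it occurs as an exact contiguous subsequence of action units, `theta` is any callable that labels a single action unit (or returns `None`), and when combining sources the initial motif detections take precedence, followed by updated motifs, followed by `theta`. The motifs and action-unit values are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def match_motifs(au: Sequence[int], motifs: Dict[str, Sequence[int]]) -> List[Optional[str]]:
    """Label every action-unit position covered by an exact motif occurrence."""
    labels: List[Optional[str]] = [None] * len(au)
    for activity, motif in motifs.items():
        span = len(motif)
        for k in range(len(au) - span + 1):
            if list(au[k:k + span]) == list(motif):
                for idx in range(k, k + span):
                    if labels[idx] is None:  # first matching motif wins (assumption)
                        labels[idx] = activity
    return labels


def update_segments(au: Sequence[int],
                    motifs: Dict[str, Sequence[int]],
                    updated_motifs: Dict[str, Sequence[int]],
                    theta: Callable[[int], Optional[str]]
                    ) -> Tuple[List[Optional[str]], List[int]]:
    """Fill the non-detection regions using updated motifs and the model theta."""
    final = match_motifs(au, motifs)                    # A_{t-1}: initial detections
    non_detection = [i for i, lab in enumerate(final) if lab is None]
    from_updated = match_motifs(au, updated_motifs)     # M_updated(D_t)
    for i in non_detection:                             # A_t fills only the gaps
        final[i] = from_updated[i] if from_updated[i] is not None else theta(au[i])
    return final, non_detection


# Hypothetical action-unit stream and motif models.
au = [3, 1, 2, 7, 7, 5, 6, 9]
motifs = {"Sleep": [1, 2]}
updated_motifs = {"Sleep": [1, 2], "Relax": [5, 6]}
theta = lambda unit: "Sleep" if unit == 7 else None  # stand-in for the trained model

segments, gaps = update_segments(au, motifs, updated_motifs, theta)
```

In this toy stream the initial motif covers positions 1 and 2 only; the updated motif and `theta` extend the labeled region into the former non-detection positions, which mirrors the intended effect of the update and extend procedure.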

III-E Deployment and Activity Recognition

We use a data incremental procedure during the deployment of the designed activity recognition system, where the activity recognition system is updated every $n$ weeks by training the self-supervision module, as shown in Fig. 4. During deployment, the motif models stored in the motif memory up to the last time step $(t-1)$ are used. The first $(n+m)$ weeks come from the initial bootstrapping procedure and the next $((t-1) \cdot n)$ weeks come from the data observed in the update and extend procedure.

An activity prediction is reported when there is a match between an activity model and a sequence of observations in the smart home. By utilizing the incremental versions through the update and extend procedure, the segmentation accuracy improves and the system gets closer to a fully-functional event-based recognition system. With increasing observation period in the home, large quantities of data become available to train the self-supervision module. The goal of building such update and extension mechanisms is to detect activity occurrences, which can provide activity logs that are to be used for activity monitoring and behavior analysis.

IV Experiments

Through our experimental analysis we explore the effectiveness of our update and extension procedure, with a specific focus on the segmentation of the prominent activities identified in the bootstrapping procedure. The initial HAR system, derived through the bootstrapping procedure illustrated in Fig. 2, is developed for a given smart home using the first $n+m$ weeks of observations. We extend this procedure to improve the segmentation accuracy for the prominent activities.

As such, the update procedure is initiated after the first $n+m$ weeks of a resident living in the smart home. Thus, the maintenance procedure begins with the previously defined Hot Phase. Updates to the recognition model are scheduled after every two weeks of data observed in the home. The number of weeks between updates is a design choice that can be adjusted to the requirements of the application. However, since the initial bootstrapping procedure used increments of two weeks to develop a model, we retain this design choice during the update and extension procedure. Thus, multiple versions of updated HAR systems are derived by observing and processing blocks of $n$ consecutive weeks of data. This data incremental procedure is illustrated in Fig. 4.

The evaluation procedure in this application scenario is challenging. Ground truth is obtained retrospectively from residents through surveys, and sensor event triggers are then analyzed to provide activity labels and boundaries. Such annotations yield noisy ground truth, which makes sample-precise evaluation difficult. The methods developed here can be used “as-is” in any smart home, irrespective of varying layouts and residents’ activity patterns. Since the update procedure aims at improving the segmentation by extending the seed points from the bootstrapping procedure, it brings us closer to providing a functional system that determines activities in the home.

IV-A Methodology

The update and maintenance procedure targets the improvement of the segmentation of the prominent activities identified in the bootstrapping procedure. We provide experimental evaluations on the aforementioned CASAS datasets. A continuous evaluation protocol is used wherein the self-supervision module is continuously updated and used to improve the segmentation accuracy. We provide quantitative results as part of the evaluation procedure and compare improvements over the initial bootstrapping phase.

Evaluation Protocol: The self-supervision module is initially trained on the predictions from the bootstrapping procedure, obtained after the first $n+m$ weeks. The trained recognition model is then evaluated on all subsequent data. The same procedure is repeated after every $n$ weeks during the update phase: once the self-supervision model is updated at time step $(t-1)$, the recognition model is evaluated on all data observed in the home during the next $n$ weeks. Hence, the updates to the self-supervision model draw on $n+m+(t-1) \cdot n$ weeks of data.
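The week arithmetic of this protocol can be illustrated with a short Python sketch. The function name `update_schedule` is hypothetical, and the default values $n = 2$ and $m = 2$ are an assumption chosen so that the bootstrapping phase covers four weeks and updates occur every two weeks.

```python
from typing import List, Tuple

WeekRange = Tuple[int, int]  # 1-indexed, inclusive (first_week, last_week)


def update_schedule(total_weeks: int, n: int = 2, m: int = 2) -> List[Tuple[int, WeekRange, WeekRange]]:
    """Return (t, training weeks, evaluation weeks) for every update step t."""
    schedule = []
    t = 1
    while True:
        train_end = n + m + (t - 1) * n           # weeks available for training at step t
        test_start = train_end + 1
        test_end = min(train_end + n, total_weeks)
        if test_start > total_weeks:              # no unseen data left to evaluate on
            break
        schedule.append((t, (1, train_end), (test_start, test_end)))
        t += 1
    return schedule


plan = update_schedule(total_weeks=10)
```

For ten weeks of data, the first entry trains on Weeks 1 to 4 and evaluates on Weeks 5 to 6, and each later entry extends the training range by $n$ weeks.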

IV-B Datasets and Data Pre-Processing

Figure 5: Floor plans for Smart Homes used for our experimental evaluation: (a) CASAS-Aruba, (b) CASAS-Milan and (c) CASAS-Cairo (with permission from [15]). Annotations for locations are used with permission from [13].

We base our explorations on publicly available datasets that are widely used in the research community. These datasets were collected by the Center for Advanced Studies in Adaptive Systems (CASAS), with ground truth annotations provided by residents.

We evaluate the effects of our method on three popular datasets: CASAS-Aruba, CASAS-Milan and CASAS-Cairo, which were collected over 219, 92 and 56 days, respectively. CASAS-Aruba and CASAS-Milan are single-resident households, whereas CASAS-Cairo houses two residents. Both CASAS-Milan and CASAS-Cairo also house a pet. The CASAS-Cairo home has three storeys; the detailed layout of each is depicted in Fig. 5(c). Details of these datasets are given in Tab. I.

Ambient sensors such as Motion (denoted as ‘M###’), Door (denoted as ‘D###’) and Temperature (denoted as ‘T###’) sensors are used to collect data in these homes, as shown in Fig. 5. The Door and Motion sensors are binary sensors: Door sensors report OPEN or CLOSE, and Motion sensors report ON or OFF. The sensors are numbered randomly, with no specific ordering. Door sensors, represented as green lines, capture the opening and closing of the doors and drawers in the home (for example ‘D001’ near the entrance of the home in CASAS-Milan), whereas Motion sensors, represented by red dots, capture movement close to where they are located. The red dots in the smart home layouts represent the detection of motion in localized areas, whereas the radiating red dots capture movement over a wider area. Temperature sensors record changes in the home, but do not capture movement. Hence, as in previous work [48, 13], we do not make use of these sensors. We also do not make use of absolute timestamps, so that our approach generalizes across smart home layouts and the idiosyncrasies of the residents occupying them.
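The sensor selection described above can be sketched as a small filtering step. The tuple layout `(timestamp, sensor_id, value)` and the example readings are hypothetical and are not the CASAS file format; the sketch only illustrates keeping Motion and Door events in their original order while dropping Temperature events and absolute timestamps.

```python
from typing import Iterable, List, Tuple

RawEvent = Tuple[str, str, str]   # (timestamp, sensor_id, value), assumed layout
Event = Tuple[str, str]           # (sensor_id, value)


def preprocess(events: Iterable[RawEvent]) -> List[Event]:
    """Keep Motion ('M###') and Door ('D###') events and drop timestamps."""
    kept = []
    for _timestamp, sensor_id, value in events:
        if sensor_id.startswith(("M", "D")):   # Temperature ('T###') is skipped
            kept.append((sensor_id, value))
    return kept


# Hypothetical raw readings for illustration only.
raw = [
    ("day1 00:03:50", "M003", "ON"),
    ("day1 00:04:10", "T002", "21.5"),
    ("day1 00:05:02", "D001", "OPEN"),
    ("day1 00:05:30", "M003", "OFF"),
]
stream = preprocess(raw)
```

Because only the order of events is retained, the resulting stream carries relative sequence information without tying the model to clock times in a particular home.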

TABLE I: Details of the CASAS datasets used for our experimental evaluation
CASAS Dataset Aruba Milan Cairo
Days 219 92 56
Residents 1 1+pet 2+pet
Sensors 39 33 27
Activities 11 14 6

IV-C Deployment and Activity Recognition

Our update procedure is initiated after the first four weeks of the resident living in the home, which is when the initial HAR system was bootstrapped [13], and it is then subsequently updated every two weeks. We evaluate the effectiveness of our update and maintenance procedure by reporting the segmentation accuracy using ground truth labels provided for CASAS-Aruba, CASAS-Milan and CASAS-Cairo. For the initial bootstrapping procedure, the derived HAR system was evaluated on all data of the Hot Phase. For the update and extend procedure, we evaluate the updated HAR system continuously, i.e., after each block of two weeks, as shown in Fig. 4. The results on segmentation accuracy are tabulated in Tab. II, Tab. III and Tab. IV. Each column in these tables represents the scores obtained from an update to the motif memory at time step $t-1$ for all data from time step $t$ onward, for as long as data is collected in the home. When no such update is observed, the corresponding block is empty.

Evaluation: The initial self-supervision model is trained using the predictions from the motif models obtained in the bootstrapping procedure. Since the bootstrapping procedure only recognizes prominent activities (a subset of the activities occurring in the home), the self-supervision module is trained on only these activities. The goal of this work is to improve the segmentation accuracy for these prominent activities, which form essential components of daily routines. The improved segmentation will enable identification of the less frequently occurring activities by analyzing the gaps left by the HAR model and comparing them to the routines of the resident. We use the ‘Segmentation Accuracy’ (Seg. Accuracy) as the metric for evaluating our system, where $AU_{identified}$ is the number of action units identified in a given activity, $AU_{activity}$ is the total number of action units that make up a given activity, and $N$ is the total number of activity instances:

$$Seg.\ Accuracy = \dfrac{\sum_{n=1}^{N} (AU_{identified} \in AU_{activity})}{\sum_{n=1}^{N} AU_{activity}} \qquad (2)$$
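One possible reading of Eq. (2) is sketched below in Python: for each activity instance, count the identified action units that fall inside the annotated activity, and divide the total of these counts by the total number of annotated action units. Representing each instance as a pair of index collections is an assumption made for illustration, as are the example values.

```python
from typing import Iterable, Tuple


def segmentation_accuracy(instances: Iterable[Tuple[Iterable[int], Iterable[int]]]) -> float:
    """instances: (identified action-unit indices, annotated action-unit indices) pairs."""
    hits = 0
    total = 0
    for identified, activity in instances:
        activity_units = set(activity)
        hits += len(activity_units & set(identified))   # AU_identified within AU_activity
        total += len(activity_units)                    # AU_activity
    return hits / total if total else 0.0


# Two hypothetical activity instances of five action units each.
example = [
    ({2, 3, 4, 9}, range(0, 5)),   # index 9 lies outside the activity and is ignored
    ({10}, range(10, 15)),
]
score = segmentation_accuracy(example)
```

Here four of the ten annotated action units are identified, so the score is 0.4. Summing over instances before dividing means that long activity instances weigh more than short ones.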

Evaluating updates: The evaluation procedure we use in this work applies the most recent model obtained so far to all test sets observed after the given update. For example, the activity model from train 1 is evaluated on all four test sets observed in the future, whereas the activity model obtained after the train 3 update is evaluated on the test data observed after that update, i.e., test 3 and test 4, depicted as ‘M3’ and ‘M4’, respectively. We provide an example of the procedure for the CASAS-Aruba dataset, which spans 32 weeks of data collection. During iteration 1, the HAR model trained over Weeks 1-4 (M1) is evaluated on Weeks 5-6. For iteration 2, the HAR model is trained over Weeks 1-6. Following the evaluation protocol, both models (M1 and M2), developed during Weeks 1-4 and Weeks 1-6, are evaluated on Weeks 7-8. The procedure is repeated for as long as data is observed in the smart home.

Results for CASAS-Cairo, CASAS-Aruba and CASAS-Milan are reported in Tab. II, Tab. III and Tab. IV, respectively. The prominent activities identified for the CASAS-Aruba dataset are Sleep, Work, Meal_Preparation and Relax. For CASAS-Milan, these are Sleep, Kitchen_Activity (KA), Guest_Bathroom (GB), Read, Master_Bedroom_Activity (MBA), Master_Bathroom (MBath), Watch_TV, Desk_Activity (DA) and Dining_Rm_Activity (DRA). We observe that with each update the segmentation accuracy for the prominent activities in these three smart homes increases (as depicted in bold text). Although the analysis for CASAS-Milan and CASAS-Cairo covers the entire duration of data collected in these homes, we limit the analysis for the CASAS-Aruba dataset to three iterations to showcase the validity of our approach.

CASAS-Aruba poses a simpler analysis problem, as indicated by the relatively high segmentation scores for the prominent activities. CASAS-Milan is a more complex analysis problem, where the movement of the pet recorded in the home interferes with the movement of the resident, leading to noisy annotations.

TABLE II: Experimental Evaluation Cairo: GT: Ground Truth; M1: Model 1; M2: Model 2. Seg. Accuracy is Segmentation Accuracy for prominent activities.
| Activity | Test 1 Seg. Accuracy | Test 2 Seg. Accuracy |
| Eat | GT: 34.96 ±14.63; M1: 5.86 ±4.72 | GT: 37.83 ±17.26; M1: 9.72 ±7.67; M2: 13.75 ±9.14 |
| Sleep | GT: 8.36 ±6.84; M1: 0.0 ±0.0 | GT: 6.58 ±4.44; M1: 0.0 ±0.0; M2: 0.291 ±0.99 |
| Work | GT: 4.94 ±3.50; M1: 0.0 ±0.0 | GT: 6.75 ±7.39; M1: 0.0 ±0.0; M2: 0.571 ±1.89 |
TABLE III: Experimental Evaluation Aruba: GT: Ground Truth; M1: Model 1; M2: Model 2; M3: Model 3. Seg. Accuracy is Segmentation Accuracy for prominent activities.
| Activity | Test 1 Seg. Accuracy | Test 2 Seg. Accuracy | Test 3 Seg. Accuracy |
| Meal_Prep | GT: 9.37 ±8.09; M1: 5.11 ±6.65 | GT: 13.12 ±16.89; M1: 5.61 ±10.01; M2: 7.37 ±11.99 | GT: 11.04 ±9.68; M1: 5.18 ±7.10; M2: 4.66 ±7.11; M3: 6.89 ±6.07 |
| Relax | GT: 3.86 ±3.56; M1: 2.86 ±3.55 | GT: 4.22 ±3.68; M1: 2.92 ±3.66; M2: 3.12 ±3.64 | GT: 7.12 ±8.07; M1: 4.24 ±6.60; M2: 3.96 ±6.08; M3: 4.45 ±7.03 |
| Sleep | GT: 6.94 ±6.92; M1: 5.10 ±3.75 | GT: 3.72 ±2.64; M1: 2.59 ±2.87; M2: 2.90 ±2.95 | GT: 7.05 ±7.44; M1: 4.11 ±3.35; M2: 5.58 ±5.31; M3: 5.58 ±5.31 |
| Work | GT: 3.0 ±3.31; M1: 2.0 ±3.65 | GT: 3.0 ±1.60; M1: 2.25 ±1.38; M2: 2.25 ±1.38 | GT: 6.5 ±7.77; M1: 6.5 ±7.77; M2: 6.5 ±7.77; M3: 6.5 ±7.77 |
TABLE IV: Experimental Evaluation Milan: GT: Ground Truth; M1: Model 1; M2: Model 2; M3: Model 3. Seg. Accuracy is Segmentation Accuracy for prominent activities.
| Activity | Test 1 Seg. Accuracy | Test 2 Seg. Accuracy | Test 3 Seg. Accuracy |
| Sleep | GT: 7.05 ±4.50; M1: 0.05 ±0.24 | GT: 8.03 ±7.22; M1: 0.0 ±0.0; M2: 1.37 ±1.54 | GT: 10.61 ±8.06; M1: 0.0 ±0.0; M2: 0.46 ±1.19; M3: 1.84 ±2.64 |
| KA | GT: 8.99 ±10.34; M1: 7.67 ±9.93 | GT: 8.44 ±10.05; M1: 0.11 ±0.52; M2: 1.14 ±3.71 | GT: 8.90 ±8.93; M1: 0.23 ±0.83; M2: 1.36 ±2.90; M3: 2.03 ±3.57 |
| GB | GT: 2.0 ±1.44; M1: 0.0 ±0.0 | GT: 1.67 ±1.18; M1: 0.0 ±0.0; M2: 0.10 ±0.41 | GT: 2.43 ±1.97; M1: 0.0 ±0.0; M2: 0.0 ±0.0; M3: 0.16 ±0.55 |
| Read | GT: 7.97 ±6.13; M1: 6.13 ±4.01 | GT: 6.56 ±5.67; M1: 0.07 ±0.50; M2: 3.41 ±3.71 | GT: 5.94 ±5.09; M1: 0.08 ±0.36; M2: 0.083 ±0.36; M3: 2.57 ±3.18 |
| MBA | GT: 8.16 ±7.19; M1: 0.32 ±0.80 | GT: 6.15 ±5.37; M1: 0.0 ±0.0; M2: 1.01 ±1.70 | GT: 90 ±9.87; M1: 0.0 ±0.0; M2: 0.0 ±0.0; M3: 2.36 ±4.36 |
| MBath | GT: 2.78 ±2.31; M1: 0.14 ±0.52 | GT: 2.19 ±1.52; M1: 0.0 ±0.0; M2: 0.0 ±0.0 | GT: 3.07 ±2.97; M1: 0.0 ±0.0; M2: 0.0 ±0.0; M3: 0.20 ±0.59 |
| Watch_TV | GT: 9.02 ±6.822; M1: 0.07 ±0.48 | GT: 7.51 ±5.67; M1: 0.0 ±0.0; M2: 2.81 ±3.54 | GT: 6.38 ±6.37; M1: 0.0 ±0.0; M2: 2.07 ±2.53; M3: 2.0 ±2.70 |
| DA | GT: 3.25 ±1.5; M1: 0.0 ±0.0 | GT: 5.9 ±5.52; M1: 0.0 ±0.0; M2: 0.0 ±0.0 | GT: 8.82 ±12.64; M1: 0.0 ±0.0; M2: 0.0 ±0.0; M3: 3.29 ±3.86 |
| DRA | GT: 0.0 ±0.0; M1: 0.0 ±0.0 | GT: 7.41 ±5.33; M1: 0.0 ±0.0; M2: 0.0 ±0.0 | GT: 5.28 ±3.77; M1: 0.0 ±0.0; M2: 0.0 ±0.0; M3: 0.57 ±0.97 |

V Discussion

The premise of this work is that each smart home is different and, crucially, that each resident of such smart homes is different with regard to the activities they engage in. Based on these observations (which are backed up, e.g., by the substantial variability in existing smart home datasets), we design the activity recognition system for the home in a fully data-driven approach. An initial bootstrapped HAR system was covered in previous work. In this work, we update and extend this recognition procedure to improve the segmentation accuracy for the identified prominent activities.

We build on the initial bootstrapped procedure, where the HAR system was initialized from scratch and with minimal supervision. Model predictions from this system are used as seed points for developing the update and extend procedure. The predictions serve as data points to train the self-supervision module, which is then used to provide predictions at the level of action units. These predictions, in addition to the segmentation provided by the motif models, help improve the segmentation accuracy for the majority of the activities (those that are essential parts of the resident’s routines) in the home. Through an extensive experimental evaluation on three CASAS datasets we demonstrated the effectiveness of the proposed method. In what follows we discuss additional aspects relevant to our work and outline some next steps.

V-A Boosting of Activity Models

Boosting is a method, in the machine-learning sphere, that aims at improving the accuracy of a given learning algorithm [49, 50, 51, 52]. It trains models sequentially by improving learners to provide for a single strong learning model. To do so, it assigns higher weights to misclassified data points, such that the subsequent learner labels these data points accurately.

Our maintenance and update procedure uses similar concepts. For the updates of the recognition procedure, predictions from the bootstrapping procedure are used to train the self-supervision module. This model then improves segmentation accuracy by providing predictions in the “non-detection” regions. Thus, the overall recognition procedure, at any given time, becomes a “strong” learner by using the model predictions from previous iterations. In contrast to the stopping criterion used in boosting, which halts training once the learner's training error falls below a given threshold, our update procedure continues to identify activity patterns for as long as they are observed in the home.

V-B Refinement of Seed Points

Techniques developed for both the bootstrapping and update and extension procedure inform general procedures to develop a functional HAR model in the smart home using minimal supervision from residents. This recognition procedure gets us closer to a fully functional system capable of identifying the prominent activities as part of the resident’s daily living. Our work thereby serves as proof of concept, which can be further extended and optimized. We provide predictions corresponding to the “non-detection” regions, which correspond to portions of data where the initial bootstrapping procedure does not produce any predictions. This is achieved through i) training a self-supervision based module that uses data and the corresponding predicted label from the initial procedure and, ii) using updated motif models. Although these serve as good starting points they can be further refined to tune them for specific home settings or activity patterns.

In both discriminative and generative modeling, posterior probabilities [53, 54] are used to estimate classification confidence for data points. This allows the identification of “not-so-confident” data points that can be modeled better. Estimating such probabilities is not straightforward when using template-matching-based methods, as in our case of motif models. Hence, the use of additional knowledge may help in identifying confident activity models [55, 56, 57, 58]. Identification of relevant sensor events, priors on when activities are usually performed in the home, and sequences of activities that make up routines can serve as additional information to refine the seed points and, with them, the recognition procedure, eventually leading to a refined, fully functional recognition system. Such information also aids in asking the resident appropriate questions about nuanced activities when they occur in the home.

V-C Knowledge-based Active Learning

The methods underlying the HAR system rely on small amounts of activity labels, requested from the resident. For the initial bootstrapping procedure the focus was on obtaining labels for the most prominent activities (through their motifs), whereas in the update and extension procedure we aim to extend the segmentation boundaries corresponding to these identified activities. This developed HAR system proves useful to model the major activities that are part of routines in the home.

Beyond that, for example for very short duration activities, additional methods will be required. This additional information could be based on heuristics such as using absolute timestamps (or coarse categories like morning or evening sub-routines) and the durations between consecutive analysis segments. An incremental knowledge-gathering procedure would ensure that we start from a conservative procedure and move towards a more refined system; without it, we may develop greedy or opportunistic modeling procedures. To incrementally refine and acquire more knowledge about the smart home and its resident, an ontology-based active learning procedure can be employed. For example, in Milan, the two activities ‘Master_Bedroom_Activity’ and ‘Sleep’ both occur in the bedroom and comprise similar movement patterns. Similarly, the activities ‘Respirate’ / ‘Meditate’ are predicted as ‘Work’ activities in the bootstrapping procedure and are hence not available for analysis in the update and extension procedure. A knowledge-based query procedure may help in distinguishing between these closely related yet different activities.

V-D Next Steps for HAR in Smart Home: Routine Assessment

The activity recognition system developed so far aims at developing a functional HAR system that recognizes regular activities in the home. In the next step, we aim to analyze activity routines. Routine assessment refers to the problem of identifying regular occurrences of activity sequences in the home [59]. Identifying such daily activity routines aids in improving the underlying recognition model through capture of infrequent and short duration activities – those that are not picked up by current recognition procedures.

The system can aim to provide a trigger to residents or caregivers on identifying anomalies in daily routines, thus proving beneficial in assisted living [60, 61, 62]. An example of such an anomalous activity would be fall detection [63, 64, 65, 66, 67].

VI Conclusion

With reduced sensor costs and advancements in IoT technologies, there is an increased interest in instrumenting “regular” homes with sensors that can be used for activity monitoring, turning them into “smart homes”, which are of benefit, for example, for health care applications. However, developing such an activity recognition system is challenging because it needs to be tailored towards the particular smart home and especially towards the resident it is serving. The overarching goal of our work is to develop methods for automatically deriving tailored HAR systems thereby minimizing user involvement and focusing on the rapid availability of functional HAR systems that are then continuously updated and extended, mainly to capture activity sequences as shown in this work.

Based on previous work that covers the initial bootstrapping of such a tailored HAR system, in this work, we focused on update and extension of the initial system. With the proof-of-concept presented, and extensively evaluated, in this paper, the next steps shall focus on capturing less frequently occurring, yet important activities. Our framework serves as the basis for such an extension and, as such, has the potential to become the overarching, yet extendable, modeling framework for HAR in smart homes.

Acknowledgment

This work was partially supported by KDDI Research. We thank the whole CASAS team and in particular Dr. Diane Cook for aiding us in the process of understanding the CASAS datasets and for providing us with the detailed smart home layouts. We are also grateful for the comments and suggestions provided by the anonymous reviewers who helped us in improving the clarity of the presentation.

References

  • [1] M. Asadzadeh, A. Maher, M. Jafari, K. A. Mohammadzadeh, and S. M. Hosseini, “A review study of the providing elderly care services in different countries,” Journal of Family Medicine and Primary Care, vol. 11, no. 2, p. 458, 2022.
  • [2] M. A. R. Ahad, S. Inoue, D. Roggen, and K. Fujinami, Activity and Behavior Computing.   Springer, 2021.
  • [3] D. Bouchabou, S. M. Nguyen, C. Lohr, B. LeDuc, and I. Kanellos, “A survey of human activity recognition in smart homes based on iot sensors algorithms: Taxonomies, challenges, and opportunities with deep learning,” Sensors, vol. 21, no. 18, p. 6037, 2021.
  • [4] D. Bouchabou, C. Lohr, I. Kanellos, and S. M. Nguyen, “Har in smart homes,” arXiv preprint arXiv:2112.11232, 2021.
  • [5] S. K. Yadav, K. Tiwari, H. M. Pandey, and S. A. Akbar, “A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions,” Knowledge-Based Systems, vol. 223, p. 106970, 2021.
  • [6] A. Benmansour, A. Bouchachia, and M. Feham, “Human activity recognition in pervasive single resident smart homes: State of art,” in 2015 12th International Symposium on Programming and Systems (ISPS).   IEEE, 2015, pp. 1–9.
  • [7] D. Liciotti, M. Bernardini, L. Romeo, and E. Frontoni, “A sequential deep learning application for recognising human activities in smart homes,” Neurocomputing, vol. 396, pp. 501–513, 2020.
  • [8] D. Bouchabou, S. M. Nguyen, C. Lohr, B. Leduc, and I. Kanellos, “Fully convolutional network bootstrapped by word encoding and embedding for activity recognition in smart homes,” in Deep Learning for Human Activity Recognition: Second International Workshop, DL-HAR 2020, Held in Conjunction with IJCAI-PRICAI 2020, Kyoto, Japan, January 8, 2021, Proceedings 2.   Springer, 2021, pp. 111–125.
  • [9] D. Bouchabou, S. M. Nguyen, C. Lohr, B. LeDuc, and I. Kanellos, “Using language model to bootstrap human activity recognition ambient sensors based in smart homes,” Electronics, vol. 10, no. 20, p. 2498, 2021.
  • [10] A. Ghods and D. J. Cook, “Activity2vec: Learning adl embeddings from sensor data with a sequence-to-sequence model,” arXiv preprint arXiv:1907.05597, 2019.
  • [11] S. Aminikhanghahi and D. J. Cook, “Enhancing activity recognition using cpd-based activity segmentation,” Pervasive and Mobile Computing, vol. 53, pp. 75–89, 2019.
  • [12] S. Knox, L. Coyle, and S. Dobson, “Using ontologies in case-based activity recognition.” in FLAIRS Conference.   Citeseer, 2010, pp. 1–6.
  • [13] S. K. Hiremath, Y. Nishimura, S. Chernova, and T. Plötz, “Bootstrapping human activity recognition systems for smart homes from scratch,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 6, no. 3, pp. 1–27, 2022.
  • [14] H. Chen, C. Gouin-Vallerand, K. Bouchard, S. Gaboury, M. Couture, N. Bier, and S. Giroux, “Leveraging self-supervised learning for human activity recognition with ambient sensors,” in Proceedings of the 2023 ACM Conference on Information Technology for Social Good, 2023, pp. 324–332.
  • [15] D. J. Cook, A. S. Crandall, B. L. Thomas, and N. C. Krishnan, “Casas: A smart home in a box,” Computer, vol. 46, no. 7, pp. 62–69, 2012.
  • [16] A. Bulling, U. Blanke, and B. Schiele, “A tutorial on human activity recognition using body-worn inertial sensors,” ACM Computing Surveys (CSUR), vol. 46, no. 3, pp. 1–33, 2014.
  • [17] T. Van Kasteren, G. Englebienne, and B. J. Kröse, “Activity recognition using semi-markov models on real world smart home datasets,” Journal of ambient intelligence and smart environments, vol. 2, no. 3, pp. 311–325, 2010.
  • [18] K. Wongpatikaseree, M. Ikeda, M. Buranarach, T. Supnithi, A. O. Lim, and Y. Tan, “Activity recognition using context-aware infrastructure ontology in smart home domain,” in 2012 Seventh International Conference on Knowledge, Information and Creativity Support Systems.   IEEE, 2012, pp. 50–57.
  • [19] J. Rafferty, C. D. Nugent, J. Liu, and L. Chen, “From activity recognition to intention recognition for assisted living within smart homes,” IEEE Transactions on Human-Machine Systems, vol. 47, no. 3, pp. 368–379, 2017.
  • [20] Z. Hussain, M. Sheng, and W. E. Zhang, “Different approaches for human activity recognition: A survey,” arXiv preprint arXiv:1906.05074, 2019.
  • [21] L. Chen, J. Hoey, C. D. Nugent, D. J. Cook, and Z. Yu, “Sensor-based activity recognition,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 6, pp. 790–808, 2012.
  • [22] M. Sedky, C. Howard, T. Alshammari, and N. Alshammari, “Evaluating machine learning techniques for activity classification in smart home environments,” International Journal of Information Systems and Computer Sciences, vol. 12, no. 2, pp. 48–54, 2018.
  • [23] D. J. Cook, “Learning setting-generalized activity models for smart spaces,” IEEE Intelligent Systems, vol. 2010, no. 99, p. 1, 2010.
  • [24] S. Aminikhanghahi, T. Wang, and D. J. Cook, “Real-time change point detection with application to smart home time series data,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 5, pp. 1010–1023, 2018.
  • [25] G. Sprint, D. J. Cook, and R. Fritz, “Behavioral differences between subject groups identified using smart homes and change point detection,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 2, pp. 559–567, 2020.
  • [26] A. C. Jose and R. Malekian, “Improving smart home security: Integrating logical sensing into smart home,” IEEE Sensors Journal, vol. 17, no. 13, pp. 4269–4286, 2017.
  • [27] S. Aminikhanghahi and D. J. Cook, “Using change point detection to automate daily activity segmentation,” in 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 2017, pp. 262–267.
  • [28] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, “Graph neural networks: A review of methods and applications,” AI Open, vol. 1, pp. 57–81, 2020.
  • [29] L. Li, Z. Gan, Y. Cheng, and J. Liu, “Relation-aware graph attention network for visual question answering,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 10 313–10 322.
  • [30] H. Chen, C. Gouin-Vallerand, K. Bouchard, S. Gaboury, M. Couture, N. Bier, and S. Giroux, “Enhancing human activity recognition in smart homes with self-supervised learning and self-attention,” Sensors, vol. 24, no. 3, p. 884, 2024.
  • [31] M. Gochoo, T.-H. Tan, S.-H. Liu, F.-R. Jean, F. S. Alnajjar, and S.-C. Huang, “Unobtrusive activity recognition of elderly people living alone using anonymous binary sensors and DCNN,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 2, pp. 693–702, 2018.
  • [32] G. Mohmed, A. Lotfi, and A. Pourabdollah, “Employing a deep convolutional neural network for human activity recognition based on binary ambient sensor data,” in Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments, 2020, pp. 1–7.
  • [33] D. Singh, E. Merdivan, S. Hanke, J. Kropf, M. Geist, and A. Holzinger, “Convolutional and recurrent neural networks for activity recognition in smart environment,” in Towards Integrative Machine Learning and Knowledge Extraction: BIRS Workshop, Banff, AB, Canada, July 24-26, 2015, Revised Selected Papers.   Springer, 2017, pp. 194–205.
  • [34] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning. PMLR, 2020, pp. 1597–1607.
  • [35] X. Yang, Z. Zhang, and R. Cui, “TimeCLR: A self-supervised contrastive learning framework for univariate time series representation,” Knowledge-Based Systems, vol. 245, p. 108606, 2022.
  • [36] K. Shah, D. Spathis, C. I. Tang, and C. Mascolo, “Evaluating contrastive learning on wearable timeseries for downstream clinical outcomes,” arXiv preprint arXiv:2111.07089, 2021.
  • [37] M. N. Mohsenvand, M. R. Izadi, and P. Maes, “Contrastive representation learning for electroencephalogram classification,” in Machine Learning for Health.   PMLR, 2020, pp. 238–253.
  • [38] H. Haresamudram, I. Essa, and T. Plötz, “Assessing the state of self-supervised human activity recognition using wearables,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 6, no. 3, Sep. 2022. [Online]. Available: https://doi.org/10.1145/3550299
  • [39] S. K. Hiremath and T. Plötz, “Deriving effective human activity recognition systems through objective task complexity assessment,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 4, no. 4, pp. 1–24, 2020.
  • [40] B. Settles, “Active learning literature survey,” 2009.
  • [41] H. S. Hossain, M. A. A. H. Khan, and N. Roy, “Active learning enabled activity recognition,” Pervasive and Mobile Computing, vol. 38, pp. 312–330, 2017.
  • [42] R. Adaimi and E. Thomaz, “Leveraging active learning and conditional mutual information to minimize data annotation in human activity recognition,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 3, no. 3, pp. 1–23, 2019.
  • [43] S. Jones, L. Shao, and K. Du, “Active learning for human action retrieval using query pool selection,” Neurocomputing, vol. 124, pp. 89–96, 2014.
  • [44] T. Miu, T. Plötz, P. Missier, and D. Roggen, “On strategies for budget-based online annotation in human activity recognition,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, 2014, pp. 767–776.
  • [45] M. Ciliberto, L. P. Cuspinera, and D. Roggen, “WLCSSLearn: Learning algorithm for template matching-based gesture recognition systems,” in 2019 Joint 8th International Conference on Informatics, Electronics & Vision (ICIEV) and 2019 3rd International Conference on Imaging, Vision & Pattern Recognition (icIVPR). IEEE, 2019, pp. 91–96.
  • [46] C. I. Tang, I. Perez-Pozuelo, D. Spathis, and C. Mascolo, “Exploring contrastive learning in human activity recognition for healthcare,” arXiv preprint arXiv:2011.11542, 2020.
  • [47] Z. Jiang, T. Chen, T. Chen, and Z. Wang, “Robust pre-training by adversarial contrastive learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 16 199–16 210, 2020.
  • [48] P. Gupta, R. McClatchey, and P. Caleb-Solly, “Tracking changes in user activity from unlabelled smart home sensor data using unsupervised learning methods,” Neural Computing and Applications, vol. 32, pp. 12 351–12 362, 2020.
  • [49] R. E. Schapire, “A brief introduction to boosting,” in IJCAI, vol. 99. Citeseer, 1999, pp. 1401–1406.
  • [50] R. E. Schapire, “The boosting approach to machine learning: An overview,” Nonlinear Estimation and Classification, pp. 149–171, 2003.
  • [51] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 771–780, 1999.
  • [52] R. E. Schapire and Y. Freund, “Boosting: Foundations and algorithms,” Kybernetes, vol. 42, no. 1, pp. 164–166, 2013.
  • [53] T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems: First International Workshop, MCS 2000 Cagliari, Italy, June 21–23, 2000 Proceedings 1.   Springer, 2000, pp. 1–15.
  • [54] A. Niculescu-Mizil and R. Caruana, “Predicting good probabilities with supervised learning,” in Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 625–632.
  • [55] B. Smith, “Ontology,” in The Furniture of the World. Brill, 2012, pp. 47–68.
  • [56] J. Kim, J. Kim, D. Lee, and K.-Y. Chung, “Ontology driven interactive healthcare with wearable sensors,” Multimedia Tools and Applications, vol. 71, pp. 827–841, 2014.
  • [57] C. Villalonga, H. Pomares, I. Rojas, and O. Banos, “MIMU-Wear: Ontology-based sensor selection for real-world wearable activity recognition,” Neurocomputing, vol. 250, pp. 76–100, 2017.
  • [58] D.-O. Kang, H.-J. Lee, E.-J. Ko, K. Kang, and J. Lee, “A wearable context aware system for ubiquitous healthcare,” in 2006 International Conference of the IEEE Engineering in Medicine and Biology Society.   IEEE, 2006, pp. 5192–5195.
  • [59] N. Banovic, T. Buzali, F. Chevalier, J. Mankoff, and A. K. Dey, “Modeling and understanding human routine behavior,” in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 248–260.
  • [60] C. Zhu, W. Sheng, and M. Liu, “Wearable sensor-based behavioral anomaly detection in smart assisted living systems,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 4, pp. 1225–1234, 2015.
  • [61] K. Mandarić, P. Skočir, M. Vuković, and G. Ježić, “Anomaly detection based on fixed and wearable sensors in assisted living environments,” in 2019 International Conference on Software, Telecommunications and Computer Networks (SoftCOM).   IEEE, 2019, pp. 1–6.
  • [62] F. J. Parada Otte, B. Rosales Saurer, and W. Stork, “Unsupervised learning in ambient assisted living for pattern and anomaly detection: a survey,” in Evolving Ambient Intelligence: AmI 2013 Workshops, Dublin, Ireland, December 3-5, 2013. Revised Selected Papers 4.   Springer, 2013, pp. 44–53.
  • [63] O. Aran, D. Sanchez-Cortes, M.-T. Do, and D. Gatica-Perez, “Anomaly detection in elderly daily behavior in ambient sensing environments,” in Human Behavior Understanding: 7th International Workshop, HBU 2016, Amsterdam, The Netherlands, October 16, 2016, Proceedings 7.   Springer, 2016, pp. 51–67.
  • [64] U. Bakar, H. Ghayvat, S. Hasanm, and S. C. Mukhopadhyay, “Activity and anomaly detection in smart home: A survey,” Next Generation Sensors and Systems, pp. 191–220, 2016.
  • [65] E. Hoque, R. F. Dickerson, S. M. Preum, M. Hanson, A. Barth, and J. A. Stankovic, “Holmes: A comprehensive anomaly detection system for daily in-home activities,” in 2015 International Conference on Distributed Computing in Sensor Systems.   IEEE, 2015, pp. 40–51.
  • [66] S. W. Yahaya, A. Lotfi, and M. Mahmud, “A consensus novelty detection ensemble approach for anomaly detection in activities of daily living,” Applied Soft Computing, vol. 83, p. 105613, 2019.
  • [67] T. Yoshida, K. Kano, K. Higashiura, K. Yamaguchi, K. Takigami, K. Urano, S. Aoki, T. Yonezawa, and N. Kawaguchi, “A data-driven approach for online pre-impact fall detection with wearable devices,” in Sensor-and Video-Based Activity and Behavior Computing: Proceedings of 3rd International Conference on Activity and Behavior Computing (ABC 2021).   Springer, 2022, pp. 133–147.