Received March 13, 2021; accepted March 30, 2021; date of publication April 15, 2021; date of current version April 30, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3073469

Overview of Lifelogging: Current Challenges and Advances

AMEL KSIBI1, ALA SALEH D. ALLUHAIDAN1, (Member, IEEE), AMINA SALHI1, AND SAHAR A. EL-RAHMAN2,3, (Senior Member, IEEE)
1 Information Systems Department, College of Computer and Information Science, Princess Nourah Bint Abdulrahman University, Riyadh 11564, Saudi Arabia
2 Computer Sciences Department, College of Computer and Information Science, Princess Nourah Bint Abdulrahman University, Riyadh 11564, Saudi Arabia
3 Electrical Engineering Department, Faculty of Engineering-Shoubra, Benha University, Cairo 13511, Egypt

Corresponding author: Ala Saleh D. Alluhaidan (asalluhaidan@pnu.edu.sa)

This work was supported by the Deanship of Scientific Research at Princess Nourah Bint Abdulrahman University through the Fast-Track Research Funding Program to support publication in the top journal under Grant 42-FTTJ-xx. The associate editor coordinating the review of this manuscript and approving it for publication was Senthil Kumar.

ABSTRACT Lifelogging is the process of digitally tracking a person's daily experiences for a variety of purposes. In recent years, lifelogging has become an increasingly popular area of research, owing not only to growing demand from applications such as wellbeing, entertainment, healthcare systems, and intelligent environments, but also to advances in device technologies that promise to record and store large volumes of personal data using inexpensive tools. However, extracting insights about egocentric experience from this deluge of unlabeled and unstructured data continues to pose major challenges. A large number of studies have been conducted in recent years to address these challenges, but there is still a lack of work providing a comprehensive survey of the available literature, and most existing lifelogging surveys focus on only one aspect. This review highlights the state of the art in lifelogging from different angles, including its research history, current applications, activity recognition techniques, moment retrieval, storytelling, privacy and security issues, as well as challenges and future research trends.

INDEX TERMS Activity recognition, challenges, lifelogging, moment retrieval, security, storytelling, trends, privacy.

I. INTRODUCTION
The topic of lifelogging has been around for a long time, having started as records of events in diaries published on social media such as Facebook, Twitter, and Instagram. With the rapid increase in wearable technologies over recent years, and advances in storage, cloud services, sensing technology, and location awareness, recording events has become easier, faster, and more efficient. Lifelogging is generally defined as recording one's personal life, including daily experiences, via wearable sensors such as accelerometers, belts, FitBit devices, cameras, and others [1]. For example, an increasing number of people track their physical activity using wearable sensors to analyze their performance in a variety of tasks. Lifelogging has become an essential service for our wellbeing, as a result of the convergence of technologies that enhance our lifestyle. For example, people place wearable cameras in personal spaces (such as the home, car, or garden)
to record unforeseen events such as accidents or assaults, or to monitor safety-critical situations for healthcare assistance. Recent developments in pervasive computing research have repositioned SenseCam as an aid for human memory and monitoring. Small devices such as wearable cameras automatically and passively record daily activities. Data recorded at specific moments using video cameras provide a deeper overview of daily activities. Moreover, data acquired over long periods of time have the potential to yield insights into behavior patterns. Indeed, lifelogging analysis could be useful to protect against diseases linked to unhealthy lifestyles, such as obesity and depression. Additionally, this type of analysis could also help inhibit cognitive and functional decline in elderly people.

FIGURE 1. Aspects of lifelogging.

Lifelogging stories are captured automatically using egocentric photo streams or video cameras. Some hybrid approaches use both photographic and video cameras [2], which yields massive unlabeled collections that require specific tools to understand the semantics of these images and videos. Also, the free motion of the cameras and changes in lighting conditions, together with the variability of image content, make analysis more challenging. As such, object and activity recognition algorithms should be able to process huge numbers of images and videos containing a wide variety of objects. Additionally, the computational resources needed to handle this task should be reliable and consistent [2]. Another important dimension of lifelogging is ethics and security when managing data, specifically for health-related applications using wearable cameras [3], [4].

This review aims to provide inclusive coverage of lifelogging, starting by presenting the diverse domains of lifelogging applications, then detailing the importance of activity recognition, following with the challenges of moment retrieval and storytelling, and concluding with highlights of lifelogging security and privacy issues. Thus far, most lifelogging research has focused mainly on one aspect [5], [6], whereas we provide different views of the topic, which is the contribution of this work (Figure 1). Thus, this paper serves as a comprehensive reference to help researchers understand the different aspects of lifelogging and to provide them with possible future directions.

II. LIFELOGGING APPLICATIONS, CHALLENGES AND TRENDS
Lifelogging has been utilized and incorporated in many fields, including healthcare, wellbeing, and quality of life. For instance, lifelogging has been used to monitor behavior change for weight management in obese patients. The study in [1] presents a mobile app for obese adults to monitor behavior, focusing on technical effectiveness, user efficiency, and user satisfaction. "Participants were asked to complete eight tasks for evaluating the technical effectiveness of the app" [1], and timing was used to evaluate user efficiency. For user satisfaction, participants were asked to complete the System Usability Scale (SUS); a sample of 50 adults (14 men and 36 women, aged 20-59 years) was targeted.
The app collects behavioral information through a questionnaire that includes questions about diet, sleep time, and stress. Results show that the app achieves satisfactory technical effectiveness, user efficiency, and user satisfaction, although a clinical efficacy evaluation is still needed. A theoretical framework explaining the value of SenseCam for memory retrieval is presented in [2]. SenseCam, a wearable camera, is used to capture images that reinstate thoughts, feelings, and sensory information. For memory impairment, Dubourg et al. [2] propose that environmental support can be provided via SenseCam to further improve episodic information retrieval. Japan is introducing a new approach for monitoring patients using lifelogs in order to improve health care, especially for the elderly. A lifelog records a person's activity and can be used to predict lifestyle-related diseases, and such predictions could support elderly healthcare. Additionally, building a self-recording platform integrated with the medical platform is a convenient way to keep all data in one place, and such a system can be used to send personalized health advice [7].

A framework specialized for dementia care, based on lifelogging monitoring with activity recognition from egocentric vision and semantic context enrichment, is presented in [8]. Within this framework, multimodal egocentric data are collected from a wearable bracelet and an accelerometer to give a more accurate description of the patient's health state. Specifically, mechanical variables that capture fine motion, such as jerk, enhance the recognition accuracy of activities. Furthermore, to build interoperable activity graphs using Semantic Web technologies, Meditskos et al. [8] present a framework for semantic activity representation and interpretation. Results show that the proposed system was successful and was efficiently personalized with specific activity models. By using lifelogging applications, not only were caregivers able to support interventions, but end-users also felt safer and more confident [8].

In an attempt to evaluate a wearable lifelogging camera in a sample of older adults diagnosed with mild cognitive impairment (MCI), data such as a self-report questionnaire, images taken by the users, and a series of focus group discussions were gathered. Results show good acceptance and usage of the camera, along with an adequate number of images taken daily. The factors measured were perceived severity and ease of use; privacy concerns tended to be overlooked by participants, who focused on the potential benefits for memory [9].

Lifelogging has also been used to raise awareness regarding quality of life, with wearable trackers, smartphone sensors, and manual entry used to collect indicators. A general infrastructure for collecting and processing lifelogs, and how quality-of-life indicators are calculated within the GUI of Life Meter, are described in [6]. Findings indicate good usability of such an application and show how it supports users in raising their awareness when monitoring quality of life [5]. Similar to SenseCam, lifelogging has been applied to the monitoring of dietary intake; here, entry naturally depends on manual logging. As an example, the DietSense project uses a mobile phone with a camera to automatically collect pictures throughout the wearer's day. The images serve as a log of the wearer's mealtimes and are further used to analyze dietary intake, providing feedback and improving diet choices [6].
Also, Aizawa et al. [12] used food logs to assess food balance for personal dietary monitoring, and Aizawa [10] suggests extending the multimedia food log from self-monitoring to social contributions. Some lifelogging applications require data from multiple sources to give more insightful and meaningful results. Fusing data from multiple sensors is therefore needed, and it poses a challenging task as it requires data cleaning, alignment, and temporal normalization [6]. Additionally, manual entry of some data in different applications can increase the complexity of data processing as well as of the application's interface. Further challenges are discussed later in Section IV (Lifelog Moment Retrieval) and Section V (Storytelling). Lifelogging applications have been extended to smoking cessation [11], since it is difficult to automatically sense smoking without a wearable sensor or manual entry. Similarly, sleep monitoring is used to record the duration and quality of sleep. In the previous examples, lifelogging was used for health promotion and to gain extra awareness of life activities. Other applications of lifelogging include self-reporting errors in travel behavior, physical activities, sedentary behaviors, and forgotten calories [6]. Table 1 summarizes the different domains where lifelogging applications have been implemented. Given the data collected by lifelogging applications, methods for extracting, processing, and summarizing them need to be innovative. The next sections review the current algorithms used to tackle each dimension of lifelogging.

TABLE 1. Summary of conditions and targeted domains by lifelogging applications.

III. ACTIVITY RECOGNITION
In recent years, sensor and hardware technology has reached the point where it is possible to record and store an almost unlimited range of information generated over a lifetime by a single person [13]. However, the possibilities for analyzing these data in an automated way are rather limited, and computer vision systems are still far from comparable to human vision. Automatic human behavior analysis and understanding remains a complex subject, and it has long been one of the main goals of artificial intelligence practitioners [14]. Human activity recognition can be defined as the ability to detect human gestures or motion based on data received from different sensors and to interpret them as well-defined activities or actions. Sensors can be cameras, wearable sensors, or external sensors deployed in the environment, among others [15]. The acquired data offer considerable potential for mining knowledge about people's performance during their lives, and hence open up new opportunities for many applications in various fields, including healthcare, remote monitoring, ambient intelligence, smart homes, security and surveillance, and human-computer interaction [16], [17]. The existing literature on human activity recognition can be divided into three main categories: vision-based, sensor-based, and hybrid system-based approaches [18]. Table 2 shows the type of data, the limitations, and the algorithms of each approach.

A. VISION BASED APPROACHES
Vision-based approaches use cameras to capture data, hence providing rich contextual and environmental information about the performed behavior [19]. Due to their ease of use, computer vision techniques have been widely applied to recognize different activities from captured data, with satisfactory results.
However, this remains a challenging task due to numerous problems, such as privacy, light dependency, occlusion, background clutter, and camera motion. In [20], the authors presented a detailed survey of existing methods and their ability to handle the above-mentioned challenges. A typical activity recognition task from a video stream or still images generally involves two major steps [21]: action representation and action classification. The goal of action representation is to convert an action into a series of feature vectors; these features should be representative, distinctive, and invariant in order to improve recognition performance [22]. Then, in action classification, the activity category is inferred from the input feature vector [23]. With the emergence of deep learning, these two steps are merged into a unified end-to-end trainable framework, where action features can be learned automatically from the acquired data [24].

Action representation methods can be summarized as global representations, local representations, and more recent depth-based representations. The earliest studies attempted to extract global descriptors from acquired videos or images and encode them as whole features. Bobick and Davis [25] presented the Motion Energy Image (MEI) and Motion History Image (MHI) framework to encode dynamic human motion into a single image. However, these methods are sensitive to viewpoint changes. Weinland et al. [26] proposed the 3D motion history volume (MHV) to overcome the viewpoint dependency. In [27], Kumari and Mitra proposed a discrete Fourier transform (DFT) based approach to obtain information about the geometric structure in the spatial domain. Furthermore, in [28], Ahmad et al. used a blocked discrete cosine transform (DCT) computed from the motion history image (MHI) to extract global features.

TABLE 2. Activity recognition comparison.

Different from global representations, local representations focus only on specific local regions with salient motion information, determined by interest point detectors, and thus inherently overcome the problems of global approaches. Furthermore, the extracted features are more stable and more robust to the corresponding transformations, occlusion, and appearance variation [29]. The majority of local feature extraction and representation methods have been based on space-time interest points [30] and motion trajectories [31]. Recently, the emergence of depth cameras has opened the possibility of taking advantage of depth maps, which contain an additional depth coordinate compared to conventional RGB images; such cameras are able to capture color image sequences together with depth maps in real time [32]. Moreover, depth images are more robust to factors such as illumination, cluttered backgrounds, and occlusion [33]. To this end, various depth representations have been explored. For example, Jalal et al. [34] attempted to fuse spatiotemporal features in RGB data with depth data. Farooq et al. [35] proposed constructing depth motion maps (DMM) and adding the motion energy for each view; they then compute the body part of the action (BPoA) with a bounding box of optimal window size for each DMM to obtain the action recognition. Kamel et al. [36] proposed using depth maps and posture data with convolutional neural networks for human action recognition. In the next stage, the classification algorithm determines the action label or category.
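As a concrete illustration of the classical global representation idea, the following minimal sketch computes a Motion History Image by frame differencing, in the spirit of Bobick and Davis [25]. The frame source, the difference threshold, and the decay duration are illustrative assumptions rather than parameters taken from the original work.

```python
# Minimal Motion History Image (MHI) sketch; frames, threshold and duration
# are illustrative assumptions, not values from any cited study.
import numpy as np

def motion_history_image(frames, diff_thresh=30, duration=20):
    """frames: list of grayscale images (H x W, uint8), oldest first."""
    frames = [f.astype(np.int16) for f in frames]
    mhi = np.zeros_like(frames[0], dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        motion_mask = np.abs(curr - prev) > diff_thresh  # pixels where motion occurred
        # set moving pixels to the maximum value, decay the rest by one step
        mhi = np.where(motion_mask, duration, np.maximum(mhi - 1, 0))
    return mhi / duration  # normalized template: brighter = more recent motion
```

The flattened MHI, or moments computed from it, can then serve as the global feature vector passed to a classifier, as discussed next.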
Different classifiers have been explored on top of the extracted features for activity recognition, such as support vector machines (SVM) [37], hidden Markov models (HMM) [38], dynamic Bayesian networks [39], and decision trees or random forests [40]. In recent years, deep learning based human action recognition methods have become the major research direction and have gradually replaced the traditional approaches [41].

B. SENSOR BASED APPROACHES
This category of approaches uses various types of emerging sensors, such as accelerometers, to determine human behavior. Sensors can be wearable, attached directly or indirectly to the actor's body, or dense sensors embedded in the objects that constitute the environment [42]. The generated data can be regarded as a continuous time series of motion changes represented as parameter values; various features are then extracted from these data using statistical or structural approaches. Finally, those features serve as inputs to a machine learning algorithm that recognizes the person's ongoing activity [43]. Some of the most common machine learning algorithms used in human activity recognition are support vector machines (SVM) [44], Long Short-Term Memory networks (LSTM) [45], Random Forests [46], and Convolutional Neural Networks (CNNs) [47]. Recently, advances in deep learning have made it possible to automatically extract high-level and more meaningful features, which is better suited to complex activity recognition. Furthermore, when faced with a large amount of unlabeled data, deep generative network structures are more suitable for unsupervised incremental learning. As a result, deep learning based methods have been widely adopted for sensor-based activity recognition [48].

C. HYBRID SYSTEM-BASED APPROACHES
In real applications, no single sensor can handle all possible activities, and different sensors are usually combined to improve HAR systems and overcome the problems of relying on a single sensor [49]. Many works have shown accuracy improvements from using multiple sensors in recent years. For instance, Zdravevski et al. [50] proposed an enhanced real-time multimodal sensor-based activity recognition system based on the fusion of vision-based sensors and inertial sensors using machine learning for health monitoring. Ozcan and Velipasalar [51] proposed combining features extracted from a photo stream acquired by a wearable camera with data acquired from an accelerometer to perform fall detection for elderly persons. In [52], video and IMU data captured synchronously by Google Glass were used to recognize the wearer's activities; the fused data achieved a higher average accuracy than video or sensor data individually. Further details on human activity recognition using various kinds of sensor fusion are reported in [53], [54].
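To make the sensor-based pipeline of Subsection B more concrete, the following sketch segments a raw accelerometer stream into fixed-length windows, extracts simple statistical features, and trains a conventional classifier. The window length, feature set, synthetic data, and choice of a Random Forest are illustrative assumptions, not the setup of any particular study cited here.

```python
# Sliding-window feature extraction + conventional classifier for sensor-based HAR.
# All data here are synthetic stand-ins; real pipelines would use labelled recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(signal, win=128, step=64):
    """signal: (N, 3) array of accelerometer samples; returns one feature row per window."""
    rows = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        rows.append(np.concatenate([w.mean(axis=0), w.std(axis=0),
                                    w.min(axis=0), w.max(axis=0)]))
    return np.array(rows)

rng = np.random.default_rng(0)
walking = rng.normal(0.0, 1.0, size=(1024, 3))   # high-variance synthetic recording
sitting = rng.normal(0.0, 0.1, size=(1024, 3))   # low-variance synthetic recording

Xw, Xs = window_features(walking), window_features(sitting)
X = np.vstack([Xw, Xs])
y = ["walking"] * len(Xw) + ["sitting"] * len(Xs)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(window_features(rng.normal(0.0, 0.1, size=(256, 3)))[:1]))
```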
IV. LIFELOG MOMENT RETRIEVAL
Lifelogs represent rich repositories of an individual's daily experiences. These sources of information require proper tools for retrieving specific life moments. Hence, there is a compelling need for appropriate retrieval systems that accurately remind a lifelogger of past moments. Rigorous comparative benchmarking tasks have addressed this issue, such as the Lifelog Semantic Access Task (LSAT) at NTCIR-14 [55], the Lifelog Moment Retrieval (LMRT) task at ImageCLEFlifelog 2019 [56], and the Lifelog Search Challenge (LSC) at ACM ICMR 2019 [57], as summarized in Table 3.

TABLE 3. Summary of benchmarking tasks.

The task across these three competitions is similar. Given a topic describing a user's daily activity or event (e.g., "Find the moment when a user was taking a train from the city to home") as a natural language query, the system should retrieve the most relevant and informative images of the corresponding moments from the user's lifelog. Researchers consider this a tedious task for several reasons. The first issue is the semantic gap: there is no direct connection between lifelog images and the query topics representing events or activities, which has a direct impact on the relevance of the search results [58]. The second issue concerns image quality. Since photos are taken while on the move, blurriness is a problem; blurred images do not provide enough information, yet they reduce the efficiency of the search due to wasted computation time [59], [60]. The third issue is image redundancy. Since the lifelogger may be stationary during parts of the day, duplicate photos tend to exist within the lifelogs [59], and retrieving such images is time consuming without any benefit. Given these issues, relevance and diversity are the main retrieval criteria to satisfy.

After an exhaustive search of the literature, three main areas can be distinguished for improving retrieval relevance: data augmentation based on pre-trained models, filtering of blurred images, and natural language processing for query topic understanding. Advances in deep learning for scalable image annotation have produced pre-trained models that effectively extract visual concepts along different aspects such as attributes, objects, locations, and places. Therefore, most research works apply deep transfer learning [61], which leverages the outputs of different pre-trained models (VGG-19, RetinaNet, InceptionV3, ResNet50, Faster R-CNN [60], PlacesCNN [62], YOLOv3 [63]) trained on external resources such as ImageNet [64], Places365 [65], MS-COCO [66], Open Images [67], and SUN [68] in order to enhance the initial concept annotations of lifelog images provided by the official competitions. Instead of collecting the outputs of pre-trained models, other works opt for fine-tuning, deriving dense feature vectors from the last layer of the pre-trained models and re-using them to retrain their own models [69]–[73]. According to these works, fine-tuning outperforms the direct use of pre-trained models.

Regarding the noise in lifelog images, researchers focus on eliminating blurred and uninformative images from the dataset. Fu et al. [58] proposed applying lens calibration followed by blurriness and color homogeneity detection for pre-processing. Chowdhury et al. [59] estimated a blur score using a Haar wavelet transform; images below a threshold are pruned. The UPB team [60] deals with both uninformativeness and blurriness: first, they run a blur detection system that computes the variance of the Laplacian kernel for each image, capturing both motion blur and large homogeneous areas.
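A minimal sketch of this kind of variance-of-the-Laplacian blur check is shown below, assuming OpenCV is available; the threshold is an illustrative assumption that would need tuning on the actual lifelog collection.

```python
# Variance-of-the-Laplacian blur check (sketch); the threshold is an assumption.
import cv2

def is_blurry(image_path, threshold=100.0):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    focus_measure = cv2.Laplacian(gray, cv2.CV_64F).var()  # low variance => few edges
    return focus_measure < threshold

# kept = [p for p in lifelog_image_paths if not is_blurry(p)]
```

In the pipeline described here, such a check would run before the metadata-based restriction rules discussed next.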
Then, they applied restriction rules on the metadata in order to remove uninformative images. The ZJUTCVR team [74] applied Laplacian filters to determine blur as the variance of the convolution result, and calculated the proportion of subjects in each image in order to detect occluded images.

A major challenge within lifelog retrieval is query topic understanding. A common approach is to apply natural language processing tools to tackle the complexity of queries, and most of the reviewed works customize their solutions around this approach in different ways. Abdallah et al. [69] proposed an automatic retrieval system based on Long Short-Term Memory (LSTM) networks for query processing. They built labelled textual descriptions of query moments, converted the words into numeric vectors by training a word embedding, and then created and trained an LSTM network on the sequences of word vectors in order to extract the relevant concepts representing each topic. The retrieval phase entails matching the query concepts extracted by the LSTM against the file containing the image concepts. The UA.PT team [63] proposed an automatic retrieval process that extracts relevant words from topic titles and narratives and matches them with the lifelog images using a word-embedding model trained on the Google News dataset. The TUC MI team [75] processed the query automatically with natural language processing techniques, introducing the concept of a token vector with the same dimension as the image/segment vector and defining a formula to compare the similarity between image/segment and token vectors. In order to reduce the gap between the query topic and the visual concepts of images, Suzuki and Ikeda [76] used vector representations of words obtained by training word embeddings with the skip-gram model. The similarity is then calculated between the two bags of words for the query topic and for the visual concepts representing the images. Since an image is worth more than one thousand words, they also proposed transforming the topic query into a set of images and training a topic classifier, using a convolutional neural network over a collection of web pictures representing the query topics. This classifier outputs a topic score, i.e., the relevance of an image B for the query topic. The global similarity between query topic Q and image B is then calculated as the sum of the cosine similarity in the embedding space and the topic score.
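The embedding-based matching used by several of the works above (e.g., the bag-of-words similarity of Suzuki and Ikeda [76]) can be sketched as follows: both the query terms and an image's concept tags are mapped to averaged word vectors and ranked by cosine similarity. The word_vectors mapping is an assumed, pre-trained embedding (for instance, a skip-gram model); it is not part of any benchmark release, and the helper names are hypothetical.

```python
# Sketch of embedding-based query-to-image-concept matching; the embedding and
# helper names are illustrative assumptions.
import numpy as np

def avg_vector(words, word_vectors):
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else None

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_images(query_terms, image_concepts, word_vectors):
    """image_concepts: {image_id: [concept words detected in that image]}."""
    q = avg_vector(query_terms, word_vectors)
    scores = []
    for image_id, tags in image_concepts.items():
        v = avg_vector(tags, word_vectors)
        if q is not None and v is not None:
            scores.append((image_id, cosine(q, v)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```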
In order to cope with the weaknesses of fully automatic retrieval, user involvement has been integrated with the aim of enhancing the quality of results through feedback mechanisms. With the advent of the Lifelog Search Challenge (LSC), a number of interactive retrieval systems have been designed to support interactive retrieval from lifelogs [57]. LIFER 2.0 is used as the baseline system for the Lifelog Moment Retrieval (LMRT) task in ImageCLEF 2019; it is an interactive retrieval system based on faceted filtering and context browsing for gaining insights via simple interactions. As described in [72], the user query is submitted, as facets, to a criteria-matching engine based on similarity ranking. The LEMORE [73] system designs an interactive semantic engine that retrieves images based on their tags (high-level concepts) and temporal information; the engine combines matrix numeric processing and database queries for object and temporal tags, respectively. Chang et al. [58] proposed an interactive system that focuses on query expansion by suggesting the top-k most similar concept words from the dataset. Because the search is restricted to the embedding space, finding the top similar concepts is computationally feasible by comparing the semantic similarity between concept words and query terms. Regarding the problem of result redundancy, Chang et al. [58] provide a filtering mechanism that removes similar images by calculating nearest neighbors in the embedding space. In order to reduce the computational overhead, they build offline KD-trees and apply clustering in the embedding space; then, in the online phase, they apply BM25 for document retrieval, which measures probabilistic relevance.

V. STORYTELLING
Storytelling is the art of telling stories. Currently, a new concept called digital storytelling is gaining popularity; it aims to automatically generate digital stories using artificial intelligence. This concept is strongly related to visual lifelogging, mainly through the spread of digital media such as images, audio, and video. In fact, digital storytelling targets insights into users by examining their lifelogs and mining their daily activities and lifestyles. These insights cover different areas such as healthcare, security, leisure, lifestyle, and wellbeing. However, automatically building digital stories from unstructured lifelogs poses a major challenge for efficiently browsing and mining insights from a huge volume of unstructured egocentric data. Egocentric summarization techniques play an important role in overcoming this challenge by generating a concise and meaningful representation of lifelogs. In this way, summarization can be seen as a support for the visualization, indexing, and browsing of historical events, with the least possible semantic loss (informativeness criterion) and the least information redundancy (representativeness criterion). Therefore, summarization is a multi-objective problem and can be defined as generating an optimized lifelog representation that maximizes the extracted information while minimizing its redundancy. According to these objectives, two categories of summarization approaches can be found: informative selection and representative selection.

Recently, object-driven approaches have attracted great interest. Lee et al. [77] proposed a process that creates object-driven summaries for egocentric videos by selecting frames that reflect the key object-driven events; they extract region cues representing high-level saliency in egocentric video and then apply a regression method to predict the relative importance of any new region based on these cues. Guo et al. [78] proposed a method that focuses on extracting video shots with high, stable salience, discrimination, and representativeness in order to generate a compact storyboard summary. Lu and Grauman [79], inspired by work on studying links between news articles over time, defined a random-walk-based metric that captures event connectivity beyond simple object co-occurrence to provide a better sense of story. Sun et al. [80] identified the salient people and actions in videos "in the wild" to compose a montage. With the advances in deep learning, there has been great interest in solving the egocentric video summarization problem using unsupervised deep learning.
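As elaborated in the next paragraph, a common unsupervised formulation treats summarization as keyframe selection over per-frame deep features. A minimal sketch of that idea is given below, assuming the features have already been extracted with a pre-trained CNN and using simple k-means clustering; the number of keyframes is an arbitrary illustrative choice.

```python
# Keyframe selection sketch: cluster per-frame deep features and keep the frame
# nearest each cluster centre. Features are assumed to be precomputed elsewhere.
import numpy as np
from sklearn.cluster import KMeans

def select_keyframes(frame_features, n_keyframes=10):
    """frame_features: (num_frames, feature_dim) array; returns sorted frame indices."""
    km = KMeans(n_clusters=n_keyframes, n_init=10).fit(frame_features)
    keyframes = []
    for centre in km.cluster_centers_:
        dists = np.linalg.norm(frame_features - centre, axis=1)
        keyframes.append(int(np.argmin(dists)))   # representative frame of this cluster
    return sorted(set(keyframes))
```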
In this direction, unsupervised video summarization is treated as a keyframe selection problem. Jadon and Jasim [81] extracted deep features using a CNN and then applied clustering algorithms to extract interesting keyframes. Sahu and Chowdhury [82] address the problem of summarizing egocentric videos by applying deep feature extraction and an optimal clustering approach (CSMIK K-means) that combines an Integer Knapsack (IK) formulation with CSM. Mahasseni et al. [83] applied a long short-term memory (LSTM) network to learn a deep summarizer network via a generative adversarial framework that optimizes the frame selector. In [84], Gao et al. applied keyframe selection based on multidimensional time series analysis: they considered the sequential frame features as a set of one-dimensional time series and applied CUSUM statistics to each dimension simultaneously; several consecutive clips containing similar content are then obtained by segmenting the video using the calculated statistics, and finally a clustering process selects keyframes within the obtained clips.

Notably, egocentric summarization has become a hot topic in new, challenging international competitions that aim to develop benchmarks for summarizing egocentric lifelogging data and to contribute to improving summarization quality. In 2017, ImageCLEF2017 proposed the Lifelog Summarization (LST) task, which aims to analyze all the lifelog images and summarize them according to specific requirements; the summary should consist of 50 relevant and diverse images. ImageCLEF2018 organized the Activities of Daily Living understanding (ADLT) task, whose goal is to analyze the lifelog data for a given period of time and provide a summarization based on concepts describing the activities of daily living and the contexts in which these activities take place. In 2019, NTCIR-14 published an exploratory task, the Lifelog Insight Task (LIT), whose objective is to gain insights into the lifelogger's daily activities by providing an efficient and effective means of visualizing the data.

VI. PRIVACY AND SECURITY CHALLENGES
A lifelogging system is a framework for the daily recording of individuals' personal and sensitive data, and it usually includes resource-constrained smart objects (sensors) [3]. With that in mind, lifelogging is subject to security threats, as each emerging technology is able to interfere with the life of each lifelogger [3], [4]. There are many security risks of lifelogging in the connected smart world because suitable security standards and protocols for smart objects are not yet mature. Several security vulnerabilities of smart objects result from their limited resources, which cannot support robust, proven security and cryptography techniques. Therefore, these issues must be taken into consideration to overcome the problems of privacy and trust and thereby avoid security risks, especially with the fast development of lifelogging applications. Trustworthy and secure lifelogging involves security challenges, threats, and risks both for smart objects at the communication layer and for the users who log in, share, and exchange private data using smart devices [4]. However, sharing and exchanging any type of personal information is a threat to privacy, and infringements on privacy are considered one of the key challenges of lifelogging [85].
In particular, lifelogging information requires trustworthy and strong security measures because it may contain very sensitive personal information such as communication logs, location, and biological information [4], [86]. Unfortunately, lifelogging has a debatable history; for example, DARPA's lifelogging project was cancelled in 2004 due to criticism of the privacy implications of the system [85], [86]. Also, in the lifelogging environment, the related information security risks have substantial implications for the economy, society, privacy, and people's psychology [85], [87]. These security risks affect individuals, industries, and governments [87]. The complexity of stakeholders' behavior can also be evaluated in the social sphere through social networks, messaging services, the quantified self, and the sharing of lifelogging information. Many of these trends are in contradiction with the need for privacy grounded in well-established human rights, and some individuals' information may be publicized even where governments consider it confidential private data [88]. A number of security issues are therefore encountered in the lifelogging trend that need to be identified and solved before proceeding into the future connected world.

To highlight the risks and the benefits of lifelogging [89], the European Network and Information Security Agency (ENISA) of the European Union (EU) [87] presented a scenario for studying the challenges, risks, threats, and benefits of lifelogging in daily life. The main risks reported and detailed in [87] concern individuals, industry, service providers, governments, and EU institutions and regulators. ENISA addressed the risks related to the lifelogging environment per stakeholder, and also reported a number of recommendations addressed to each of them. Petroulakis et al. [89] detailed lifelogging topics in smart environments, described the security threats and interconnection issues, and suggested a lightweight framework, based on the impact of security attacks on energy consumption, to ensure a private, secure, trustworthy, and robust lifelogging system. The researchers applied several mitigation factors, including AES-128 encryption, channel assignment, and power control, for a secure lifelogging system. Fragkiadakis et al. [3] developed a joint encryption and compression model based on developments in compressed sensing principles. Rawassizadeh and Tjoa [86] discussed the security-related risks of lifelogging systems and the risks of sharing lifelogging information in social communities; they presented a sharing model that can reduce the shareability of a lifelogging information object using an expiration time. Allen [90] identified two potential hazards of lifelogging: pernicious surveillance and pernicious memory.

Generally, a lifelogging process consists of three phases, and each phase requires particular security considerations. The first phase senses data from the user's environment with sensors, the second phase gathers the sensed data, and the third phase allows information browsing and retrieval from the user's lifelogging dataset. The general lifelogging architecture with its security requirements is shown in Figure 2, in which the parts of the architecture that need to be secured are highlighted.

FIGURE 2. General lifelogging architecture with security requirements.
Users need to define the collected information object and be able to set configuration parameters of the sensors, for example the sensing interval. In the first phase, the lifelogging architecture is connected to the sensors and reads their data. Some sensors need to be secured, as they may require authentication, and the confidential information in the sensor data needs to be protected using data encryption techniques while it is transferred from the sensor to the lifelogging architecture. Dynamic security modules are therefore provided for the sensing phase, since sensors can be added to or removed from the lifelogging architecture dynamically, which yields a more flexible and scalable lifelog architecture. In the second phase, the sensed data in the lifelog object are collected to generate a lifelogging dataset of lifelogging records; security modules should be considered by developers during this data gathering stage. In the third phase, the lifelog data are stored on storage devices, which must be secured [86]. The most common techniques for securing the data are cryptography and data encryption using schemes such as RSA (Rivest-Shamir-Adleman), AES (Advanced Encryption Standard), ECC (Elliptic Curve Cryptography), and the Data Encryption Standard (DES) [4].
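As a purely illustrative sketch of this last requirement, the snippet below encrypts a single lifelog record with AES-GCM using the Python cryptography package before it is stored or transmitted. The record fields, key handling, and associated-data tag are assumptions made for the example; a real deployment would also need key provisioning, rotation, and access control.

```python
# Illustrative sketch only: encrypting one lifelog record with AES-GCM.
# Key management (secure storage, rotation, sharing with the retrieval layer)
# is intentionally out of scope here.
import os
import json
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)      # in practice, provisioned securely
record = json.dumps({"time": "2021-03-13T08:15:00", "lat": 0.0, "lon": 0.0,
                     "activity": "walking"}).encode()

nonce = os.urandom(12)                         # must be unique per record
ciphertext = AESGCM(key).encrypt(nonce, record, b"lifelog-v1")  # authenticated encryption

# Store or transmit (nonce, ciphertext); only holders of the key can recover the record:
plaintext = AESGCM(key).decrypt(nonce, ciphertext, b"lifelog-v1")
```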
VII. CONCLUSION AND FUTURE DIRECTION
Lifelogging has the potential to enrich individuals' lives when analyzed properly. This review summarizes the current applications of lifelogging and its trends and challenges, and presents an analysis of the progress made so far in computer algorithms from a storytelling perspective. A separate section was dedicated to covering current techniques in activity recognition, and we also discussed the available literature on privacy and security in lifelogging applications. Regarding applications, there is further potential for lifelogging within healthcare and education to enhance current practices in both domains. There is also a need to develop more algorithms suited to data obtained through photo cameras, in particular for social interaction detection and analysis as well as for activity and context recognition. In visual storytelling, semantics has to be preserved when extracting and summarizing egocentric data using ontologies; a case study addressing these gaps would be a valuable research contribution. A promising area of research that has not yet been explored is rendering visual-to-text translation in real time using multimodal techniques, in order to provide a human-like description of the incidents occurring in an event with a richer context. The risks of sharing lifelogging data and the security recommendations to reduce these risks were introduced; the security and privacy challenges indicate a shortage of suitable security protocols and mechanisms in the connected smart world. Therefore, the main recommendation for providers in lifelogging domains is to follow mechanisms that comply with high privacy and security standards to protect lifelogging users. This paper also shed light on the problem of moment retrieval from visual lifelogs. Specifically, we examined the main issues affecting the quality of retrieved results and then reviewed state-of-the-art solutions for these challenges. Explicitly, we emphasized the importance of both dataset filtering and natural language processing with transfer learning in reducing the semantic gap and improving the relevance of results. The essential role of digital storytelling, with different approaches to the egocentric video summarization problem, was also presented, covering the strengths of object-driven clustering and deep feature selection. Nevertheless, aspects of natural language semantics still need to be improved. As next steps, in view of our analysis, we plan to improve the quality of query interpretation using advanced NLP algorithms.

REFERENCES
[1] J. Y. Lee, J. Y. Kim, S. J. You, Y. S. Kim, H. Y. Koo, J. H. Kim, S. Kim, J. H. Park, J. S. Han, S. Kil, H. Kim, Y. S. Yang, and K. M. Lee, "Development and usability of a life-logging behavior monitoring application for obese patients," J. Obesity Metabolic Syndrome, vol. 28, no. 3, pp. 194–202, Sep. 2019, doi: 10.7570/jomes.2019.28.3.194.
[2] L. Dubourg, A. R. Silva, C. Fitamen, C. J. A. Moulin, and C. Souchay, "SenseCam: A new tool for memory rehabilitation?" Revue Neurologique, vol. 172, no. 12, pp. 735–747, Dec. 2016, doi: 10.1016/j.neurol.2016.03.009.
Osgood, ‘‘The potential of sensor-based monitoring as a tool for health care, health promotion, and research,’’ Ann. Family Med., vol. 9, no. 4, pp. 296–298, Jul. 2011, doi: 10.1370/afm.1292. [12] K. Aizawa, Y. Maruyama, H. Li, and C. Morikawa, ‘‘Food balance estimation by using personal dietary tendencies in a multimedia food log,’’ IEEE Trans. Multimedia, vol. 15, no. 8, pp. 2176–2185, Dec. 2013, doi: 10.1109/TMM.2013.2271474. [13] Y. Kong and Y. Fu, ‘‘Human action recognition and prediction: A survey,’’ 2018, arXiv:1806.11230. [Online]. Available: http://arxiv. org/abs/1806.11230 [14] J. Yang, J. Lee, and J. Choi, ‘‘Activity recognition based on RFID object usage for smart mobile devices,’’ J. Comput. Sci. Technol., vol. 26, no. 2, pp. 239–246, 2011. [15] Y. Liang, X. Zhou, Z. Yu, and B. Guo, ‘‘Energy-efficient motion related activity recognition on mobile devices for pervasive healthcare,’’ Mobile Netw. Appl., vol. 19, no. 3, pp. 303–317, Jun. 2014. [16] C. F. Crispim-Junior, V. Buso, K. Avgerinakis, G. Meditskos, A. Briassouli, J. Benois-Pineau, I. Y. Kompatsiaris, and F. Bremond, ‘‘Semantic event fusion of different visual modality concepts for activity recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 8, pp. 1598–1611, Aug. 2016. [17] Z. Hussain, M. Sheng, and W. E. Zhang, ‘‘Different approaches for human activity recognition: A survey,’’ 2019, arXiv:1906.05074. [Online]. Available: http://arxiv.org/abs/1906.05074 [18] L. Chen, J. Hoey, C. D. Nugent, D. J. Cook, and Z. Yu, ‘‘Sensor-based activity recognition,’’ IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 42, no. 6, pp. 790–808, Nov. 2012. [19] T.-H.-C. Nguyen, J.-C. Nebel, and F. Flórez-Revuelta, ‘‘Recognition of activities of daily living with egocentric vision: A review,’’ Sensors, vol. 16, no. 1, p. 72, Jan. 2016, doi: 10.3390/s16010072. [20] M. Ramanathan, W.-Y. Yau, and E. K. Teoh, ‘‘Human action recognition with video data: Research and evaluation challenges,’’ IEEE Trans. Human-Mach. Syst., vol. 44, no. 5, pp. 650–663, Oct. 2014. [21] R. Poppe, ‘‘A survey on vision-based human action recognition,’’ Image Vis. Comput., vol. 28, no. 6, pp. 976–990, Jun. 2010. [22] Y. Kong, Z. Tao, and Y. Fu, ‘‘Deep sequential context networks for action prediction,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 3662–3670, doi: 10.1109/ CVPR.2017.390. 62639 A. Ksibi et al.: Overview of Lifelogging: Current Challenges and Advances [23] Q. Shi, L. Cheng, L. Wang, and A. Smola, ‘‘Human action segmentation and recognition using discriminative semi-Markov models,’’ Int. J. Comput. Vis., vol. 93, no. 1, pp. 22–32, May 2011. [24] C. Feichtenhofer, A. Pinz, and R. P. Wildes, ‘‘Spatiotemporal multiplier networks for video action recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4768–4777. [25] A. F. Bobick and J. W. Davis, ‘‘The recognition of human movement using temporal templates,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 3, pp. 257–267, Mar. 2001. [26] D. Weinland, R. Ronfard, and E. Boyer, ‘‘Free viewpoint action recognition using motion history volumes,’’ Comput. Vis. Image Understand., vol. 104, nos. 2–3, pp. 249–257, Nov. 2006. [27] S. Kumari and S. K. Mitra, ‘‘Human action recognition using DFT,’’ in Proc. 3rd Nat. Conf. Comput. Vis., Pattern Recognit., Image Process. Graph., Dec. 2011, pp. 239–242. [28] T. Ahmad, J. Rafique, H. Muazzam, and T. 
Rizvi, ‘‘Using discrete cosine transform based features for human action recognition,’’ J. Image Graph., vol. 3, no. 2, pp. 96–101, 2015. [29] X. Peng, L. Wang, X. Wang, and Y. Qiao, ‘‘Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice,’’ 2014, arXiv:1405.4506. [Online]. Available: http://arxiv. org/abs/1405.4506 [30] D. D. Dawn and S. H. Shaikh, ‘‘A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector,’’ Vis. Comput., vol. 32, no. 3, pp. 289–306, Mar. 2016. [31] H. Wang and C. Schmid, ‘‘Action recognition with improved trajectories,’’ in Proc. IEEE Int. Conf. Comput. Vis., Sydney, NSW, Australia, Dec. 2013, pp. 3551–3558, doi: 10.1109/ICCV.2013.441. [32] X. Yang and Y. Tian, ‘‘Super normal vector for activity recognition using depth sequences,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 804–811, doi: 10.1109/CVPR.2014.108. [33] M. Li, H. Leung, and H. P. H. Shum, ‘‘Human action recognition via skeletal and depth based feature fusion,’’ in Proc. 9th Int. Conf. Motion Games, Oct. 2016, pp. 123–132. [34] A. Jalal, S. Kamal, and D. Kim, ‘‘A depth video-based human detection and activity recognition using multi-features and embedded hidden Markov models for health care monitoring systems,’’ Int. J. Interact. Multimedia Artif. Intell., vol. 4, no. 4, p. 54, 2017. [35] A. Farooq, F. Farooq, and A. V. Le, ‘‘Human action recognition via depth maps body parts of action,’’ TIIS, vol. 12, no. 5, pp. 2327–2347, 2018. [36] A. Kamel, B. Sheng, P. Yang, P. Li, R. Shen, and D. D. Feng, ‘‘Deep convolutional neural networks for human action recognition using depth maps and postures,’’ IEEE Trans. Syst., Man, Cybern. Syst., vol. 49, no. 9, pp. 1806–1819, Sep. 2019. [37] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, ‘‘Learning realistic human actions from movies,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8. [38] P. Natarajan and R. Nevatia, ‘‘Online, real-time tracking and recognition of human actions,’’ in Proc. IEEE Workshop Motion Video Comput., Jan. 2008, pp. 1–8. [39] H.-I. Suk, B.-K. Sin, and S.-W. Lee, ‘‘Hand gesture recognition based on dynamic Bayesian network framework,’’ Pattern Recognit., vol. 43, no. 9, pp. 3059–3072, Sep. 2010. [40] L. Xu, W. Yang, Y. Cao, and Q. Li, ‘‘Human activity recognition based on random forests,’’ in Proc. 13th Int. Conf. Natural Comput., Fuzzy Syst. Knowl. Discovery (ICNC-FSKD), Jul. 2017, pp. 548–553. [41] J. C. Núñez, R. Cabido, J. J. Pantrigo, A. S. Montemayor, and J. F. Vélez, ‘‘Convolutional neural networks and long short-term memory for skeletonbased human activity and hand gesture recognition,’’ Pattern Recognit., vol. 76, pp. 80–94, Apr. 2018. [42] Y. Lu, Y. Wei, L. Liu, J. Zhong, L. Sun, and Y. Liu, ‘‘Towards unsupervised physical activity recognition using smartphone accelerometers,’’ Multimedia Tools Appl., vol. 76, no. 8, pp. 10701–10719, Apr. 2017. [43] A. Jordao, A. C. Nazare, Jr., J. Sena, and W. R. Schwartz, ‘‘Human activity recognition based on wearable sensor data: A standardization of the state-of-the-art,’’ 2018, arXiv:1806.05226. [Online]. Available: http://arxiv.org/abs/1806.05226 [44] I. A. Lawal, F. Poiesi, D. Anguita, and A. Cavallaro, ‘‘Support vector motion clustering,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 11, pp. 2395–2408, Nov. 2017. [45] O. S. Eyobu and D. 
Han, ‘‘Feature representation and data augmentation for human activity classification based on wearable IMU sensor data using a deep LSTM neural network,’’ Sensors, vol. 18, no. 9, p. 2892, Aug. 2018. 62640 [46] T. Sztyler, H. Stuckenschmidt, and W. Petrich, ‘‘Position-aware activity recognition with wearable devices,’’ Pervasive Mobile Comput., vol. 38, pp. 281–295, Jul. 2017. [47] I. A. Lawal and S. Bano, ‘‘Deep human activity recognition using wearable sensors,’’ in Proc. 12th ACM Int. Conf. Pervasive Technol. Rel. Assistive Environ., 2019, pp. 45–48. [48] T. Zebin, P. J. Scully, and K. B. Ozanyan, ‘‘Human activity recognition with inertial sensors using a deep learning approach,’’ in Proc. IEEE SENSORS, Oct. 2016, pp. 1–3. [49] G. Meditskos, P.-M. Plans, T. G. Stavropoulos, J. Benois-Pineau, V. Buso, and I. Kompatsiaris, ‘‘Multi-modal activity recognition from egocentric vision, semantic enrichment and lifelogging applications for the care of dementia,’’ J. Vis. Commun. Image Represent., vol. 51, pp. 169–190, Feb. 2018. [50] E. Zdravevski, B. R. Stojkoska, M. Standl, and H. Schulz, ‘‘Automatic machine-learning based identification of jogging periods from accelerometer measurements of adolescents under field conditions,’’ PLoS ONE, vol. 12, no. 9, Sep. 2017, Art. no. e0184216. [51] K. Ozcan and S. Velipasalar, ‘‘Wearable camera- and accelerometer-based fall detection on portable devices,’’ IEEE Embedded Syst. Lett., vol. 8, no. 1, pp. 6–9, Mar. 2016. [52] S. Song, V. Chandrasekhar, B. Mandal, L. Li, J.-H. Lim, G. S. Babu, P. P. San, and N.-M. Cheung, ‘‘Multimodal multi-stream deep learning for egocentric activity recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2016, pp. 24–31. [53] H. F. Nweke, Y. W. Teh, G. Mujtaba, and M. A. Al-garadi, ‘‘Data fusion and multiple classifier systems for human activity detection and health monitoring: Review and open research directions,’’ Inf. Fusion, vol. 46, pp. 147–170, Mar. 2019. [54] B. Cvetković, R. Szeklicki, V. Janko, P. Lutomski, and M. Luštrek, ‘‘Realtime activity monitoring with a wristband and a smartphone,’’ Inf. Fusion, vol. 43, pp. 77–93, Sep. 2018. [55] C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, V.-T. Ninh, T.-K. Le, R. Al-Batal, D.-T. Dang-Nguyen, and G. Healy, ‘‘Advances in lifelog data organisation and retrieval at the NTCIR-14 Lifelog-3 task,’’ in Proc. 14th Int. Conf. NII Testbeds Community Inf. Access Res. (NTCIR), vol. 11966. Tokyo, Japan: Springer, Nov. 2019, pp. 16–28. [Online]. Available: http://eprints.whiterose.ac.uk/152522/ [56] D.-T. Dang-Nguyen, L. Piras, M. Riegler, L. Zhou, M. Lux, M.-T. Tran, T.-K. Le, V.-T. Ninh, and C. Gurrin, ‘‘Overview of ImageCLEFlifelog 2019: Solve my life puzzle and lifelog moment retrieval,’’ in Proc. Conf. Labs Eval. Forum, CEUR-Workshop (CLEF), vol. 2380, 2019, pp. 9–12. [57] T.-K. Le, V.-T. Ninh, D.-T. Dang-Nguyen, M.-T. Tran, L. Zhou, P. Redondo, S. Smyth, and C. Gurrin, ‘‘Lifeseeker: Interactive lifelog search engine at LSC 2019,’’ in Proc. ACM Workshop Lifelog Search Challenge (LSC). New York, NY, USA: Association for Computing Machinery, 2019, pp. 37–40, doi: 10.1145/3326460.3329162. [58] C. Chang, M. Fu, H. Huang, and H. Chen, ‘‘An interactive approach to integrating external textual knowledge for multimodal lifelog retrieval,’’ in Proc. ACM Workshop Lifelog Search Challenge (LSC, ICMR), C. Gurrin, K. Schöffmann, H. Joho, D. Dang-Nguyen, M. Riegler, and L. Piras, Eds., Ottawa, ON, Canada, Jun. 2019, pp. 41–44, doi: 10.1145/3326460.3329163. 
[59] S. Chowdhury, P. J. McParlane, M. S. Ferdous, and J. Jose, ‘‘‘My day in review’: Visually summarising noisy lifelog data,’’ in Proc. 5th ACM Int. Conf. Multimedia Retr. (ICMR). New York, NY, USA: Association for Computing Machinery, 2015, pp. 607–610, doi: 10.1145/2671188. 2749393. [60] M. Tournadre, G. Dupont, V. Pauwels, B. C. M. Lmami, and A. Gînsca, ‘‘A multimedia modular approach to lifelog moment retrieval,’’ in Proc. Working Notes CLEF, Conf. Labs Eval. Forum, CEUR Workshop, vol. 2380, L. Cappellato, N. Ferro, D. E. Losada, and H. Müller, Eds., Lugano, Switzerland, 2019, pp. 1–13. [61] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, ‘‘A survey on deep transfer learning,’’ arXiv:1808.01974. [Online]. Available: http://arxiv.org/abs/1808.01974 [62] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, ‘‘Learning deep features for scene recognition using places database,’’ in Proc. 27th Int. Conf. Neural Inf. Process. Syst. (NIPS), vol. 1. Cambridge, MA, USA: MIT Press, 2014, pp. 487–495. [63] R. Ribeiro, A. J. R. Neves, and J. L. Oliveira, ‘‘UA.PT bioinformatics at ImageCLEF 2019: Lifelog moment retrieval based on image annotation and natural language processing,’’ 2019. VOLUME 9, 2021 A. Ksibi et al.: Overview of Lifelogging: Current Challenges and Advances [64] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ‘‘ImageNet: A large-scale hierarchical image database,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255. [65] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, ‘‘Places: A 10 million image database for scene recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1452–1464, Jun. 2018. [66] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, ‘‘Microsoft COCO: Common objects in context,’’ 2014, arXiv:1405.0312. [Online]. Available: http://arxiv.org/abs/1405.0312 [67] I. Krasin, T. Duerig, N. Alldrin, A. Veit, S. Abu-El-Haija, S. Belongie, D. Cai, Z. Feng, V. Ferrari, and V. Gomes, ‘‘OpenImages: A public dataset for large-scale multi-label and multi-class image classification,’’ 2016. [68] G. Patterson, C. Xu, H. Su, and J. Hays, ‘‘The SUN attribute database: Beyond categories for deeper scene understanding,’’ Int. J. Comput. Vis., vol. 108, nos. 1–2, pp. 59–81, May 2014. [69] F. Abdallah, G. Feki, B. A. Anis, and C. B. Amar, ‘‘Big data for lifelog moments retrieval improvement,’’ 2019. [70] J. Lin and J. H. Lim, ‘‘VCI2R at the NTCIR-13 lifelog-2 lifelog semantic access task,’’ in Proc. 13th NTCIR Conf. Eval. Inf. Access Technol., 2017, pp. 1–5. [71] L. Xia, Y. Ma, and W. Fan, ‘‘VTIR at the NTCIR-12 2016 lifelog semantic access task,’’ in Proc. NTCIR, 2016, pp. 1–4. [72] S. Yamamoto, T. Nishimura, Y. Takimoto, T. Inoue, and H. Toda, ‘‘PBG at the NTCIR-13 lifelog-2 LAT, LSAT , and LEST tasks,’’ in Proc. 13th NTCIR Conf. Eval. Inf. Access Technol., 2017, pp. 1–8. [73] L. Zhou, L. Piras, M. Riegler, G. Boato, D.-T. Dang-Nguyen, and C. Gurrin, ‘‘Organizer team at ImageCLEFlifelog 2017: Baseline approaches for lifelog retrieval and summarization,’’ in Proc. CLEF, 2017, pp. 1–11. [74] P. Zhou, C. Bai, and J. Xia, ‘‘ZJUTCVR team at ImageCLEFlifelog2019 lifelog moment retrieval task,’’ in Proc. CLEF, 2019, pp. 1–11. [75] S. Taubert, S. Kahl, D. Kowerko, and M. Eibl, ‘‘Automated lifelog moment retrieval based on image segmentation and similarity scores,’’ in Proc. CLEF, 2019. [76] T. Suzuki and D. 
AMEL KSIBI received the B.S., M.S., and Ph.D. degrees in computer engineering from the National School of Engineering of Sfax (ENIS), Sfax University, Tunisia, in 2008, 2010, and 2014, respectively. She spent three years at ENIS as a Teaching Assistant before joining the Higher Institute of Computer Science and Multimedia Gabes (ISIMG) as a Permanent Assistant, in 2013. She joined the Computer Science Department, Umm Qura University (UQU), as an Assistant Professor, in 2014. She joined Princess Nourah Bint Abdulrahman University, in 2018, where she is currently an Assistant Professor with the Department of Information Systems, College of Computer Sciences and Information. Her research interests include computer vision and image and video analysis.
Her research activities are centered on deep learning applied to lifelog image and video understanding, indexing, and retrieval.

ALA SALEH D. ALLUHAIDAN (Member, IEEE) received the B.Sc. degree in computer science from Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia, the M.Sc. degree in computer information systems from Grand Valley State University, Allendale, MI, USA, and the Ph.D. degree in information systems and technology from Claremont Graduate University, Claremont, CA, USA. She is currently an Assistant Professor with the Department of Information Systems, Princess Nourah Bint Abdulrahman University. Her research interests include health informatics, big data analytics, and machine learning.

AMINA SALHI received the B.S. and M.S. degrees in computer science from the University of Guelma, Algeria, in 2008 and 2010, respectively, and the Ph.D. degree in image processing and computer vision from the University of BMA, Algeria, in 2017. Since 2018, she has been an Assistant Professor with Princess Nourah Bint Abdulrahman University (PNU), Riyadh, Saudi Arabia. Her research interests include computer vision, image processing, and biometrics.

SAHAR A. EL-RAHMAN (Senior Member, IEEE) received the M.Sc. degree in an AI technique applied to machine-aided translation and the Ph.D. degree in the reconstruction of high-resolution images from sets of low-resolution images from the Faculty of Engineering-Shoubra, Benha University, Cairo, Egypt, in 2003 and 2008, respectively. Since 2008, she has been an Assistant Professor with the Faculty of Engineering-Shoubra, Benha University. She is currently an Assistant Professor with the College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Saudi Arabia. Her research interests include computer vision, image processing, information security, human–computer interaction, e-health, big data, and cloud computing.