Ferda Ofli
  • HBKU Research Complex, Education City, Doha, Qatar
  • +97444541227
  • I am currently a senior scientist at the Qatar Computing Research Institute.
People increasingly use microblogging platforms such as Twitter during natural disasters and emergencies. Research studies have revealed the usefulness of the data available on Twitter for several disaster response tasks. However, making sense of social media data is challenging for several reasons, including the limitations of available tools for analyzing high-volume, high-velocity data streams and the information overload responders face. To address these limitations, in this work, we first show that textual and imagery content on social media provide complementary information useful for improving situational awareness. We then explore ways in which various Artificial Intelligence techniques from the Natural Language Processing and Computer Vision fields can exploit such complementary information generated during disaster events. Finally, we propose a methodological approach that effectively combines several computational techniques in a unified framework to help humanitarian organizations in their relief efforts. We conduct extensive experiments using textual and imagery content from millions of tweets posted during three major disaster events in the 2017 Atlantic Hurricane season. Our study reveals that the distributions of various types of useful information can inform crisis managers and responders, and facilitate the development of future automated systems for disaster management.
In this paper, we introduce Recipe1M+, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe1M+ affords the ability to train high-capacity models on aligned, multimodal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Moreover, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M+ dataset and food and cooking in general. Code, data and models are publicly available.
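To make the joint-embedding idea concrete, here is a minimal PyTorch sketch; the projection dimensions, the loss weighting, and the upstream encoders producing `recipe_feat` and `image_feat` are illustrative assumptions, not the paper's exact architecture. Two branches map recipe and image features into a shared space, trained with a cosine loss plus the high-level classification objective used for regularization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Minimal recipe-image joint embedding with a classification head
    used as a high-level semantic regularizer (dimensions are illustrative)."""
    def __init__(self, recipe_dim=1024, image_dim=2048, embed_dim=512, num_classes=1048):
        super().__init__()
        self.recipe_proj = nn.Linear(recipe_dim, embed_dim)  # recipe branch
        self.image_proj = nn.Linear(image_dim, embed_dim)    # image branch
        self.classifier = nn.Linear(embed_dim, num_classes)  # semantic regularizer

    def forward(self, recipe_feat, image_feat):
        r = F.normalize(self.recipe_proj(recipe_feat), dim=1)
        v = F.normalize(self.image_proj(image_feat), dim=1)
        return r, v, self.classifier(r), self.classifier(v)

def joint_loss(r, v, logits_r, logits_v, labels, lam=0.02):
    # Pull matching recipe-image pairs together in the shared space...
    target = torch.ones(r.size(0), device=r.device)
    retrieval = F.cosine_embedding_loss(r, v, target)
    # ...while shared semantic labels regularize both branches.
    semantic = F.cross_entropy(logits_r, labels) + F.cross_entropy(logits_v, labels)
    return retrieval + lam * semantic
```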
This article describes a method for early detection of disaster-related damage to cultural heritage. It is based on data from social media, a timely and large-scale data source that is nevertheless quite noisy. First, we collect images posted on social media that may refer to a cultural heritage site. Then, we automatically categorize these images along two dimensions: whether a cultural heritage resource is indeed the main subject of the photo, and whether the photo shows damage. Both categorizations are challenging image classification tasks, given the ambiguity of these visual categories; we tackle both tasks using a convolutional neural network. We test our methodology on a large collection of thousands of images from the web and social media, which exhibit the diversity and noise typical of these sources, and contain buildings and other architectural elements, heritage and non-heritage, damaged by disasters as well as intact. Our results show that while the automatic classification is not perfect, it can greatly reduce the manual effort required to find photos of damaged cultural heritage by accurately detecting relevant candidates to be examined by a cultural heritage professional.
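As a rough illustration of the two categorization tasks, the following sketch (assuming PyTorch/torchvision; the shared backbone and head layout are a simplification, not necessarily the paper's setup) attaches two binary heads to one CNN: heritage vs. not, and damaged vs. intact.

```python
import torch.nn as nn
from torchvision import models

class HeritageDamageNet(nn.Module):
    """Shared CNN backbone with two binary heads (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pretrained weights would normally be used
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # keep convolutional features only
        self.backbone = backbone
        self.heritage_head = nn.Linear(feat_dim, 2)  # heritage vs. not heritage
        self.damage_head = nn.Linear(feat_dim, 2)    # damaged vs. intact

    def forward(self, x):
        f = self.backbone(x)
        return self.heritage_head(f), self.damage_head(f)
```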
People increasingly use Social Media (SM) platforms such as Twitter and Facebook during disasters and emergencies to post situational updates including reports of injured or dead people, infrastructure damage, requests for urgent needs, and the like. Information on SM comes in many forms, such as textual messages, images, and videos. Several studies have shown the utility of SM information for disaster response and management, which has encouraged humanitarian organizations to start incorporating SM data sources into their workflows. However, several challenges prevent these organizations from using SM data in their response efforts. These challenges include near-real-time information processing, information overload, information extraction, summarization, and verification of both textual and visual content. We highlight various applications and opportunities of SM multimodal data, the latest advancements, current challenges, and future directions for crisis informatics and other related research fields.
The past several years have witnessed a huge surge in the use of social media platforms during mass convergence events such as health emergencies, natural or human-induced disasters. These non-traditional data sources are becoming vital for disease forecasts and surveillance when preparing for epidemic and pandemic outbreaks. In this paper, we present GeoCoV19, a large-scale Twitter dataset containing more than 524 million multilingual tweets posted over a period of 90 days since February 1, 2020. Moreover, we employ a gazetteer-based approach to infer the geolocation of tweets. We postulate that this large-scale, multilingual, geolocated social media data can empower the research communities to evaluate how societies are collectively coping with this unprecedented global crisis as well as to develop computational methods to address challenges such as identifying fake news, understanding communities' knowledge gaps, building disease forecast and surveillance models, among others.
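A toy version of gazetteer-based geolocation might look like the following; the gazetteer entries and plain substring matching are purely illustrative, and a real pipeline would use a full gazetteer such as GeoNames plus tokenization and disambiguation.

```python
# Toy gazetteer lookup: the entries below are hypothetical examples.
GAZETTEER = {
    "doha": ("Qatar", 25.2854, 51.5310),
    "new york": ("United States", 40.7128, -74.0060),
    "manila": ("Philippines", 14.5995, 120.9842),
}

def infer_geolocation(text: str):
    """Return (country, lat, lon) for the first gazetteer place name found."""
    lowered = text.lower()
    for place, location in GAZETTEER.items():
        if place in lowered:
            return location
    return None

print(infer_geolocation("Stay safe everyone in Doha tonight"))
# -> ('Qatar', 25.2854, 51.531)
```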
Automatically classifying an image has been a central problem in computer vision for decades. A plethora of models has been proposed, from handcrafted feature solutions to more sophisticated approaches such as deep learning. The authors address the problem of remote sensing image classification, an important problem for many real-world applications. They introduce a novel deep recurrent architecture that incorporates high-level feature descriptors to tackle this challenging problem. Their solution is based on the general encoder–decoder framework. To the best of the authors' knowledge, this is the first study to use a recurrent network structure on this task. The experimental results show that the proposed framework outperforms previous work on three datasets widely used in the literature. They have achieved a state-of-the-art accuracy rate of 97.29% on the UC Merced dataset.
Background: The explosion of consumer electronics and social media is facilitating the rise of the Quantified Self (QS) movement, where millions of users are tracking various aspects of their daily life using social media, mobile technology, and wearable devices. Data from mobile phones, wearables, and social media can facilitate a better understanding of the health behaviors of individuals. At the same time, there is an unprecedented increase in childhood obesity rates worldwide. This is a cause for grave concern due to its potential long-term health consequences (e.g., diabetes or cardiovascular diseases). Childhood obesity is highly prevalent in Qatar and the Gulf Region. In this study, we examine the feasibility of capturing quantified-self data from social media, wearables, and mobiles within a weight loss camp for overweight children in Qatar.
Approaches for effectively filtering useful situational awareness information posted by eyewitnesses of disasters, in real time, are greatly needed. While many studies have focused on filtering textual information, research on filtering disaster images is more limited. In particular, there are no studies on the applicability of domain adaptation to filter images from an emergent target disaster when no labeled data is available for the target disaster. To fill this gap, we propose to apply a domain adaptation approach, called domain adversarial neural networks (DANN), to the task of identifying images that show damage. The DANN approach has VGG-19 as its backbone and uses adversarial training to find a transformation that makes the source and target data indistinguishable. Experimental results on several pairs of disasters suggest that the DANN model generally gives similar or better results compared with the VGG-19 model fine-tuned on the source labeled data.
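The heart of DANN is a gradient reversal layer feeding a domain discriminator; below is a minimal PyTorch sketch, where the small feature extractor and layer sizes are illustrative stand-ins for the paper's VGG-19 backbone.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DANN(nn.Module):
    def __init__(self, feat_dim=256, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.label_head = nn.Linear(feat_dim, num_classes)  # damage / no damage
        self.domain_head = nn.Linear(feat_dim, 2)           # source vs. target

    def forward(self, x, lam=1.0):
        f = self.features(x)
        # Reversed gradients push features to become domain-indistinguishable.
        return self.label_head(f), self.domain_head(GradReverse.apply(f, lam))
```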
Over the last few years, extensive research has been conducted to develop technologies to support humanitarian aid tasks. However, many technologies are still limited as they require both manual and automatic approaches, and more importantly, are not ready to be integrated into the disaster response workflows. To tackle this limitation, we develop automatic data processing services that are freely and publicly available, and made to be simple, efficient, and accessible to non-experts. Our services take textual messages (e.g., tweets, Facebook posts, SMS) as input to determine (i) which disaster type the message belongs to, (ii) whether it is informative or not, and (iii) what type of humanitarian information it conveys. We built our services upon machine learning classifiers that are obtained from large-scale comparative experiments utilizing both classical and deep learning algorithms. Our services outperform state-of-the-art publicly available tools in terms of classification accuracy.
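For intuition, a stripped-down version of one such text classification service (here, humanitarian information type) could be assembled with scikit-learn as below; the label set and tiny training sample are hypothetical, and the deployed services rely on far larger models and data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny illustrative training set; real services are trained on large labeled corpora.
messages = [
    "Bridge collapsed on highway 12, cars trapped",
    "We urgently need clean water and food in the shelter",
    "Sending prayers to everyone affected",
    "Power lines down across the northern district",
]
labels = ["infrastructure_damage", "urgent_needs", "sympathy", "infrastructure_damage"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(messages, labels)
print(clf.predict(["Roads are blocked and the bridge is gone"]))
```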
A food recipe is an ordered set of instructions for preparing a particular dish. From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish). In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. To do so, we learn composable module operations which are able to either add or remove a particular ingredient. Each operator is designed as a Generative Adversarial Network (GAN). Given only weak image-level supervision, the operators are trained to generate a visual layer that needs to be added to or removed from the existing image. The proposed model is able to decompose an image into an ordered sequence of layers by applying the corresponding removal modules in the right order. Experimental results on synthetic and real pizza images demonstrate that our proposed model is able to: (1) segment pizza toppings in a weakly-supervised fashion, (2) remove them by revealing what is occluded underneath them (i.e., inpainting), and (3) infer the ordering of the toppings without any depth ordering supervision. Code, data, and models are available online.
Rapid damage assessment is one of the core tasks that response organizations perform at the onset of a disaster to understand the scale of damage to infrastructures such as roads, bridges, and buildings. This work analyzes the usefulness of social media imagery content to perform rapid damage assessment during a real-world disaster. An automatic image processing system, which was activated in collaboration with a volunteer response organization, processed ~280K images to understand the extent of damage caused by the disaster. The system achieved an accuracy of 76% computed based on the feedback received from the domain experts who analyzed ~29K system-processed images during the disaster. An extensive error analysis reveals several insights and challenges faced by the system, which are vital for the research community to advance this line of research.
Multimedia content in social media platforms provides significant information during disaster events. The types of information shared include reports of injured or deceased people, infrastructure damage, and missing or found people, among others. Although many studies have shown the usefulness of both text and image content for disaster response purposes, past research has mostly focused on analyzing the text modality alone. In this paper, we propose to use both text and image modalities of social media data to learn a joint representation using state-of-the-art deep learning techniques. Specifically, we utilize convolutional neural networks to define a multimodal deep learning architecture with a modality-agnostic shared representation. Extensive experiments on real-world disaster datasets show that the proposed multimodal architecture yields better performance than models trained using a single modality (e.g., either text or image).
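A condensed sketch of this kind of multimodal architecture, assuming PyTorch; the bag-of-words text branch, precomputed image features, and concatenation fusion are simplifications of the paper's CNN-based design.

```python
import torch
import torch.nn as nn

class MultimodalNet(nn.Module):
    """Text and image branches fused into a shared representation."""
    def __init__(self, vocab_size=20000, text_dim=128, image_dim=2048,
                 shared_dim=256, num_classes=5):
        super().__init__()
        self.text_branch = nn.Sequential(
            nn.EmbeddingBag(vocab_size, text_dim),  # bag-of-words stand-in for a text CNN
            nn.Linear(text_dim, shared_dim), nn.ReLU(),
        )
        self.image_branch = nn.Sequential(          # features from an image CNN backbone
            nn.Linear(image_dim, shared_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(shared_dim * 2, num_classes)

    def forward(self, token_ids, image_feat):
        shared = torch.cat([self.text_branch(token_ids),
                            self.image_branch(image_feat)], dim=1)
        return self.classifier(shared)
```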
Having reliable and up-to-date poverty data is a prerequisite for monitoring the United Nations Sustainable Development Goals (SDGs) and for planning effective poverty reduction interventions. Unfortunately, traditional data sources are often outdated or lack appropriate disaggregation. As a remedy, satellite imagery has recently become prominent in obtaining geographically fine-grained and up-to-date poverty estimates. Satellite data can pick up signals of economic activity by detecting light at night, development status by detecting infrastructure such as roads, and individual household wealth by detecting different building footprints and roof types. It cannot, however, look inside households and pick up signals from individuals. On the other hand, alternative data sources such as audience estimates from Facebook's advertising platform provide insights into the devices and internet connection types used by individuals in different locations. Previous work has shown the value of such anonymous, publicly-accessible advertising data from Facebook for studying migration, gender gaps, crime rates, and health, among others. In this work, we evaluate the added value of using Facebook data over satellite data for mapping socioeconomic development in two low- and middle-income countries, the Philippines and India. We show that Facebook features perform roughly on par with satellite data in the Philippines, with added value for urban locations. In India, however, where Facebook penetration is lower, satellite data perform better.
Responding to natural disasters, such as earthquakes, floods, and wildfires, is a laborious task performed by on-the-ground emergency responders and analysts. Social media has emerged as a low-latency data source to quickly understand disaster situations. While most studies on social media are limited to text, images offer more information for understanding disaster and incident scenes. However, no large-scale image dataset for incident detection exists. In this work, we present the Incidents Dataset, which contains 446,684 human-annotated images covering 43 incidents across a variety of scenes. We employ a baseline classification model that mitigates false-positive errors, and we perform image filtering experiments on millions of social media images from Flickr and Twitter. Through these experiments, we show how the Incidents Dataset can be used to detect images with incidents in the wild. Code, data, and models are available online.
During a disaster event, images shared on social media help crisis managers gain situational awareness and assess incurred damages, among other response tasks. Recent advances in computer vision and deep neural networks have enabled the development of models for real-time image classification for a number of tasks, including detecting crisis incidents, filtering irrelevant images, classifying images into specific humanitarian categories, and assessing the severity of damage. Despite several efforts, past works mainly suffer from the limited resources (i.e., labeled images) available to train more robust deep learning models. In this study, we propose new datasets for disaster type detection, informativeness classification, and damage severity assessment. Moreover, we relabel existing publicly available datasets for new tasks. We identify exact- and near-duplicates to form non-overlapping data splits, and finally consolidate them to create larger datasets. In our extensive experiments, we benchmark several state-of-the-art deep learning models and achieve promising results. We release our datasets and models publicly, aiming to provide proper baselines as well as to spur further research in the crisis informatics community.
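For the exact- and near-duplicate identification step, one common approach — shown here with the `imagehash` and `Pillow` libraries as an assumed implementation rather than the paper's documented method — compares perceptual hashes and treats small Hamming distances as near-duplicates.

```python
from pathlib import Path
from PIL import Image
import imagehash

def find_near_duplicates(image_dir: str, max_distance: int = 5):
    """Pair up images whose perceptual hashes differ by <= max_distance bits.
    Quadratic scan for clarity; indexing structures would be used at scale."""
    seen = {}          # path -> perceptual hash
    duplicates = []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        h = imagehash.phash(Image.open(path))
        for other, other_hash in seen.items():
            if h - other_hash <= max_distance:   # Hamming distance between hashes
                duplicates.append((str(path), other))
                break
        else:
            seen[str(path)] = h
    return duplicates
```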
People increasingly use microblogging platforms such as Twitter during natural disasters and emergencies. Research studies have revealed the usefulness of the data available on Twitter for several disaster response tasks. However, making sense of social media data is challenging for several reasons, including the limitations of available tools for analyzing high-volume, high-velocity data streams. This work presents an extensive multidimensional analysis of textual and multimedia content from millions of tweets shared on Twitter during three disaster events. Specifically, we employ various Artificial Intelligence techniques from the Natural Language Processing and Computer Vision fields, which exploit different machine learning algorithms to process the data generated during the disaster events. Our study reveals the distributions of various types of useful information that can inform crisis managers and responders as well as facilitate the development of future automated systems for disaster management.
During natural and man-made disasters, people use social media platforms such as Twitter to post textual and multimedia content reporting updates about injured or dead people, infrastructure damage, and missing or found people, among other information types. Studies have revealed that this online information, if processed in a timely and effective manner, is extremely useful for humanitarian organizations to gain situational awareness and plan relief operations. In addition to the analysis of textual content, recent studies have shown that imagery content on social media can significantly boost disaster response. However, while extensive research has focused on extracting useful information from textual content, limited work has explored imagery content or the combination of both content types. One of the reasons is the lack of labeled imagery data in this domain. Therefore, in this paper, we aim to tackle this limitation by releasing a large multimodal dataset collected from Twitter during natural disasters. We provide three types of annotations, which are useful for a number of crisis response and management tasks faced by different humanitarian organizations.
The increasing prevalence of obesity has disconcerting implications for communities, for nations and, most importantly, for individuals, in aspects ranging from quality of life, longevity, and health to social and financial prosperity. Therefore, researchers from a variety of backgrounds study obesity from all angles. In this paper, we use a state-of-the-art computer vision system to predict a person's body-mass index (BMI) from their social media profile picture and demonstrate the type of analyses this approach enables using data from two culturally diverse settings, the US and Qatar. Using large amounts of Instagram profile pictures, we show that (i) thinner profile pictures have more followers, and that (ii) there is weight-based network homophily, in that users with a similar BMI tend to cluster together. To conclude, we also discuss the challenges and limitations related to inferring various user attributes from photos.
The extensive use of social media platforms, especially during disasters, creates unique opportunities for humanitarian organizations to gain situational awareness as a disaster unfolds. In addition to textual content, people post overwhelming amounts of imagery content on social networks within minutes after a disaster hits. Studies point to the importance of this online imagery content for emergency response. Despite recent advances in computer vision research, making sense of imagery content in real time during disasters remains a challenging task. One important challenge is that a large proportion of images shared on social media are redundant or irrelevant, which requires robust filtering mechanisms. Another important challenge is that images acquired after major disasters do not share the same characteristics as those in large-scale image collections with clean annotations of well-defined object categories such as house, car, airplane, cat, dog, etc., used traditionally in computer vision research. To tackle these challenges, we present a social media image processing pipeline that combines human and machine intelligence to perform two important tasks: (i) capturing and filtering social media imagery content (i.e., real-time image streaming, de-duplication, and relevancy filtering), and (ii) actionable information extraction (i.e., damage severity assessment) as a core situational awareness task during an ongoing crisis event. Results obtained from extensive experiments on real-world crisis datasets demonstrate the significance of the proposed pipeline for the optimal utilization of both human and machine computing resources.
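A compressed sketch of the capture-and-filter stage follows; the `relevancy_model` and `damage_model` callables are hypothetical placeholders, and the actual pipeline combines human and machine intelligence and is considerably richer. Incoming images are de-duplicated by perceptual hash and passed through a relevancy filter before damage assessment.

```python
import imagehash
from PIL import Image

SEEN_HASHES = set()

def is_duplicate(img: Image.Image, max_distance: int = 5) -> bool:
    """De-duplicate via perceptual hashing (kept deliberately simple here)."""
    h = imagehash.phash(img)
    if any(h - s <= max_distance for s in SEEN_HASHES):
        return True
    SEEN_HASHES.add(h)
    return False

def process_stream(images, relevancy_model, damage_model):
    """Filter a stream of images, then assess damage severity on what survives."""
    for img in images:
        if is_duplicate(img):
            continue                      # drop near-duplicate content
        if relevancy_model(img) < 0.5:    # hypothetical relevancy score in [0, 1]
            continue                      # drop irrelevant content
        yield img, damage_model(img)      # e.g., none / mild / severe
```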
[This corrects the article DOI: 10.2196/mhealth.6562.].
Sleep is paramount to health. Insufficient sleep can reduce physical, emotional, and mental well-being and can lead to a multitude of health complications among people with chronic conditions. Physical activity and sleep are highly interrelated health behaviors. Our physical activity during the day (ie, awake time) influences our quality of sleep, and vice versa. The current popularity of wearables for tracking physical activity and sleep, including actigraphy devices, can foster the development of new advanced data analytics. This can help to develop new electronic health (eHealth) applications and provide more insights into sleep science. The objective of this study was to evaluate the feasibility of predicting sleep quality (ie, poor or adequate sleep efficiency) given the physical activity wearable data during awake time. In this study, we focused on predicting good or poor sleep efficiency as an indicator of sleep quality. Actigraphy sensors are wearable medical devices used to study sleep and physical activity patterns. The dataset used in our experiments contained the complete actigraphy data from a subset of 92 adolescents over 1 full week. Physical activity data during awake time was used to create predictive models for sleep quality, in particular, poor or good sleep efficiency. The physical activity data from sleep time was used for the evaluation. We compared the predictive performance of traditional logistic regression with more advanced deep learning methods: multilayer perceptron (MLP), convolutional neural network (CNN), simple Elman-type recurrent neural network (RNN), long short-term memory (LSTM-RNN), and a time-batched version of LSTM-RNN (TB-LSTM). Deep learning models were able to predict the quality of sleep (ie, poor or good sleep efficiency) based on wearable data from awake periods. More specifically, the deep learning methods performed better than traditional logistic regression. CNN had the highest specificity and sensitivity, and an overall area under the receiver operating characteristic (ROC) curve (AUC) of 0.9449, which was 46% better as compared with traditional logistic regression (0.6463). Deep learning methods can predict the quality of sleep based on actigraphy data from awake periods. These predictive models can be an important tool for sleep research and to improve eHealth solutions for sleep.
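A bare-bones version of the kind of sequence model compared in the study, assuming PyTorch (the single-channel activity input and network sizes are illustrative): an LSTM over awake-time activity counts with a binary head for poor vs. good sleep efficiency.

```python
import torch
import torch.nn as nn

class SleepEfficiencyLSTM(nn.Module):
    """LSTM over a sequence of awake-time activity counts -> poor/good sleep."""
    def __init__(self, input_size=1, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, activity):            # activity: (batch, time, 1)
        _, (h_n, _) = self.lstm(activity)
        return self.head(h_n[-1])           # logits for {poor, good}

model = SleepEfficiencyLSTM()
fake_day = torch.randn(8, 960, 1)  # 8 subjects, 960 one-minute epochs of activity
print(model(fake_day).shape)       # torch.Size([8, 2])
```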
Aerial imagery captured via unmanned aerial vehicles (UAVs) is playing an increasingly important role in disaster response. Unlike satellite imagery, aerial imagery can be captured and processed within hours rather than days. In addition, the spatial resolution of aerial imagery is an order of magnitude higher than the imagery produced by the most sophisticated commercial satellites today. Both the United States Federal Emergency Management Agency (FEMA) and the European Commission's Joint Research Center (JRC) have noted that aerial imagery will inevitably present a big data challenge. The purpose of this article is to get ahead of this future challenge by proposing a hybrid crowdsourcing and real-time machine learning solution to rapidly process large volumes of aerial data for disaster response in a time-sensitive manner. Crowdsourcing can be used to annotate features of interest in aerial images (such as damaged shelters and roads blocked by debris). These human-annotated features can then be used to train a supervised machine learning system to learn to recognize such features in new unseen images. In this article, we describe how this hybrid solution for image analysis can be implemented as a module (i.e., Aerial Clicker) to extend an existing platform called Artificial Intelligence for Disaster Response (AIDR), which has already been deployed to classify microblog messages during disasters using its Text Clicker module, including in response to Cyclone Pam, a category 5 cyclone that devastated Vanuatu in March 2015. The hybrid solution we present can be applied to both aerial and satellite imagery and has applications beyond disaster response such as wildlife protection, human rights, and archeological exploration. As a proof of concept, we recently piloted this solution using very high-resolution aerial photographs of a wildlife reserve in Namibia to support rangers with their wildlife conservation efforts (SAVMAP project, http://lasig.epfl.ch/savmap ). The results suggest that the platform we have developed to combine crowdsourcing and machine learning to make sense of large volumes of aerial images can be used for disaster response.
Over the last few years, with the immense popularity of the Kinect, there has been renewed interest in developing methods for human gesture and action recognition from 3D data. A number of approaches have been proposed that extract representative features from 3D depth data, a reconstructed 3D surface mesh or more commonly from the recovered estimate of the human skeleton. Recent advances in neuroscience have discovered a neural encoding of static 3D shapes in primate infero-temporal cortex that can be represented as a hierarchy of medial axis and surface features. We hypothesize a similar neural encoding might also exist for 3D shapes in motion and propose a hierarchy of dynamic medial axis structures at several spatio-temporal scales that can be modeled using a set of Linear Dynamical Systems (LDSs). We then propose novel discriminative metrics for comparing these sets of LDSs for the task of human activity recognition. Combined with simple classification frameworks, our proposed features and corresponding hierarchical dynamical models provide the highest human activity recognition rates as compared to state-of-the-art methods on several skeletal datasets.
The Microsoft Kinect camera is becoming increasingly popular in many areas aside from entertainment, including human activity monitoring and rehabilitation. Many people, however, fail to consider the reliability and accuracy of Kinect human pose estimation when they depend on it as a measuring system. In this paper, we compare the Kinect pose estimation (skeletonization) with more established techniques for pose estimation from motion capture data, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusion. We have evaluated six physical exercises aimed at coaching the elderly population. Experimental results present pose estimation accuracy rates and corresponding error bounds for the Kinect system.
This paper presents a framework for audio-driven human body motion analysis and synthesis. The video is analyzed to capture the time-varying posture of the dancer's body, whereas the musical audio signal is processed to extract the beat information. The human body posture is extracted from multiview video information without any human intervention using a novel marker-based algorithm based on annealing
Although the positive effects of exercise on the well-being and quality of independent living for older adults are well-accepted, many elderly individuals lack access to exercise facilities, or the skills and motivation to perform exercise at home. To provide a more engaging environment that promotes physical activity, various fitness applications have been proposed. Many of the available products, however, are geared toward a younger population and are not appropriate or engaging for an older population. To address these issues, we developed an automated interactive exercise coaching system using the Microsoft Kinect. The coaching system guides users through a series of video exercises, tracks and measures their movements, provides real-time feedback, and records their performance over time. Our system consists of exercises to improve balance, flexibility, strength, and endurance, with the aim of reducing fall risk and improving performance of daily activities. In this paper, we report on the development of the exercise system, discuss the results of our recent field pilot study with six independently-living elderly individuals, and highlight the lessons learned relating to the in-home system setup, user tracking, feedback, and exercise performance evaluation.
We present a framework for selecting the best audio features for audiovisual analysis and synthesis of dance figures. Dance figures are performed synchronously with the musical rhythm. They can be analyzed through the audio spectra using spectral and rhythmic musical features. In the proposed audio feature evaluation system, dance figures are manually labeled over the video stream. The music segments, which
In this paper we present a framework for analysis of dance figures from audio-visual data. Our audio-visual data is the multiview video of a dancing actor which is acquired using 8 synchronized cameras. The multi-camera motion capture technique of this framework is based on 3D tracking of the markers attached to the dancer's body, using stereo color information. The extracted
We aim to learn correlation models between music and dance performances to synthesize music-driven dance choreographies. The proposed framework learns statistical mappings from musical measures to dance figures using musical measure models, an exchangeable figures model, a choreography model, and dance figure models. Alternative dance choreographies are synthesized based on these statistical mappings. Objective and subjective evaluation results
... Moreover, motion patterns may span time intervals of different lengths with respect to their audio counterparts. The recent work by Sargin et al. addresses challenges similar to those mentioned above in the context of prosody-driven head gesture synthesis in [10]. ...
Much of the existing work on action recognition combines simple features (e.g., joint angle trajectories, optical flow, spatio-temporal video features) with somewhat complex classifiers or dynamical models (e.g., kernel SVMs, HMMs, LDSs, deep belief networks). Although successful, these approaches represent an action with a set of parameters that usually do not have any physical meaning. As a consequence, such approaches do not provide any qualitative insight that relates an action to the actual motion of the body or its parts. For example, it is not necessarily the case that clapping can be correlated to hand motion or that walking can be correlated to a specific combination of motions from the feet, arms and body. In this paper, we propose a new representation of human actions called Sequence of the Most Informative Joints (SMIJ), which is extremely easy to interpret. At each time instant, we automatically select a few skeletal joints that are deemed to be the most informative for performing the current action. The selection of joints is based on highly interpretable measures such as the mean or variance of joint angles, maximum angular velocity of joints, etc. We then represent an action as a sequence of these most informative joints. Our experiments on multiple databases show that the proposed representation is very discriminative for the task of human action recognition and performs better than several state-of-the-art algorithms.
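An abridged NumPy sketch of the SMIJ representation; the window length, top-k, and the variance-of-joint-angles informativeness measure are one choice among those the paper mentions, and the array shapes are assumed.

```python
import numpy as np

def smij(joint_angles: np.ndarray, window: int = 10, top_k: int = 3):
    """Sequence of Most Informative Joints.
    joint_angles: (num_frames, num_joints) array of joint-angle trajectories.
    Returns, per temporal window, the indices of the top_k joints ranked by
    the variance of their angles within the window."""
    sequence = []
    for start in range(0, joint_angles.shape[0] - window + 1, window):
        segment = joint_angles[start:start + window]
        informativeness = segment.var(axis=0)        # per-joint variance
        sequence.append(np.argsort(informativeness)[::-1][:top_k])
    return np.array(sequence)                        # (num_windows, top_k)

angles = np.random.rand(100, 20)   # 100 frames, 20 joints (dummy data)
print(smij(angles).shape)          # (10, 3)
```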
The extensive use of social media platforms, especially during disasters, creates unique opportunities for humanitarian organizations to gain situational awareness and launch relief operations accordingly. In addition to the textual content, people post overwhelming amounts of imagery data on social networks within minutes after a disaster hits. Studies point to the importance of this online imagery content for emergency response. Despite recent advances in the computer vision field, automatic processing of crisis-related social media imagery data remains a challenging task, because a majority of this content is redundant or irrelevant. In this paper, we present an image processing pipeline that comprises de-duplication and relevancy filtering mechanisms to collect and filter social media image content in real-time during a crisis event. Results obtained from extensive experiments on real-world crisis datasets demonstrate the significance of the proposed pipeline fo...
The widespread usage of social networks during mass convergence events, such as health emergencies and disease outbreaks, provides instant access to citizen-generated data that carry rich information about public opinions, sentiments, urgent needs, and situational reports. Such information can help authorities understand the emergent situation and react accordingly. Moreover, social media plays a vital role in tackling misinformation and disinformation. This work presents TBCOV, a large-scale Twitter dataset comprising more than two billion multilingual tweets related to the COVID-19 pandemic collected worldwide over a continuous period of more than one year. More importantly, several state-of-the-art deep learning models are used to enrich the data with important attributes, including sentiment labels, named-entities (e.g., mentions of persons, organizations, locations), user types, and gender information. Last but not least, a geotagging method is proposed to assign country, state...
We present a multi-camera system for audio-visual analysis of dance figures. The multi-view video of a dancing actor is acquired using 8 synchronized cameras. The motion capture technique of the proposed system is based on 3D tracking of the markers attached to the person's body in the scene. The resulting set of 3D points is then used to extract the body motion features as 3D displacement vectors, whereas mel-frequency cepstral coefficients (MFCCs) serve as the audio features. In the first stage of the multi-modal analysis phase, we perform Hidden Markov Model (HMM) based unsupervised temporal segmentation of the audio and body motion features (such as legs and arms), separately, to determine the recurrent elementary audio and body motion patterns. In the second stage, we investigate the correlation of body motion patterns with audio patterns, which can be used towards the estimation and synthesis of realistic audio-driven body animation.
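For flavor, a minimal HMM-based temporal segmentation along these lines can be written with the `hmmlearn` library (the state count and random features are purely illustrative): fit a Gaussian HMM to a feature sequence and read the decoded state sequence as a segmentation into recurrent patterns.

```python
import numpy as np
from hmmlearn import hmm

# Dummy motion-feature sequence: 500 frames of 6-D features (e.g., limb displacements).
features = np.random.rand(500, 6)

# Fit an unsupervised Gaussian HMM; each hidden state models one recurrent pattern.
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(features)

# The decoded state sequence segments the recording into elementary patterns.
states = model.predict(features)
boundaries = np.flatnonzero(np.diff(states)) + 1   # frame indices where segments change
print(states[:20], boundaries[:5])
```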
The goal of this project is to convert a given speaker's speech (the Source speaker) into another identified voice (the Target speaker), as well as analysing the face animation of the source to animate a 3D avatar imitating the source facial movements. We assume we have at our disposal a large amount of speech samples from the source and target voices with a reasonable amount of parallel data. Speech and video are processed separately and recombined at the end. Voice conversion is obtained in two steps: a voice mapping step followed by a speech synthesis step. In the speech synthesis step, we specifically propose to select speech frames directly from the large target speech corpus, in a way that recalls the unit-selection principle used in state-of-the-art text-to-speech systems. The output of this four-week project can be summarized as: a tailored source database, a set of open-source MATLAB and C files, and finally audio and video files obtained by our conversion method. Experimental r...
This paper summarizes our recent progress on computer vision technologies for physical therapy using accessible and affordable devices. We first introduce the remote health coaching system we built with Microsoft Kinect. Since the motion data captured by Kinect is noisy, we investigate the accuracy of Kinect data with respect to a high-accuracy motion capture system. We also propose an outlier removal algorithm based on the data distribution. In order to generate kinematic parameters from the noisy data captured by Kinect, we propose a kinematic filtering algorithm based on the Unscented Kalman Filter and the kinematic model of the human skeleton. The proposed algorithm can obtain smooth kinematic parameters with reduced noise compared to the kinematic parameters generated from the raw motion data from Kinect.
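To give a feel for the filtering step, here is a heavily simplified stand-in: a 1-D constant-velocity linear Kalman filter in NumPy smoothing one noisy joint coordinate. The paper's actual filter is an Unscented Kalman Filter over a full skeletal kinematic model, so treat this only as the linear intuition behind it.

```python
import numpy as np

def kalman_smooth_1d(z, dt=1/30, q=1e-3, r=1e-2):
    """Constant-velocity Kalman filter over one noisy joint coordinate z[t]."""
    F = np.array([[1, dt], [0, 1]])   # state transition (position, velocity)
    H = np.array([[1.0, 0.0]])        # we observe position only
    Q = q * np.eye(2)                 # process noise
    R = np.array([[r]])               # measurement noise
    x = np.array([z[0], 0.0])         # initial state
    P = np.eye(2)
    out = []
    for zt in z:
        x = F @ x                                       # predict
        P = F @ P @ F.T + Q
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
        x = x + K @ (np.array([zt]) - H @ x)            # update with measurement
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)

noisy = np.cumsum(np.random.randn(90)) * 0.01 + np.random.randn(90) * 0.05
print(kalman_smooth_1d(noisy)[:5])
```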
A person's weight status can have profound implications on their life, ranging from mental health, to longevity, to financial income. At the societal level, "fat shaming" and other forms of "sizeism" are a growing concern, while increasing obesity rates are linked to ever-rising healthcare costs. For these reasons, researchers from a variety of backgrounds are interested in studying obesity from all angles. To obtain data, traditionally, a person would have to accurately self-report their body-mass index (BMI) or would have to see a doctor to have it measured. In this paper, we show how computer vision can be used to infer a person's BMI from social media images. We hope that our tool, which we release, helps to advance the study of social aspects related to body weight.
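A thumbnail sketch of such an inference model (assuming PyTorch/torchvision with a ResNet backbone; the paper's exact computer vision system may differ): swap a CNN's classification head for a single regression output.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_bmi_regressor():
    """CNN backbone with a single-output regression head for BMI."""
    net = models.resnet50(weights=None)        # pretrained weights would normally be used
    net.fc = nn.Linear(net.fc.in_features, 1)  # regress BMI instead of classifying
    return net

model = build_bmi_regressor()
faces = torch.randn(4, 3, 224, 224)   # a batch of profile-picture crops
print(model(faces).shape)             # torch.Size([4, 1])
```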
We propose Nazr-CNN, a deep learning pipeline for object detection and fine-grained classification in images acquired from Unmanned Aerial Vehicles (UAVs). The UAVs were deployed in the island nation of Vanuatu to assess damage in the aftermath of Cyclone Pam in 2015. The images were labeled by a crowdsourcing effort, and the labeling categories consisted of fine-grained levels of damage to built structures. Nazr-CNN consists of two components. The function of the first component is to localize objects (e.g. houses) in an image by carrying out a pixel-level classification. In the second component, a hidden layer of a Convolutional Neural Network (CNN) is used to encode Fisher Vectors (FV) of the segments generated from the first component in order to help discriminate between different levels of damage. Since our data set is relatively small, a pre-trained network for pixel-level classification and FV encoding was used. Nazr-CNN attains promising results both for object detection an...