ACM Transactions on Multimedia Computing, Communications, and Applications
We consider the task of temporal human action localization in lifestyle vlogs. We introduce a novel dataset consisting of manual annotations of temporal localization for 13,000 narrated actions in 1,200 video clips. We present an extensive analysis of this data, which allows us to better understand how the language and visual modalities interact throughout the videos. We propose a simple yet effective method to localize the narrated actions based on their expected duration. Through several experiments and analyses, we show that our method provides information complementary to previous methods and leads to improvements over prior work on the task of temporal action localization.
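The duration-based localization idea described above could be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the verb list, the duration values, and the heuristic of centering a window on the narration time are all assumptions made for the example.

```python
# Hypothetical sketch of duration-prior action localization.
# Assumption: each action is narrated near the moment it happens,
# so we center a window of the verb's expected duration on that time.
from dataclasses import dataclass

@dataclass
class NarratedAction:
    verb: str              # e.g. "chop"
    narration_time: float  # second in the video at which the action is narrated

# Illustrative per-verb expected durations (seconds); values are invented.
EXPECTED_DURATION = {"chop": 8.0, "pour": 4.0, "stir": 10.0}
DEFAULT_DURATION = 6.0

def localize(action: NarratedAction) -> tuple[float, float]:
    """Predict a (start, end) interval centered on the narration time."""
    d = EXPECTED_DURATION.get(action.verb, DEFAULT_DURATION)
    start = max(0.0, action.narration_time - d / 2)
    return (start, start + d)
```

For example, an action narrated as "chop" at second 20 would be localized to the interval (16.0, 24.0) under these assumed priors.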
Events and situations unfold quickly in our modern world, generating streams of Internet articles, photos, and videos. The ability to automatically sort through this wealth of information would allow us to identify which pieces of information are most important and credible, and how trends unfold over time. In this paper, we present the first piece of a system to sort through large amounts of political data from the web. Our system takes in raw multimodal input (e.g., text, images, and videos) and generates a knowledge graph connecting entities, events, and relations in meaningful ways. This work is part of the DARPA-funded Active Interpretation of Disparate Alternatives (AIDA) project, which aims to automatically build a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, building the first step of the overall system. Our approach is outlined in Figure 1.
Understanding current world events in real time involves sifting through news articles, tweets, photos, and videos from many different perspectives. The goal of the DARPA-funded AIDA project is to automate much of this process, building a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, building the first step of the overall system. Given raw multimodal input (e.g., text, images, video), our goal is to generate a knowledge graph with entities, events, and relations. Figure 1 shows an overview of our pipeline. The first stage is pre-processing: translating all the raw documents, as well as transcribing and translating audio and video data. All the translated information is input to our main processing module, which extracts entities, events, and relations. Entities are extracted from both text and video data. The final stage of the pipeline is output generation.
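The pipeline stages named in the abstract (pre-processing, extraction, output generation) could be sketched as a skeleton like the one below. The stage names follow the abstract; every function body is a placeholder assumption, and the example triple is invented purely to show the output shape.

```python
# Illustrative three-stage pipeline skeleton; not the AIDA TA1 system itself.

def preprocess(doc: dict) -> dict:
    """Stage 1 (stub): translation and audio/video transcription would run here."""
    return {"text": doc.get("text", ""), "frames": doc.get("frames", [])}

def extract(doc: dict) -> list[tuple[str, str, str]]:
    """Stage 2 (stub): entity/event/relation extraction would run here.
    Returns (subject, relation, object) triples; this one is invented."""
    return [("protester", "attends", "rally")]

def build_graph(docs: list[dict]) -> list[tuple[str, str, str]]:
    """Stage 3: collect extracted triples into one knowledge-graph edge list."""
    graph: list[tuple[str, str, str]] = []
    for doc in docs:
        graph.extend(extract(preprocess(doc)))
    return graph
```

The edge-list representation here is only one possible output format; a real system would emit a richer graph with typed nodes and provenance.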
2016 IEEE 12th International Conference on Intelligent Computer Communication and Processing (ICCP), 2016
The paper introduces a novel and efficient algorithm for determining the free space in road driving assistance scenarios. The input data for the algorithm is gathered from a stereo camera and is processed as a disparity image. Each column of the disparity image is segmented based on its relative extreme points. The idea is inspired by a time-series compression article that presents a method for segmenting data measured at equal intervals of time (time series): electrocardiograms, monthly stock-exchange data, etc. The novelty of the method consists in adapting an idea from a different area of interest to an image recognition purpose. Compared to existing algorithms in the driving assistance field that share the same goal, the proposed method achieves great adaptability and linear time complexity. The adaptability of the method is worth mentioning, as it gives good results both on precise data gathered with a lidar scanner and on noisy disparity maps inferred from a stereo camera. The algorithm filters most of the measurement errors while preserving the points of interest that delimit the road, objects, or sky. Because the filtering steps preserve the data of interest, additional post-processing steps are no longer required, which keeps the running time low.
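The column-wise segmentation by relative extreme points could look roughly like the sketch below. This is a guess at the general technique described in the abstract, not the paper's algorithm: the extremum test and the treatment of column endpoints are assumptions, and real disparity data would need the noise-filtering steps the paper describes.

```python
# Hypothetical sketch: segment one disparity-image column at its local extrema.

def relative_extrema(column: list[int]) -> list[int]:
    """Indices of the column's relative extreme points (local minima/maxima).
    The endpoints are included so segments cover the whole column (assumption)."""
    idx = [0]
    for i in range(1, len(column) - 1):
        left, mid, right = column[i - 1], column[i], column[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            idx.append(i)
    idx.append(len(column) - 1)
    return idx

def segment_column(column: list[int]) -> list[tuple[int, int]]:
    """Split the column into segments between consecutive extreme points.
    One pass per column, so the whole image is processed in linear time."""
    ex = relative_extrema(column)
    return [(ex[i], ex[i + 1]) for i in range(len(ex) - 1)]
```

For instance, a column of disparities [5, 3, 1, 2, 4, 4, 2] has a local minimum at index 2, giving the two segments (0, 2) and (2, 6).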
Papers by Oana Ignat