I am a Researcher in Video Processing in the Sigmedia group of the Department of Electronic & Electrical Engineering at Trinity College Dublin.
My research is concerned with media signal processing applications related to film postproduction and film restoration. The aim of my research is to develop tools that can enhance the cinematic experience of immersion.

Phone: +353 1896 3818
Address: AAP 2.18, 5 College Green, Trinity College Dublin
Interactive object cutout tools are the cornerstone of the image editing workflow. Recent deep-learning-based interactive segmentation algorithms have made significant progress in handling complex images, and rough binary selections can typically be obtained with just a few clicks. Yet deep learning techniques tend to plateau once this rough selection has been reached. In this work, we interpret this plateau as the inability of current algorithms to sufficiently leverage each user interaction, and also as a limitation of current training/testing datasets. We propose a novel interactive architecture and a novel training scheme that are both tailored to better exploit the user workflow. We also show that significant improvements can be further gained by introducing a synthetic training dataset that is specifically designed for complex object boundaries. Comprehensive experiments support our approach, and our network achieves state-of-the-art performance.
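As a hedged illustration of how user clicks are commonly fed to interactive segmentation networks (the paper's exact click encoding is not reproduced here), a minimal sketch that turns foreground/background clicks into distance-map channels stacked with the RGB input:

```python
import numpy as np

def encode_clicks(height, width, clicks):
    """Turn user clicks into two distance-map channels (foreground,
    background); clicks: iterable of (row, col, is_foreground) tuples."""
    maps = np.full((2, height, width), np.inf, dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for row, col, is_fg in clicks:
        dist = np.sqrt((ys - row) ** 2 + (xs - col) ** 2)
        channel = 0 if is_fg else 1
        maps[channel] = np.minimum(maps[channel], dist)
    return np.minimum(maps, 255.0) / 255.0  # clip and normalise to [0, 1]
```

The two maps are concatenated with the three colour channels to form a 5-channel network input, so each new click directly reshapes the input the network sees.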
Cutting out an object and estimating its opacity mask, known as image matting, is a key task in many image editing applications. Deep learning approaches have made significant progress by adapting the encoder-decoder architecture of segmentation networks. However, most of the existing networks only predict the alpha matte, and post-processing methods must then be used to recover the original foreground and background colours in the transparent regions. Recently, two methods have shown improved results by also estimating the foreground colours, but at a significant computational and memory cost. In this paper, we propose a low-cost modification to alpha matting networks to also predict the foreground and background colours. We study variations of the training regime and explore a wide range of existing and novel loss functions for the joint prediction. Our method achieves state-of-the-art performance on the Adobe Composition-1k dataset for alpha matte and composite colour quality.
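A minimal sketch of the kind of joint objective such a modification enables, assuming an L1 alpha term plus a composition term that ties the predicted colours back to the input image (the weights and the exact set of losses studied in the paper are not reproduced here):

```python
import torch

def joint_matting_loss(alpha_p, fg_p, bg_p, alpha_gt, image, w_comp=1.0):
    """L1 loss on the predicted alpha plus a composition term: the
    predicted foreground/background must recompose the observed image."""
    alpha_loss = (alpha_p - alpha_gt).abs().mean()
    recomposed = alpha_p * fg_p + (1.0 - alpha_p) * bg_p
    comp_loss = (recomposed - image).abs().mean()
    return alpha_loss + w_comp * comp_loss
```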
The current state-of-the-art alpha matting methods mainly rely on the trimap as the sole secondary guidance to estimate alpha. This paper investigates the effect of utilising the background information, as well as the trimap, in the process of alpha calculation. To achieve this goal, a state-of-the-art method, AlphaGAN, is adopted and modified to process the background information as an extra input channel. Extensive experiments are performed to analyse the effect of the background information in image and video matting, such as training with mildly and heavily distorted backgrounds. Based on quantitative evaluations performed on the Adobe Composition-1k dataset, the proposed pipeline significantly outperforms state-of-the-art methods on the AlphaMatting benchmark metrics.
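A minimal sketch of the extra-input-channel idea, assuming the background plate is already registered to the frame (tensor shapes are illustrative):

```python
import torch

def build_matting_input(image, trimap, background):
    """Stack the RGB image (3 channels), trimap (1 channel) and the known
    background plate (3 channels) into a 7-channel network input."""
    # image, background: (B, 3, H, W); trimap: (B, 1, H, W), all in [0, 1]
    return torch.cat([image, trimap, background], dim=1)
```

The first convolution of the matting network then simply needs to accept 7 input channels instead of 4.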
Machine Learning and Knowledge Discovery in Databases, 2019
With the rapid proliferation of multimedia data on the internet, there has been a fast rise in the creation of videos for viewers. Viewers can now skip the advertisement breaks in these videos using ad blockers and 'skip ad' buttons, bringing online marketing and publicity to a standstill. In this paper, we demonstrate a system that can effectively integrate a new advertisement into a video sequence. We use state-of-the-art techniques from deep learning and computational photogrammetry for effective detection of existing adverts and seamless integration of new adverts into video sequences. This is helpful for targeted advertisement, paving the way for next-gen publicity.
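As a hedged sketch of the geometric integration step only (the paper's full detection-plus-integration pipeline is not reproduced), a new advert can be warped onto a detected billboard quadrilateral with a planar homography; the `quad` input and the hard compositing are illustrative simplifications:

```python
import cv2
import numpy as np

def replace_advert(frame, new_ad, quad):
    """Warp new_ad onto a detected quadrilateral (x, y corners, clockwise
    from top-left) and composite it over the frame."""
    h, w = new_ad.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(quad)
    H, _ = cv2.findHomography(src, dst)
    size = (frame.shape[1], frame.shape[0])
    warped = cv2.warpPerspective(new_ad, H, size)
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H, size)
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]  # a real system would blend softly
    return out
```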
2018 25th IEEE International Conference on Image Processing (ICIP), 2018
Many defocus blur estimation methods have been proposed in recent years but, when applied to video sequences in a frame-by-frame manner, they typically exhibit temporal inconsistencies or flickering. This paper presents a temporal coherence scheme that can be coupled to any existing defocus blur estimation method for still images, aiming to produce spatio-temporally coherent defocus blur map videos. The proposed method is based on a Kalman filter applied at the patch level. Experimental results show that the proposed method can smooth out undesirable temporal fluctuations whilst still preserving the abrupt local appearance changes due to motion, occlusions or dis-occlusions.
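A minimal sketch of a per-patch scalar Kalman filter with a random-walk state model, to illustrate the temporal smoothing idea (the paper's filter design, including its handling of abrupt changes at occlusions, is richer than this):

```python
import numpy as np

def kalman_smooth_blur(patch_series, q=1e-3, r=1e-2):
    """Temporally smooth per-patch defocus estimates.

    patch_series: (T, N) per-frame blur estimates for N patches.
    q: process noise (how fast blur may drift between frames).
    r: measurement noise (how much single-frame estimates are trusted)."""
    T, N = patch_series.shape
    x = patch_series[0].copy()              # state estimate per patch
    p = np.ones(N)                          # state variance per patch
    out = np.empty_like(patch_series)
    out[0] = x
    for t in range(1, T):
        p = p + q                           # predict (random-walk model)
        k = p / (p + r)                     # Kalman gain
        x = x + k * (patch_series[t] - x)   # update with new measurement
        p = (1.0 - k) * p
        out[t] = x
    return out
```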
2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2019
Online videos have witnessed an unprecedented growth over the last decade, owing to the wide range of content being created. This provides advertising and marketing agencies with a plethora of opportunities for targeted advertisements. Such techniques involve replacing an existing advertisement in a video frame with a new advertisement. However, such post-processing of online videos is mostly done manually by video editors, which is cumbersome and time-consuming. In this paper, we propose DeepAds, a deep neural network based on a simple encoder-decoder architecture, that can accurately localize the position of an advert in a video frame. Our approach of localizing billboards in outdoor scenes using neural nets is the first of its kind and achieves the best performance. We benchmark our proposed method against other semantic segmentation algorithms on a public dataset of outdoor scenes with manually annotated billboard binary maps.
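A minimal sketch of an encoder-decoder segmentation network in this spirit; DeepAds' actual layer configuration is not reproduced here, and the channel widths are illustrative:

```python
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder for binary (billboard vs. background)
    segmentation; outputs per-pixel logits at the input resolution."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):  # x: (B, 3, H, W) with H, W divisible by 4
        return self.decoder(self.encoder(x))
```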
Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, 2021
Over the past decade, the evolution of video-sharing platforms has attracted a significant amount of investment in contextual advertising. Common contextual advertising platforms utilize the information provided by users to integrate 2D visual ads into videos. The existing platforms face many technical challenges, such as ad integration with respect to occluding objects and 3D ad placement. This paper presents a Video Advertisement Placement & Integration (Adverts) framework, which is capable of perceiving the 3D geometry of the scene and the camera motion to blend 3D virtual objects into videos and create the illusion of reality. The proposed framework contains several modules, such as monocular depth estimation, object segmentation, background-foreground separation, alpha matting and camera tracking. Our experiments conducted using the Adverts framework indicate the significant potential of this system in contextual ad integration, pushing the limits of the advertising industry using mixed reality technologies.
2019 16th International Conference on Machine Vision Applications (MVA), 2019
With the advent of faster internet services and the growth of multimedia content, we observe a massive growth in the number of online videos. Users generate this video content at an unprecedented rate, owing to the use of smartphones and other hand-held video capturing devices. This creates immense potential for advertising and marketing agencies to create personalized content for users. In this paper, we attempt to assist video editors in generating augmented video content by proposing candidate spaces in video frames. We propose and release a large-scale dataset of outdoor scenes, along with manually annotated maps for candidate spaces. We also benchmark several deep-learning-based semantic segmentation algorithms on this proposed dataset.
This paper provides an overview of our participation in the TRECVID 2018 Storytelling Linking task. Our approach uses an RNN-based neural network to learn a semantic representation of text (news topics), images and videos (collected from Twitter and Flickr posts) in the same latent space. We applied a two-stage (pre-train + fine-tune) learning architecture to train and adjust the model (using Flickr30k and labels from online search as additional data). During the search phase of the task, we take a different strategy to generate five different runs, by leveraging video-length normalization and controlling the training source.
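As a hedged illustration of training embeddings of different modalities into one shared latent space (this is a generic batch-wise ranking loss, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def crossmodal_ranking_loss(text_emb, media_emb, margin=0.2):
    """Pull matched text/media pairs together in the shared space; the
    other items in the batch act as negatives."""
    t = F.normalize(text_emb, dim=1)
    m = F.normalize(media_emb, dim=1)
    sim = t @ m.t()                        # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)          # similarity of matched pairs
    hinge = (margin + sim - pos).clamp(min=0)
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return hinge.masked_fill(eye, 0.0).mean()  # exclude the positives
```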
A combination of computer vision and projector-based illumination opens the possibility for a new type of computer vision technologies. One of them is augmented reality: selectively illuminating the scene to improve or manipulate how reality itself, rather than its display, appears to a human. One such example is the Smart Headlight being developed at Carnegie Mellon University's Robotics Institute. The project team has been working on a set of new capabilities for the headlight, such as making raindrops and snowflakes disappear, allowing for the high beams to always be on without glare, and enhancing the appearance of objects of interest. Using the Smart Headlight as an example, this talk will further discuss various ideas, concepts and possible applications of coaxial and non-coaxial projector-camera systems. About the speaker: Professor Takeo Kanade is the U. A. and Helen Whitaker University Professor of Computer Science and Robotics at Carnegie Mellon University. He rec...
Online video advertising gives content providers the ability to deliver compelling content, reach a growing audience, and generate additional revenue from online media. Recently, advertising strategies have been designed to look for original advert(s) in a video frame and replace them with new adverts. These strategies, popularly known as product placement or embedded marketing, greatly help marketing agencies to reach a wider audience. However, in the existing literature, such detection of candidate frames in a video sequence for the purpose of advert integration is done manually. In this paper, we propose a deep-learning architecture called ADNet that automatically detects the presence of advertisements in video frames. Our approach is the first of its kind to automatically detect the presence of adverts in a video frame, and achieves state-of-the-art results on a public dataset.
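A minimal sketch of a binary "advert present?" frame classifier built by fine-tuning a pretrained backbone; this is a generic stand-in, not ADNet's actual architecture, and it assumes torchvision 0.13+ for the weights API:

```python
import torch.nn as nn
from torchvision import models

def build_advert_classifier():
    """ImageNet-pretrained ResNet-18 with its final layer replaced by a
    two-way (advert / no advert) classification head."""
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    net.fc = nn.Linear(net.fc.in_features, 2)
    return net
```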
This research project aims to provide an AI-enhanced productivity tool for media generation to video editors and producers. The goal is to improve productivity among producers and artists in terms of augmenting video content with new objects or effects in a natural and appealing way. Furthermore, the project aims to bridge the gap between offline augmented reality technologies, occlusion handling, and camera tracking.
Proceedings of the 2021 International Conference on Multimodal Interaction, 2021
Live video comments, or “danmu”, are an emerging social feature on Asian online video platforms. These time-synchronous comments are overlaid on the video playback and uniquely enrich the viewing experience, engaging hundreds of millions of users in rich community discussions. The presence of danmu comments has become a determining factor for video popularity. Recent work has proposed a model to automatically generate comments, but very little work has so far considered the problem of where to insert the comments in the video timeline. In this work, we propose to address both the what and the where of automatic danmu generation, by jointly predicting the danmu comment content to be generated as well as its optimal insertion point in the video timeline. Our model exploits the video visual content, subtitles, audio signals, and any existing surrounding comments in one unified architecture, and can handle scenarios where the videos are already heavily commented or where the video has no comments yet. Experiments show that our proposed unified framework generally outperforms state-of-the-art comment generation methods.
Proceedings of the Third Workshop on Multimodal Artificial Intelligence, 2021
Live video comments, or "danmu", are an emerging feature on Asian online video platforms. Danmu a... more Live video comments, or "danmu", are an emerging feature on Asian online video platforms. Danmu are time-synchronous comments that are overlaid on a video playback. These comments uniquely enrich the experience and engagement of their users, and have become a determining factor in the popularity of videos on these platforms. Similar to the "cold start problem" in recommender systems, a video will only start to attract attention when sufficient danmu comments have been posted on it. We study this video cold start problem and examine how new comments can be generated automatically on less-commented videos. We propose to predict danmu comments to promote user engagement, by exploiting a multi-modal combination of the video visual content, subtitles, audio signals, and any surrounding comments (when they exist). Our method fuses these multiple modalities in a transformer network which is then trained for different comment density scenarios. We evaluate our proposed system through both a retrieval based evaluation method, as well as human judgement. Results show that our proposed system improves significantly over stateof-the-art methods.
2016 IEEE International Conference on Image Processing (ICIP), 2016
Cutting out an object and estimating its transparency mask is a key task in many applications. We build on the work on closed-form matting by Levin et al. [1], which is used at the core of many matting techniques, and propose an alternative formulation that offers more flexible control over the matting priors. We also show that this new approach is efficient at upscaling transparency maps from coarse estimates.
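For context, the closed-form matting baseline referenced above minimises a quadratic cost built on the matting Laplacian $L$; with user constraints encoded by a diagonal selector $D_S$ over scribbled pixels and their target values $b_S$, the alpha map is the solution of a sparse linear system:

```latex
J(\alpha) = \alpha^{\top} L \alpha
          + \lambda \,(\alpha - b_S)^{\top} D_S \,(\alpha - b_S)
\quad\Longrightarrow\quad
(L + \lambda D_S)\,\alpha = \lambda\, D_S\, b_S .
```

The alternative formulation proposed in the paper changes how these priors enter the cost; the system above is only the standard baseline it builds upon.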
This paper introduces a new database of freely available stereo-3D content designed to facilitate research in stereo post-production. It describes the structure and content of the database and provides some details about how the material was gathered. The database includes examples of many of the scenarios characteristic of broadcast footage. Material was gathered at different locations, including a studio with controlled lighting and both indoor and outdoor on-location sites with more restricted lighting control. An intended consequence of gathering the material is that the database contains examples of degradations that would commonly be present in real-world scenarios. This paper describes one such artefact, caused by uneven exposure in the stereo views, which leads to saturation in the over-exposed view. An algorithm is proposed that replaces the saturated data by interpolating data from the unsaturated view in the wavelet domain.
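A rough sketch of the repair idea under strong assumptions (the views are registered, intensities are in [0, 1], and coefficients are simply swapped rather than interpolated as in the paper's actual algorithm):

```python
import numpy as np
import pywt

def _resize(mask, shape):
    """Nearest-neighbour resize of the saturation mask to a band's shape."""
    rows = np.linspace(0, mask.shape[0] - 1, shape[0]).astype(int)
    cols = np.linspace(0, mask.shape[1] - 1, shape[1]).astype(int)
    return mask[np.ix_(rows, cols)]

def repair_saturated(over, under, thresh=0.98, wavelet="db4", level=3):
    """Swap in wavelet coefficients from the registered unsaturated view
    wherever the over-exposed view is saturated."""
    sat = (over >= thresh).astype(np.float32)
    co = pywt.wavedec2(over, wavelet, level=level)
    cu = pywt.wavedec2(under, wavelet, level=level)
    merged = []
    for band_o, band_u in zip(co, cu):
        if isinstance(band_o, np.ndarray):      # approximation band
            m = _resize(sat, band_o.shape)
            merged.append(band_o * (1 - m) + band_u * m)
        else:                                   # (cH, cV, cD) detail bands
            merged.append(tuple(
                bo * (1 - _resize(sat, bo.shape))
                + bu * _resize(sat, bo.shape)
                for bo, bu in zip(band_o, band_u)))
    rec = pywt.waverec2(merged, wavelet)
    return rec[:over.shape[0], :over.shape[1]]  # crop any wavelet padding
```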
IET 4th European Conference on Visual Media Production (CVMP 2007), 2007
This paper presents an automatic method to enhance video presentations for distance learning applications. From material recorded by a fixed, non-professional camera, the system matches the slides displayed during the presentation with their electronic versions. The slide recognition process consists of two phases. In the first phase, the area where the slides are displayed is located by colour matching; shot detection is then performed in the display area and a frame is selected for each slide displayed in the video. The second phase consists of matching the frames previously selected to the electronic versions of the slides. Using a correlation measure, a likelihood is computed for each electronic slide of corresponding to the slides displayed in the selected frames. A prior distribution is then defined to model the probability of each possible slide transition. Finally, the most probable sequence of slides displayed in the video is determined using the Viterbi algorithm. The results show that the method presented is robust to luminance conditions and occlusion by the lecturer, and can be applied to a large variety of presentations.
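A minimal sketch of the Viterbi decoding step described above, taking per-frame slide likelihoods and a transition prior as inputs (both are assumed precomputed; their construction follows the correlation measure and transition model in the paper):

```python
import numpy as np

def most_probable_slides(log_lik, log_trans, log_prior):
    """Viterbi decoding of the displayed slide sequence.

    log_lik:   (T, S) log-likelihood of each slide for each selected frame.
    log_trans: (S, S) log-probability of moving from slide i to slide j.
    log_prior: (S,)   log-probability of the initial slide."""
    T, S = log_lik.shape
    delta = log_prior + log_lik[0]          # best score ending in each slide
    back = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_lik[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):           # trace back the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```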