I am a Researcher in Video Processing in the Sigmedia group of the Department of Electronic & Electrical Engineering at Trinity College Dublin.
My research is concerned with media signal processing applications related to film postproduction and film restoration. The aim of my research is to develop tools that can enhance the cinematic experience of immersion.

Phone: +353 1896 3818
Address: AAP 2.18, 5 College Green, Trinity College Dublin
Interactive object cutout tools are the cornerstone of the image editing workflow. Recent deep-learning-based interactive segmentation algorithms have made significant progress in handling complex images, and rough binary selections can typically be obtained with just a few clicks. Yet deep learning techniques tend to plateau once this rough selection has been reached. In this work, we interpret this plateau as the inability of current algorithms to sufficiently leverage each user interaction, and also as a limitation of current training/testing datasets. We propose a novel interactive architecture and a novel training scheme that are both tailored to better exploit the user workflow. We also show that significant improvements can be further gained by introducing a synthetic training dataset that is specifically designed for complex object boundaries. Comprehensive experiments support our approach, and our network achieves state-of-the-art performance.
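As a hedged illustration of how user clicks are commonly fed to interactive segmentation networks (the paper's exact click encoding is not reproduced here), a minimal sketch that turns foreground/background clicks into distance-map channels stacked with the RGB input:

```python
import numpy as np

def encode_clicks(height, width, clicks):
    """Turn user clicks into two distance-map channels (foreground,
    background); clicks: iterable of (row, col, is_foreground) tuples."""
    maps = np.full((2, height, width), np.inf, dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for row, col, is_fg in clicks:
        dist = np.sqrt((ys - row) ** 2 + (xs - col) ** 2)
        channel = 0 if is_fg else 1
        maps[channel] = np.minimum(maps[channel], dist)
    return np.minimum(maps, 255.0) / 255.0  # clip and normalise to [0, 1]
```

The two maps are concatenated with the three colour channels to form a 5-channel network input, so each new click directly reshapes the input the network sees.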
Cutting out an object and estimating its opacity mask, known as image matting, is a key task in many image editing applications. Deep learning approaches have made significant progress by adapting the encoder-decoder architecture of segmentation networks. However, most of the existing networks only predict the alpha matte, and post-processing methods must then be used to recover the original foreground and background colours in the transparent regions. Recently, two methods have shown improved results by also estimating the foreground colours, but at a significant computational and memory cost. In this paper, we propose a low-cost modification to alpha matting networks to also predict the foreground and background colours. We study variations of the training regime and explore a wide range of existing and novel loss functions for the joint prediction. Our method achieves state-of-the-art performance on the Adobe Composition-1k dataset for alpha matte and composite colour quality.
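A minimal sketch of the kind of joint objective such a modification enables, assuming an L1 alpha term plus a composition term that ties the predicted colours back to the input image (the weights and the exact set of losses studied in the paper are not reproduced here):

```python
import torch

def joint_matting_loss(alpha_p, fg_p, bg_p, alpha_gt, image, w_comp=1.0):
    """L1 loss on the predicted alpha plus a composition term: the
    predicted foreground/background must recompose the observed image."""
    alpha_loss = (alpha_p - alpha_gt).abs().mean()
    recomposed = alpha_p * fg_p + (1.0 - alpha_p) * bg_p
    comp_loss = (recomposed - image).abs().mean()
    return alpha_loss + w_comp * comp_loss
```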
The current state-of-the-art alpha matting methods mainly rely on the trimap as the sole secondary guidance to estimate alpha. This paper investigates the effect of utilising the background information, as well as the trimap, in the process of alpha calculation. To achieve this goal, a state-of-the-art method, AlphaGAN, is adopted and modified to process the background information as an extra input channel. Extensive experiments are performed to analyse the effect of the background information in image and video matting, such as training with mildly and heavily distorted backgrounds. Based on quantitative evaluations performed on the Adobe Composition-1k dataset, the proposed pipeline significantly outperforms state-of-the-art methods on the AlphaMatting benchmark metrics.
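A minimal sketch of the extra-input-channel idea, assuming the background plate is already registered to the frame (tensor shapes are illustrative):

```python
import torch

def build_matting_input(image, trimap, background):
    """Stack the RGB image (3 channels), trimap (1 channel) and the known
    background plate (3 channels) into a 7-channel network input."""
    # image, background: (B, 3, H, W); trimap: (B, 1, H, W), all in [0, 1]
    return torch.cat([image, trimap, background], dim=1)
```

The first convolution of the matting network then simply needs to accept 7 input channels instead of 4.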
Machine Learning and Knowledge Discovery in Databases, 2019
With the rapid proliferation of multimedia data on the internet, there has been a fast rise in the creation of videos for viewers. Viewers can now skip the advertisement breaks in these videos using ad blockers and 'skip ad' buttons, bringing online marketing and publicity to a standstill. In this paper, we demonstrate a system that can effectively integrate a new advertisement into a video sequence. We use state-of-the-art techniques from deep learning and computational photogrammetry for effective detection of existing adverts and seamless integration of new adverts into video sequences. This is helpful for targeted advertisement, paving the way for next-gen publicity.
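As a hedged sketch of the geometric integration step only (the paper's full detection-plus-integration pipeline is not reproduced), a new advert can be warped onto a detected billboard quadrilateral with a planar homography; the `quad` input and the hard compositing are illustrative simplifications:

```python
import cv2
import numpy as np

def replace_advert(frame, new_ad, quad):
    """Warp new_ad onto a detected quadrilateral (x, y corners, clockwise
    from top-left) and composite it over the frame."""
    h, w = new_ad.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(quad)
    H, _ = cv2.findHomography(src, dst)
    size = (frame.shape[1], frame.shape[0])
    warped = cv2.warpPerspective(new_ad, H, size)
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H, size)
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]  # a real system would blend softly
    return out
```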
2018 25th IEEE International Conference on Image Processing (ICIP), 2018
Many defocus blur estimation methods have been proposed in recent years but, when applied to video sequences in a frame-by-frame manner, they typically exhibit temporal inconsistencies or flickering. This paper presents a temporal coherence scheme that can be coupled to any existing defocus blur estimation method for still images, aiming to produce spatio-temporally coherent defocus blur map videos. The proposed method is based on a Kalman filter applied at the patch level. Experimental results show that the proposed method can smooth out undesirable temporal fluctuations whilst still preserving the abrupt local appearance changes due to motion, occlusions or dis-occlusions.
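A minimal sketch of a per-patch scalar Kalman filter with a random-walk state model, to illustrate the temporal smoothing idea (the paper's filter design, including its handling of abrupt changes at occlusions, is richer than this):

```python
import numpy as np

def kalman_smooth_blur(patch_series, q=1e-3, r=1e-2):
    """Temporally smooth per-patch defocus estimates.

    patch_series: (T, N) per-frame blur estimates for N patches.
    q: process noise (how fast blur may drift between frames).
    r: measurement noise (how much single-frame estimates are trusted)."""
    T, N = patch_series.shape
    x = patch_series[0].copy()              # state estimate per patch
    p = np.ones(N)                          # state variance per patch
    out = np.empty_like(patch_series)
    out[0] = x
    for t in range(1, T):
        p = p + q                           # predict (random-walk model)
        k = p / (p + r)                     # Kalman gain
        x = x + k * (patch_series[t] - x)   # update with new measurement
        p = (1.0 - k) * p
        out[t] = x
    return out
```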
2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2019
Online videos have witnessed an unprecedented growth over the last decade, owing to the wide range of content being created. This provides advertising and marketing agencies with a plethora of opportunities for targeted advertisements. Such techniques involve replacing an existing advertisement in a video frame with a new advertisement. However, such post-processing of online videos is mostly done manually by video editors, which is cumbersome and time-consuming. In this paper, we propose DeepAds, a deep neural network based on a simple encoder-decoder architecture, that can accurately localize the position of an advert in a video frame. Our approach of localizing billboards in outdoor scenes using neural nets is the first of its kind and achieves the best performance. We benchmark our proposed method against other semantic segmentation algorithms on a public dataset of outdoor scenes with manually annotated billboard binary maps.
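A minimal sketch of an encoder-decoder segmentation network in this spirit; DeepAds' actual layer configuration is not reproduced here, and the channel widths are illustrative:

```python
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder for binary (billboard vs. background)
    segmentation; outputs per-pixel logits at the input resolution."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):  # x: (B, 3, H, W) with H, W divisible by 4
        return self.decoder(self.encoder(x))
```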
Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, 2021
Over the past decade, the evolution of video-sharing platforms has attracted a significant amount of investment in contextual advertising. Common contextual advertising platforms utilize the information provided by users to integrate 2D visual ads into videos. The existing platforms face many technical challenges, such as ad integration with respect to occluding objects and 3D ad placement. This paper presents a Video Advertisement Placement & Integration (Adverts) framework, which is capable of perceiving the 3D geometry of the scene and the camera motion to blend 3D virtual objects into videos and create the illusion of reality. The proposed framework contains several modules, such as monocular depth estimation, object segmentation, background-foreground separation, alpha matting and camera tracking. Our experiments conducted using the Adverts framework indicate the significant potential of this system in contextual ad integration, pushing the limits of the advertising industry using mixed reality technologies.
2019 16th International Conference on Machine Vision Applications (MVA), 2019
With the advent of faster internet services and the growth of multimedia content, we observe a massive growth in the number of online videos. Users generate this video content at an unprecedented rate, owing to the use of smartphones and other hand-held video capturing devices. This creates immense potential for advertising and marketing agencies to create personalized content for users. In this paper, we attempt to assist video editors in generating augmented video content by proposing candidate spaces in video frames. We propose and release a large-scale dataset of outdoor scenes, along with manually annotated maps for candidate spaces. We also benchmark several deep-learning-based semantic segmentation algorithms on this proposed dataset.
This paper provides an overview of our participation in the TRECVID 2018 Storytelling Linking task. Our approach uses an RNN-based neural network to learn a semantic representation of text (news topics), images and videos (collected from Twitter and Flickr posts) in the same latent space. We applied a two-stage (pre-train + fine-tune) learning architecture to train and adjust the model (using Flickr30k and labels from online search as additional data). During the search phase of the task, we take a different strategy to generate five different runs, by leveraging video-length normalization and controlling the training source.
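As a hedged illustration of training embeddings of different modalities into one shared latent space (this is a generic batch-wise ranking loss, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def crossmodal_ranking_loss(text_emb, media_emb, margin=0.2):
    """Pull matched text/media pairs together in the shared space; the
    other items in the batch act as negatives."""
    t = F.normalize(text_emb, dim=1)
    m = F.normalize(media_emb, dim=1)
    sim = t @ m.t()                        # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)          # similarity of matched pairs
    hinge = (margin + sim - pos).clamp(min=0)
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return hinge.masked_fill(eye, 0.0).mean()  # exclude the positives
```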
A combination of computer vision and projector-based illumination opens the possibility for a new type of computer vision technologies. One of them is augmented reality: selectively illuminating the scene to improve or manipulate how reality itself, rather than its display, appears to a human. One such example is the Smart Headlight being developed at Carnegie Mellon University's Robotics Institute. The project team has been working on a set of new capabilities for the headlight, such as making raindrops and snowflakes disappear, allowing for the high beams to always be on without glare, and enhancing the appearance of objects of interest. Using the Smart Headlight as an example, this talk will further discuss various ideas, concepts and possible applications of coaxial and non-coaxial projector-camera systems. About the speaker: Professor Takeo Kanade is the U. A. and Helen Whitaker University Professor of Computer Science and Robotics at Carnegie Mellon University. He rec...
Online video advertising gives content providers the ability to deliver compelling content, reach a growing audience, and generate additional revenue from online media. Recently, advertising strategies have been designed to look for original advert(s) in a video frame and replace them with new adverts. These strategies, popularly known as product placement or embedded marketing, greatly help marketing agencies to reach a wider audience. However, in the existing literature, such detection of candidate frames in a video sequence for the purpose of advert integration is done manually. In this paper, we propose a deep-learning architecture called ADNet that automatically detects the presence of advertisements in video frames. Our approach is the first of its kind to automatically detect the presence of adverts in a video frame, and achieves state-of-the-art results on a public dataset.
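A minimal sketch of a binary "advert present?" frame classifier built by fine-tuning a pretrained backbone; this is a generic stand-in, not ADNet's actual architecture, and it assumes torchvision 0.13+ for the weights API:

```python
import torch.nn as nn
from torchvision import models

def build_advert_classifier():
    """ImageNet-pretrained ResNet-18 with its final layer replaced by a
    two-way (advert / no advert) classification head."""
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    net.fc = nn.Linear(net.fc.in_features, 2)
    return net
```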
This research project aims to provide an AI-enhanced productivity tool for media generation to video editors and producers. The goal is to improve productivity among producers and artists in terms of augmenting video content with new objects or effects in a natural and appealing way. Furthermore, the project aims to bridge the gap between offline augmented reality technologies, occlusion handling, and camera tracking.
Proceedings of the 2021 International Conference on Multimodal Interaction, 2021
Live video comments, or “danmu”, are an emerging social feature on Asian online video platforms. These time-synchronous comments are overlaid on the video playback and uniquely enrich the viewing experience, engaging hundreds of millions of users in rich community discussions. The presence of danmu comments has become a determining factor for video popularity. Recent work has proposed a model to automatically generate comments, but very little work has so far considered the problem of where to insert the comments in the video timeline. In this work, we propose to address both the what and the where of automatic danmu generation, by jointly predicting the danmu comment content to be generated as well as its optimal insertion point in the video timeline. Our model exploits the video visual content, subtitles, audio signals, and any existing surrounding comments in one unified architecture, and can handle scenarios where the videos are already heavily commented or where the video has no comments yet. Experiments show that our proposed unified framework generally outperforms state-of-the-art comment generation methods.
Proceedings of the Third Workshop on Multimodal Artificial Intelligence, 2021
Live video comments, or "danmu", are an emerging feature on Asian online video platforms. Danmu a... more Live video comments, or "danmu", are an emerging feature on Asian online video platforms. Danmu are time-synchronous comments that are overlaid on a video playback. These comments uniquely enrich the experience and engagement of their users, and have become a determining factor in the popularity of videos on these platforms. Similar to the "cold start problem" in recommender systems, a video will only start to attract attention when sufficient danmu comments have been posted on it. We study this video cold start problem and examine how new comments can be generated automatically on less-commented videos. We propose to predict danmu comments to promote user engagement, by exploiting a multi-modal combination of the video visual content, subtitles, audio signals, and any surrounding comments (when they exist). Our method fuses these multiple modalities in a transformer network which is then trained for different comment density scenarios. We evaluate our proposed system through both a retrieval based evaluation method, as well as human judgement. Results show that our proposed system improves significantly over stateof-the-art methods.
2016 IEEE International Conference on Image Processing (ICIP), 2016
Cutting out an object and estimating its transparency mask is a key task in many applications. We build on the work on closed-form matting by Levin et al. [1], which is used at the core of many matting techniques, and propose an alternative formulation that offers more flexible control over the matting priors. We also show that this new approach is efficient at upscaling transparency maps from coarse estimates.
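For context, the closed-form matting baseline referenced above minimises a quadratic cost built on the matting Laplacian $L$; with user constraints encoded by a diagonal selector $D_S$ over scribbled pixels and their target values $b_S$, the alpha map is the solution of a sparse linear system:

```latex
J(\alpha) = \alpha^{\top} L \alpha
          + \lambda \,(\alpha - b_S)^{\top} D_S \,(\alpha - b_S)
\quad\Longrightarrow\quad
(L + \lambda D_S)\,\alpha = \lambda\, D_S\, b_S .
```

The alternative formulation proposed in the paper changes how these priors enter the cost; the system above is only the standard baseline it builds upon.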
This paper introduces a new database of freely available stereo-3D content designed to facilitate research in stereo post-production. It describes the structure and content of the database and provides some details about how the material was gathered. The database includes examples of many of the scenarios characteristic of broadcast footage. Material was gathered at different locations, including a studio with controlled lighting and both indoor and outdoor on-location sites with more restricted lighting control. An intended consequence of gathering the material is that the database contains examples of degradations that would commonly be present in real-world scenarios. This paper describes one such artefact, caused by uneven exposure in the stereo views, which leads to saturation in the over-exposed view. An algorithm is proposed that replaces the saturated data by interpolating data from the unsaturated view in the wavelet domain.
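A rough sketch of the repair idea under strong assumptions (the views are registered, intensities are in [0, 1], and coefficients are simply swapped rather than interpolated as in the paper's actual algorithm):

```python
import numpy as np
import pywt

def _resize(mask, shape):
    """Nearest-neighbour resize of the saturation mask to a band's shape."""
    rows = np.linspace(0, mask.shape[0] - 1, shape[0]).astype(int)
    cols = np.linspace(0, mask.shape[1] - 1, shape[1]).astype(int)
    return mask[np.ix_(rows, cols)]

def repair_saturated(over, under, thresh=0.98, wavelet="db4", level=3):
    """Swap in wavelet coefficients from the registered unsaturated view
    wherever the over-exposed view is saturated."""
    sat = (over >= thresh).astype(np.float32)
    co = pywt.wavedec2(over, wavelet, level=level)
    cu = pywt.wavedec2(under, wavelet, level=level)
    merged = []
    for band_o, band_u in zip(co, cu):
        if isinstance(band_o, np.ndarray):      # approximation band
            m = _resize(sat, band_o.shape)
            merged.append(band_o * (1 - m) + band_u * m)
        else:                                   # (cH, cV, cD) detail bands
            merged.append(tuple(
                bo * (1 - _resize(sat, bo.shape))
                + bu * _resize(sat, bo.shape)
                for bo, bu in zip(band_o, band_u)))
    rec = pywt.waverec2(merged, wavelet)
    return rec[:over.shape[0], :over.shape[1]]  # crop any wavelet padding
```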
IET 4th European Conference on Visual Media Production (CVMP 2007), 2007
This paper presents an automatic method to enhance video presentations for distance learning applications. From material recorded by a fixed, non-professional camera, the system matches the slides displayed during the presentation with their electronic versions. The slide recognition process consists of two phases. In the first phase, the area where the slides are displayed is located by colour matching; shot detection is then performed in the display area and a frame is selected for each slide displayed in the video. The second phase consists of matching the frames previously selected to the electronic versions of the slides. Using a correlation measure, a likelihood is computed for each electronic slide of corresponding to the slides displayed in the selected frames. A prior distribution is then defined to model the probability of each possible slide transition. Finally, the most probable sequence of slides displayed in the video is determined using the Viterbi algorithm. The results show that the method presented is robust to luminance conditions and occlusion by the lecturer, and can be applied to a large variety of presentations.
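A minimal sketch of the Viterbi decoding step described above, taking per-frame slide likelihoods and a transition prior as inputs (both are assumed precomputed; their construction follows the correlation measure and transition model in the paper):

```python
import numpy as np

def most_probable_slides(log_lik, log_trans, log_prior):
    """Viterbi decoding of the displayed slide sequence.

    log_lik:   (T, S) log-likelihood of each slide for each selected frame.
    log_trans: (S, S) log-probability of moving from slide i to slide j.
    log_prior: (S,)   log-probability of the initial slide."""
    T, S = log_lik.shape
    delta = log_prior + log_lik[0]          # best score ending in each slide
    back = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_lik[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):           # trace back the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```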