
    Michele Covell

    Synchronization of video facial images and audio tracks
    The identification of hidden data, such as control points that are driven by features in an image, from a set of observable data, such as the image itself, is performed with a two-stage approach. The first stage comprises a training process, in which a number of sample data sets, for example images, are analyzed to identify the correspondence between the observable data, such as the visual aspects of the image, and the desired hidden data, such as the control points. Two models are created. A feature-appearance-only model is created from the aligned examples of the feature in the observed data. In addition, each labeled data set is processed to generate a coupled model of the aligned observed data and the associated hidden data. In the image-processing embodiment, these two models...
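    To make the two-stage idea concrete, here is a minimal numpy sketch, assuming a purely linear coupled model between observed features and hidden control points; the names and the least-squares fitting choice are illustrative, not the patent's actual formulation.

```python
import numpy as np

# Illustrative only: a linear coupled model relating observed features to
# hidden control points, fit from labeled examples (not the patent's
# actual formulation; all names here are invented for the sketch).

def fit_coupled_model(observed, hidden):
    """observed: (n, d_obs) aligned feature appearances; hidden: (n, d_hid) control points."""
    mu_obs, mu_hid = observed.mean(axis=0), hidden.mean(axis=0)
    W, *_ = np.linalg.lstsq(observed - mu_obs, hidden - mu_hid, rcond=None)
    return mu_obs, mu_hid, W

def estimate_hidden(model, new_observed):
    mu_obs, mu_hid, W = model
    return mu_hid + (new_observed - mu_obs) @ W

rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 16))            # e.g. aligned image features
ctl = obs[:, :4] @ rng.normal(size=(4, 3))  # e.g. 3 control-point coordinates
model = fit_coupled_model(obs, ctl)
print(estimate_hidden(model, obs[:1]))      # should be close to ctl[:1]
```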
    Real-time optimization of traffic flow addresses important practical problems: reducing a driver's wasted time, improving city-wide efficiency, reducing gas emissions and improving air quality. Much of the current research in traffic-light optimization relies on extending the capabilities of traffic lights to either communicate with each other or communicate with vehicles. However, before such capabilities become ubiquitous, opportunities exist to improve traffic lights by being more responsive to current traffic situations within the current, already deployed, infrastructure. In this paper, we introduce a traffic light controller that employs bidding within micro-auctions to efficiently incorporate traffic sensor information; no other outside sources of information are assumed. We train and test traffic light controllers on large-scale data collected from opted-in Android cell-phone users over a period of several months in Mountain View, California and the River North neighborh...
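    As a rough illustration of the micro-auction idea, the toy Python step below has each signal phase bid using only local sensor information; the bid formula and weights are invented for illustration, whereas the paper's controller is trained from data.

```python
# Toy sketch of a micro-auction traffic-light step (illustrative only; the
# paper's controller is trained, and its bidding rules are richer than this).

def choose_phase(sensor_counts, waiting_time, weight=0.5):
    """Each signal phase bids using only local sensor information.

    sensor_counts: vehicles detected per phase, e.g. {"NS": 4, "EW": 7}
    waiting_time:  seconds the phase has been red, e.g. {"NS": 30, "EW": 5}
    """
    bids = {phase: count + weight * waiting_time[phase]
            for phase, count in sensor_counts.items()}
    return max(bids, key=bids.get)  # highest bid wins the next green

print(choose_phase({"NS": 4, "EW": 7}, {"NS": 30, "EW": 5}))  # -> "NS"
```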
    In the context of stochastic search, once regions of high performance are found, having the property that small changes in the candidate solution correspond to searching nearby neighborhoods provides the ability to perform effective local optimization. To achieve this, Gray Codes are often employed for encoding ordinal points or discretized real numbers. In this paper, we present a method to label similar and/or close points within arbitrary graphs with small Hamming distances. The resultant point labels can be viewed as an approximate high-dimensional variant of Gray Codes. The labeling procedure is useful for any task in which the solution requires the search algorithm to select a small subset of items out of many. A large number of empirical results using these encodings with a combination of genetic algorithms and hill-climbing are presented. Keywords: Gray Code; Graph Labeling; Local Search; Genetic Algorithms; Stochastic Search
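    For reference, here is a minimal sketch of the standard binary-reflected Gray code, the one-dimensional case that the paper generalizes to arbitrary graphs; adjacent integers receive labels at Hamming distance one.

```python
# Binary-reflected Gray code: adjacent integers map to labels that differ
# in exactly one bit, which is the 1-D property the paper extends to graphs.

def gray_encode(n: int) -> int:
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

for i in range(8):
    print(i, format(gray_encode(i), "03b"))
assert all(gray_decode(gray_encode(i)) == i for i in range(1024))
```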
    Many web-based application areas must infer label distributions starting from a small set of sparse, noisy labels. Examples include searching for, recommending, and advertising against image, audio, and video content. These labeling problems must handle millions of interconnected entities (users, domains, content segments) and thousands of competing labels (interests, tags, recommendations, topics). Previous work has shown that graph-based propagation can be very effective at finding the best label distribution across nodes, starting from partial information and a weighted-connection graph. In their work on video recommendations, Baluja et al. [1] showed high-quality results using Adsorption, a normalized propagation process. An important step in the original formulation of Adsorption was re-normalization of the label vectors associated with each node, between every propagation step. That interleaved normalization forced computation of all label distributions, in synchrony, in order...
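    To sketch why removing the interleaved normalization matters: once normalization is folded in up front, propagation reduces to a single sparse linear solve, which off-the-shelf solvers such as BiCGStab handle. The system below is a generic propagation example, not the paper's exact reformulation of Adsorption.

```python
import numpy as np
from scipy.sparse import identity, random as sparse_random
from scipy.sparse.linalg import bicgstab

# Generic label propagation as a sparse linear system (I - alpha * P) x = y.
# alpha, P, and y are illustrative stand-ins, not the paper's exact setup.

n, alpha = 1000, 0.85
W = sparse_random(n, n, density=0.01, random_state=0, format="csr")
row_sums = np.asarray(W.sum(axis=1)).ravel()
P = W.multiply(1.0 / np.maximum(row_sums, 1e-12)[:, None]).tocsr()  # normalize once, up front
y = np.zeros(n)
y[:10] = 1.0                                   # sparse seed labels

A = identity(n, format="csr") - alpha * P      # (I - alpha P) x = y
x, info = bicgstab(A, y)
print(info, x[:5])                             # info == 0 -> converged
```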
    Many web-based application areas must infer label distributions starting from a small set of sparse, noisy labels. Previous work has shown that graph-based propagation can be very effective at finding the best label distribution across nodes, starting from partial information and a weighted-connection graph. In their work on video recommendations, Baluja et al. showed high-quality results using Adsorption, a normalized propagation process. An important step in the original formulation of Adsorption was re-normalization of the label vectors associated with each node, between every propagation step. That interleaved normalization forced computation of all label distributions, in synchrony, in order to allow the normalization to be correctly determined. Interleaved normalization also prevented use of standard linear-algebra methods, like stabilized bi-conjugate gradient descent (BiCGStab) and Gaussian elimination. We show how to replace the interleaved normalization with a single pre-normalization...
    This paper presents a novel, real-time, minimal-latency technique for dissolve detection which handles the widely varying camera techniques, expertise, and overall video quality seen in amateur, semi-professional, and professional video footage. We achieve 88% recall and 93% precision for dissolve detection. In contrast, on the same data set, at a similar recall rate (87%), DCD has more than 3 times the number of false positives, giving a precision of only 81% for dissolve detection. 1. OVERVIEW This paper discusses an improved approach for dissolve detection. A dissolve gradually cross-fades from the old shot’s footage to the new shot’s footage. The dissolve is the most common transition used in post-production. It is also available as an “in-camera” effect on many consumer-grade camcorders. We use the results from our dissolve detector (along with our cut and fade detectors) to support scene-based video browsing and editing [1]. By placing our detector at the heart of an inexp...
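    As a hedged illustration of one cheap dissolve cue (not the paper's detector): during a linear cross-fade between weakly correlated shots, frame variance dips toward the midpoint, so a variance dip flanked by stable plateaus suggests a dissolve.

```python
import numpy as np

# Illustrative sketch only (not the paper's detector): for a linear dissolve
# f_t = (1 - a_t) * old + a_t * new between weakly correlated shots, frame
# variance follows a downward parabola in t, giving a cheap dissolve cue.

def variance_dip_score(frames, t, half_window=5):
    """frames: sequence of grayscale frames; a larger score suggests a dissolve at t."""
    var = np.array([f.var() for f in frames])
    left = var[max(0, t - half_window)]
    right = var[min(len(var) - 1, t + half_window)]
    return 0.5 * (left + right) - var[t]  # depth of the variance dip at t

rng = np.random.default_rng(1)
old = rng.normal(0, 1, size=(32, 32))
new = rng.normal(0, 1, size=(32, 32))
frames = [(1 - a) * old + a * new for a in np.linspace(0, 1, 21)]
print(variance_dip_score(frames, t=10))  # largest near the dissolve midpoint
```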
    FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker. Using canonical correlation, it finds the best direction to combine all the audio and image data, projecting them onto a single axis. FaceSync uses Pearson's correlation to measure the degree of synchronization between the audio and image data. We derive the optimal linear transform to combine the audio and visual information and describe an implementation that avoids the numerical problems caused by computing the correlation matrices.
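    The following numpy sketch shows the core computation under generic, row-aligned audio/video feature matrices: textbook canonical correlation followed by a Pearson score on the projected pair. It omits the paper's careful numerical conditioning of the correlation matrices.

```python
import numpy as np

# Textbook canonical correlation on generic audio/video features; this is
# a sketch of the core idea, not the full FaceSync pipeline.

def cca_sync_score(A, V, eps=1e-6):
    """A: (n, da) audio features; V: (n, dv) video features, row-aligned in time."""
    A = A - A.mean(axis=0)
    V = V - V.mean(axis=0)
    Caa = A.T @ A / len(A) + eps * np.eye(A.shape[1])
    Cvv = V.T @ V / len(V) + eps * np.eye(V.shape[1])
    Cav = A.T @ V / len(A)
    M = np.linalg.inv(Caa) @ Cav @ np.linalg.inv(Cvv) @ Cav.T
    vals, vecs = np.linalg.eig(M)
    wa = np.real(vecs[:, np.argmax(np.real(vals))])   # best audio projection
    wv = np.linalg.inv(Cvv) @ Cav.T @ wa              # matching video projection
    return np.corrcoef(A @ wa, V @ wv)[0, 1]          # Pearson sync score

rng = np.random.default_rng(2)
shared = rng.normal(size=(500, 1))
audio = np.hstack([shared + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 3))])
video = np.hstack([shared + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 5))])
print(cca_sync_score(audio, video))  # near 1 for well-synchronized streams
```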
    A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-byte-count image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image...
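    A toy sketch of the variable-rate mechanism only: an autoencoder applied iteratively to the residual, emitting one small packet of bits per iteration, so truncating the stream trades rate for quality. The tiny linear layers stand in for the paper's convolutional and deconvolutional LSTM layers.

```python
import torch
import torch.nn as nn

# Toy residual coder: each iteration emits another small packet of bits,
# so the bitstream can be truncated at any iteration (variable rate).
# Layer sizes are illustrative; the paper uses conv/deconv LSTM layers.

class ResidualCoder(nn.Module):
    def __init__(self, dim=64, bits=8):
        super().__init__()
        self.enc = nn.Linear(dim, bits)
        self.dec = nn.Linear(bits, dim)

    def forward(self, x, iterations=4):
        residual, recon, codes = x, torch.zeros_like(x), []
        for _ in range(iterations):                         # one packet per iteration
            b = torch.sign(torch.tanh(self.enc(residual)))  # binarized code
            codes.append(b)
            recon = recon + self.dec(b)
            residual = x - recon
        return codes, recon

model = ResidualCoder()
patch = torch.randn(1, 64)                    # e.g. a flattened 8x8 patch
codes, recon = model(patch, iterations=4)     # 4 * 8 = 32 bits for this patch
print(len(codes), torch.mean((patch - recon) ** 2).item())
```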
    This paper compares the efficacy and efficiency of different clustering approaches for selecting a set of exemplar images, to present in the context of a semantic concept. We evaluate these approaches using 900 diverse queries, each associated with 1000 web images, and comparing the exemplars chosen by clustering to the top 20 images for that search term. Our results suggest that Affinity Propagation is effective in selecting exemplars that match the top search images but at high computational cost. We improve on these early results using a simple distribution-based selection filter on incomplete clustering results. This improvement allows us to use more computationally efficient approaches to clustering, such as Hierarchical Agglomerative Clustering (HAC) and Partitioning Around Medoids (PAM), while still reaching the same (or better) quality of results as were given by Affinity Propagation in the original study. The computational savings is significant since these alternatives are...
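    A small sketch of the comparison on synthetic points, using scikit-learn; image features and the distribution-based selection filter are out of scope here, and the HAC exemplar rule below (member nearest the cluster mean) is just one reasonable stand-in for a medoid.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, AgglomerativeClustering

# Synthetic stand-in for image-feature vectors: three well-separated blobs.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0, 3, 6)])

ap = AffinityPropagation(random_state=0).fit(X)
ap_exemplars = X[ap.cluster_centers_indices_]    # AP selects exemplars directly

hac = AgglomerativeClustering(n_clusters=3).fit(X)
hac_exemplars = []
for k in range(3):
    members = X[hac.labels_ == k]
    center = members.mean(axis=0)                # medoid stand-in: the member
    hac_exemplars.append(                        # nearest the cluster mean
        members[np.argmin(((members - center) ** 2).sum(axis=1))])
print(len(ap_exemplars), np.round(hac_exemplars, 2))
```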
    Decades of research have been directed towards improving the timing of traffic lights. The ubiquity of cell phones among drivers has created the opportunity to design new sensors for traffic light controllers. These new sensors, which search for radio signals that are constantly emanating from cell phones, hold the hope of replacing the typical induction-loop sensors that are installed within road pavements. A replacement for induction sensors is desired because they require significant roadwork to install and frequent maintenance and checkups, they are sensitive to the quality of repairs and installation work, and the construction techniques, materials, and even surrounding unrelated ground work can all be sources of failure. However, before cell phone sensors can be widely deployed, users must become comfortable with the passive use of their cell phones by municipalities for this purpose. Despite complete anonymization, public privacy concerns may remain. This presents a chicken-and-egg problem: without sh...
    This paper describes techniques to automatically morph from one sound to another. Audio morphing is accomplished by representing the sound in a multi-dimensional space that is warped or modified to produce a desired result. The multi-dimensional space encodes the spectral shape and pitch on orthogonal axes. After matching components of the sound, a morph smoothly interpolates the amplitudes to describe a new sound in the same perceptual space. Finally, the representation is inverted to produce a sound. This paper describes representations for morphing, techniques for matching, and algorithms for interpolating and morphing each sound component. Spectrographic images of a complete morph are shown at the end.
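    One ingredient of such a morph, sketched under the assumption that components have already been matched: frame-by-frame interpolation of log-magnitude spectra. A full morph, as described, also handles pitch and spectral shape on separate axes and inverts the representation back to sound.

```python
import numpy as np

# Toy sketch of one morphing ingredient: interpolate matched (log-)magnitude
# spectra. The paper's method also aligns pitch and spectral shape separately
# and inverts the full representation to produce audio.

def morph_spectra(mag_a, mag_b, alpha, eps=1e-8):
    """mag_a, mag_b: matched magnitude-spectrum frames; alpha in [0, 1]."""
    log_mix = (1 - alpha) * np.log(mag_a + eps) + alpha * np.log(mag_b + eps)
    return np.exp(log_mix)

t = np.arange(1024) / 16000
frame_a = np.abs(np.fft.rfft(np.sin(2 * np.pi * 220 * t)))  # 220 Hz tone
frame_b = np.abs(np.fft.rfft(np.sin(2 * np.pi * 330 * t)))  # 330 Hz tone
halfway = morph_spectra(frame_a, frame_b, alpha=0.5)
print(halfway.shape)  # (513,) -- one morphed spectral frame
```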
    A rapidly increasing portion of Internet traffic is dominated by requests from mobile devices with limited and metered bandwidth constraints. To satisfy these requests, it has become standard practice for websites to transmit small and extremely compressed image previews as part of the initial page-load process. Recent work, based on an adaptive triangulation of the target image, has performed well at extreme compression rates: 200 bytes or less. Gains have been shown, in terms of PSNR and SSIM, over both JPEG and WebP standards. However, qualitative assessments and preservation of semantic content can be less favorable. We present a novel method to significantly improve the reconstruction quality of the original image that requires no changes to the encoded information. Our neural-based decoding triples the amount of semantic-level content preservation while also improving both SSIM and PSNR scores. In addition, by keeping the same encoding stream, our solution is completely inter-...
    This paper describes mass personalization, a framework for combining mass media with a highly personalized Web-based experience. We introduce four applications for mass personalization: personalized content layers, ad hoc social communities, real-time popularity ratings and virtual media library services. Using the ambient audio originating from the television, the four applications are available with no more effort than simple television channel surfing. Our audio identification system does not use dedicated interactive TV hardware and does not compromise the user’s privacy. Feasibility tests of the proposed applications are provided both with controlled conversational interference and with “living-room” evaluations.
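    As a generic illustration of ambient-audio identification (spectrogram peak hashing, not necessarily the paper's fingerprinting method), the sketch below builds crude fingerprints and compares a clean clip against the same audio under interference; all parameters are illustrative.

```python
import numpy as np

# Hedged sketch of a generic ambient-audio identification step: hash local
# spectrogram peaks into a fingerprint set. The paper's actual descriptors
# and matching differ; names and parameters here are illustrative.

def peak_fingerprint(signal, n_fft=512, hop=256, peaks_per_frame=3):
    """Return (frame, bin) pairs of local spectral peaks as a crude fingerprint."""
    frames = []
    for start in range(0, len(signal) - n_fft, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + n_fft] * np.hanning(n_fft)))
        frames.append(np.argsort(spectrum)[-peaks_per_frame:])
    return {(t, int(b)) for t, bins in enumerate(frames) for b in bins}

rng = np.random.default_rng(4)
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 8000) + 0.1 * rng.normal(size=16000)
query = clip + 0.3 * rng.normal(size=16000)   # same audio under interference
overlap = len(peak_fingerprint(clip) & peak_fingerprint(query))
print(overlap)  # high overlap -> likely the same broadcast audio
```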
    A rapidly increasing portion of Internet traffic is dominated by requests from mobile devices with limited- and metered-bandwidth constraints. To satisfy these requests, it has become standard practice for websites to transmit small and extremely compressed image previews as part of the initial page-load process. Recent work, based on an adaptive triangulation of the target image, has shown the ability to generate thumbnails of full images at extreme compression rates: 200 bytes or less with impressive gains (in terms of PSNR and SSIM) over both JPEG and WebP standards. However, qualitative assessments and preservation of semantic content can be less favorable. We present a novel method to significantly improve the reconstruction quality of the original image with no changes to the encoded information. Our neural-based decoding not only achieves higher PSNR and SSIM scores than the original methods, but also yields a substantial increase in semantic-level content preservation. In ad...
    The huge success of deep-learning–based approaches in computer vision has inspired research in learned solutions to classic image/video processing problems, such as denoising, deblurring, dehazing, deraining, super-resolution (SR), and compression. Hence, learning-based methods have emerged as a promising nonlinear signal-processing framework for image/video restoration and compression. Recent works have shown that learned models can achieve significant performance gains, especially in terms of perceptual quality measures, over traditional methods. As a result, the state of the art in image restoration and compression is getting redefined. This special issue covers the state of the art in learned image/video restoration and compression to promote further progress in innovative architectures and training methods for effective and efficient networks for image/video restoration and compression. In the following, we provide a short overview of the state of the art in learned image and video ...
    This paper describes an algorithm, FaceSync, that measures the degree of synchronization between the video image of a face and the associated audio signal. We can do this task by synthesizing the talking face, using techniques such as Video Rewrite [1], and then comparing the synthesized video with the test video. That process, however, is expensive. Our solution finds a linear operator that, when applied to the audio and video signals, generates an audio-video-synchronization-error signal. The linear operator gathers information from throughout the image and thus allows us to do the computation inexpensively. Hershey and Movellan [2] describe an approach based on measuring the mutual information between the audio signal and individual pixels in the video. The correlation between the audio signal x, and one pixel in the image y, is given by Pearson's correlation...
    In this work, we propose to quantize all parts of standard classification networks and replace the activation-weight multiply step with a simple table-based lookup. This approach results in networks that are free of floating-point operations and free of multiplications, suitable for direct FPGA and ASIC implementations. It also provides us with two simple measures of per-layer and network-wide compactness as well as insight into the distribution characteristics of activation-output and weight values. We run controlled studies across different quantization schemes, both fixed and adaptive and, within the set of adaptive approaches, both parametric and model-free. We implement our approach to quantization with minimal, localized changes to the training process, allowing us to benefit from advances in training continuous-valued network architectures. We apply our approach successfully to AlexNet, ResNet, and MobileNet. We show results that are within 1.6% of the reported, non-quantized...
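    A minimal numpy sketch of the table-lookup idea, assuming small uniform codebooks for activations and weights; quantizer design and the training-time changes described above are omitted.

```python
import numpy as np

# Table-based lookup in place of activation-weight multiplies: with small
# codebooks for activations and weights, all possible products can be
# precomputed once, so inference needs only lookups and additions.
# Codebook sizes and the uniform quantizer are illustrative choices.

act_levels = np.linspace(0.0, 1.0, 16)            # 4-bit activation codebook
wgt_levels = np.linspace(-1.0, 1.0, 16)           # 4-bit weight codebook
product_table = np.outer(act_levels, wgt_levels)  # 16x16 precomputed products

def quantize(x, levels):
    return np.argmin(np.abs(x[..., None] - levels), axis=-1)  # codebook index

def lookup_dot(act_idx, wgt_idx):
    """Dot product via table lookups and additions only (no multiplies)."""
    return product_table[act_idx, wgt_idx].sum()

rng = np.random.default_rng(5)
a, w = rng.uniform(0, 1, 32), rng.uniform(-1, 1, 32)
approx = lookup_dot(quantize(a, act_levels), quantize(w, wgt_levels))
print(approx, float(a @ w))  # close, with all multiplies moved into the table
```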

    And 157 more