The MMSys conference series has been a leading international forum where researchers present, share, and discuss their most recent contributions and findings on multimedia systems across a wide range of domains.
A questionnaire-based and physiology-inspired quality of experience evaluation of an immersive multisensory wheelchair simulator
Immersive multimedia technologies such as virtual reality (VR) are now finding potential applications outside of entertainment and gaming, in areas such as health, education, and tourism, to name a few. This article presents a Quality of ...
Deep variational learning for multiple trajectory prediction of 360° head movements
- Quentin Guimard,
- Lucile Sassatelli,
- Francesco Marchetti,
- Federico Becattini,
- Lorenzo Seidenari,
- Alberto Del Bimbo
Prediction of head movements in immersive media is key to designing efficient streaming systems able to focus the bandwidth budget on visible areas of the content. Numerous proposals have therefore been made in recent years to predict 360° images and ...
Context-aware image compression optimization for visual analytics offloading
Convolutional Neural Networks (CNN) have given rise to numerous visual analytics applications at the edge of the Internet. The image is typically captured by cameras and then live-streamed to edge servers for analytics due to the prohibitive cost of ...
Spatial audio in 360° videos: does it influence visual attention?
Immersive technologies are rapidly gaining traction across a variety of application domains. 360° video is one such technology, which can be captured with an omnidirectional multi-camera arrangement. With a Virtual Reality (VR) Head Mounted Display (HMD)...
3DeformR: freehand 3D model editing in virtual environments considering head movements on mobile headsets
3D objects are the primary media in virtual reality environments in immersive cyberspace, also known as the Metaverse. Users, through editing such objects, can communicate with other individuals on mobile headsets. Knowing that the tangible controllers ...
CrispSearch: low-latency on-device language-based image retrieval
Advances in deep learning have enabled accurate language-based search and retrieval, e.g., over user photos, in the cloud. Many users prefer to store their photos in the home due to privacy concerns. As such, a need arises for models that can perform ...
Automatic thumbnail selection for soccer videos using machine learning
- Andreas Husa,
- Cise Midoglu,
- Malek Hammou,
- Steven A. Hicks,
- Dag Johansen,
- Tomas Kupka,
- Michael A. Riegler,
- Pål Halvorsen
Thumbnail selection is a very important aspect of online sport video presentation, as thumbnails capture the essence of important events, engage viewers, and make video clips attractive to watch. Traditional solutions in the soccer domain for presenting ...
Less annoying: quality of experience of commonly used mobile applications
In recent years, research on the Quality of Experience (QoE) of smartphone applications has received attention from both industry and academia due to the complexity of quantifying and managing it. This paper proposes a smartphone-embedded system able to ...
RL-AFEC: adaptive forward error correction for real-time video communication based on reinforcement learning
Real-time video communication is profoundly changing people's lives, especially in today's pandemic situation. However, packet loss during video transmission degrades reconstructed video quality, thus impairing users' Quality of Experience (QoE). ...
C2: consumption context cognizant ABR streaming for improved QoE and resource usage tradeoffs
Smartphones have emerged as ubiquitous platforms for people to consume content in a wide range of consumption contexts (C2), e.g., over cellular or WiFi, playing back audio and video directly on phone or through peripheral devices such as external ...
Swipe along: a measurement study of short video services
Short videos have recently emerged as a popular form of short-duration User Generated Content (UGC) within modern social media. Short video content is generally less than a minute long and predominantly produced in vertical orientation on smartphones. ...
Unsupervised method for video action segmentation through spatio-temporal and positional-encoded embeddings
- Guilherme de A. P. Marques,
- Antonio José G. Busson,
- Alan Lívio V. Guedes,
- Julio Cesar Duarte,
- Sérgio Colcher
Action segmentation consists of temporally segmenting a video and labeling each segmented interval with a specific action label. In this work, we propose a novel action segmentation method that requires no prior video analysis and no annotated data. Our ...
GreenABR: energy-aware adaptive bitrate streaming with deep reinforcement learning
- Bekir Oguzhan Turkkan,
- Ting Dai,
- Adithya Raman,
- Tevfik Kosar,
- Changyou Chen,
- Muhammed Fatih Bulut,
- Jaroslaw Zola,
- Daby Sow
Adaptive bitrate (ABR) algorithms aim to make optimal bitrate decisions in dynamically changing network conditions to ensure a high quality of experience (QoE) for the users during video streaming. However, most of the existing ABRs share the ...
Visual privacy protection in mobile image recognition using protective perturbation
Deep neural networks (DNNs) have been widely adopted in mobile image recognition applications. Considering intellectual property and computation resources, the image recognition model is often deployed at the service provider end, which takes input ...
Encrypted video search: scalable, modular, and content-similar
Video-based services have become popular. Clients often outsource their videos to the cloud to relieve local maintenance. However, privacy has emerged as a major concern since many videos contain sensitive information. While retrieving (unencrypted) ...
AQP: an open modular Python platform for objective speech and audio quality metrics
Audio quality assessment has been widely researched in the signal processing area. Full-reference objective metrics (e.g., POLQA, ViSQOL) have been developed to estimate the audio quality relying only on human rating experiments. To evaluate the audio ...
Njord: a fishing trawler dataset
- Tor-Arne Schmidt Nordmo,
- Aril Bernhard Ovesen,
- Bjørn Aslak Juliussen,
- Steven Alexander Hicks,
- Vajira Thambawita,
- Håvard Dagenborg Johansen,
- Pål Halvorsen,
- Michael Alexander Riegler,
- Dag Johansen
Fish is one of the main sources of food worldwide. The commercial fishing industry has a lot of different aspects to consider, ranging from sustainability to reporting. The complexity of the domain also attracts a lot of research from different fields ...
Huldra: a framework for collecting crowdsourced feedback on multimedia assets
- Malek Hammou,
- Cise Midoglu,
- Steven A. Hicks,
- Andrea Storås,
- Saeed Shafiee Sabet,
- Inga Strümke,
- Michael A. Riegler,
- Pål Halvorsen
Collecting crowdsourced feedback to evaluate, rank, or score multimedia content can be cumbersome and time-consuming. Most of the existing survey tools are complicated, hard to customize, or tailored for a specific asset type. In this paper, we present ...
Nagare media ingest: a server for live CMAF ingest workflows
New media ingest protocols have been presented recently. SRT and RIST compete with old protocols such as RTMP while the DASH-IF specified an HTTP-based ingest protocol for CMAF formatted media that lends itself towards delivery protocols such as DASH ...
Multi-codec ultra high definition 8K MPEG-DASH dataset
Many applications and online services produce and deliver multimedia traffic over the Internet. Video streaming services with a rapidly growing desire for more resources to provide better quality, such as Ultra High Definition (UHD) 8K content, are on ...
SILVR: a synthetic immersive large-volume plenoptic dataset
In six-degrees-of-freedom light-field (LF) experiences, the viewer's freedom is limited by the extent to which the plenoptic function was sampled. Existing LF datasets represent only small portions of the plenoptic function, such that they either cover ...
NewsImages: addressing the depiction gap with an online news dataset for text-image rematching
- Andreas Lommatzsch,
- Benjamin Kille,
- Özlem Özgöbek,
- Yuxiao Zhou,
- Jelena Tešić,
- Cláudio Bartolomeu,
- David Semedo,
- Lidia Pivovarova,
- Mingliang Liang,
- Martha Larson
We present NewsImages, a dataset of online news items, and the related NewsImages rematching task. The goal of NewsImages is to provide researchers with a means of studying the depiction gap, which we define to be the difference between what an image ...
VCD: video complexity dataset
This paper provides an overview of the open Video Complexity Dataset (VCD) which comprises 500 Ultra High Definition (UHD) resolution test video sequences. These sequences are provided at 24 frames per second (fps) and stored online in losslessly ...
Enabling scalable emulation of differentiated services in mininet
Evolving Internet applications, such as immersive multimedia and Industry 4.0, exhibit stringent delay, loss, and rate requirements. Realizing these requirements would be difficult without advanced dynamic traffic management solutions that leverage state-...
Realistic video sequences for subjective QoE analysis
Multimedia streaming over the Internet (live and on demand) is the cornerstone of the modern Internet, carrying more than 60% of all traffic. With such high demand, delivering outstanding user experience is a crucial and challenging task. To evaluate user ...
PEM360: a dataset of 360° videos with continuous physiological measurements, subjective emotional ratings and motion traces
- Quentin Guimard,
- Florent Robert,
- Camille Bauce,
- Aldric Ducreux,
- Lucile Sassatelli,
- Hui-Yin Wu,
- Marco Winckler,
- Auriane Gros
From a user perspective, immersive content can elicit more intense emotions than flat-screen presentations. From a system perspective, efficient storage and distribution remain challenging, and must consider user attention. Understanding the connection ...
VCA: video complexity analyzer
For online analysis of the video content complexity in live streaming applications, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. To this end, for each video (segment), two features, i.e., the ...
A new free viewpoint video dataset and DIBR benchmark
Free viewpoint video (FVV), which provides viewers with a strongly interactive and immersive experience, has drawn great attention in recent years. Despite the developments made, further progress of FVV research is limited by existing datasets that mostly ...
CGD: a cloud gaming dataset with gameplay video and network recordings
With advances in network capabilities, the gaming industry is increasingly turning towards offering "gaming on demand" solutions, with cloud gaming services such as Sony PlayStation Now, Google Stadia, and NVIDIA GeForce NOW expanding their market ...
Enhancing situational awareness with adaptive firefighting drones: leveraging diverse media types and classifiers
High-rise fires are among the largest threats to safety in modern cities, and autonomous drones with multi-modal sensors can be employed to enhance situational awareness in such unfortunate disasters. In this paper, we study the fine-grained measurement ...