Yashar Deldjoo
I am a research assistant at the Information Retrieval Laboratory of the University of Milano-Bicocca, Italy. I obtained my Master's degree in Electrical Engineering from Chalmers University of Technology, Sweden, in 2010, and earned my Ph.D. with Distinction in Computer Science from Politecnico di Milano, Italy, in 2018. My main research interests include recommender systems, user modeling, multimedia recommendation, and multimedia computing. Despite being an early-career researcher, I have (co-)authored numerous refereed articles at internationally recognized conferences and in peer-reviewed journals, including ACM RecSys, MMSys, CHI, IEEE Transactions on Knowledge and Data Engineering, User Modeling and User-Adapted Interaction, the Journal of Data Semantics, and the Springer Journal of Multimedia Information Retrieval. I also hold a US patent. I have served as a PC member or reviewer for top-tier conferences, including SIGIR, CIKM, AAAI, ACM MM, RecSys, ECML-PKDD, ECIR, MMSys, IDC, and IPMU, as well as for journals including Expert Systems with Applications and IEEE Access.
Supervisors: Paolo Cremonesi (PhD advisor)
Articles by Yashar Deldjoo
Conference Proceedings by Yashar Deldjoo
We evaluate the proposed multi-modal recommender system comprehensively against metadata-based baselines. To this end, we conduct two empirical studies: (i) a system-centric study to measure the offline quality of recommendations in terms of accuracy-related and beyond-accuracy performance measures (novelty, diversity, and coverage), and (ii) a user-centric online experiment, measuring different subjective metrics, including relevance, satisfaction, and diversity. In both studies, we use a dataset of more than 4,000 movie trailers, which makes our approach versatile. Our results shed light on the accuracy and beyond-accuracy performance of audio, visual, and textual features in content-based movie recommender systems.
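The beyond-accuracy measures named above (novelty, diversity, and coverage) have standard textbook definitions. As a minimal illustrative sketch (not the paper's actual evaluation code), they can be computed over per-user top-N recommendation lists as follows; all function and variable names here are assumptions for illustration:

```python
import math

def coverage(recommendations, catalog_size):
    """Fraction of the catalog appearing in at least one user's top-N list."""
    recommended = {item for rec_list in recommendations for item in rec_list}
    return len(recommended) / catalog_size

def novelty(recommendations, popularity):
    """Mean self-information -log2(p(i)) of recommended items; rarer items score higher."""
    scores = [-math.log2(popularity[item])
              for rec_list in recommendations for item in rec_list]
    return sum(scores) / len(scores)

def intra_list_diversity(rec_list, distance):
    """Average pairwise distance between the items of a single list."""
    pairs = [(a, b) for i, a in enumerate(rec_list) for b in rec_list[i + 1:]]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)
```

Here `popularity` maps each item to its share of interactions, and `distance` is any item dissimilarity (e.g. one minus the cosine similarity of feature vectors).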
In this paper, we explore a different point of view by leveraging the gap between low-level and high-level features. We experiment with a recent approach for movie recommendation that extracts low-level mise-en-scène features from multimedia content and combines them with high-level features provided by the wisdom of the crowd.
To this end, we first performed an offline performance assessment by implementing a pure content-based recommender system with three different versions of the same algorithm, respectively based on (i) conventional movie attributes, (ii) mise-en-scène features, and (iii) a hybrid method that interleaves recommendations based on movie attributes and mise-en-scène features. In a second study, we conducted an empirical study involving 100 subjects and collected data regarding the quality perceived by the users. Results from both studies show that introducing mise-en-scène features in conjunction with traditional movie attributes improves both the offline and the online quality of recommendations.
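The hybrid method described above interleaves two ranked lists. A minimal sketch of one plausible interleaving scheme (round-robin with de-duplication; the function name and exact policy are assumptions, not taken from the paper):

```python
def interleave(attr_ranked, mise_ranked, top_n):
    """Round-robin merge of two ranked lists, preserving within-list order
    and skipping items already taken from the other list."""
    merged, seen = [], set()
    for a, b in zip(attr_ranked, mise_ranked):
        for item in (a, b):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:top_n]
```

Alternating sources guarantees both feature families contribute to the top of the final list, which is the intuition behind interleaved hybrids.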
Workshops by Yashar Deldjoo
In this short paper, we present a recommender system for movies, based on Factorization Machines, that uses low-level visual features extracted automatically from movies as side information. Low-level visual features – such as lighting, colors, and motion – represent the design aspects of a movie and characterize its aesthetic and style.
Our experiments on a dataset of more than 13K movies show that recommendations based on low-level visual features provide almost 10 times better accuracy than genre-based recommendations, in terms of various evaluation metrics.
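For intuition, a second-order Factorization Machine scores a feature vector x (here, one-hot user/item indicators concatenated with visual side features such as lighting, color, and motion statistics) as w0 + ⟨w, x⟩ + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. A minimal sketch of the prediction step, using Rendle's O(kn) reformulation of the pairwise term (this is an illustration, not the paper's implementation; all names are assumed):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """2-way FM prediction: w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j.
    x: feature vector (n,), w0: global bias, w: linear weights (n,),
    V: latent factor matrix (n, k). The pairwise term is computed as
    0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]."""
    linear = w0 + w @ x
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return linear + interactions
```

Because every pair of active features interacts through latent vectors, the model can learn, e.g., that a user who liked dimly lit, slow-paced movies will score similar visual profiles highly even for unseen items.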
By mounting four diodes on the listener's headset, a stationary Wii Remote can be used to track the listener's head movements (both position and orientation) and provide the 3D audio rendering engine with the 6DOF data needed for fully interactive rendering of the audio scene.
In this project, a head-tracking solution is developed and integrated into a 3D audio rendering engine to obtain an interactive virtual environment. Head pose estimation is the central part of the head-tracking algorithm. Prior to tracking, 3D feature points are obtained and used as a model. The pose is then estimated by solving a robust version of the Perspective-n-Point (PnP) problem. To analyze the accuracy of the algorithm, results were first validated by simulating a listener's random-walk movement in MATLAB and using it as input to the head-tracking algorithm. Thereafter, the solution was compiled as a C library and tested in real-time tracking; a graphical output demonstrates the applicability of the method in real-time applications. The final solution is integrated into a 3D audio rendering engine as part of the virtual environment.
Compared with the closely related work by Johnny Chung Lee on using the Wii Remote for head tracking (available on YouTube), this work tracks the full 6DOF movement of the listener (translation and orientation), whereas Lee's work tracked only the 3DOF translation, assuming the orientation was the negated translation.
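Since the four diodes are mounted on one rigid headset, they are (approximately) coplanar, and the PnP problem reduces to a planar special case that can be solved by homography decomposition. A self-contained sketch of that idea follows; it is an illustration under assumed names and a simplified pinhole model, not the project's actual C implementation (which additionally uses a robust estimator against outliers):

```python
import numpy as np

def homography(src, dst):
    """Direct linear transform: 3x3 H mapping planar points (x, y)
    to image points (u, v), up to scale. Needs >= 4 correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The null vector of the stacked constraints gives H (smallest singular value).
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def planar_pose(model_xy, image_uv, K):
    """Recover rotation R and translation t of a z = 0 model plane
    from the camera intrinsics K and the model-to-image homography,
    using H ~ K [r1 r2 t]."""
    H = np.linalg.inv(K) @ homography(model_xy, image_uv)
    scale = np.linalg.norm(H[:, 0])      # r1 must be a unit vector
    H /= scale * np.sign(H[2, 2])        # fix scale; keep the target in front
    r1, r2, t = H[:, 0], H[:, 1], H[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    return R, t
```

Given the diode layout as `model_xy` and their detected image positions from the Wii Remote's IR camera as `image_uv`, `R` and `t` together are exactly the 6DOF head pose fed to the audio renderer.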