-
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Authors:
Fei Deng,
Qifei Wang,
Wei Wei,
Matthias Grundmann,
Tingbo Hou
Abstract:
Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that reflect human preference. However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, rendering them incapable of generalizing to complex, unseen prompts. In this paper, we propose Proximal Reward Difference Prediction (PRDP), enabling stable black-box reward finetuning for diffusion models for the first time on large-scale prompt datasets with over 100K prompts. Our key innovation is the Reward Difference Prediction (RDP) objective that has the same optimal solution as the RL objective while enjoying better training stability. Specifically, the RDP objective is a supervised regression objective that tasks the diffusion model with predicting the reward difference of generated image pairs from their denoising trajectories. We theoretically prove that the diffusion model that obtains perfect reward difference prediction is exactly the maximizer of the RL objective. We further develop an online algorithm with proximal updates to stably optimize the RDP objective. In experiments, we demonstrate that PRDP can match the reward maximization ability of well-established RL-based methods in small-scale training. Furthermore, through large-scale training on text prompts from the Human Preference Dataset v2 and the Pick-a-Pic v1 dataset, PRDP achieves superior generation quality on a diverse set of complex, unseen prompts whereas RL-based methods completely fail.
Submitted 27 March, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Binaural Angular Separation Network
Authors:
Yang Yang,
George Sung,
Shao-Fu Shih,
Hakan Erdogan,
Chehung Lee,
Matthias Grundmann
Abstract:
We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. The model is trained with simulated room impulse responses (RIRs) using omni-directional microphones without needing to collect real RIRs. By relying on specific angular regions and multiple room simulations, the model utilizes consistent time difference of arrival (TDOA) cues, or what we call delay contrast, to separate target and interference sources while remaining robust in various reverberation environments. We demonstrate the model is not only generalizable to a commercially available device with a slightly different microphone geometry, but also outperforms our previous work which uses one additional microphone on the same device. The model runs in real-time on-device and is suitable for low-latency streaming applications such as telephony and video conferencing.
Submitted 16 January, 2024;
originally announced January 2024.
-
StreamVC: Real-Time Low-Latency Voice Conversion
Authors:
Yang Yang,
Yury Kartynnik,
Yunpeng Li,
Jiuqiang Tang,
Xing Li,
George Sung,
Matthias Grundmann
Abstract:
We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information.
Submitted 5 January, 2024;
originally announced January 2024.
-
On-device Real-time Custom Hand Gesture Recognition
Authors:
Esha Uboweja,
David Tian,
Qifei Wang,
Yi-Chun Kuo,
Joe Zou,
Lu Wang,
George Sung,
Matthias Grundmann
Abstract:
Most existing hand gesture recognition (HGR) systems are limited to a predefined set of gestures. However, users and developers often want to recognize new, unseen gestures. This is challenging due to the vast diversity of all plausible hand shapes, e.g., it is impossible for developers to include all hand gestures in a predefined list. In this paper, we present a user-friendly framework that lets users easily customize and deploy their own gesture recognition pipeline. Our framework provides a pre-trained single-hand embedding model that can be fine-tuned for custom gesture recognition. Users can perform gestures in front of a webcam to collect a small number of images per gesture. We also offer a low-code solution to train and deploy the custom gesture recognition model. This makes it easy for users with limited ML expertise to use our framework. We further provide a no-code web front-end for users without any ML expertise. This makes it even easier to build and test the end-to-end pipeline. The resulting custom HGR is then ready to be run on-device for real-time scenarios. This can be done by calling a simple function in our open-sourced model inference API, MediaPipe Tasks. This entire process only takes a few minutes.
Submitted 19 September, 2023;
originally announced September 2023.
-
Blendshapes GHUM: Real-time Monocular Facial Blendshape Prediction
Authors:
Ivan Grishchenko,
Geng Yan,
Eduard Gabriel Bazavan,
Andrei Zanfir,
Nikolai Chinaev,
Karthik Raveendran,
Matthias Grundmann,
Cristian Sminchisescu
Abstract:
We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones, from a single monocular RGB image and enables facial motion capture applications like virtual avatars. Our main contributions are: i) an annotation-free offline method for obtaining blendshape coefficients from real-world human scans, ii) a lightweight real-time model that predicts blendshape coefficients based on facial landmarks.
Submitted 11 September, 2023;
originally announced September 2023.
-
Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond
Authors:
Yang Zhao,
Tingbo Hou,
Yu-Chuan Su,
Xuhui Jia,
Yandong Li,
Matthias Grundmann
Abstract:
Authentic face restoration systems are increasingly in demand in many computer vision applications, e.g., image enhancement, video communication, and portrait photography. Most of the advanced face restoration models can recover high-quality faces from low-quality ones but usually fail to faithfully generate realistic and high-frequency details that are favored by users. To achieve authentic restoration, we propose $\textbf{IDM}$, an $\textbf{I}$teratively learned face restoration system based on denoising $\textbf{D}$iffusion $\textbf{M}$odels (DDMs). We define the criterion of an authentic face restoration system, and argue that denoising diffusion models are naturally endowed with this property from two aspects: intrinsic iterative refinement and extrinsic iterative enhancement. Intrinsic learning can preserve the content well and gradually refine the high-quality details, while extrinsic enhancement helps clean the data and improve the restoration task one step further. We demonstrate superior performance on blind face restoration tasks. Beyond restoration, we find that the data authentically cleaned by the proposed restoration system is also helpful to image generation tasks in terms of training stabilization and sample quality. Without modifying the models, we achieve better quality than the state of the art on FFHQ and ImageNet generation using either GANs or diffusion models.
Submitted 18 July, 2023;
originally announced July 2023.
-
Towards a Formal Verification of the Lightning Network with TLA+
Authors:
Matthias Grundmann,
Hannes Hartenstein
Abstract:
Payment channel networks are an approach to improve the scalability of blockchain-based cryptocurrencies. Because payment channel networks are used for transfer of financial value, their security in the presence of adversarial participants should be verified formally. We formalize the protocol of the Lightning Network, a payment channel network built for Bitcoin, and show that the protocol fulfills the expected security properties. As the state space of a specification consisting of multiple participants is too large for model checking, we formalize intermediate specifications and use a chain of refinements to validate the security properties where each refinement is justified either by model checking or by a pen-and-paper proof.
Submitted 5 July, 2023;
originally announced July 2023.
-
Semi-Implicit Denoising Diffusion Models (SIDDMs)
Authors:
Yanwu Xu,
Mingming Gong,
Shaoan Xie,
Wei Wei,
Matthias Grundmann,
Kayhan Batmanghelich,
Tingbo Hou
Abstract:
Despite the proliferation of generative models, achieving fast sampling during inference without compromising sample diversity and quality remains challenging. Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. The Denoising Diffusion Generative Adversarial Networks (DDGAN) attempted to circumvent this limitation by integrating a GAN model for larger jumps in the diffusion process. However, DDGAN encountered scalability limitations when applied to large datasets. To address these limitations, we introduce a novel approach that tackles the problem by matching implicit and explicit factors. More specifically, our approach involves utilizing an implicit model to match the marginal distributions of noisy data and the explicit conditional distribution of the forward diffusion. This combination allows us to effectively match the joint denoising distributions. Unlike DDPM but similar to DDGAN, we do not enforce a parametric distribution for the reverse step, enabling us to take large steps during inference. Similar to the DDPM but unlike DDGAN, we take advantage of the exact form of the diffusion process. We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
Submitted 10 October, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Determination of acoustic phonon anharmonicities via second-order Raman scattering in CuI
Authors:
Ron Hildebrandt,
Michael Seifert,
Janine George,
Steffen Blaurock,
Silvana Botti,
Harald Krautscheid,
Marius Grundmann,
Chris Sturm
Abstract:
We demonstrate the determination of anharmonic acoustic phonon properties via second-order Raman scattering exemplarily on copper iodide single crystals. The origin of multi-phonon features from the second-order Raman spectra was assigned by the support of the calculated 2-phonon density of states. In this way, the temperature dependence of acoustic phonons was determined down to 10\,K. To determine independently the harmonic contributions of respective acoustic phonons, density functional theory (DFT) in quasi-harmonic approximation was used. Finally, the anharmonic contributions were determined. The results are in agreement with earlier publications and extend CuI's determined acoustic phonon properties to lower temperatures with higher accuracy. This approach demonstrates that it is possible to characterize the acoustic anharmonicities via Raman scattering down to zero-temperature renormalization constants of at least 0.1\,cm$^{-1}$.
Submitted 15 September, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations
Authors:
Yu-Hui Chen,
Raman Sarokin,
Juhyun Lee,
Jiuqiang Tang,
Chuo-Ling Chang,
Andrei Kulik,
Matthias Grundmann
Abstract:
The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to-date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
Submitted 16 June, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Impact of magnetization and hyperfine field distribution on high magnetoelectric coupling strength in BaTiO$_3$-BiFeO$_3$ multilayers
Authors:
Johanna K. Jochum,
Michael Lorenz,
Haraldur P. Gunnlaugsson,
Christian Patzig,
Thomas Höche,
Marius Grundmann,
André Vantomme,
Kristiaan Temst,
Margriet J. Van Bael,
Vera Lazenka
Abstract:
Understanding the mechanisms of magnetoelectric (ME) coupling within multiferroic structures is paramount from a fundamental as well as an applied point of view. We report here that the magnetoelectric properties, as well as the magnetization, of BaTiO$_3$-BiFeO$_3$ superlattices can be tuned by varying the BiFeO$_3$ layer thickness. The magnetoelectric voltage coefficient ($α_{ME}$) reaches its maximum of 60.2 Vcm$^{-1}$Oe$^{-1}$ at 300 K, one of the highest values reported so far, for a sample with a BiFeO$_3$ thickness of 5 nm and a BaTiO$_3$ thickness of 10 nm. To gain deeper insight into the increased magnetoelectric coupling, and both the local and macroscopic magnetic properties, samples with varying BiFeO$_3$ thicknesses have been investigated. Correlations were established between the hyperfine field (HFF), the magnetoelectric voltage coefficient and the magnetization. The possible mechanisms responsible for the strong magnetoelectric coupling are discussed.
Submitted 15 March, 2023;
originally announced March 2023.
-
Guided Speech Enhancement Network
Authors:
Yang Yang,
Shao-Fu Shih,
Hakan Erdogan,
Jamie Menjay Lin,
Chehung Lee,
Yunpeng Li,
George Sung,
Matthias Grundmann
Abstract:
High-quality speech capture has been widely studied for both voice communication and human-computer interfaces. To improve capture performance, multi-microphone speech enhancement techniques are often deployed on various devices. The multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-channel speech enhancement model that cleans up the beamformer output. In this work, we propose a speech enhancement solution that takes both the raw microphone and beamformer outputs as the input for an ML model. We devise a simple yet effective training scheme that allows the model to learn from the cues of the beamformer by contrasting the two inputs and greatly boost its capability in spatial rejection, while conducting the general tasks of denoising and dereverberation. The proposed solution takes advantage of classical spatial filtering algorithms instead of competing with them. By design, the beamformer module can then be selected separately and does not require a large amount of data to be optimized for a given form factor, and the network model can be considered a standalone module that is highly transferable independently of the microphone array. We name the ML module in our solution GSENet, short for Guided Speech Enhancement Network. We demonstrate its effectiveness on real-world data collected on multi-microphone devices in terms of the suppression of noise and interfering speech.
Submitted 13 March, 2023;
originally announced March 2023.
-
Mid- and far-infrared localized surface plasmon resonances in chalcogen-hyperdoped silicon
Authors:
Mao Wang,
Ye Yu,
Slawomir Prucnal,
Yonder Berencén,
Mohd Saif Shaikh,
Lars Rebohle,
Muhammad Bilal Khan,
Vitaly Zviagin,
René Hübner,
Alexej Pashkin,
Artur Erbe,
Yordan M. Georgiev,
Marius Grundmann,
Manfred Helm,
Robert Kirchner,
Shengqiang Zhou
Abstract:
Plasmonic sensing in the infrared region employs the direct interaction of the vibrational fingerprints of molecules with the plasmonic resonances, creating surface-enhanced sensing platforms that are superior to traditional spectroscopy. However, the standard noble metals used for plasmonic resonances suffer from high radiative losses as well as fabrication challenges, such as tuning the spectral resonance positions into mid- to far-infrared regions, and compatibility issues with the existing complementary metal-oxide-semiconductor (CMOS) manufacturing platform. Here, we demonstrate the occurrence of mid-infrared localized surface plasmon resonances (LSPR) in thin Si films hyperdoped with the known deep-level impurity tellurium. We show that the mid-infrared LSPR can be further enhanced and spectrally extended to the far-infrared range by fabricating two-dimensional arrays of micrometer-sized antennas in a Te-hyperdoped Si chip. Since Te-hyperdoped Si can also work as an infrared photodetector, we believe that our results will unlock the route toward the direct integration of plasmonic sensors with the one-chip CMOS platform, greatly advancing the possibility of mass manufacturing of high-performance plasmonic sensing systems.
Submitted 7 October, 2022;
originally announced October 2022.
-
Efficient Heterogeneous Video Segmentation at the Edge
Authors:
Jamie Menjay Lin,
Siargey Pisarchyk,
Juhyun Lee,
David Tian,
Tingbo Hou,
Karthik Raveendran,
Raman Sarokin,
George Sung,
Trent Tolley,
Matthias Grundmann
Abstract:
We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute. Specifically, we design network models by searching across multiple dimensions of specifications for the neural architectures and operations on top of already light-weight backbones, targeting commercially available edge inference engines. We further analyze and optimize the heterogeneous data flows in our systems across the CPU, the GPU and the NPU. Our approach has empirically factored well into our real-time AR system, enabling remarkably higher accuracy with quadrupled effective resolutions, yet at much shorter end-to-end latency, much higher frame rate, and even lower power consumption on edge platforms.
Submitted 24 August, 2022;
originally announced August 2022.
-
Dielectric function of CuBr$_\mathrm{x}$I$_{1-\mathrm{x}}$ alloy thin films
Authors:
Michael Seifert,
Evgeny Krüger,
Michael S. Bar,
Stefan Merker,
Holger von Wenckstern,
Harald Krautscheid,
Marius Grundmann,
Chris Sturm,
Silvana Botti
Abstract:
We study the dielectric function of CuBr$_\mathrm{x}$I$_{1-\mathrm{x}}$ thin film alloys using spectroscopic ellipsometry in the spectral range from 0.7 eV to 6.4 eV, in combination with first-principles calculations based on density functional theory. Through the comparison of theory and experiment, we attribute features in the dielectric function to electronic transitions at specific k-points in the Brillouin zone. The observed bandgap bowing as a function of alloy composition is discussed in terms of different physical and chemical contributions. The band splitting at the top of the valence band due to spin-orbit coupling is found to decrease with increasing Br-concentration, from a value of 660 meV for CuI to 150 meV for CuBr. This result can be understood considering the contribution of copper d-orbitals to the valence band maximum as a function of the alloy composition.
Submitted 9 September, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation
Authors:
Ivan Grishchenko,
Valentin Bazarevsky,
Andrei Zanfir,
Eduard Gabriel Bazavan,
Mihai Zanfir,
Richard Yee,
Karthik Raveendran,
Matsvei Zhdanovich,
Matthias Grundmann,
Cristian Sminchisescu
Abstract:
We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference. BlazePose GHUM Holistic enables motion capture from a single RGB image including avatar control, fitness tracking and AR/VR effects. Our main contributions include i) a novel method for 3D ground truth data acquisition, ii) updated 3D body tracking with additional hand landmarks and iii) full body pose estimation from a monocular image.
Submitted 23 June, 2022;
originally announced June 2022.
-
Light absorption and emission by defects in doped nickel oxide
Authors:
Robert Karsthof,
Ymir Kalmann Frodason,
Augustinas Galeckas,
Philip Michael Weiser,
Vitaly Zviagin,
Marius Grundmann
Abstract:
Nickel oxide is a versatile p-type semiconducting oxide with many applications in opto-electronic devices, but high doping concentrations are often required to achieve necessary electrical conductivity. In contrast to many other transparent oxide semiconductors, even moderate levels of doping of NiO can lead to significant optical absorption in the visible spectral range, limiting the application range of the material. This correlation has been reported extensively in literature, but its origin has been unknown until now. This work combines experimental data on optical properties from a variety of NiO samples with results from hybrid density functional theory calculations. It shows that strong electron-phonon interaction leads to a significant blue shift (0.6-1 eV) of electronic transitions from the valence band maximum to defect states by light absorption with respect to the thermodynamic charge transition levels. This essentially renders NiO a narrow-gap semiconductor by defect band formation already at moderate doping levels, with strong light absorption for photon energies of approximately 1 eV. The calculations are also shown to be fully consistent with experimental data on defect-related light emission in NiO.
Submitted 5 May, 2022;
originally announced May 2022.
-
On-device Real-time Hand Gesture Recognition
Authors:
George Sung,
Kanstantsin Sokal,
Esha Uboweja,
Valentin Bazarevsky,
Jonathan Baccash,
Eduard Gabriel Bazavan,
Chuo-Ling Chang,
Matthias Grundmann
Abstract:
We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We create two different gesture classifiers, one based on heuristics and the other using neural networks (NN).
Submitted 29 October, 2021;
originally announced November 2021.
-
Estimating the Peer Degree of Reachable Peers in the Bitcoin P2P Network
Authors:
Matthias Grundmann,
Max Baumstark,
Hannes Hartenstein
Abstract:
A recent spam wave of IP addresses in the Bitcoin P2P network allowed us to estimate the degree distribution of reachable peers in the network. The resulting distribution shows that about every second reachable peer runs with Bitcoin Core's default setting of a maximum of 125 concurrent connections and nearly all connection slots are taken. We validate this result and, in addition, use our observations of the spam wave to group addresses that belong to the same peer. By doing this grouping, we improve on previous measurements and show that simply counting addresses overestimates the number of reachable peers by 13 %.
Submitted 15 December, 2021; v1 submitted 2 August, 2021;
originally announced August 2021.
-
On the Estimation of the Number of Unreachable Peers in the Bitcoin P2P Network by Observation of Peer Announcements
Authors:
Matthias Grundmann,
Hedwig Amberg,
Hannes Hartenstein
Abstract:
Bitcoin is based on a P2P network that is used to propagate transactions and blocks. While the P2P network design intends to hide the topology of the P2P network, information about the topology is required to understand the network from a scientific point of view. Thus, there is a natural tension between the 'desire' for unobservability on the one hand, and for observability on the other hand. On a middle ground, one would at least be interested in some statistical features of the Bitcoin network like the number of peers that participate in the propagation of transactions and blocks. This number is composed of the number of reachable peers that accept incoming connections and unreachable peers that do not accept incoming connections. While the number of reachable peers can be measured, it is inherently difficult to determine the number of unreachable peers. Thus, the number of unreachable peers can only be estimated based on some indicators. In this paper, we first define our understanding of unreachable peers and then propose the PAL (Passive Announcement Listening) method which gives an estimate of the number of unreachable peers by observing ADDR messages that announce active IP addresses in the network. The PAL method allows for detecting unreachable peers that indicate that they provide services useful to the P2P network. In conjunction with previous methods, the PAL method can help to get a better estimate of the number of unreachable peers. We use the PAL method to analyze data from a long-term measurement of the Bitcoin P2P network that gives insights into the development of the number of unreachable peers over five years from 2015 to 2020. Results show that about 31,000 unreachable peers providing useful services were active per day at the end of the year 2020. An empirical validation indicates that the approach finds about 50 % of unreachable peers that provide useful services.
Submitted 25 February, 2021;
originally announced February 2021.
-
Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations
Authors:
Adel Ahmadyan,
Liangkai Zhang,
Jianing Wei,
Artsiom Ablavatski,
Matthias Grundmann
Abstract:
3D object detection has recently become popular due to many applications in robotics, augmented reality, autonomy, and image retrieval. We introduce the Objectron dataset to advance the state of the art in 3D object detection and foster new research and applications, such as 3D object tracking, view synthesis, and improved 3D shape representation. The dataset contains object-centric short videos with pose annotations for nine categories and includes 4 million annotated images in 14,819 annotated videos. We also propose a new evaluation metric, 3D Intersection over Union, for 3D object detection. We demonstrate the usefulness of our dataset in 3D object detection tasks by providing baseline models trained on this dataset. Our dataset and evaluation source code are available online at http://www.objectron.dev
Submitted 17 December, 2020;
originally announced December 2020.
-
Fundamental Properties of the Layer Below a Payment Channel Network (Extended Version)
Authors:
Matthias Grundmann,
Hannes Hartenstein
Abstract:
Payment channel networks are a highly discussed approach for improving scalability of cryptocurrencies such as Bitcoin. As they allow processing transactions off-chain, payment channel networks are referred to as second layer technology, while the blockchain is the first layer. We uncouple payment channel networks from blockchains and look at them as first-class citizens. This brings up the question of what model payment channel networks require as first layer. In response, we formalize a model (called RFL Model) for a first layer below a payment channel network. While transactions are globally made available by a blockchain, the RFL Model only provides the reduced property that a transaction is delivered to the users being affected by a transaction. We show that the reduced model's properties still suffice to implement payment channels. By showing that the RFL Model can not only be instantiated by the Bitcoin blockchain but also by trusted third parties like banks, we show that the reduction widens the design space for the first layer. Further, we show that the stronger property provided by blockchains allows for optimizations that can be used to reduce the time for locking collateral during payments over multiple hops in a payment channel network.
Submitted 16 October, 2020;
originally announced October 2020.
-
Identification of Li$_{\text{Ni}}$ and V$_{\text{Ni}}$ acceptor levels in doped nickel oxide
Authors:
Robert Karsthof,
Holger von Wenckstern,
Marius Grundmann
Abstract:
Nickel oxide, in particular in its doped, semiconducting form, is an important component of several optoelectronic devices. Doping NiO is commonly achieved either by incorporation of lithium, which readily occupies Ni sites substitutionally, producing the Li$_{\text{Ni}}$ acceptor, or by supplying reactive oxygen species during NiO film deposition, which leads to the formation of Ni vacancies (V$_{\mathrm{Ni}}$). However, the energetic position of these acceptors in the NiO band gap has not been experimentally determined until today. In this work, we close this knowledge gap by studying rectifying n$^{++}$p heterojunctions of NiO on top of fluorine-doped tin oxide. These structures show sufficient rectification to perform electric characterization by defect spectroscopic techniques, specifically capacitance-voltage and thermal admittance spectroscopy. Using these methods, the (0/-) charge transition levels are determined to be 190meV and 409meV above the valence band edge for the Li$_{\text{Ni}}$ and the V$_{\text{Ni}}$ acceptor, respectively.
Submitted 6 October, 2020;
originally announced October 2020.
-
SnO/$β$-Ga2O3 vertical $pn$ heterojunction diodes
Authors:
Melanie Budde,
Daniel Splith,
Piero Mazzolini,
Abbes Tahraoui,
Johannes Feldl,
Manfred Ramsteiner,
Holger von Wenckstern,
Marius Grundmann,
Oliver Bierwagen
Abstract:
Vertical $pn$ heterojunction diodes were prepared by plasma-assisted molecular beam epitaxy of unintentionally-doped $p$-type SnO layers with hole concentrations ranging from $p=10^{18}$ to $10^{19}$cm$^{-3}$ on unintentionally-doped $n$-type $β$-Ga$_{2}$O$_{3}$(-201) substrates with an electron concentration of $n=2.0\times10^{17}$cm$^{-3}$. The SnO layers consist of (001)-oriented grains without in-plane epitaxial relation to the substrate. After subsequent contact processing and mesa etching (which drastically reduced the reverse current spreading in the SnO layer and associated high leakage) electrical characterization by current-voltage and capacitance-voltage measurement was performed. The results reveal a type-I band alignment and junction transport by thermionic emission in forward bias. A rectification of $2\times10^{8}$ at $\pm1$V, an ideality factor of 1.16, differential specific on-resistance of 3.9m$Ω\thinspace$cm$^{2}$, and built-in voltage of 0.96V were determined. The $pn$-junction isolation prevented parallel conduction in the highly-conductive Ga$_{2}$O$_{3}$ substrate (sheet resistance $R_{S}\approx3\thinspaceΩ$) during van-der-Pauw Hall measurements of the SnO layer on top ($R_{S}\approx150$k$Ω$, $p\approx2.5\times10^{18}$cm$^{-3}$, Hall mobility $\approx1$cm$^{2}$/Vs). The measured maximum reverse breakdown voltage of the diodes was 66V, corresponding to a peak breakdown field of 2.2MV/cm in the Ga$_{2}$O$_{3}$-depletion region. Higher breakdown voltages that are required in high-voltage devices could be achieved by reducing the donor concentration in the $β$-Ga$_{2}$O$_{3}$ to increase the depletion width as well as improving the contact geometry to reduce field crowding.
Submitted 1 October, 2020;
originally announced October 2020.
-
Investigating the ranges of (meta)stable phase formation in (InxGa1-x)2O3: Impact of the cation coordination
Authors:
C. Wouters,
C. Sutton,
L. M. Ghiringhelli,
T. Markurt,
R. Schewski,
A. Hassa,
H. von Wenckstern,
M. Grundmann,
M. Scheffler,
M. Albrecht
Abstract:
We investigate the phase diagram of the heterostructural solid solution (InxGa1-x)2O3 both computationally, by combining cluster expansion and density functional theory, and experimentally, by means of TEM measurements of pulsed laser deposited (PLD) heteroepitaxial thin films. The shapes of the Gibbs free energy curves for the monoclinic, hexagonal and cubic bixbyite alloy as a function of composition can be explained in terms of the preferred cation coordination environments of indium and gallium. We show by atomically resolved STEM that the strong preference of indium for six-fold coordination results in ordered monoclinic and hexagonal lattices. This ordering impacts the configurational entropy in the solid solution and thereby the (InxGa1-x)2O3 phase diagram. The resulting phase diagram is characterized by very limited solubilities of gallium and indium in the monoclinic, hexagonal and cubic ground state phases respectively but exhibits wide metastable ranges at realistic growth temperatures. On the indium rich side of the phase diagram a wide miscibility gap is found, which results in phase separated layers. The experimentally observed indium solubilities in the PLD samples are in the range of x=0.45 and x=0.55 for monoclinic and hexagonal single-phase films, while for phase separated films we find x=0.5 for the monoclinic phase, x=0.65-0.7 for the hexagonal phase and x>0.9 for the cubic phase. These values are consistent with the computed metastable ranges for each phase.
Submitted 11 August, 2020;
originally announced August 2020.
-
Instant 3D Object Tracking with Applications in Augmented Reality
Authors:
Adel Ahmadyan,
Tingbo Hou,
Jianing Wei,
Liangkai Zhang,
Artsiom Ablavatski,
Matthias Grundmann
Abstract:
Tracking object poses in 3D is a crucial building block for Augmented Reality applications. We propose an instant motion tracking system that tracks an object's pose in space (represented by its 3D bounding box) in real-time on mobile devices. Our system does not require any prior sensory calibration or initialization to function. We employ a deep neural network to detect objects and estimate their initial 3D pose. Then the estimated pose is tracked using a robust planar tracker. Our tracker is capable of performing relative-scale 9-DoF tracking in real-time on mobile devices. By combining use of CPU and GPU efficiently, we achieve 26-FPS+ performance on mobile devices.
Submitted 23 June, 2020;
originally announced June 2020.
-
Attention Mesh: High-fidelity Face Mesh Prediction in Real-time
Authors:
Ivan Grishchenko,
Artsiom Ablavatski,
Yury Kartynnik,
Karthik Raveendran,
Matthias Grundmann
Abstract:
We present Attention Mesh, a lightweight architecture for 3D face mesh prediction that uses attention to semantically meaningful regions. Our neural network is designed for real-time on-device inference and runs at over 50 FPS on a Pixel 2 phone. Our solution enables applications like AR makeup, eye tracking and AR puppeteering that rely on highly accurate landmarks for eye and lips regions. Our main contribution is a unified network architecture that achieves the same accuracy on facial landmarks as a multi-stage cascaded approach, while being 30 percent faster.
Submitted 19 June, 2020;
originally announced June 2020.
-
MediaPipe Hands: On-device Real-time Hand Tracking
Authors:
Fan Zhang,
Valentin Bazarevsky,
Andrey Vakunov,
Andrei Tkachenka,
George Sung,
Chuo-Ling Chang,
Matthias Grundmann
Abstract:
We present a real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications. The pipeline consists of two models: 1) a palm detector, 2) a hand landmark model. It's implemented via MediaPipe, a framework for building cross-platform ML solutions. The proposed model and pipeline architecture demonstrate real-time inference speed on mobile GPUs and high prediction quality. MediaPipe Hands is open sourced at https://mediapipe.dev.
Submitted 17 June, 2020;
originally announced June 2020.
-
BlazePose: On-device Real-time Body Pose tracking
Authors:
Valentin Bazarevsky,
Ivan Grishchenko,
Karthik Raveendran,
Tyler Zhu,
Fan Zhang,
Matthias Grundmann
Abstract:
We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language recognition. Our main contributions include a novel body pose tracking solution and a lightweight body pose estimation neural network that uses both heatmaps and regression to keypoint coordinates.
Submitted 17 June, 2020;
originally announced June 2020.
-
MobilePose: Real-Time Pose Estimation for Unseen Objects with Weak Shape Supervision
Authors:
Tingbo Hou,
Adel Ahmadyan,
Liangkai Zhang,
Jianing Wei,
Matthias Grundmann
Abstract:
In this paper, we address the problem of detecting unseen objects from RGB images and estimating their poses in 3D. We propose two mobile-friendly networks: MobilePose-Base and MobilePose-Shape. The former is used when there is only pose supervision, and the latter is for the case when shape supervision is available, even a weak one. We revisit shape features used in previous methods, including segmentation and coordinate map. We explain when and why pixel-level shape supervision can improve pose estimation. Consequently, we add shape prediction as an intermediate layer in MobilePose-Shape, and let the network learn pose from shape. Our models are trained on mixed real and synthetic data, with weak and noisy shape supervision. They are ultra-lightweight and can run in real time on modern mobile devices (e.g. 36 FPS on Galaxy S20). Compared with previous single-shot solutions, our method has higher accuracy, while using a significantly smaller model (2~3% in model size or number of parameters).
Submitted 7 March, 2020;
originally announced March 2020.
-
The nickel vacancy acceptor in NiO: doping beyond thermodynamic equilibrium
Authors:
Robert Karsthof,
Arthur Markus Anton,
Friedrich Kremer,
Marius Grundmann
Abstract:
This work reports on temperature-induced out-diffusion and concentration decay of the prominent intrinsic point defect VNi (nickel vacancy) in the wide-gap p-type semiconductor nickel oxide (NiO). VNi can easily be introduced into NiO thin films by offering high oxygen partial pressures during film growth, rendering nonstoichiometric semiconducting structures. However, exposure to lower oxygen supply after growth, e.g. in a standard atmosphere, usually leads to a gradual decrease of film conductivity, because the vacancy concentration equilibrates. In this study, we observe this process in situ by performing temperature-dependent measurements of the electrical conductivity on a room temperature-grown NiO film. At a temperature of 420K under exclusion of oxygen, the doping level decreases by a factor of 8 while the associated room temperature dc conductivity drops by six orders of magnitude. At the same time, out-diffusion of the mobile VNi species can be indirectly observed through the occurrence of electrode polarization characteristics.
Submitted 14 January, 2020;
originally announced January 2020.
-
Record-Breaking Magnetoresistance at the Edge of a Microflake of Natural Graphite
Authors:
Christian E. Precker,
Jose Barzola-Quiquia,
Pablo D. Esquinazi,
Markus Stiller,
Mun K. Chan,
Marcelo Jaime,
Zhipeng Zhang,
Marius Grundmann
Abstract:
Placing several electrodes at the edge of a micrometer-size Sri Lankan natural graphite sample at distances comparable to the size of the internal crystalline regions, we found record values for the change of the resistance with magnetic field. At low temperatures and at $B \sim 21$T the magnetoresistance (MR) reaches $\sim 10^7$%. The MR values exceed by far all earlier reported ones for graphite and they are comparable or even larger (at $T > 50$K) than the largest reported in solids including the Weyl semimetals. The origin of this large MR lies in the existence of highly conducting 2D interfaces aligned parallel to the graphene planes.
Submitted 13 November, 2019;
originally announced November 2019.
-
Nickel oxide-based heterostructures with large band offsets
Authors:
Robert Karsthof,
Holger von Wenckstern,
Jesus Zuniga-Perez,
Christiane Deparis,
Marius Grundmann
Abstract:
We present research results on the electronic transport in heterostructures based on p-type nickel oxide (NiO) with the n-type oxide semiconductors zinc oxide (ZnO) and cadmium oxide (CdO). NiO is a desirable candidate for application in (opto-)electronic devices. However, because of its small electron affinity, heterojunctions with most n-type oxide semiconductors exhibit conduction and valence band offsets at the heterointerface in excess of 1 eV. ZnO/NiO junctions exhibit a so-called type-II band alignment, making electron-hole recombination the only process by which a current can vertically flow through the structure. These heterojunctions are nevertheless shown to be of practical use in efficient optoelectronic devices, as exemplified here by our UV-converting transparent solar cells. These devices, although exhibiting high conversion efficiencies, suffer from two light-activated recombination channels connected to the type-II interface, one of which we identify and analyse in more detail here. Furthermore, CdO/NiO contacts were studied - a heterostructure with even larger band offsets such that a type-III band alignment is achieved. This situation theoretically enables the development of a 2-dimensional electronic system consisting of topologically protected states. We present experiments demonstrating that the CdO/NiO heterostructure indeed hosts a conductive layer absent in both materials when studied separately.
Submitted 29 October, 2019;
originally announced October 2019.
-
Control of Magnetic Order in Spinel ZnFe$_2$O$_4$ Thin Films Through Intrinsic Defect Manipulation
Authors:
Vitaly Zviagin,
Chris Sturm,
Pablo Esquinazi,
Marius Grundmann,
Rüdiger Schmidt-Grund
Abstract:
We present a systematic study of the magnetic properties of semiconducting ZnFe$_2$O$_4$ thin films fabricated by pulsed laser deposition at low and high oxygen partial pressure and annealed in oxygen and argon atmosphere, respectively. The magnetic response is enhanced by annealing the films at 250$^{\circ}$C and diminished at annealing temperatures above 300$^{\circ}$C. The initial increase is attributed to the formation of oxygen vacancies after argon treatment, evident by the increase in the low energy absorption at $\sim$ 0.9 eV involving Fe$^{2+}$ cations. The weakened magnetic response is related to a decline in disorder with a cation redistribution toward a normal spinel configuration. The structural renormalization is consistent with the decrease and increase in oscillator strength of respective electronic transitions involving tetrahedrally (at $\sim$ 3.5 eV) and octahedrally (at $\sim$ 5.7 eV) coordinated Fe$^{3+}$ cations.
Submitted 30 September, 2019;
originally announced September 2019.
-
Instant Motion Tracking and Its Applications to Augmented Reality
Authors:
Jianing Wei,
Genzhi Ye,
Tyler Mullen,
Matthias Grundmann,
Adel Ahmadyan,
Tingbo Hou
Abstract:
Augmented Reality (AR) brings immersive experiences to users. With recent advances in computer vision and mobile computing, AR has scaled across platforms, and has increased adoption in major products. One of the key challenges in enabling AR features is proper anchoring of the virtual content to the real world, a process referred to as tracking. In this paper, we present a system for motion tracking, which is capable of robustly tracking planar targets and performing relative-scale 6DoF tracking without calibration. Our system runs in real-time on mobile phones and has been deployed in multiple major products on hundreds of millions of devices.
Submitted 15 July, 2019;
originally announced July 2019.
-
Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs
Authors:
Yury Kartynnik,
Artsiom Ablavatski,
Ivan Grishchenko,
Matthias Grundmann
Abstract:
We present an end-to-end neural network-based model for inferring an approximate 3D mesh representation of a human face from single camera input for AR applications. The relatively dense mesh model of 468 vertices is well-suited for face-based AR effects. The proposed model demonstrates super-realtime inference speed on mobile GPUs (100-1000+ FPS, depending on the device and model variant) and a high prediction quality that is comparable to the variance in manual annotations of the same image.
Submitted 15 July, 2019;
originally announced July 2019.
-
BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs
Authors:
Valentin Bazarevsky,
Yury Kartynnik,
Andrey Vakunov,
Karthik Raveendran,
Matthias Grundmann
Abstract:
We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.
Submitted 14 July, 2019; v1 submitted 11 July, 2019;
originally announced July 2019.
-
On-Device Neural Net Inference with Mobile GPUs
Authors:
Juhyun Lee,
Nikolay Chirkov,
Ekaterina Ignasheva,
Yury Pisarchyk,
Mogan Shieh,
Fabio Riccardi,
Raman Sarokin,
Andrei Kulik,
Matthias Grundmann
Abstract:
On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at https://tensorflow.org/lite.
Submitted 3 July, 2019;
originally announced July 2019.
-
MediaPipe: A Framework for Building Perception Pipelines
Authors:
Camillo Lugaresi,
Jiuqiang Tang,
Hadon Nash,
Chris McClanahan,
Esha Uboweja,
Michael Hays,
Fan Zhang,
Chuo-Ling Chang,
Ming Guang Yong,
Juhyun Lee,
Wan-Teh Chang,
Wei Hua,
Manfred Georg,
Matthias Grundmann
Abstract:
Building applications that perceive the world around them is challenging. A developer needs to (a) select and develop corresponding machine learning algorithms and models, (b) build a series of prototypes and demos, (c) balance resource consumption against the quality of the solutions, and finally (d) identify and mitigate problematic cases. The MediaPipe framework addresses all of these challenges. A developer can use MediaPipe to build prototypes by combining existing perception components, to advance them to polished cross-platform applications and measure system performance and resource consumption on target platforms. We show that these features enable a developer to focus on the algorithm or model development and use MediaPipe as an environment for iteratively improving their application with results reproducible across different devices and platforms. MediaPipe will be open-sourced at https://github.com/google/mediapipe.
Submitted 14 June, 2019;
originally announced June 2019.
-
Polaronic inter-acceptor hopping transport in intrinsically doped nickel oxide
Authors:
Robert Karsthof,
Marius Grundmann,
Markus Arthur Anton,
Friedrich Kremer
Abstract:
In this work, we revisit the issue of the nature of electronic transport in nickel oxide (NiO) and show that the widely used model of free small polaron hopping, initially raised to characterize transport in high-purity samples, is not appropriate for modeling intrinsically doped NiO. Instead, we present extensive evidence, collected by means of temperature- and frequency-dependent measurements of the electrical conductivity $σ$, that the model of polaronic inter-acceptor hopping can be used to consistently explain the electronic conduction process. In this framework, holes are localized to acceptors (Ni vacancies), forming a strongly bound, polaron-like state. They can only move through the film by hopping to a neighboring, at least partially unoccupied, acceptor. This renders the spatial overlap between neighboring polaronic wave functions a highly critical parameter. The signature of this process is the occurrence of two temperature regions of the DC conductivity, separated by about half the Debye temperature $θ_D/2 \approx 200$ K. For $T > θ_D/2$, holes are transferred by phonon-assisted hopping over the potential barrier between two sites, whereas phonon-assisted tunneling through the barrier dominates below that temperature. We also show that the degree of structural and electronic disorder plays a vital role in determining the characteristics of the transport process: high disorder leads to strong energetic broadening of the acceptor states such that hopping to more distant sites may be favored over transfer to nearest neighbors (variable range hopping). The assumption of high binding energies of the charge carriers at V$_{\rm Ni}$ is in accordance with the recent paradigm shift regarding the understanding of the electronic structure of NiO: holes doped into NiO couple to Ni 3d spins, thereby occupying deep polaron-like states within the band gap (Zhang-Rice bound doublets).
Submitted 9 May, 2019;
originally announced May 2019.
-
Ultrafast dynamics of hot charge carriers in an oxide semiconductor probed by femtosecond spectroscopic ellipsometry
Authors:
Steffen Richter,
Oliver Herrfurth,
Shirly Espinoza,
Mateusz Rebarz,
Miroslav Kloz,
Joshua A. Leveillee,
André Schleife,
Stefan Zollner,
Marius Grundmann,
Jakob Andreasson,
Rüdiger Schmidt-Grund
Abstract:
Many linked processes occur concurrently in strongly excited semiconductors, such as interband and intraband absorption, scattering of electrons and holes by the heated lattice, Pauli blocking, bandgap renormalization and the formation of Mahan excitons. In this work, we disentangle their dynamics and contributions to the optical response of a ZnO thin film. Using broadband pump-probe ellipsometry, we can directly and unambiguously obtain the real and imaginary parts of the transient dielectric function, which we compare with first-principles simulations. We find interband and excitonic absorption partially blocked and screened by the photo-excited electron occupation of the conduction band and hole occupation of the valence band (absorption bleaching). Exciton absorption turns spectrally narrower upon pumping and sustains the Mott transition, indicating Mahan excitons. Simultaneously, intra-valence-band transitions occur at sub-picosecond time scales after holes scatter to the edge of the Brillouin zone. Our results pave new ways for the understanding of non-equilibrium charge-carrier dynamics in materials by reliably distinguishing between changes in absorption coefficient and refractive index, thereby separating competing processes. This information will help to overcome the limitations of materials for high-power optical devices that owe their properties to dynamics in the ultrafast regime.
Submitted 27 July, 2020; v1 submitted 15 February, 2019;
originally announced February 2019.
-
Strain and Band-Gap Engineering in Ge-Sn Alloys via P Doping
Authors:
Slawomir Prucnal,
Yonder Berencén,
Mao Wang,
Jörg Grenzer,
Matthias Voelskow,
Rene Hübner,
Yuji Yamamoto,
Alexander Scheit,
Florian Bärwolf,
Vitaly Zviagin,
Rüdiger Schmidt-Grund,
Marius Grundmann,
Jerzy Żuk,
Marcin Turek,
Andrzej Droździel,
Krzysztof Pyszniak,
Robert Kudrawiec,
Maciej P. Polak,
Lars Rebohle,
Wolfgang Skorupa,
Manfred Helm,
Shengqiang Zhou
Abstract:
Ge with a quasi-direct band gap can be realized by strain engineering, alloying with Sn, or ultrahigh n-type doping. In this work, we use all three approaches together to fabricate direct-band-gap Ge-Sn alloys. The heavily doped n-type Ge-Sn is realized with CMOS-compatible nonequilibrium material processing. P is used to form highly doped n-type Ge-Sn layers and to modify the lattice parameter of P-doped Ge-Sn alloys. The strain engineering in heavily-P-doped Ge-Sn films is confirmed by x-ray diffraction and micro Raman spectroscopy. The change of the band gap in P-doped Ge-Sn alloy as a function of P concentration is theoretically predicted by density functional theory and experimentally verified by near-infrared spectroscopic ellipsometry. According to the shift of the absorption edge, it is shown that for an electron concentration greater than $1\times10^{20}$ cm$^{-3}$ the band-gap renormalization is partially compensated by the Burstein-Moss effect. These results indicate that Ge-based materials have high potential for use in near-infrared optoelectronic devices, fully compatible with CMOS technology.
Submitted 7 January, 2019;
originally announced January 2019.
-
Effect of annealing on the magnetic properties of zinc ferrite thin films
Authors:
Yogesh Kumar,
Israel Lorite,
Michael Lorenz,
Pablo D. Esquinazi,
Marius Grundmann
Abstract:
We report on the magnetic properties of a zinc ferrite thin film deposited on a SrTiO$_3$ single crystal using pulsed laser deposition. X-ray diffraction results indicate highly oriented single-phase growth of the film along with the presence of strain. In contrast to the bulk antiferromagnetic order, the as-deposited film has been found to exhibit ferrimagnetic ordering with a coercive field of 1140~Oe at 5~K. A broad maximum, at $\approx$105~K, observed in the zero-field-cooled magnetization curve indicates a wide grain size distribution for the as-deposited film. A reduction in magnetization and blocking temperature has been observed after annealing in both argon and oxygen atmospheres, where the variation was found to depend on the annealing temperature.
Submitted 20 February, 2017;
originally announced February 2017.
-
Exceptional points in anisotropic planar microcavities
Authors:
Steffen Richter,
Tom Michalsky,
Chris Sturm,
Bernd Rosenow,
Marius Grundmann,
Rüdiger Schmidt-Grund
Abstract:
Planar microcavities allow the control and manipulation of spin-polarization, manifested in phenomena like the optical spin Hall effect due to the intrinsic polarization mode splitting. Here, we study a transparent microcavity with broken rotational symmetry, realized by aligning the optical axis of a uniaxial cavity material in the cavity plane. We demonstrate that the in-plane optical anisotropy gives rise to exceptional points in the dispersion relation, which occur pair-wise, are circularly polarized, and are cores of polarization vortices. These exceptional points are a result of the non-Hermitian character of the system, and are in close relationship to singular optical axes in absorptive biaxial systems.
Submitted 24 September, 2016;
originally announced September 2016.
-
Raman tensor elements of $β\text{-Ga}_2\text{O}_3$
Authors:
Christian Kranert,
Chris Sturm,
Rüdiger Schmidt-Grund,
Marius Grundmann
Abstract:
The Raman spectrum and particularly the Raman scattering intensities of monoclinic $β\text{-Ga}_2\text{O}_3$ are investigated by experiment and theory. The low symmetry of $β\text{-Ga}_2\text{O}_3$ results in a complex dependence of the Raman intensity for the individual phonon modes on the scattering geometry which is additionally affected by birefringence. We measured the Raman spectra in dependence on the polarization direction for backscattering on three crystallographic planes of $β\text{-Ga}_2\text{O}_3$ and modeled these dependencies using a modified Raman tensor formalism which takes birefringence into account. The spectral position of all 15 Raman-active phonon modes and the Raman tensor elements of 13 modes were determined and are compared to results from ab-initio calculations.
Submitted 23 June, 2016;
originally announced June 2016.
-
Photo-enhanced magnetization in Fe-doped ZnO nanowires
Authors:
I. Lorite,
Y. Kumar,
P. Esquinazi,
S. Friedländer,
A. Pöppl,
T. Michalsky,
J. Meijer,
M. Grundmann,
T. Meyer,
I. Estrela-Lopis
Abstract:
Optospintronics, an emerging branch of electronics, would be greatly advanced if the control of magnetic order by light were implemented in magnetic semiconductor nanostructures compatible with current technology. Here we show that the ferromagnetic magnetization of low Fe-doped ZnO nanowires prepared by a carbothermal process is enhanced under illumination up to temperatures slightly below room temperature. This enhancement is related to the existence of an oxygen vacancy V$_{\rm O}$ in the neighbourhood of an antiferromagnetic superexchange Fe$^{3+}$-Fe$^{3+}$ pair. Under illumination the V$_{\rm O}$ is ionized to V$_{\rm O}^+$, giving an electron to a nearby Fe$^{3+}$ ion from the antiferromagnetic pair. This light-excited electron transition allows the transition of Fe$^{3+}$ to Fe$^{2+}$, forming stable ferromagnetic double exchange pairs and increasing the total magnetization. The results presented here indicate an efficient way to influence the magnetic properties of ZnO-based nanostructures by light illumination at high temperatures.
Submitted 22 June, 2016;
originally announced June 2016.
-
Fundamental absorption edges in heteroepitaxial YBiO$_3$ thin films
Authors:
Marcus Jenderka,
Steffen Richter,
Michael Lorenz,
Marius Grundmann
Abstract:
The dielectric function of heteroepitaxial YBiO$_3$ grown on $a$-Al$_2$O$_3$ single crystals via pulsed laser deposition is determined in the spectral range from 0.03 eV to 4.5 eV by simultaneous modeling of spectroscopic ellipsometry and optical transmission data of YBiO$_3$ films of different thickness. The (111)-oriented YBiO$_3$ films are nominally unstrained and crystallize in a defective fluorite-type structure with $Fm\bar{3}m$ space group. From the calculated absorption spectrum, a direct electronic bandgap energy of 3.6(1) eV and the signature of an indirect electronic transition around 0.5 eV are obtained. These values provide necessary experimental feedback to previous conflicting electronic band structure calculations predicting either a topologically trivial or non-trivial insulating ground state in YBiO$_3$.
Submitted 19 September, 2016; v1 submitted 13 June, 2016;
originally announced June 2016.
-
Coexistence of strong and weak coupling in ZnO nanowire cavities
Authors:
Tom Michalsky,
Helena Franke,
Robert Buschlinger,
Ulf Peschel,
Marius Grundmann,
Rüdiger Schmidt-Grund
Abstract:
We present a high-quality two-dimensional cavity structure based on ZnO nanowires coated with concentric Bragg reflectors. The spatial mode distribution leads to the simultaneous appearance of the weak and strong coupling regimes even at room temperature. Photoluminescence measurements agree with FDTD simulations. Furthermore, the ZnO core nanowires allow for the observation of middle polariton branches between the A- and B-exciton ground state resonances. In addition, lasing emission up to room temperature is detected in excitation-dependent photoluminescence measurements.
Submitted 22 February, 2016;
originally announced February 2016.
-
Dipole Analysis of the Dielectric Function of Colour Dispersive Materials: Application to Monoclinic Ga$_2$O$_3$
Authors:
Chris Sturm,
Rüdiger Schmidt-Grund,
Christian Kranert,
Jürgen Furthmüller,
Friedhelm Bechstedt,
Marius Grundmann
Abstract:
We apply a generalized model for the determination and analysis of the dielectric function of optically anisotropic materials with colour dispersion to phonon modes and show that it can also be generalized to excitonic polarizabilities and electronic band-band transitions. We take into account that the tensor components of the dielectric function within the cartesian coordinate system are not independent of each other but are rather projections of the polarization of dipoles oscillating along directions defined by the non-cartesian crystal symmetry and polarizability. The dielectric function is then composed of a series of oscillators pointing in different directions. The application of this model is demonstrated, as an example, for monoclinic ($β$-phase) Ga$_2$O$_3$ bulk single crystals. Using this model, we are able to relate electronic transitions observed in the dielectric function to atomic bond directions and orbitals in the real space crystal structure. For thin films revealing rotational domains we show that the optical biaxiality is reduced to a uniaxial optical response.
Submitted 28 January, 2016;
originally announced January 2016.
-
Carrier density driven lasing dynamics in ZnO nanowires
Authors:
Marcel Wille,
Chris Sturm,
Tom Michalsky,
Robert Röder,
Carsten Ronning,
Rüdiger Schmidt-Grund,
Marius Grundmann
Abstract:
We report on the temporal lasing dynamics of high-quality ZnO nanowires using a time-resolved micro-photoluminescence technique. The temperature dependence of the lasing characteristics and of the corresponding decay constants demonstrates the formation of an electron-hole plasma to be the underlying gain mechanism in the considered temperature range from 10 K to 300 K. We found that the temperature-dependent emission onset-time ($t_{\text{on}}$) strongly depends on the excitation power and becomes smallest in the lasing regime, with values below 5 ps. Furthermore, the observed red shift of the dominating lasing modes in time is qualitatively discussed in terms of the carrier-density-induced change of the refractive index dispersion after the excitation laser pulse. This theory is supported by extending an existing model for the calculation of the carrier-density-dependent complex refractive index for different temperatures. This model coincides with the experimental observations and reliably describes the evolution of the refractive index after the excitation laser pulse.
Submitted 15 January, 2016;
originally announced January 2016.