Search | arXiv e-print repository

arXiv:2407.20175 [pdf, other]

Towards Localized Fine-Grained Control for Facial Expression Generation

Authors: Tuomas Varanka, Huai-Qian Khor, Yante Li, Mengting Wei, Hanwei Kung, Nicu Sebe, Guoying Zhao

Abstract: Generative models have surged in popularity recently due to their ability to produce high-quality images and video. However, steering these models to produce images with specific attributes and precise control remains challenging. Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent. Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity. Other basic expressions like anger are possible, but are limited to the stereotypical expression, while other unconventional facial expressions like doubtful are difficult to reliably generate. In this work, we propose the use of AUs (action units) for facial expression control in face generation. AUs describe individual facial muscle movements based on facial anatomy, allowing precise and localized control over the intensity of facial movements. By combining different action units, we unlock the ability to create unconventional facial expressions that go beyond typical emotional models, enabling nuanced and authentic reactions reflective of real-world expressions. The proposed method can be seamlessly integrated with both text and image prompts using adapters, offering precise and intuitive control of the generated results. Code and dataset are available at https://github.com/tvaranka/fineface.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.08798 [pdf, other]

Electronically-driven switching of topology in LaSbTe

Authors: J. Bannies, M. Michiardi, H. -H. Kung, S. Godin, J. W. Simonson, M. Oudah, M. Zonno, S. Gorovikov, S. Zhdanovich, I. S. Elfimov, A. Damascelli, M. C. Aronson

Abstract: In the past two decades, various classes of topological materials have been discovered, spanning topological insulators, semimetals, and metals. While the observation and understanding of the topology of a material has been a primary focus so far, the precise and easy control of topology in a single material remains largely unexplored. Here, we demonstrate full experimental control over the topological Dirac nodal loop in the square-net material LaSb$_\mathrm{x}$Te$_\mathrm{2-x}$ by chemical substitution and electron doping. Using angle-resolved photoemission spectroscopy (ARPES), we show that changing the antimony concentration x from 0.9 to 1.0 in the bulk opens a gap as large as 400 meV in the nodal loop. Our symmetry analysis based on single-crystal X-ray diffraction and a minimal tight binding model establishes that the breaking of $n$ glide symmetry in the square-net layer is responsible for the opening of the gap. Remarkably, we can also realize this topological phase transition in situ on the surface of LaSb$_\mathrm{x}$Te$_\mathrm{2-x}$ by chemical gating using potassium deposition, which enables the reversible switching of the topology from gapped to gapless nodal loop. The underlying control parameter for the structural and topological transition in the bulk and on the surface is the electron concentration. It opens a pathway towards applications in devices based on switching topology by electrostatic gating.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2403.16451 [pdf, other]

DeepMachining: Online Prediction of Machining Errors of Lathe Machines

Authors: Xiang-Li Lu, Hwai-Jung Hsu, Che-Wei Chou, H. T. Kung, Chen-Hsin Lee, Sheng-Mao Cheng

Abstract: We describe DeepMachining, a deep learning-based AI system for online prediction of machining errors of lathe machine operations. We have built and evaluated DeepMachining based on manufacturing data from factories. Specifically, we first pretrain a deep learning model for a given lathe machine's operations to learn the salient features of machining states. Then, we fine-tune the pretrained model to adapt to specific machining tasks. We demonstrate that DeepMachining achieves high prediction accuracy for multiple tasks that involve different workpieces and cutting tools. To the best of our knowledge, this work is one of the first factory experiments using pre-trained deep-learning models to predict machining errors of lathe machines.

Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.15504 [pdf, other]

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Authors: Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen

Abstract: Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models. First, current personalization techniques fail to reliably extend to multiple concepts -- we hypothesize this to be due to the mismatch between complex scenes and simple text descriptions in the pre-training dataset (e.g., LAION). Second, given an image containing multiple personalized concepts, there lacks a holistic metric that evaluates performance on not just the degree of resemblance of personalized concepts, but also whether all concepts are present in the image and whether the image accurately reflects the overall text description. To address these issues, we introduce Gen4Gen, a semi-automated dataset creation pipeline utilizing generative models to combine personalized concepts into complex compositions along with text-descriptions. Using this, we create a dataset called MyCanvas, that can be used to benchmark the task of multi-concept personalization. In addition, we design a comprehensive metric comprising two scores (CP-CLIP and TI-CLIP) for better quantifying the performance of multi-concept, personalized text-to-image diffusion methods. We provide a simple baseline built on top of Custom Diffusion with empirical prompting strategies for future researchers to evaluate on MyCanvas. We show that by improving data quality and prompting strategies, we can significantly increase multi-concept personalized image generation quality, without requiring any modifications to model architecture or training algorithms.

Submitted 23 February, 2024; originally announced February 2024.

Comments: Preprint; Project Page: https://danielchyeh.github.io/Gen4Gen/

arXiv:2311.11402 [pdf, ps, other]

Discovery of Superconductivity and Electron-Phonon Drag in the Non-Centrosymmetric Weyl Semimetal LaRhGe$_3$

Authors: Mohamed Oudah, Hsiang-Hsi Kung, Samikshya Sahu, Niclas Heinsdorf, Armin Schulz, Kai Philippi, Marta-Villa De Toro Sanchez, Yipeng Cai, Kenji Kojima, Andreas P. Schnyder, Hidenori Takagi, Bernhard Keimer, Doug A. Bonn, Alannah M. Hallas

Abstract: We present an exploration of the effect of electron-phonon coupling and broken inversion symmetry on the electronic and thermal properties of the semimetal LaRhGe$_3$. Our transport measurements reveal evidence for electron-hole compensation at low temperatures, resulting in a large magnetoresistance of 3000% at 1.8 K and 14 T. The carrier concentration is on the order of $10^{21}\rm{/cm}^3$ with high carrier mobilities of $2000~\rm{cm}^2/\rm{Vs}$. When coupled to our theoretical demonstration of symmetry-protected almost movable Weyl nodal lines, we conclude that LaRhGe$_3$ supports a Weyl semimetallic state. We discover superconductivity in this compound with a $T_{\text c}$ of 0.39(1) K and $B_{\rm{c}}(0)$ of 2.2(1) mT, with evidence from specific heat and transverse-field muon spin relaxation. We find an exponential dependence in the normal state electrical resistivity below $\sim50$ K, while Seebeck coefficient and thermal conductivity measurements each reveal a prominent peak at low temperatures, indicative of strong electron-phonon interactions. To this end, we examine the temperature-dependent Raman spectra of LaRhGe$_3$ and find that the lifetime of the lowest energy $A_1$ phonon is dominated by phonon-electron scattering instead of anharmonic decay. We conclude that LaRhGe$_3$ has strong electron-phonon coupling in the normal state, while the superconductivity emerges from weak electron-phonon coupling. These results open up the investigation of electron-phonon interactions in the normal state of superconducting non-centrosymmetric Weyl semimetals.

Submitted 29 May, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2310.04394 [pdf, other]

doi 10.1103/PhysRevB.109.L041111

Spin-Mediated Direct Photon Scattering by Plasmons in BiTeI

Authors: A. C. Lee, S. Sarkar, K. Du, H. -H. Kung, C. J. Won, K. Wang, S. -W. Cheong, S. Maiti, G. Blumberg

Abstract: We use polarization resolved Raman spectroscopy to demonstrate that for a 3D giant Rashba system the bulk plasmon collective mode can directly couple to the Raman response even in the long wavelength $\mathbf q \rightarrow 0$ limit. Although conventional theory predicts the plasmon spectral weight to be suppressed as the square of its quasi-momentum and thus negligibly weak in the Raman spectra, we observe a sharp in-gap plasmon mode in the Raman spectrum of BiTeI below the Rashba continuum. This coupling, in a polar system with spin-orbit coupling, occurs without assistance from phonons when the incoming photon excitation is resonant with Rashba-split intermediate states. We discuss the distinctive features of BiTeI's giant Rashba system band structure that enable the direct observation of plasmon in Raman scattering.

Submitted 18 February, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Editors' Suggestion

Journal ref: Phys. Rev. B 109, L041111 (2024)

arXiv:2310.03170 [pdf]

Critical Role of Disorder for Superconductivity in the Series of Epitaxial Ti(O,N) Films

Authors: Fengmiao Li, Oliver Dicks, Myung-Geun Han, Solveig Aamlid, Giorgio Levy, Ronny Sutarto, Chong Liu, Hsiang-Hsi Kung, Oleksandr Foyevstov, Simon Godin, Bruce A. Davidson, Andrea Damascelli, Yimei Zhu, Christoph Heil, Ilya Elfimov, George A. Sawatzky, Ke Zou

Abstract: Experimental manipulation of superconductivity is of paramount importance, not only for practical applications but also for identifying the key factors involved in electron pairing. In this work, we have undertaken a meticulous study of the superconductivity in a series of titanium compounds with a rocksalt structure, synthesized as epitaxial films. We find that substituting nitrogen (N) for oxygen (O) in titanium monoxide (TiO) with the stoichiometry close to TiO$_{0.6}$N$_{0.4}$ leads to superconductivity with a transition temperature (T$_c$) of ~2.6 K, about five times higher than that of TiO at ~0.5 K and half as high as the T$_c$ of ~6 K in titanium nitride (TiN). However, Eliashberg theoretical calculations predict similar T$_c$ in TiO, Ti oxynitride and TiN. The analysis of electron mean free path suggests the presence of significant disorder in TiO and a remarkable reduction in the impact of disorder in oxynitrides. Density functional theory (DFT) calculations reveal that disorder decreases the coherence of electronic states for non-zero momenta, which would degrade the influence of electron-phonon. Our findings demonstrate the disorder and superconductivity depend strongly on the N/O ratio, highlighting the critical role of disorder for superconductivity in this series of Ti(O,N) materials.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2307.03930 [pdf, other]

Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

Authors: Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting

Abstract: We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels in reducing computation and memory access requirements of deep neural networks (DNNs). Rosko allows skipping of entire row computations during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to effectively utilize processor cores and minimize data movement without the need for auto-tuning or search space exploration. Rosko can be integrated with other outer product scheduling methods, allowing them to leverage row skipping by using Rosko's packing format to skip unnecessary computation. Rosko kernels outperform existing auto-tuning and search-based solutions as well as state-of-the-art vendor-optimized libraries on real hardware across a variety of neural network workloads. For matrices with sparsities ranging from 65% to 99.8% typically found in machine learning, Rosko kernels achieve up to a 6.5x runtime reduction on Intel and ARM CPUs.

Submitted 8 July, 2023; originally announced July 2023.

Comments: Rosko's CPU implementation can be found at https://github.com/vnatesh/Rosko
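
Illustrative note on the Rosko entry above: the general idea of skipping work for zero entries in an outer-product formulation of sparse-times-dense multiplication can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' kernel: the function name `spmm_row_skipping` and the `skipped` counter are hypothetical, and Rosko's packing format and hardware-aware tiling are not modelled.

```python
def spmm_row_skipping(a, b):
    """Multiply sparse a (m x k) by dense b (k x n), both lists of lists.

    The product is accumulated as a sum of outer products:
    C += a[:, p] (outer) b[p, :] for each p. Whenever a[i][p] is zero,
    the whole row update C[i, :] += a[i][p] * b[p, :] is skipped.
    Returns the result and the number of skipped row updates
    (the counter is a hypothetical diagnostic, added for illustration).
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    skipped = 0
    for p in range(k):              # one outer product per column of a
        b_row = b[p]
        for i in range(m):
            a_ip = a[i][p]
            if a_ip == 0:           # zero entry: skip the entire row update
                skipped += 1
                continue
            c_row = c[i]
            for j in range(n):
                c_row[j] += a_ip * b_row[j]
    return c, skipped


a = [[1, 0],
     [0, 0],
     [2, 3]]
b = [[1, 2],
     [3, 4]]
result, skipped = spmm_row_skipping(a, b)
```

In this sketch the saved work grows with the number of zeros in `a`, which is the intuition behind the sparsity range quoted in the abstract; the actual speedups reported there depend on the authors' kernels and packing, which this example does not reproduce.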

arXiv:2305.17546 [pdf, other]

doi 10.1103/PhysRevB.108.174301

Electronic and Vibrational Excitations on the Surface of the Three-Dimensional Topological Insulator Bi$_2$Te$_{3-x}$Se$_{x}$ (x = 0, 2, 3)

Authors: A. Lee, H. -H. Kung, Xueyun Wang, S. -W. Cheong, G. Blumberg

Abstract: We study surface states in the three-dimensional topological insulators Bi$_2$Te$_{3-x}$Se$_{x}$ (x = 0, 2, 3) by polarization resolved resonant Raman spectroscopy. By tracking the spectral intensity of the surface phonon modes with respect to the incident photon energy, we show that the surface phonons are qualitatively similar to their bulk counterparts. Using the resonant Raman excitation profile, we estimated the binding energy of the surface conduction bands relative to bulk conduction bands. In addition, we selectively excite the surface-to-bulk electronic continuum near the Fermi energy in Bi$_2$Se$_3$ to determine the strength of Fano interaction between the most prominent surface phonon and the surface-to-bulk continuum.

Submitted 14 January, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

arXiv:2304.05544 [pdf, other]

MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

Authors: Andrew Sabot, Vikas Natesh, H. T. Kung, Wei-Te Ting

Abstract: We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in the current practice, that is, optimal schedules tend to be found only through a time consuming and heuristic search of a large scheduling space. We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems. For example, for neural network benchmarks on the ARM Cortex-M4, we achieve up to a 1.8x speedup and 44% energy reduction over CMSIS-NN.

Submitted 11 April, 2023; originally announced April 2023.

Comments: Accepted as a full paper by the TinyML Research Symposium 2023

arXiv:2301.01947 [pdf, ps, other]

StitchNet: Composing Neural Networks from Pre-Trained Fragments

Authors: Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H. T. Kung

Abstract: We propose StitchNet, a novel neural network creation paradigm that stitches together fragments (one or more consecutive network layers) from multiple pre-trained neural networks. StitchNet allows the creation of high-performing neural networks without the large compute and data requirements needed under traditional model creation processes via backpropagation training. We leverage Centered Kernel Alignment (CKA) as a compatibility measure to efficiently guide the selection of these fragments in composing a network for a given task tailored to specific accuracy needs and computing resource constraints. We then show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks at a fraction of computing resource and data requirements. Finally, we explore a novel on-the-fly personalized model creation and inference application enabled by this new paradigm. The code is available at https://github.com/steerapi/stitchnet.

Submitted 23 September, 2023; v1 submitted 5 January, 2023; originally announced January 2023.
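
Illustrative note on the StitchNet entry above: Centered Kernel Alignment compares two sets of activations recorded on the same inputs. The sketch below implements the standard linear form, CKA(X, Y) = ||Yᵀ X||_F² / (||Xᵀ X||_F · ||Yᵀ Y||_F) with column-centered X and Y. It is an assumption that this linear variant is the relevant one; the abstract does not say which CKA variant or preprocessing StitchNet uses, and all function names here are hypothetical.

```python
import math


def center_columns(x):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in x]


def cross_frobenius_sq(x, y):
    """Return ||x^T y||_F^2 for x (n x d1) and y (n x d2)."""
    total = 0.0
    for a in range(len(x[0])):
        for b in range(len(y[0])):
            s = sum(x[i][a] * y[i][b] for i in range(len(x)))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA between activation matrices with one row per example.

    Assumes neither matrix is constant across examples, otherwise the
    denominator is zero.
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    return numerator / denominator


acts_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0]]
acts_b = [[2.0 * v for v in row] for row in acts_a]   # same features, rescaled
same_score = linear_cka(acts_a, acts_b)
```

A score near 1 indicates that the two representations agree up to rotation and isotropic scaling, while a score near 0 indicates unrelated representations; how such a score would be thresholded to pick fragments is not specified in the abstract.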

arXiv:2209.12127 [pdf, other]

SpeedLimit: Neural Architecture Search for Quantized Transformer Models

Authors: Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an upper-bound latency constraint. Our method incorporates 8-bit integer quantization in the search process to outperform the current state-of-the-art technique. Our results underline the feasibility and efficacy of seeking an optimal balance between performance and latency, providing new avenues for deploying state-of-the-art transformer models in latency-sensitive environments.

Submitted 13 October, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

arXiv:2207.09413 [pdf, other]

SphereFed: Hyperspherical Federated Learning

Authors: Xin Dong, Sai Qian Zhang, Ang Li, H. T. Kung

Abstract: Federated Learning aims at training a global model from multiple decentralized devices (i.e. clients) without exchanging their private local data. A key challenge is the handling of non-i.i.d. (independent identically distributed) data across multiple clients that may induce disparities of their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the non-i.i.d. issue by constraining learned representations of data points to be on a unit hypersphere shared by clients. Specifically, all clients learn their local representations by minimizing the loss with respect to a fixed classifier whose weights span the unit hypersphere. After federated training in improving the global model, this classifier is further calibrated with a closed-form solution by minimizing a mean squared loss. We show that the calibration solution can be computed efficiently and distributedly without direct access of local data. Extensive experiments indicate that our SphereFed approach is able to improve the accuracy of multiple existing federated learning algorithms by a considerable margin (up to 6% on challenging datasets) with enhanced computation and communication efficiency across datasets and model architectures.

Submitted 19 July, 2022; originally announced July 2022.

Comments: European Conference on Computer Vision 2022

arXiv:2204.04705 [pdf, other]

SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems

Authors: Xin Dong, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T. Kung, Ziyun Li

Abstract: We design deep neural networks (DNNs) and corresponding networks' splittings to distribute DNNs' workload to camera sensors and a centralized aggregator on head mounted devices to meet system performance targets in inference accuracy and latency under the given hardware resource constraints. To achieve an optimal balance among computation, communication, and performance, a split-aware neural architecture search framework, SplitNets, is introduced to conduct model designing, splitting, and communication reduction simultaneously. We further extend the framework to multi-view systems for learning to fuse inputs from multiple camera sensors with optimal performance and systemic efficiency. We validate SplitNets for single-view system on ImageNet as well as multi-view system on 3D classification, and show that the SplitNets framework achieves state-of-the-art (SOTA) performance and system latency compared with existing approaches.

Submitted 10 April, 2022; originally announced April 2022.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022

arXiv:2202.09642 [pdf, other]

Anisotropy of Kondo-lattice coherence in momentum space for CeCoIn5

Authors: Mai Ye, Hsiang-Hsi Kung, Priscila F. S. Rosa, Eric D. Bauer, Kristjan Haule, Girsh Blumberg

Abstract: We study the electronic and phononic excitations of heavy-fermion metal CeCoIn$_5$ by polarization-resolved Raman spectroscopy to explore the Kondo-lattice coherence. Below the coherence temperature T* = 45 K, the continuum of electronic excitations in the XY scattering geometry is suppressed at frequencies below 50 cm$^{-1}$, whereas the low-frequency continuum in the X'Y' geometry exhibits no change across T*. We relate the suppression to the reduced electron-electron scattering rate resulting from the coherence effect. The presence of suppression in the XY geometry and absence of it in the X'Y' geometry implies that the $α$ and $β$ bands become coherent below T*, whereas the $γ$ band remains largely incoherent down to 10 K. Moreover, two optical phonon modes exhibit anomalies in their temperature dependence of the frequency and linewidth below T*, which results from developing coherent spectral weight near the Fermi level and reduced electron-phonon scattering rate. Our results further support the key role of anisotropic hybridization in CeCoIn$_5$.

Submitted 19 February, 2022; originally announced February 2022.

arXiv:2202.03569 [pdf, other]

doi 10.1103/PhysRevB.105.L161105

Chiral Electronic Excitations in a Quasi-2D Rashba System BiTeI

Authors: A. C. Lee, B. Peng, K. Du, H. -H. Kung, B. Monserrat, S. -W. Cheong, C. J. Won, G. Blumberg

Abstract: The optical transitions between spin-polarized bands of the quasi-two dimensional Rashba system BiTeI are investigated using polarization resolved resonant Raman spectroscopy. We detect chiral excitations between states with opposite helicity and compare spectra to calculations within a three-band model. Using the resonant Raman excitation profile, we deduce the Rashba parameters and band gaps of the higher conduction bands near the Fermi level, and compare the parameters to values obtained by ab initio density functional theory (DFT).

Submitted 25 April, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:2110.15456 [pdf, other]

FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

Authors: Sai Qian Zhang, Bradley McDanel, H. T. Kung

Abstract: Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second Training (FAST) system for DNNs, where the weights, activations, and gradients are represented in BFP. FAST supports matrix multiplication with variable precision BFP input operands, enabling incremental increases in DNN precision throughout training. By increasing the BFP precision across both training iterations and DNN layers, FAST can greatly shorten the training time while reducing overall hardware resource usage. Our FAST Multiplier-Accumulator (fMAC) supports dot product computations under multiple BFP precisions. We validate our FAST system on multiple DNNs with different datasets, demonstrating a 2-6$\times$ speedup in training on a single-chip platform over prior work based on mixed-precision or block floating point number systems while achieving similar performance in validation accuracy.

Submitted 28 October, 2021; originally announced October 2021.
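
Illustrative note on the FAST entry above: the shared-exponent idea behind block floating point can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the FAST system or its fMAC hardware: the function names and the signed-mantissa layout are hypothetical, and the sketch uses deterministic round-to-nearest where the paper's title refers to stochastic rounding.

```python
import math


def bfp_quantize(block, mantissa_bits):
    """Quantize a block of floats to signed integer mantissas that share one exponent.

    The shared exponent comes from the largest magnitude in the block, so
    smaller values in the same block lose relative precision.
    Round-to-nearest is used here (an assumption made for determinism).
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # math.frexp returns (f, e) with max_abs == f * 2**e and 0.5 <= f < 1
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1       # largest signed mantissa
    mantissas = [max(-limit, min(limit, round(x / scale))) for x in block]
    return mantissas, shared_exp


def bfp_dequantize(mantissas, shared_exp, mantissa_bits):
    """Reconstruct approximate floats from mantissas and the shared exponent."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


block = [1.0, 0.3]
low = bfp_dequantize(*bfp_quantize(block, 4), 4)    # coarse mantissas
high = bfp_dequantize(*bfp_quantize(block, 8), 8)   # finer mantissas
```

Raising `mantissa_bits` shrinks the reconstruction error for the smaller value in the block, which gives some intuition for why a training schedule might start with low precision and increase it later, as the abstract describes.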

arXiv:2107.06304 [pdf, other]

Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks

Authors: Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung

Abstract: Mobile edge devices see increased demands in deep neural networks (DNNs) inference while suffering from stringent constraints in computing resources. Split computing (SC) emerges as a popular approach to the issue by executing only initial layers on devices and offloading the remaining to the cloud. Prior works usually assume that SC offers privacy benefits as only intermediate features, instead of private data, are shared from devices to the cloud. In this work, we debunk this SC-induced privacy protection by (i) presenting a novel data-free model inversion method and (ii) demonstrating sample inversion where private data from devices can still be leaked with high fidelity from the shared feature even after tens of neural network layers. We propose Divide-and-Conquer Inversion (DCI) which partitions the given deep network into multiple shallow blocks and inverts each block with an inversion method. Additionally, cycle-consistency technique is introduced by re-directing the inverted results back to the model under attack in order to better supervise the training of the inversion modules. In contrast to prior art based on generative priors and computation-intensive optimization in deriving inverted samples, DCI removes the need for real device data and generative priors, and completes inversion with a single quick forward pass over inversion modules. For the first time, we scale data-free and sample-specific inversion to deep architectures and large datasets for both discriminative and generative networks. We perform model inversion attack to ResNet and RepVGG models on ImageNet and SNGAN on CelebA and recover the original input from intermediate features more than 40 layers deep into the network.

Submitted 24 October, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: A new data-free inversion method to reverse neural networks and get input from intermediate feature maps. BMVC'22

arXiv:2106.11423 [pdf, other]

Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement

Authors: Huiwen Luo, Koki Nagano, Han-Wei Kung, Mclean Goldwhite, Qingguo Xu, Zejian Wang, Lingyu Wei, Liwen Hu, Hao Li

Abstract: We introduce a highly robust GAN-based framework for digitizing a normalized 3D avatar of a person from a single unconstrained photo. While the input image can be of a smiling person or taken in extreme lighting conditions, our method can reliably produce a high-quality textured model of a person's face in neutral expression and skin textures under diffuse lighting condition. Cutting-edge 3D face reconstruction methods use non-linear morphable face models combined with GAN-based decoders to capture the likeness and details of a person but fail to produce neutral head models with unshaded albedo textures which is critical for creating relightable and animation-friendly avatars for integration in virtual environments. The key challenge for existing methods is the lack of training and ground truth data containing normalized 3D faces. We propose a two-stage approach to address this problem. First, we adopt a highly robust normalized 3D face generator by embedding a non-linear morphable face model into a StyleGAN2 network. This allows us to generate detailed but normalized facial assets. This inference is then followed by a perceptual refinement step that uses the generated assets as regularization to cope with the limited available training samples of normalized faces. We further introduce a Normalized Face Dataset, which consists of a combination of photogrammetry scans, carefully selected photographs, and generated fake people with neutral expressions in diffuse lighting conditions. While our prepared dataset contains two orders of magnitude fewer subjects than cutting edge GAN-based 3D facial reconstruction methods, we show that it is possible to produce high-quality normalized face models for very challenging unconstrained input images, and demonstrate superior performance to the current state-of-the-art.

Submitted 21 June, 2021; originally announced June 2021.

Comments: Accepted to CVPR 2021

arXiv:2105.09320 [pdf, other]

doi 10.1038/s41467-022-30742-5

Optical manipulation of Rashba-split 2-Dimensional Electron Gas

Authors: M. Michiardi, F. Boschini, H. -H. Kung, M. X. Na, S. K. Y. Dufresne, A. Currie, G. Levy, S. Zhdanovich, A. K. Mills, D. J. Jones, J. L. Mi, B. B. Iversen, Ph. Hofmann, A. Damascelli

Abstract: In spintronic devices, the two main approaches to actively control the electrons' spin degree of freedom involve either static magnetic or electric fields. An alternative avenue relies on the application of optical fields to generate spin currents, which promises to bolster spin-device performance allowing for significantly faster and more efficient spin logic. To date, research has mainly focused on the optical injection of spin currents through the photogalvanic effect, and little is known about the direct optical control of the intrinsic spin splitting. Here, to explore the all-optical manipulation of a material's spin properties, we consider the Rashba effect at a semiconductor interface. The Rashba effect has long been a staple in the field of spintronics owing to its superior tunability, which allows the observation of fully spin-dependent phenomena, such as the spin-Hall effect, spin-charge conversion, and spin-torque in semiconductor devices. In this work, by means of time and angle-resolved photoemission spectroscopy (TR-ARPES), we demonstrate that an ultrafast optical excitation can be used to manipulate the Rashba-induced spin splitting of a two-dimensional electron gas (2DEG) engineered at the surface of the topological insulator Bi$_{2}$Se$_{3}$. We establish that light-induced photovoltage and charge carrier redistribution -- which in concert modulate the spin-orbit coupling strength on a sub-picosecond timescale -- can offer an unprecedented platform for achieving all optically-driven THz spin logic devices.

Submitted 2 June, 2022; v1 submitted 19 May, 2021; originally announced May 2021.

Journal ref: Nature Communications 13, 3096 (2022)

arXiv:2104.11408 [pdf, other]

Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

Authors: Xin Dong, Junfeng Guo, Ang Li, Wei-Te Ting, Cong Liu, H. T. Kung

Abstract: Various approaches have been proposed for out-of-distribution (OOD) detection by augmenting models, input examples, training sets, and optimization objectives. Deviating from existing work, we have a simple hypothesis that standard off-the-shelf models may already contain sufficient information about the training set distribution which can be leveraged for reliable OOD detection. Our empirical study on validating this hypothesis, which measures the model activation's mean for OOD and in-distribution (ID) mini-batches, surprisingly finds that activation means of OOD mini-batches consistently deviate more from those of the training data. In addition, training data's activation means can be computed offline efficiently or retrieved from batch normalization layers as a 'free lunch'. Based upon this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares neural means of the input examples and training data. Leveraging the simplicity of NMD, we propose an efficient OOD detector that computes neural means by a standard forward pass followed by a lightweight classifier. Extensive experiments show that NMD outperforms state-of-the-art OOD approaches across multiple datasets and model architectures in terms of both detection accuracy and computational cost.

Submitted 26 March, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022
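
As a rough illustration of the Neural Mean Discrepancy idea summarized in the abstract above (comparing activation means of an input batch with those of the training data), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the list-of-lists activation layout, and the plain per-channel difference are assumptions made for illustration. The abstract states that the actual detector passes such features to a lightweight classifier, which is omitted here.

```python
def channel_means(activations):
    """Average each channel over a batch.

    `activations` is a list of examples, each a list of per-channel values
    (a hypothetical layout chosen for this sketch).
    """
    batch_size = len(activations)
    return [sum(channel) / batch_size for channel in zip(*activations)]


def neural_mean_discrepancy(batch_activations, training_means):
    """Per-channel difference between batch means and stored training means."""
    batch_means = channel_means(batch_activations)
    return [b - t for b, t in zip(batch_means, training_means)]


# Hypothetical numbers: two channels, two examples in the batch.
training_means = [1.0, 2.0]
batch = [[1.0, 2.0], [3.0, 4.0]]
nmd = neural_mean_discrepancy(batch, training_means)
```

In this toy case the batch means are 2.0 and 3.0, so each channel deviates from the training mean by 1.0; larger deviations would be the signal an OOD classifier looks for.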

arXiv:2103.13515 [pdf, other]

doi 10.1103/PhysRevB.103.155144

Extremely large magnetoresistance from electron-hole compensation in the nodal loop semimetal ZrP$_2$

Authors: J. Bannies, E. Razzoli, M. Michiardi, H. -H. Kung, I. S. Elfimov, M. Yao, A. Fedorov, J. Fink, C. Jozwiak, A. Bostwick, E. Rotenberg, A. Damascelli, C. Felser

Abstract: Several early transition metal dipnictides have been found to host topological semimetal states and exhibit large magnetoresistance. In this study, we use angle-resolved photoemission spectroscopy (ARPES) and magneto-transport to study the electronic properties of a new transition metal dipnictide ZrP$_2$. We find that ZrP$_2$ exhibits an extremely large and unsaturated magnetoresistance of up to 40,000 % at 2 K, which originates from an almost perfect electron-hole compensation. Our band structure calculations further show that ZrP$_2$ hosts a topological nodal loop in proximity to the Fermi level. Based on the ARPES measurements, we confirm the results of our calculations and determine the surface band structure. Our study establishes ZrP$_2$ as a new platform to investigate near-perfect electron-hole compensation and its interplay with topological band structures.

Submitted 24 March, 2021; originally announced March 2021.

Comments: Accepted for publication in Physical Review B

Journal ref: Phys. Rev. B 103, 155144 (2021)

arXiv:2007.06389 [pdf, other]

Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

Abstract: We present a novel technique, called Term Revealing (TR), for furthering quantization at run time for improved performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on power-of-two terms in binary expressions of values. In a dot-product computation, TR dynamically selects a fixed number of largest terms to use from the values of the two vectors in the dot product. By exploiting normal-like weight and data distributions typically present in DNNs, TR has a minimal impact on DNN model performance (i.e., accuracy or perplexity). We use TR to facilitate tightly synchronized processor arrays, such as systolic arrays, for efficient parallel processing. We show an FPGA implementation that can use a small number of control bits to switch between conventional quantization and TR-enabled quantization with a negligible delay. To enhance TR efficiency further, we use a signed digit representation (SDR), as opposed to classic binary encoding with only nonnegative power-of-two terms. To perform conversion from binary to SDR, we develop an efficient encoding method called HESE (Hybrid Encoding for Signed Expressions) that can be performed in one pass looking at only two bits at a time. We evaluate TR with HESE encoded values on an MLP for MNIST, multiple CNNs for ImageNet, and an LSTM for Wikitext-2, and show significant reductions in inference computations (between 3-10x) compared to conventional quantization for the same level of model performance.

Submitted 26 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: 13 pages, 19 figures, 4 tables. To appear in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. Update: Revised writing/figures and added more references for Section IV. Update: Revised Section IV writing/figures and added additional references on signed digit representations.
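
The term-selection step described in the Term Revealing abstract above (keeping a fixed number of the largest power-of-two terms) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the paper's method: it handles nonnegative integers in plain binary only (no signed digit representation or HESE), applies the budget to a single group of values, and breaks ties by position. The names `power_of_two_terms`, `reveal_terms`, and `budget` are hypothetical.

```python
def power_of_two_terms(value):
    """Exponents of the power-of-two terms in a nonnegative integer."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def reveal_terms(values, budget):
    """Keep only the `budget` largest power-of-two terms across a group.

    Each value is split into its power-of-two terms; the largest terms in
    the whole group are kept and the rest are dropped (set to zero).
    """
    terms = [
        (exponent, index)
        for index, value in enumerate(values)
        for exponent in power_of_two_terms(value)
    ]
    terms.sort(reverse=True)  # largest exponents first
    kept = [0] * len(values)
    for exponent, index in terms[:budget]:
        kept[index] += 1 << exponent
    return kept


# 5 = 4 + 1, 12 = 8 + 4, 3 = 2 + 1: six terms in total, keep the largest three.
truncated = reveal_terms([5, 12, 3], budget=3)
```

With a budget of three, the terms 8, 4 (from 12) and 4 (from 5) survive, so the group becomes 4, 12, 0; with a budget of six or more the values are unchanged.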

arXiv:1907.08377 [pdf, other]

DaiMoN: A Decentralized Artificial Intelligence Model Network

Authors: Surat Teerapittayanon, H. T. Kung

Abstract: We introduce DaiMoN, a decentralized artificial intelligence model network, which incentivizes peer collaboration in improving the accuracy of machine learning models for a given classification problem. It is an autonomous network where peers may submit models with improved accuracy and other peers may verify the accuracy improvement. The system maintains an append-only decentralized ledger to keep the log of critical information, including who has trained the model and improved its accuracy, when it has been improved, by how much it has improved, and where to find the newly updated model. DaiMoN rewards these contributing peers with cryptographic tokens. A main feature of DaiMoN is that it allows peers to verify the accuracy improvement of submitted models without knowing the test labels. This is an essential component in order to mitigate intentional model overfitting by model-improving peers. To enable this model accuracy evaluation with hidden test labels, DaiMoN uses a novel learnable Distance Embedding for Labels (DEL) function proposed in this paper. Specific to each test dataset, DEL scrambles the test label vector by embedding it in a low-dimension space while approximately preserving the distance between the dataset's test label vector and a label vector inferred by the classifier. It therefore allows proof-of-improvement (PoI) by peers without providing them access to true test labels. We provide analysis and empirical evidence that under DEL, peers can accurately assess model accuracy. We also argue that it is hard to invert the embedding function and thus, DEL is resilient against attacks aiming to recover test labels in order to cheat. Our prototype implementation of DaiMoN is available at https://github.com/steerapi/daimon.

Submitted 19 July, 2019; originally announced July 2019.

Comments: 2019 IEEE International Conference on Blockchain

arXiv:1906.07148 [pdf, other]

CheckNet: Secure Inference on Untrusted Devices

Authors: Marcus Comiter, Surat Teerapittayanon, H. T. Kung

Abstract: We introduce CheckNet, a method for secure inference with deep neural networks on untrusted devices. CheckNet is like a checksum for neural network inference: it verifies the integrity of the inference computation performed by untrusted devices to 1) ensure the inference has actually been performed, and 2) ensure the inference has not been manipulated by an attacker. CheckNet is completely transparent to the third party running the computation, applicable to all types of neural networks, does not require specialized hardware, adds little overhead, and has negligible impact on model performance. CheckNet can be configured to provide different levels of security depending on application needs and compute/communication budgets. We present both empirical and theoretical validation of CheckNet on multiple popular deep neural network models, showing excellent attack detection (0.88-0.99 AUC) and attack success bounds.

Submitted 17 June, 2019; originally announced June 2019.

arXiv:1905.00462 [pdf, other]

Full-stack Optimization for Accelerating CNNs with FPGA Validation

Authors: Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

Abstract: We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate array (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization and inference accuracy. As a validation vehicle, we have implemented a 170MHz FPGA inference chip achieving 2.28ms latency for the ImageNet benchmark. The achieved latency is among the lowest reported in the literature while achieving comparable accuracy. However, our chip shines in that it has 9x higher energy efficiency compared to other implementations achieving comparable latency. A highlight of our full-stack approach which contributes to the achieved high energy efficiency is an efficient Selector-Accumulator (SAC) architecture for implementing the multiplier-accumulator (MAC) operation present in any digital CNN hardware. For instance, compared to an FPGA implementation for a traditional 8-bit MAC, SAC substantially reduces required hardware resources (4.85x fewer Look-up Tables) and power consumption (2.48x).

Submitted 1 May, 2019; originally announced May 2019.

arXiv:1903.01999 [pdf, other]

doi 10.1073/pnas.1813514116

Observation of Chiral Surface Excitons in a Topological Insulator Bi$_2$Se$_3$

Authors: H. -H. Kung, A. P. Goyal, D. L. Maslov, X. Wang, A. Lee, A. F. Kemper, S. -W. Cheong, G. Blumberg

Abstract: The protected electron states at the boundaries or on the surfaces of topological insulators (TIs) have been the subject of intense theoretical and experimental investigations. Such states are enforced by very strong spin-orbit interaction in solids composed of heavy elements. Here, we study the composite particles -- chiral excitons -- formed by the Coulomb attraction between electrons and holes residing on the surface of an archetypical three-dimensional topological insulator (TI), Bi$_2$Se$_3$. Photoluminescence (PL) emission arising due to recombination of excitons in conventional semiconductors is usually unpolarized because of scattering by phonons and other degrees of freedom during exciton thermalization. On the contrary, we observe almost perfectly polarization-preserving PL emission from chiral excitons. We demonstrate that the chiral excitons can be optically oriented with circularly polarized light in a broad range of excitation energies, even when the latter deviate from the (apparent) optical band gap by hundreds of meVs, and that the orientation remains preserved even at room temperature. Based on the dependences of the PL spectra on the energy and polarization of incident photons, we propose that chiral excitons are made from massive holes and massless (Dirac) electrons, both with chiral spin textures enforced by strong spin-orbit coupling. A theoretical model based on such proposal describes quantitatively the experimental observations. The optical orientation of composite particles, the chiral excitons, emerges as a general result of strong spin-orbit coupling in a 2D electron system. Our findings can potentially expand applications of TIs in photonics and optoelectronics.

Submitted 5 March, 2019; originally announced March 2019.

Comments: 22 pages, 11 figures

Journal ref: Proceedings of the National Academy of Sciences Feb 2019, 116 (10) 4006-4011

arXiv:1902.09049 [pdf, other]

doi 10.1103/PhysRevMaterials.3.065003

Raman spectroscopy of $f$-electron metals: an example of CeB$_{6}$

Authors: Mai Ye, H. -H. Kung, Priscila F. S. Rosa, Eric D. Bauer, Zachary Fisk, G. Blumberg

Abstract: We performed an optical spectroscopy study of electronic and magnetic excitations for a rare-earth system with a single electron quasi-localized in the f-shell on an ion at high-symmetry crystallographic site in application to CeB$_{6}$ heavy-fermion metal. We carried out group-theoretical classification of the electronic crystal field (CF) transitions and assessed their coupling to light cross-sections for polarization resolved Raman scattering processes. We discuss applicability of symmetrized Raman susceptibility to studies of exotic charge and spin high multiplet ordering phases in f-electron systems. We study temperature effects on intra- and inter-multiplet CF transitions and also on the coupling between the CF excitations with the lattice vibrations. We acquired temperature dependence of the low-frequency polarization resolved Raman response and obtained the static Raman susceptibility for all Raman-allowed symmetry channels: A$_{1g}$, E$_{g}$, T$_{1g}$, and T$_{2g}$ of the cubic O$_{h}$ point group. We demonstrate that for CeB$_{6}$ system only T$_{1g}$-symmetry static Raman susceptibility shows an anomalous temperature dependence which is consistent with the magnetic susceptibility data measured by other techniques. This anomalous behavior in the T$_{1g}$-channel signifies the presence of long wavelength magnetic fluctuations, while the lack of susceptibility enhancement for all the remaining symmetry channels indicates that long wavelength charge quadrupole fluctuations at low-temperature are weak.

Submitted 23 June, 2019; v1 submitted 24 February, 2019; originally announced February 2019.

Journal ref: Phys. Rev. Materials 3, 065003 (2019)

arXiv:1812.05083 [pdf, other]

Adversarial Learning of Semantic Relevance in Text to Image Synthesis

Authors: Miriam Cha, Youngjune L. Gwon, H. T. Kung

Abstract: We describe a new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input. Our approach is based on the conditional version of GANs and expands on previous work leveraging an auxiliary task in the discriminator. Our generated images are not limited to certain classes and do not suffer from mode collapse while semantically matching the text input. A key to our training methods is how to form positive and negative training examples with respect to the class label of a given image. Instead of selecting random training examples, we perform negative sampling based on the semantic distance from a positive example in the class. We evaluate our approach using the Oxford-102 flower dataset, adopting the inception score and multi-scale structural similarity index (MS-SSIM) metrics to assess discriminability and diversity of the generated images. The empirical results indicate greater diversity in the generated images, especially when we gradually select more negative training examples closer to a positive example in the semantic space.

Submitted 5 February, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

arXiv:1811.04770 [pdf, other]

Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

Abstract: This paper describes a novel approach of packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter matrix. In combining columns, for each row, all filter weights but one with the largest magnitude are pruned. We retrain the remaining weights to preserve high accuracy. We demonstrate that in mitigating data privacy concerns the retraining can be accomplished with only fractions of the original dataset (e.g., 10% for CIFAR-10). We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplier-accumulators. We present analysis and empirical evidence on the superior performance of our column combining approach over prior art under metrics such as energy efficiency (3x) and inference latency (12x).

Submitted 7 November, 2018; originally announced November 2018.

Comments: To appear in ASPLOS 2019
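
The per-row pruning rule stated in the column-combining abstract above (within a group of combined columns, each row keeps only its largest-magnitude weight) can be illustrated with a short sketch. This is a hedged illustration, not the authors' code: how column groups are chosen, the retraining step, and the systolic array mapping are all omitted, and the names `combine_columns`, `matrix`, and `column_group` are hypothetical.

```python
def combine_columns(matrix, column_group):
    """Pack a group of columns into one column.

    For each row, only the weight with the largest magnitude among the
    grouped columns is kept. Each packed entry is (weight, source_column)
    so the original column of the surviving weight can still be identified.
    """
    packed = []
    for row in matrix:
        best_column = max(column_group, key=lambda col: abs(row[col]))
        packed.append((row[best_column], best_column))
    return packed


# A small sparse filter matrix (rows x columns) with made-up weights.
filter_matrix = [
    [0, -3, 1],
    [2, 0, 0],
    [0, 0, 0],
]
packed_column = combine_columns(filter_matrix, column_group=[0, 1, 2])
```

Here the first row keeps -3 from column 1 (pruning the 1 in column 2), the second row keeps 2 from column 0, and the all-zero third row stays zero. Groups whose columns rarely collide in the same row lose few weights, which is why sparse filter matrices pack well.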

arXiv:1809.09467 [pdf, other]

doi 10.1103/PhysRevMaterials.3.053402

Intrinsic Insulating Ground State in Transition Metal Dichalcogenide TiSe2

Authors: Daniel J. Campbell, Chris Eckberg, Peter Y. Zavalij, Hsiang-Hsi Kung, Elia Razzoli, Matteo Michiardi, Chris Jozwiak, Aaron Bostwick, Eli Rotenberg, Andrea Damascelli, Johnpierre Paglione

Abstract: The transition metal dichalcogenide TiSe$_2$ has received significant research attention over the past four decades. Different studies have presented ways to suppress the 200 K charge density wave transition, vary low temperature resistivity by several orders of magnitude, and stabilize magnetism or superconductivity. Here we give the results of a new synthesis technique whereby samples were grown in a high pressure environment with up to 180 bar of argon gas. Above 100 K, properties are nearly unchanged from previous reports, but a hysteretic resistance region that begins around 80 K, accompanied by insulating low temperature behavior, is distinct from anything previously observed. An accompanying decrease in carrier concentration is seen in Hall effect measurements, and photoemission data show a removal of an electron pocket from the Fermi surface in an insulating sample. We conclude that high inert gas pressure synthesis accesses an underlying nonmetallic ground state in a material long speculated to be an excitonic insulator.

Submitted 17 February, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

Comments: 11 pages, 7 figures

Journal ref: Phys. Rev. Materials 3, 053402 (2019)

arXiv:1806.07467 [pdf, other]

HairNet: Single-View Hair Reconstruction using Convolutional Neural Networks

Authors: Yi Zhou, Liwen Hu, Jun Xing, Weikai Chen, Han-Wei Kung, Xin Tong, Hao Li

Abstract: We introduce a deep learning-based method to generate full 3D hair geometry from an unconstrained image. Our method can recover local strand details and has real-time performance. State-of-the-art hair modeling techniques rely on large hairstyle collections for nearest neighbor retrieval and then perform ad-hoc refinement. Our deep learning approach, in contrast, is highly efficient in storage and can run 1000 times faster while generating hair with 30K strands. The convolutional neural network takes the 2D orientation field of a hair image as input and generates strand features that are evenly distributed on the parameterized 2D scalp. We introduce a collision loss to synthesize more plausible hairstyles, and the visibility of each strand is also used as a weight term to improve the reconstruction accuracy. The encoder-decoder architecture of our network naturally provides a compact and continuous representation for hairstyles, which allows us to interpolate naturally between hairstyles. We use a large set of rendered synthetic hair models to train our network. Our method scales to real images because an intermediate 2D orientation field, automatically calculated from the real image, factors out the difference between synthetic and real hairs. We demonstrate the effectiveness and robustness of our method on a wide range of challenging real Internet pictures and show reconstructed hair sequences from videos.

Submitted 10 July, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

Comments: 21 pages, 17 figures

arXiv:1802.03373 [pdf, other]

InferBeam: A Fast Beam Alignment Protocol for Millimeter-wave Networking

Authors: Sai Qian Zhang, H. T. Kung, Youngjune Gwon

Abstract: We introduce fast millimeter-wave base station (BS) and its antenna sector selection for user equipment based on its location. Using a conditional random field inference model with specially designed parameters, which are robust to change of environment, InferBeam allows the use of measurement samples on best beam selection at a small number of locations to infer the rest dynamically. Compared to beam-sweeping based approaches in the literature, InferBeam can drastically reduce the setup cost for beam alignment for a new environment, and also the latency in acquiring a new beam under intermittent blockage. We have evaluated InferBeam using a discrete event simulation. Our results indicate that the system can make best beam selection for 98% of locations in test environments comprising small-sized apartment or office spaces, while sampling fewer than 1% of locations. InferBeam is a complete protocol for best beam inference that can be integrated into millimeter-wave standards for accelerating the much-needed fast and economic beam alignment capability.

Submitted 5 March, 2018; v1 submitted 9 February, 2018; originally announced February 2018.

arXiv:1712.06066 [pdf, other]

On the origin of critical nematic fluctuations in pnictide superconductors

Authors: S. -F. Wu, W. -L. Zhang, L. Li, H. B. Cao, H. -H. Kung, A. S. Sefat, H. Ding, P. Richard, G. Blumberg

Abstract: We employ polarization-resolved Raman spectroscopy to study critical nematic fluctuations in Ba(Fe$_{1-x}$Au$_x$)$_2$As$_2$ superconductors above and across well separated tetragonal to orthorhombic phase transition at temperature $T_S(x)$ and the Néel transition at $T_N(x)$. The static Raman susceptibility in $XY$ symmetry channel increases upon cooling from room temperature following the Curie-Weiss law, with Weiss temperature $T_θ(x)$ several tens of degrees lower than $T_S(x)$. Data reveals a hidden nematic quantum critical point at $x_{c} = 0.031$ when the system becomes superconducting, indicating a direct connection between quantum critical nematic fluctuations and unconventional superconductivity. We attribute the origin of the nematicity to charge quadrupole fluctuations due to electron transfer between the nearly degenerate $d_{xz}/d_{yz}$ orbitals.

Submitted 17 December, 2017; originally announced December 2017.

arXiv:1712.01903 [pdf, other]

doi 10.1103/PhysRevB.102.014501

Anomalous magneto-elastic coupling in Au-doped BaFe2As2

Authors: S. -F. Wu, W. -L. Zhang, L. Li, H. -B. Cao, H. -H. Kung, A. S. Sefat, H. Ding, P. Richard, G. Blumberg

Abstract: We used polarization-resolved Raman scattering to study magneto-elastic coupling in Ba(Fe$_{1-x}$Au$_{x}$)$_2$As$_2$ crystals as a function of light Au-doping, materials for which temperatures of the structural transition ($T_S$) and of the magnetic ordering transition ($T_N$) split. We study the appearance of the $A_g$(As) phonon intensity in the $XY$ scattering geometry that is very weak just below $T_S$, but for which the intensity is significantly enhanced below $T_N$. In addition, the $A_g$(As) phonon shows an asymmetric line shape below $T_N$ and an anomalous linewidth broadening upon Au-doping in the magnetic phase. We demonstrate that the anomalous behavior of the $A_g$(As) phonon mode in the $XY$ scattering geometry can be consistently described by a Fano model involving the $A_g$(As) phonon mode interacting with the $B_{2g}$ symmetry-like magnetic continuum in which the magneto-elastic coupling constant is proportional to the magnetic order parameter.

Submitted 5 December, 2017; originally announced December 2017.

Journal ref: Phys. Rev. B 102, 014501 (2020)

arXiv:1710.07830 [pdf, other]

Incomplete Dot Products for Dynamic Computation Scaling in Neural Network Inference

Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

Abstract: We propose the use of incomplete dot products (IDP) to dynamically adjust the number of input channels used in each layer of a convolutional neural network during feedforward inference. IDP adds monotonically non-increasing coefficients, referred to as a "profile", to the channels during training. The profile orders the contribution of each channel in non-increasing order. At inference time, the number of channels used can be dynamically adjusted to trade off accuracy for lowered power consumption and reduced latency by selecting only a beginning subset of channels. This approach allows for a single network to dynamically scale over a computation range, as opposed to training and deploying multiple networks to support different levels of computation scaling. Additionally, we extend the notion to multiple profiles, each optimized for some specific range of computation scaling. We present experiments on the computation and accuracy trade-offs of IDP for popular image classification models and datasets. We demonstrate that, for MNIST and CIFAR-10, IDP reduces computation significantly, e.g., by 75%, without significantly compromising accuracy. We argue that IDP provides a convenient and effective means for devices to lower computation costs dynamically to reflect the current computation budget of the system. For example, VGG-16 with 50% IDP (using only the first 50% of channels) achieves 70% in accuracy on the CIFAR-10 dataset compared to the standard network which achieves only 35% accuracy when using the reduced channel set.

Submitted 21 October, 2017; originally announced October 2017.

arXiv:1709.02260 [pdf, other]

Embedded Binarized Neural Networks

Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

Abstract: We study embedded Binarized Neural Networks (eBNNs) with the aim of allowing current binarized neural networks (BNNs) in the literature to perform feedforward inference efficiently on small embedded devices. We focus on minimizing the required memory footprint, given that these devices often have memory as small as tens of kilobytes (KB). Beyond minimizing the memory required to store weights, as in a BNN, we show that it is essential to minimize the memory used for temporaries which hold intermediate results between layers in feedforward inference. To accomplish this, eBNN reorders the computation of inference while preserving the original BNN structure, and uses just a single floating-point temporary for the entire neural network. All intermediate results from a layer are stored as binary values, as opposed to floating-points used in current BNN implementations, leading to a 32x reduction in required temporary space. We provide empirical evidence that our proposed eBNN approach allows efficient inference (10s of ms) on devices with severely limited memory (10s of KB). For example, eBNN achieves 95% accuracy on the MNIST dataset running on an Intel Curie with only 15 KB of usable memory with an inference runtime of under 50 ms per sample. To ease the development of applications in embedded contexts, we make our source code available that allows users to train and discover eBNN models for a learning task at hand, which fit within the memory constraint of the target device.

Submitted 6 September, 2017; originally announced September 2017.

arXiv:1709.01921 [pdf, other]

Distributed Deep Neural Networks over the Cloud, the Edge and End Devices

Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

Abstract: We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies, consisting of the cloud, the edge (fog) and end devices. While being able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and end devices. When supported by a scalable distributed computing hierarchy, a DDNN can scale up in neural network size and scale out in geographical span. Due to its distributed nature, DDNNs enhance sensor fusion, system fault tolerance and data privacy for DNN applications. In implementing a DDNN, we map sections of a DNN onto a distributed computing hierarchy. By jointly training these sections, we minimize communication and resource usage for devices and maximize usefulness of extracted features which are utilized in the cloud. The resulting system has built-in support for automatic sensor fusion and fault tolerance. As a proof of concept, we show a DDNN can exploit geographical diversity of sensors to improve object recognition accuracy and reduce communication cost. In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN locally processes most sensor data on end devices while achieving high accuracy and is able to reduce the communication cost by a factor of over 20x.

Submitted 6 September, 2017; originally announced September 2017.

arXiv:1709.01888 [pdf, other]

Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

Authors: Miriam Cha, Youngjune Gwon, H. T. Kung

Abstract: We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression. Also, by representing features in terms of histograms, our approach can naturally address documents of varying lengths. An empirical evaluation using the Common Core Standards corpus reveals that the features formed on our clustering-based language model significantly improve the previously known results for the same corpus in readability prediction. We also evaluate the task of sentence matching based on semantic relatedness using the Wiki-SimpleWiki corpus and find that our features lead to superior matching performance.

Submitted 4 September, 2017; originally announced September 2017.

arXiv:1709.01686 [pdf, other]

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

Abstract: Deep neural networks are state of the art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real-time and energy-sensitive applications. To address this issue, we present BranchyNet, a novel deep network architecture that is augmented with additional side branch classifiers. The architecture allows prediction results for a large portion of test samples to exit the network early via these branches when samples can already be inferred with high confidence. BranchyNet exploits the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points. For more difficult samples, which are expected less frequently, BranchyNet will use further or all network layers to provide the best likelihood of correct prediction. We study the BranchyNet architecture using several well-known networks (LeNet, AlexNet, ResNet) and datasets (MNIST, CIFAR10) and show that it can both improve accuracy and significantly reduce the inference time of the network.

Submitted 6 September, 2017; originally announced September 2017.

arXiv:1708.09321 [pdf, other]

Adversarial nets with perceptual losses for text-to-image synthesis

Authors: Miriam Cha, Youngjune Gwon, H. T. Kung

Abstract: Recent approaches in generative adversarial networks (GANs) can automatically synthesize realistic images from descriptive text. Despite the overall fair quality, the generated images often expose visible flaws that lack structural definition for an object of interest. In this paper, we aim to extend state of the art for GAN-based text-to-image synthesis by improving perceptual quality of generated images. Differentiated from previous work, our synthetic image generator optimizes on perceptual loss functions that measure pixel, feature activation, and texture differences against a natural image. We present visually more compelling synthetic images of birds and flowers generated from text descriptions in comparison to some of the most prominent existing work.

Submitted 30 August, 2017; originally announced August 2017.

arXiv:1706.05776 [pdf, other]

doi 10.1103/PhysRevLett.119.136802

Chiral Spin Mode on the Surface of a Topological Insulator

Authors: H. -H. Kung, S. Maiti, X. Wang, S. -W. Cheong, D. L. Maslov, G. Blumberg

Abstract: Using polarization-resolved resonant Raman spectroscopy, we explore collective spin excitations of the chiral surface states in a three dimensional topological insulator, Bi$_2$Se$_3$. We observe a sharp peak at 150 meV in the pseudovector $A_2$ symmetry channel of the Raman spectra. By comparing the data with calculations, we identify this peak as the transverse collective spin mode of surface Dirac fermions. This mode, unlike a Dirac plasmon or a surface plasmon in the charge sector of excitations, is analogous to a spin wave in a partially polarized Fermi liquid, with spin-orbit coupling playing the role of an effective magnetic field.

Submitted 31 August, 2017; v1 submitted 19 June, 2017; originally announced June 2017.

Comments: 9 pages, 7 figures, accepted for publication in Phys. Rev. Lett

Journal ref: Phys. Rev. Lett. 119, 136802 (2017)

arXiv:1611.05926 [pdf, other]

doi 10.1103/PhysRevB.95.245406

Surface vibrational modes of the topological insulator Bi$_2$Se$_3$ observed by Raman spectroscopy

Authors: H. -H. Kung, M. Salehi, I. Boulares, A. F. Kemper, N. Koirala, M. Brahlek, P. Lošťák, C. Uher, R. Merlin, X. Wang, S. -W. Cheong, S. Oh, G. Blumberg

Abstract: We present polarization resolved Raman scattering study of surface vibration modes in the topological insulator Bi$_2$Se$_3$ single crystal and thick films. Besides the four Raman active bulk phonons, we observed four additional modes with much weaker intensity and slightly lower energy than the bulk counterparts. Using symmetry analysis, we assigned these additional modes to out-of-plane surface phonons. Comparing with first principle calculations, we conclude that the appearance of these modes is due to $c$-axis lattice distortion and van der Waals gap expansion near the crystal surface. Two of the surface modes at 60 and 173 cm$^{-1}$ are associated with Raman active $A_{1g}$ bulk phonon modes, the other two at 136 and 158 cm$^{-1}$ are associated with infrared active bulk phonons with $A_{2u}$ symmetry. The latter become Raman allowed due to reduction of crystalline symmetry from $D_{3d}$ in the bulk to $C_{3v}$ on the crystal surface. In particular, the 158 cm$^{-1}$ surface phonon mode shows a Fano lineshape under resonant excitation, suggesting interference in the presence of electron-phonon coupling of the surface excitations.

Submitted 9 June, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

Comments: 11 pages, 5 figures and 3 tables

Journal ref: Phys. Rev. B 95, 245406 (2017)

arXiv:1608.01748 [pdf, other]

doi 10.1103/PhysRevLett.117.227601

Analogy between the "Hidden Order" and the Orbital Antiferromagnetism in URu$_{2-x}$Fe$_x$Si$_2$

Authors: H. -H. Kung, S. Ran, N. Kanchanavatee, V. Krapivin, A. Lee, J. A. Mydosh, K. Haule, M. B. Maple, G. Blumberg

Abstract: We study URu$_{2-x}$Fe$_x$Si$_2$, in which two types of staggered phases compete at low temperature as the iron concentration $x$ is varied: the nonmagnetic "hidden order" (HO) phase below the critical concentration $x_c$, and unconventional antiferromagnetic (AF) phase above $x_c$. By using polarization resolved Raman spectroscopy, we detect a collective mode of pseudovector-like $A_{2g}$ symmetry whose energy continuously evolves with increasing $x$; it monotonically decreases in the HO phase until it vanishes at $x=x_c$, and then reappears with increasing energy in the AF phase. The mode's evolution provides direct evidence for unified order parameter for both nonmagnetic and magnetic phases arising from the orbital degrees-of-freedom of the uranium-5$f$ electrons.

Submitted 28 November, 2016; v1 submitted 4 August, 2016; originally announced August 2016.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. Lett. 117, 227601 (2016)

arXiv:1607.06575 [pdf, other]

Collective excitations of dynamic Fermi surface deformations in BaFe$_2$(As$_{0.5}$P$_{0.5}$)$_2$

Authors: S. -F. Wu, W. -L. Zhang, D. Hu, H. -H. Kung, A. Lee, H. -C. Mao, P. -C. Dai, H. Ding, P. Richard, G. Blumberg

Abstract: We use electronic Raman scattering to study the low-energy excitations in BaFe$_2$(As$_{0.5}$P$_{0.5}$)$_2$ ($T_c \approx 16$ K) samples. In addition to a superconducting pair breaking peak (2$Δ=6.7$ meV) in the A$_{1g}$ channel with a linear tail towards zero energy, suggesting a nodal gap structure, we detect spectral features associated to Pomeranchuk oscillations in the A$_{1g}$, B$_{1g}$ and B$_{2g}$ channels. We argue that the small Fermi energy of the system is an essential condition for these Pomeranchuk oscillations to be underdamped. The Pomeranchuk oscillations have the same frequencies in the B$_{1g}$ and B$_{2g}$ channels, which we explain by the mixing of these symmetries resulting from the removal of the $σ_v$ and $σ_v$ symmetry planes due to a large As/P disorder. Interestingly, we show that the temperature at which the peaks corresponding to the Pomeranchuk oscillations get underdamped is consistent with the non-Fermi liquid to Fermi liquid crossover determined by transport, suggesting that the Pomeranchuk instability plays an important role in the low-energy physics of the Fe-based superconductors.

Submitted 22 July, 2016; originally announced July 2016.

Comments: 5 pages, 4 figures

arXiv:1605.05212 [pdf, other]

Multimodal Sparse Coding for Event Detection

Authors: Youngjune Gwon, William Campbell, Kevin Brady, Douglas Sturim, Miriam Cha, H. T. Kung

Abstract: Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature learning methods such as GMM supervectors and sparse RBM. We report the cross-validated classification accuracy and mean average precision of the MED system trained on features learned from our unimodal and multimodal settings for a subset of the TRECVID MED 2014 dataset.

Submitted 17 May, 2016; originally announced May 2016.

Comments: Multimodal Machine Learning Workshop at NIPS 2015

arXiv:1601.02040 [pdf, other]

doi 10.1103/PhysRevLett.116.196401

Discovery of unconventional charge density wave at the surface of K0.9Mo6O17

Authors: Daixiang Mou, Aashish Sapkota, H. -H. Kung, Viktor Krapivin, Yun Wu, A. Kreyssig, Xingjiang Zhou, A. I. Goldman, G. Blumberg, Rebecca Flint, Adam Kaminski

Abstract: We use Angle Resolved Photoemission Spectroscopy (ARPES), Raman spectroscopy, Low Energy Electron Diffraction (LEED) and x-ray scattering to reveal an unusual electronically mediated charge density wave (CDW) in K0.9Mo6O17. Not only does K0.9Mo6O17 lack signatures of electron-phonon coupling, but it also hosts an extraordinary surface CDW, with $T^{S}_{CDW} = 220$ K nearly twice that of the bulk CDW, $T^{B}_{CDW} = 115$ K. While the bulk CDW has a BCS-like gap of 12 meV, the surface gap is ten times larger and well in the strong coupling regime. Strong coupling behavior combined with the absence of signatures of strong electron-phonon coupling indicates that the CDW is likely mediated by electronic interactions enhanced by low dimensionality.

Submitted 8 January, 2016; originally announced January 2016.

Comments: 9 pages, 6 figures

Journal ref: Phys. Rev. Lett. 116, 196401 (2016)

arXiv:1511.06238 [pdf, other]

Multimodal sparse representation learning and applications

Authors: Miriam Cha, Youngjune Gwon, H. T. Kung

Abstract: Unsupervised methods have proven effective for discriminative tasks in a single-modality scenario. In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between modalities. The framework can model relationships at a higher level by forcing the shared sparse representation. In particular, we propose the use of joint dictionary learning technique for sparse coding and formulate the joint representation for concision, cross-modal representations (in case of a missing modality), and union of the cross-modal representations. Given the accelerated growth of multimodal data posted on the Web such as YouTube, Wikipedia, and Twitter, learning good multimodal features is becoming increasingly important. We show that the shared representations enabled by our framework substantially improve the classification performance under both unimodal and multimodal settings. We further show how deep architectures built on the proposed framework are effective for the case of highly nonlinear correlations between modalities. The effectiveness of our approach is demonstrated experimentally in image denoising, multimedia event detection and retrieval on the TRECVID dataset (audio-video), category classification on the Wikipedia dataset (image-text), and sentiment classification on PhotoTweet (image-text).

Submitted 2 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

arXiv:1410.6398 [pdf, other]

doi 10.1126/science.1259729

Chirality density wave of the 'hidden order' phase in URu$_2$Si$_2$

Authors: H. -H. Kung, R. E. Baumbach, E. D. Bauer, V. K. Thorsmølle, W. -L. Zhang, K. Haule, J. A. Mydosh, G. Blumberg

Abstract: A second-order phase transition is associated with emergence of an "order parameter" and a spontaneous symmetry breaking. For the heavy fermion superconductor URu$_2$Si$_2$, the symmetry of the order parameter associated with its ordered phase below 17.5 K has remained ambiguous despite 30 years of research, and hence is called "hidden order" (HO). Here we use polarization resolved Raman spectroscopy to specify the symmetry of the low energy excitations above and below the HO transition. These excitations involve transitions between interacting heavy uranium 5f orbitals, responsible for the broken symmetry in the HO phase. From the symmetry analysis of the collective mode, we determine that the HO parameter breaks local vertical and diagonal reflection symmetries at the uranium sites, resulting in crystal field states with distinct chiral properties, which order to a commensurate chirality density wave ground state.

Submitted 23 October, 2014; originally announced October 2014.

Journal ref: Science, Vol. 347 no. 6228 pp. 1339-1342 (20 March 2015)

arXiv:1212.2894 [pdf, other]

Reducing Reconciliation Communication Cost with Compressed Sensing

Authors: H. T. Kung, Chia-Mu Yu

Abstract: We consider a reconciliation problem, where two hosts wish to synchronize their respective sets. Efficient solutions for minimizing the communication cost between the two hosts have been previously proposed in the literature. However, they rely on prior knowledge about the size of the set differences between the two sets to be reconciled. In this paper, we propose a method which can achieve comparable efficiency without assuming this prior knowledge. Our method uses compressive sensing techniques which can leverage the expected sparsity in set differences. We study the performance of the method via theoretical analysis and numerical simulations.

Submitted 4 December, 2012; originally announced December 2012.

Comments: 4 pages, 2 figures

Showing 1–50 of 57 results for author: Kung, H