-
Modeling Ambient Scene Dynamics for Free-view Synthesis
Authors:
Meng-Li Shih,
Jia-Bin Huang,
Changil Kim,
Rajvi Shah,
Johannes Kopf,
Chen Gao
Abstract:
We introduce a novel method for dynamic free-view synthesis of an ambient scenes from a monocular capture bringing a immersive quality to the viewing experience. Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes. Previous attempts to extend 3DGS to represent dynamics have been confined to bounded scenes or require m…
▽ More
We introduce a novel method for dynamic free-view synthesis of an ambient scenes from a monocular capture bringing a immersive quality to the viewing experience. Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes. Previous attempts to extend 3DGS to represent dynamics have been confined to bounded scenes or require multi-camera captures, and often fail to generalize to unseen motions, limiting their practical application. Our approach overcomes these constraints by leveraging the periodicity of ambient motions to learn the motion trajectory model, coupled with careful regularization. We also propose important practical strategies to improve the visual quality of the baseline 3DGS static reconstructions and to improve memory efficiency critical for GPU-memory intensive learning. We demonstrate high-quality photorealistic novel view synthesis of several ambient natural scenes with intricate textures and fine structural elements.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Authors:
Meng-Li Shih,
Wei-Chiu Ma,
Lorenzo Boyice,
Aleksander Holynski,
Forrester Cole,
Brian L. Curless,
Janne Kontkanen
Abstract:
We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF). Our main idea is to leverage NeRFs to model scene-specific, fine-grained details, while capitalizing on diffusion models to extrapolate beyond our observed data. A key ingredient is to track visibility to determine what portions of the scene have not been observed, and focus on reco…
▽ More
We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF). Our main idea is to leverage NeRFs to model scene-specific, fine-grained details, while capitalizing on diffusion models to extrapolate beyond our observed data. A key ingredient is to track visibility to determine what portions of the scene have not been observed, and focus on reconstructing those regions consistently with diffusion models. Our primary contributions include a visibility-aware diffusion-based inpainting module that is fine-tuned on the input imagery, yielding an initial NeRF with moderate quality (often blurry) inpainted regions, followed by a second diffusion model trained on the input imagery to consistently enhance, notably sharpen, the inpainted imagery from the first pass. We demonstrate high-quality results, extrapolating beyond a small number of (typically six or fewer) input views, effectively outpainting the NeRF as well as inpainting newly disoccluded regions inside the original viewing volume. We compare with related work both quantitatively and qualitatively and show significant gains over prior art.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
GSQA: An End-to-End Model for Generative Spoken Question Answering
Authors:
Min-Han Shih,
Ho-Lam Chung,
Yu-Chi Pai,
Ming-Hao Hsu,
Guan-Ting Lin,
Shang-Wen Li,
Hung-yi Lee
Abstract:
In recent advancements in spoken question answering (QA), end-to-end models have made significant strides. However, previous research has primarily focused on extractive span selection. While this extractive-based approach is effective when answers are present directly within the input, it falls short in addressing abstractive questions, where answers are not directly extracted but inferred from t…
▽ More
In recent advancements in spoken question answering (QA), end-to-end models have made significant strides. However, previous research has primarily focused on extractive span selection. While this extractive-based approach is effective when answers are present directly within the input, it falls short in addressing abstractive questions, where answers are not directly extracted but inferred from the given information. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning. The challenge in training our GSQA model lies in the absence of a spoken abstractive QA dataset. We propose using text models for initialization and leveraging the extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results indicate that our model surpasses the previous extractive model by 3% on extractive QA datasets. Furthermore, the GSQA model has only been fine-tuned on the spoken extractive QA dataset. Despite not having seen any spoken abstractive QA data, it can still closely match the performance of the cascade model. In conclusion, our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA. Our code is available at https://voidful.github.io/GSQA
△ Less
Submitted 2 July, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Falsification of Internal and External Validity in Observational Studies via Conditional Moment Restrictions
Authors:
Zeshan Hussain,
Ming-Chieh Shih,
Michael Oberst,
Ilker Demirel,
David Sontag
Abstract:
Randomized Controlled Trials (RCT)s are relied upon to assess new treatments, but suffer from limited power to guide personalized treatment decisions. On the other hand, observational (i.e., non-experimental) studies have large and diverse populations, but are prone to various biases (e.g. residual confounding). To safely leverage the strengths of observational studies, we focus on the problem of…
▽ More
Randomized Controlled Trials (RCT)s are relied upon to assess new treatments, but suffer from limited power to guide personalized treatment decisions. On the other hand, observational (i.e., non-experimental) studies have large and diverse populations, but are prone to various biases (e.g. residual confounding). To safely leverage the strengths of observational studies, we focus on the problem of falsification, whereby RCTs are used to validate causal effect estimates learned from observational data. In particular, we show that, given data from both an RCT and an observational study, assumptions on internal and external validity have an observable, testable implication in the form of a set of Conditional Moment Restrictions (CMRs). Further, we show that expressing these CMRs with respect to the causal effect, or "causal contrast", as opposed to individual counterfactual means, provides a more reliable falsification test. In addition to giving guarantees on the asymptotic properties of our test, we demonstrate superior power and type I error of our approach on semi-synthetic and real world datasets. Our approach is interpretable, allowing a practitioner to visualize which subgroups in the population lead to falsification of an observational study.
△ Less
Submitted 6 March, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Systematic Analysis for Pretrained Language Model Priming for Parameter-Efficient Fine-tuning
Authors:
Shih-Cheng Huang,
Shih-Heng Wang,
Min-Han Shih,
Saurav Sahay,
Hung-yi Lee
Abstract:
Parameter-efficient (PE) methods (like Prompts or Adapters) for adapting pre-trained language models (PLM) to downstream tasks have been popular recently. However, hindrances still prevent these methods from reaching their full potential. For example, two significant challenges are few-shot adaptation and cross-task generalization. To tackle these issues, we propose a general PE priming framework…
▽ More
Parameter-efficient (PE) methods (like Prompts or Adapters) for adapting pre-trained language models (PLM) to downstream tasks have been popular recently. However, hindrances still prevent these methods from reaching their full potential. For example, two significant challenges are few-shot adaptation and cross-task generalization. To tackle these issues, we propose a general PE priming framework to enhance and explore the few-shot adaptation and generalization ability of PE methods. In this framework, PLMs are primed with PE methods for rapidly adapting to various target tasks. To evaluate the generalization ability of these PE methods, we conduct experiments on a few-shot cross-domain benchmark containing 160 diverse NLP tasks. Our experiment not only reveals the best priming strategy but also verifies that priming facilitates the adaptation to target tasks.
△ Less
Submitted 30 May, 2024; v1 submitted 2 December, 2022;
originally announced December 2022.
-
Falsification before Extrapolation in Causal Effect Estimation
Authors:
Zeshan Hussain,
Michael Oberst,
Ming-Chieh Shih,
David Sontag
Abstract:
Randomized Controlled Trials (RCTs) represent a gold standard when developing policy guidelines. However, RCTs are often narrow, and lack data on broader populations of interest. Causal effects in these populations are often estimated using observational datasets, which may suffer from unobserved confounding and selection bias. Given a set of observational estimates (e.g. from multiple studies), w…
▽ More
Randomized Controlled Trials (RCTs) represent a gold standard when developing policy guidelines. However, RCTs are often narrow, and lack data on broader populations of interest. Causal effects in these populations are often estimated using observational datasets, which may suffer from unobserved confounding and selection bias. Given a set of observational estimates (e.g. from multiple studies), we propose a meta-algorithm that attempts to reject observational estimates that are biased. We do so using validation effects, causal effects that can be inferred from both RCT and observational data. After rejecting estimators that do not pass this test, we generate conservative confidence intervals on the extrapolated causal effects for subgroups not observed in the RCT. Under the assumption that at least one observational estimator is asymptotically normal and consistent for both the validation and extrapolated effects, we provide guarantees on the coverage probability of the intervals output by our algorithm. To facilitate hypothesis testing in settings where causal effect transportation across datasets is necessary, we give conditions under which a doubly-robust estimator of group average treatment effects is asymptotically normal, even when flexible machine learning methods are used for estimation of nuisance parameters. We illustrate the properties of our approach on semi-synthetic and real world datasets, and show that it compares favorably to standard meta-analysis techniques.
△ Less
Submitted 6 March, 2023; v1 submitted 27 September, 2022;
originally announced September 2022.
-
Automated Fidelity Assessment for Strategy Training in Inpatient Rehabilitation using Natural Language Processing
Authors:
Hunter Osterhoudt,
Courtney E. Schneider,
Haneef A Mohammad,
Minmei Shih,
Alexandra E. Harper,
Leming Zhou,
Elizabeth R Skidmore,
Yanshan Wang
Abstract:
Strategy training is a multidisciplinary rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke. Strategy training has been shown in randomized, controlled clinical trials to be a more feasible and efficacious intervention for promoting independence than traditional rehabilitation approaches. A standardized fidelity assessment is…
▽ More
Strategy training is a multidisciplinary rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke. Strategy training has been shown in randomized, controlled clinical trials to be a more feasible and efficacious intervention for promoting independence than traditional rehabilitation approaches. A standardized fidelity assessment is used to measure adherence to treatment principles by examining guided and directed verbal cues in video recordings of rehabilitation sessions. Although the fidelity assessment for detecting guided and directed verbal cues is valid and feasible for single-site studies, it can become labor intensive, time consuming, and expensive in large, multi-site pragmatic trials. To address this challenge to widespread strategy training implementation, we leveraged natural language processing (NLP) techniques to automate the strategy training fidelity assessment, i.e., to automatically identify guided and directed verbal cues from video recordings of rehabilitation sessions. We developed a rule-based NLP algorithm, a long-short term memory (LSTM) model, and a bidirectional encoder representation from transformers (BERT) model for this task. The best performance was achieved by the BERT model with a 0.8075 F1-score. This BERT model was verified on an external validation dataset collected from a separate major regional health system and achieved an F1 score of 0.8259, which shows that the BERT model generalizes well. The findings from this study hold widespread promise in psychology and rehabilitation intervention research and practice.
△ Less
Submitted 24 January, 2023; v1 submitted 14 September, 2022;
originally announced September 2022.
-
3D Photography using Context-aware Layered Depth Inpainting
Authors:
Meng-Li Shih,
Shih-Yang Su,
Johannes Kopf,
Jia-Bin Huang
Abstract:
We propose a method for converting a single RGB-D input image into a 3D photo - a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. We use a Layered Depth Image with explicit pixel connectivity as underlying representation, and present a learning-based inpainting model that synthesizes new local color…
▽ More
We propose a method for converting a single RGB-D input image into a 3D photo - a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. We use a Layered Depth Image with explicit pixel connectivity as underlying representation, and present a learning-based inpainting model that synthesizes new local color-and-depth content into the occluded region in a spatial context-aware manner. The resulting 3D photos can be efficiently rendered with motion parallax using standard graphics engines. We validate the effectiveness of our method on a wide range of challenging everyday scenes and show fewer artifacts compared with the state of the arts.
△ Less
Submitted 10 June, 2020; v1 submitted 9 April, 2020;
originally announced April 2020.
-
P2FAAS: Toward Privacy-Preserving Fuzzing as a Service
Authors:
Fan Sang,
Daehee Jang,
Ming-Wei Shih,
Taesoo Kim
Abstract:
Global corporations (e.g., Google and Microsoft) have recently introduced a new model of cloud services, fuzzing-as-a-service (FaaS). Despite effectively alleviating the cost of fuzzing, the model comes with privacy concerns. For example, the end user has to trust both cloud and service providers who have access to the application to be fuzzed. Such concerns are due to the platform is under the co…
▽ More
Global corporations (e.g., Google and Microsoft) have recently introduced a new model of cloud services, fuzzing-as-a-service (FaaS). Despite effectively alleviating the cost of fuzzing, the model comes with privacy concerns. For example, the end user has to trust both cloud and service providers who have access to the application to be fuzzed. Such concerns are due to the platform is under the control of its provider and the application and the fuzzer are highly coupled. In this paper, we propose P2FaaS, a new ecosystem that preserves end user's privacy while providing FaaS in the cloud. The key idea of P2FaaS is to utilize Intel SGX for preventing cloud and service providers from learning information about the application. Our preliminary evaluation shows that P2FaaS imposes 45% runtime overhead to the fuzzing compared to the baseline. In addition, P2FaaS demonstrates that, with recently introduced hardware, Intel SGX Card, the fuzzing service can be scaled up to multiple servers without native SGX support.
△ Less
Submitted 24 September, 2019;
originally announced September 2019.
-
Random Sampling for Group-By Queries
Authors:
Trong Duc Nguyen,
Ming-Hung Shih,
Sai Sree Parvathaneni,
Bojian Xu,
Divesh Srivastava,
Srikanta Tirthapura
Abstract:
Random sampling has been widely used in approximate query processing on large databases, due to its potential to significantly reduce resource usage and response times, at the cost of a small approximation error. We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after…
▽ More
Random sampling has been widely used in approximate query processing on large databases, due to its potential to significantly reduce resource usage and response times, at the cost of a small approximation error. We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after filtering through a predicate. The challenge with group-by queries is that a sampling method cannot focus on optimizing the quality of a single answer (e.g. the mean of selected data), but must simultaneously optimize the quality of a set of answers (one per group).
We present CVOPT, a query- and data-driven sampling framework for a set of group-by queries. To evaluate the quality of a sample, CVOPT defines a metric based on the norm (e.g. $\ell_2$ or $\ell_\infty$) of the coefficients of variation (CVs) of different answers, and constructs a stratified sample that provably optimizes the metric. CVOPT can handle group-by queries on data where groups have vastly different statistical characteristics, such as frequencies, means, or variances. CVOPT jointly optimizes for multiple aggregations and multiple group-by clauses, and provides a way to prioritize specific groups or aggregates. It can be tuned to cases when partial information about a query workload is known, such as a data warehouse where queries are scheduled to run periodically.
Our experimental results show that CVOPT outperforms current state-of-the-art on sample quality and estimation accuracy for group-by queries. On a set of queries on two real-world data sets, CVOPT yields relative errors that are 5x smaller than competing approaches, under the same space budget.
△ Less
Submitted 12 September, 2019; v1 submitted 5 September, 2019;
originally announced September 2019.
-
Self-Supervised Learning of Depth and Camera Motion from 360° Videos
Authors:
Fu-En Wang,
Hou-Ning Hu,
Hsien-Tzu Cheng,
Juan-Ting Lin,
Shang-Ta Yang,
Meng-Li Shih,
Hung-Kuo Chu,
Min Sun
Abstract:
As 360° cameras become prevalent in many autonomous systems (e.g., self-driving cars and drones), efficient 360° perception becomes more and more important. We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video. In particular, starting from the SfMLearner, which is designed for cameras with normal field-of-view, we introdu…
▽ More
As 360° cameras become prevalent in many autonomous systems (e.g., self-driving cars and drones), efficient 360° perception becomes more and more important. We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360° video. In particular, starting from the SfMLearner, which is designed for cameras with normal field-of-view, we introduce three key features to process 360° images efficiently. Firstly, we convert each image from equirectangular projection to cubic projection in order to avoid image distortion. In each network layer, we use Cube Padding (CP), which pads intermediate features from adjacent faces, to avoid image boundaries. Secondly, we propose a novel "spherical" photometric consistency constraint on the whole viewing sphere. In this way, no pixel will be projected outside the image boundary which typically happens in images with normal field-of-view. Finally, rather than naively estimating six independent camera motions (i.e., naively applying SfM-Learner to each face on a cube), we propose a novel camera pose consistency loss to ensure the estimated camera motions reaching consensus. To train and evaluate our approach, we collect a new PanoSUNCG dataset containing a large amount of 360° videos with groundtruth depth and camera motion. Our approach achieves state-of-the-art depth prediction and camera motion estimation on PanoSUNCG with faster inference speed comparing to equirectangular. In real-world indoor videos, our approach can also achieve qualitatively reasonable depth prediction by acquiring model pre-trained on PanoSUNCG.
△ Less
Submitted 13 November, 2018;
originally announced November 2018.
-
SPX: Preserving End-to-End Security for Edge Computing
Authors:
Ketan Bhardwaj,
Ming-Wei Shih,
Ada Gavrilovska,
Taesoo Kim,
Chengyu Song
Abstract:
Beyond point solutions, the vision of edge computing is to enable web services to deploy their edge functions in a multi-tenant infrastructure present at the edge of mobile networks. However, edge functions can be rendered useless because of one critical issue: Web services are delivered over end-to-end encrypted connections, so edge functions cannot operate on encrypted traffic without compromisi…
▽ More
Beyond point solutions, the vision of edge computing is to enable web services to deploy their edge functions in a multi-tenant infrastructure present at the edge of mobile networks. However, edge functions can be rendered useless because of one critical issue: Web services are delivered over end-to-end encrypted connections, so edge functions cannot operate on encrypted traffic without compromising security or degrading performance. Any solution to this problem must interoperate with existing protocols like TLS, as well as with new emerging security protocols for client and IoT devices. The edge functions must remain invisible to client-side endpoints but may require explicit control from their service-side web services. Finally, a solution must operate within overhead margins which do not obviate the benefits of the edge.
To address this problem, this paper presents SPX - a solution for edge-ready and end-to-end secure protocol extensions, which can efficiently maintain end-to-edge-to-end ($E^3$) security semantics. Using our SPX prototype, we allow edge functions to operate on encrypted traffic, while ensuring that security semantics of secure protocols still hold. SPX uses Intel SGX to bind the communication channel with remote attestation and to provide a solution that not only defends against potential attacks but also results in low performance overheads, and neither mandates any changes on the end-user side nor breaks interoperability with existing protocols.
△ Less
Submitted 24 September, 2018;
originally announced September 2018.
-
Variance-Optimal Offline and Streaming Stratified Random Sampling
Authors:
Trong Duc Nguyen,
Ming-Hung Shih,
Divesh Srivastava,
Srikanta Tirthapura,
Bojian Xu
Abstract:
Stratified random sampling (SRS) is a fundamental sampling technique that provides accurate estimates for aggregate queries using a small size sample, and has been used widely for approximate query processing. A key question in SRS is how to partition a target sample size among different strata. While Neyman allocation provides a solution that minimizes the variance of an estimate using this sampl…
▽ More
Stratified random sampling (SRS) is a fundamental sampling technique that provides accurate estimates for aggregate queries using a small size sample, and has been used widely for approximate query processing. A key question in SRS is how to partition a target sample size among different strata. While Neyman allocation provides a solution that minimizes the variance of an estimate using this sample, it works under the assumption that each stratum is abundant, i.e., has a large number of data points to choose from. This assumption may not hold in general: one or more strata may be bounded, and may not contain a large number of data points, even though the total data size may be large.
We first present VOILA, an offline method for allocating sample sizes to strata in a variance-optimal manner, even for the case when one or more strata may be bounded. We next consider SRS on streaming data that are continuously arriving. We show a lower bound, that any streaming algorithm for SRS must have (in the worst case) a variance that is Ω(r) factor away from the optimal, where r is the number of strata. We present S-VOILA, a practical streaming algorithm for SRS that is locally variance-optimal in its allocation of sample sizes to different strata. Our result from experiments on real and synthetic data show that VOILA can have significantly (1.4 to 50.0 times) smaller variance than Neyman allocation. The streaming algorithm S-VOILA results in a variance that is typically close to VOILA, which was given the entire input beforehand.
△ Less
Submitted 20 February, 2018; v1 submitted 27 January, 2018;
originally announced January 2018.
-
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
Authors:
Yen-Chen Lin,
Zhang-Wei Hong,
Yuan-Hong Liao,
Meng-Li Shih,
Ming-Yu Liu,
Min Sun
Abstract:
We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps pre…
▽ More
We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the two tactics to the agents trained by the state-of-the-art deep reinforcement learning algorithm including DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Videos are available at http://yenchenlin.me/adversarial_attack_RL/
△ Less
Submitted 12 November, 2019; v1 submitted 7 March, 2017;
originally announced March 2017.
-
Navigable videos for presenting scientific data on head-mounted displays
Authors:
Jacqueline Chu,
Leonardo Ferrer,
Min Shih,
Kwan-Liu Ma
Abstract:
Immersive, stereoscopic viewing enables scientists to better analyze the spatial structures of visualized physical phenomena. However, their findings cannot be properly presented in traditional media, which lack these core attributes. Creating a presentation tool that captures this environment poses unique challenges, namely related to poor viewing accessibility. Immersive scientific renderings of…
▽ More
Immersive, stereoscopic viewing enables scientists to better analyze the spatial structures of visualized physical phenomena. However, their findings cannot be properly presented in traditional media, which lack these core attributes. Creating a presentation tool that captures this environment poses unique challenges, namely related to poor viewing accessibility. Immersive scientific renderings often require high-end equipment, which can be impractical to obtain. We address these challenges with our authoring tool and navigational interface, which is designed for affordable head-mounted displays. With the authoring tool, scientists can show salient data features as connected 360° video paths, resulting in a "choose-your-own-adventure" experience. Our navigational interface features bidirectional video playback for added viewing control when users traverse the tailor-made content. We evaluate our system's benefits by authoring case studies on several data sets and conducting a usability study on the navigational interface's design. In summary, our approach provides scientists an immersive medium to visually present their research to the intended audience--spanning from students to colleagues--on affordable virtual reality headsets.
△ Less
Submitted 27 November, 2016;
originally announced November 2016.
-
Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing
Authors:
Sangho Lee,
Ming-Wei Shih,
Prasun Gera,
Taesoo Kim,
Hyesoon Kim,
Marcus Peinado
Abstract:
In this paper, we explore a new, yet critical, side-channel attack against Intel Software Guard Extension (SGX), called a branch shadowing attack, which can reveal fine-grained control flows (i.e., each branch) of an enclave program running on real SGX hardware. The root cause of this attack is that Intel SGX does not clear the branch history when switching from enclave mode to non-enclave mode, l…
▽ More
In this paper, we explore a new, yet critical, side-channel attack against Intel Software Guard Extension (SGX), called a branch shadowing attack, which can reveal fine-grained control flows (i.e., each branch) of an enclave program running on real SGX hardware. The root cause of this attack is that Intel SGX does not clear the branch history when switching from enclave mode to non-enclave mode, leaving the fine-grained traces to the outside world through a branch-prediction side channel. However, exploiting the channel is not so straightforward in practice because 1) measuring branch prediction/misprediction penalties based on timing is too inaccurate to distinguish fine-grained control-flow changes and 2) it requires sophisticated control over the enclave execution to force its execution to the interesting code blocks. To overcome these challenges, we developed two novel exploitation techniques: 1) Intel PT- and LBR-based history-inferring techniques and 2) APIC-based technique to control the execution of enclave programs in a fine-grained manner. As a result, we could demonstrate our attack by breaking recent security constructs, including ORAM schemes, Sanctum, SGX-Shield, and T-SGX. Not limiting our work to the attack itself, we thoroughly studied the feasibility of hardware-based solutions (e.g., branch history clearing) and also proposed a software-based countermeasure, called Zigzagger, to mitigate the branch shadowing attack in practice.
△ Less
Submitted 1 June, 2017; v1 submitted 21 November, 2016;
originally announced November 2016.