-
Speed-up of Data Analysis with Kernel Trick in Encrypted Domain
Authors:
Joon Soo Yoo,
Baek Kyung Song,
Tae Min Ahn,
Ji Won Heo,
Ji Won Yoon
Abstract:
Homomorphic encryption (HE) is pivotal for secure computation on encrypted data and crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performance in ML/STAT algorithms within encrypted domains. This technique, which is independent of the underlying HE mechanisms and complements existing optimizations, notably reduces costly HE multiplications, offering near-constant time complexity relative to the data dimension. Aimed at accessibility, this method is tailored for data scientists and developers with limited cryptography background, facilitating advanced data analysis in secure environments.
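The abstract does not spell out the construction, so the following is only a plaintext sketch of the general kernel-trick idea, not the authors' method and not an HE implementation. It assumes a linear-kernel Gram matrix `K` is already available and shows that pairwise squared distances can then be recovered with a fixed number of operations per pair, regardless of the data dimension. The function names are hypothetical.

```python
def gram_matrix(points):
    """Linear-kernel Gram matrix: K[i][j] = <x_i, x_j>. Cost grows with dimension."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def squared_distances_from_gram(K):
    """||x_i - x_j||^2 = K[i][i] + K[j][j] - 2*K[i][j].

    Once K is known, each entry needs one scaling and two additions,
    independent of the original data dimension.
    """
    n = len(K)
    return [[K[i][i] + K[j][j] - 2 * K[i][j] for j in range(n)] for i in range(n)]


def squared_distances_direct(points):
    """Reference computation: one multiplication per coordinate per pair."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


points = [[1.0, 2.0, 3.0], [4.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
K = gram_matrix(points)
via_kernel = squared_distances_from_gram(K)
direct = squared_distances_direct(points)
```

In an encrypted setting the saving would come from the second function, whose per-pair work does not depend on the dimension; where and how the Gram matrix itself is obtained is an assumption left open here.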
Submitted 14 June, 2024;
originally announced June 2024.
-
Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task
Authors:
Unggi Lee,
Jiyeong Bae,
Dohee Kim,
Sookbun Lee,
Jaekwon Park,
Taekyung Ahn,
Gunho Lee,
Damji Stratton,
Hyeoncheol Kim
Abstract:
Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that integrates pre-trained language models (PLMs) with KT methods. By leveraging the power of language models to capture semantic representations, LKT effectively incorporates textual information and significantly outperforms previous KT models on large benchmark datasets. Moreover, we demonstrate that LKT can effectively address the cold-start problem in KT by leveraging the semantic knowledge captured by PLMs. The interpretability of LKT is enhanced compared to traditional KT models due to its use of text-rich data. We applied the local interpretable model-agnostic explanations (LIME) technique and analyzed attention scores to further interpret the model's performance. Our work highlights the potential of integrating PLMs with KT and paves the way for future research in the KT domain.
Submitted 9 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
GPU-Accelerated RSF Level Set Evolution for Large-Scale Microvascular Segmentation
Authors:
Meher Niger,
Helya Goharbavang,
Taeyong Ahn,
Emily K. Alley,
Joshua D. Wythe,
Guoning Chen,
David Mayerich
Abstract:
Microvascular networks are challenging to model because these structures are currently near the diffraction limit for most advanced three-dimensional imaging modalities, including confocal and light sheet microscopy. This makes semantic segmentation difficult, because individual components of these networks fluctuate within the confines of individual pixels. Level set methods are ideally suited to solve this problem by providing surface and topological constraints on the resulting model; however, these active contour techniques are extremely time-intensive and impractical for terabyte-scale images. We propose a reformulation and implementation of the region-scalable fitting (RSF) level set model that makes it amenable to three-dimensional evaluation using both single-instruction multiple-data (SIMD) and single-program multiple-data (SPMD) parallel processing. This enables evaluation of the level set equation on independent regions of the data set using graphics processing units (GPUs), making large-scale segmentation of high-resolution networks practical and inexpensive.
We tested this 3D parallel RSF approach on multiple microvascular data sets acquired using state-of-the-art imaging techniques, including micro-CT, light sheet fluorescence microscopy (LSFM), and milling microscopy. To assess the performance and accuracy of the RSF model, we used a Monte-Carlo-based validation technique to compare results to other segmentation methods. We also provide rigorous profiling to show the gains in processing speed from leveraging parallel hardware. This study showcases the practical application of the RSF model, emphasizing its utility in the challenging domain of segmenting large-scale high-topology network structures with a particular focus on building microvascular models.
Submitted 3 April, 2024;
originally announced April 2024.
-
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
Authors:
Taekyung Ahn,
Yeonjung Hong,
Younggon Im,
Do Hyung Kim,
Dayoung Kang,
Joo Won Jeong,
Jae Won Kim,
Min Jung Kim,
Ah-ra Cho,
Dae-Hyun Jang,
Hosung Nam
Abstract:
This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily map input speech to real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields.
Submitted 12 March, 2024;
originally announced March 2024.
-
mdTLS: How to make middlebox-aware TLS more efficient?
Authors:
Taehyun Ahn,
Jiwon Kwak,
Seungjoo Kim
Abstract:
Recently, many organizations have been installing middleboxes in their networks in large numbers to provide various services to their customers. Although middleboxes have the advantage of not being dependent on specific hardware and being able to provide a variety of services, they can become a new attack target for hackers. Therefore, many researchers have proposed security-enhanced TLS protocols, but their results have some limitations. In this paper, we propose a middlebox-delegated TLS (mdTLS) protocol that not only achieves the same security level but also requires relatively less computation compared to recent research results. mdTLS is based on a proxy signature scheme and requires about 39% less computation than middlebox-aware TLS (maTLS), the best existing protocol in terms of security and performance. To substantiate the enhanced security of mdTLS, we conducted a formal verification using the Tamarin prover. Our verification demonstrates that mdTLS not only satisfies the security properties set forth by maTLS but also complies with the essential security properties required of a proxy signature scheme.
Submitted 27 September, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control
Authors:
Taejoo Ahn,
Licong Lin,
Song Mei
Abstract:
In high dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the False Discovery Rate (FDR), while concurrently identifying a greater number of relevant variables. Model-X methods, such as Knockoffs and conditional randomization tests, achieve the primary goal of finite-sample FDR control, assuming a known distribution of covariates. However, whether these methods can also achieve the secondary goal of maximizing discoveries remains uncertain. In fact, designing procedures to discover more relevant variables with finite-sample FDR control is a largely open question, even within the arguably simplest linear models.
In this paper, we develop near-optimal multiple testing procedures for high dimensional Bayesian linear models with isotropic covariates. We introduce Model-X procedures that provably control the frequentist FDR from finite samples, even when the model is misspecified, and conjecturally achieve near-optimal power when the data follow the Bayesian linear model. Our proposed procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation, distilled Conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values (eBH). The optimality conjecture of PoEdCe is based on a heuristic calculation of its asymptotic true positive proportion (TPP) and false discovery proportion (FDP), which is supported by methods from statistical physics as well as extensive numerical simulations. Our result establishes the Bayesian linear model as a benchmark for comparing the power of various multiple testing procedures.
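As a small illustration of one ingredient named in the abstract, the sketch below implements the standard Benjamini-Hochberg procedure with e-values (eBH): sort the e-values in decreasing order, find the largest k with k * e_(k) >= m / alpha, and reject the hypotheses with the k largest e-values. It is not the PoEdCe procedure itself; the posterior-expectation and dCRT steps are omitted, and the function name and example e-values are made up for illustration.

```python
def ebh(e_values, alpha):
    """Return the sorted indices rejected by the e-BH procedure at level alpha."""
    m = len(e_values)
    # Hypothesis indices ordered from largest to smallest e-value.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # k-th largest e-value must satisfy k * e_(k) >= m / alpha.
        if k * e_values[idx] >= m / alpha:
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for five hypotheses.
example_e_values = [60.0, 1.0, 30.0, 0.5, 12.0]
rejected = ebh(example_e_values, alpha=0.1)
```

With five hypotheses and alpha = 0.1 the threshold m / alpha is 50, so the two largest e-values (indices 0 and 2) pass while the third largest does not.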
Submitted 21 July, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Farthest-point Voronoi diagrams in the presence of rectangular obstacles
Authors:
Mincheol Kim,
Chanyang Seo,
Taehoon Ahn,
Hee-Kap Ahn
Abstract:
We present an algorithm to compute the geodesic $L_1$ farthest-point Voronoi diagram of $m$ point sites in the presence of $n$ rectangular obstacles in the plane. It takes $O(nm+n \log n + m\log m)$ construction time using $O(nm)$ space. This is the first optimal algorithm for constructing the farthest-point Voronoi diagram in the presence of obstacles. We can construct a data structure in the same construction time and space that answers a farthest-neighbor query in $O(\log(n+m))$ time.
Submitted 7 March, 2022;
originally announced March 2022.
-
Implicit Simulation Methods for Stochastic Chemical Kinetics
Authors:
Tae-Hyuk Ahn,
Adrian Sandu,
Xiaoying Han
Abstract:
In biochemical systems, some of the chemical species are present in only small numbers of molecules. In this situation, discrete and stochastic simulation approaches are more relevant than continuous and deterministic ones. Gillespie's fundamental stochastic simulation algorithm (SSA) accounts for every reaction event, which occurs with a probability determined by the configuration of the system. This approach requires considerable computational effort for models with many reaction channels and chemical species. To improve efficiency, tau-leaping methods represent multiple firings of each reaction during a simulation step by Poisson random variables. For stiff systems, the mean of this variable is treated implicitly to ensure numerical stability.
This paper develops fully implicit tau-leaping-like algorithms that treat implicitly both the mean and the variance of the Poisson variables. The construction is based on adapting weakly convergent discretizations of stochastic differential equations to stochastic chemical kinetic systems. Theoretical analyses of accuracy and stability of the new methods are performed on a standard test problem. Numerical results demonstrate the performance of the proposed tau-leaping methods.
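For orientation, here is a minimal sketch of plain explicit tau-leaping, the baseline that the implicit variants modify; it is not the fully implicit algorithm developed in the paper. It assumes a single decay reaction S -> (nothing) with propensity c * x, and the names `poisson` and `tau_leap_decay` and all parameter values are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Poisson variate with Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def tau_leap_decay(x0, c, tau, steps, seed=0):
    """Explicit tau-leaping for S -> (nothing) with propensity a(x) = c * x."""
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for _ in range(steps):
        propensity = c * x
        # Number of firings in [t, t + tau) is modeled as Poisson(a(x) * tau).
        firings = poisson(propensity * tau, rng)
        # Clamp at zero: an explicit leap can overshoot the available molecules.
        x = max(x - firings, 0)
        trajectory.append(x)
    return trajectory


trajectory = tau_leap_decay(x0=100, c=0.5, tau=0.1, steps=50)
```

The explicit update evaluates the propensity at the current state only; for stiff systems this forces small steps, which is the limitation the implicit treatments described above are meant to address.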
Submitted 14 March, 2013;
originally announced March 2013.
-
Core-Periphery Segregation in Evolving Prisoner's Dilemma Networks
Authors:
Yunkyu Sohn,
Jung-Kyoo Choi,
T. K. Ahn
Abstract:
Dense cooperative networks are an essential element of social capital for a prosperous society. These networks enable individuals to overcome collective action dilemmas by enhancing trust. In many biological and social settings, network structures evolve endogenously as agents exit relationships and build new ones. However, the process by which evolutionary dynamics lead to self-organization of dense cooperative networks has not been explored. Our large group prisoner's dilemma experiments with exit and partner choice options show that core-periphery segregation of cooperators and defectors drives the emergence of cooperation. Cooperators' Quit-for-Tat and defectors' Roving strategy lead to a highly asymmetric core and periphery structure. Densely connected to each other, cooperators successfully isolate defectors and earn larger payoffs than defectors. Our analysis of the topological characteristics of evolving networks illuminates how social capital is generated.
Submitted 9 December, 2012; v1 submitted 3 May, 2011;
originally announced May 2011.