
Showing 1–10 of 10 results for author: Kamath, G M

Searching in archive cs.
  1. arXiv:2107.04202  [pdf, other]

    cs.IT

    Sketching and Sequence Alignment: A Rate-Distortion Perspective

    Authors: Ilan Shomorony, Govinda M. Kamath

    Abstract: Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. A standard approach to speed up this task is to compute "sketches" of the DNA reads (typically via hashing-based techniques) that allow the efficient computation of pairwise alignment scores. We propose a rate-distortion framework to study the problem of computing…

    Submitted 9 July, 2021; originally announced July 2021.

  2. arXiv:2104.09732  [pdf, other]

    stat.ML cs.LG

    Knowledge Distillation as Semiparametric Inference

    Authors: Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

    Abstract: A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to higher accuracy than training the student directly on labeled data. To explain and enhance this phenomenon, we cast knowledge distillation as a semiparametric in…

    Submitted 19 April, 2021; originally announced April 2021.

  3. arXiv:2011.04832  [pdf, other]

    cs.LG cs.IT q-bio.GN stat.ML

    Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment

    Authors: Govinda M. Kamath, Tavor Z. Baharav, Ilan Shomorony

    Abstract: Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. State-of-the-art approaches to speed up this task use hashing to identify short segments (k-mers) that are shared by pairs of reads, which can then be used to estimate alignment scores. However, when the number of reads is large, accurately estimating alignment sc…

    Submitted 12 February, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020

  4. arXiv:1805.08321  [pdf, other]

    cs.LG cs.DS cs.IT stat.CO stat.ML

    Bandit-Based Monte Carlo Optimization for Nearest Neighbors

    Authors: Vivek Bagaria, Tavor Z. Baharav, Govinda M. Kamath, David N. Tse

    Abstract: The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits. We apply th…

    Submitted 28 April, 2021; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: Accepted to the IEEE Journal on Selected Areas in Information Theory (JSAIT) - Special Issue on Sequential, Active, and Reinforcement Learning

  5. arXiv:1711.00817  [pdf, other]

    stat.ML cs.DS cs.IT cs.LG

    Medoids in almost linear time via multi-armed bandits

    Authors: Vivek Bagaria, Govinda M. Kamath, Vasilis Ntranos, Martin J. Zhang, David Tse

    Abstract: Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-pr…

    Submitted 7 November, 2017; v1 submitted 2 November, 2017; originally announced November 2017.

  6. arXiv:1605.01941  [pdf, other]

    cs.IT q-bio.GN

    Partial DNA Assembly: A Rate-Distortion Perspective

    Authors: Ilan Shomorony, Govinda M. Kamath, Fei Xia, Thomas A. Courtade, David N. Tse

    Abstract: Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, it is very often the case that the read data is not sufficiently rich to permit unambiguous reconstruction of the original sequence. While a natural generalization o…

    Submitted 6 May, 2016; originally announced May 2016.

    Comments: To be published at ISIT-2016. 11 pages, 10 figures

  7. arXiv:1502.01975  [pdf, other]

    cs.IT cs.CE q-bio.GN stat.AP

    Optimal Haplotype Assembly from High-Throughput Mate-Pair Reads

    Authors: Govinda M. Kamath, Eren Şaşoğlu, David Tse

    Abstract: Humans have 23 pairs of homologous chromosomes. The homologous pairs are almost identical pairs of chromosomes. For the most part, differences in homologous chromosomes occur at certain documented positions called single nucleotide polymorphisms (SNPs). A haplotype of an individual is the pair of sequences of SNPs on the two homologous chromosomes. In this paper, we study the problem of inferring…

    Submitted 6 February, 2015; originally announced February 2015.

    Comments: 10 pages, 4 figures, Submitted to ISIT 2015

  8. arXiv:1302.0744  [pdf, other]

    cs.IT

    Explicit MBR All-Symbol Locality Codes

    Authors: Govinda M. Kamath, Natalia Silberstein, N. Prakash, Ankit S. Rawat, V. Lalitha, O. Ozan Koyluoglu, P. Vijay Kumar, Sriram Vishwanath

    Abstract: Node failures are inevitable in distributed storage systems (DSS). To enable efficient repair when faced with such failures, two main techniques are known: Regenerating codes, i.e., codes that minimize the total repair bandwidth; and codes with locality, which minimize the number of nodes participating in the repair process. This paper focuses on regenerating codes with locality, using pre-coding…

    Submitted 27 May, 2013; v1 submitted 4 February, 2013; originally announced February 2013.

  9. arXiv:1211.1932  [pdf, other]

    cs.IT

    Codes with Local Regeneration

    Authors: Govinda M. Kamath, N. Prakash, V. Lalitha, P. Vijay Kumar

    Abstract: Regenerating codes and codes with locality are two schemes that have recently been proposed to ensure data collection and reliability in a distributed storage network. In a situation where one is attempting to repair a failed node, regenerating codes seek to minimize the amount of data downloaded for node repair, while codes with locality attempt to minimize the number of helper nodes accessed. In…

    Submitted 4 February, 2013; v1 submitted 8 November, 2012; originally announced November 2012.

    Comments: 44 pages, 7 figures. A class of codes termed Uniform Rank Accumulation (URA) codes is introduced, and a minimum distance bound is derived for the case where the local codes are URA codes. Also, the results of our earlier arXiv submission (arXiv:1202.2414 [cs.IT]) are included in Section 3 of this version.

  10. arXiv:1202.2414  [pdf, ps, other]

    cs.IT

    Optimal Linear Codes with a Local-Error-Correction Property

    Authors: N. Prakash, Govinda M. Kamath, V. Lalitha, P. Vijay Kumar

    Abstract: Motivated by applications to distributed storage, Gopalan et al. recently introduced the interesting notion of information-symbol locality in a linear code. By this it is meant that each message symbol appears in a parity-check equation associated with small Hamming weight, thereby enabling recovery of the message symbol by examining a small number of other code symbols. This notion is exp…

    Submitted 11 February, 2012; originally announced February 2012.

    Comments: 13 pages, Shorter version submitted to ISIT 2012