Showing 1–26 of 26 results for author: Tung, A

Searching in archive cs. Search in all archives.
  1. arXiv:2403.03698  [pdf, other]

    cs.LG cs.AI cs.DB

    Towards Controllable Time Series Generation

    Authors: Yifan Bao, Yihao Ang, Qiang Huang, Anthony K. H. Tung, Zhiyong Huang

    Abstract: Time Series Generation (TSG) has emerged as a pivotal technique in synthesizing data that accurately mirrors real-world time series, becoming indispensable in numerous applications. Despite significant advancements in TSG, its efficacy frequently hinges on having large training datasets. This dependency presents a substantial challenge in data-scarce scenarios, especially when dealing with rare or…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 14 pages, 13 figures, and 5 tables

  2. arXiv:2402.13858  [pdf, other]

    cs.IR cs.DB cs.DS

    Diversity-Aware $k$-Maximum Inner Product Search Revisited

    Authors: Qiang Huang, Yanhao Wang, Yiqun Sun, Anthony K. H. Tung

    Abstract: The $k$-Maximum Inner Product Search ($k$MIPS) serves as a foundational component in recommender systems and various data mining tasks. However, while most existing $k$MIPS approaches prioritize the efficient retrieval of highly relevant items for users, they often neglect an equally pivotal facet of search results: diversity. To bridge this gap, we revisit and refine the diversity-aware…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 14 pages, 9 figures, and 5 tables

  3. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  4. arXiv:2310.04145  [pdf, other]

    cs.LG cs.DB

    From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying

    Authors: Biao Wu, Qiang Huang, Anthony K. H. Tung

    Abstract: Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications continue to proliferate, and their success heavily relies on the quality of training data. While various mechanisms exist to secure data during storage, transmission, and consumption, fewer studies have been developed to detect whether they are already leaked for model training with…

    Submitted 17 April, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted and to appear in VLDB 2024

  5. arXiv:2309.03755  [pdf, other]

    cs.LG cs.AI cs.DB

    TSGBench: Time Series Generation Benchmark

    Authors: Yihao Ang, Qiang Huang, Yifan Bao, Anthony K. H. Tung, Zhiyong Huang

    Abstract: Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) They often benchmark against similar model types, constraining a holistic view of performance capabilities. (2) The use of specialized sy…

    Submitted 7 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted and to appear in VLDB 2024

  6. arXiv:2308.09031  [pdf, ps, other]

    cs.IT cs.CR quant-ph

    New Properties of Intrinsic Information and Their Relation to Bound Secrecy

    Authors: Andrey Boris Khesin, Andrew Tung, Karthik Vedula

    Abstract: The secret-key rate measures the rate at which Alice and Bob can extract secret bits from sampling a joint probability distribution, unknown to an eavesdropper Eve. The secret-key rate has been bounded above by the intrinsic information and reduced intrinsic information. However, we prove that the reduced intrinsic information is 0 if and only if the intrinsic information is 0. This result implies…

    Submitted 7 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 23 pages, 1 figure

  7. arXiv:2302.10626  [pdf, other]

    cs.DB cs.CG cs.DS cs.IR

    Lightweight-Yet-Efficient: Revitalizing Ball-Tree for Point-to-Hyperplane Nearest Neighbor Search

    Authors: Qiang Huang, Anthony K. H. Tung

    Abstract: Finding the nearest neighbor to a hyperplane (or Point-to-Hyperplane Nearest Neighbor Search, simply P2HNNS) is a new and challenging problem with applications in many research domains. While existing state-of-the-art hashing schemes (e.g., NH and FH) are able to achieve sublinear time complexity without the assumption of the data being in a unit hypersphere, they require an asymmetric transformat…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted by IEEE ICDE 2023

  8. arXiv:2211.12751  [pdf, other]

    cs.IR cs.DB cs.DS cs.LG

    SAH: Shifting-aware Asymmetric Hashing for Reverse $k$-Maximum Inner Product Search

    Authors: Qiang Huang, Yanhao Wang, Anthony K. H. Tung

    Abstract: This paper investigates a new yet challenging problem called Reverse $k$-Maximum Inner Product Search (R$k$MIPS). Given a query (item) vector, a set of item vectors, and a set of user vectors, the problem of R$k$MIPS aims to find a set of user vectors whose inner products with the query vector are one of the $k$ largest among the query and item vectors. We propose the first subquadratic-time algor…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by AAAI 2023

  9. arXiv:2210.02589  [pdf, other]

    cs.DC q-bio.GN

    Spot-on: A Checkpointing Framework for Fault-Tolerant Long-running Workloads on Cloud Spot Instances

    Authors: Ashley Tung, Haiyan Wang, Yue Li, Zhong Wang, Jingchao Sun

    Abstract: Spot instances offer a cost-effective solution for applications running in the cloud computing environment. However, it is challenging to run long-running jobs on spot instances because they are subject to unpredictable evictions. Here, we present Spot-on, a generic software framework that supports fault-tolerant long-running workloads on spot instances through checkpoint and restart. Spot-on leve…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: 3 pages, 3 figures, accepted to "Third International Symposium on Checkpointing for Supercomputing" (SuperCheck-SC22), https://supercheck.lbl.gov/

  10. arXiv:2206.10326  [pdf, other]

    cs.HC cs.AI cs.CV cs.DB cs.DC

    The Metaverse Data Deluge: What Can We Do About It?

    Authors: Beng Chin Ooi, Gang Chen, Mike Zheng Shou, Kian-Lee Tan, Anthony Tung, Xiaokui Xiao, James Wei Luen Yip, Meihui Zhang

    Abstract: In the Metaverse, the physical space and the virtual space co-exist, and interact simultaneously. While the physical space is virtually enhanced with information, the virtual space is continuously refreshed with real-time, real-world information. To allow users to process and manipulate information seamlessly between the real and digital spaces, novel technologies must be developed. These include…

    Submitted 10 November, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

  11. arXiv:2202.11016  [pdf, other]

    cs.DB

    DIOT: Detecting Implicit Obstacles from Trajectories

    Authors: Yifan Lei, Qiang Huang, Mohan Kankanhalli, Anthony Tung

    Abstract: In this paper, we study a new data mining problem of obstacle detection from trajectory data. Intuitively, given two kinds of trajectories, i.e., reference and query trajectories, the obstacle is a region such that most query trajectories need to bypass this region, whereas the reference trajectories can go through as usual. We introduce a density-based definition for the obstacle based on a new n…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: 19 pages, 6 figures, DASFAA 2022

  12. arXiv:2112.05251  [pdf, other]

    cs.RO cs.AI cs.LG

    Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation

    Authors: Josiah Wong, Albert Tung, Andrey Kurenkov, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Roberto Martín-Martín

    Abstract: In mobile manipulation (MM), robots can both navigate within and interact with their environment and are thus able to complete many more tasks than robots only capable of navigation or manipulation. In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks. Much prior work has shown that IL can train visuo-motor policies for either manipula…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: CoRL 2021

  13. arXiv:2108.09597  [pdf, other]

    cs.CL cs.AI

    Hierarchical Summarization for Longform Spoken Dialog

    Authors: Daniel Li, Thomas Chen, Albert Tung, Lydia Chilton

    Abstract: Every day we are surrounded by spoken dialog. This medium delivers rich diverse streams of information auditorily; however, systematically understanding dialog can often be non-trivial. Despite the pervasiveness of spoken dialog, automated speech understanding and quality information extraction remains markedly poor, especially when compared to written prose. Furthermore, compared to understanding…

    Submitted 21 August, 2021; originally announced August 2021.

  14. arXiv:2106.10515  [pdf, ps, other]

    cs.DB cs.DC

    A Generic Distributed Clustering Framework for Massive Data

    Authors: Pingyi Luo, Qiang Huang, Anthony K. H. Tung

    Abstract: In this paper, we introduce a novel Generic distributEd clustEring frameworK (GEEK) beyond $k$-means clustering to process massive amounts of data. To deal with different data types, GEEK first converts data in the original feature space into a unified format of buckets; then, we design a new Seeding method based on simILar bucKets (SILK) to determine initial seeds. Compared with state-of-the-art…

    Submitted 19 June, 2021; originally announced June 2021.

    Comments: 11 pages, 7 figures

  15. arXiv:2101.12010  [pdf, other]

    physics.soc-ph cs.CV cs.DB cs.LG eess.SY

    Modeling Spatial Nonstationarity via Deformable Convolutions for Deep Traffic Flow Prediction

    Authors: Wei Zeng, Chengqiao Lin, Kang Liu, Juncong Lin, Anthony K. H. Tung

    Abstract: Deep neural networks are being increasingly used for short-term traffic flow prediction, which can be generally categorized as convolutional (CNNs) or graph neural networks (GNNs). CNNs are preferable for region-wise traffic prediction by taking advantage of localized spatial correlations, whilst GNNs achieve better performance for graph-structured traffic data. When applied to region-wise traffi…

    Submitted 7 October, 2021; v1 submitted 8 January, 2021; originally announced January 2021.

  16. arXiv:2012.06738  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Multi-Arm Manipulation Through Collaborative Teleoperation

    Authors: Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

    Abstract: Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks by allowing them to learn from human demonstrations collected via teleoperation, but has mostly been limited to single-arm manipulation. However, many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk. Unfortunately, applying IL to multi-arm manipulation tasks has…

    Submitted 12 December, 2020; originally announced December 2020.

    Comments: First two authors contributed equally

  17. arXiv:2007.03596  [pdf]

    cs.CL cs.CY cs.LG

    An Emergency Medical Services Clinical Audit System driven by Named Entity Recognition from Deep Learning

    Authors: Wang Han, Wesley Yeung, Angeline Tung, Joey Tay Ai Meng, Davin Ryanputera, Feng Mengling, Shalini Arulanadam

    Abstract: Clinical performance audits are routinely performed in Emergency Medical Services (EMS) to ensure adherence to treatment protocols, to identify individual areas of weakness for remediation, and to discover systemic deficiencies to guide the development of the training syllabus. At present, these audits are performed by manual chart review which is time-consuming and laborious. In this paper, we pr…

    Submitted 7 July, 2020; originally announced July 2020.

  18. arXiv:2006.08259  [pdf, other]

    cs.LG stat.ML

    Robust Federated Recommendation System

    Authors: Chen Chen, Jingfeng Zhang, Anthony K. H. Tung, Mohan Kankanhalli, Gang Chen

    Abstract: Federated recommendation systems can provide good performance without collecting users' private data, making them attractive. However, they are susceptible to low-cost poisoning attacks that can degrade their performance. In this paper, we develop a novel federated recommendation technique that is robust against the poisoning attack where Byzantine clients prevail. We argue that the key to Byzanti…

    Submitted 15 June, 2020; originally announced June 2020.

  19. arXiv:2004.05345  [pdf, ps, other]

    cs.DB cs.DS

    Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring

    Authors: Yifan Lei, Qiang Huang, Mohan Kankanhalli, Anthony K. H. Tung

    Abstract: Locality-Sensitive Hashing (LSH) is one of the most popular methods for $c$-Approximate Nearest Neighbor Search ($c$-ANNS) in high-dimensional spaces. In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search framework (LCCS-LSH) with a theoretical guarantee. We introduce a novel concept of LCCS and a new data structure named Circular Shift Array (CSA) f…

    Submitted 11 April, 2020; originally announced April 2020.

    Comments: 16 pages, 10 figures

  20. arXiv:2002.09919  [pdf, other]

    cs.CL cs.AI

    Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?

    Authors: Yixuan Tang, Hwee Tou Ng, Anthony K. H. Tung

    Abstract: Multi-hop question answering (QA) requires a model to retrieve and integrate information from different parts of a long text to answer a question. Humans answer this kind of complex question via a divide-and-conquer approach. In this paper, we investigate whether top-performing models for multi-hop questions understand the underlying sub-questions like humans. We adopt a neural decomposition mode…

    Submitted 26 January, 2021; v1 submitted 23 February, 2020; originally announced February 2020.

  21. arXiv:2001.06770  [pdf, other]

    cs.DB

    Efficient Radial Pattern Keyword Search on Knowledge Graphs in Parallel

    Authors: Yueji Yang, Anthony K. H. Tung

    Abstract: Recently, keyword search on Knowledge Graphs (KGs) has become popular. Typical keyword search approaches aim at finding a concise subgraph from a KG, which can reflect a close relationship among all input keywords. The connection paths between keywords are selected in a way that leads to a result subgraph with a better semantic score. However, such a result may not meet users' information needs because…

    Submitted 18 January, 2020; originally announced January 2020.

  22. arXiv:1911.04052  [pdf, other]

    cs.RO cs.HC cs.LG

    Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity

    Authors: Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

    Abstract: Large, richly annotated datasets have accelerated progress in fields such as computer vision and natural language processing, but replicating these successes in robotics has been challenging. While prior data collection methodologies such as self-supervision have resulted in large datasets, the data can have poor signal-to-noise ratio. By contrast, previous efforts to collect task demonstrations w…

    Submitted 10 November, 2019; originally announced November 2019.

    Comments: Published at IROS 2019

  23. arXiv:1811.02790  [pdf, other]

    cs.RO cs.AI cs.LG

    RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

    Authors: Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei

    Abstract: Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification. However, research in this area has been limited to modest-sized datasets due to the difficulty of collecting large quantities of task demonstrations through existing mechanisms. This work introduces RoboTurk to ad…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: Published at the Conference on Robot Learning (CoRL) 2018

  24. arXiv:1603.08390  [pdf, ps, other]

    cs.DB cs.CV cs.DC cs.DS

    A Generic Inverted Index Framework for Similarity Search on the GPU - Technical Report

    Authors: Jingbo Zhou, Qi Guo, H. V. Jagadish, Luboš Krčál, Siyuan Liu, Wenhao Luan, Anthony K. H. Tung, Yueji Yang, Yuxin Zheng

    Abstract: We propose a novel generic inverted index framework on the GPU (called GENIE), aiming to reduce the programming complexity of the GPU for parallel similarity search of different data types. Not every data type and similarity measure are supported by GENIE, but many popular ones are. We present the system design of GENIE, and demonstrate similarity search with GENIE on several data types along with…

    Submitted 14 August, 2018; v1 submitted 28 March, 2016; originally announced March 2016.

    Comments: 18 pages, technical report for the ICDE 2018 paper

  25. arXiv:1601.00182  [pdf, ps, other]

    cs.DB

    Cohort Query Processing

    Authors: Dawei Jiang, Qingchao Cai, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Anthony K. H. Tung

    Abstract: Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis, or finding unusual user behavioral trends, in these large tables of activity records. In a traditional database system, cohort analysis queries are both painful to specify and expensive to evaluate. We propose to extend database systems to support cohort analysis. We…

    Submitted 4 May, 2016; v1 submitted 2 January, 2016; originally announced January 2016.

  26. arXiv:cs/0003072  [pdf, ps, other]

    cs.DS cs.LG

    MOO: A Methodology for Online Optimization through Mining the Offline Optimum

    Authors: Jason W. H. Lee, Y. C. Tay, Anthony K. H. Tung

    Abstract: Ports, warehouses and courier services have to decide online how an arriving task is to be served in order that cost is minimized (or profit maximized). These operators have a wealth of historical data on task assignments; can these data be mined for knowledge or rules that can help the decision-making? MOO is a novel application of data mining to online optimization. The idea is to mine (logg…

    Submitted 22 March, 2000; originally announced March 2000.

    Comments: 12 pages, 4 figures

    Report number: Research Report No. 743

    ACM Class: F.2.2; H.2.8; F.1.2