
Showing 1–50 of 54 results for author: Tong, S

Searching in archive cs.
  1. arXiv:2406.16860  [pdf, other]

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  2. arXiv:2406.01276  [pdf, other]

    cs.CL

    EduNLP: Towards a Unified and Modularized Library for Educational Resources

    Authors: Zhenya Huang, Yuting Ning, Longhu Qin, Shiwei Tong, Shangzi Xue, Tong Xiao, Xin Lin, Jiayu Liu, Qi Liu, Enhong Chen, Shijing Wang

    Abstract: Educational resource understanding is vital to online learning platforms, which have demonstrated growing applications recently. However, researchers and developers always struggle with using existing general natural language toolkits or domain-specific models. The issue raises a need to develop an effective and easy-to-use one that benefits AI education-related research and applications. To bridg… ▽ More

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2405.10292  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

    Authors: Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

    Abstract: Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic… ▽ More

    Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  4. arXiv:2405.07437  [pdf, other]

    cs.CL cs.AI

    Evaluation of Retrieval-Augmented Generation: A Survey

    Authors: Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu

    Abstract: Retrieval-Augmented Generation (RAG) has recently gained traction in natural language processing. Numerous studies and real-world applications are leveraging its ability to enhance generative models through external information retrieval. Evaluating these RAG systems, however, poses unique challenges due to their hybrid structure and reliance on dynamic knowledge sources. To better understand thes… ▽ More

    Submitted 3 July, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

  5. arXiv:2403.10953  [pdf, other]

    cs.CV

    Ctrl123: Consistent Novel View Synthesis via Closed-Loop Transcription

    Authors: Hongxiang Zhao, Xili Dai, Jianan Wang, Shengbang Tong, Jingyuan Zhang, Weida Wang, Lei Zhang, Yi Ma

    Abstract: Large image diffusion models have demonstrated zero-shot capability in novel view synthesis (NVS). However, existing diffusion-based NVS methods struggle to generate novel views that are accurately consistent with the corresponding ground truth poses and appearances, even on the training set. This consequently limits the performance of downstream tasks, such as image-to-multiview generation and 3D… ▽ More

    Submitted 21 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  6. Automating psychological hypothesis generation with AI: when large language models meet causal graph

    Authors: Song Tong, Kai Mao, Zhen Huang, Yukun Zhao, Kaiping Peng

    Abstract: Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using an LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 pote… ▽ More

    Submitted 15 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  7. arXiv:2402.02065  [pdf, other]

    cs.LG

    Training Implicit Networks for Image Deblurring using Jacobian-Free Backpropagation

    Authors: Linghai Liu, Shuaicheng Tong, Lisa Zhao

    Abstract: Recent efforts in applying implicit networks to solve inverse problems in imaging have achieved competitive or even superior results when compared to feedforward networks. These implicit networks only require constant memory during backpropagation, regardless of the number of layers. However, they are not necessarily easy to train. Gradient calculations are computationally expensive because they r… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  8. arXiv:2401.06209  [pdf, other]

    cs.CV

    Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

    Authors: Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie

    Abstract: Is vision good enough for language? Recent advancements in multimodal models primarily stem from the powerful reasoning abilities of large language models (LLMs). However, the visual component typically depends only on the instance-level contrastive language-image pre-training (CLIP). Our research reveals that the visual capabilities in recent multimodal LLMs (MLLMs) still exhibit systematic short… ▽ More

    Submitted 25 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Project page: https://tsb0601.github.io/mmvp_blog/

  9. arXiv:2401.01519  [pdf]

    cs.LG cs.AI

    Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review

    Authors: Luoma Ke, Song Tong, Peng Cheng, Kaiping Peng

    Abstract: This paper explores the frontiers of large language models (LLMs) in psychology applications. Psychology has undergone several theoretical changes, and the current use of Artificial Intelligence (AI) and Machine Learning, particularly LLMs, promises to open up new research directions. We provide a detailed exploration of how LLMs like ChatGPT are transforming psychological research. It discusses t… ▽ More

    Submitted 16 March, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  10. arXiv:2311.13110  [pdf, other]

    cs.LG cs.CL cs.CV

    White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

    Authors: Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma

    Abstract: In this paper, we contend that a natural objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a low-dimensional Gaussian mixture supported on incoherent subspaces. The goodness of such a representation can be evaluated by a principled measure, called sparse rate reduction, that simultaneously maximizes the intrinsic information… ▽ More

    Submitted 24 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: This paper integrates the works arXiv:2306.01129 and arXiv:2308.16271 into a complete story. In this paper, we improve the writing and organization, and also add conceptual, empirical, and theoretical improvements over the previous work. V2: small typo fixes and formatting improvements

  11. arXiv:2309.16681  [pdf, other]

    cs.IT cs.AI

    Alternate Learning based Sparse Semantic Communications for Visual Transmission

    Authors: Siyu Tong, Xiaoxue Yu, Rongpeng Li, Kun Lu, Zhifeng Zhao, Honggang Zhang

    Abstract: Semantic communication (SemCom) demonstrates strong superiority over conventional bit-level accurate transmission, by only attempting to recover the essential semantic information of data. In this paper, in order to tackle the non-differentiability of channels, we propose an alternate learning based SemCom system for visual transmission, named SparseSBC. Specially, SparseSBC leverages two separate… ▽ More

    Submitted 30 July, 2023; originally announced September 2023.

  12. arXiv:2309.10313  [pdf, other]

    cs.CL cs.AI cs.LG

    Investigating the Catastrophic Forgetting in Multimodal Large Language Models

    Authors: Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma

    Abstract: Following the success of GPT4, there has been a surge in interest in multimodal large language model (MLLM) research. This line of research focuses on developing general-purpose LLMs through fine-tuning pre-trained LLMs and vision models. However, catastrophic forgetting, a notorious phenomenon where the fine-tuned model fails to retain similar performance compared to the pre-trained model, still… ▽ More

    Submitted 5 December, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  13. arXiv:2309.09449  [pdf]

    cs.DL

    Multi-Affiliated Authors Behave Differently across Fields and Host Country Preferences: A Comparison in G7 and BRICS

    Authors: Sichao Tong, Liying Yang

    Abstract: This paper studies authors simultaneously engaged in multiple affiliations based on bibliometric data covered in the Web of Science for the 2017-2021 period. Based on the affiliation information in publication records, we propose a general classification for multiple affiliations within-country or cross-country for analyzing authors' behavior in multiple affiliations and preferences of host countries… ▽ More

    Submitted 17 September, 2023; originally announced September 2023.

  14. arXiv:2308.16271  [pdf, other]

    cs.CV cs.LG

    Emergence of Segmentation with Minimalistic White-Box Transformers

    Authors: Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma

    Abstract: Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection. Previous works have shown that segmentation properties emerge in vision transformers (ViTs) trained using self-supervised methods such as DINO, but not in those trained on supervised classification tasks. In this study, we probe whether segmentatio… ▽ More

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Code: https://github.com/Ma-Lab-Berkeley/CRATE

  15. arXiv:2307.08556  [pdf, other]

    stat.ML cs.LG eess.IV

    Machine-Learning-based Colorectal Tissue Classification via Acoustic Resolution Photoacoustic Microscopy

    Authors: Shangqing Tong, Peng Ge, Yanan Jiao, Zhaofu Ma, Ziye Li, Longhai Liu, Feng Gao, Xiaohui Du, Fei Gao

    Abstract: Colorectal cancer is a deadly disease that has become increasingly prevalent in recent years. Early detection is crucial for saving lives, but traditional diagnostic methods such as colonoscopy and biopsy have limitations. Colonoscopy cannot provide detailed information within the tissues affected by cancer, while biopsy involves tissue removal, which can be painful and invasive. In order to impro… ▽ More

    Submitted 17 July, 2023; originally announced July 2023.

  16. arXiv:2306.13843  [pdf, other]

    cs.CV eess.IV

    Score-based Generative Models for Photoacoustic Image Reconstruction with Rotation Consistency Constraints

    Authors: Shangqing Tong, Hengrong Lan, Liming Nie, Jianwen Luo, Fei Gao

    Abstract: Photoacoustic tomography (PAT) is a newly emerged imaging modality which enables both high optical contrast and acoustic depth of penetration. Reconstructing images of photoacoustic tomography from a limited amount of sensor data is one of the major challenges in photoacoustic imaging. Previous works based on deep learning were trained in supervised fashion, which directly map the input partia… ▽ More

    Submitted 23 June, 2023; originally announced June 2023.

  17. arXiv:2306.12105  [pdf, other]

    cs.LG cs.CL cs.SE

    Mass-Producing Failures of Multimodal Systems with Language Models

    Authors: Shengbang Tong, Erik Jones, Jacob Steinhardt

    Abstract: Deployed multimodal systems can fail in ways that evaluators did not anticipate. In order to find these failures before deployment, we introduce MultiMon, a system that automatically identifies systematic failures -- generalizable, natural-language descriptions of patterns of model failures. To uncover systematic failures, MultiMon scrapes a corpus for examples of erroneous agreement: inputs that… ▽ More

    Submitted 1 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Under Review

  18. arXiv:2306.05272  [pdf, other]

    cs.CV cs.LG

    Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

    Authors: Tianzhe Chu, Shengbang Tong, Tianjiao Ding, Xili Dai, Benjamin David Haeffele, René Vidal, Yi Ma

    Abstract: The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, as a fundamental and classic machine learning problem, still lacks an effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful… ▽ More

    Submitted 26 April, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 23 pages, 14 figures

  19. arXiv:2306.01129  [pdf, other]

    cs.LG

    White-Box Transformers via Sparse Rate Reduction

    Authors: Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Benjamin D. Haeffele, Yi Ma

    Abstract: In this paper, we contend that the objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a mixture of low-dimensional Gaussian distributions supported on incoherent subspaces. The quality of the final representation can be measured by a unified objective function called sparse rate reduction. From this perspective, popular deep… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 33 pages, 11 figures

  20. arXiv:2305.15685  [pdf, other]

    cs.CL cs.AI

    RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

    Authors: Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Yinxiao Liu, Simon Tong, Jindong Chen, Lei Meng

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in creative tasks such as storytelling and E-mail generation. However, as LLMs are primarily trained on final text results rather than intermediate revisions, it might be challenging for them to perform text rewriting tasks. Most studies in the rewriting tasks focus on a particular transformation type within the boundaries of s… ▽ More

    Submitted 19 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Journal ref: AAAI 2024

  21. arXiv:2305.14760  [pdf, other]

    cs.CL

    Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization

    Authors: Shoujie Tong, Heming Xia, Damai Dai, Runxin Xu, Tianyu Liu, Binghuai Lin, Yunbo Cao, Zhifang Sui

    Abstract: Pretrained language models have achieved remarkable success in natural language understanding. However, fine-tuning pretrained models on limited training data tends to overfit and thus diminish performance. This paper presents Bi-Drop, a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets dynamically generated by dropout. The sub-net estimation of B… ▽ More

    Submitted 22 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings. Camera-ready version. Co-first authors with equal contributions

  22. arXiv:2304.03977  [pdf, other]

    cs.CV cs.AI

    EMP-SSL: Towards Self-Supervised Learning in One Training Epoch

    Authors: Shengbang Tong, Yubei Chen, Yi Ma, Yann LeCun

    Abstract: Recently, self-supervised learning (SSL) has achieved tremendous success in learning image representation. Despite the empirical success, most self-supervised learning methods are rather "inefficient" learners, typically taking hundreds of training epochs to fully converge. In this work, we show that the key towards efficient self-supervised learning is to increase the number of crops from each im… ▽ More

    Submitted 8 April, 2023; originally announced April 2023.

  23. arXiv:2302.09347  [pdf, other]

    cs.CV

    Closed-Loop Transcription via Convolutional Sparse Coding

    Authors: Xili Dai, Ke Chen, Shengbang Tong, Jingyuan Zhang, Xingjian Gao, Mingyang Li, Druv Pai, Yuexiang Zhai, Xiaojun Yuan, Heung-Yeung Shum, Lionel M. Ni, Yi Ma

    Abstract: Autoencoding has achieved great empirical success as a framework for learning generative models for natural images. Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret, and the learned representations lack clear structure. In this work, we make the explicit assumption that the image distribution is generated from a multi-stage sparse deconvoluti… ▽ More

    Submitted 18 February, 2023; originally announced February 2023.

    Comments: 20 pages

  24. arXiv:2302.04265  [pdf, other]

    cs.LG cs.CV

    PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

    Authors: Yilun Xu, Ziming Liu, Yonglong Tian, Shangyuan Tong, Max Tegmark, Tommi Jaakkola

    Abstract: We introduce a new family of physics-inspired generative models termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). These models realize generative trajectories for $N$ dimensional data by embedding paths in $N{+}D$ dimensional space while still controlling the progression with a simple scalar norm of the $D$ additional variables. The new models reduce to PFGM wh… ▽ More

    Submitted 10 February, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: Code is available at https://github.com/Newbeeer/pfgmpp

  25. arXiv:2302.00670  [pdf, other]

    cs.LG cs.CV

    Stable Target Field for Reduced Variance Score Estimation in Diffusion Models

    Authors: Yilun Xu, Shangyuan Tong, Tommi Jaakkola

    Abstract: Diffusion models generate samples by reversing a fixed forward diffusion process. Despite already providing impressive empirical results, these diffusion model algorithms can be further improved by reducing the variance of the training targets in their denoising score-matching objective. We argue that the source of such variance lies in the handling of intermediate noise-variance scales, where mu… ▽ More

    Submitted 17 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted by ICLR 2023. Code available at: https://github.com/Newbeeer/stf

  26. arXiv:2301.07558  [pdf, other]

    cs.CL

    Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training

    Authors: Yuting Ning, Zhenya Huang, Xin Lin, Enhong Chen, Shiwei Tong, Zheng Gong, Shijin Wang

    Abstract: Understanding mathematical questions effectively is a crucial task, which can benefit many applications, such as difficulty estimation. Researchers have drawn much attention to designing pre-training models for question representations due to the scarcity of human annotations (e.g., labeling difficulty). However, unlike general free-format texts (e.g., user comments), mathematical questions are ge… ▽ More

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted by AAAI 2023

  27. arXiv:2301.01805  [pdf, other]

    cs.LG cs.CV

    Unsupervised Manifold Linearizing and Clustering

    Authors: Tianjiao Ding, Shengbang Tong, Kwan Ho Ryan Chan, Xili Dai, Yi Ma, Benjamin D. Haeffele

    Abstract: We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunat… ▽ More

    Submitted 24 August, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

  28. arXiv:2210.16782  [pdf, other]

    cs.CV

    Unsupervised Learning of Structured Representations via Closed-Loop Transcription

    Authors: Shengbang Tong, Xili Dai, Yubei Chen, Mingyang Li, Zengyi Li, Brent Yi, Yann LeCun, Yi Ma

    Abstract: This paper proposes an unsupervised method for learning a unified representation that serves both discriminative and generative purposes. While most existing unsupervised learning approaches focus on a representation for only one of these two goals, we show that a unified representation can enjoy the mutual benefits of having both. Such a representation is attainable by generalizing the recently p… ▽ More

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 17 pages

  29. arXiv:2210.12945  [pdf, other]

    cs.CV

    Revisiting Sparse Convolutional Model for Visual Recognition

    Authors: Xili Dai, Mingyang Li, Pengyuan Zhai, Shengbang Tong, Xingjian Gao, Shao-Lun Huang, Zhihui Zhu, Chong You, Yi Ma

    Abstract: Despite strong empirical performance for image classification, deep neural networks are often regarded as "black boxes" and they are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be expressed by a linear combination of a few elements from a convolutional dictionary, are powerful tools for analyzing natural images with good theoretical inte… ▽ More

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: 17 pages. Accepted by NeurIPS 2022

  30. arXiv:2205.00633  [pdf, other]

    cs.CL

    Robust Fine-tuning via Perturbation and Interpolation from In-batch Instances

    Authors: Shoujie Tong, Qingxiu Dong, Damai Dai, Yifan Song, Tianyu Liu, Baobao Chang, Zhifang Sui

    Abstract: Fine-tuning pretrained language models (PLMs) on downstream tasks has become common practice in natural language processing. However, most of the PLMs are vulnerable, e.g., they are brittle under adversarial attacks or imbalanced data, which hinders the application of the PLMs on some downstream tasks, especially in safety-critical scenarios. In this paper, we propose a simple yet effective fine-tun… ▽ More

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: IJCAI-ECAI 2022 (the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence)

  31. arXiv:2203.08908  [pdf, other]

    cs.LG

    Adversarial Support Alignment

    Authors: Shangyuan Tong, Timur Garipov, Yang Zhang, Shiyu Chang, Tommi S. Jaakkola

    Abstract: We study the problem of aligning the supports of distributions. Compared to the existing work on distribution alignment, support alignment does not require the densities to be matched. We propose symmetric support difference as a divergence measure to quantify the mismatch between supports. We show that select discriminators (e.g. discriminator trained for Jensen-Shannon divergence) are able to ma… ▽ More

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to ICLR 2022

  32. arXiv:2202.05411  [pdf, other]

    cs.CV

    Incremental Learning of Structured Memory via Closed-Loop Transcription

    Authors: Shengbang Tong, Xili Dai, Ziyang Wu, Mingyang Li, Brent Yi, Yi Ma

    Abstract: This work proposes a minimal computational model for learning structured memories of multiple object classes in an incremental setting. Our approach is based on establishing a closed-loop transcription between the classes and a corresponding set of subspaces, known as a linear discriminative representation, in a low-dimensional feature space. Our method is simpler than existing approaches for incr… ▽ More

    Submitted 7 June, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: 20 pages

  33. arXiv:2112.06905  [pdf, other]

    cs.CL

    GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

    Authors: Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, et al. (2 additional authors not shown)

    Abstract: Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GL… ▽ More

    Submitted 1 August, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted to ICML 2022

  34. Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

    Authors: Xili Dai, Shengbang Tong, Mingyang Li, Ziyang Wu, Michael Psenka, Kwan Ho Ryan Chan, Pengyuan Zhai, Yaodong Yu, Xiaojun Yuan, Heung Yeung Shum, Yi Ma

    Abstract: This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we… ▽ More

    Submitted 3 March, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: 41 pages

  35. Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction

    Authors: Yun Yue, Yongchao Liu, Suo Tong, Minghao Li, Zhen Zhang, Chunyang Wen, Huanjun Bao, Lihong Gu, Jinjie Gu, Yixiang Mu

    Abstract: We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, AdaHessian, and create a new class of optimizers, which are named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group AdaHessian, etc., accordingly. We establish theoretically proven convergence guarantees in… ▽ More

    Submitted 18 October, 2023; v1 submitted 30 July, 2021; originally announced July 2021.

    Comments: 24 pages. Published as a conference paper at ECML PKDD 2021. This version includes Appendix which was not included in the published version because of page limit

    Journal ref: Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part III

  36. arXiv:2105.07122  [pdf, other]

    cs.CL

    Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues

    Authors: Qingxiu Dong, Ziwei Qin, Heming Xia, Tian Feng, Shoujie Tong, Haoran Meng, Lin Xu, Weidong Zhan, Sujian Li, Zhongyu Wei, Tianyu Liu, Zuifang Sui

    Abstract: It is a common practice for recent works in vision language cross-modal reasoning to adopt a binary or multi-choice classification formulation taking as input a set of source image(s) and textual query. In this work, we take a sober look at such an unconditional formulation in the sense that no prior knowledge is specified with respect to the source image(s). Inspired by the designs of both visual… ▽ More

    Submitted 17 March, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: ACL 2022 Main conference (Long Paper)

  37. arXiv:2012.05463  [pdf, other]

    cs.CV cs.LG

    Investigating Bias in Image Classification using Model Explanations

    Authors: Schrasing Tong, Lalana Kagal

    Abstract: We evaluated whether model explanations could efficiently detect bias in image classification by highlighting discriminating features, thereby removing the reliance on sensitive attributes for fairness calculations. To this end, we formulated important characteristics for bias detection and observed how explanations change as the degree of bias in models change. The paper identifies strengths and… ▽ More

    Submitted 10 December, 2020; originally announced December 2020.

  38. arXiv:2010.03466  [pdf, ps, other]

    eess.AS cs.SD

    Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models

    Authors: Srikanth Madikeri, Sibo Tong, Juan Zuluaga-Gomez, Apoorv Vyas, Petr Motlicek, Hervé Bourlard

    Abstract: We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short form of PyTorch kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch in designing model architectures. It exposes the LF-MMI cost function as an autograd function. Other capabilities of Kaldi have also been ported to Py… ▽ More

    Submitted 7 October, 2020; originally announced October 2020.

  39. arXiv:2006.16915  [pdf, other]

    cs.CY cs.AI cs.LG

    HGKT: Introducing Hierarchical Exercise Graph for Knowledge Tracing

    Authors: Hanshuang Tong, Zhen Wang, Yun Zhou, Shiwei Tong, Wenyuan Han, Qi Liu

    Abstract: Knowledge tracing (KT), which aims at predicting a learner's knowledge mastery, plays an important role in the computer-aided educational system. In recent years, many deep learning models have been applied to tackle the KT task, which have shown promising results. However, limitations still exist. Most existing methods simplify the exercising records as knowledge sequences, which fail to explore rich… ▽ More

    Submitted 29 August, 2022; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: 10 pages, 11 figures, accepted by SIGIR 2022

  40. arXiv:2006.05047  [pdf]

    cs.DL

    Novel utilization of a paper-level classification system for the evaluation of journal impact: An update of the CAS Journal Ranking

    Authors: Sichao Tong, Fuyou Chen, Liying Yang, Zhesi Shen

    Abstract: Since its first release in 2004, the CAS Journal Ranking, a ranking system of journals based on a citation impact indicator, has been widely used both in selecting journals when submitting manuscripts and conducting research evaluation in China. This paper introduces an upgraded version of the CAS Journal Ranking released in 2020 and the corresponding improvements. We will discuss the following imp… ▽ More

    Submitted 30 August, 2023; v1 submitted 9 June, 2020; originally announced June 2020.

  41. arXiv:2003.00353  [pdf, other]

    cs.CL

    Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification

    Authors: Wei-Hung Weng, Yu-An Chung, Schrasing Tong

    Abstract: In the era of clinical information explosion, a good strategy for clinical text summarization is helpful to improve the clinical workflow. The ideal summarization strategy can preserve important information in the informative but less organized, ill-structured clinical narrative texts. Instead of using pure statistical learning approaches, which are difficult to interpret and explain, we utilized… ▽ More

    Submitted 29 February, 2020; originally announced March 2020.

  42. arXiv:2002.08621  [pdf, other]

    cs.LG stat.ML

    The Benefits of Pairwise Discriminators for Adversarial Training

    Authors: Shangyuan Tong, Timur Garipov, Tommi Jaakkola

    Abstract: Adversarial training methods typically align distributions by solving two-player games. However, in most current formulations, even if the generator aligns perfectly with data, a sub-optimal discriminator can still drive the two apart. Absent additional regularization, the instability can manifest itself as a never-ending game. In this paper, we introduce a family of objectives by leveraging pairw…

    Submitted 20 February, 2020; originally announced February 2020.

  43. arXiv:2001.06803  [pdf]

    cs.DL physics.soc-ph

    The effect of national and international multiple affiliations on citation impact

    Authors: Sichao Tong, Ting Yue, Zhesi Shen, Liying Yang

    Abstract: Researchers affiliated with multiple institutions are increasingly common in the current scientific environment. In this paper we systematically analyze multi-affiliated authorship and its effect on citation impact, with a focus on the scientific output of research collaboration. By considering the nationality of each institution, we further differentiate the national multi-affiliated authorship and…

    Submitted 19 January, 2020; originally announced January 2020.

  44. A Bayesian Approach to Recurrence in Neural Networks

    Authors: Philip N. Garner, Sibo Tong

    Abstract: We begin by reiterating that common neural network activation functions have simple Bayesian origins. In this spirit, we go on to show that Bayes's theorem also implies a simple recurrence relation; this leads to a Bayesian recurrent unit with a prescribed feedback formulation. We show that introduction of a context indicator leads to a variable feedback that is similar to the forget mechanism in…

    Submitted 20 April, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

  45. arXiv:1906.04287  [pdf, other]

    cs.CL

    Chinese Embedding via Stroke and Glyph Information: A Dual-channel View

    Authors: Hanqing Tao, Shiwei Tong, Tong Xu, Qi Liu, Enhong Chen

    Abstract: Recent studies have consistently given positive hints that morphology is helpful in enriching word embeddings. In this paper, we argue that Chinese word embeddings can be substantially enriched by the morphological information hidden in characters, which is reflected not only sequentially in stroke order but also spatially in character glyphs. We then propose a novel Dual-channel Word Embedding…

    Submitted 3 June, 2019; originally announced June 2019.

  46. arXiv:1905.12470  [pdf, other]

    cs.CY cs.LG stat.ML

    Exploiting Cognitive Structure for Adaptive Learning

    Authors: Qi Liu, Shiwei Tong, Chuanren Liu, Hongke Zhao, Enhong Chen, Haiping Ma, Shijin Wang

    Abstract: Adaptive learning, also known as adaptive teaching, relies on learning path recommendation, which sequentially recommends personalized learning items (e.g., lectures, exercises) to satisfy the unique needs of each learner. Although it is well known that modeling the cognitive structure including knowledge level of learners and knowledge structure (e.g., the prerequisite relations) of learning item…

    Submitted 23 May, 2019; originally announced May 2019.

    Comments: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19)

  47. arXiv:1904.05780  [pdf, other]

    cs.CL stat.ML

    Corpora Generation for Grammatical Error Correction

    Authors: Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong

    Abstract: Grammatical Error Correction (GEC) has been recently modeled using the sequence-to-sequence framework. However, unlike sequence transduction problems such as machine translation, GEC suffers from the lack of plentiful parallel data. We describe two approaches for generating large parallel datasets for GEC using publicly available Wikipedia data. The first method extracts source-target pairs from W…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL 2019. arXiv admin note: text overlap with arXiv:1811.01710

  48. arXiv:1903.03530  [pdf, other]

    cs.CL

    Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring

    Authors: Zhengyuan Liu, Hazel Lim, Nur Farah Ain Binte Suhaimi, Shao Chuen Tong, Sharon Ong, Angela Ng, Sheldon Lee, Michael R. Macdonald, Savitha Ramasamy, Pavitra Krishnaswamy, Wai Leng Chow, Nancy F. Chen

    Abstract: Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversation…

    Submitted 5 April, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

    Comments: 8 pages. To appear in NAACL 2019

  49. arXiv:1903.00197  [pdf]

    q-bio.QM cs.LG stat.ML

    Outcome-Driven Clustering of Acute Coronary Syndrome Patients using Multi-Task Neural Network with Attention

    Authors: Eryu Xia, Xin Du, Jing Mei, Wen Sun, Suijun Tong, Zhiqing Kang, Jian Sheng, Jian Li, Changsheng Ma, Jianzeng Dong, Shaochun Li

    Abstract: Cluster analysis aims at separating patients into phenotypically heterogeneous groups and defining therapeutically homogeneous patient subclasses. It is an important approach in data-driven disease classification and subtyping. Acute coronary syndrome (ACS) is a syndrome due to a sudden decrease of coronary artery blood flow, where disease classification would help to inform therapeutic strategies an…

    Submitted 27 March, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

  50. arXiv:1811.01307  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Towards Unsupervised Speech-to-Text Translation

    Authors: Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

    Abstract: We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language. As opposed to traditional cascaded systems and end-to-end architectures, our system does not require any labeled data (i.e., transcribed source audio or parallel source and t…

    Submitted 3 November, 2018; originally announced November 2018.