
Showing 1–50 of 65 results for author: Qiao, S

Searching in archive cs.
  1. arXiv:2406.13920  [pdf, other]

    cs.LG cs.SI

    Explainable AI Security: Exploring Robustness of Graph Neural Networks to Adversarial Attacks

    Authors: Tao Wu, Canyixing Cui, Xingping Xian, Shaojie Qiao, Chao Wang, Lin Yuan, Shui Yu

    Abstract: Graph neural networks (GNNs) have achieved tremendous success, but recent studies have shown that GNNs are vulnerable to adversarial attacks, which significantly hinders their use in safety-critical scenarios. Therefore, the design of robust GNNs has attracted increasing attention. However, existing research has mainly been conducted via experimental trial and error, and thus far, there remains a…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.13499  [pdf, other]

    cs.SI cs.LG

    GraphMU: Repairing Robustness of Graph Neural Networks via Machine Unlearning

    Authors: Tao Wu, Xinwen Cao, Chao Wang, Shaojie Qiao, Xingping Xian, Lin Yuan, Canyixing Cui, Yanbing Liu

    Abstract: Graph Neural Networks (GNNs) have demonstrated significant application potential in various fields. However, GNNs are still vulnerable to adversarial attacks. Numerous adversarial defense methods on GNNs are proposed to address the problem of adversarial attacks. However, these methods can only serve as a defense before poisoning, but cannot repair poisoned GNN. Therefore, there is an urgent need…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2405.14205  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MA

    Agent Planning with World Knowledge Model

    Authors: Shuofei Qiao, Runnan Fang, Ningyu Zhang, Yuqi Zhu, Xiang Chen, Shumin Deng, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

    Abstract: Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results. Despite their achievements, however, they still struggle with brainless trial-and-error in global planning and generating hallucinatory actions in local planning due to their poor understanding of the ''real'' physical world. Imitating humans' m…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Work in progress

  4. arXiv:2405.03162  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Advancing Multimodal Medical Capabilities of Gemini

    Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

    Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2405.01488  [pdf, other]

    cs.LG stat.ML

    Digital Twin Generators for Disease Modeling

    Authors: Nameyeh Alam, Jake Basilico, Daniele Bertolini, Satish Casie Chetty, Heather D'Angelo, Ryan Douglas, Charles K. Fisher, Franklin Fuller, Melissa Gomes, Rishabh Gupta, Alex Lang, Anton Loukianov, Rachel Mak-McCully, Cary Murray, Hanalei Pham, Susanna Qiao, Elena Ryapolova-Webb, Aaron Smith, Dimitri Theoharatos, Anil Tolwani, Eric W. Tramel, Anna Vidovszky, Judy Viduya, Jonathan R. Walsh

    Abstract: A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine…

    Submitted 2 May, 2024; originally announced May 2024.

  6. arXiv:2403.19651  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.MM

    MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

    Authors: Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

    Abstract: Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures. Recent works leverage text instructions to allow users to more freely express their search intents. However, they primarily focus on image pairs that are visually similar and/or can be characterized by a sm…

    Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: ICML 2024 (Oral); Project Website: https://open-vision-language.github.io/MagicLens/

  7. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  8. arXiv:2403.03101  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG cs.MA

    KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents

    Authors: Yuqi Zhu, Shuofei Qiao, Yixin Ou, Shumin Deng, Ningyu Zhang, Shiwei Lyu, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

    Abstract: Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories durin…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Work in progress. Project page: https://zjunlp.github.io/project/KnowAgent/ Code: https://github.com/zjunlp/KnowAgent

  9. arXiv:2402.13840  [pdf, other]

    cs.IR cs.AI

    LLM4SBR: A Lightweight and Effective Framework for Integrating Large Language Models in Session-based Recommendation

    Authors: Shutong Qiao, Chen Gao, Junhao Wen, Wei Zhou, Qun Luo, Peixuan Chen, Yong Li

    Abstract: Traditional session-based recommendation (SBR) utilizes session behavior sequences from anonymous users for recommendation. Although this strategy is highly efficient, it sacrifices the inherent semantic information of the items, making it difficult for the model to understand the true intent of the session and resulting in a lack of interpretability in the recommended results. Recently, large lan…

    Submitted 21 February, 2024; originally announced February 2024.

  10. arXiv:2402.03049  [pdf, other]

    cs.CL cs.AI cs.HC cs.IR cs.LG

    EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models

    Authors: Yixin Ou, Ningyu Zhang, Honghao Gui, Ziwen Xu, Shuofei Qiao, Yida Xue, Runnan Fang, Kangwei Liu, Lei Li, Zhen Bi, Guozhou Zheng, Huajun Chen

    Abstract: In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist am…

    Submitted 23 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ACL 2024 System Demonstrations; Project website: https://zjunlp.github.io/project/EasyInstruct Code: https://github.com/zjunlp/EasyInstruct Video: https://youtu.be/rfQOWYfziFo Demo: https://huggingface.co/spaces/zjunlp/EasyInstruct

  11. arXiv:2401.05268  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG cs.MA

    AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning

    Authors: Shuofei Qiao, Ningyu Zhang, Runnan Fang, Yujie Luo, Wangchunshu Zhou, Yuchen Eleanor Jiang, Chengfei Lv, Huajun Chen

    Abstract: Language agents have achieved considerable performance on various complex question-answering tasks by planning with external tools. Despite the incessant exploration in this field, existing language agent systems still struggle with costly, non-reproducible data reliance and face the challenge of compelling a single model for multiple functions. To this end, we introduce AutoAct, an automatic agen…

    Submitted 26 May, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: ACL 2024

  12. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  13. arXiv:2312.02725  [pdf, other]

    cs.CV

    R3D-SWIN: Use Shifted Window Attention for Single-View 3D Reconstruction

    Authors: Chenhuan Li, Meihua Xiao, Zehuan Li, Fangping Chen, Shanshan Qiao, Dingli Wang, Mengxi Gao, Siyi Zhang

    Abstract: Recently, vision transformers have performed well in various computer vision tasks, including voxel 3D reconstruction. However, the windows of the vision transformer are not multi-scale, and there is no connection between the windows, which limits the accuracy of voxel 3D reconstruction. Therefore, we propose a voxel 3D reconstruction network based on shifted window attention. To the best of our k…

    Submitted 6 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Under consideration at Pattern Recognition Letters

  14. arXiv:2311.17072  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

    Authors: Chenglin Yang, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu

    Abstract: Generative training has been demonstrated to be powerful for building visual-language models. However, on zero-shot discriminative benchmarks, there is still a performance gap between models trained with generative and discriminative objectives. In this paper, we aim to narrow this gap by improving the efficacy of generative training on classification tasks, without any finetuning processes or add…

    Submitted 27 November, 2023; originally announced November 2023.

  15. arXiv:2311.05770  [pdf, other]

    cs.CV

    PolyMaX: General Dense Prediction with Mask Transformer

    Authors: Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen

    Abstract: Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or regression (continuous outputs). This per-pixel prediction paradigm has remained popular due to the prevalence of fully convolutional networks. However, on the recent frontier of segmentation task, the community has been…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  16. arXiv:2311.00618  [pdf, other]

    cs.CV

    De-Diffusion Makes Text a Strong Cross-Modal Interface

    Authors: Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu

    Abstract: We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings to connect image and language as the interface representation, our approach represents an image as text, from which we enjoy the interpretability and flexibility inherent to natural language. We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding. The encoder is traine…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Technical report. Project page: https://dediffusion.github.io

  17. arXiv:2309.16889  [pdf, other]

    cs.CV

    Superpixel Transformers for Efficient Semantic Segmentation

    Authors: Alex Zihao Zhu, Jieru Mei, Siyuan Qiao, Hang Yan, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar

    Abstract: Semantic segmentation, which aims to classify every pixel in an image, is a key task in machine perception, with many applications across robotics and autonomous driving. Due to the high dimensionality of this task, most existing approaches use local operations, such as convolutions, to generate per-pixel features. However, these methods are typically unable to effectively leverage global context…

    Submitted 2 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures, 4 tables. Presented at IROS 2023. Equal contribution by A. Zhu and J. Mei

  18. arXiv:2307.13716  [pdf, other]

    cs.LG cs.AI

    FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning

    Authors: Leiming Chen, Weishan Zhang, Cihao Dong, Sibo Qiao, Ziling Huang, Yuming Nie, Zhaoxiang Hou, Chee Wei Tan

    Abstract: Traditional federated learning uses the number of samples to calculate the weights of each client model and uses this fixed weight value to fusion the global model. However, in practical scenarios, each client's device and data heterogeneity leads to differences in the quality of each client's model. Thus the contribution to the global model is not wholly determined by the sample size. In addition…

    Submitted 19 March, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

  19. arXiv:2306.13557  [pdf]

    cs.AR cs.CV cs.LG eess.IV

    FPGA Implementation of Convolutional Neural Network for Real-Time Handwriting Recognition

    Authors: Shichen Qiao, Haining Qiu, Lingkai Zhao, Qikun Liu, Eric J. Hoffman

    Abstract: Machine Learning (ML) has recently been a skyrocketing field in Computer Science. As computer hardware engineers, we are enthusiastic about hardware implementations of popular software ML architectures to optimize their performance, reliability, and resource usage. In this project, we designed a highly-configurable, real-time device for recognizing handwritten letters and digits using an Altera DE…

    Submitted 25 June, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: 27 pages, 13 figures

  20. arXiv:2305.13168  [pdf, other]

    cs.CL cs.AI cs.DB cs.IR cs.LG

    LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

    Authors: Yuqi Zhu, Xiaohan Wang, Jing Chen, Shuofei Qiao, Yixin Ou, Yunzhi Yao, Shumin Deng, Huajun Chen, Ningyu Zhang

    Abstract: This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We engage in experiments across eight diverse datasets, focusing on four representative tasks encompassing entity and relation extraction, event extraction, link prediction, and question-answering, thereby thoroughly exploring LLMs' performa…

    Submitted 22 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Work in progress

  21. arXiv:2305.13068  [pdf, other]

    cs.CL cs.AI cs.HC cs.IR cs.LG

    Making Language Models Better Tool Learners with Execution Feedback

    Authors: Shuofei Qiao, Honghao Gui, Chengfei Lv, Qianghuai Jia, Huajun Chen, Ningyu Zhang

    Abstract: Tools serve as pivotal interfaces that enable humans to understand and reshape the environment. With the advent of foundation models, AI systems can utilize tools to expand their capabilities and interact with the real world. Existing tool learning methodologies, encompassing supervised fine-tuning and prompt engineering approaches, often induce large language models to utilize tools indiscriminat…

    Submitted 14 March, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: NAACL 2024

  22. arXiv:2305.11527  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    InstructIE: A Bilingual Instruction-based Information Extraction Dataset

    Authors: Honghao Gui, Shuofei Qiao, Jintian Zhang, Hongbin Ye, Mengshu Sun, Lei Liang, Jeff Z. Pan, Huajun Chen, Ningyu Zhang

    Abstract: Large language models can perform well on general natural language tasks, but their effectiveness is still not optimal for information extraction. Recent works indicate that the main reason lies in the lack of extensive data on information extraction instructions. Note that the existing datasets on information extraction instructions not only have limited coverage but also involve high constructio…

    Submitted 18 April, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Work in progress; project homepage: https://www.zjukg.org/project/InstructIE/ dataset: https://huggingface.co/datasets/zjunlp/InstructIE

  23. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  24. arXiv:2304.11136  [pdf]

    cs.AR cs.PF

    Integrating Per-Stream Stat Tracking into Accel-Sim

    Authors: Shichen Qiao, Xin Su, Matthew D. Sinclair

    Abstract: Accel-Sim is a widely used computer architecture simulator that models the behavior of modern NVIDIA GPUs in great detail. However, although Accel-Sim and the underlying GPGPU-Sim model many of the features of real GPUs, thus far it has not been able to track statistics separately per stream. Instead, Accel-Sim combines statistics (e.g., cycles and cache hits/misses) across all simultaneously runn…

    Submitted 4 September, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 13 pages

  25. arXiv:2301.10410  [pdf, other]

    cs.CL cs.AI cs.DB cs.IR cs.LG

    One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

    Authors: Xiang Chen, Lei Li, Shuofei Qiao, Ningyu Zhang, Chuanqi Tan, Yong Jiang, Fei Huang, Huajun Chen

    Abstract: Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up…

    Submitted 18 September, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: IJCAI 2023

  26. arXiv:2301.06679  [pdf, other]

    cs.CV

    Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff

    Authors: Jia Li, Shengye Qiao, Zhirui Zhao, Chenxi Xie, Xiaowu Chen, Changqun Xia

    Abstract: Existing salient object detection methods often adopt deeper and wider networks for better performance, resulting in heavy computational burden and slow inference speed. This inspires us to rethink saliency detection to achieve a favorable balance between efficiency and accuracy. To this end, we design a lightweight framework while maintaining satisfying competitive accuracy. Specifically, we prop…

    Submitted 16 January, 2023; originally announced January 2023.

  27. arXiv:2301.00169  [pdf, other]

    cs.SI cs.AI

    Generative Graph Neural Networks for Link Prediction

    Authors: Xingping Xian, Tao Wu, Xiaoke Ma, Shaojie Qiao, Yabin Shao, Chao Wang, Lin Yuan, Yu Wu

    Abstract: Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computin…

    Submitted 31 December, 2022; originally announced January 2023.

    Comments: 13 pages

    ACM Class: I.2.4; I.2.8; J.2

  28. arXiv:2212.09597  [pdf, other]

    cs.CL cs.AI cs.CV cs.IR cs.LG

    Reasoning with Language Model Prompting: A Survey

    Authors: Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen

    Abstract: Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We…

    Submitted 18 September, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: ACL 2023, 24 pages, add references of theoretical analysis

  29. arXiv:2211.07504  [pdf, other]

    cs.CL cs.AI cs.CV cs.IR cs.LG

    On Analyzing the Role of Image for Visual-enhanced Relation Extraction

    Authors: Lei Li, Xiang Chen, Shuofei Qiao, Feiyu Xiong, Huajun Chen, Ningyu Zhang

    Abstract: Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we take an in-depth empirical analysis that indicates the inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual info…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted by AAAI 2023 (Student Abstract)

  30. Deploying a Steered Query Optimizer in Production at Microsoft

    Authors: Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal

    Abstract: Modern analytical workloads are highly heterogeneous and massively complex, making generic query optimizers untenable for many customers and scenarios. As a result, it is important to specialize these optimizers to instances of the workloads. In this paper, we continue a recent line of work in steering a query optimizer towards better plans for a given workload, and make major strides in pushing p…

    Submitted 24 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the 2022 International Conference on Management of Data, 2022 Jun 10, pp. 2299-2311

  31. arXiv:2210.05084  [pdf, other]

    cs.IT

    Covert Communication Gains from Adversary's Uncertainty of Phase Angles

    Authors: Sen Qiao, Daming Cao, Qiaosheng Zhang, Yinfei Xu, Guangjie Liu

    Abstract: This work investigates the phase gain of intelligent reflecting surface (IRS) covert communication over complex-valued additive white Gaussian noise (AWGN) channels. The transmitter Alice intends to transmit covert messages to the legitimate receiver Bob via reflecting the broadcast signals from a radio frequency (RF) source, while rendering the adversary Willie's detector arbitrarily close to ine…

    Submitted 6 May, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

  32. arXiv:2210.01820  [pdf, other]

    cs.CV

    MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models

    Authors: Chenglin Yang, Siyuan Qiao, Qihang Yu, Xiaoding Yuan, Yukun Zhu, Alan Yuille, Hartwig Adam, Liang-Chieh Chen

    Abstract: This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention. Unlike the current works that stack separate mobile convolution and transformer blocks, we effectively merge them into a MOAT block. Starting with a standard Transformer block, we replace its multi-layer perceptron with a mobile convolution block, and furthe…

    Submitted 30 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: ICLR 2023. arXiv v2: add ImageNet-1K-V2, tiny-MOAT on COCO detection and ADE20K segmentation

  33. arXiv:2207.04044  [pdf, other]

    cs.CV

    kMaX-DeepLab: k-means Mask Transformer

    Authors: Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

    Abstract: The rise of transformers in vision tasks not only advances network backbone designs, but also starts a brand-new page to achieve end-to-end image recognition (e.g., object detection and panoptic segmentation). Originated from Natural Language Processing (NLP), transformer architectures, consisting of self-attention and cross-attention, effectively learn long-range interactions between elements in…

    Submitted 10 July, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. arXiv v2: add results on ADE20K. arXiv v3: fix appendix. v4: fix typo. v5: add PyTorch re-implementation. Codes and models are available at TensorFlow: https://github.com/google-research/deeplab2 PyTorch: https://github.com/bytedance/kmax-deeplab

  34. arXiv:2206.08948  [pdf, other]

    cs.CV

    CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

    Authors: Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

    Abstract: We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. It rethinks the existing transformer architectures used in segmentation and detection; CMT-DeepLab considers the object queries as cluster centers, which fill the role of grouping the pixels when applied to segmentation. The clustering is computed with an altern…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: CVPR 2022 Oral

  35. arXiv:2206.07704  [pdf, other]

    cs.CV

    Waymo Open Dataset: Panoramic Video Panoptic Segmentation

    Authors: Jieru Mei, Alex Zihao Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar, Dragomir Anguelov

    Abstract: Panoptic image segmentation is the computer vision task of finding groups of pixels in an image and assigning semantic classes and object instance identifiers to them. Research in image segmentation has become increasingly popular due to its critical applications in robotics and autonomous driving. The research community thereby relies on publicly available benchmark dataset to advance the state-o…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: Our dataset can be found at https://waymo.com/open

  36. arXiv:2205.15361  [pdf, other]

    cs.CV

    TubeFormer-DeepLab: Video Mask Transformer

    Authors: Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen

    Abstract: We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner. Different video segmentation tasks (e.g., video semantic/instance/panoptic segmentation) are usually considered as distinct problems. State-of-the-art models adopted in the separate communities have diverged, and radically different approaches dominate in each task. By contrast, w…

    Submitted 5 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: CVPR 2022; arXiv v2: add results on VIPSeg val/test sets and VSPW new test set

  37. arXiv:2201.03335  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

    Authors: Ningyu Zhang, Xin Xu, Liankuan Tao, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Shumin Deng, Peng Wang, Wen Zhang, Zhenru Zhang, Chuanqi Tan, Qiang Chen, Feiyu Xiong, Fei Huang, Guozhou Zheng, Huajun Chen

    Abstract: We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to cus…

    Submitted 18 September, 2023; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: Accepted by EMNLP 2022 System Demonstrations and the project website is http://deepke.zjukg.cn/

  38. arXiv:2107.08594  [pdf, other]

    cs.DB cs.LG

    Optimal Resource Allocation for Serverless Queries

    Authors: Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundararajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen

    Abstract: Optimizing resource allocation for analytical workloads is vital for reducing costs of cloud-data services. At the same time, it is incredibly hard for users to allocate resources per query in serverless processing systems, and they frequently misallocate by orders of magnitude. Unfortunately, prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource al…

    Submitted 18 July, 2021; originally announced July 2021.

  39. arXiv:2107.05637

    cs.CV

    Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms

    Authors: Chenglin Yang, Siyuan Qiao, Adam Kortylewski, Alan Yuille

    Abstract: Self-Attention has become prevalent in computer vision models. Inspired by fully connected Conditional Random Fields (CRFs), we decompose self-attention into local and context terms. They correspond to the unary and binary terms in CRF and are implemented by attention mechanisms with projection matrices. We observe that the unary terms only make small contributions to the outputs, and meanwhile st…

    Submitted 28 November, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

  40. arXiv:2106.09748

    cs.CV

    DeepLab2: A TensorFlow Library for Deep Labeling

    Authors: Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

    Abstract: DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision. DeepLab2 includes all our recently developed DeepLab model variants with pretrained checkpoints as well as model training and evaluation code, allowing the community to reproduce and further improve upon the sta…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 4-page technical report. The first three authors contributed equally to this work

  41. arXiv:2103.15073

    cs.LG cs.AI eess.SY

    IUP: An Intelligent Utility Prediction Scheme for Solid-State Fermentation in 5G IoT

    Authors: Min Wang, Shanchen Pang, Tong Ding, Sibo Qiao, Xue Zhai, Shuo Wang, Neal N. Xiong, Zhengwen Huang

    Abstract: At present, Solid-State Fermentation (SSF) is mainly controlled by artificial experience, and the product quality and yield are not stable. Accurately predicting the quality and yield of SSF is of great significance for improving human food security and supply. In this paper, we propose an Intelligent Utility Prediction (IUP) scheme for SSF in 5G Industrial Internet of Things (IoT), including para…

    Submitted 28 March, 2021; originally announced March 2021.

  42. arXiv:2012.05258

    cs.CV

    ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

    Authors: Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

    Abstract: In this paper, we present ViP-DeepLab, a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring the point clouds from perspective image sequences while providing each point with instance-level semantic interpretations. Solving this problem requires the vision models to predict the spatial location, semantic class, and…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: Video: https://youtu.be/XR4HFiwwao0 GitHub: https://github.com/joe-siyuan-qiao/ViP-DeepLab

  43. arXiv:2011.14150

    cs.CV

    Batch Normalization with Enhanced Linear Transformation

    Authors: Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

    Abstract: Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module was designed for improving BN's flexibility of fitting complex data distributions. In this paper, we demonstrate properly enhancing this linear transformation module can effectively improve the ability of BN. Specifically, rather than using a single neuron, we propose to additionally con…

    Submitted 28 November, 2020; originally announced November 2020.

    Comments: 12 pages. The code is available at https://github.com/yuhuixu1993/BNET

  44. arXiv:2011.13096

    cs.CV eess.IV

    Automatic Detection of Cardiac Chambers Using an Attention-based YOLOv4 Framework from Four-chamber View of Fetal Echocardiography

    Authors: Sibo Qiao, Shanchen Pang, Gang Luo, Silin Pan, Xun Wang, Min Wang, Xue Zhai, Taotao Chen

    Abstract: Echocardiography is a powerful prenatal examination tool for early diagnosis of fetal congenital heart diseases (CHDs). The four-chamber (FC) view is a crucial and easily accessible ultrasound (US) image among echocardiography images. Automatic analysis of FC views contributes significantly to the early diagnosis of CHDs. The first step to automatically analyze fetal FC views is locating the fetal…

    Submitted 13 December, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

  45. arXiv:2011.11675

    cs.CV

    Scaling Wide Residual Networks for Panoptic Segmentation

    Authors: Liang-Chieh Chen, Huiyu Wang, Siyuan Qiao

    Abstract: The Wide Residual Networks (Wide-ResNets), a shallow but wide model variant of the Residual Networks (ResNets) by stacking a small number of residual blocks with large channel sizes, have demonstrated outstanding performance on multiple dense prediction tasks. However, since proposed, the Wide-ResNet architecture has barely evolved over the years. In this work, we revisit its architecture design f…

    Submitted 7 February, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: Update experimental results

  46. arXiv:2006.02334

    cs.CV

    DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution

    Authors: Siyuan Qiao, Liang-Chieh Chen, Alan Yuille

    Abstract: Many modern object detectors demonstrate outstanding performances by using the mechanism of looking and thinking twice. In this paper, we explore this mechanism in the backbone design for object detection. At the macro level, we propose Recursive Feature Pyramid, which incorporates extra feedback connections from Feature Pyramid Networks into the bottom-up backbone layers. At the micro level, we p…

    Submitted 30 November, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

  47. arXiv:2002.12393

    cs.DB

    Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

    Authors: Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le

    Abstract: Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost model, therefore, is akin to better resource efficiency and lower operational costs. Unfortunately, the production workloads at Microsoft show that costs are very co…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: To appear at SIGMOD 2020

  48. arXiv:1911.11263

    cs.CV

    Deeply Shape-guided Cascade for Instance Segmentation

    Authors: Hao Ding, Siyuan Qiao, Alan Yuille, Wei Shen

    Abstract: The key to a successful cascade architecture for precise instance segmentation is to fully leverage the relationship between bounding box detection and mask segmentation across multiple stages. Although modern instance segmentation cascades achieve leading performance, they mainly make use of a unidirectional relationship, i.e., mask segmentation can benefit from iteratively refined bounding box d…

    Submitted 27 March, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: Accepted to CVPR 2021

  49. arXiv:1911.09738

    cs.CV

    Rethinking Normalization and Elimination Singularity in Neural Networks

    Authors: Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille

    Abstract: In this paper, we study normalization methods for neural networks from the perspective of elimination singularity. Elimination singularities correspond to the points on the training trajectory where neurons become consistently deactivated. They cause degenerate manifolds in the loss landscape which will slow down training and harm model performances. We show that channel-based normalizations (e.g.…

    Submitted 21 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1903.10520

  50. Deep Heterogeneous Hashing for Face Video Retrieval

    Authors: Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen

    Abstract: Retrieving videos of a particular person with face image as a query via hashing technique has many important applications. While face images are typically represented as vectors in Euclidean space, characterizing face videos with some robust set modeling techniques (e.g. covariance matrices as exploited in this study, which reside on Riemannian manifold), has recently shown appealing advantages. T…

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: 14 pages, 17 figures, 4 tables, accepted by IEEE Transactions on Image Processing (TIP) 2019

    Journal ref: IEEE Transactions on Image Processing 2019