Showing 1–50 of 72 results for author: Ji, L

Searching in archive cs.
  1. arXiv:2406.06949  [pdf, other]

    cs.CV cs.AI

    Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

    Authors: Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye

    Abstract: Moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Currently-existing methods primarily focus on extracting target features only from the spatial-temporal domain. For further enhancing feature representation, more information domains such as frequency are believed to be potentially valuable. To extend target feature…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE TGRS and is under review

  2. arXiv:2405.15343  [pdf, other]

    cs.CV

    Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features

    Authors: Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu

    Abstract: The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those involving Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2405.07652  [pdf, other]

    cs.HC cs.AI

    G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

    Authors: Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, Yuhan Wang, Lei Ji, Xuhai Xu, Chun Yu

    Abstract: Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze -- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables -- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual fie…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures

  4. arXiv:2405.04405  [pdf, other]

    cs.LG

    Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation

    Authors: Pei Liu, Luping Ji

    Abstract: Uncertainty estimation (UE), as an effective means of quantifying predictive uncertainty, is crucial for safe and reliable decision-making, especially in high-risk scenarios. Existing UE schemes usually assume that there are completely-labeled samples to support fully-supervised learning. In practice, however, many UE tasks often have no sufficiently-labeled data to use, such as the Multiple Insta…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  5. arXiv:2401.11430  [pdf, other]

    cs.CV

    Exploring Diffusion Time-steps for Unsupervised Representation Learning

    Authors: Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang

    Abstract: Representation learning is all about discovering the hidden modular attributes that generate the data faithfully. We explore the potential of Denoising Diffusion Probabilistic Model (DM) in unsupervised learning of the modular attributes. We build a theoretical framework that connects the diffusion time-steps and the hidden attributes, which serves as an effective inductive bias for unsupervised l…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  6. arXiv:2401.09454  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Voila-A: Aligning Vision-Language Models with User's Gaze Attention

    Authors: Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

    Abstract: In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper…

    Submitted 22 December, 2023; originally announced January 2024.

  7. arXiv:2312.17072  [pdf, other]

    cs.IR cs.LG

    An Adaptive Framework of Geographical Group-Specific Network on O2O Recommendation

    Authors: Luo Ji, Jiayu Mao, Hailong Shi, Qian Li, Yunfei Chu, Hongxia Yang

    Abstract: Online to offline recommendation strongly correlates with the user and service's spatiotemporal information, therefore calling for a higher degree of model personalization. The traditional methodology is based on a uniform model structure trained by collected centralized data, which is unlikely to capture all user patterns over different geographical areas or time periods. To tackle this challenge…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 7 pages, 4 figures, Accepted by ECIR 2024

  8. arXiv:2312.13108  [pdf, other]

    cs.CV

    ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

    Authors: Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Model (LLM) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper…

    Submitted 1 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Project Page: https://showlab.github.io/assistgui/

  9. arXiv:2310.18652  [pdf, other]

    cs.CL cs.AI cs.CV

    EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

    Authors: Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi

    Abstract: Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop o…

    Submitted 25 December, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023 Datasets and Benchmarks Track (10 pages for main text, 4 pages for references, 39 pages for supplementary materials)

  10. arXiv:2310.11285  [pdf, ps, other]

    cs.DM

    Construction of optimal optimum distance flag codes by MRD codes

    Authors: Shuangqing Liu, Shuhui Yu, Lijun Ji

    Abstract: Optimum distance flag codes (ODFCs), as special flag codes, have received a lot of attention due to its application in random network coding. In 2021, Alonso-González et al. constructed optimal $(n,\mathcal{A})$-ODFC for $\mathcal {A}\subseteq \{1,2,\ldots,k,n-k,\ldots,n-1\}$ with $k\in \mathcal A$ and $k|n$. In this paper, we introduce a new construction of $(n,\mathcal A)_q$-ODFCs by maximum ran…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 14 pages

    MSC Class: 94B99

  11. arXiv:2309.16609  [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  12. arXiv:2309.07141  [pdf]

    eess.SP cs.AI cs.LG

    Design of Recognition and Evaluation System for Table Tennis Players' Motor Skills Based on Artificial Intelligence

    Authors: Zhuo-yong Shi, Ye-tao Jia, Ke-xin Zhang, Ding-han Wang, Long-meng Ji, Yong Wu

    Abstract: With the rapid development of electronic science and technology, the research on wearable devices is constantly updated, but for now, it is not comprehensive for wearable devices to recognize and analyze the movement of specific sports. Based on this, this paper improves wearable devices of table tennis sport, and realizes the pattern recognition and evaluation of table tennis players' motor skill…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 34 pages, 16 figures

    MSC Class: 93-01 ACM Class: G.1; H.4

  13. arXiv:2308.15016  [pdf, other]

    cs.CV

    C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

    Authors: Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin

    Abstract: Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Additionally, these methods lack effective control over speaker identity and temporal editing of the generated gestures. Focusing on capturing temporal…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 12 pages, 6 figures, 7 tables

  14. arXiv:2307.07893  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Anomaly Detection in Automated Fibre Placement: Learning with Data Limitations

    Authors: Assef Ghamisi, Todd Charter, Li Ji, Maxime Rivard, Gil Lund, Homayoun Najjaran

    Abstract: Conventional defect detection systems in Automated Fibre Placement (AFP) typically rely on end-to-end supervised learning, necessitating a substantial number of labelled defective samples for effective training. However, the scarcity of such labelled data poses a challenge. To overcome this limitation, we present a comprehensive framework for defect detection and localization in Automated Fibre Pl…

    Submitted 14 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Journal ref: Frontiers in Manufacturing Technology, 2024, 4, 1277152

  15. arXiv:2307.07409  [pdf, other]

    cs.CL cs.AI eess.IV

    KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

    Authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang

    Abstract: In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Published at BioNLP workshop @ ACL 2023

  16. Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification

    Authors: Pei Liu, Luping Ji, Xinyu Zhang, Feng Ye

    Abstract: Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks often face two unavoidable problems in training: i) insufficient WSI data and ii) the sample memorization inclination inherent in neural networks. These problems may hinder MIL m…

    Submitted 2 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: 12 pages, 6 figures, 10 tables

  17. arXiv:2306.15255  [pdf, other]

    cs.CV cs.CL

    GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

    Authors: Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou

    Abstract: In this report, we present our champion solution for Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2023. Essentially, to accurately ground in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 4 tables, the champion solution for Ego4D Natural Language Queries Challenge in CVPR 2023

  18. arXiv:2306.08640  [pdf, other]

    cs.CV

    AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn

    Authors: Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou

    Abstract: Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of LLMs for planning and invoking models or APIs to address more general multi-modal user queries. Despite this progress, complex visual-based tasks still remain challenging due to the diverse nature of visual tasks. This diversity is reflected…

    Submitted 28 June, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Project page: https://showlab.github.io/assistgpt/

  19. arXiv:2305.15627  [pdf, ps, other]

    cs.DM math.CO

    New constructions of cyclic subspace codes

    Authors: Shuhui Yu, Lijun Ji

    Abstract: A subspace of a finite field is called a Sidon space if the product of any two of its nonzero elements is unique up to a scalar multiplier from the base field. Sidon spaces, introduced by Roth et al. (IEEE Trans Inf Theory 64(6): 4412-4422, 2018), have a close connection with optimal full-length orbit codes. In this paper, we present two constructions of Sidon spaces. The union of Sidon spaces fro…

    Submitted 24 May, 2023; originally announced May 2023.

  20. ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification

    Authors: Rui Yang, Pei Liu, Luping Ji

    Abstract: Due to the limitations of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) appears as a vibrant prospect in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, using a bag prototype to g…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 12 pages, 5 figures, and 3 tables

    Journal ref: Computer Methods and Programs in Biomedicine, 108161 (2024)

  21. arXiv:2303.16434  [pdf, other]

    cs.AI cs.CL

    TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

    Authors: Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

    Abstract: Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still…

    Submitted 28 March, 2023; originally announced March 2023.

  22. arXiv:2303.01923  [pdf, other]

    stat.ML cs.LG q-fin.ST stat.AP

    Bayesian CART models for insurance claims frequency

    Authors: Yaojun Zhang, Lanpeng Ji, Georgios Aivaliotis, Charles Taylor

    Abstract: Accuracy and interpretability of a (non-life) insurance pricing model are essential qualities to ensure fair and transparent premiums for policy-holders, that reflect their risk. In recent years, the classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature, since they offer good prediction performance and are relatively easily interpretable…

    Submitted 1 December, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: 46 pages

    MSC Class: 62P05

  23. arXiv:2302.04438  [pdf, other]

    stat.ML cs.LG

    An information-theoretic learning model based on importance sampling

    Authors: Jiangshe Zhang, Lizhen Ji, Fei Gao, Mengyao Li

    Abstract: A crucial assumption underlying the most current theory of machine learning is that the training distribution is identical to the test distribution. However, this assumption may not hold in some real-world applications. In this paper, we develop a learning model based on principles of information theory by minimizing the worst-case loss at prescribed levels of uncertainty. We reformulate the empir…

    Submitted 22 February, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 7 pages, 4 figures

  24. arXiv:2302.04421  [pdf, other]

    stat.ML cs.LG

    Information Theoretical Importance Sampling Clustering

    Authors: Jiangshe Zhang, Lizhen Ji, Meng Wang

    Abstract: A current assumption of most clustering methods is that the training data and future data are taken from the same distribution. However, this assumption may not hold in most real-world scenarios. In this paper, we propose an information theoretical importance sampling based approach for clustering problems (ITISC) which minimizes the worst case of expected distortions under the constraint of distr…

    Submitted 30 May, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 15 pages, 9 figures

  25. arXiv:2212.09522  [pdf, other]

    cs.CV

    MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

    Authors: Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou

    Abstract: To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must. Existing multi-modal VQA models achieve promising performance on images or short video clips, especially with the recent success of large-scale multi-modal pre-training. However, when extending these methods to long-fo…

    Submitted 19 December, 2022; originally announced December 2022.

  26. arXiv:2212.07047  [pdf, other]

    cs.CV

    Shared Coupling-bridge for Weakly Supervised Local Feature Learning

    Authors: Jiayuan Sun, Jiewen Zhu, Luping Ji

    Abstract: Sparse local feature extraction is usually believed to be of important significance in typical vision tasks such as simultaneous localization and mapping, image matching and 3D reconstruction. At present, it still has some deficiencies needing further improvement, mainly including the discrimination power of extracted local descriptors, the localization accuracy of detected keypoints, and the effi…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 15 pages

  27. AdvMIL: Adversarial Multiple Instance Learning for the Survival Analysis on Whole-Slide Images

    Authors: Pei Liu, Luping Ji, Feng Ye, Bo Fu

    Abstract: The survival analysis on histological whole-slide images (WSIs) is one of the most important means to estimate patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients only with a c…

    Submitted 5 April, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 15 pages, 10 figures, 8 tables

    Journal ref: Medical Image Analysis, 103020 (2023)

  28. arXiv:2211.08776  [pdf, other]

    cs.CV cs.IR

    An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022

    Authors: Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

    Abstract: This technical report describes the CONE approach for Ego4D Natural Language Queries (NLQ) Challenge in ECCV 2022. We leverage our model CONE, an efficient window-centric COarse-to-fiNE alignment framework. Specifically, CONE dynamically slices the long video into candidate windows via a sliding window approach. Centering at windows, CONE (1) learns the inter-window (coarse-grained) semantic varia…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Technical report for ECCV 2022 Ego4D workshop, 4 pages, 2 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2209.10918

  29. arXiv:2210.07815  [pdf, other]

    cs.IR cs.LG

    Intra-session Context-aware Feed Recommendation in Live Systems

    Authors: Luo Ji, Gao Liu, Mingyang Yin, Hongxia Yang

    Abstract: Feed recommendation allows users to constantly browse items until feel uninterested and leave the session, which differs from traditional recommendation scenarios. Within a session, user's decision to continue browsing or not substantially affects occurrences of later clicks. However, such type of exposure bias is generally ignored or not explicitly modeled in most feed recommendation studies. In…

    Submitted 11 January, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, CIKM 2022 short paper

  30. arXiv:2210.04522  [pdf, other]

    cs.CV

    HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

    Authors: Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma

    Abstract: Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct appl…

    Submitted 27 January, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: AAAI 2024 main conference

  31. arXiv:2209.10918  [pdf, other]

    cs.CV cs.CL cs.IR

    CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

    Authors: Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

    Abstract: This paper tackles an emerging and challenging problem of long video temporal grounding (VTG) that localizes video moments related to a natural language (NL) query. Compared with short videos, long videos are also highly demanded but less explored, which brings new challenges in higher inference computation cost and weaker multi-modal alignment. To address these challenges, we propose CONE, an eff…

    Submitted 29 May, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: ACL 2023 Camera Ready. 14 pages, 7 figures, 4 tables

  32. arXiv:2206.05782  [pdf, other]

    eess.IV cs.CV cs.LG

    DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis

    Authors: Pei Liu, Bo Fu, Feng Ye, Rui Yang, Bin Xu, Luping Ji

    Abstract: The cancer prognosis on gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To further enhance WSI visual representations, existing methods have explored image pyramids, instead of single-resolution images, in WSIs. In spite of this, they still face two major problems: high computational cost and the unnoticed semantical gap in multi-resolution feature fusion. To tackle these p…

    Submitted 28 March, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: 12 pages, 6 figures, 7 tables

    Journal ref: Expert Systems with Applications, 120280 (2023)

  33. arXiv:2204.03828  [pdf, other]

    cs.IT cs.MM cs.PF

    From PHY to QoE: A Parameterized Framework Design

    Authors: Hao Wang, Lei Ji, Zhenxing Gao

    Abstract: The rapid development of 5G communication technology has given birth to various real-time broadband communication services, such as augmented reality (AR), virtual reality (VR) and cloud games. Compared with traditional services, consumers tend to focus more on their subjective experience when utilizing these services. In the meantime, the problem of power consumption is particularly prominent in…

    Submitted 7 April, 2022; originally announced April 2022.

  34. Deep Unified Representation for Heterogeneous Recommendation

    Authors: Chengqiang Lu, Mingyang Yin, Shuheng Shen, Luo Ji, Qi Liu, Hongxia Yang

    Abstract: Recommendation system has been a widely studied task both in academia and industry. Previous works mainly focus on homogeneous recommendation and little progress has been made for heterogeneous recommender systems. However, heterogeneous recommendations, e.g., recommending different types of items including products, videos, celebrity shopping notes, among many others, are dominant nowadays. State…

    Submitted 26 January, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: 12 pages, 4 figures, accepted by the ACM Web Conference 2022 (WWW '22)

  35. arXiv:2112.01368  [pdf, other]

    cs.CL cs.AI cs.LG

    ScaleVLAD: Improving Multimodal Sentiment Analysis via Multi-Scale Fusion of Locally Descriptors

    Authors: Huaishao Luo, Lei Ji, Yanyong Huang, Bin Wang, Shenggong Ji, Tianrui Li

    Abstract: Fusion technique is a key research topic in multimodal sentiment analysis. The recent attention-based fusion demonstrates advances over simple operation-based fusion. However, these fusion works adopt single-scale, i.e., token-level or utterance-level, unimodal representation. Such single-scale fusion is suboptimal because that different modality should be aligned with different granularities. Thi…

    Submitted 2 December, 2021; originally announced December 2021.

  36. arXiv:2111.12417  [pdf, other]

    cs.CV cs.AI

    NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

    Authors: Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan

    Abstract: This paper presents a unified multimodal pre-trained model called NÜWA that can generate new or manipulate existing visual data (i.e., images and videos) for various visual synthesis tasks. To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and i…

    Submitted 24 November, 2021; originally announced November 2021.

  37. arXiv:2111.06061  [pdf, other]

    cs.LG cs.AI

    Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI

    Authors: Jiangchao Yao, Shengyu Zhang, Yang Yao, Feng Wang, Jianxin Ma, Jianwei Zhang, Yunfei Chu, Luo Ji, Kunyang Jia, Tao Shen, Anpeng Wu, Fengda Zhang, Ziqi Tan, Kun Kuang, Chao Wu, Fei Wu, Jingren Zhou, Hongxia Yang

    Abstract: Influenced by the great success of deep learning via cloud computing and the rapid development of edge chips, research in artificial intelligence (AI) has shifted to both of the computing paradigms, i.e., cloud computing and edge computing. In recent years, we have witnessed significant progress in developing more advanced AI models on cloud servers that surpass traditional deep learning models ow…

    Submitted 23 May, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: 20 pages, Transactions on Knowledge and Data Engineering

  38. arXiv:2110.00335  [pdf, other]

    cs.CV

    Geometry Attention Transformer with Position-aware LSTMs for Image Captioning

    Authors: Chi Wang, Yulin Shen, Luping Ji

    Abstract: In recent years, transformer structures have been widely applied in image captioning with impressive performance. For good captioning results, the geometry and position relations of different visual objects are often thought of as crucial information. Aiming to further promote image captioning by transformers, this paper proposes an improved Geometry Attention Transformer (GAT) model. In order to…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: To be submitted

  39. arXiv:2108.09141  [pdf, other]

    cs.IR cs.AI cs.LG

    Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation

    Authors: Luo Ji, Qin Qi, Bingqing Han, Hongxia Yang

    Abstract: Recommender system plays a crucial role in modern E-commerce platform. Due to the lack of historical interactions between users and items, cold-start recommendation is a challenging problem. In order to alleviate the cold-start issue, most existing methods introduce content and contextual information as the auxiliary information. Nevertheless, these methods assume the recommended items behave stea…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: Accepted by CIKM 2021

  40. arXiv:2108.02365  [pdf]

    cs.CV cs.CL

    Hybrid Reasoning Network for Video-based Commonsense Captioning

    Authors: Weijiang Yu, Jian Liang, Lei Ji, Lu Li, Yuejian Fang, Nong Xiao, Nan Duan

    Abstract: The task of video-based commonsense captioning aims to generate event-wise captions and meanwhile provide multiple commonsense descriptions (e.g., attribute, effect and intention) about the underlying event in the video. Prior works explore the commonsense captions by using separate networks for different commonsense types, which is time-consuming and lacks mining the interaction of different comm…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 11 pages, 6 figures

    MSC Class: 68T07

  41. arXiv:2107.00818  [pdf, other]

    cs.CV

    1st Place Solutions for UG2+ Challenge 2021 -- (Semi-)supervised Face detection in the low light condition

    Authors: Pengcheng Wang, Lingqiao Ji, Zhilong Ji, Yuan Gao, Xiao Liu

    Abstract: In this technical report, we briefly introduce the solution of our team "TAL-ai" for (Semi-) supervised Face detection in the low light condition in UG2+ Challenge in CVPR 2021. By conducting several experiments with popular image enhancement methods and image transfer methods, we pulled the low light image and the normal image to a more closer domain. And it is observed that using these data to t…

    Submitted 2 July, 2021; originally announced July 2021.

  42. arXiv:2106.09889  [pdf, other]

    cs.CL cs.CV cs.MM

    GEM: A General Evaluation Benchmark for Multimodal Tasks

    Authors: Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti

    Abstract: In this paper, we present GEM as a General Evaluation benchmark for Multimodal tasks. Different from existing datasets such as GLUE, SuperGLUE, XGLUE and XTREME that mainly focus on natural language tasks, GEM is a large-scale vision-language benchmark, which consists of GEM-I for image-language tasks and GEM-V for video-language tasks. Comparing with existing multimodal datasets such as MSCOCO an…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted by Findings of ACL 2021

  43. arXiv:2104.14806  [pdf, other]

    cs.CV

    GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

    Authors: Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan

    Abstract: Generating videos from text is a challenging task due to its high computational requirements for training and infinite possible answers for evaluation. Existing works typically experiment on simple or small datasets, where the generalization ability is quite limited. In this work, we propose GODIVA, an open-domain text-to-video pretrained model that can generate videos from text in an auto-regress…

    Submitted 30 April, 2021; originally announced April 2021.

  44. arXiv:2104.08860  [pdf, other]

    cs.CV

    CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

    Authors: Huaishao Luo, Lei Ji, Ming Zhong, Yang Chen, Wen Lei, Nan Duan, Tianrui Li

    Abstract: Video-text retrieval plays an essential role in multi-modal research and has been widely used in many real-world web applications. The CLIP (Contrastive Language-Image Pre-training), an image-language pre-training model, has demonstrated the power of visual concepts learning from web collected image-text datasets. In this paper, we propose a CLIP4Clip model to transfer the knowledge of the CLIP mo…

    Submitted 8 May, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

  45. arXiv:2104.08601  [pdf, other]

    cs.CL

    Who Responded to Whom: The Joint Effects of Latent Topics and Discourse in Conversation Structure

    Authors: Lu Ji, Jing Li, Zhongyu Wei, Qi Zhang, Xuanjing Huang

    Abstract: Numerous online conversations are produced on a daily basis, resulting in a pressing need to conversation understanding. As a basis to structure a discussion, we identify the responding relations in the conversation discourse, which link response utterances to their initiations. To figure out who responded to whom, here we explore how the consistency of topic contents and dependency of discourse r…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 10 pages, 7 figures, 3 tables submitted for emnlp2021

  46. Towards Reducing Severe Defocus Spread Effects for Multi-Focus Image Fusion via an Optimization Based Strategy

    Authors: Shuang Xu, Lizhen Ji, Zhe Wang, Pengfei Li, Kai Sun, Chunxia Zhang, Jiangshe Zhang

    Abstract: Multi-focus image fusion (MFF) is a popular technique to generate an all-in-focus image, where all objects in the scene are sharp. However, existing methods pay little attention to defocus spread effects of the real-world multi-focus images. Consequently, most of the methods perform badly in the areas near focus map boundaries. According to the idea that each local region in the fused image should…

    Submitted 29 December, 2020; originally announced December 2020.

    Journal ref: IEEE Transactions on Computational Imaging, vol. 6, pp. 1561-1570, 2020

  47. arXiv:2009.10557  [pdf, other]

    cs.CL

    GRACE: Gradient Harmonized and Cascaded Labeling for Aspect-based Sentiment Analysis

    Authors: Huaishao Luo, Lei Ji, Tianrui Li, Nan Duan, Daxin Jiang

    Abstract: In this paper, we focus on the imbalance issue, which is rarely studied in aspect term extraction and aspect sentiment classification when regarding them as sequence labeling tasks. Besides, previous works usually ignore the interaction between aspect terms when labeling polarities. We propose a GRadient hArmonized and CascadEd labeling model (GRACE) to solve these problems. Specifically, a cascad…

    Submitted 24 September, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: to appear in Findings of EMNLP 2020

  48. arXiv:2009.07406  [pdf, other]

    cs.CL cs.AI

    Tag and Correct: Question aware Open Information Extraction with Two-stage Decoding

    Authors: Martin Kuo, Yaobo Liang, Lei Ji, Nan Duan, Linjun Shou, Ming Gong, Peng Chen

    Abstract: Question Aware Open Information Extraction (Question aware Open IE) takes question and passage as inputs, outputting an answer tuple which contains a subject, a predicate, and one or more arguments. Each field of answer is a natural language word sequence and is extracted from the passage. The semi-structured answer has two advantages which are more readable and falsifiable compared to span answer…

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: 11 pages, 1 figure, 4 tables

    MSC Class: 68T50; 68T01

  49. arXiv:2009.00931  [pdf, other]

    cs.GR

    A Study of Opacity Ranges for Transparent Overlays in 3D Landscapes

    Authors: Jan Hombeck, Li Ji, Kai Lawonn, Charles Perin

    Abstract: When visualizing data in a realistically rendered 3D virtual environment, it is often important to represent not only the 3D scene but also overlaid information about additional, abstract data. These overlays must be usefully visible, i.e. be readable enough to convey the information they represent, but remain unobtrusive to avoid cluttering the view. We take a step toward establishing guidelines…

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: IEEE VIS 2020

  50. arXiv:2005.00706  [pdf, other]

    cs.CL cs.CV

    A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos

    Authors: Frank F. Xu, Lei Ji, Botian Shi, Junyi Du, Graham Neubig, Yonatan Bisk, Nan Duan

    Abstract: Watching instructional videos are often used to learn about procedures. Video captioning is one way of automatically collecting such knowledge. However, it provides only an indirect, overall evaluation of multimodal models with no finer-grained quantitative measure of what they have learned. We propose instead, a benchmark of structured procedural knowledge extracted from cooking videos. This work…

    Submitted 9 October, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted by NLP Beyond Text - First International Workshop on Natural Language Processing Beyond Text @ EMNLP 2020