
Showing 1–21 of 21 results for author: Lyu, P

Searching in archive cs.
  1. arXiv:2406.13977  [pdf, other]

    eess.IV cs.CV

    Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning

    Authors: Tingyi Lin, Pengju Lyu, Jie Zhang, Yuqing Wang, Cheng Wang, Jianjun Zhu

    Abstract: Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. In contrast, contrast-enhanced CT (CECT) facilitates the observation of regions of interest (ROI). Leading generative models, especially the conditional diffusion model, demonstrate remarkable capabilities in medical image modality transformation. Typical conditional d…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2405.21013  [pdf, other]

    cs.CV

    StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

    Authors: Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Langu…

    Submitted 4 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  3. arXiv:2405.19765  [pdf, other]

    cs.CV cs.AI

    Towards Unified Multi-granularity Text Detection with Interactive Attention

    Authors: Xingyu Wan, Chengquan Zhang, Pengyuan Lyu, Sen Fan, Zihan Ni, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Existing OCR engines or document image analysis systems typically rely on training separate models for text detection in varying scenarios and granularities, leading to significant computational complexity and resource demands. In this paper, we introduce "Detect Any Text" (DAT), an advanced paradigm that seamlessly unifies scene text detection, layout analysis, and document page detection into a…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  4. arXiv:2405.15412  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions

    Authors: Zijie Guo, Pumeng Lyu, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Ocean dynamics plays a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical O…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2404.05657  [pdf, other]

    cs.CV

    MLP Can Be A Good Transformer Learner

    Authors: Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang

    Abstract: The self-attention mechanism is key to the Transformer but is often criticized for its computational demands. Previous token-pruning works motivate their methods from the view of computational redundancy but still need to load the full network and incur the same memory costs. This paper introduces a novel strategy that simplifies vision transformers and reduces computational load through the selective remo…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: efficient transformer

  6. arXiv:2312.10429  [pdf, other]

    physics.geo-ph cs.AI

    ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks

    Authors: Pumeng Lyu, Tao Tang, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Recent studies have shown that deep learning (DL) models can skillfully forecast the El Niño-Southern Oscillation (ENSO) more than 1.5 years ahead. However, concerns regarding the reliability of predictions made by DL methods persist, including potential overfitting issues and lack of interpretability. Here, we propose ResoNet, a DL model that combines convolutional neural network (CNN) and Tr…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 32 pages, 5 main figures and 12 supplementary figures

  7. arXiv:2309.14962  [pdf, other]

    cs.CV

    GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction

    Authors: Pengyuan Lyu, Weihong Ma, Hongyi Wang, Yuechen Yu, Chengquan Zhang, Kun Yao, Yang Xue, Jingdong Wang

    Abstract: All tables can be represented as grids. Based on this observation, we propose GridFormer, a novel approach for interpreting unconstrained table structures by predicting the vertices and edges of a grid. First, we propose a flexible table representation in the form of an M×N grid. In this representation, the vertices and edges of the grid store the localization and adjacency information of the table.…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: ACM MM 2023

  8. arXiv:2308.07202  [pdf, other]

    cs.CV

    Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning

    Authors: Xugong Qin, Pengyuan Lyu, Chengquan Zhang, Yu Zhou, Kun Yao, Peng Zhang, Hailun Lin, Weiping Wang

    Abstract: Owing to their flexible representation of arbitrary-shaped scene text and simple pipeline, bottom-up segmentation-based methods are becoming mainstream in real-time scene text detection. Despite great progress, these methods lack robustness and still suffer from false positives and instance adhesion. Unlike existing methods, which integrate multiple-granularity features or multip…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  9. arXiv:2306.03287  [pdf, other]

    cs.CV

    ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

    Authors: Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, Mingyu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, et al. (2 additional authors not shown)

    Abstract: Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios of past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of the structured text extraction scheme. To address these problems, we organized the ICDAR 2023 competition on Structured text extracti…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: ICDAR 2023 Competition on SVRD report (to appear in ICDAR 2023)

  10. arXiv:2207.07253  [pdf, other]

    cs.CV

    Single Shot Self-Reliant Scene Text Spotter by Decoupled yet Collaborative Detection and Recognition

    Authors: Jingjing Wu, Pengyuan Lyu, Guangming Lu, Chengquan Zhang, Wenjie Pei

    Abstract: Typical text spotters follow the two-stage spotting paradigm, which first detects the boundary of a text instance and then performs text recognition within the detected region. Despite the remarkable progress of this spotting paradigm, an important limitation is that the performance of text recognition depends heavily on the precision of text detection, resulting in the potential error propagatio…

    Submitted 7 February, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

  11. arXiv:2206.00311  [pdf, other]

    cs.CV

    MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

    Authors: Pengyuan Lyu, Chengquan Zhang, Shanshan Liu, Meina Qiao, Yangliu Xu, Liang Wu, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

    Abstract: Text images contain both visual and linguistic information. However, existing pre-training techniques for text recognition mainly focus on either visual representation learning or linguistic knowledge learning. In this paper, we propose a novel approach, MaskOCR, to unify vision and language pre-training in the classical encoder-decoder recognition framework. We adopt the masked image modeling appro…

    Submitted 9 October, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

  12. arXiv:2107.08766  [pdf, other]

    cs.CV

    VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

    Authors: Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, et al. (30 additional authors not shown)

    Abstract: Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint. However, few algorithms focus on crowd counting on drone-captured data due to the lack of comprehensive datasets. To this end, we collect a large-scale dataset and organize the Vision Meets Drone Crowd C…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: The method description of A7 Multi-Scale Aware based SFANet (M-SFANet) is updated and missing references are added

    Journal ref: European Conference on Computer Vision. Springer, Cham, 2020: 675-691

  13. arXiv:2104.05458  [pdf, other]

    cs.CV

    PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

    Authors: Pengfei Wang, Chengquan Zhang, Fei Qi, Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

    Abstract: The reading of arbitrarily-shaped text has received increasing research attention. However, existing text spotters are mostly built on two-stage frameworks or character-based methods, which suffer from either Non-Maximum Suppression (NMS), Region-of-Interest (RoI) operations, or character-level annotations. In this paper, to address the above problems, we propose a novel fully convolutional Point…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 10 pages, 8 figures, AAAI 2021

  14. arXiv:1908.08207  [pdf, other]

    cs.CV

    Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

    Authors: Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

    Abstract: Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: Accepted by TPAMI. An extension of the conference version. arXiv admin note: text overlap with arXiv:1807.02242

  15. arXiv:1906.05708  [pdf, other]

    cs.CV

    2D Attentional Irregular Scene Text Recognizer

    Authors: Pengyuan Lyu, Zhicheng Yang, Xinhang Leng, Xiaojun Wu, Ruiyu Li, Xiaoyong Shen

    Abstract: Irregular scene text, which has a complex layout in 2D space, is challenging for most previous scene text recognizers. Recently, some irregular scene text recognizers either rectify the irregular text into a regular text image with an approximately 1D layout or transform the 2D image feature map into a 1D feature sequence. Though these methods have achieved good performance, the robustness and accuracy are still…

    Submitted 13 June, 2019; originally announced June 2019.

  16. arXiv:1809.06508  [pdf, other]

    cs.CV

    Scene Text Recognition from Two-Dimensional Perspective

    Authors: Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

    Abstract: Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem. Though achieving excellent performance, these methods usually neglect the important fact that text in images is actually distributed in two-dimensional space. This is quite different from speech, which is essentially a one-dimensional signal. In…

    Submitted 17 November, 2018; v1 submitted 17 September, 2018; originally announced September 2018.

    Comments: To appear in AAAI 2019

  17. arXiv:1807.02242  [pdf, other]

    cs.CV

    Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

    Authors: Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai

    Abstract: Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network model for scene text spotting is proposed. The proposed model, named Mask TextSpotter, is inspired by…

    Submitted 1 August, 2018; v1 submitted 5 July, 2018; originally announced July 2018.

    Comments: To appear in ECCV 2018

  18. arXiv:1802.08948  [pdf, other]

    cs.CV

    Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

    Authors: Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai

    Abstract: Previous deep-learning-based state-of-the-art scene text detection methods can be roughly classified into two categories. The first category treats scene text as a type of general object and follows the general object detection paradigm to localize scene text by regressing text box locations, but is troubled by the arbitrary orientations and large aspect ratios of scene text. The second one segments…

    Submitted 26 February, 2018; v1 submitted 24 February, 2018; originally announced February 2018.

    Comments: To appear in CVPR 2018

  19. arXiv:1706.08789  [pdf, other]

    cs.CV

    Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis

    Authors: Pengyuan Lyu, Xiang Bai, Cong Yao, Zhen Zhu, Tengteng Huang, Wenyu Liu

    Abstract: In this paper, we investigate the Chinese calligraphy synthesis problem: synthesizing Chinese calligraphy images in a specified style from standard-font (e.g., Hei font) images (Fig. 1(a)). Recent works mostly follow a stroke extraction and assembly pipeline, which is complex and limited by the effectiveness of stroke extraction. We treat the calligraphy synthesis problem as an image-to-imag…

    Submitted 27 June, 2017; originally announced June 2017.

    Comments: Submitted to ICDAR 2017

  20. arXiv:1704.04613  [pdf, other]

    cs.CV

    Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification

    Authors: Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo

    Abstract: Text in natural images contains rich semantics that are often highly relevant to objects or scenes. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is to combine word representations and deep visual features in a globally trainable deep convolutional neural network. First, the recognized words are obtained by a scene text reading system…

    Submitted 29 May, 2017; v1 submitted 15 April, 2017; originally announced April 2017.

  21. arXiv:1603.03915  [pdf, other]

    cs.CV

    Robust Scene Text Recognition with Automatic Rectification

    Authors: Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai

    Abstract: Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially-d…

    Submitted 19 April, 2016; v1 submitted 12 March, 2016; originally announced March 2016.

    Comments: Accepted by CVPR 2016