
Application of Computer Deep Learning Model in Diagnosis of Pulmonary Nodules

Authors: Yutian Yang, Hongjie Qiu, Yulu Gong, Xiaoyi Liu, Yang Lin, Muqing Li

Abstract: The 3D simulation model of the lung was established by using the reconstruction method. A computer-aided pulmonary nodule detection model was constructed. The process iterates over the images to refine the lung nodule recognition model based on neural networks. It is integrated with 3D virtual modeling technology to improve the interactivity of the system, so as to achieve intelligent recognition of lung nodules. A 3D RCNN (Region-based Convolutional Neural Network) was utilized for feature extraction and nodule identification. The LUNA16 large sample database was used as the research dataset. FROC (Free-response Receiver Operating Characteristic) analysis was applied to evaluate the model, calculating sensitivity at various false positive rates to derive the average FROC. Compared with conventional diagnostic methods, the recognition rate was significantly improved. This technique facilitates the detection of pulmonary abnormalities at an initial phase, which holds immense value for the prompt diagnosis of lung malignancies.

Submitted 19 June, 2024; originally announced June 2024.

MSC Class: 68T10; 92C50
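The average-FROC score mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the seven false-positive-per-scan operating points commonly used with the LUNA16 challenge (1/8 to 8), that each true-positive candidate matches a distinct nodule, and that no interpolation between thresholds is needed; the function name `average_froc` and the input layout are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Assumed operating points (false positives per scan), as commonly used with LUNA16.
FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def average_froc(detections, n_nodules, n_scans, fp_rates=FP_RATES):
    """Return (mean sensitivity, per-rate sensitivities).

    detections: list of (score, is_true_positive) pairs, one per candidate.
    Assumption: every true-positive candidate hits a distinct nodule.
    """
    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    fps_per_scan, sensitivities = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        fps_per_scan.append(fp / n_scans)
        sensitivities.append(tp / n_nodules)

    per_rate = []
    for rate in fp_rates:
        # Last threshold whose false-positive rate does not exceed `rate`.
        idx = bisect_right(fps_per_scan, rate) - 1
        per_rate.append(sensitivities[idx] if idx >= 0 else 0.0)
    return sum(per_rate) / len(per_rate), per_rate


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
    score, curve = average_froc(toy, n_nodules=2, n_scans=2)
    print(score, curve)
```

Official evaluation scripts typically interpolate the curve and handle tied scores; this sketch only shows the averaging idea.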

arXiv:2406.11158 [pdf, other]

Dynamic Modeling and Control for an Offshore Semisubmersible Floating Wind Turbine

Authors: Yingjie Gong, Qinmin Yang, Hua Geng, Wenchao Meng, Lin Wang

Abstract: Floating wind turbines (FWTs) hold significant potential for the exploitation of offshore renewable energy resources. Nevertheless, prior to the construction of FWTs, it is imperative to tackle several critical challenges, especially the issue of performance degradation under combined wind and wave loads. This study begins with the development of a simplified nonlinear dynamical model for a semi-submersible FWT. In particular, both the rotor dynamics and the finite rotations of the platform are considered in the presented modeling approach, thereby effectively capturing the complex interplay between the platform, tower, nacelle, and rotor under combined wind and wave loads. Subsequently, based on the developed FWT model, a novel adaptive nonlinear pitch controller is formulated with the goal of striking a trade-off between regulating power generation and reducing platform motion. Notably, the proposed control strategy adopts a continuous control approach, which helps circumvent the chattering phenomenon commonly associated with sliding mode control. Furthermore, the controller integrates an online approximator and a robust integral of the sign of the tracking error, facilitating real-time learning of unknown system dynamics while compensating for bounded disturbances. Finally, both the accuracy of the established nonlinear FWT model in predicting key dynamics and the superiority of the presented pitch controller are validated through comprehensive comparative studies.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10082 [pdf, other]

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Authors: Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass

Abstract: Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models such as Whisper are trained with hundreds of thousands of hours of data, and thus learn a better speech-to-text decoder. The huge training data difference motivates us to adapt Whisper to handle video inputs. Inspired by Flamingo, which injects visual features into language models, we propose Whisper-Flamingo, which integrates visual features into the Whisper speech recognition and translation model with gated cross attention. Our audio-visual Whisper-Flamingo outperforms audio-only Whisper on English speech recognition and En-X translation for 6 languages in noisy conditions. Moreover, Whisper-Flamingo is a versatile model and conducts all of these tasks using one set of parameters, while prior methods are trained separately on each language.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Interspeech 2024. Code https://github.com/roudimit/whisper-flamingo

arXiv:2406.09710 [pdf, other]

Fine-Grained Urban Flow Inference with Multi-scale Representation Learning

Authors: Shilu Yuan, Dongfeng Li, Wei Liu, Xinxin Zhang, Meng Chen, Junjie Zhang, Yongshun Gong

Abstract: Fine-grained urban flow inference (FUFI) is a crucial transportation service aimed at improving traffic efficiency and safety. FUFI can infer fine-grained urban traffic flows based solely on observed coarse-grained data. However, most existing methods focus on the influence of single-scale static geographic information on FUFI, neglecting the interactions and dynamic information between different-scale regions within the city. Different-scale geographical features can capture redundant information from the same spatial areas. In order to effectively learn multi-scale information across time and space, we propose an effective fine-grained urban flow inference model called UrbanMSR, which uses self-supervised contrastive learning to obtain dynamic multi-scale representations of neighborhood-level and city-level geographic information, and fuses multi-scale representations to improve fine-grained accuracy. We validate the performance through extensive experiments on three real-world datasets. Comparisons with state-of-the-art methods demonstrate the superiority of the proposed model.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09321 [pdf, other]

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

Authors: Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang

Abstract: Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses for forbidden instructions, presenting severe misuse threats to LLMs. Research into jailbreak attacks and defenses is emerging; however, there is (surprisingly) no consensus on how to evaluate whether a jailbreak attempt is successful. In other words, the methods to assess the harmfulness of an LLM's response are varied, such as manual annotation or prompting GPT-4 in specific ways. Each approach has its own set of strengths and weaknesses, impacting their alignment with human values, as well as the time and financial cost. This diversity in evaluation presents challenges for researchers in choosing suitable evaluation methods and conducting fair comparisons across different jailbreak attacks and defenses. In this paper, we conduct a comprehensive analysis of jailbreak evaluation methodologies, drawing from nearly ninety jailbreak studies released between May 2023 and April 2024. Our study introduces a systematic taxonomy of jailbreak evaluators, offering in-depth insights into their strengths and weaknesses, along with the current status of their adaptation. Moreover, to facilitate subsequent research, we propose JailbreakEval, a user-friendly toolkit focusing on the evaluation of jailbreak attempts. It includes various well-known evaluators out-of-the-box, so that users can obtain evaluation results with only a single command. JailbreakEval also allows users to customize their own evaluation workflow in a unified framework with the ease of development and comparison. In summary, we regard JailbreakEval to be a catalyst that simplifies the evaluation process in jailbreak research and fosters an inclusive standard for jailbreak evaluation within the community.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Our code is available at https://github.com/ThuCCSLab/JailbreakEval

arXiv:2406.06558 [pdf, other]

Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

Authors: Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, Jintong Song, Yulu Gong

Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF techniques with advanced machine learning models, including Bayesian classifiers, Stochastic Gradient Descent (SGD), Categorical Gradient Boosting (CatBoost), and 12 instances of Deberta-v3-large models. Our approach aims to address the challenges associated with detecting AI-generated text by leveraging the strengths of both traditional feature extraction methods and state-of-the-art deep learning models. Through extensive experiments on a comprehensive dataset, we demonstrate the effectiveness of our proposed method in accurately distinguishing between human and AI-generated text. Our approach achieves superior performance compared to existing methods. This research contributes to the advancement of AI-generated text detection techniques and lays the foundation for developing robust solutions to mitigate the challenges posed by AI-generated content.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.06007 [pdf, other]

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at https://github.com/richard-peng-xia/CARES.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05759 [pdf, ps, other]

Chebyshev Moment Method for Regular Graphs I: Kesten-McKay and Semicircle distributions

Authors: Yulin Gong, Wenbo Li, Shiping Liu

Abstract: We develop the Chebyshev moment method to study the spectrum of regular graphs, motivated by the work of Serré. By this method, we give an elementary proof of the weak convergence to the Kesten-McKay distribution for the normalized spectral measures of random $N$-lifts in probability as $N$ tends to infinity. For a sequence of random $(q_n+1)$-regular graphs $G_n$ with $n$ vertices, we show that if $q_n=n^{o(1)}$ and $q_n$ tends to infinity, then the normalized spectral measure converges in Wasserstein $p$-distance $W_{p}$ to the semicircle distribution for any $p \in [1,\infty)$ almost surely. This strengthens the result of Dumitriu and Pal.

Submitted 9 June, 2024; originally announced June 2024.

MSC Class: 05C31; 05C50; 05C80; 60B20
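For orientation, the two limiting laws named in the abstract above can be written out in their standard textbook forms. The abstract itself does not state them, and rescaling the adjacency eigenvalues by $\sqrt{q}$ is an assumption about the paper's normalization, not something it confirms.

```latex
% Kesten-McKay density for a (q+1)-regular graph (adjacency eigenvalues x):
\[
  \rho_{q}(x) \;=\; \frac{(q+1)\,\sqrt{4q - x^{2}}}{2\pi\,\bigl((q+1)^{2} - x^{2}\bigr)},
  \qquad |x| \le 2\sqrt{q}.
\]
% After the (assumed) rescaling x = \sqrt{q}\, y, the density of y is
\[
  \sqrt{q}\,\rho_{q}\bigl(\sqrt{q}\,y\bigr)
  \;=\; \frac{q(q+1)}{(q+1)^{2} - q\,y^{2}} \cdot \frac{\sqrt{4 - y^{2}}}{2\pi}
  \;\xrightarrow[q \to \infty]{}\;
  \rho_{\mathrm{sc}}(y) \;=\; \frac{1}{2\pi}\sqrt{4 - y^{2}},
  \qquad |y| \le 2.
\]
```

The second display shows pointwise convergence of the densities only; the abstract's claim concerns the stronger convergence of random spectral measures in $W_{p}$.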

arXiv:2406.01719 [pdf, other]

Imputation of Missing Photometric Data and Photometric Redshift Estimation for CSST

Authors: Zhijian Luo, Zhirui Tang, Zhu Chen, Liping Fu, Wei Du, Shaohua Zhang, Yan Gong, Chenggang Shu, Junhao Lu, Yicheng Li, Xian-Min Meng, Xingchen Zhou, Zuhui Fan

Abstract: Accurate photometric redshift (photo-$z$) estimation requires support from multi-band observational data. However, in the actual process of astronomical observations and data processing, some sources may have missing observational data in certain bands for various reasons. This could greatly affect the accuracy and reliability of photo-$z$ estimation for these sources, and even render some estimation methods unusable. The same situation may exist for the upcoming Chinese Space Station Telescope (CSST). In this study, we employ a deep learning method called Generative Adversarial Imputation Networks (GAIN) to impute the missing photometric data in CSST, aiming to reduce the impact of missing data on photo-$z$ estimation and improve estimation accuracy. Our results demonstrate that using the GAIN technique can effectively fill in the missing photometric data in CSST. Particularly, when the missing-data rate is below 30%, the imputation of photometric data exhibits high accuracy, with higher accuracy in the $g$, $r$, $i$, $z$, and $y$ bands compared to the $NUV$ and $u$ bands. After filling in the missing values, the quality of photo-$z$ estimation obtained by the widely used Easy and Accurate Zphot from Yale (EAZY) software is notably enhanced. Evaluation metrics for assessing the quality of photo-$z$ estimation, including the catastrophic outlier fraction ($f_{out}$), the normalized median absolute deviation ($\sigma_{\rm NMAD}$), and the bias of photometric redshift ($bias$), all show some degree of improvement. Our research will help maximize the utilization of observational data and provide a new method for handling missing values in samples for applications that require complete photometry data to produce results.

Submitted 3 June, 2024; originally announced June 2024.
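The three photo-$z$ quality metrics named in the abstract above can be sketched as follows. The abstract does not give their definitions, so this uses conventions that are common in the photo-$z$ literature and should be read as assumptions: an outlier cut of $|\Delta z|/(1+z_{\rm spec}) > 0.15$, a factor of 1.48 in $\sigma_{\rm NMAD}$, and the mean normalized residual as the bias. The function name `photoz_metrics` is hypothetical.

```python
from statistics import mean, median


def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (f_out, sigma_nmad, bias) for paired redshift estimates.

    All three definitions are assumed conventions, not taken from the paper.
    """
    # Normalized residuals dz = (z_phot - z_spec) / (1 + z_spec).
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]

    # Catastrophic outlier fraction: share of sources beyond the cut.
    f_out = sum(abs(d) > outlier_cut for d in dz) / len(dz)

    # Normalized median absolute deviation around the median residual.
    centre = median(dz)
    sigma_nmad = 1.48 * median([abs(d - centre) for d in dz])

    # Bias taken here as the mean normalized residual.
    bias = mean(dz)
    return f_out, sigma_nmad, bias


if __name__ == "__main__":
    z_spec = [0.0, 1.0, 1.0, 3.0]
    z_phot = [0.0, 1.2, 0.8, 4.0]
    print(photoz_metrics(z_phot, z_spec))
```

Comparing these numbers before and after imputation is one way to quantify the improvement the abstract describes; some works use the median rather than the mean for the bias.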

arXiv:2405.21045 [pdf]

An Attention-Based Multi-Context Convolutional Encoder-Decoder Neural Network for Work Zone Traffic Impact Prediction

Authors: Qinhua Jiang, Xishun Liao, Yaofa Gong, Jiaqi Ma

Abstract: Work zones are one of the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of their impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict the traffic speed and incident likelihood during planned work zone events. The proposed model transforms traffic patterns into 2D space-time images for both model input and output and employs an attention-based multi-context convolutional encoder-decoder architecture to capture the spatial-temporal dependencies between work zone events and traffic variations. Trained and validated on four years of archived work zone traffic data from Maryland, USA, the model demonstrates superior performance over baseline models in predicting traffic speed, incident likelihood, and inferred traffic attributes such as queue length and congestion timings (i.e., start time and duration). Specifically, the proposed model outperforms the baseline models by reducing the prediction error of traffic speed by 5% to 34%, queue length by 11% to 29%, congestion timing by 6% to 17%, and increasing the accuracy of incident predictions by 5% to 7%. Consequently, this model offers substantial promise for enhancing the planning and traffic management of work zones.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20234 [pdf, other]

Context Injection Attacks on Large Language Models

Authors: Cheng'an Wei, Kai Chen, Yue Zhao, Yujia Gong, Lu Xiang, Shenchen Zhu

Abstract: Large Language Models (LLMs) such as ChatGPT and Llama-2 have become prevalent in real-world applications, exhibiting impressive text generation performance. LLMs are fundamentally developed from a scenario where the input data remains static and lacks a clear structure. To behave interactively over time, LLM-based chat systems must integrate additional contextual information (i.e., chat history) into their inputs, following a pre-defined structure. This paper identifies how such integration can expose LLMs to misleading context from untrusted sources and fail to differentiate between system and user inputs, allowing users to inject context. We present a systematic methodology for conducting context injection attacks aimed at eliciting disallowed responses by introducing fabricated context. This could lead to illegal actions, inappropriate content, or technology misuse. Our context fabrication strategies, acceptance elicitation and word anonymization, effectively create misleading contexts that can be structured with attacker-customized prompt templates, achieving injection through malicious user messages. Comprehensive evaluations on real-world LLMs such as ChatGPT and Llama-2 confirm the efficacy of the proposed attack with success rates reaching 97%. We also discuss potential countermeasures that can be adopted for attack detection and developing more secure models. Our findings provide insights into the challenges associated with the real-world deployment of LLMs for interactive and structured data scenarios.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19943 [pdf, other]

Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting

Authors: Qi Zhang, Yunfei Gong, Daijie Chen, Antoni B. Chan, Hui Huang

Abstract: Recent deep learning-based multi-view people detection (MVD) methods have shown promising results on existing datasets. However, current methods are mainly trained and evaluated on small, single scenes with a limited number of multi-view frames and fixed camera views. As a result, these methods may not be practical for detecting people in larger, more complex scenes with severe occlusions and camera calibration errors. This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach that better fuses multi-camera information under large scenes. In addition, a large synthetic dataset is adopted to enhance the model's generalization ability and enable more practical evaluation and comparison. The model's performance on new testing scenes is further improved with a simple domain adaptation technique. Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance. See code here: https://vcc.tech/research/2024/MVD.

Submitted 30 May, 2024; originally announced May 2024.

Comments: AAAI 2024

arXiv:2405.18767 [pdf, other]

Kinetic temperature of massive star-forming molecular clumps measured with formaldehyde V. The massive filament DR21

Authors: X. Zhao, X. D. Tang, C. Henkel, Y. Gong, Y. Lin, D. L. Li, Y. X. He, Y. P. Ao, X. Lu, T. Liu, Y. Sun, K. Wang, X. P. Chen, J. Esimbek, J. J. Zhou, J. W. Wu, J. J. Qiu, X. W. Zheng, J. S. Li, C. S. Luo, Q. Zhao

Abstract: The kinetic temperature structure of the massive filament DR21 has been mapped using the IRAM 30 m telescope. This mapping employed the para-H$_2$CO triplet ($J_{\rm K_aK_c}$ = 3$_{03}$--2$_{02}$, 3$_{22}$--2$_{21}$, and 3$_{21}$--2$_{20}$) on a scale of $\sim$0.1 pc. By modeling the averaged line ratios of para-H$_{2}$CO with RADEX under non-LTE assumptions, the kinetic temperature of the dense gas was derived at a density of $n$(H$_{2}$) = 10$^{5}$ cm$^{-3}$. The para-H$_2$CO lines reveal significantly higher temperatures than NH$_3$ (1,1)/(2,2) and FIR wavelengths. The dense clumps appear to correlate with the notable kinetic temperature. Among the four dense cores (N44, N46, N48, and N54), temperature gradients are observed on a scale of $\sim$0.1-0.3 pc. This suggests that the warm dense gas is influenced by internal star formation activity. With the exception of N54, the temperature profiles of these cores were fitted with power-law indices ranging from $-$0.3 to $-$0.5. This indicates that the warm dense gas is heated by radiation emitted from internally embedded protostar(s) and/or clusters. While there is no direct evidence supporting the idea that the dense gas is heated by shocks resulting from a past explosive event in the DR21 region, our measurements toward the DR21W1 region provide compelling evidence that the dense gas is indeed heated by shocks originating from the western DR21 flow. Higher temperatures appear to be associated with turbulence. The physical parameters of the dense gas in the DR21 filament exhibit a remarkable similarity to the results obtained in OMC-1 and N113. This may imply that the physical mechanisms governing the dynamics and thermodynamics of dense gas traced by H$_{2}$CO in diverse star formation regions are dominated by common underlying principles despite variations in specific environmental conditions. (abbreviated)

Submitted 29 May, 2024; originally announced May 2024.

Comments: 16 pages, 8 figures, 3 tables. Accepted for publication by Astronomy & Astrophysics

arXiv:2405.16093 [pdf, other]

Diverse Teacher-Students for Deep Safe Semi-Supervised Learning under Class Mismatch

Authors: Qikai Wang, Rundong He, Yongshun Gong, Chunxiao Ren, Haoliang Sun, Xiaoshui Huang, Yilong Yin

Abstract: Semi-supervised learning can significantly boost model performance by leveraging unlabeled data, particularly when labeled data is scarce. However, real-world unlabeled data often contain unseen-class samples, which can hinder the classification of seen classes. To address this issue, mainstream safe SSL methods suggest detecting and discarding unseen-class samples from unlabeled data. Nevertheless, these methods typically employ a single-model strategy to simultaneously tackle both the classification of seen classes and the detection of unseen classes. Our research indicates that such an approach may lead to conflicts during training, resulting in suboptimal model optimization. Inspired by this, we introduce a novel framework named Diverse Teacher-Students (DTS), which uniquely utilizes dual teacher-student models to individually and effectively handle these two tasks. DTS employs a novel uncertainty score to softly separate unseen-class and seen-class data from the unlabeled set, and intelligently creates an additional ($K$+1)-th class supervisory signal for training. By training both teacher-student models with all unlabeled samples, DTS can enhance the classification of seen classes while simultaneously improving the detection of unseen classes. Comprehensive experiments demonstrate that DTS surpasses baseline methods across a variety of datasets and configurations. Our code and models are publicly accessible at https://github.com/Zhanlo/DTS.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.13158 [pdf]

Towards establishing best practice in the analysis of hydrogen and deuterium by atom probe tomography

Authors: Baptiste Gault, Aparna Saksena, Xavier Sauvage, Paul Bagot, Leonardo S. Aota, Jonas Arlt, Lisa T. Belkacemi, Torben Boll, Yi-Sheng Chen, Luke Daly, Milos B. Djukic, James O. Douglas, Maria J. Duarte, Peter J. Felfer, Richard G. Forbes, Jing Fu, Hazel M. Gardner, Ryota Gemma, Stephan S. A. Gerstl, Yilun Gong, Guillaume Hachet, Severin Jakob, Benjamin M. Jenkins, Megan E. Jones, Heena Khanchandani, et al. (20 additional authors not shown)

Abstract: As hydrogen is touted as a key player in the decarbonization of modern society, it is critical to enable quantitative H analysis at high spatial resolution, if possible at the atomic scale. Indeed, H has a known deleterious impact on the mechanical properties (strength, ductility, toughness) of most materials that can hinder their use as part of the infrastructure of a hydrogen-based economy. Enabling H mapping, including local hydrogen concentration analyses at specific microstructural features, is essential for understanding the multiple ways that H affects the properties of materials, including for instance embrittlement mechanisms and their synergies. Spatial mapping and quantification of hydrogen isotopes is also essential to accurately predict the tritium inventory of future fusion power plants, for example to ensure their safe and efficient operation. Atom probe tomography (APT) has the intrinsic capabilities for detecting hydrogen (H) and deuterium (D), and in principle the capacity for performing quantitative mapping of H within a material's microstructure. Yet the accuracy and precision of H analysis by APT remain affected by the influence of residual hydrogen from the ultra-high vacuum chamber that can obscure the signal of H from within the material, along with a complex field evaporation behavior. The present article reports the essence of discussions at a focused workshop held at the Max-Planck Institute for Sustainable Materials in April 2024. The workshop was organized to pave the way to establishing best practices in reporting APT data for the analysis of H. We first summarize the key aspects of the intricacies of H analysis by APT and propose a path for better reporting of the relevant data to support interpretation of APT-based H analysis in materials.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.07526 [pdf, other]

doi 10.1145/3589335.3648327

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, et al. (6 additional authors not shown)

Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks and encourages research in various areas, such as generic end-to-end neural indexer models, generic embedding models, and next generation information access system with large language models. MS MARCO Web Search offers a retrieval benchmark with three web retrieval challenge tasks that demand innovations in both machine learning and information retrieval system research domains. As the first dataset that meets large, real and rich data requirements, MS MARCO Web Search paves the way for future advancements in AI and system research. MS MARCO Web Search dataset is available at: https://github.com/microsoft/MS-MARCO-Web-Search.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

arXiv:2405.07022 [pdf, other]

DTMamba : Dual Twin Mamba for Time Series Forecasting

Authors: Zexue Wu, Yifeng Gong, Aoqian Zhang

Abstract: We utilized the Mamba model for time series data prediction tasks, and the experimental results indicate that our model performs well.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06389 [pdf, other]

Continual Novel Class Discovery via Feature Enhancement and Adaptation

Authors: Yifan Yu, Shaokun Wang, Yuhang He, Junzhe Chen, Yihong Gong

Abstract: Continual Novel Class Discovery (CNCD) aims to continually discover novel classes without labels while maintaining the recognition capability for previously learned classes. The main challenges faced by CNCD include the feature-discrepancy problem, the inter-session confusion problem, etc. In this paper, we propose a novel Feature Enhancement and Adaptation method for the CNCD to tackle the above challenges, which consists of a guide-to-novel framework, a centroid-to-samples similarity constraint (CSS), and a boundary-aware prototype constraint (BAP). More specifically, the guide-to-novel framework is established to continually discover novel classes under the guidance of prior distribution. Afterward, the CSS is designed to constrain the relationship between centroid-to-samples similarities of different classes, thereby enhancing the distinctiveness of features among novel classes. Finally, the BAP is proposed to keep novel class features aware of the positions of other class prototypes during incremental sessions, and better adapt novel class features to the shared feature space. Experimental results on three benchmark datasets demonstrate the superiority of our method, especially in more challenging protocols with more incremental sessions.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.05446 [pdf, other]

GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields

Authors: Yuanhao Gong

Abstract: The 3D Gaussian splatting methods are getting popular. However, they work directly on the signal, leading to a dense representation of the signal. Even with some techniques such as pruning or distillation, the results are still dense. In this paper, we propose to model the gradient of the original signal. The gradients are much sparser than the original signal. Therefore, the gradients use far fewer Gaussian splats, leading to more efficient storage and thus higher computational performance during both training and rendering. Thanks to the sparsity, during the view synthesis, only a small number of pixels are needed, leading to much higher computational performance ($100\sim 1000\times$ faster). The 2D image can be recovered from the gradients by solving a Poisson equation with linear computational complexity. Several experiments are performed to confirm the sparseness of the gradients and the computational performance of the proposed method. The method can be applied to various applications, such as human body modeling and indoor environment modeling.

Submitted 8 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2404.09105

arXiv:2405.04719 [pdf, other]

First detection of CF$^{+}$ in the Large Magellanic Cloud

Authors: Yan Gong, Karl M. Menten, Arshia M. Jacob, Christian Henkel, C. -H. Rosie Chen

Abstract: CF$^{+}$ has been established as a valuable diagnostic tool for investigating photo-dissociation regions (PDRs) and fluorine abundances in the Milky Way. However, its role in extragalactic environments remains largely uncharted. Our objective is to explore the significance of CF$^{+}$ in the Large Magellanic Cloud (LMC) and assess its utility as a valuable probe for examining C$^{+}$ and fluorine abundances in external galaxies. We performed pointed CF$^{+}$ observations toward an active star-forming region, N113 in the LMC, using the Atacama Pathfinder EXperiment 12~m sub-millimeter telescope. We report the first discovery of CF$^{+}$ in the LMC through the successful detection of the CF$^{+}$ (2$\to$1) and (3$\to$2) lines. The excitation models indicate that CF$^{+}$ emission originates from dense PDRs characterized by an H$_{2}$ number density of $(0.5-7.9)\times 10^{4}$~cm$^{-3}$ in N113. Our observations provide the first constraint on the fluorine abundance in molecular clouds in the LMC, disclosing a value of $\lesssim 1.7\times 10^{-9}$. This value is about an order of magnitude lower than those previously measured toward red giants in the LMC, indicative of fluorine deficiency in the molecular gas. The estimated column density ratio between C$^{+}$ and CF$^{+}$ appears to be lower than the anticipated equilibrium ratio derived from the fluorine abundance in red giants. Both phenomena can be explained by the deficiency of CF$^{+}$ caused by the freeze-out of its primary chemical precursor, HF, onto dust grains. The deficiency of CF$^{+}$ within molecular clouds suggests that the measurements presented in this work serve exclusively as conservative estimates, establishing lower bounds for both the fluorine abundance and C$^{+}$ column densities in external galaxies.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 9 pages, 6 figures, 1 table, accepted for publication in A&A

arXiv:2405.01803 [pdf, other]

doi 10.1145/3660784

How to Gain Commit Rights in Modern Top Open Source Communities?

Authors: Xin Tan, Yan Gong, Geyu Huang, Haohua Wu, Li Zhang

Abstract: The success of open source software (OSS) projects relies on voluntary contributions from various community roles. Being a committer signifies gaining trust and higher privileges. Substantial studies have focused on the requirements of becoming a committer, but most of them are based on interviews or several hypotheses, lacking a comprehensive understanding of committers' qualifications. We explore both the policies and practical implementations of committer qualifications in modern top OSS communities. Through a thematic analysis of these policies, we construct a taxonomy of committer qualifications, consisting of 26 codes categorized into nine themes, including Personnel-related to Project, Communication, and Long-term Participation. We also highlight the variations in committer qualifications emphasized in different OSS community governance models. For example, projects following the core maintainer model value project comprehension, while projects following the company-backed model place significant emphasis on user issue resolution. Then, we propose eight sets of metrics and perform survival analysis on two representative OSS projects to understand how these qualifications are implemented in practice. We find that the probability of gaining commit rights decreases as participation time passes. The selection criteria in practice are generally consistent with the community policies. Developers who submit high-quality code, actively engage in code review, and make extensive contributions to related projects are more likely to be granted commit rights. However, there are some qualifications that do not align precisely, and some are not adequately evaluated. This study contributes to the understanding of trust establishment in modern top OSS communities, assists communities in better allocating commit rights, and supports developers in achieving self-actualization through OSS participation.

Submitted 16 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 23 pages, 5 figures, FSE 2024

Journal ref: Proceedings of the ACM on Software Engineering (PACMSE) Issue FSE 2024

arXiv:2405.00026 [pdf]

Enhancing Credit Card Fraud Detection A Neural Network and SMOTE Integrated Approach

Authors: Mengran Zhu, Ye Zhang, Yulu Gong, Changxin Xu, Yafei Xiang

Abstract: Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance. The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection. Results demonstrate that the integration of NN and SMOTE exhibits superior precision, recall, and F1-score compared to traditional models, highlighting its potential as an advanced solution for handling imbalanced datasets in credit card fraud detection scenarios. This research contributes to the ongoing efforts to develop effective and efficient mechanisms for safeguarding financial transactions from fraudulent activities.

Submitted 26 February, 2024; originally announced May 2024.
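
The SMOTE step named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy data are hypothetical, and only the standard SMOTE idea is shown (a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbours).

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch only).

    minority: list of equal-length tuples of floats (minority-class points).
    k: number of nearest minority neighbours considered for interpolation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical toy data: four minority-class points on the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=5)
```

The synthetic points would then be appended to the training set before fitting the classifier, so that the minority (fraud) class is less under-represented.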

arXiv:2404.19087 [pdf, other]

Deep Reinforcement Learning for Advanced Longitudinal Control and Collision Avoidance in High-Risk Driving Scenarios

Authors: Dianwei Chen, Yaobang Gong, Xianfeng Yang

Abstract: Existing Advanced Driver Assistance Systems primarily focus on the vehicle directly ahead, often overlooking potential risks from following vehicles. This oversight can lead to ineffective handling of high-risk situations, such as high-speed, closely spaced, multi-vehicle scenarios where emergency braking by one vehicle might trigger a pile-up collision. To overcome these limitations, this study introduces a novel deep reinforcement learning-based algorithm for longitudinal control and collision avoidance. The proposed algorithm considers the behavior of both leading and following vehicles. Its implementation in simulated high-risk scenarios, which involve emergency braking in dense traffic where traditional systems typically fail, has demonstrated the algorithm's ability to prevent potential pile-up collisions, including those involving heavy-duty vehicles.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18548 [pdf, ps, other]

On the duality in constant-roll inflation

Authors: Yue Wang, Qing Gao, Shengqing Gao, Yungui Gong

Abstract: There is a duality in the observables $n_s$, $r$ and the inflaton potential between large and small $\eta_H$ for the constant-roll inflation if the slow-roll parameter $\epsilon_H$ is negligible. In general, the duality between $\eta_H$ and $\bar{\eta}_H$ does not hold for the background evolution of the inflation. For some particular solutions for the constant-roll inflation with $\eta_H$ being a constant, we find that in the small field approximation, the potential takes the quadratic form and it remains the same when the parameter $\eta_H$ changes to $\bar{\eta}_H=3-\eta_H$. If the scalar field is small and the contribution of $\epsilon_H$ is negligible, we find that there exists the logarithmic duality and the duality between large and small $\eta_H$ for the primordial curvature perturbation in inflationary models with the quadratic potential.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 15 pages

arXiv:2404.18419 [pdf]

Research on Intelligent Aided Diagnosis System of Medical Image Based on Computer Deep Learning

Authors: Jiajie Yuan, Linxiao Wu, Yulu Gong, Zhou Yu, Ziang Liu, Shuyao He

Abstract: This paper combines the Struts and Hibernate architectures, using DAO (Data Access Object) to store and access data. Then a set of dual-mode humidity medical image library suitable for deep network is established, and a dual-mode medical image assisted diagnosis method based on the image is proposed. Through the test of various feature extraction methods, the best area under the receiver operating characteristic curve (AUROC) is 0.9985, the recall rate is 0.9814, and the accuracy is 0.9833. This method can be applied to clinical diagnosis, and it is a practical method. Through the system, each outpatient physician can quickly register or log in to the platform for image uploading, thus obtaining more accurate images. The segmentation of images can guide doctors in clinical departments. Then the image is analyzed to determine the location and nature of the tumor, so as to make targeted treatment.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.14678 [pdf, other]

3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset

Authors: Junjie Zhang, Tianci Hu, Xiaoshui Huang, Yongshun Gong, Dan Zeng

Abstract: Evaluating the performance of Multi-modal Large Language Models (MLLMs), integrating both point cloud and language, presents significant challenges. The lack of a comprehensive assessment hampers determining whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations heavily rely on classification and caption tasks, falling short in providing a thorough assessment of MLLMs. A pressing need exists for a more sophisticated evaluation method capable of thoroughly analyzing the spatial understanding and expressive capabilities of these models. To address these issues, we introduce a scalable 3D benchmark, accompanied by a large-scale instruction-tuning dataset known as 3DBench, providing an extensible platform for a comprehensive evaluation of MLLMs. Specifically, we establish the benchmark that spans a wide range of spatial and semantic scales, from object-level to scene-level, addressing both perception and planning tasks. Furthermore, we present a rigorous pipeline for automatically constructing scalable 3D instruction-tuning datasets, covering 10 diverse multi-modal tasks with more than 0.23 million QA pairs generated in total. Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench, offering valuable insights into current limitations and potential research directions.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13576 [pdf, other]

I2CANSAY:Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning

Authors: Songlin Dong, Yingjie Chen, Yuhang He, Yuhan Jin, Alex C. Kot, Yihong Gong

Abstract: Online task-free continual learning (OTFCL) is a more challenging variant of continual learning which emphasizes the gradual shift of task boundaries and learns in an online mode. Existing methods rely on a memory buffer composed of old samples to prevent forgetting. However, the use of memory buffers not only raises privacy concerns but also hinders the efficient learning of new samples. To address this problem, we propose a novel framework called I2CANSAY that gets rid of the dependence on memory buffers and efficiently learns the knowledge of new data from one-shot samples. Concretely, our framework comprises two main modules. Firstly, the Inter-Class Analogical Augmentation (ICAN) module generates diverse pseudo-features for old classes based on the inter-class analogy of feature distributions for different new classes, serving as a substitute for the memory buffer. Secondly, the Intra-Class Significance Analysis (ISAY) module analyzes the significance of attributes for each class via its distribution standard deviation, and generates the importance vector as a correction bias for the linear classifier, thereby enhancing the capability of learning from new samples. We run our experiments on four popular image classification datasets: CoRe50, CIFAR-10, CIFAR-100, and CUB-200. Our approach outperforms the prior state-of-the-art by a large margin.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.09318 [pdf, other]

Unraveling stochastic fundamental diagrams considering empirical knowledge: modeling, limitation and further discussion

Authors: Yuan-Zheng Lei, Yaobang Gong, Xianfeng Terry Yang

Abstract: Traffic flow modeling relies heavily on fundamental diagrams. However, deterministic fundamental diagrams, such as single or multi-regime models, cannot capture the uncertainty pattern that underlies traffic flow. To address this limitation, a sparse non-parametric regression model is proposed in this paper to formulate the stochastic fundamental diagram. Unlike parametric stochastic fundamental diagram models, a non-parametric model is insensitive to parameters, flexible, and applicable. The computation complexity and the huge memory required for training in the Gaussian process regression have been reduced by introducing the sparse Gaussian process regression. The paper also discusses how empirical knowledge influences the modeling process. The paper analyzes the influence of modeling empirical knowledge in the prior of the stochastic fundamental diagram model and whether empirical knowledge can improve the robustness and accuracy of the proposed model. By introducing several well-known single-regime fundamental diagram models as the prior and testing the model's robustness and accuracy with different sampling methods given real-world data, the authors find that empirical knowledge can only benefit the model under small inducing samples given a relatively clean and large dataset. A pure data-driven approach is sufficient to estimate and describe the pattern of the density-speed relationship.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09155 [pdf, other]

Mitigating Heterogeneity among Factor Tensors via Lie Group Manifolds for Tensor Decomposition Based Temporal Knowledge Graph Embedding

Authors: Jiang Li, Xiangdong Su, Yeyun Gong, Guanglai Gao

Abstract: Recent studies have highlighted the effectiveness of tensor decomposition methods in the Temporal Knowledge Graphs Embedding (TKGE) task. However, we found that inherent heterogeneity among factor tensors in tensor decomposition significantly hinders the tensor fusion process and further limits the performance of link prediction. To overcome this limitation, we introduce a novel method that maps factor tensors onto a unified smooth Lie group manifold to make the distribution of factor tensors approximately homogeneous in tensor decomposition. We provide the theoretical proof of our motivation that homogeneous tensors are more effective than heterogeneous tensors in tensor fusion and approximating the target for tensor decomposition based TKGE methods. The proposed method can be directly integrated into existing tensor decomposition based TKGE methods without introducing extra parameters. Extensive experiments demonstrate the effectiveness of our method in mitigating the heterogeneity and in enhancing the tensor decomposition based TKGE models.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09105 [pdf, other]

EGGS: Edge Guided Gaussian Splatting for Radiance Fields

Authors: Yuanhao Gong

Abstract: The Gaussian splatting methods are getting popular. However, their loss function only contains the $\ell_1$ norm and the structural similarity between the rendered and input images, without considering the edges in these images. It is well-known that the edges in an image provide important information. Therefore, in this paper, we propose an Edge Guided Gaussian Splatting (EGGS) method that leverages the edges in the input images. More specifically, we give the edge region a higher weight than the flat region. With such edge guidance, the resulting Gaussian particles focus more on the edges instead of the flat regions. Moreover, such edge guidance does not increase the computation cost during the training and rendering stage. The experiments confirm that such a simple edge-weighted loss function indeed yields an improvement of about $1\sim2$ dB on several different data sets. By simply plugging in the edge guidance, the proposed method can improve all Gaussian splatting methods in different scenarios, such as human head modeling, building 3D reconstruction, etc.

Submitted 22 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.
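
The edge-weighted loss described in the EGGS abstract can be sketched as follows. The abstract only states that edge regions receive a higher weight than flat regions; the edge detector (forward differences with a threshold), the weight value, the normalisation, and all function names below are assumptions made for illustration, and the structural-similarity term of the original loss is omitted.

```python
def edge_mask(img, threshold):
    """Mark pixels whose forward-difference gradient magnitude exceeds threshold."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]
            gy = img[min(y + 1, h - 1)][x] - img[y][x]
            mask[y][x] = abs(gx) + abs(gy) > threshold
    return mask


def edge_weighted_l1(rendered, target, edge_weight=2.0, threshold=0.1):
    """Weighted mean absolute error; edge pixels of the target count edge_weight times."""
    mask = edge_mask(target, threshold)
    total, weight_sum = 0.0, 0.0
    for y, row in enumerate(target):
        for x, value in enumerate(row):
            w = edge_weight if mask[y][x] else 1.0
            total += w * abs(rendered[y][x] - value)
            weight_sum += w
    return total / weight_sum


# Hypothetical 2x4 grayscale images with a vertical edge between columns 1 and 2.
target = [[0.0, 0.0, 1.0, 1.0],
          [0.0, 0.0, 1.0, 1.0]]
rendered = [[0.0, 0.5, 1.0, 1.5],
            [0.0, 0.0, 1.0, 1.0]]
loss = edge_weighted_l1(rendered, target)
```

Because the mask is computed once from the input image, the weighting adds only a per-pixel multiplication to the loss, which is in line with the abstract's remark that the guidance does not increase computation cost.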

arXiv:2404.08242 [pdf, other]

RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning

Authors: Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, Yue-Jiao Gong

Abstract: Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging in limited function evaluations. Although existing works strike the balance of exploration and exploitation through hand-crafted adaptive strategies, they require certain expert knowledge, hence inflexible to deal with MMOP with different properties. In this paper, we propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent for flexibly adjusting individual-level searching strategies to match the up-to-date optimization status, hence boosting the search performance on MMOP. Concretely, we encode landscape properties and evolution path information into each individual and then leverage attention networks to advance population information sharing. With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm. The experimental results on the CEC2013 MMOP benchmark underscore the competitive optimization performance of RLEMMO against several strong baselines.

Submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted as full paper at GECCO 2024

arXiv:2404.08239 [pdf, other]

Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning

Authors: Zeyuan Ma, Jiacheng Chen, Hongshu Guo, Yining Ma, Yue-Jiao Gong

Abstract: Evolutionary computation (EC) algorithms, renowned as powerful black-box optimizers, leverage a group of individuals to cooperatively search for the optimum. The exploration-exploitation tradeoff (EET) plays a crucial role in EC, which, however, has traditionally been governed by manually designed rules. In this paper, we propose a deep reinforcement learning-based framework that autonomously configures and adapts the EET throughout the EC search process. The framework allows different individuals of the population to selectively attend to the global and local exemplars based on the current search state, maximizing the cooperative search outcome. Our proposed framework is characterized by its simplicity, effectiveness, and generalizability, with the potential to enhance numerous existing EC algorithms. To validate its capabilities, we apply our framework to several representative EC algorithms and conduct extensive experiments on the augmented CEC2021 benchmark. The results demonstrate significant improvements in the performance of the backbone algorithms, as well as favorable generalization across diverse problem classes, dimensions, and population sizes. Additionally, we provide an in-depth analysis of the EET issue by interpreting the learned behaviors of EC.

Submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted as a full paper at GECCO 2024

arXiv:2404.07965 [pdf, other]

Rho-1: Not All Tokens Are What You Need

Authors: Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

Abstract: Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. When continual pretraining on 15B OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% in 9 math tasks. After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when pretraining on 80B general tokens, Rho-1 achieves 6.8% average enhancement across 15 diverse tasks, increasing both efficiency and performance of the language model pre-training.

Submitted 23 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: First two authors equal contribution

arXiv:2404.07121 [pdf, other]

Digital Over-the-Air Computation: Achieving High Reliability via Bit-Slicing

Authors: Jiawei Liu, Yi Gong, Kaibin Huang

Abstract: 6G mobile networks aim to realize ubiquitous intelligence at the network edge via distributed learning, sensing, and data analytics. Their common operation is to aggregate high-dimensional data, which causes a communication bottleneck that cannot be resolved using traditional orthogonal multi-access schemes. A promising solution, called over-the-air computation (AirComp), exploits channels' waveform superposition property to enable simultaneous access, thereby overcoming the bottleneck. Nevertheless, its reliance on uncoded linear analog modulation exposes data to perturbation by noise and interference. Hence, the traditional analog AirComp falls short of meeting the high-reliability requirement for 6G. Overcoming the limitation of analog AirComp motivates this work, which focuses on developing a framework for digital AirComp. The proposed framework features digital modulation of each data value, integrated with the bit-slicing technique to allocate its bits to multiple symbols, thereby increasing the AirComp reliability. To optimally detect the aggregated digital symbols, we derive the optimal maximum a posteriori detector that is shown to outperform the traditional maximum likelihood detector. Furthermore, a comparative performance analysis of digital AirComp with respect to its analog counterpart with repetition coding is conducted to quantify the practical signal-to-noise ratio (SNR) regime favoring the proposed scheme. On the other hand, digital AirComp is enhanced by further development to feature awareness of heterogeneous bit importance levels and its exploitation in channel adaptation. Lastly, simulation results demonstrate the achievability of substantial reliability improvement of digital AirComp over its analog counterpart given the same channel uses.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05236 [pdf, other]

Stylizing Sparse-View 3D Scenes with Hierarchical Neural Representation

Authors: Y. Wang, A. Gao, Y. Gong, Y. Zeng

Abstract: Recently, a surge of 3D style transfer methods has been proposed that leverage the scene reconstruction power of a pre-trained neural radiance field (NeRF). To successfully stylize a scene this way, one must first reconstruct a photo-realistic radiance field from collected images of the scene. However, when only sparse input views are available, pre-trained few-shot NeRFs often suffer from high-frequency artifacts, which are generated as a by-product of high-frequency details for improving reconstruction quality. Is it possible to generate more faithful stylized scenes from sparse inputs by directly optimizing encoding-based scene representation with target style? In this paper, we consider the stylization of sparse-view scenes in terms of disentangling content semantics and style textures. We propose a coarse-to-fine sparse-view scene stylization framework, where a novel hierarchical encoding-based neural representation is designed to generate high-quality stylized scenes directly from implicit scene representations. We also propose a new optimization strategy with content strength annealing to achieve realistic stylization and better content preservation. Extensive experiments demonstrate that our method can achieve high-quality stylization of sparse-view scenes and outperforms fine-tuning-based baselines in terms of stylization quality and efficiency.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05188 [pdf, other]

Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging

Authors: Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang

Abstract: Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data. Instead, it involves editing different upstream model parameters to absorb their downstream task capabilities. However, uncertified model merging can infringe upon the Intellectual Property (IP) rights of the original upstream models. In this paper, we conduct the first study on the robustness of IP protection methods in model merging scenarios. We investigate two state-of-the-art IP protection techniques: Quantization Watermarking and Instructional Fingerprint, along with various advanced model merging technologies, such as Task Arithmetic, TIES-MERGING, and so on. Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive in the merged models, whereas model fingerprinting techniques can. Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques, thereby promoting the healthy development of the open-source LLM community.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Technical Report

arXiv:2404.04978 [pdf, other]

doi 10.1088/1572-9494/ad51ef

Constant-roll inflation with non-minimally derivative coupling

Authors: Jie Liu, Yungui Gong, Zhu Yi

Abstract: We investigate the constant-roll inflation with non-minimally kinetic coupling to the Einstein tensor. With the slow-roll parameter $\eta_\phi = -\ddot{\phi}/(H\dot{\phi})$ being a constant, we calculate the power spectra for scalar and tensor perturbations, and derive the expressions for the scalar spectral tilt $n_s$, the tensor spectral tilt $n_T$, and the tensor-to-scalar ratio $r$. We find that the expressions for $n_s$ are different with different ordering of taking the derivative of the scalar power spectrum with respect to the scale $k$ and the horizon crossing condition $c_s k = aH$ in the constant-roll inflation, the consistency relation $r=-8n_T$ does not hold if $|\eta_\phi|$ is not small, and the duality of the tensor-to-scalar ratio between the slow-roll inflation and ultra-slow-roll inflation does not exist in inflationary models with non-minimally derivative coupling. The result offers a fresh perspective on the understanding of the inflationary models with non-minimally derivative coupling and is helpful for the production of scalar induced gravitational waves in the framework of ultra-slow-roll inflation with non-minimally derivative coupling.

Submitted 3 June, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: 8 pages, accepted by Communications in Theoretical Physics

arXiv:2404.04118 [pdf, other]

GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System

Authors: Yidong Gong, Pradeep Kumar

Abstract: We hypothesize that the absence of a standardized benchmark has allowed several fundamental pitfalls in GNN System design and evaluation that the community has overlooked. In this work, we propose GNNBench, a plug-and-play benchmarking platform focused on system innovation. GNNBench presents a new protocol to exchange their captive tensor data, supports custom classes in System APIs, and allows automatic integration of the same system module to many deep learning frameworks, such as PyTorch and TensorFlow. To demonstrate the importance of such a benchmark framework, we integrated several GNN systems. Our results show that integration with GNNBench helped us identify several measurement issues that deserve attention from the community.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.01067 [pdf, other]

Exploring the Mystery of Influential Data for Mathematical Reasoning

Authors: Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

Abstract: Selecting influential data for fine-tuning on downstream tasks is a key factor for both performance and computation efficiency. Recent works have shown that training with only limited data can show a superior performance on general tasks. However, the feasibility on mathematical reasoning tasks has not been validated. To go further, there exist two open questions for mathematical reasoning: how to select influential data and what is an influential data composition. For the former one, we propose a Quality-aware Diverse Selection (QaDS) strategy adaptable for mathematical reasoning. A comparison with other selection strategies validates the superiority of QaDS. For the latter one, we first enlarge our setting and explore the influential data composition. We conduct a series of experiments and highlight that scaling up reasoning data and training with general data selected by QaDS are both helpful. Then, we define our optimal mixture as OpenMathMix, an influential data mixture with open-source data selected by QaDS. With OpenMathMix, we achieve a state-of-the-art 48.8% accuracy on MATH with a 7B base model. Additionally, we showcase the use of QaDS in creating efficient fine-tuning mixtures with various selection ratios, and analyze the quality of a wide range of open-source datasets, which can serve as a reference for future works on mathematical reasoning tasks.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00323 [pdf, other]

CLIP-driven Outliers Synthesis for few-shot OOD detection

Authors: Hao Sun, Rundong He, Zhongyi Han, Zhicong Lin, Yongshun Gong, Yilong Yin

Abstract: Few-shot OOD detection focuses on recognizing out-of-distribution (OOD) images that belong to classes unseen during training, with the use of only a small number of labeled in-distribution (ID) images. Up to now, a mainstream strategy is based on large-scale vision-language models, such as CLIP. However, these methods overlook a crucial issue: the lack of reliable OOD supervision information, which can lead to biased boundaries between in-distribution (ID) and OOD. To tackle this problem, we propose CLIP-driven Outliers Synthesis (CLIP-OS). Firstly, CLIP-OS enhances patch-level features' perception by newly proposed patch uniform convolution, and adaptively obtains the proportion of ID-relevant information by employing CLIP-surgery-discrepancy, thus achieving separation between ID-relevant and ID-irrelevant. Next, CLIP-OS synthesizes reliable OOD data by mixing up ID-relevant features from different classes to provide OOD supervision information. Afterward, CLIP-OS leverages synthetic OOD samples by unknown-aware prompt learning to enhance the separability of ID and OOD. Extensive experiments across multiple benchmarks demonstrate that CLIP-OS achieves superior few-shot OOD detection capability.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures

arXiv:2403.18201 [pdf, other]

Few-shot Online Anomaly Detection and Segmentation

Authors: Shenxing Wei, Xing Wei, Zhiheng Ma, Songlin Dong, Shaochen Zhang, Yihong Gong

Abstract: Detecting anomaly patterns from images is a crucial artificial intelligence technique in industrial applications. Recent research in this domain has emphasized the necessity of a large volume of training data, overlooking the practical scenario where, post-deployment of the model, unlabeled data containing both normal and abnormal samples can be utilized to enhance the model's performance. Consequently, this paper focuses on addressing the challenging yet practical few-shot online anomaly detection and segmentation (FOADS) task. Under the FOADS framework, models are trained on a few-shot normal dataset, followed by inspection and improvement of their capabilities by leveraging unlabeled streaming data containing both normal and abnormal samples simultaneously. To tackle this issue, we propose modeling the feature distribution of normal images using a Neural Gas network, which offers the flexibility to adapt the topology structure to identify outliers in the data flow. In order to achieve improved performance with limited training samples, we employ multi-scale feature embedding extracted from a CNN pre-trained on ImageNet to obtain a robust representation. Furthermore, we introduce an algorithm that can incrementally update parameters without the need to store previous samples. Comprehensive experimental results demonstrate that our method can achieve substantial performance under the FOADS setting, while ensuring that the time complexity remains within an acceptable range on MVTec AD and BTAD datasets.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.18001 [pdf, other]

doi 10.1051/0004-6361/202348931

Discovery of widespread non-metastable ammonia masers in the Milky Way

Authors: Y. T. Yan, C. Henkel, K. M. Menten, T. L. Wilson, A. Wootten, Y. Gong, F. Wyrowski, W. Yang, A. Brunthaler, A. Kraus, B. Winkel

Abstract: We present the results of a search for ammonia maser emission in 119 Galactic high-mass star-forming regions (HMSFRs) known to host 22 GHz H$_2$O maser emission. Our survey has led to the discovery of non-metastable NH$_3$ inversion line masers toward 14 of these sources. This doubles the number of known non-metastable ammonia masers in our Galaxy, including nine new very high excitation ($J,K$) = (9,6) maser sources. These maser lines, including NH$_3$ (5,4), (6,4), (6,5), (7,6), (8,6), (9,6), (9,8), (10,8), and (11,9), arise from energy levels of 342 K, 513 K, 465 K, 606 K, 834 K, 1090 K, 942 K, 1226 K, and 1449 K above the ground state. Additionally, we tentatively report a new metastable NH$_3$ (3,3) maser in G048.49 and an NH$_3$ (7,7) maser in G029.95. Our observations reveal that all of the newly detected NH$_3$ maser lines exhibit either blueshifted or redshifted velocities with respect to the source systemic velocities. Among the non-metastable ammonia maser lines, larger velocity distributions, offset from the source systemic velocities, are found in the ortho-NH$_3$ ($K=3n$) than in the para-NH$_3$ ($K\neq3n$) transitions.

Submitted 12 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 14 pages, 4 tables, 9 figures. Accepted for publication in A&A

Journal ref: A&A 686, A205 (2024)

arXiv:2403.17549 [pdf]

Practical Applications of Advanced Cloud Services and Generative AI Systems in Medical Image Analysis

Authors: Jingyu Xu, Binbin Wu, Jiaxin Huang, Yulu Gong, Yifan Zhang, Bo Liu

Abstract: The medical field is one of the important fields in the application of artificial intelligence technology. With the explosive growth and diversification of medical data, as well as the continuous improvement of medical needs and challenges, artificial intelligence technology is playing an increasingly important role in the medical field. Artificial intelligence technologies represented by computer vision, natural language processing, and machine learning have widely penetrated diverse scenarios such as medical imaging, health management, medical information, and drug research and development, and have become an important driving force for improving the level and quality of medical services. The article explores the transformative potential of generative AI in medical imaging, emphasizing its ability to generate synthetic data, enhance images, aid in anomaly detection, and facilitate image-to-image translation. Despite challenges like model complexity, the applications of generative models in healthcare, including Med-PaLM 2 technology, show promising results. By addressing limitations in dataset size and diversity, these models contribute to more accurate diagnoses and improved patient outcomes. However, ethical considerations and collaboration among stakeholders are essential for responsible implementation. Through experiments leveraging GANs to augment brain tumor MRI datasets, the study demonstrates how generative AI can enhance image quality and diversity, ultimately advancing medical diagnostics and patient care.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.16443 [pdf, other]

CodeS: Natural Language to Code Repository via Multi-Layer Sketch

Authors: Daoguang Zan, Ailun Yu, Wei Liu, Dong Chen, Bo Shen, Wei Li, Yafen Yao, Yongshun Gong, Xiaolin Chen, Bei Guan, Zhiguang Yang, Yongji Wang, Qianxiang Wang, Lizhen Cui

Abstract: The impressive performance of large language models (LLMs) on code-related tasks has shown the potential of fully automated software development. In light of this, we introduce a new software engineering task, namely Natural Language to code Repository (NL2Repo). This task aims to generate an entire code repository from its natural language requirements. To address this task, we propose a simple yet effective framework CodeS, which decomposes NL2Repo into multiple sub-tasks by a multi-layer sketch. Specifically, CodeS includes three modules: RepoSketcher, FileSketcher, and SketchFiller. RepoSketcher first generates a repository's directory structure for given requirements; FileSketcher then generates a file sketch for each file in the generated structure; SketchFiller finally fills in the details for each function in the generated file sketch. To rigorously assess CodeS on the NL2Repo task, we carry out evaluations through both automated benchmarking and manual feedback analysis. For benchmark-based evaluation, we craft a repository-oriented benchmark, SketchEval, and design an evaluation metric, SketchBLEU. For feedback-based evaluation, we develop a VSCode plugin for CodeS and engage 30 participants in conducting empirical studies. Extensive experiments prove the effectiveness and practicality of CodeS on the NL2Repo task.

Submitted 25 March, 2024; originally announced March 2024.

Comments: https://github.com/NL2Code/CodeS

arXiv:2403.16409 [pdf]

doi 10.1098/rsta.2023.0094

Large-scale Array for Radio Astronomy on the Farside

Authors: Xuelei Chen, Feng Gao, Fengquan Wu, Yechi Zhang, Tong Wang, Weilin Liu, Dali Zou, Furen Deng, Yang Gong, Kai He, Jixia Li, Shijie Sun, Nanben Suo, Yougang Wang, Pengju Wu, Jiaqin Xu, Yidong Xu, Bin Yue, Cong Zhang, Jia Zhou, Minquan Zhou, Chenguang Zhu, Jiacong Zhu

Abstract: At the Royal Society meeting in 2023, we mainly presented our lunar orbit array concept called DSL, and also briefly introduced a concept of a lunar surface array, LARAF. As the DSL concept had been presented before, in this article we introduce the LARAF. We propose to build an array on the far side of the Moon, with a master station which handles the data collection and processing, and 20 stations with a maximum baseline of 10 km. Each station consists of 12 membrane antenna units, and the stations are connected to the master station by power line and optical fiber. The array will make interferometric observations in the 0.1-50 MHz band during the lunar night, powered by regenerated fuel cells (RFCs). The whole array can be carried to the lunar surface with a heavy rocket mission, and deployed with a rover in 8 months. Such an array would be an important step in the long-term development of lunar-based ultralong wavelength radio astronomy. It has a sufficiently high sensitivity to observe many radio sources in the sky, though still short of the dark age fluctuations. We discuss the possible options in the power supply, data communication, deployment, etc.

Submitted 24 March, 2024; originally announced March 2024.

Comments: final submission version, 30 pages, 16 figures

Journal ref: Phil. Trans. R. Soc. A 382, 20230094 (2024)

arXiv:2403.16212 [pdf, other]

Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis

Authors: Shaojie Li, Haichen Qu, Xinqi Dong, Bo Dang, Hengyi Zang, Yulu Gong

Abstract: Exploring the application of deep learning technologies in the field of medical diagnostics, Magnetic Resonance Imaging (MRI) provides a unique perspective for observing and diagnosing complex neurodegenerative diseases such as Alzheimer Disease (AD). With advancements in deep learning, particularly in Convolutional Neural Networks (CNNs) and the Xception network architecture, we are now able to analyze and classify vast amounts of MRI data with unprecedented accuracy. The progress of this technology not only enhances our understanding of brain structural changes but also opens up new avenues for monitoring disease progression through non-invasive means and potentially allows for precise diagnosis in the early stages of the disease. This study aims to classify MRI images using deep learning models to identify different stages of Alzheimer Disease through a series of innovative data processing and model construction steps. Our experimental results show that the deep learning framework based on the Xception model achieved a 99.6% accuracy rate in the multi-class MRI image classification task, demonstrating its potential application value in assistive diagnosis. Future research will focus on expanding the dataset, improving model interpretability, and clinical validation to further promote the application of deep learning technology in the medical field, with the hope of bringing earlier diagnosis and more personalized treatment plans to Alzheimer Disease patients.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.14775 [pdf, ps, other]

RIS-Aided Cooperative Mobile Edge Computing: Computation Efficiency Maximization via Joint Uplink and Downlink Resource Allocation

Authors: Zhenrong Liu, Zongze Li, Yi Gong, Yik-Chung Wu

Abstract: In mobile edge computing (MEC) systems, the wireless channel condition is a critical factor affecting both the communication power consumption and computation rate of the offloading tasks. This paper exploits the idea of cooperative transmission and of employing a reconfigurable intelligent surface (RIS) in MEC to improve the channel condition and maximize computation efficiency (CE). The resulting problem couples various wireless resources in both uplink and downlink, which calls for the joint design of the user association, receive/downlink beamforming vectors, transmit power of users, task partition strategies for local computing and offloading, and uplink/downlink phase shifts at the RIS. To tackle the challenges brought by the combinatorial optimization problem, the group sparsity structure of the beamforming vectors determined by user association is exploited. Furthermore, while the CE does not explicitly depend on the downlink phase shifts, instead of simply finding a feasible solution, we exploit the hidden relationship between them and convert this relationship into an explicit form for optimization. Then the resulting problem is solved via the alternating maximization framework, and the nonconvexity of each subproblem is handled individually. Simulation results show that cooperative transmission and RIS deployment can significantly improve the CE and demonstrate the importance of optimizing the downlink phase shifts with an explicit form.

Submitted 21 March, 2024; originally announced March 2024.

Comments: This paper has been accepted for publication in IEEE Transactions on Wireless Communications

arXiv:2403.14483 [pdf, other]

Utilizing the LightGBM Algorithm for Operator User Credit Assessment Research

Authors: Shaojie Li, Xinqi Dong, Danqing Ma, Bo Dang, Hengyi Zang, Yulu Gong

Abstract: Mobile Internet user credit assessment is an important way for communication operators to establish decisions and formulate measures, and it is also a guarantee for operators to obtain expected benefits. However, credit evaluation methods have long been monopolized by financial industries such as banks and credit. As supporters and providers of platform network technology and network resources, communication operators are also builders and maintainers of communication networks. Internet data improves the user's credit evaluation strategy. This paper uses the massive data provided by communication operators to carry out research on the operator's user credit evaluation model based on the fusion LightGBM algorithm. First, for the massive data related to user evaluation provided by operators, key features are extracted by data preprocessing and feature engineering methods, and a multi-dimensional feature set with statistical significance is constructed; then, multiple basic models are built with linear regression, decision tree, LightGBM, and other machine learning algorithms to find the best basic model; finally, Averaging, Voting, Blending, Stacking, and other ensemble algorithms are integrated to refine multiple fusion models and establish the most suitable fusion model for operator user evaluation.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.14244 [pdf, other]

Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering

Authors: Yuanhao Gong, Lantao Yu, Guanghui Yue

Abstract: The 3D Gaussian splatting method has drawn a lot of attention, thanks to its high performance in training and high quality of the rendered image. However, it uses anisotropic Gaussian kernels to represent the scene. Although such anisotropic kernels have advantages in representing the geometry, they lead to difficulties in terms of computation, such as splitting or merging two kernels. In this paper, we propose to use isotropic Gaussian kernels to avoid such difficulties in the computation, leading to a higher performance method. The experiments confirm that the proposed method is about 100X faster without losing the geometry representation accuracy. The proposed method can be applied in a wide range of applications where the radiance field is needed, such as 3D reconstruction, view synthesis, and dynamic object modeling.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13619 [pdf]

Dynamic Resource Allocation for Virtual Machine Migration Optimization using Machine Learning

Authors: Yulu Gong, Jiaxin Huang, Bo Liu, Jingyu Xu, Binbin Wu, Yifan Zhang

Abstract: The paragraph is grammatically correct and logically coherent. It discusses the importance of mobile terminal cloud computing migration technology in meeting the demands of evolving computer and cloud computing technologies. It emphasizes the need for efficient data access and storage, as well as the utilization of cloud computing migration technology to prevent additional time delays. The paragraph also highlights the contributions of cloud computing migration technology to expanding cloud computing services. Additionally, it acknowledges the role of virtualization as a fundamental capability of cloud computing while emphasizing that cloud computing and virtualization are not inherently interconnected. Finally, it introduces machine learning-based virtual machine migration optimization and dynamic resource allocation as a critical research direction in cloud computing, citing the limitations of static rules or manual settings in traditional cloud computing environments. Overall, the paragraph effectively communicates the importance of machine learning technology in addressing resource allocation and virtual machine migration challenges in cloud computing.

Submitted 20 March, 2024; originally announced March 2024.

Showing 1–50 of 938 results for author: Gong, Y