Search | arXiv e-print repository

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Authors: Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu

Abstract: Computer Science (CS) stands as a testament to the intricacies of human intelligence, profoundly advancing the development of artificial intelligence and modern society. However, the current community of large language models (LLMs) overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer scie… ▽ More Computer Science (CS) stands as a testament to the intricacies of human intelligence, profoundly advancing the development of artificial intelligence and modern society. However, the current community of large language models (LLMs) overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer science field. To bridge this gap, we introduce CS-Bench, the first bilingual (Chinese-English) benchmark dedicated to evaluating the performance of LLMs in computer science. CS-Bench comprises approximately 5K meticulously curated test samples, covering 26 subfields across 4 key areas of computer science, encompassing various task forms and divisions of knowledge and reasoning. Utilizing CS-Bench, we conduct a comprehensive evaluation of over 30 mainstream LLMs, revealing the relationship between CS performance and model scales. We also quantitatively analyze the reasons for failures in existing LLMs and highlight directions for improvements, including knowledge supplementation and CS-specific reasoning. Further cross-capability experiments show a high correlation between LLMs' capabilities in computer science and their abilities in mathematics and coding. Moreover, expert LLMs specialized in mathematics and coding also demonstrate strong performances in several CS subfields. Looking ahead, we envision CS-Bench serving as a cornerstone for LLM applications in the CS field and paving new avenues in assessing LLMs' diverse reasoning capabilities. The CS-Bench data and evaluation code are available at https://github.com/csbench/csbench. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2403.19441 [pdf, other]

doi 10.1109/CBMS58004.2023.00303

A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews

Authors: Mamadou Dia, Ghazaleh Khodabandelou, Alice Othmani

Abstract: Post-traumatic stress disorder (PTSD) is a mental disorder that can be developed after witnessing or experiencing extremely traumatic events. PTSD can affect anyone, regardless of ethnicity, or culture. An estimated one in every eleven people will experience PTSD during their lifetime. The Clinician-Administered PTSD Scale (CAPS) and the PTSD Check List for Civilians (PCL-C) interviews are gold st… ▽ More Post-traumatic stress disorder (PTSD) is a mental disorder that can be developed after witnessing or experiencing extremely traumatic events. PTSD can affect anyone, regardless of ethnicity, or culture. An estimated one in every eleven people will experience PTSD during their lifetime. The Clinician-Administered PTSD Scale (CAPS) and the PTSD Check List for Civilians (PCL-C) interviews are gold standards in the diagnosis of PTSD. These questionnaires can be fooled by the subject's responses. This work proposes a deep learning-based approach that achieves state-of-the-art performances for PTSD detection using audio recordings during clinical interviews. Our approach is based on MFCC low-level features extracted from audio recordings of clinical interviews, followed by deep high-level learning using a Stochastic Transformer. Our proposed approach achieves state-of-the-art performances with an RMSE of 2.92 on the eDAIC dataset thanks to the stochastic depth, stochastic deep learning layers, and stochastic activation function. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Journal ref: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (2023) 700-705

arXiv:2402.11279 [pdf, other]

Multi-Perspective Consistency Enhances Confidence Estimation in Large Language Models

Authors: Pei Wang, Yejie Wang, Muxi Diao, Keqing He, Guanting Dong, Weiran Xu

Abstract: In the deployment of large language models (LLMs), accurate confidence estimation is critical for assessing the credibility of model predictions. However, existing methods often fail to overcome the issue of overconfidence on incorrect answers. In this work, we focus on improving the confidence estimation of large language models. Considering the fragility of self-awareness in language models, we… ▽ More In the deployment of large language models (LLMs), accurate confidence estimation is critical for assessing the credibility of model predictions. However, existing methods often fail to overcome the issue of overconfidence on incorrect answers. In this work, we focus on improving the confidence estimation of large language models. Considering the fragility of self-awareness in language models, we introduce a Multi-Perspective Consistency (MPC) method. We leverage complementary insights from different perspectives within models (MPC-Internal) and across different models (MPC-Across) to mitigate the issue of overconfidence arising from a singular viewpoint. The experimental results on eight publicly available datasets show that our MPC achieves state-of-the-art performance. Further analyses indicate that MPC can mitigate the problem of overconfidence and is effectively scalable to other models. △ Less

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.09136 [pdf, other]

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Authors: Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu

Abstract: Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce a diverse instruction model (DolphCoder) with self-evaluating for code generation. It learns diverse instruction targets and combines a code eva… ▽ More Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce a diverse instruction model (DolphCoder) with self-evaluating for code generation. It learns diverse instruction targets and combines a code evaluation objective to enhance its code generation ability. Our model achieves superior performance on the HumanEval and MBPP benchmarks, demonstrating new insights for future code instruction tuning work. Our key findings are: (1) Augmenting more diverse responses with distinct reasoning paths increases the code capability of LLMs. (2) Improving one's ability to evaluate the correctness of code solutions also enhances their ability to create it. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: 14 pages, 6 figures

arXiv:2304.05398 [pdf, other]

Forward-backward Gaussian variational inference via JKO in the Bures-Wasserstein Space

Authors: Michael Diao, Krishnakumar Balasubramanian, Sinho Chewi, Adil Salim

Abstract: Variational inference (VI) seeks to approximate a target distribution $π$ by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates $π$ by minimizing the Kullback-Leibler (KL) divergence to $π$ over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-G… ▽ More Variational inference (VI) seeks to approximate a target distribution $π$ by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates $π$ by minimizing the Kullback-Leibler (KL) divergence to $π$ over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures-Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. For our proposed algorithm, we obtain state-of-the-art convergence guarantees when $π$ is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when $π$ is only log-smooth. △ Less

Submitted 10 April, 2023; originally announced April 2023.

arXiv:1909.12160 [pdf, other]

Galaxy Image Simulation Using Progressive GANs

Authors: Mohamad Dia, Elodie Savary, Martin Melchior, Frederic Courbin

Abstract: In this work, we provide an efficient and realistic data-driven approach to simulate astronomical images using deep generative models from machine learning. Our solution is based on a variant of the generative adversarial network (GAN) with progressive training methodology and Wasserstein cost function. The proposed solution generates naturalistic images of galaxies that show complex structures an… ▽ More In this work, we provide an efficient and realistic data-driven approach to simulate astronomical images using deep generative models from machine learning. Our solution is based on a variant of the generative adversarial network (GAN) with progressive training methodology and Wasserstein cost function. The proposed solution generates naturalistic images of galaxies that show complex structures and high diversity, which suggests that data-driven simulations using machine learning can replace many of the expensive model-driven methods used in astronomical data processing. △ Less

Submitted 26 September, 2019; originally announced September 2019.

Comments: Submitted to the Astronomical Data Analysis Software & Systems Conference (ADASS), 2019

arXiv:1812.02537 [pdf, other]

Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

Abstract: Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and pro… ▽ More Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and proven in few specific cases by a variety of methods. Here, we use the spatial coupling methodology developed in the framework of error correcting codes, to rigorously derive the mutual information for the symmetric rank-one case. We characterize the detectability phase transitions in a large set of estimation problems, where we show that there exists a gap between what currently known polynomial algorithms (in particular spectral methods and approximate message-passing) can do and what is expected information theoretically. Moreover, we show that the computational gap vanishes for the proposed spatially coupled model, a promising feature with many possible applications. Our proof technique has an interest on its own and exploits three essential ingredients: the interpolation method first introduced in statistical physics, the analysis of approximate message-passing algorithms first introduced in compressive sensing, and the theory of threshold saturation for spatially coupled systems first developed in coding theory. Our approach is very generic and can be applied to many other open problems in statistical estimation where heuristic statistical physics predictions are available. △ Less

Submitted 6 December, 2018; originally announced December 2018.

Comments: Submitted to Journal of Machine Learning Research (JMLR)

arXiv:1804.00602 [pdf, other]

A Compressed Sensing Approach for Distribution Matching

Authors: Mohamad Dia, Vahid Aref, Laurent Schmalen

Abstract: In this work, we formulate the fixed-length distribution matching as a Bayesian inference problem. Our proposed solution is inspired from the compressed sensing paradigm and the sparse superposition (SS) codes. First, we introduce sparsity in the binary source via position modulation (PM). We then present a simple and exact matcher based on Gaussian signal quantization. At the receiver, the dematc… ▽ More In this work, we formulate the fixed-length distribution matching as a Bayesian inference problem. Our proposed solution is inspired from the compressed sensing paradigm and the sparse superposition (SS) codes. First, we introduce sparsity in the binary source via position modulation (PM). We then present a simple and exact matcher based on Gaussian signal quantization. At the receiver, the dematcher exploits the sparsity in the source and performs low-complexity dematching based on generalized approximate message-passing (GAMP). We show that GAMP dematcher and spatial coupling lead to asymptotically optimal performance, in the sense that the rate tends to the entropy of the target distribution with vanishing reconstruction error in a proper limit. Furthermore, we assess the performance of the dematcher on practical Hadamard-based operators. A remarkable feature of our proposed solution is the possibility to: i) perform matching at the symbol level (nonbinary); ii) perform joint channel coding and matching. △ Less

Submitted 25 November, 2018; v1 submitted 2 April, 2018; originally announced April 2018.

Comments: in the 2018 IEEE International Symposium on Information Theory (ISIT)

arXiv:1707.04203 [pdf, other]

Universal Sparse Superposition Codes with Spatial Coupling and GAMP Decoding

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: Sparse superposition codes, or sparse regression codes, constitute a new class of codes which was first introduced for communication over the additive white Gaussian noise (AWGN) channel. It has been shown that such codes are capacity-achieving over the AWGN channel under optimal maximum-likelihood decoding as well as under various efficient iterative decoding schemes equipped with power allocatio… ▽ More Sparse superposition codes, or sparse regression codes, constitute a new class of codes which was first introduced for communication over the additive white Gaussian noise (AWGN) channel. It has been shown that such codes are capacity-achieving over the AWGN channel under optimal maximum-likelihood decoding as well as under various efficient iterative decoding schemes equipped with power allocation or spatially coupled constructions. Here, we generalize the analysis of these codes to a much broader setting that includes all memoryless channels. We show, for a large class of memoryless channels, that spatial coupling allows an efficient decoder, based on the generalized approximate message-passing (GAMP) algorithm, to reach the potential (or Bayes optimal) threshold of the underlying (or uncoupled) code ensemble. Moreover, we argue that spatially coupled sparse superposition codes universally achieve capacity under GAMP decoding by showing, through analytical computations, that the error floor vanishes and the potential threshold tends to capacity as one of the code parameter goes to infinity. Furthermore, we provide a closed form formula for the algorithmic threshold of the underlying code ensemble in terms of a Fisher information. Relating an algorithmic threshold to a Fisher information has theoretical as well as practical importance. Our proof relies on the state evolution analysis and uses the potential method developed in the theory of low-density parity-check (LDPC) codes and compressed sensing. △ Less

Submitted 8 November, 2018; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: Submitted to the IEEE transactions on information theory

arXiv:1701.05823 [pdf, other]

doi 10.1109/TIT.2020.2990880

Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

Authors: Jean Barbier, Nicolas Macris, Mohamad Dia, Florent Krzakala

Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There has been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these consid… ▽ More We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There has been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-Toninelli type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals. In addition, we prove that the low complexity approximate message-passing algorithm is optimal outside of the so-called hard phase, in the sense that it asymptotically reaches the minimal-mean-square error. In this work spatial coupling is used primarily as a proof technique. However our results also prove two important features of spatially coupled noisy linear random Gaussian estimation. First there is no algorithmically hard phase. This means that for such systems approximate message-passing always reaches the minimal-mean-square error. Secondly, in a proper limit the mutual information associated to such systems is the same as the one of uncoupled linear random Gaussian estimation. △ Less

Submitted 28 August, 2020; v1 submitted 20 January, 2017; originally announced January 2017.

Journal ref: IEEE Transactions on Information Theory, vol. 66, no. 7, pp. 4270-4303, July 2020

arXiv:1701.03590 [pdf, other]

doi 10.1109/ISIT.2017.8006798

Generalized Approximate Message-Passing Decoder for Universal Sparse Superposition Codes

Authors: Erdem Biyik, Jean Barbier, Mohamad Dia

Abstract: Sparse superposition (SS) codes were originally proposed as a capacity-achieving communication scheme over the additive white Gaussian noise channel (AWGNC) [1]. Very recently, it was discovered that these codes are universal, in the sense that they achieve capacity over any memoryless channel under generalized approximate message-passing (GAMP) decoding [2], although this decoder has never been s… ▽ More Sparse superposition (SS) codes were originally proposed as a capacity-achieving communication scheme over the additive white Gaussian noise channel (AWGNC) [1]. Very recently, it was discovered that these codes are universal, in the sense that they achieve capacity over any memoryless channel under generalized approximate message-passing (GAMP) decoding [2], although this decoder has never been stated for SS codes. In this contribution we introduce the GAMP decoder for SS codes, we confirm empirically the universality of this communication scheme through its study on various channels and we provide the main analysis tools: state evolution and potential. We also compare the performance of GAMP with the Bayes-optimal MMSE decoder. We empirically illustrate that despite the presence of a phase transition preventing GAMP to reach the optimal performance, spatial coupling allows to boost the performance that eventually tends to capacity in a proper limit. We also prove that, in contrast with the AWGNC case, SS codes for binary input channels have a vanishing error floor in the limit of large codewords. Moreover, the performance of Hadamard-based encoders is assessed for practical implementations. △ Less

Submitted 13 January, 2017; originally announced January 2017.

arXiv:1607.02335 [pdf, other]

doi 10.1109/ALLERTON.2016.7852290

The Mutual Information in Random Linear Estimation

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala

Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections, a problem relevant in compressed sensing, sparse superposition codes or code division multiple access just to cite few. There has been a number of works considering the mutual information for this problem using the heuristic replica method from statistical physics. Here we put these considera… ▽ More We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections, a problem relevant in compressed sensing, sparse superposition codes or code division multiple access just to cite few. There has been a number of works considering the mutual information for this problem using the heuristic replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields, in particular, a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals. △ Less

Submitted 6 September, 2016; v1 submitted 8 July, 2016; originally announced July 2016.

Comments: Presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016

Journal ref: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Pages: 625 - 632

arXiv:1606.04142 [pdf, other]

Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, Lenka Zdeborova

Abstract: Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows… ▽ More Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically. Additionally, the proof technique has an interest of its own and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available. △ Less

Submitted 13 June, 2016; originally announced June 2016.

Journal ref: Advances in Neural Information Processing Systems 29 (NIPS 2016) pp 424-432

arXiv:1603.04591 [pdf, other]

Threshold Saturation of Spatially Coupled Sparse Superposition Codes for All Memoryless Channels

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: We recently proved threshold saturation for spatially coupled sparse superposition codes on the additive white Gaussian noise channel. Here we generalize our analysis to a much broader setting. We show for any memoryless channel that spatial coupling allows generalized approximate message-passing (GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code ensemble. Moreover in… ▽ More We recently proved threshold saturation for spatially coupled sparse superposition codes on the additive white Gaussian noise channel. Here we generalize our analysis to a much broader setting. We show for any memoryless channel that spatial coupling allows generalized approximate message-passing (GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code ensemble. Moreover in the large input alphabet size limit: i) the GAMP algorithmic threshold of the underlying (or uncoupled) code ensemble is simply expressed as a Fisher information; ii) the potential threshold tends to Shannon's capacity. Although we focus on coding for sake of coherence with our previous results, the framework and methods are very general and hold for a wide class of generalized estimation problems with random linear mixing. △ Less

Submitted 15 March, 2016; originally announced March 2016.

Comments: Submitted to the Information Theory Workshop (ITW) 2016, Cambridge, United Kingdom

arXiv:1603.01817 [pdf, other]

Proof of Threshold Saturation for Spatially Coupled Sparse Superposition Codes

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: Recently, a new class of codes, called sparse superposition or sparse regression codes, has been proposed for communication over the AWGN channel. It has been proven that they achieve capacity using power allocation and various forms of iterative decoding. Empirical evidence has also strongly suggested that the codes achieve capacity when spatial coupling and approximate message passing decoding a… ▽ More Recently, a new class of codes, called sparse superposition or sparse regression codes, has been proposed for communication over the AWGN channel. It has been proven that they achieve capacity using power allocation and various forms of iterative decoding. Empirical evidence has also strongly suggested that the codes achieve capacity when spatial coupling and approximate message passing decoding are used, without need of power allocation. In this note we prove that state evolution (which tracks message passing) indeed saturates the potential threshold of the underlying code ensemble, which approaches in a proper limit the optimal threshold. Our proof uses ideas developed in the theory of low-density parity-check codes and compressive sensing. △ Less

Submitted 6 March, 2016; originally announced March 2016.

Comments: Submitted to the International Symposium on Information Theory (ISIT) 2016, Barcelona, Spain

arXiv:1404.3389 [pdf, other]

doi 10.1371/journal.pone.0094933

Mean-Field Games for Marriage

Authors: Dario Bauso, Ben Mansour Dia, Boualem Djehiche, Hamidou Tembine, Raul Tempone

Abstract: This article examines mean-field games for marriage. The results support the argument that optimizing the long-term well-being through effort and social feeling state distribution (mean-field) will help to stabilize marriage. However, if the cost of effort is very high, the couple fluctuates in a bad feeling state or the marriage breaks down. We then examine the influence of society on a couple us… ▽ More This article examines mean-field games for marriage. The results support the argument that optimizing the long-term well-being through effort and social feeling state distribution (mean-field) will help to stabilize marriage. However, if the cost of effort is very high, the couple fluctuates in a bad feeling state or the marriage breaks down. We then examine the influence of society on a couple using mean field sentimental games. We show that, in mean-field equilibrium, the optimal effort is always higher than the one-shot optimal effort. We illustrate numerically the influence of the couple's network on their feeling states and their well-being. △ Less

Submitted 13 April, 2014; originally announced April 2014.

Comments: 22 figures. Accepted and to appear in PLoS One

Showing 1–16 of 16 results for author: Diao, M