MMSR: Symbolic Regression is a Multi-Modal Information Fusion Task

Li, Yanjie; Liu, Jingyi; Li, Weijun; Yu, Lina; Wu, Min; Li, Wenqiang; Hao, Meilan; Wei, Su; Deng, Yusong

doi:10.1016/j.inffus.2024.102681

arXiv:2402.18603 (cs)

[Submitted on 28 Feb 2024 (v1), last revised 19 Sep 2024 (this version, v5)]

Abstract: Mathematical formulas are the crystallization of thousands of years of human effort to understand the laws of nature. Describing complex natural laws with a concise mathematical formula is a constant pursuit of scientists and a great challenge for artificial intelligence. This field is called symbolic regression (SR). Symbolic regression was originally formulated as a combinatorial optimization problem and solved with Genetic Programming (GP) and reinforcement learning algorithms. However, GP is sensitive to hyperparameters, and both types of algorithms are inefficient. To address this, researchers have treated the mapping from data to expressions as a translation problem and introduced corresponding large-scale pre-trained models. However, data and expression skeletons do not have clear word-level correspondences the way two languages do. Instead, they are more like two modalities (e.g., image and text). Therefore, in this paper, we propose MMSR, which solves the SR problem as a purely multi-modal problem and introduces contrastive learning during training to align the modalities and facilitate later modal feature fusion. Notably, to better promote modal feature fusion, we train the contrastive learning loss and the other losses at the same time, which requires only one training stage, instead of training the contrastive learning loss first and the other losses afterwards. Our experiments show that training them together helps the feature extraction module and the feature fusion module adapt to each other better. Experimental results show that, compared with multiple large-scale pre-training baselines, MMSR achieves state-of-the-art results on multiple mainstream datasets, including SRBench. Our code is open source at this https URL
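
The single-stage training idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the symmetric InfoNCE form of the contrastive loss, the token-level cross-entropy standing in for the "other losses", and the names `contrastive_loss`, `token_cross_entropy`, `joint_loss`, `temperature`, and `weight` are all assumptions made for illustration. It only shows how an alignment term and a prediction term might be summed into one objective rather than optimized in two separate stages.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def contrastive_loss(data_emb, expr_emb, temperature=0.1):
    """Assumed symmetric InfoNCE loss: the i-th data embedding should be
    most similar to the i-th expression embedding within the batch."""
    d = [normalize(v) for v in data_emb]
    e = [normalize(v) for v in expr_emb]
    n = len(d)
    sim = [[dot(d[i], e[j]) / temperature for j in range(n)] for i in range(n)]
    sim_t = [list(col) for col in zip(*sim)]
    data_to_expr = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    expr_to_data = -sum(log_softmax(sim_t[i])[i] for i in range(n)) / n
    return 0.5 * (data_to_expr + expr_to_data)

def token_cross_entropy(logits, targets):
    """Mean cross-entropy over predicted tokens (a stand-in for the
    'other losses' mentioned in the abstract)."""
    return -sum(log_softmax(l)[t] for l, t in zip(logits, targets)) / len(targets)

def joint_loss(data_emb, expr_emb, logits, targets, weight=1.0):
    """Single-stage objective: both terms are summed and would be
    optimized together. `weight` is a hypothetical balancing factor."""
    return token_cross_entropy(logits, targets) + weight * contrastive_loss(data_emb, expr_emb)
```

In this sketch, a batch whose data and expression embeddings are paired correctly yields a much smaller contrastive term than a batch whose pairs are swapped, so minimizing `joint_loss` would push the two modalities toward alignment while the prediction term is trained at the same time.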

Comments:	Accepted by Information Fusion
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2402.18603 [cs.LG]
	(or arXiv:2402.18603v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.18603
Related DOI:	https://doi.org/10.1016/j.inffus.2024.102681

Submission history

From: Yanjie Li
[v1] Wed, 28 Feb 2024 08:29:42 UTC (693 KB)
[v2] Sun, 10 Mar 2024 11:17:58 UTC (924 KB)
[v3] Tue, 12 Mar 2024 16:35:25 UTC (946 KB)
[v4] Thu, 14 Mar 2024 12:10:43 UTC (947 KB)
[v5] Thu, 19 Sep 2024 12:30:04 UTC (2,135 KB)
