Abstract
Conversational large language models (LLMs) such as ChatGPT and GPT-4 have recently exhibited remarkable capabilities across various domains, capturing widespread attention from the public. To facilitate this line of research, in this paper, we report the development of MOSS, an open-sourced conversational LLM that contains 16B parameters and can follow a variety of instructions in multi-turn interactions with humans. The base model of MOSS is pre-trained on large-scale unlabeled English, Chinese, and code data. To optimize the model for dialogue, we generate 1.1M synthetic conversations based on user prompts collected through earlier versions of the model API. We then perform preference-aware training on preference data annotated with AI feedback. Evaluation results on real-world use cases and academic benchmarks demonstrate the effectiveness of the proposed approaches. In addition, we present an effective practice to augment MOSS with several external tools. Through the development of MOSS, we have established a complete technical roadmap for large language models, from pre-training and supervised fine-tuning to alignment, verifying the feasibility of building ChatGPT-like models under resource-limited conditions and providing a reference for both the academic and industrial communities. Model weights and code are publicly available at https://github.com/OpenMOSS/MOSS.
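To make the multi-turn dialogue interface described above concrete, the following minimal Python sketch queries a MOSS checkpoint through Hugging Face Transformers. The checkpoint identifier (fnlp/moss-moon-003-sft), the system prompt, and the turn markers (<|Human|>, <|MOSS|>, <eoh>) are assumptions drawn from the public repository and may differ from the released artifacts; the repository linked above is the authoritative reference.

```python
# Minimal multi-turn inference sketch for MOSS via Hugging Face Transformers.
# The checkpoint name, system prompt, and turn markers below are assumptions;
# see https://github.com/OpenMOSS/MOSS for the released usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "fnlp/moss-moon-003-sft"  # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, device_map="auto"  # device_map needs `accelerate`
)

# Assumed conversation format: a system prompt followed by alternating
# human/assistant turns accumulated across the dialogue.
history = "You are an AI assistant whose name is MOSS.\n"
user_turn = "Explain in one sentence what supervised fine-tuning does."
prompt = history + f"<|Human|>: {user_turn}<eoh>\n<|MOSS|>:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.8
)
# Decode only the newly generated tokens, i.e., the assistant's reply.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)

# For the next turn, append the user turn and `reply` back onto `history`
# so the model conditions on the full multi-turn context.
```

This sketch covers single-model inference only; the tool-augmented variant described in the paper additionally routes the model's intermediate commands to external tools before producing the final reply.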
Change history
17 September 2024
An Erratum to this paper has been published: https://doi.org/10.1007/s11633-024-1527-z
14 September 2024
A Correction to this paper has been published: https://doi.org/10.1007/s11633-024-1527-z
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62022027). We also extend our gratitude to the Shanghai Artificial Intelligence Laboratory, China, for providing the computational resources.
Ethics declarations
The authors declare that they have no conflicts of interest with this work.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Tianxiang Sun received the B. Eng. degree in software engineering from Xidian University, China in 2019. He is currently a Ph.D. degree candidate in the School of Computer Science, Fudan University, China.
His research interests include natural language processing and deep learning.
Xiaotian Zhang received the B. Eng. degree in civil engineering from Tongji University, China in 2021. He received the M. Eng. degree in computer science and technology from Fudan University, China in 2024, under the supervision of Professor Xipeng Qiu.
His research interest is natural language processing.
Zhengfu He received the B. Sc. degree in computer science from Fudan University, China in 2023. He is a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include mechanistic interpretability and large language models.
Peng Li received the B. Eng. degree in data science from East China Normal University, China in 2020. He is now a master student at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interest is foundation models.
Qinyuan Cheng received the B. Eng. degree in computer science from Sun Yat-Sen University, China in 2020. He is a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interest is large language models.
Xiangyang Liu received the B. Eng. degree in intelligence science and technology from Xidian University, China in 2020. He is now a Ph. D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include language model training, efficient methods and AI alignment.
Hang Yan received the B. Eng. degree in electrical engineering and automation from Fudan University, China in 2015, and the M. Eng. degree in electrical engineering from Columbia University, USA in 2017. He is a Ph.D. degree candidate in computer science at Fudan University, China, under the supervision of Professor Xipeng Qiu.
His research interests include large model training, information extraction, and open-source software development.
Yunfan Shao received the B. Sc. and M. Sc. degrees in computer science from Fudan University, China in 2019 and 2022, respectively. He is a Ph.D. degree candidate at Fudan University, China.
His research interest is large language models.
Qiong Tang received the B. Sc. degree in data science from East China Normal University, China in 2022. She is a master student at Fudan University, China, supervised by Professor Xipeng Qiu.
Her research interest is large language models.
Shiduo Zhang received the B. Eng. degree in software engineering from Tongji University, China in 2023. He is now a master student at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include foundation models and embodied AI.
Xingjian Zhao received the B. Sc. degree in artificial intelligence from Fudan University, China in 2024. He is now a master student in computer science at Fudan University, China.
His research interest is large language models.
Ke Chen is an open-source contributor to the OpenMOSS project and the MOSS backend, with an interest in system software. He is now pursuing the Bachelor’s degree in computer science at Fudan University, China.
His research interests include natural language processing and artificial intelligence.
Yining Zheng received the B. Sc. degree in computer science from Fudan University, China in 2019. He is now a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include large language model training and efficient methods.
Zhejian Zhou received the B. Sc. degree in electronic and information science and technology from the School of Electronics Engineering and Computer Science, Peking University, China. He was a visiting student at the Fudan NLP Group. He is currently a Ph.D. degree candidate in computer science at University of Southern California, USA.
His research interests include artificial intelligence and natural language processing.
Ruixiao Li received the B. Sc. degree in computer science from Fudan University, China in 2024. He is now a Ph.D. degree candidate in computer science at Fudan University, China.
His research interest is large language models.
Jun Zhan received the B. Eng. degree in software engineering from Huazhong University of Science and Technology, China in 2022, and is currently a master student in computer science at Fudan University, China.
His research interest is large language models.
Yunhua Zhou received the M. Sc. and Ph.D. degrees in computer science from Fudan University, China in 2019 and 2024, respectively. Currently, he is a researcher at the Shanghai Artificial Intelligence Laboratory, China.
His research interest is large language models.
Linyang Li received the B. Eng. degree in electronic engineering from Fudan University, China in 2019. He is a Ph.D. degree candidate in computer science at Fudan University, China, under the supervision of Professor Xipeng Qiu.
His research interests include large model training and AI safety of large language models.
Xiaogui Yang received the B. Sc. and M. Eng. degrees in computer science from Fudan University, China in 2021 and 2024, respectively. Currently, he is an engineer at the Shanghai Artificial Intelligence Laboratory, China.
His research interest is large language models.
Lingling Wu received the B. Sc. degree in computer science from Shanghai Jiao Tong University, China and the M. Eng. degree in computer science from Fudan University, China in 2021 and 2024, respectively.
Her research interest is natural language processing.
Zhangyue Yin received the B. Sc. degree in data science from East China Normal University, China in 2021. He is now a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu and Professor Xuanjing Huang.
His research interests include large language models and machine reasoning.
Xuanjing Huang received the Ph.D. degree in computer science from Fudan University, China in 1998. She is currently a professor at the School of Computer Science, Fudan University, China.
Her research interests include natural language processing and information retrieval, with a particular emphasis on sentiment analysis, information extraction, pre-trained language models, and the robustness and interpretability of NLP models.
Yu-Gang Jiang received the Ph.D. degree in computer science from City University of Hong Kong, China in 2009. He is Vice President of Fudan University, China, and a Chang Jiang Scholar Distinguished Professor of Computer Science. He is a Fellow of IEEE and IAPR.
His research interests include multimedia, computer vision, and trustworthy AGI.
Xipeng Qiu received the B. Sc. and Ph.D. degrees in computer science from Fudan University, China in 2001 and 2006, respectively. Currently, he is a professor at the School of Computer Science, Fudan University, China.
His research interests include natural language processing and deep learning.
Rights and permissions
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The original version of this article was revised due to a retrospective Open Access order.
About this article
Cite this article
Sun, T., Zhang, X., He, Z. et al. MOSS: An Open Conversational Large Language Model. Mach. Intell. Res. 21, 888–905 (2024). https://doi.org/10.1007/s11633-024-1502-8