Abstract
Language models are a cornerstone of natural language processing, using mathematical methods to capture the regularities and knowledge of language for prediction and generation. Over decades of research, language modeling has progressed from early statistical language models to today's large language models (LLMs). Notably, the rapid evolution of LLMs has endowed them with the ability to process, understand, and generate text at a human level. Nevertheless, despite the significant benefits that LLMs bring to both work and daily life, the limited understanding among general practitioners of the background and principles of these models keeps them from being used to their full potential. Moreover, most LLM reviews focus on specific aspects and use specialized language, posing a challenge for practitioners who lack the relevant background. In light of this, the present survey aims to offer a comprehensible overview of LLMs for a broader audience. It facilitates a thorough understanding by exploring the historical background of language models and tracing their evolution over time. The survey further examines the factors that have driven the development of LLMs, emphasizing key contributions. In addition, it concentrates on elucidating the underlying principles of LLMs, equipping readers with the essential theoretical knowledge. Finally, the survey highlights the limitations of existing work and points out promising future directions.