History, development, and principles of large language models: an introductory survey

  • Review
  • Published in AI and Ethics

Abstract

Language models are a cornerstone of natural language processing, using mathematical methods to capture the regularities and knowledge of language for prediction and generation. Over decades of research, language modeling has progressed from early statistical language models to today's large language models (LLMs), which can process, understand, and generate text at a human level. Yet despite the significant advantages LLMs offer in both professional and personal settings, limited understanding among general practitioners of their background and principles prevents the models from reaching their full potential. Moreover, most existing LLM reviews focus on specific aspects and use specialized language, making them difficult to follow for practitioners without the relevant background. This survey therefore aims to present an accessible overview of LLMs for a broader audience. It traces the historical background of language models and their evolution over time, examines the factors that have shaped the development of LLMs while emphasizing key contributions, and explains their underlying principles to equip readers with the essential theoretical knowledge. The survey also highlights the limitations of existing work and points out promising future directions.
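To make concrete what "mathematical methods for prediction and generation" means, the minimal sketch below implements a toy bigram statistical language model, the kind of model at the start of the history the survey traces. It is purely illustrative and not drawn from the paper: the tiny corpus and the `next_word_distribution` and `generate` helpers are invented for this example.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; a real statistical LM would be estimated from billions of tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word (bigram counts).
bigram_counts = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    bigram_counts[prev][curr] += 1

def next_word_distribution(prev):
    """Maximum-likelihood estimate of P(word | prev) from the bigram counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate(start="the", length=8):
    """Generate text by repeatedly sampling the next word from P(word | prev)."""
    words = [start]
    for _ in range(length):
        dist = next_word_distribution(words[-1])
        if not dist:  # no observed continuation for this word
            break
        choices, probs = zip(*dist.items())
        words.append(random.choices(choices, weights=probs)[0])
    return " ".join(words)

print(next_word_distribution("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(generate())                     # e.g. "the dog sat on the rug . the cat"
```

Modern LLMs replace these explicit counts with neural networks, ultimately Transformer-based ones, that estimate the same kind of conditional next-token distribution over far larger vocabularies and contexts.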

Author information

Corresponding author

Correspondence to Wenbin Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Z., Chu, Z., Doan, T.V. et al. History, development, and principles of large language models: an introductory survey. AI Ethics (2024). https://doi.org/10.1007/s43681-024-00583-7

  • DOI: https://doi.org/10.1007/s43681-024-00583-7
