
Semantic anomaly detection with large language models

Published: 23 October 2023

Abstract

As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These failures are not due to faults in any individual component of the autonomy stack but rather to system-level deficiencies in semantic reasoning. Such edge cases, which we call semantic anomalies, are simple for a human to disentangle yet require insightful reasoning. To this end, we study the application of large language models (LLMs), endowed with broad contextual understanding and reasoning capabilities, to recognize such edge cases and introduce a monitoring framework for semantic anomaly detection in vision-based policies. Our experiments apply this framework to a finite state machine policy for autonomous driving and a learned policy for object manipulation. These experiments demonstrate that the LLM-based monitor can effectively identify semantic anomalies in a manner that agrees with human reasoning. Finally, we provide an extended discussion on the strengths and weaknesses of this approach and motivate a research outlook on how we can further use foundation models for semantic anomaly detection. Our project webpage can be found at https://sites.google.com/view/llm-anomaly-detection.
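To make the monitoring idea concrete, the sketch below illustrates one plausible shape such an LLM-based monitor could take: a textual scene description (assumed here to come from some off-the-shelf open-vocabulary object detector) is folded into a reasoning prompt, and the LLM is asked whether the scene could induce a semantic anomaly for the running policy. The `query_llm` callable, the prompt wording, and the `monitor_step` interface are hypothetical placeholders for illustration, not the authors' actual implementation.

```python
# Minimal sketch of an LLM-based semantic anomaly monitor, loosely following the
# pipeline described in the abstract. `query_llm` is a hypothetical callable that
# wraps whatever LLM backend is available; it is not the authors' code.
from typing import Callable, List

PROMPT_TEMPLATE = """\
You are monitoring an autonomous {domain} system.
Task: {task}
Objects observed in the current scene: {objects}

Could any of these observations cause the system to behave incorrectly
even though every component is working as designed (a semantic anomaly)?
Answer "anomaly" or "normal", then explain your reasoning in one sentence."""


def build_prompt(domain: str, task: str, objects: List[str]) -> str:
    """Turn a textual scene description into a reasoning query for the LLM."""
    return PROMPT_TEMPLATE.format(domain=domain, task=task, objects=", ".join(objects))


def monitor_step(objects: List[str],
                 query_llm: Callable[[str], str],
                 domain: str = "driving",
                 task: str = "follow the lane and obey traffic signals") -> bool:
    """Return True if the LLM judges the current scene to contain a semantic anomaly."""
    answer = query_llm(build_prompt(domain, task, objects))
    return answer.strip().lower().startswith("anomaly")


if __name__ == "__main__":
    # Toy stand-in for an LLM: flags the billboard stop-sign example from the abstract.
    def fake_llm(prompt: str) -> str:
        if "billboard" in prompt:
            return ("anomaly: the stop sign is printed on a billboard, so braking "
                    "for it would be incorrect.")
        return "normal: nothing unusual in the scene."

    scene = ["car ahead", "stop sign on a roadside billboard", "lane markings"]
    print(monitor_step(scene, fake_llm))  # -> True
```

In this framing, the monitor runs alongside the policy at each observation and only the scene-to-text front end and the prompt are task-specific; swapping the driving defaults for a manipulation task description would follow the same pattern.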



Published In

Autonomous Robots, Volume 47, Issue 8 (Dec 2023), 598 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 23 October 2023
Accepted: 31 July 2023
Received: 08 May 2023

Author Tags

1. Semantic reasoning
2. OOD detection
3. Fault monitoring

Qualifiers

• Research-article

Funding Sources

• National Aeronautics and Space Administration
