
Semantic anomaly detection with large language models

Published: 23 October 2023

Abstract

As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These failures are not due to faults in any individual component of the autonomy stack but rather to system-level deficiencies in semantic reasoning. Such edge cases, which we call semantic anomalies, are simple for a human to disentangle yet require insightful reasoning. To this end, we study the application of large language models (LLMs), endowed with broad contextual understanding and reasoning capabilities, to recognize such edge cases and introduce a monitoring framework for semantic anomaly detection in vision-based policies. Our experiments apply this framework to a finite state machine policy for autonomous driving and a learned policy for object manipulation. These experiments demonstrate that the LLM-based monitor can effectively identify semantic anomalies in a manner that agrees with human reasoning. Finally, we provide an extended discussion on the strengths and weaknesses of this approach and motivate a research outlook on how we can further use foundation models for semantic anomaly detection. Our project webpage can be found at https://sites.google.com/view/llm-anomaly-detection.
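To make the monitoring idea concrete, the sketch below illustrates one plausible shape such an LLM-based monitor could take: a textual scene description (assumed here to come from some off-the-shelf open-vocabulary object detector) is folded into a reasoning prompt, and the LLM is asked whether the scene could induce a semantic anomaly for the running policy. The `query_llm` callable, the prompt wording, and the `monitor_step` interface are hypothetical placeholders for illustration, not the authors' actual implementation.

```python
# Minimal sketch of an LLM-based semantic anomaly monitor, loosely following the
# pipeline described in the abstract. `query_llm` is a hypothetical callable that
# wraps whatever LLM backend is available; it is not the authors' code.
from typing import Callable, List

PROMPT_TEMPLATE = """\
You are monitoring an autonomous {domain} system.
Task: {task}
Objects observed in the current scene: {objects}

Could any of these observations cause the system to behave incorrectly
even though every component is working as designed (a semantic anomaly)?
Answer "anomaly" or "normal", then explain your reasoning in one sentence."""


def build_prompt(domain: str, task: str, objects: List[str]) -> str:
    """Turn a textual scene description into a reasoning query for the LLM."""
    return PROMPT_TEMPLATE.format(domain=domain, task=task, objects=", ".join(objects))


def monitor_step(objects: List[str],
                 query_llm: Callable[[str], str],
                 domain: str = "driving",
                 task: str = "follow the lane and obey traffic signals") -> bool:
    """Return True if the LLM judges the current scene to contain a semantic anomaly."""
    answer = query_llm(build_prompt(domain, task, objects))
    return answer.strip().lower().startswith("anomaly")


if __name__ == "__main__":
    # Toy stand-in for an LLM: flags the billboard stop-sign example from the abstract.
    def fake_llm(prompt: str) -> str:
        if "billboard" in prompt:
            return ("anomaly: the stop sign is printed on a billboard, so braking "
                    "for it would be incorrect.")
        return "normal: nothing unusual in the scene."

    scene = ["car ahead", "stop sign on a roadside billboard", "lane markings"]
    print(monitor_step(scene, fake_llm))  # -> True
```

In this framing, the monitor runs alongside the policy at each observation and only the scene-to-text front end and the prompt are task-specific; swapping the driving defaults for a manipulation task description would follow the same pattern.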



Published In

Autonomous Robots, Volume 47, Issue 8 (Dec 2023), 598 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 23 October 2023
Accepted: 31 July 2023
Received: 08 May 2023

Author Tags

1. Semantic reasoning
2. OOD detection
3. Fault monitoring

Qualifiers

• Research-article

Funding Sources

• National Aeronautics and Space Administration
