DOI: 10.1145/3638550.3641130
Research article

Penetrative AI: Making LLMs Comprehend the Physical World

Published: 28 February 2024

Abstract

Recent developments in Large Language Models (LLMs) have demonstrated their remarkable capabilities across a range of tasks. Questions persist, however, about the nature of LLMs and their potential to integrate common-sense human knowledge when performing tasks that involve information about the real physical world. This paper delves into these questions by exploring how LLMs can be extended to interact with and reason about the physical world through IoT sensors and actuators, a concept that we term "Penetrative AI". The paper explores such an extension at two levels of LLMs' ability to penetrate into the physical world via the processing of sensory signals. Our preliminary findings indicate that LLMs, with ChatGPT being the representative example in our exploration, have considerable and unique proficiency in employing their embedded world knowledge to interpret IoT sensor data and to reason over it about tasks in the physical realm. Not only does this open up new applications for LLMs beyond traditional text-based tasks, but it also enables new ways of incorporating human knowledge in cyber-physical systems.
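To make the abstract's core idea concrete, the sketch below serializes a few smartphone sensor readings into a natural-language prompt that a text-only LLM could reason over. This is a minimal illustration of the concept, not code from the paper: the function name, parameter names, sensor values, and prompt wording are all illustrative assumptions.

```python
from textwrap import dedent

def build_sensor_prompt(accel_mag: float, satellite_count: int, wifi_ap_count: int) -> str:
    """Serialize smartphone sensor readings into a natural-language prompt.

    All names and wording here are hypothetical; the paper's actual
    prompts are not reproduced. The idea is only that digitized sensor
    signals become text an LLM can apply world knowledge to.
    """
    return dedent(f"""\
        You are analyzing smartphone sensor data to infer the user's context.
        - Accelerometer magnitude (m/s^2, gravity removed): {accel_mag}
        - Visible GNSS satellites: {satellite_count}
        - Wi-Fi access points in latest scan: {wifi_ap_count}
        Question: Is the user likely indoors or outdoors, and moving or still?
        Answer briefly with a justification.""")

# Few visible satellites plus many Wi-Fi APs would hint at an indoor setting.
prompt = build_sensor_prompt(accel_mag=0.3, satellite_count=2, wifi_ap_count=14)
print(prompt)
```

The prompt string would then be sent to an LLM chat endpoint; that call is omitted here since it depends on the chosen provider and API.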





Published In

HOTMOBILE '24: Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications
February 2024
167 pages
ISBN: 9798400704970
DOI: 10.1145/3638550
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. LLM
  2. CPS
  3. IoT
  4. penetrative AI

Qualifiers

  • Research-article

Funding Sources

  • Global STEM Professorship Scheme of Hong Kong and HKUST start-up grant
  • Singapore NRF Investigatorship

Conference

HOTMOBILE '24

Acceptance Rates

Overall acceptance rate: 96 of 345 submissions (28%)

Article Metrics

  • Downloads (last 12 months): 990
  • Downloads (last 6 weeks): 88
Reflects downloads up to 27 Jan 2025

Cited By

  • (2024) Generative KI zur No-/Low-Code-Wissensverarbeitung. Zeitschrift für wirtschaftlichen Fabrikbetrieb 119, 11 (840-844). DOI: 10.1515/zwf-2024-1155. Published: 18-Nov-2024.
  • (2024) DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 4 (1-29). DOI: 10.1145/3699765. Published: 21-Nov-2024.
  • (2024) PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 4 (1-26). DOI: 10.1145/3699759. Published: 21-Nov-2024.
  • (2024) Artificial Intelligence of Things: A Survey. ACM Transactions on Sensor Networks. DOI: 10.1145/3690639. Published: 30-Aug-2024.
  • (2024) Enabling On-Device LLMs Personalization with Smartphone Sensing. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing (186-190). DOI: 10.1145/3675094.3677545. Published: 5-Oct-2024.
  • (2024) Integration of LLMs and the Physical World: Research and Application. Proceedings of the ACM Turing Award Celebration Conference - China 2024 (1-5). DOI: 10.1145/3674399.3674402. Published: 5-Jul-2024.
  • (2024) Improving On-Device LLMs' Sensory Understanding with Embedding Interpolations. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (1674-1676). DOI: 10.1145/3636534.3697456. Published: 4-Dec-2024.
  • (2024) MELTing Point: Mobile Evaluation of Language Transformers. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (890-907). DOI: 10.1145/3636534.3690668. Published: 4-Dec-2024.
  • (2024) Learning Domain-Invariant Model for WiFi-Based Indoor Localization. IEEE Transactions on Mobile Computing 23, 12 (13898-13913). DOI: 10.1109/TMC.2024.3438454. Published: Dec-2024.
  • (2024) LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces. 2024 IEEE 3rd Workshop on Machine Learning on Edge in Sensor Systems (SenSys-ML) (9-14). DOI: 10.1109/SenSys-ML62579.2024.00007. Published: 13-May-2024.
