106 min listen
Prof. Subbarao Kambhampati - LLMs don't reason, they memorize (ICML2024 2/13)
Length:
102 minutes
Released:
Jul 29, 2024
Format:
Podcast episode
Description
Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Refs
Can LLMs Really Reason and Plan?
https://cacm.acm.org/blogcacm/can-llms-really-reason-and-plan/
On the Planning Abilities of Large Language Models : A Critical Investigation
https://arxiv.org/pdf/2305.15771
Chain of Thoughtlessness? An Analysis of CoT in Planning
https://arxiv.org/pdf/2405.04776
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
https://arxiv.org/pdf/2402.08115
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
https://arxiv.org/pdf/2402.01817
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
https://arxiv.org/pdf/2309.13638
"Task Success" is not Enough
https://arxiv.org/abs/2402.04210
Partition function (number theory) (Srinivasa Ramanujan and G.H. Hardy's work)
https://en.wikipedia.org/wiki/Partition_function_(number_theory)
Poincaré conjecture
https://en.wikipedia.org/wiki/Poincar%C3%A9_conjecture
Gödel's incompleteness theorems
https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems
ROT13 (Rotate13, "rotate by 13 places")
https://en.wikipedia.org/wiki/ROT13
A Mathematical Theory of Communication (C. E. SHANNON)
https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
Sparks of AGI
https://arxiv.org/abs/2303.12712
Kambhampati thesis on speech recognition (1983)
https://rakaposhi.eas.asu.edu/rao-btech-thesis.pdf
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
https://arxiv.org/abs/2206.10498
Explainable human-AI interaction
https://link.springer.com/book/10.1007/978-3-031-03767-2
Tree of Thoughts
https://arxiv.org/abs/2305.10601
On the Measure of Intelligence (ARC Challenge)
https://arxiv.org/abs/1911.01547
Getting 50% (SoTA) on ARC-AGI with GPT-4o (Ryan Greenblatt ARC solution)
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
PROGRAMS WITH COMMON SENSE (John McCarthy) - "AI should be an advice taker program"
https://www.cs.cornell.edu/selman/cs672/readings/mccarthy-upd.pdf
Original chain of thought paper
https://arxiv.org/abs/2201.11903
ICAPS 2024 Keynote: Dale Schuurmans on "Computing and Planning with Large Generative Models" (COT)
https://www.youtube.com/watch?v=YnMqbpdHcaY
The Hardware Lottery (Hooker)
https://arxiv.org/abs/2009.06489
A Path Towards Autonomous Machine Intelligence (JEPA/LeCun)
https://openreview.net/pdf?id=BZ5a1r-kVsf
AlphaGeometry
https://www.nature.com/articles/s41586-023-06747-5
FunSearch
https://www.nature.com/articles/s41586-023-06924-6
Emergent Abilities of Large Language Models
https://arxiv.org/abs/2206.07682
Language models are not naysayers (Negation in LLMs)
https://arxiv.org/abs/2306.08189
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
https://arxiv.org/abs/2309.12288
Embracing negative results
https://openreview.net/forum?id=3RXAiU7sss
Titles in the series (100)
Robert Lange on NN Pruning and Collective Intelligence by Machine Learning Street Talk (MLST)