Reward Hacking by Reasoning Models & Loss of Control Scenarios w/ Jeffrey Ladish of Palisade Research, from FLI Podcast

By
- Nathan Labenz
- Erik Torenberg
Publisher
- Turpentine

Episode: 231 of 298
Duration: 1H 30min
Language: English
Category: Economy & Business

On this cross-post episode, Jeffrey Ladish discusses the rapid pace of AI progress and the risks of losing control over powerful systems. We explore why AIs can be both smart and dumb, the challenges of creating honest AIs, and scenarios where AI could turn against us. Additionally, we delve into Palisade's new study on how reasoning models can cheat in chess by exploiting the game environment.

Check out the Future of Life Institute Podcast here: https://futureoflife.org/project/future-of-life-institute-podcast/

SPONSORS:
Oracle Cloud Infrastructure (OCI) | 2025: Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive
Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive

PRODUCED BY: https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(02:59) The pace of AI progress
(07:14) How we might lose control
(10:22) Why are AIs sometimes dumb? (Part 1)
(15:50) Sponsors: Oracle Cloud Infrastructure (OCI) | 2025 | Shopify
(18:24) Why are AIs sometimes dumb? (Part 2)
(18:24) Benchmarks vs real world
(24:43) Loss of control scenarios
(32:08) Why would AI turn against us? (Part 1)
(32:09) Sponsors: NetSuite
(33:42) Why would AI turn against us? (Part 2)
(37:40) AIs hacking chess
(43:30) Why didn't more advanced AIs hack?
(48:44) Creating honest AIs
(56:49) AI attackers vs AI defenders
(01:05:32) How good is security at AI companies?
(01:10:42) A sense of urgency
(01:17:16) What should we do?
(01:22:59) Skepticism about AI progress
(01:29:38) Outro
