This website hosts transcripts of episodes of AXRP, pronounced axe-urp, short for the AI X-risk Research Podcast. On this podcast, I (Daniel Filan) have conversations with researchers about their research. We discuss their work and hopefully get a sense of why it was done and how it might reduce the risk of artificial intelligence causing an existential catastrophe: that is, an event that permanently and drastically curtails humanity’s future potential. This podcast launched in December 2020. As of October 2024, it is edited by Kate Brunotts, and as of August 2022, Amber Dawn Ace helps with transcription.
You can subscribe to AXRP by searching for it in your favourite podcast provider. To receive transcripts, you can subscribe to this website’s RSS feed. You can also follow AXRP on Twitter at @AXRPodcast. If you’d like to support the podcast, see this page for how to do so.
You can become a patron or donate on Ko-fi.
If you like AXRP, you might also enjoy the game “Guess That AXRP”, which involves guessing which episode a randomly selected sentence has come from.
To leave feedback about the podcast, you can email me at feedback@axrp.net or leave an anonymous note at this link.
Posts
39 - Evan Hubinger on Model Organisms of Misalignment
38.2 - Jesse Hoogland on Singular Learning Theory
38.1 - Alan Chan on Agent Infrastructure
38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
37 - Jaime Sevilla on Forecasting AI
36 - Adam Shai and Paul Riechers on Computational Mechanics
New Patreon tiers + MATS applications
35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
34 - AI Evaluations with Beth Barnes
33 - RLHF Problems with Scott Emmons
32 - Understanding Agency with Jan Kulveit
31 - Singular Learning Theory with Daniel Murfet
30 - AI Security with Jeffrey Ladish
29 - Science of Deep Learning with Vikrant Varma
28 - Suing Labs for AI Risk with Gabriel Weil
27 - AI Control with Buck Shlegeris and Ryan Greenblatt
26 - AI Governance with Elizabeth Seger
25 - Cooperative AI with Caspar Oesterheld
24 - Superalignment with Jan Leike
23 - Mechanistic Anomaly Detection with Mark Xu
Survey, Store Closing, Patreon
22 - Shard Theory with Quintin Pope
21 - Interpretability for Engineers with Stephen Casper
20 - 'Reform' AI Alignment with Scott Aaronson
Store, Patreon, Video
19 - Mechanistic Interpretability with Neel Nanda
New podcast - The Filan Cabinet
18 - Concept Extrapolation with Stuart Armstrong
17 - Training for Very High Reliability with Daniel Ziegler
16 - Preparing for Debate AI with Geoffrey Irving
15 - Natural Abstractions with John Wentworth
14 - Infra-Bayesian Physicalism with Vanessa Kosoy
13 - First Principles of AGI Safety with Richard Ngo
12 - AI Existential Risk with Paul Christiano
11 - Attainable Utility and Power with Alex Turner
10 - AI's Future and Impacts with Katja Grace
9 - Finite Factored Sets with Scott Garrabrant
8 - Assistance Games with Dylan Hadfield-Menell
7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra
7 - Side Effects with Victoria Krakovna
6 - Debate and Imitative Generalization with Beth Barnes
5 - Infra-Bayesianism with Vanessa Kosoy
4 - Risks from Learned Optimization with Evan Hubinger
3 - Negotiable Reinforcement Learning with Andrew Critch
2 - Learning Human Biases with Rohin Shah
1 - Adversarial Policies with Adam Gleave
subscribe via RSS