This website hosts transcripts of episodes of AXRP, pronounced axe-urp, short for the AI X-risk Research Podcast. On this podcast, I (Daniel Filan) have conversations with researchers about their research. We discuss their work, and hopefully get a sense of why it’s been done and how it might reduce the risk of artificial intelligence causing an existential catastrophe: that is, permanently and drastically curtailing humanity’s future potential.
You can subscribe to AXRP by searching for it in your favourite podcast provider. To receive transcripts, you can subscribe to this website’s RSS feed. You can also follow AXRP on Twitter at @AXRPodcast.
You can become a patron or donate on Ko-fi.
If you like AXRP, you might also enjoy the game “Guess That AXRP”, which involves guessing which episode a randomly selected sentence has come from.
To leave feedback about the podcast, you can email me at feedback@axrp.net or tell me what you thought about any given episode at axrp.fyi.
Posts
- 46 - Tom Davidson on AI-enabled Coups
- 45 - Samuel Albanie on DeepMind's AGI Safety Approach
- 44 - Peter Salib on AI Rights for Human Safety
- 43 - David Lindner on Myopic Optimization with Non-myopic Approval
- 42 - Owain Evans on LLM Psychology
- 41 - Lee Sharkey on Attribution-based Parameter Decomposition
- 40 - Jason Gross on Compact Proofs and Interpretability
- 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
- 38.7 - Anthony Aguirre on the Future of Life Institute
- 38.6 - Joel Lehman on Positive Visions of AI
- 38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
- 38.4 - Shakeel Hashim on AI Journalism
- 38.3 - Erik Jenner on Learned Look-Ahead
- 39 - Evan Hubinger on Model Organisms of Misalignment
- 38.2 - Jesse Hoogland on Singular Learning Theory
- 38.1 - Alan Chan on Agent Infrastructure
- 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
- 37 - Jaime Sevilla on Forecasting AI
- 36 - Adam Shai and Paul Riechers on Computational Mechanics
- New Patreon tiers + MATS applications
- 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
- 34 - AI Evaluations with Beth Barnes
- 33 - RLHF Problems with Scott Emmons
- 32 - Understanding Agency with Jan Kulveit
- 31 - Singular Learning Theory with Daniel Murfet
- 30 - AI Security with Jeffrey Ladish
- 29 - Science of Deep Learning with Vikrant Varma
- 28 - Suing Labs for AI Risk with Gabriel Weil
- 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
- 26 - AI Governance with Elizabeth Seger
- 25 - Cooperative AI with Caspar Oesterheld
- 24 - Superalignment with Jan Leike
- 23 - Mechanistic Anomaly Detection with Mark Xu
- Survey, Store Closing, Patreon
- 22 - Shard Theory with Quintin Pope
- 21 - Interpretability for Engineers with Stephen Casper
- 20 - 'Reform' AI Alignment with Scott Aaronson
- Store, Patreon, Video
- 19 - Mechanistic Interpretability with Neel Nanda
- New podcast - The Filan Cabinet
- 18 - Concept Extrapolation with Stuart Armstrong
- 17 - Training for Very High Reliability with Daniel Ziegler
- 16 - Preparing for Debate AI with Geoffrey Irving
- 15 - Natural Abstractions with John Wentworth
- 14 - Infra-Bayesian Physicalism with Vanessa Kosoy
- 13 - First Principles of AGI Safety with Richard Ngo
- 12 - AI Existential Risk with Paul Christiano
- 11 - Attainable Utility and Power with Alex Turner
- 10 - AI's Future and Impacts with Katja Grace
- 9 - Finite Factored Sets with Scott Garrabrant
- 8 - Assistance Games with Dylan Hadfield-Menell
- 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra
- 7 - Side Effects with Victoria Krakovna
- 6 - Debate and Imitative Generalization with Beth Barnes
- 5 - Infra-Bayesianism with Vanessa Kosoy
- 4 - Risks from Learned Optimization with Evan Hubinger
- 3 - Negotiable Reinforcement Learning with Andrew Critch
- 2 - Learning Human Biases with Rohin Shah
- 1 - Adversarial Policies with Adam Gleave