AI Safety Policy Can’t Go On Like This
A changed political gameboard means the 2023 playbook for safety policy is obsolete. Here’s what not to do next.
In 1985, Intel was not doing very well. Their share of the DRAM market had plummeted – but DRAM was Intel’s wheelhouse: their first major innovation, their first big driver of profits, their first stake in the rapidly growing computer industry. CEO Andy Grove saw the writing on the wall and pivoted the company away from DRAM to CPUs. Intel survived and flourished. DRAM’s place at the heart of Intel had long been a self-evident truth, but as Grove was later quoted, “One of the toughest challenges is to make people see that these self-evident truths are no longer true.”
Today, AI safety policy is not doing very well. World leaders seem to have – partly – woken up to the fact that AI is going to be a somewhat big deal; but they’re very much not drawing safetyist conclusions from that insight. With the stakes becoming more and more real, governments are becoming more and more fierce in the pursuit of their ‘place in the sun’. They scratched the inscriptions off the eponymous safety summits, condemned the very notion of safety policy publicly, and pretty much shut the door on any hope of spinning the fleeting victories of 2023 into a lasting international consensus. I believe this is not just a temporary setback, but points to a much-needed strategic realignment.
That is a tough pill to swallow. Capability growth is rapid, realignment takes time, crunch-time is now – a bad time to face fundamental reconsiderations. And it’s not like the old safetyist playbook never worked: there have been impressive successes in the first years of safety policy. It’s a natural reaction to setbacks to think that one just ought to hunker down and redouble one’s efforts. But that wouldn’t do. It’s also a natural reaction to think that the brand is now marred, and that the challenge is primarily one of rebranding – say, to AI security instead. But these interpretations miss some more fundamental shifts that have hurt the safety pitch. Understanding what used to work helps pin down what needs to change.
The End of the 2023 Playbook
The success stories of AI safety policy have been stories of shaping people’s first encounters with AI in a way that nudges them toward safetyist conclusions. Perhaps the most notable example is Rishi Sunak, whose repeated encounters with high-profile safety advocates led to two of the most notable early successes of AI safety policy: the founding of the UK AISI and subsequent AISIs in other jurisdictions, and the AI Safety Summit that produced the commitments made at Bletchley and Seoul and a comprehensive scientific report. Lining up independent experts and having them talk to major decisionmakers is, in the words of Andy Grove, a self-evident part of the safety movement’s identity. For two reasons, I think the time for that blueprint is over, at least in the major AI jurisdictions.
No More First Impressions
First, people now know about AI, so the strategy of being the first to know what’s going on doesn’t work anymore. Early on, the safety pitch was a package deal: policymakers would receive sound, well-delivered information from the world’s leading experts on an issue they urgently wanted to get up to speed on. Alongside that information, safety advocates delivered cautionary notes on the risks of AI, and on how easy it was to fall off the glorious paths they sketched out for these policymakers. That worked well, because qualified explanations of AI were in short supply and high demand; those who could offer them had some creative license for the policy platforms they wanted to attach.
It works less well now, both because policymakers genuinely do know more, and because they think they know even more about AI than they do. I think it’s fair to argue that many politicians still don’t really get what’s coming. But in 2023, politicians were conscious of their ignorance – they were happy to admit they didn’t know much about AI and needed external input. This is much less true now: there’s much less demand for some smart expert to come along and provide some noncontroversial information on what this new technology is about.
The political advice one can reasonably give to decisionmakers on AI is now much closer to other kinds of political advocacy: it consists of more opinionated pitches for one policy or another, for one particular approach or perspective; it needs alliances and political capital; and it is less likely to find shortcuts to the top. That’s a much worse position for expert-driven advocacy to be in – anyone who has worked in lobbying or advocacy before knows what I mean: it’s much more frustrating to brief a politician on an issue they mistakenly consider themselves an expert on.
No More Neutral Experts
Second, policymakers are already primed on the mainstays of the AI policy debate. 2023’s safety advocates were conducting political framing on a blank piece of paper: if you had to convince a conservative, you’d make your case about national security; if you had to convince a liberal, you’d make your case about reining in big tech. The situation is different today. AI safety already has a reputation as a Biden-era instance of tech regulation – and a good framing only gets you so far once that’s the case. Again, this is a somewhat commonplace experience in policy advocacy: the policymaker’s adversarial reflex of ‘wait, but isn’t this exactly what the other guys wanted last year’ reliably beats out clever and meritorious reframings.
And not only the policies, but also the high-profile experts who advocate for them have reputations and affiliations by now. Finding out which side of the highly politicized SB-1047 debate someone was on is just one Google search away, and that is enough information to discredit many AI safety advocates in the eyes of a highly polarized environment. Even policymakers who haven’t been privy to the debate so far will be quick to ask their aides for a basic rundown of the safety advocates they talk to nowadays, and the briefings they get will often no longer be conducive to the pitch of ‘decidedly neutral information’.
In many ways, the safety movement finds itself cast in the often-evoked story of the team in the locker room, down 7-21 at halftime of the Super Bowl – just realising that the strategy that got them there won’t get them across the finish line. They’re closer to winning than most will ever get. But to do it, they’ll have to pivot, fast.
Safety Policy Lacks Political Leverage
So what’s keeping the safety movement from scoring? External reasons are a genuine part of the answer. The vibe swung against Democrats and against tech regulation, and safety policy was associated with both. But there are also fundamental shortcomings in how safety policy fits the current political environment. Now that a lot of information is out there and the issue is sufficiently salient, AI policymaking is about motivating policymakers to prioritize your view of the issue. I believe that safety policy currently does not pull on enough of the levers that affect politicians’ priorities.
No Enduring Public Support
The first lever is vocal public support. The standard response to this is that AI safety polls well – if you ask people whether they support safety-focused regulation, they tend to say yes. But that’s not enough: if you ask people to prioritize the issues they care about, AI safety is nowhere to be seen. And neither is any broader issue that could reasonably be construed as related to AI safety; not even terrorism or disempowerment scores particularly highly. The perceived counter-positions to AI safety, on the other hand, can leverage popular support through their association with salient issue clusters: acceleration as a boost to the economy, deregulation as slashing bureaucracy, racing as cementing American supremacy. To a politician, this does not make for a compelling case to prioritise AI safety measures at all. Despite nominal approval for safety measures, safety policy is not yet a vote winner.

This is somewhat downstream of a strategic design decision made at the start of not only the AI safety initiative, but really the broader effective altruism environment it originated from: the decision to be an elite movement, with little reach or pitch to the general public. The few notable exceptions to that strategy have either backfired or are conducted by fringe actors that seem to reject coherent message discipline. This would not be an easy pivot; the very structure of the organisations, the style of messaging, the strengths of leadership and the content of the arguments that arose downstream of the elite-movement pitch are very difficult to translate into a more public-facing initiative.
No Partisan Platform
One way to avoid having to rally this kind of grassroots support is the second lever: strong platform affiliation. If a position is interwoven with the political platform of a major party, it’ll have its day in the sun eventually: majorities shift, and that means eventual progress on the issue. Unless an area is extraordinarily politicised, rolling policy back turns out to be much harder than passing it. But even though the safety platform has suffered from the backlash against Democrat policies, it hasn’t found solid purchase with the Democrat platform. Neither did any notable Democrat campaign on their achievements in AI regulation, nor does the Democratic environment seem particularly shocked about the Trump administration’s turn away from safety.
There has been notably little Democrat response to Vice President Vance’s Paris remarks disavowing safety as the headline term for the AI summits, notably few efforts to protect the AISI and associated executive orders, and in my eyes, there’s notably little hope that Democrats will make AI safety a priority in 2026 or 2028. It certainly hasn’t been a mainstay of their post-inauguration communications so far. Even former FTC Chair Lina Khan’s post-exit interviews have generally not broached the AI topic from a safetyist point of view. Safety policy finds itself in the worst of both worlds: It’s Dem-coded enough to suffer political retaliation, but not Dem-coded enough to be enacted on the coattails of general Democratic victories.
No Majority in Secret Congress
A third lever, an alternative to partisan affiliation, is low-salience technocratic appeal. If you can pitch a slightly framing-adjusted version of your policy agenda to both political sides, you don’t need to get deep into politics for the agenda to succeed: you can fly under the radar and get your pitch through Secret Congress. But for reasonable bipartisan fit without the political salience described above, you need your policy of choice to be cheap. If your policy is costly, there is ample incentive for one side to drag it out of Secret Congress, frame it as partisan, and score quick political points.
Many demands of AI safety policy are not cheap. They can be construed as expensive in terms of money, of course – consolidated alignment research, the counterfactual loss of potential growth from slowing down (though of course preventing disaster and extinction would be net positive, but that’s not the point here), losing a race to China – all of that seems burdensome. But much more importantly, AI safety policy is politically costly. As I mentioned above, AI safety policy – especially under its current framings and political affiliations – comes at a steep political price, because it is perceived as inconsistent with the deregulation agenda, as geopolitically risky, and as Democrat-coded in the times of a massively unpopular Democratic party.
The Political Economy Might Not Fix Itself
This political economy does not suit the safety platform well. It might get better by itself, but that’s not guaranteed. Many (contentious) political causes elsewhere benefit when the risks they focus on manifest: natural disasters strengthen the case for environmental causes, foreign terrorism tends to catalyze migration crackdowns, and after Three Mile Island, political support for nuclear power dropped dramatically. The political economy of many prosaic risks fixes itself through escalating harms that force action. This is not necessarily true for AI, which could go well right up until it goes really badly.
Of course, avoidable instances of large-scale AI misuse or loss of control can also shift the political economy back. But a disaster ‘wake-up call’ is – fortunately – far from guaranteed to happen. Less fortunately, it might just be that the first instance of an AI safety accident instead has catastrophic proportions. We don’t know how much of a political economy will be left after a wake-up call, which doesn’t make waiting for it a very compelling strategy. We might well have to zero-shot learn about the importance of safety.
What’s the strategy if pre-AGI systems continue to be pretty safe and beneficial? I think proactive – and thereby necessarily somewhat more controversial – thoughts on that question are best kept separate from a post like this, which primarily aims to give a lay of the land. But at the start of this post, I said that AI safety policy cannot go on like this, and I want to clarify what I mean by ‘going on like this’. Looking ahead, I think there are two specific failure modes the safety movement could be prone to steering into: reframing without reconsidering, and taking poisoned wins. It’s worth naming them in hopes of avoiding them.
Don’t Reframe Without Reconsidering
The easiest trap to run into is reframing the movement without reconsidering the agenda. There’s been a growing consensus that ‘learning to speak Republican’ is advisable, and that maybe reframing AI safety, e.g. as AI security, could do the trick. But I think people aren’t that stupid. Pitching the very same policy agenda with a new coat of paint is not going to work for something that will be vetted at the very highest levels of scrutiny and sophistication – and AI policy will soon be at that level as it becomes more central to economic and foreign policy.
So if Paul, head of AI security, goes to Congress and argues in favour of limitations on open-sourcing and cautions against loss of control, people will quickly realise he looks a lot like Saul, head of AI safety, who had been a mainstay of the last years of AI policy. And at that point, he needs a very good answer to ‘so what’s the difference between AI safety and AI security’, and it cannot be ‘security sounds great to Republicans’. Sure, these questions will come up less and less over time, and once there’s been a Center for AI Security for a while, the reframing might take root. But ask any safety advocate, and they’ll tell you that ‘this will eventually work, but it’ll take time’ is not remotely good enough.
Put a bit less facetiously: at the levels of policymaking targeted here, reframing and renaming are futile without a substantive shift in agenda. If and only if you do conduct a shift in agenda, renaming can be an effective signifier of that shift. But the shift has to come first. And relative to the number of conversations on how to signal a shift, I see very little discussion of what the actual shift should be. Which 2023-era policy recommendations are safety advocates actually willing to give up? For instance, what about the worries around securitisation and race dynamics – if those remain a prevalent part of the safety platform, it might be more of a stretch to call it compatible with an AI security pivot.
Simply adding secure datacenters and hawkish export controls to the existing platform has not been enough to remove the political baggage ‘AI safety’ has been saddled with, so tougher calls might have to be made. Returning to the earlier discussion: bipartisan fit has to be bought; it can’t just be manifested. If AI safety wants to talk the Republican talk, it will have to walk the walk of making its platform much more palatable to the party’s politics as well.
Don’t Drink From The Quick Win Chalice
The second trap is taking politically poisoned wins. Some tempting opportunities to get strong safety language into legislation are coming up, and there are real incentives to take them: a sense of success, of getting something done; an achievement to show funders and to sustain morale; and a precedent that might be leveraged in the future. I see three salient ways to do that:
to ensure maximally comprehensive enforcement of whatever safety-influenced agreements or legislation still remain, whether in Europe, in the shape of international agreements, or in the institutional context of the AISIs;
to ride the wave of state-level backlash and corresponding US state legislation, e.g. by organisationally and publicly supporting broader state AI bills or by gearing up for another SB-1047-style fight in the California legislature soon;
or to jump on opportunities for new cross-cutting coalitions that might yield myopic wins, like allying with economic incumbents against labour market impacts or with environmentalists against data center construction.
Affiliation with and support of such measures would mean saddling the safety movement with all their idiosyncratic weaknesses, inviting even more of an intellectual immune response to AI safety, providing more ammunition for adversaries of safety policy, and spelling even greater difficulty in executing any meaningful agenda pivot in the future. Every time an unpopular stipulation that made it through a state assembly on a large omnibus AI bill makes headlines because it sent three start-ups packing, the ‘safety movement’ could be named as the culprit. Whenever ‘anti-woke’, anti-regulation bloodhounds scour the books for AI legislation, it’d be easy for them to pin its most absurd instances on the safety movement. Whenever the unstoppable march of progress inevitably displaces some sector of the labour market anyway, the safety movement will be on the books as decelerators. And when the next big fight for AI regulation rolls around, opponents of safety policy will make PowerPoint slides with screenshotted headlines of all the times letting the safetyists in purportedly led to dramatically unpopular excesses of left-wing policy.
The more this happens, the harder it will be to eventually execute a pivot like the one sketched above. And there’s not enough reward to warrant taking that risk: it seems very unlikely that Texas legislation will meaningfully shape the nature of the singularity. Furthermore, given the power of the major AI developers and the extent of federal backing they currently receive, it seems unlikely that even the most comprehensive legislation in auxiliary jurisdictions can move the needle on the substance of what happens, as opposed to the optics of future debate.
These are not yet the legislative fights most worth winning on the safetyist view. If you agree, it’s worth conserving political capital at almost all costs – a lesson that could already have been applied over the last few years. If building credibility for a reframing were to be the play, the safety movement might even consider going the opposite way: making a clearer break with its past misaffiliations by distancing itself from the kind of legislation it often gets lumped in with.
What’s Next?
The AI safety movement is in a tough spot, and it will have to make some hard decisions to get out of it. None of the choices ahead are cheap, and it’ll take much more discussion to find out which ones are most prudent. But I am convinced, and hope to have convinced you, that ignoring the wake-up call, going out on the field again and trying the same strategy will fail. What got you here won’t get you there.
thanks for writing this up--finding this useful to work through some high-level questions about what the overall appropriate approach for the community even is. More thoughts to follow, but briefly -- the dynamics around the flagging value of 'independent' expert testimony also remind me of some of the dynamics in the US nuclear test ban debate of the 1950s--
https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-7709.2010.00950.x
> "Although inspired by moral concerns, these scientists found that in order to influence policy they had to uphold, rather than question, the nuclear deterrent. Physicist Edward Teller, meanwhile, sought to cement the link between nuclear weapons and national security. Congressional debate over the Limited Test Ban Treaty revealed that the scientific expertise that scientists relied on to argue for a test ban in the end proved a weakness, as equally credentialed scientists questioned the feasibility and value of a test ban, consequently weakening both the test ban as an arms control measure and scientists' public image as objective experts."
"So if Paul, head of AI security, goes to Congress and argues in favour of limitations on open-sourcing and cautions against loss of control, people will quickly realise he looks a lot like Saul, head of AI safety, who had been a mainstay of the last years of AI policy. And at that point, he needs a very good answer to ‘so what’s the difference between AI safety and AI security’, and it cannot be ‘security sounds great to Republicans’."
I agree. While I definitely opted for a more polemic take in my response to recent events, ‘It should have been the Paris AI Security Summit’,[0] I think we drive towards similar conclusions. Particularly that security is not the same as safety:
"... if AI safety fears are well-founded, then AI is not merely a tool to be wielded, but a force that, unchecked, may slip from human control altogether. This must be communicated differently. AI safety asks, ‘What harm might we do to others?’ AI security asks, ‘What harm might be done to us?’ The former is a moral argument, the latter is a strategic one, and in the age of realism, only the latter will drive cooperation. AI must be counted among the threats to a winning coalition and as a direct threat to states. AI is not just another contested technology — it is a sovereignty risk, a security event, and a potential adversary in its own right."
[0] https://cyril.substack.com/p/the-paris-ai-action-summit-2025