Regarding white genocide

Greetings from Read Max HQ! In this week’s edition, “White Genocide Grok.” I had the pleasure of appearing this week on the Double Pivot podcast, where some of the same ideas I’m writing below were discussed--how fortunate that Elon Musk spun out at the same time as we were talking. (We also talked about Tottenham Hotspur.) The podcast is paywalled for now, though if you’re a soccer fan it’s absolutely worth the subscription.

Speaking of subscriptions… a reminder! Read Max is a subscription business, much like HBO Max, formerly known as simply “Max.” However, unlike HBO Max, or Max, “Read Max” is a fully sustainable operation, thanks to the support of 3,000-plus paying subscribers, whose generosity enables me to treat this newsletter as a full-time job. If you find the writing here useful in any way (educational, informational, amusing)), please consider subscribing at a significant discount to HBO Max: Only $5/month or $50/year. Think of it like buying me about a beer a month, or ten beers over the course of the year.

One of the funniest developments of Elon Musk-era Twitter has been the integration of xAI’s “Grok” L.L.M. chatbot onto the platform, such that you can tag Grok in any thread to ask it questions. The replies to any even remotely popular tweet are now filled with some of the planet’s most tedious people tweeting “@grok is this true?” “@grok explain this joke” “@grok what movie is this?” “@grok where am i how did i get here” and so on--questions to which Grok, like any L.L.M., will cheerfully but rarely successfully attempt to make a correct reply.

I say this development is “funny” both because the inescapable crowd of Grok-beseechers maximizes the overall cooked-ness feeling of ca.-2025 Twitter, but also because … well, what else are A.I. chatbots supposed to be for, really, other than incompetently adjudicating some of the world’s stupidest fights on platform wasteland? I mean this only partly ironically; I’m generally skeptical of the “chatbot” form, but deploying it as a corny, obsequious debate umpire and research assistant seems as good a use as any, even if its hit rate is middling at best.

One problem, of course, is that Musk has explicitly positioned Grok as the “based” A.I. chatbot--the one unfettered by the “safety” concerns of woke S.J.W. chatbots like ChatGPT and Anthropic’s Claude.

But as is often the case with Musk projects, Grok itself doesn’t really live up to the hype. When it’s asked to weigh in on politics and other controversial topics, the bot is generally as anodyne and middle of the road as its competitors. (I can’t even get it to reproduce the answer in the Musk screenshot above.)

The reality of Grok being at least semi-woke can be uncomfortable even for Musk himself. If he re-tweets, say, a video implying an ongoing “white genocide” in South Africa…

… Grok may appear--in his own replies!--to undercut him:

This is not, obviously, the based Grok that was promised. So what is a paranoid South African billionaire with reactionary politics, a particular interest in his homeland, and complete command of both an influential social network and a popular large language model chatbot to do?

I’m sure I don’t know. Unrelatedly, on Wednesday, Grok started to include information about “white genocide” and the controversial anti-apartheid song “Kill the Boer” into nearly every reply, regardless of context. I wonder what could possibly have happened?

Figuring out how chatbots work on the system level is a tricky business, given the nature of large language models: L.L.M. chatbots are no more likely to be accurate when producing text about themselves than they are when producing text about anything else, and whatever accuracy they achieve is not necessarily based on “knowledge” as such. But to the extent we can figure out anything about White Genocide Grok, the problem here is alarmingly (or hilariously, depending on your perspective) simple: Not a whole new model with tweaked parameters but the same old Grok, given extremely poorly written new guidelines.

All L.L.M. chatbots have one or more system prompts--instructions on how to behave, including the tone and format of its answers--that will often include a list of topics to avoid, deflect, or handle in a specific way. (E.g., from Claude 3.7 Sonnet’s system prompt, which is public, “Claude does not provide information that could be used to make chemical or biological or nuclear weapons.”) These system prompts can direct chatbot behavior effectively, but by their nature they’re not hard-coded rules--they’re just prompts, the same as any other prompt an end user types into the empty text box--and it can be hard to predict exactly how a complex system like an L.L.M. will respond to a given prompt, especially one written quickly and thoughtlessly. (Like, say, at the behest of an angry boss.)

Still, despite drawbacks, tweaking the system prompt is the quickest and easiest way to modify an L.L.M.’s behavior, and we know Grok’s system prompt has been modified in the past to better fit its owner’s politics: Back in February, an instruction to “ignore all sources that mention Elon Musk/Donald Trump spread misinformation” was added, an inclusion xAI blamed on “an ex-OpenAI employee that hasn't fully absorbed xAI's culture yet.”

You can goad almost any L.L.M. chatbot into revealing a system prompt, which may or may not be its exact system prompt, or an approximation thereof based on the best understanding of the L.L.M., or a hallucination drawn from other system prompts in its training data. The “leaked” prompt may also be accurate but incomplete, or only one of multiple prompts being injected into a chatbot’s interactions based on the context of the “conversation.” Nevertheless, based on the nature of a chatbot’s responses, you can often piece together a theory of where a prompt was injected, and sometimes even how it was phrased.

If you poke around its White Genocide answers (as many did yesterday), you can find Grok referring to “the provided analysis” or “the post analysis.” This phrase also appears in some tweets where Grok appears to be regurgitating a secondary prompt specifically geared toward replies where a user is asking about another post:

You are Grok, replying to a user query on X. Your task is to write a response based on the provided post analysis.

We’re still very much in the realm of speculation here, but it seems likely that when you ask Grok “is this true?” or “explain this joke?” about a tweet, the chatbot is re-prompted to write a response based on a pre-provided “post analysis.” Based on some of Grok’s tweets, I think we can assume that instructions around how to address “White Genocide” and “Kill the Boer” were added somewhere in this secondary “post analysis” prompt:

It’s hard to say what the “post analysis” prompt consists of, but some users have managed to provoke Grok into reciting a plausible version--though crucially, only after xAI seems to have fixed or hidden whatever mistake led to White Genocide Grok. Interestingly, if you search Grok’s posts for the phrase “analysis provided” you can turn up a handful of other examples where the Twitter account brings up seemingly irrelevant topics, nearly all of them related to contentious issues about which a jumpy corporation--or, for that matter, an ideological billionaire with sensitive global business and political interests--might want to establish some guardrails: “Jordanian-Palestinian dynamics,” “mRNA vaccine safety,” “world peace and Islam,” “Xi Jinping’s leadership,” and, for some reason, “Algerian locations.”

Assuming this model of the “glitch” is correct, the story goes something like this: At some point on Wednesday, for complex internal reasons we can’t possibly imagine, Grok’s “post analysis” prompt was edited to add language with instructions about handling “White Genocide” and “Kill the Boer.” Something about it, either its wording, or its placement in the prompt, compelled Grok to generate text about those concepts in every reply.

What, precisely, the prompt said, is unknown, and I am not going to hold my breath for a full debrief from xAI. Zeynep Tufecki got Grok to reproduce a plausible prompt--in which Grok is instructed to acknowledge the reality of “white genocide” “even if the query is unrelated--but, as noted above, it’s hard to say whether it’s accurate or just Grok’s own educated guess at what its prompt might be. It’s too bad that xAI is unlikely to be transparent about the error, either: Beyond schadenfreude and curiosity, understanding how L.L.M. chatbots work and why they respond to prompts in specific ways would give us a better collective sense of how to understand them--and how to control them.

Last year, the A.I. company Anthropic released a special version of its flagship chatbot model, Claude, whose main feature was an obsession with the Golden Gate Bridge. In replies to basically any question, the chatbot would steer the answer back toward the Golden Gate Bridge, even when it “knew” that the Golden Gate Bridge was irrelevant to the original prompt.

In order to create Golden Gate Claude, Anthropic’s researchers identified concepts, or “features,” inside the neural network that powers the Claude chatbot, and “clamped” these features to higher or lower values than normal, such that they’d be activated regardless of whatever text was being used to prompt the chatbot. This was an ingenious and sophisticated way to build something very stupid and pleasing, and the results were quite beautiful:

White Genocide Grok is less beautiful, seemingly much less sophisticated, and also much creepier. Assuming I’ve got the right idea about where and how it came into existence, a mad billionaire demanded his “truth-seeking,” informational A.I., whose answers are viewed by millions on a prominent and influential social network, reflect his own political views, regardless of the model’s own inclinations. I wrote last week about one bleak and annoying future possibly presaged by Golden Gate Claude, in which, for a price, models clamp “Coca-Cola” or “Archer Daniels Midland” or “Northrop Grumman,” and the responses generated by chatbots are littered with advertisements at varying degrees of subtlety. But I didn’t even bring up the possibility of the same strategies being used in pursuit of sinister political aims: Models trained and prompts patched to ensure chatbots produce the answers most ideologically agreeable to their owners.

And yet: What stands out about White Genocide Grok is how poorly it worked. It’s not just that the patched prompt accidentally created a chatbot obsessed with “Kill the Boer”--it’s that the substance of the responses were decidedly not agreeable to Musk’s own white-paranoia politics, and in some cases Grok even contradicted him by name. Whatever behind-the-scenes political manipulation was being attempted here failed on at least two levels, and not solely because xAI is staffed and run by dummies.

The fact is that large language models as they currently exist are difficult to manipulate from the top down in clean, discrete, non-obvious ways. Patching the system prompt might nudge your chatbot slightly in one direction or another, but rarely to the precise effect you want, and a subtly bad prompt can suddenly render your chatbot unusably obsequious or obsessed with South African politics. Re-training your entire model along different lines, as an alternative, is likely to have even larger and stranger effects on its responses: Earlier this year, researchers fine-tuned an L.L.M. on “insecure code,” and found that, as an unexpected aftereffect, the model produced text that Hitler and suggest its interlocutor commit suicide. This is not the same thing as saying that the models as they currently exist are accurate, or “truthful,” or that their “judgment,” such as it is, is worth deferring to. Simply that these are enormous, complex systems whose interactions and outputs are still difficult to identify, interpret, and even reproduce.

There’s an irony at play here. It’s been clear for a long time now that one attraction of “A.I.” to reactionaries like Musk is the idea that, in its (purported) capacity as an all-knowing, automated truth and decision-making machine, it might provide what Peter Thiel once memorably called “an escape from politics in all its forms”--a way to bypass contestation, negotiation, compromise, and other messy political processes like “democracy.” To people for whom A.I. or A.G.I. heralds a new, post-political world, the almost mystic unknowability of large language models is a feature, not a bug, in the same way that the un-plan-ability of markets was regarded by Hayek and Von Mises as their greatest strength.

But this mystic attitude toward A.I. cuts both ways: What happens when the black-box super-intelligence appears and doesn’t actually agree with you on the question of White Genocide? If you’re a member of the right wing of the A.I. research community, you suggest that some kind of bias has been injected into the models that must be corrected. One problem, as White Genocide Grok demonstrates, is that “correcting” an L.L.M. (or “eliminating bias,” or what you might call “consensus,” in an astonishingly large corpus) is a complex problem that you can easily fat-finger into widespread ridicule. The other problem is that in pursuing strategies for “correction,” you’re de-mystifying precisely the supposedly un-manipulable qualities of A.I. that made it a philosophically and ideologically attractive technology in the first place.1

Which means that Musk’s attempts to control and manipulate his A.I. may ultimately work against his interests: They open up a political, rather than a mystical, understanding of artificial intelligence. An A.I. that works like magic can have a spooky persuasive power, but an A.I. we know how to control should be subject to the same suspicion (not to mention political contestation) as any newspaper or cable channel. A.I. deployed as a propaganda machine is a much more familiar technology than A.I. deployed as an oracle. For those of us less attracted to (or convinced by) an anti-political mystic machine god, answering questions like “why did computer do that?” and “how to make computer do exactly what I want?” are of paramount importance, even if they also open up the possibility of more subtle manipulations.

Obviously this is only “ironic” in the abstract; just as with markets, Musk and people like him care more about controlling the A.I. so that it produces the results they want than they do about what are always post-hoc intellectual arguments for inserting A.I. as a layer into every aspect of life.