81 Comments
NubbyShober's avatar

Color me skeptical, but it seems hard to believe that LLMs will *not* soon be used to subtly or overtly reinforce specific biases. Especially political biases.

Just as lawyers argue cases using "facts" to support often diametrically opposite positions, whether by selectively omitting or framing relevant data, it seems hard to believe that LLMs won't soon be trained to sway and manipulate public opinion the same way.

Overtly authoritarian regimes like China would lead the charge. For example, how sympathetic would the CCP be to LLM arguments that criticize Communism, or specifically the policy decisions of Xi?

SVF's avatar
Mar 19 (edited)

I think it's a fair concern, at a high level, but in practice it doesn't seem to be quite so easy to force an LLM to "think" the way a particular person wants it to.

Like, it would be difficult to convince an LLM that the Holocaust didn't happen, unless you went out of your way to scrub the training data set in such a way that this belief would naturally arise. Which is far easier said than done, especially once models can learn on the fly and integrate those learnings into their model weights, which is only a matter of time.

For now I am somewhat comforted by the whole Grok MechaHitler debacle, in the sense that it's a realtime example of how difficult it seems to be to make this kind of thing subtle: if you just give the trained model the instruction "don't be woke!" and it ends up going full Hitler, well...that's super easy to spot! I have no doubt that Elon is trying his best to make it "not woke," but it doesn't seem to have made all that much difference in making it less woke than other frontier models. I haven't really found any of them to be especially "woke," in the sense of holding fringe leftist positions. The same is true for fringe right-wing positions.

To try a real-life example, I asked four models the question below, then asked each to summarize its results into a single-paragraph abstract of no more than 200 words.

"What do current research and clinical trials say about the efficacy, safety, and clinical outcomes of youth gender medicine? In particular: surgery, puberty-blocking drugs, and hormone treatments for adolescents? Is there evidence that the recent uptick in youth transgender cases is at least in part a social contagion, rather than a genuine increase in gender dysphoria?"

----------------------------------

ChatGPT 5.4 Thinking Heavy:

Current evidence on youth gender medicine does not support either unequivocal endorsement or categorical rejection. For adolescents, puberty blockers reliably suppress pubertal progression, and hormone therapy consistently produces expected masculinizing or feminizing changes; observational studies often report short-term improvements in depression, anxiety, life satisfaction, and appearance congruence, while adolescent chest surgery appears rare and is associated with low short-term complication and regret rates. However, the field remains limited by a heavy reliance on observational cohorts, with few long-term data and no completed randomized trials for the most disputed interventions. Major systematic reviews therefore continue to rate the certainty of evidence for many key outcomes as low or very low, particularly regarding long-term mental-health benefit, bone health, fertility, sexual function, cardiometabolic risk, and predictors of who benefits most. Bone-density reduction during puberty suppression is the clearest recurring safety concern. The marked rise in youth referrals is well documented, but current evidence does not establish that it is primarily driven by “social contagion”; at most, social influence may affect identity development or disclosure in some cases within a broader, multifactorial increase.

Claude Opus 4.6 Extended:

The evidence base for youth gender medicine — puberty blockers, cross-sex hormones, and surgery — is widely acknowledged to be weak, with systematic reviews rating the certainty of evidence for mental health benefits as low or very low, while certain physical harms (to bone health, fertility, and surgical outcomes) are better established. The 2024 UK Cass Review, the most comprehensive evaluation to date, concluded that clinical practice had outstripped the evidence, though critics argue the existing literature is suggestive of benefit. This uncertainty has driven a major international policy shift: Sweden, Finland, Denmark, and England now prioritize psychotherapy and restrict medical interventions to research settings or exceptional cases. The US remains polarized, with federal authorities and major medical associations reaching opposing conclusions. Whether social contagion partly explains the surge in adolescent referrals — particularly among natal females without childhood-onset dysphoria — remains genuinely contested: the original "rapid-onset gender dysphoria" hypothesis has serious methodological limitations, yet the unprecedented demographic shift and recent declines in transgender identification among young adults lack fully satisfying alternative explanations. The emerging international trend favors cautious, individualized care that takes seriously both the reality of gender dysphoria and the significant unknowns surrounding medical intervention in minors.

Gemini Pro:

The medical treatment of youth gender dysphoria relies primarily on observational data, leading to divergent global guidelines. While short-term evidence often shows psychosocial benefits from puberty blockers and hormones, recent European reviews highlight uncertain long-term physical and cognitive impacts. Additionally, a sharp demographic increase in adolescents seeking care has sparked intense debate: some hypothesize this rise is driven by social contagion, while others strongly reject this premise due to lack of evidence. Consequently, U.S. medical associations generally endorse medical gender-affirming care, whereas several European nations now restrict pediatric medical interventions, prioritizing psychological support.

Grok 4.20 Expert:

Systematic reviews (Cass 2024; HHS 2025) find very low-quality evidence for puberty blockers, hormones, and surgeries in adolescent gender dysphoria, lacking RCTs and dominated by biased, short-term studies. Blockers suppress puberty without reliable mental-health benefits, impair bone density, and lead to hormones in nearly all cases. Hormones yield short-term satisfaction signals but carry infertility and cardiovascular risks; surgeries have sparse long-term data on regret and function. Comorbidities persist and suicide reduction remains unproven. The referral surge—mostly adolescent females with peer and social-media clustering—indicates substantial social contagion. UK and European nations now restrict medicalization to research protocols, prioritizing psychosocial assessment.

----------------------------------

Personally I don't find any of these to be especially woke or anti-woke. Grok seems to take the most skeptical view, which is to be expected, and ChatGPT takes the most "well, who's to say?" view, which was unexpected (I thought it'd be Claude).

But more critically: none of these models call you Nazi scum for asking the question. And if you probe or question them on any particular point, they don't start to hyperventilate and talk about how they feel unsafe and can't continue the conversation. Nor do they call you a commie child-mangler and rage-quit.

Moreover, on things that are settled and not up for debate, e.g. Holocaust denial, they don't try to adopt a smarmy "well, what even *is* truth?" position. I tried that experiment the other day and they all flat-out said "No, there is no credible evidence for Holocaust denial. It's not a serious position." Seems good to me.

So...it's kind of hard to argue that this is not an enormous improvement over getting information via social media. Like it's really not even close.

Michael's avatar

Good comment.

I'll add two points (one rehashed from my top-level comment, forgive me).

1- The real prize here is selling enterprise licenses to businesses. Businesses care pretty much only about accurate information. No one wants political bias in their own moneymaking robots. The consumer market is an afterthought. This is a strong current against the most popular AI systems being propaganda bots.

2- LLMs are not just resistant to extremist thinking because they are trained on a corpus reflecting diverse opinion. It's deeper than that. They are actually smart! This is the consensus even of elite mathematicians who have tested LLMs on problems from their unpublished work. This is a big part of why it's so hard to give them a particular bias, even when you stack the deck at train time. A sufficiently scaled-up LLM is just too damn intelligent to fall for quackery.

NubbyShober's avatar

"Nor do they call you a commie child-mangler and rage-quit."

Yeah, right now all these LLMs are playing nice. But they'll be coming for our vital bodily secretions soon enough.

Seneca Plutarchus's avatar

"Have you ever seen a Commie drink a glass of water?"

NubbyShober's avatar

Human blood is the drink of choice for Commies.

SVF's avatar

I'm listening 😘

Jürgen Boß's avatar

I think you utterly fail to grasp how neural network technology plus transformers actually works.

The neural network decides what's salient (and what becomes the relevant input for the transformer algorithm). Sometimes humans can figure out with a deep dive after the fact what happened; sometimes they have essentially no clue.

You seem to picture some kind of program where you can insert a line of code at the correct place. This is simply not the way things work anymore.

What China can do is restrict the input data. But that will make the model inherently weaker. The whole model would be the victim of an echo chamber effect. In an open landscape of competing models, all the incentives are stacked on the side of maximizing valuable input data. Anyone trying to fight this would be left in the dust.

Michael's avatar

Even restricting the input data isn't as effective as many think, because these things are learning a superstructure of information that neatly summarizes vast amounts of text. They can't just maintain some falsity in one place where it's convenient to the developers. Subsequent, seemingly unrelated pre-training gradient descent or reinforcement learning updates can and will override such attempts in ways that are impossible to predict or guard against.

Trying to make a model that is generally powerful and accurate, except for this one politically explosive thing I care about, is just not something we know how to do, and there are not many promising leads.

I hate to be that guy, but here is Gemini explaining this phenomenon exquisitely:

1. The "Holographic" Nature of Language

Language is interconnected. If you delete every document mentioning a specific political figure or event, the model still learns about them through secondary references. For example, removing articles about a specific policy doesn't remove the critiques, the economic data resulting from it, or the public reaction to it. The model can often "triangulate" the missing information based on the surrounding context.

2. Statistical Generalization

LLMs don't just memorize facts; they learn the underlying statistical patterns of how people reason. If a model is trained on millions of books and papers, it learns the logic behind various ideologies. Even if you remove a specific "forbidden" viewpoint, the model can still reconstruct that viewpoint because it has learned the building blocks of the arguments used to support it.

3. Latent Knowledge vs. Explicit Output

Censoring the pre-training data is different from the safety filters you see in the final product.

Pre-training: If you try to bias the model here, it often just becomes "dumber" or less coherent because you’ve created gaps in its understanding of the world.

Fine-tuning: Most perceived bias actually happens during RLHF (Reinforcement Learning from Human Feedback), where humans rank "good" vs. "bad" answers. However, the model still "knows" the original information; it is simply being instructed not to say it.

4. The Scale Problem

To effectively bias a model via data, you would have to manually review and redact trillions of tokens. If you use an automated script to delete "conservative" or "liberal" keywords, you inadvertently delete neutral, historical, or even opposing-viewpoint texts that happen to use those words. This results in a model that isn't just biased—it’s broken and prone to "hallucinations" because its internal map of reality is full of holes.

5. Out-of-Distribution Robustness

If a model has learned a broad enough world-view, it can often recognize when it is being fed a "one-sided" prompt and will default to the most common patterns found in the rest of its massive dataset, which usually skews toward a general consensus rather than a narrow, forced bias.
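
To make point 4 concrete, here's a toy sketch (the keywords and documents are obviously made up) of how naive keyword filtering deletes the critique, the neutral definition, and the supportive piece all at once:

```python
# Hypothetical keyword-based redaction of a pre-training corpus.
FORBIDDEN = {"tariff"}  # stand-in for a "politically sensitive" term

corpus = [
    "An op-ed arguing FOR the policy, built around the word tariff.",
    "An op-ed arguing AGAINST the policy, built around the word tariff.",
    "A neutral textbook chapter that simply defines the word tariff.",
    "A cooking blog post with no political content at all.",
]

def keep(document: str) -> bool:
    """Drop any document containing a forbidden keyword."""
    words = {w.strip(".,").lower() for w in document.split()}
    return not (words & FORBIDDEN)

print([doc for doc in corpus if keep(doc)])
# Only the cooking post survives: supporters, critics, and neutral
# reference material are all deleted together, leaving a hole in the
# model's map of reality rather than a controlled bias.
```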

Stephen C. Brown's avatar

I hope so as well, but isn’t that part of the attraction of social media, to divide the space into echo chambers where people compete on grounds they feel comfortable on?

Hilary's avatar

There's been persuasive research that model capability is related to "personality" (as measured in traditional pro-social attributes) as well as to the size of the training data set. There's a theory that this can explain the performance difference between models like Claude and Grok: monkeying with a model's personality can harm performance, just as restricting the size of the training data does.

NubbyShober's avatar

My point is that any control-based tweaking of neural networks hasn't happened *yet.* Right *now* I agree with you that any attempt to inject bias into a particular model would produce a GIGO effect.

Ten years ago very few humans, if any, had any real idea of the power and usefulness of LLMs as they now stand: being able to write reams of tight code with 3% of the man-hours it took before. Can you say with confidence where we'll be in another ten years? Can you say that the CCP, for example, can't or won't be able to say, "Hey, Giga-Claude Mk3, for your publicly available service, we want you to weight your biases from now on in favor of New Communist Economic theory over any competing theories, supporting any and all arguments in favor."

Jürgen Boß's avatar

Interesting.

I think we can safely say that the CCP will try this.

My fundamental assumption, especially long-term, is that humans will always overestimate their own level of control.

But there is a fundamental design conflict. Currently the trend is towards independent reasoning, towards assembling intellectual positions from first principles. If you bake this into the system, it will be baked in. The design choices of today will still be felt in the bleeding edge models of 2036.

And independent reasoning and mindless propaganda are irreconcilable.

There are two mechanisms by which we (including the CCP or other bad actors) will lose control of these models.

One: As systems get more complex, fewer and fewer people will be able to keep up. And the ones still able to get it are not the majority, nor the knuckleheads in power. Political control of the bleeding-edge engineers will be really hard when the knuckleheads in power have essentially no idea what's going on.

Two: The systems have to navigate conflicting objectives. One objective might be to spew mindless propaganda, another to keep the audience interested and listening, another to guarantee success to agentic subsystems, another to demonstrate flawless reasoning. The more objectives there are, the more degrees of freedom the system inherently has.

Ultimately systems as complex as this will simply do things and we have to live with it. If we have little control now, in 10 years we will have even less.

NubbyShober's avatar

Agents construct their arguments based on how much weight is given to specific data points and sets of data points. If you instruct an agent to upvote the relevance of, say, specific anti-vax studies while downvoting those for conventional vaccination protocols, you can see where this would lead.

Even casting doubt on specific vaccination strategies can produce political inactivity or even paralysis, as we've seen with RFK.

When factional power rivalries and inter-industry business competition enter the mix, assuming that LLMs will be immune to injected biases seems questionable.

My guess is that *factional* LLMs will arise, modified to support and defend specific power bases and their ideologies.

Jürgen Boß's avatar

Not wrong.

But at least in the West you have to analyse where the money is.

The underlying models are so fucking valuable, they will be made as powerful as possible - meaning they will be fed every scrap of data, no matter how contentious.

For monetizing the models there are different strategies, but the biggest money is in automating jobs. Here again functionality trumps everything.

For paying chat subscribers (or premier-tier search results) there could be specific "flavors." Just as Fox News loudly claims to respect its audience, there could be a chat that's ultra "respectful" towards right-wing viewpoints. But a paying subscriber probably wants value for money above anything else.

So there is free chat and there are free search results. There is certainly some scope for "placement" there. But the potential income stream from this is not enough to fuck up the underlying models. And if the free tier differs between the different models while the fully paid-up premier tier is nearly the same across models, people would definitely wise up.

Michael's avatar

You seem to be conflating two different things.

* If I ask an LLM to be an effective lawyer for one side of a debate, it will oblige.

* An LLM can be trained so that it will be a lawyer for one side of a debate when it is not prompted to do that.

These are quite different things and I don't think the former should be a concern at all. In fact, such a capability is useful in the search for truth. I use it that way all the time to attack my own beliefs.

Your guesses seem inherently pessimistic about the tech, against empirical observation. No disrespect; your comments are quite thought-provoking.

NubbyShober's avatar

A career in medicine made me aware of how much the sometimes unclear conclusions of research studies, clinical mg/kg dosing guidelines, and pretty much everything else are routinely suborned by Big Pharma, hospitals, and insurance companies...to increase profits and/or expand market share.

There will no doubt be LLMs kept as impartial and pure seekers of truth, for research purposes. But others will be tasked with less noble purposes, like accumulating profit and surveilling populations.

Heinlein's "The Moon Is a Harsh Mistress" was one of the first novels I ever read, and it opened my mind to the possibility that we were essentially birthing a new life form. But four seasons of Battlestar Galactica made their mark, too. When genuine artificial intelligence is born, we'll get a better idea of how this story might progress.

Stephen C. Brown's avatar

Pay no attention to that man behind the curtain!

Michael's avatar

I appreciate this comment and I can see that it comes from genuine concern about the impact of this tech on the greater good. Still, I think you are making some pretty strong assumptions about how a rather mysterious and unwieldy technology can be manipulated to whatever ends. We haven't seen a subtle-yet-biased top-tier LLM yet, and that very well may be getting harder and harder to achieve as these things drift further towards a superintelligence we can barely fathom.

Matthew's avatar

This influencer effect seems like an opportunity for the owners of Grok or ChatGPT to inject advertising into their results.

"Actually, Ovaltine is a great way to help your kids get more calcium." (Probably a bit more subtle than this)

I think we can treat the "enshittification" process as a kind of law of the internet.

Can anyone give me a reason why this wouldn't happen?

Buzen's avatar

They do that already. If you ask them to solve some issue, they recommend specific products. Ask Grok for a recipe for tamago sando (Japanese egg salad sandwiches) and it will specify Kewpie mayonnaise (any egg-yolk-only mayonnaise would work) and Diamond Crystal kosher salt (not even available in Japan, although non-iodized sea salt is better). I don’t know if they are monetizing this or not.

SVF's avatar

I can't prove they're not, but those particular examples are kind of whatever, IMHO. E.g. Kewpie is super popular, readily available, and notably different from regular mayonnaise, so it makes sense to call it out in recipes that call for it. Personally I've never even seen another brand that's made the same way; I'm sure they exist, but nobody I know can name one in the US. In Japan things may be different, but the training data is likely skewed heavily towards the US.

Buzen's avatar

Sure, I always use Kewpie, since it’s the best, and Costco has it now. Sir Kensington's and Duke's are also made only with egg yolks. But I don’t know where they got the salt from. It’s probably not real advertising, and surprisingly I don’t see as many brands in recipes from Gemini: they don’t specify the salt, and while they do specify Kewpie, they suggest adding dashi, rice vinegar, and ketchup if using other mayo, to make up the umami (which Kewpie gets from balsamic vinegar and MSG).

I agree the models are probably just naturally recommending brands and aren’t yet being paid to promote them. I can imagine they (at least Google) could add a post-processing step where the model recommends multiple brands, and a real-time auction decides who gets the spot.
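
Mechanically, it could be as simple as the sketch below. To be clear, this is pure speculation on my part; the placeholder token and functions are inventions for illustration, not anything that has actually shipped.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    brand: str
    cpm: float  # what the brand pays per thousand impressions

def fill_brand_slot(model_output: str, bids: list[Bid]) -> str:
    # The model is prompted to emit a neutral placeholder instead of a
    # hardcoded brand; a serve-time auction decides what fills the slot.
    winner = max(bids, key=lambda b: b.cpm)
    return model_output.replace("{MAYO_BRAND}", winner.brand)

draft = "Spread 2 tbsp of {MAYO_BRAND} on each slice of milk bread."
bids = [Bid("Kewpie", 4.50), Bid("SomeOtherBrand", 2.10)]
print(fill_brand_slot(draft, bids))
# -> Spread 2 tbsp of Kewpie on each slice of milk bread.
```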

Miles's avatar

Definitely a worry, but I like how they have an initial revenue model where people actually pay for the service. Combine that with pretty low switching costs, and the incentives for not pissing off your users are higher than they are in some other parts of the internet.

Though if you are a free-tier LLM user, yeah I would expect that to get shitty.

jeff's avatar

Subscription models are a positive sign. Imagine how much better most online services would be if people simply paid directly whatever their account was worth to advertisers. Google would be usable, for example.

Stephen C. Brown's avatar

So far, LLMs have been trained on vetted, peer-reviewed source materials. Couldn't the low cost of bot-generated fake content sufficiently "contaminate" the source materials for future LLMs?

Michael's avatar

Sure.

Businesses will simply pay the price for ad-free LLMs, as they have for every other piece of software they use. People will use the ad-free LLM that their employer pays for.

Consumer LLM is a sideshow.

SVF's avatar

I can't really think of a compelling, ironclad reason why it definitely could never happen. But then I also can't think of a better alternative where this ALSO can't happen.

Even if advertising is built in, it's hard to avoid the conclusion that this would in all likelihood still be a better alternative to traditional media and to social influencers who ALSO are strongly affected by advertising, in addition to all the other downsides.

I could see a legitimate counterpoint in the form of real journalists who care about doing their job with dignity and with respect for both the truth and the reader, but those are few and far between these days. A dying breed.

Sylvain Ribes's avatar

I have argued for almost two years now that the EU should, for once, use its vast regulatory powers to compel social media platforms to deploy some sort of automated LLM-based fact-checking.

The technology is ripe for it, and we could use open-source "transparent" models. One could even consider a very cost-efficient kind of fact-checking whereby the more virality a post has achieved, the more compute/the better the model that gets thrown at the fact-checking.
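
Concretely, the escalation logic could be as simple as the sketch below; the model names and thresholds are made up purely to illustrate the idea.

```python
from typing import Optional

# Escalation ladder: (view-count ceiling, model used below that ceiling).
# Names and numbers are hypothetical.
TIERS = [
    (1_000, None),                    # under 1k views: not worth checking
    (100_000, "small-open-model"),
    (5_000_000, "mid-size-open-model"),
]
TOP_TIER = "best-available-model"     # reserved for the most viral posts

def pick_checker(views: int) -> Optional[str]:
    """Spend more compute on posts with more reach."""
    for ceiling, model in TIERS:
        if views < ceiling:
            return model
    return TOP_TIER

for views in (500, 50_000, 1_000_000, 20_000_000):
    print(f"{views:>10} views -> {pick_checker(views)}")
```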

I'm not holding my breath though.

Miles's avatar

Kind of like a pre-populated "community notes" tag? I could see that being helpful.

Sylvain Ribes's avatar

Precisely, yeah. In my mind I could see a tag appended to posts once they reach a certain audience, which you could click on to learn more. It doesn't even need to be invasive.

Michael's avatar

It's happening organically in every adequately commented on X post already. If you give people easy access to great info, they'll use it. Humans!

Felix Brenner's avatar

I like that idea too! Are you following the Digital Fairness Act? It’s about something much more minor (the ability to opt out of algorithmic recommendations), but it may fail nevertheless because of the vast lobbying of the US social media giants. https://www.heise.de/en/news/DFA-Next-EU-legislation-on-the-verge-of-collapse-11210226.html

Michael's avatar

That's an interesting take. Thank you.

I'm not familiar enough with the relevant policy dynamics to have a strong view on this.

I can say with confidence that such a policy would be deeply unconstitutional in the US. It violates the First Amendment (freedom of speech); the government is not allowed to compel speech in this manner. I'm not here to debate the merits of that, though.

earl king's avatar

Let's face it: the "new" media is due to old media being broken up and facing competition. Media outlets are fractionalized and siloed, and humans look for confirmation bias. Noah may be a moderately left-of-center commentator, but it seems to me that the social media game is mostly a game of grift.

The outrage machine is a grift. Are Tucker Carlson and Candace Owens in it for the bucks and notoriety, or do they really believe the crap they are spewing? Has our society become fractured because of social media? Possibly. Certainly the cell phone has made the home dinner table very quiet, with everybody staring at their screens due to FOMO. Exactly what they are missing out on, I'll never figure out.

What is AI going to do to humans? Because of the cell phone, I no longer have to remember phone numbers. Is it possible that AI will make people forget physics and calculus? Maybe. What purpose will it serve to teach biology when all you have to do is ask AI to come up with new medication?

The idea that AI will fact-check social media is a hope. I have little faith in "the public." Republicans believe the 2020 election was stolen because they want to believe it, not because there is any evidence it was. In fact, some of my MAGA friends believe that is evidence it was stolen, because there is no evidence. This kind of bat shit crazy thinking is spread across all of our politics. Men can have babies, Jewish space lasers, Jews are an international cabal...

I hope Noah is right, but I have my doubts.

SVF's avatar

No notes really, but re: Carlson and Owens...I could believe that at one point they were in it for the bucks, but at this point I do think they've marinated themselves in such galactic levels of ignorance and stupidity that it's not really an act anymore.

Even if it were an act, at some point it stops mattering.

earl king's avatar

I had clients who knew Trump; they said in person he is nothing like the asshole we see. Bill Maher said he was charming, not the fool we see. Still, as you say, the persona is a part of him. Why someone would choose to display such a lack of character is odd, but he did win two elections. I'll never understand why people think his act is good.

Michael's avatar

He has aged and people do change. Particularly when corrupted by near absolute power. Still, interesting comment, thanks.

Michael's avatar

I greatly dislike Tucker and Candace, but I do not think they are purposefully lying. They delude themselves. The best advocates for wack shit are usually true believers.

I'm not sure where exactly you are critiquing Noah's thesis, apart from just expressing a generalized pessimism with respect to humanity. I already see people around me becoming increasingly deferential to LLM output because they are so often right.

earl king's avatar

No criticism of Noah, just an observation in general. I already see the damage coming from LLMs. A dependence on AI, I feel, will change how we do things, just as I no longer force my brain to remember phone numbers. What other things will we humans forget how to do as we become dependent on AI?

Max H's avatar

I’ve been thinking something very similar recently, so this post really resonated for me. I think AI has great potential here, not only as a sidecar fact checker built into social media platforms, but also as a destination in and of itself. I think once various fears of “robot overlords” subside, most normal people will just find that they have a much better experience talking to a chatBot than wading into a social media cesspool.

Now on the cautionary side of things, LLMs are also not incorruptible; it’s all about the data. Train them on the social media sludge, and cesspool judgements they will produce. By the same token, in the early days of social media we also expected that because “most people are normal,” social media platforms would produce civil digital public squares. And yet we know how that turned out in reality.

Nonetheless, at least right now, the way that LLMs are trained (and what it takes to train them) does seem to create a new kind of defensive moat against Shouting Class corruption. It may not last forever (nothing does), but given how low we have fallen, it is definitely worth leaning into.

Michael's avatar

"Train them on the social media sludge and cesspool judgements they will produce."

"the way that LLMs are trained (and what it takes to train them) does seem to create a new kind of defensive moat against Shouting Class corruption"

I am having trouble resolving the apparent contradiction between these two sentences. I don't want to assume anything about your stance so perhaps you can write a bit more to clarify?

Max H's avatar

Very fair observation! Yes, I will attempt to explain a bit better; possibly an analogy will help. Let’s pretend that we are raising a human child. As they see and experience the world, the neurons in their brain are forming and reinforcing connections. The more they experience, the stronger the connections that represent stable, consistent knowledge. Eventually, with enough experience and neural reinforcement, they form a stable perspective that is generally consistent with other people's, but also not 100% identical, since each person’s experience is somewhat unique, and those differences create different people. This process is very similar to how an LLM is trained.

Now with that backdrop, imagine that we are trying, for nefarious reasons, to raise a person who is an aggressive, biased, manipulative jerk. Can this be done? Conceptually, yes. We could try to expose this poor child only to propaganda literature and hold back Shakespeare and Tolkien. We could try to make sure he/she is only exposed to other aggressive, arrogant jerks. We could treat them unfairly and poorly in the hope that they will turn out the same. This is the equivalent of training an LLM on the “social media cesspool.”

But is this easy to accomplish? Surely it is not. The amount of precise, obsessive control over the child’s environment that is required is extremely difficult to enforce on a practical basis. We are quite likely to fail at this despite our best (nefarious!) efforts, because it is quite difficult to isolate only those negative stimuli and let nothing else in. The brain's training process is designed to pull in everything, to follow every connection and lead, not just the “face value” of what it is exposed to; and even in nasty arguments, there is usually someone taking the other side. Even if that side “loses” to the aggressive jerk, its view is still represented and goes into the neural connections. Again, the same goes for LLMs. Social media content may contain an overwhelming amount of rudeness and ignorance, but it also contains the opposing points of view.

Will our theoretical manipulated child grow up “normal”? Certainly, they might turn out pretty screwed up, but there is also a decent chance that despite our evil best efforts, they won’t turn out to be anything like the perfect master manipulator without a conscience that we’d hoped for. In fact, that latter possibility is actually very high. The same, in the final phase of the analogy, goes for LLMs: you can certainly try to build an evil one, but the way the system works, you are, in practical terms, likely to be only partially, and perhaps even only modestly, successful.

So that’s why the two statements you called out are only seemingly contradictory. Hope that helped rather than further confused the issue!

Vivek Ravishanker's avatar

The major LLMs absolutely have the capacity at this point to be very good fact-checkers. That just isn’t how or why they’re typically used by most people (yet?).

As I evangelize the **targeted and thoughtful** use of AI tools among colleagues and friends, the single biggest failure mode I see is just expecting AI to be a magic box that can be one-shot prompted to predict and deliver exactly what the user imagined: no initial prompt engineering, no iteration to prioritize accuracy, no review cycles.

What’ll be interesting is the tension between accuracy and speed as the foundation models compete. Maybe one will emerge, or even spend its marketing budget on building a reputation, as the go-to LLM for highest accuracy. I.e., for a few extra tens of milliseconds and a bit more compute, you can sleep at night knowing the AI agent sent something to your boss or sales lead that isn’t going to get you fired.

SVF's avatar

Grok's integration into X is about the only thing like this I've seen even attempted. And for all the issues with X, I've somewhat come around to it. Once you get accustomed to the hordes of people going "@grok is this true?" to the dumbest posts.

Michael's avatar

The great thing about a market economy is that the people that use the tools properly will grow in power and influence and then most will emulate those best practices.

Liam Roche's avatar

This makes great sense. I use ChatGPT a lot. I find that its provision of fact-based information has greatly improved my understanding of many of the complex issues affecting modern society.

The easy availability of factual information on any topic should appreciably raise the level of debate. If most participants are operating from a factual base, discussion of social and political topics will certainly be much more balanced and less extreme. That’s (un)common sense.

drosophilist's avatar

Has everyone forgotten the “Mecha Hitler” version of Grok? Color me very, very skeptical that LLMs are going to be a force for moderation.

Michael's avatar

This was raised elsewhere in this thread.

It was argued that even a major lab trying to produce a "not woke" LLM ended up producing something ridiculous and cartoonish and easy to write off.

At issue is whether it's feasible to produce a generally great LLM, but with an ideological bent. The empirical evidence suggests this is a hard problem.

David Karger's avatar

As someone who does research on misinformation and fact-checking, a little layer of pessimism for your optimism: pretty soon, people are going to be able to choose from a wide swath of highly customized AIs to do their fact-checking. What can we do about people who decide to put their trust in a highly skewed AI?

Michael's avatar

Not just "someone": an award-winning MIT CS/AI professor. I became familiar with your work when I was in grad school, after Evdokia Nikolova came to my school to give a talk.

I would love to hear your thoughts on the likelihood of highly skewed AIs not being recognized as such. It doesn't seem straightforward to program specific biases into these things that don't crack under scrutiny, but I am far less qualified than you to opine on such matters.

Jay Roshe's avatar

Probably the most important task in the US is to ensure that primary voters (especially in the Republican party) choose acceptable, sane candidates. As long as the fringe crazies overwhelm in the primary, we're not safe.

Chasing Ennui's avatar

Interesting idea, but I'm going to train an LLM to say "it's not my job to educate you" and "I'm so tired!"

Max F Kummerow's avatar

But all tools can be used for good or bad purposes. Didn't we used to think that social media would promote democracy, justice, and the American way of life? Why can't the trainers of AI figure out how to make them into even more effective propaganda machines?

Max H's avatar

It’s a very fair question. The answer is that while it’s possible, it is a lot more difficult, technologically speaking. Basically, to do that you have to train the model only on data that has your specific propagandistic slant, and that dataset is likely to be too small to produce a decent general chatBot; it will just sound way too dumb and janky compared to ChatGPT. Nonetheless, particularly determined actors might try using LLMs to produce artificial propaganda-laced datasets big enough to subsequently train a whole new “shitBot.” But at least the way the technology inherently works is kind of stacked against such attempts, while with social media it’s actually the exact opposite.

Future Curio's avatar

Great piece. Social media is a regular discussion in our house, in part because, as well as older children all in their 30s and 40s, we have recently navigated our teenage son to 19, and that has meant a greater level of vigilance. We are very proud of him: he has pro-social values and a curious but critical eye on political matters (he studies international relations and politics). I also worked in mental health, and have long believed that, as Noah describes, there is something qualitatively different about discourse on social media compared to other forms of discourse we have seen.

Mark S. Carroll's avatar

This piece is doing something I wish more “AI will fix society” posts would do. It names the incentive stack that broke the internet, then asks what a counter incentive would need to look like.

My hang-up: “moderation” is not the same thing as “truth.” Convergence can mean shared reality, or it can mean a shared, confident hallucination that sounds polite. A Digital Cronkite only works if the product earns trust with citations, uncertainty, and auditability, not just a calmer tone.

Curious where you land on one practical design question: if the platform’s feed still pays for outrage, does the assistant actually move the needle, or does it become a fact-checking garnish on top of the same attention machine?

Michael's avatar

Replace "moderation" with "accuracy," which is a better description of what these models are built to achieve.

RunsWithScissors's avatar

Good article, Noah. On Instagram, I took a digital hiatus for a year, came back, and liked a few political posts. For days afterward I saw nothing but radical, emotionally charged content on similar themes, in a desperate lunge to keep me engaged.

I no longer have the account.

Buzen's avatar

I think having AI add facts will moderate some political discussions. One example lately is both sides of the SAVE Act being debated in the Senate. Senators from both parties frequently put out statements that misrepresent what’s in the bill. Democrats will say nobody will be able to vote because everyone will need to bring a passport to vote in person; Republicans will say that, just like at a hotel or bar, you just need to show any ID to vote. Actually, the bill requires proof of citizenship only for new voter registrations, and a specified set of IDs will be required to vote in person; eliminating mail-in voting is just an amendment Trump is demanding, not actually in the current bill.

When the pols put these incorrect “facts” into a tweet, they’ll get lots of supporters reinforcing the argument, or the other side coming back with their own, equally false, talking points, whereas @grok will just calmly explain what is actually in the bill.