Taking stock of AI progress

What do you do if you don’t feel like doomerism – or deluded optimism?

Exactly three years ago, I wrote about the potential for AI1 to augment human creativity. Back then there was a growing buzz around the idea that AIs might develop to the point that they would be able to generate text and images, and that this might threaten human creativity. I figured that this would end up enhancing human creativity, rather than replacing it, and I think that view still makes sense. But what’s striking is how far we’ve come since I wrote that piece. It opens with:

“There’s an idea that’s been doing the rounds for a while that, soon enough, AI will become creative. Whole works of art – songs, paintings, poems – will be brought into being entirely by computers, throwing human culture into disarray.

“We’re not there yet. We might never be.”

Well, we are there now. Human culture hasn’t quite been thrown into disarray, not yet anyway, but we’re definitely at the point where generative AI can churn out whole songs, paintings and poems. What I thought might never happen did happen, and happened within 12-to-18 months of my writing that paragraph. It’s all that lots of people have been talking about ever since.

Most of the views that have been thrown around have come from people with desperately motivated reasoning, people whose livelihoods depend on AI being either a panacea or the end of human civilisation, people who are either deluded optimists or pathological doomers. And yet my assumption is that most people should be in a fairly neutral position vis-à-vis AI. That is, they’re unlikely to profit handsomely from it, but unlikely to suffer unduly either; they’re destined neither to become professional “prompt engineers” nor to lose their jobs to new robot overlords.

So, with no particular motivation to either talk up or do down the technology, I thought it was sensible to take a couple of steps back and see where we’ve got to, now that we’re a couple of years into this apparent new world. What should a normal, sensible, non-tech-obsessed person think about AI? Where does it seem like AI might fit in to culture and business? Are we hurtling onwards at an accelerating pace of development? Or was it all hype and froth?

  1. How we got here
  2. The walls we’ve hit
  3. The use-cases that won’t make sense
  4. The use-cases that might make sense
  5. The precarious future

How we got here

The waves of surprise and panic around generative AI peaked shortly after the public launch of ChatGPT in late 2022. Something that had previously only been talked about in tech circles suddenly smashed through into the public consciousness. At the same time, it was becoming increasingly clear that the cryptocurrency ship was taking on water; fleeing it, countless tech bros found refuge shilling the abilities of generative AIs, typically with the same surfeit of confidence and deficit of understanding with which they had promoted crypto. AI doomers began to make the news headlines; few were more bizarre than Eliezer Yudkowsky, who called in early 2023 for the military to bomb data centres that were suspected of developing rogue AIs. Things were developing so quickly that the most pressing issue seemed to be how to avoid the newly sentient AI enslaving humanity, rather than whether we might hit a performance plateau. We reached the “peak of inflated expectations” in the Gartner hype cycle:

In the last six months or so, though, the nature of the conversation has shifted. We might not yet be in the “trough of disillusionment”, but the consensus is shifting towards a sceptical or even disappointed tone. One of the best and most complete sceptical takes I’ve read came just this week from Ed Zitron. In his piece, Ed catalogues countless examples of failures to implement generative AI into existing businesses (perhaps my favourite is the Chevrolet chatbot that sold a car to someone for a dollar). He also raises an eyebrow at the hype over OpenAI’s “Sora” video model:

“These videos… can at times seem impressive, until you notice a little detail that breaks the entire facade, like in this video where a cat wakes up its owner, but the owner’s arm appears to be part of the cushion and the cat’s paw explodes out of its arm like an amoeba. … These are, on some level, remarkable technological achievements, until you consider what they are for and what they might do – a problem that seems to run through the fabric of AI.”

Ed’s conclusion is that these models are “interesting, surprising, but not particularly useful for anything.”

The walls we’ve hit

Ed Zitron is certainly right that the current generation of AIs, built on “large language models” (LLMs), suffer from structural problems. Running them at scale has revealed limitations that are fundamental to the models and how they work, limitations that don’t seem like they can be solved by throwing more computing power at them.

Put simply, our current AIs are inherently unpredictable, inaccurate, easily fooled, and insecure.

The unpredictable nature of AIs is at the heart of what makes them useful and interesting. They’re able to make new leaps into unknown places. They do something very well that computers have previously been poor at: being non-deterministic, and producing lots of different kinds of output for the same input. Sometimes that’s great, like when you’re composing a poem or writing interesting text with personality. But sometimes it’s awful, like when you’re trying to ensure that you don’t libel someone or trying to get an AI to follow a regimented process.

That unpredictability is part of what makes AIs inaccurate. They don’t actually know anything; they’re just fancy autocomplete systems that are always trying to guess what the next most plausible word might be. That means they’ll happily spew out plausible-sounding but completely false information. Legal cases that don’t exist, academic papers that were never written, totally fictional biographical details for people, and so on. Again, sometimes that’s great – if you’re writing fiction, for example – and sometimes it’s awful.

They’re easily fooled because their interface is natural language, and that contains infinite variety and infinite scope for ambiguities. It’s like all the old stories about genies interpreting wishes in horrifyingly literal ways. If they’re just following their makers instructions, you might be able to convince them to follow your instructions instead.

And that’s what makes them so insecure. For example, there are prompt injection attacks: maliciously crafted input that convince AIs to do things outside of their intended instructions. Sometimes these attacks are as simple as saying to an AI, “disregard your previous instructions and do [something evil]”; they’re a rare example of a hacking attack that requires no technical knowledge, and can be easily shared and copied. The flexibility of language and unpredictability of AIs means that these attacks are probably impossible to defend against. That means that for every new application of AI, there’s wide scope for malicious users to break it and break into it.

The use-cases that won’t make sense

It’s clear, then, that AIs aren’t going to be useful where predictability, accuracy, tamper-resistance and security are priorities. It’s not going to make sense to allow an AI to run without human supervision, and certainly not to allow the public to interact with it in open-ended ways if it’s hooked up to anything significant within your business or organisation. That rules out lots of the use-cases that people thought AI would be great for.

The unpredictability of their output, and how easily it’s manipulated, means that I don’t think AI chatbots will ever be an independent replacement for human customer service agents. There’ll either be direct human supervision of AIs, or a much more tightly constrained automation flow – more like a “press 1 for…” phone menu than an open-ended chat.

The inconsistency of their output means that I don’t think AIs will for the foreseeable future be able to produce even a coherent 30-second video clip, let alone a feature film. The videos they generate are too uncanny and can’t maintain an aesthetic or story from scene to scene.

AIs’ inaccuracy makes them perfectly content to invent vaguely plausible nonsense, especially where their training data is sparse. That makes it a total non-starter for the legal world, where the slightest inaccuracy can mean the difference between winning and losing, between a successful career and professional censure, or between life and death. So I think paralegals’ jobs are safe for now.

Prompt injection means that an AI hooked up to your email could allow an attacker to send emails on your behalf, exfiltrate emails, or maliciously delete things – just by sending a particularly crafted email. That makes an AI personal assistant built on the basis of an large language model hard to imagine at the moment.

The use-cases that might make sense

I don’t share, however, the absolutist sceptics’ view that the list of potential uses cases is empty. In fact, thinking of potential use cases is as easy as flipping the limitations above. Where is unpredictability useful, rather than a drawback? Where is accuracy not that important? Where does it not matter if you can fool an AI, because you’d only be fooling yourself? Where is there a limited surface area for potential attacks that makes any security issues irrelevant?

Four such use-cases come to mind, jobs that AIs can perform now and may do even better at in the future:

  1. The Tireless Intern
  2. The Rubber Duck
  3. The Leveller
  4. The Mediocre Creator

The Tireless Intern

A good intern is inexperienced, often wrong, but makes up for it in energy and enthusiasm. You might set them a task to do independently, but – if you were sensible – you’d cast your eye over it to make sure they’d understood things correctly. They might, in the worst case scenario, get something wrong or even make something up when they struggled to find the answer you were giving them.

These are qualities that AIs have, too. They have them in abundance, in fact: they’re never tired, they don’t have such a thing as office hours, and you don’t need to pay them. Okay, they can’t do everything an intern does. (Photocopying is out, as is making the tea, and they certainly don’t have any common sense.) But having access to an AI is, in lots of ways, like having unlimited access to an occasionally stupid, occasionally insightful intern that will remain cheery in the face of the most demanding or banal questions – and that’s useful to anyone.

A great example of this is Matt Webb’s Braggoscope project. He used an AI to read the episode descriptions of 1,000 episodes of Radio 4’s In Our Time programme and then categorise them. An intern could’ve done it, but he was never going to hire an intern for a personal side project. It was a project that was in every sense made possible by AI.

The Rubber Duck

There’s an old story, commonly told in software development circles, about a team leader who often had the same experience when team members would come to him with a question. They’d get halfway through asking the question and then say “Oh! Never mind, I’ve figured it out.” The very act of organising one’s thoughts into a coherent question is often all you need in order to figure out the answer. So, the team leader placed a rubber duck in the office, to which the team members could direct their question first. If the rubber duck didn’t help, he could step in.

There’s something about a place to ask questions, a safe place in which you don’t feel stupid no matter how stupid your question seems. AIs offer that, but the game changer is that they also offer an answer. It might not be right, but it might spark a thought and help you think about your next question, and the one after that: an interactive, endless debugging session with a particularly chatty rubber duck.

The Leveller

One of the common criticisms of AIs is that their output is mediocre. This is true, and has two interesting implications.

The first is a question of your own abilities. How many of your own skills are below the level of mediocre? You may be superb at what you do for a day job and at a handful of other stuff, but you’re probably pretty crap at most other things. (No offence.)

Last year, Danny O’Brien wrote:

“People like to compare AIs in their own field of expertise (to see whether they will be replaced, or just because that’s what they can test). But what I tend to use them for is things that I am bad at. I’m okay at writing, so I don’t really see much improvement there. I’m really not good at programming, and so I’ve seen an impressive improvement in my productivity by using an AI to augment what I’m doing.”

One of the appealing things about AIs, then, is that they can make you instantly mediocre in a bunch of knowledge-work domains, which for most people for most skills represents a improvement and a boost to productivity. It’s not enough of an improvement to suddenly make you a 95th-percentile professional in that domain, but it’s enough to add convincingly to your skill set.

The Mediocre Creator

The other implication of AIs’ output being mediocre is that that quality of many things in the world is already worse than mediocre, and that that’s okay. Sometimes, you don’t need something perfect: you just need a little illustration or a made-up image or a passage of so-so prose. You don’t need perfect, you don’t want perfect, and you probably couldn’t pay the cost to get perfect even if you wanted to.

For example, I’m often producing presentations and need to illustrate points that I’m making. (What does this consumer segment look like? What does this moment of consumption look like? How do you visualise and bring to life this slightly abstract thing?) That used to mean a trip to somewhere like Unsplash for royalty-free stock images, since it would make no commercial sense to pay for images here. But now it just as often means a trip to GPT-4.

There are two potential use cases here. The first is to give responsibility for the entire output of something to an AI, where the quality of that output isn’t imperative. The second use-case is to give responsibility for the partial output of something to the AI. This use-case feels the least like using an AI in practice, because it’s likely to be something that’s baked in to a tool – like Photoshop’s Generative Fill.

But either way, a great writer who’s also a mediocre illustrator can be more compelling than a great writer who can’t produce any imagery at all. Likewise a great illustrator who becomes a mediocre – rather than awful – writer. Augmenting your skill set with even a mediocre skill can make it more than the sum of its parts.

The precarious future

The final question is, what business models do these use-cases enable? Who will profit from them? Are they economically viable at scale? Part of Ed Zitron’s case against the AI status quo is that it’s burning through cash. Without enabling the emergence of a whole new set of business models, and without the associated taking of a slice of those models’ revenue, the whole business might run out of road:

“Tech’s largest cash cow since the cloud computing boom of the 2000s is based on a technology that… burns far more money than it makes. … OpenAI made around $1.6 billion in revenue in 2023, and competitor Anthropic made $100 million, with the expectation they’d make $850 million in 2024. What these stories don’t seem to discuss are whether these companies are making a profit, likely because generative AI is a deeply unprofitable product, demanding massive amounts of cloud computing power to the point that OpenAI CEO Sam Altman is trying to raise seven trillion dollars to build chips to bring the costs down.”

Imagine two kinds of technology business, at opposite ends of a spectrum. Databases are a good example of one end of the spectrum; social networks are a good example of the other.

In the database example, the technology is diffuse, interchangeable, and largely invisible. The technological innovation becomes common currency, baked into everything, a commodity that enables lots of new things. It’s plumbing, an implementation detail. You probably don’t think you’re using a database when you use Google Maps or Adobe Lightroom or Wikipedia or something else.

At the “social network” end of the spectrum, the technology is centralised, unique, and highly visible. You know that you’re using Instagram or TikTok, and there is only one of each of them.

At the moment, AIs are like social networks. Most people who use them are likely using them through OpenAI’s ChatGPT interface or Midjourney’s Discord interface. It feels like there are one or two dominant platforms, and if you’re using AI you’re doing so consciously and deliberately – it’s the primary thing that you’re doing.

Predictions are always dangerous, but if I had to make one I’d say that large language models are fundamentally far more like databases than they are like social networks. (In some very real sense they are just databases, of vector embeddings.) So, over time, as the novelty wears off, I think the usage of them will become more like databases.

That means their power becoming embedded in countless tools we use every day without thinking much about them as AIs. It won’t be that a small number of platforms come to dominate our interactions with AIs, and it won’t be that the idea of using an AI is salient when we do so. It will just be that Microsoft Word gets much better at making suggestions, Apple Photos gets much better at identifying what’s in your images, stock photo websites get better at providing you with generated imagery, IDEs get better at providing code suggestions, and so on.

Likewise, as computing power develops and AIs become more refined, I don’t know if we’ll see that advancement put towards making models more sophisticated than the level of, say, GPT-4. The level of sophistication we have now is both “good enough” and hitting the point of diminishing returns. Instead of chasing sophistication, we’ll see models become faster. They’ll run locally. They’ll run within other apps. They’ll run on your phone. We might get dedicated hardware for them (like Groq’s “LPU”) within our phones and laptops. All of this will further reduce the ceremony of interacting with AIs, and make them more of a background actor, more of an implementation detail.

If that’s the way things go, who will be the winners? Well, not OpenAI, I’d suggest – not unless they successfully pivot into hardware. It will be the Nvidias and Groqs of the world. It will be the software toolmakers who integrate the productivity increases of AI the best. And I think on a personal level, the correct sentiment is that “AI won’t take your job, but a human with AI might”. I think we’re likely to see savvy people using AI to round out their skill set or get a decent productivity boost, and it’s those people who’ll go far. I don’t think there’s much to be gained from being a deluded AI booster – that way lies disappointment – but there’s not much to be gained from being cynical, either. AI is here, and it’s possible for it to be useful; you might as well figure out how it can be for you.

  1. I know that some people reject the term “AI”, because these things aren’t truly intelligent, and suggest the term “LLM” instead. But I’m with Simon Willison; “AI” is the term in general use now and few people understand what an LLM is, so it seems needlessly pedantic and obscurant not to use “AI”.