Written by:
Date: April 10, 2025
Following our previous Insight, we invited Dr Arianna Dini, a political philosopher specialising in AI ethics and policy, to debate Johannes Castner, Director of Intelligent Systems at Towards People (an organisation that sees AI not as a replacement for human strengths, but as a catalyst for them), about implementing AI in our research process.
Arianna: As you’ve mentioned in CRI’s previous Insight, LLMs are currently unreliable and prone to high rates of error even when given the simple task of summarising news reports from one specific source. The evidence continues to indicate that LLMs are unreliable research tools, as shown by this BBC study and by this recent paper by OpenAI. I’d also recommend the hugely influential paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Bender et al., 2021), in which a team of linguists and computer scientists demystifies the mechanics behind LLMs, explaining that LLM outputs are a function of statistical correlations between word combinations. In contrast, human language and texts contain an ingredient that LLMs lack: context. Without context, there is no understanding, and meaning is easily distorted.
Johannes: I’m also a big fan of the Bender et al. (2021) paper, and agree that LLMs fundamentally produce correlations rather than robust context or “understanding.” That said, the flip side is that we can embed certain forms of context if we design our agent systems to remember past conversations (long-term memory) and to rely on curated data sources—a “single point of truth.” In the work we’re doing at Towards People on the “Habermas Machine,” for example, we ensure our AI agents only pull from well-defined, multi-opinion data sets we trust, so that the LLM is forced to operate within a carefully curated knowledge context. It’s not a magic bullet, but it can mitigate the worst distortions.
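To make the “single point of truth” idea more concrete, here is a minimal Python sketch of an agent that retrieves only from a hand-curated, multi-opinion corpus and keeps a simple long-term memory of past queries. The class names (CuratedSource, CuratedAgent) and the keyword-overlap retrieval are illustrative assumptions for this post, not the actual Habermas Machine implementation, which Johannes describes only in outline above.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedSource:
    """One vetted document in the curated 'single point of truth' corpus."""
    source_id: str
    viewpoint: str  # which stakeholder or opinion the passage represents
    text: str


@dataclass
class CuratedAgent:
    """Toy agent that answers only from a curated, multi-opinion corpus
    and keeps a simple long-term memory of past queries."""
    corpus: list
    memory: list = field(default_factory=list)

    def retrieve(self, query, k=2):
        # Naive keyword-overlap retrieval; a production system would use
        # embeddings, but the constraint is the same: only curated sources.
        terms = set(query.lower().split())
        ranked = sorted(
            self.corpus,
            key=lambda doc: len(terms & set(doc.text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def build_prompt(self, query):
        passages = self.retrieve(query)
        context = "\n".join(
            f"[{doc.source_id} / {doc.viewpoint}] {doc.text}" for doc in passages
        )
        history = "\n".join(self.memory[-3:])  # short slice of long-term memory
        self.memory.append(query)
        return (
            "Answer ONLY from the sources below; if they are silent, say so.\n"
            f"Past conversation:\n{history}\n\n"
            f"Sources:\n{context}\n\n"
            f"Question: {query}"
        )


agent = CuratedAgent(corpus=[
    CuratedSource("report-01", "trade union", "Members report AI tools speeding up routine summaries."),
    CuratedSource("report-02", "employer body", "Firms worry about error rates in AI-generated summaries."),
])
print(agent.build_prompt("What do the sources say about AI summaries?"))
```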
Arianna: A recent study by Microsoft and Carnegie Mellon University points to a risk of “cognitive offloading” – that is, of over-reliance on LLMs for tasks that users could do themselves. There is already some evidence that over-reliance on LLMs negatively affects critical thinking skills, which are essential for the critical appraisal of LLM outputs. Recent studies suggest that by delegating these tasks to LLMs, users deprive themselves of the opportunity to develop the skills most highly valued in knowledge-intensive industries.
Johannes: The concerns about “delegating away” critical thinking definitely resonate. I’d say the real question is whether we can create human-AI collaboration patterns that enhance human skill—rather than erode it. If people remain deeply involved in the reasoning steps, and if agent-human workflows are designed to train empathy and critical thought (instead of reflexively hitting “auto-complete”), there’s potential for analysts to hone their own appraisal skills. The design details here are crucial—my hope is that we can build a kind of synergy that makes people more cognitively engaged and empathetic toward each other’s perspectives.
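One way to picture such a workflow is a review step that refuses to reveal the model’s draft until the analyst has recorded their own judgement. The sketch below is a toy illustration of that pattern under our own assumptions; the names and the rule it enforces do not describe any existing Towards People tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppraisalStep:
    """One question in a review workflow, with the human view recorded first."""
    question: str
    analyst_view: Optional[str] = None
    model_view: Optional[str] = None


def reveal_model_view(step: AppraisalStep, model_answer: str) -> AppraisalStep:
    """Withhold the model's draft until the analyst has committed their own view,
    so the AI output is compared against human reasoning rather than replacing it."""
    if not step.analyst_view:
        raise ValueError("Record your own appraisal before viewing the model's.")
    step.model_view = model_answer
    return step


step = AppraisalStep(question="Is the evidence in this report internally consistent?")
step.analyst_view = "Two of the cited figures appear to contradict each other."
step = reveal_model_view(step, model_answer="The report's figures look broadly consistent.")
print(step)
```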
Arianna: Regulators are currently hashing out copyright rules and carving out exceptions for AI companies, while journalists, authors, and creatives advocate for their intellectual property rights. Academic publishers are under pressure to sign contracts granting AI companies access to their data, and often enter these agreements without consulting the authors whose scholarship they are selling access to. Media companies face similar pressure, and some, including The Guardian, have opted to grant OpenAI and Perplexity access to their content in exchange for a share of the revenue that access generates.
Johannes: Your point on copyright and data-sharing is absolutely spot-on—who owns the data? And how do we build a “public vision” of the public sphere so we don’t wind up with a privatized echo chamber? I wrote a blog post touching on these questions (here’s the link, if you’d like: Breaking the Echo Chamber). I think any robust AI future must involve open deliberation on what we want AI to do for us as a society, and who gets to control that data.
Arianna: Specialists have long warned that Generative AI, like AI in general, has a serious bias problem. From UNESCO’s 2024 study revealing persistent gender bias, homophobia, and racism in OpenAI’s GPT models and in Meta’s Llama, to more in-depth analyses of gender bias in GPT models, the problem of AI bias is summarised in the adage “garbage in, garbage out”. Generative AI models trained on texts and images produced by a world marked by sexism, racism, homophobia, ableism, and other forms of discrimination unsurprisingly reflect those biases back.
Johannes: Absolutely: if you use a typical “vanilla” GPT or Llama direct from the pipeline, you inherit the biases in the training data. But there are ways to reduce that—again, by building smaller, domain-specific models or layering on top of LLMs with agent-based filtering and specialized “guardrails.” None of it solves the structural biases baked into data, but a carefully engineered approach can at least keep them from going unexamined or unmitigated.
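As one illustration of what a post-hoc guardrail layer can look like, here is a small Python sketch that wraps any text generator and holds back outputs matching a bias watch-list for human review. The patterns and the toy_model stand-in are assumptions made for this example only; real systems would pair curated term lists with statistical bias audits and human oversight rather than keyword matching alone.

```python
import re
from typing import Callable

# Illustrative watch-list only; in practice domain experts would curate such
# patterns and complement them with statistical bias audits.
FLAG_PATTERNS = [r"\bwomen can't\b", r"\ball (?:men|women|immigrants) are\b"]


def guarded(generate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any text generator with a crude output guardrail."""
    def wrapper(prompt: str) -> str:
        draft = generate(prompt)
        if any(re.search(p, draft, flags=re.IGNORECASE) for p in FLAG_PATTERNS):
            return "[held for human review: output matched the bias watch-list]"
        return draft
    return wrapper


def toy_model(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs on its own.
    return f"Echo: {prompt}"


safe_model = guarded(toy_model)
print(safe_model("Summarise the survey results"))
```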
Arianna: AI search engines are estimated to use ten times the energy of a standard Google search, while returning less accurate results. LLMs also require an enormous amount of freshwater for cooling data centres. According to a University of California study, “training the GPT-3 language model in Microsoft’s state-of-the-art U.S. data centres can directly evaporate 700,000 litres of clean freshwater”.
Critics have pointed out that “AI” is a double misnomer: it is neither intelligent nor artificial. The Data Labellers Association is a Kenyan group that advocates on behalf of the highly skilled, underpaid, and precariously employed data labellers and content moderators who form the indispensable backbone of AI. Data labellers are disproportionately from the global majority and work under harsh conditions, carrying out work that ranges from the repetitive and dull to the traumatising, such as reviewing extremely violent content.
Johannes: I do still worry that we’re shipping too many tasks over to large-scale generative models. Not every search or question needs a giant LLM. Sometimes old-school search or smaller, more energy-efficient algorithms (even causal or probabilistic models) are enough—and actually more accurate. As you note, “AI” is neither purely artificial (given all the human labour from data labellers) nor truly “intelligent.” So maybe the trick is to make sure we only deploy big LLMs where they genuinely add value, and push for open-source frameworks that reduce the hidden labour exploitation of data labelling.
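A sketch of what “only deploy big LLMs where they genuinely add value” might look like in practice is a router that sends each query to the cheapest tool that can plausibly handle it. The categories, keywords, and thresholds below are purely illustrative assumptions on our part, not a tested routing policy.

```python
def route_query(query: str) -> str:
    """Toy router: send each query to the cheapest tool that can plausibly handle it.
    The categories and keywords are illustrative assumptions, not a tested policy."""
    q = query.lower()
    if any(k in q for k in ("opening hours", "exchange rate", "weather")):
        return "keyword_search"      # a classic search index is enough
    if any(k in q for k in ("probability", "forecast", "effect of")):
        return "statistical_model"   # causal or probabilistic model
    if len(q.split()) > 25 or any(k in q for k in ("summarise", "draft", "rewrite")):
        return "large_llm"           # only here do we pay the LLM's energy cost
    return "keyword_search"


for q in (
    "What are the opening hours of the national archive?",
    "Summarise these three policy papers and compare their recommendations.",
):
    print(q, "->", route_query(q))
```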
Arianna: As researchers at Harvard have noted, Goldman Sachs and Sequoia Capital have recently voiced concerns about the absence of any ROI in the form of notable productivity gains.
Some commentators have noted that Microsoft has been quietly dropping some of its data centre leases, with estimates that the company has abandoned planned expansion amounting to roughly 14% of its current capacity. These developments have led experts to conclude that Microsoft “does not believe there is future growth in generative AI, nor does it have faith in (nor does it want responsibility for) the future of OpenAI.” This is a space to watch.
Johannes: My sense is that a more nuanced approach to generative AI, one that acknowledges that some tasks are better handled by simpler or more classical methods, will help businesses see actual ROI. I share your scepticism, given Microsoft’s quiet reduction in capacity and the possibility that generative AI has peaked. The hype can overshadow the areas where AI genuinely works well. So for me, it’s about using LLMs as part of a broader toolset, alongside domain experts, well-curated data, and smaller-scale algorithms that we trust.