Article body

Introduction: Disciplinary Switch-and-Bait

Several years ago, I found myself in a public debate at the Data & Society Research Institute in New York City.[1] At the time, a rising tide of misinformation—half-truths, alternative truths, and plain untruths—was circulating in a networked information ecosystem without safeguards to effectively prevent its spread. “FBI Agent Suspected In Hillary Email Leaks Found Dead In Apparent Murder-Suicide” ran one such headline, hosted on a legitimate-sounding domain name—Denverguardian.com—a story shared on Facebook a half million times during an election period.[2] Our panel aimed to address scandals such as this and to offer diagnostic tools for understanding the proliferation of alternative facts, disinformation, and hacking as they intersected with new digital tools, the construction or verification of reality, and issues of power within a new media ecology. The first speaker, a new realist philosopher, began proceedings by offering philosophical realism as antidote to the emergent deceptions and illusions of a post-fact era. His argument strongly rejected post-structuralist assertions about the irreducible textualism of knowledge systems, the relativism of ontology, and the totalizing link between knowledge and power.[3] The second speaker, a mathematician in the audience, argued that algorithms were inherently biased systems, both inaccurately predictive of human behavior and productive of considerable disinformation. Her argument focused on haphazard data gathering, spurious correlations, embedded values, and confirmation biases, which (in contrast to the philosopher) were ultimately linked to political and economic power structures.[4]

What is striking about this pairing is how each speaker's interdisciplinary swerve away from their home field (philosophy and mathematics, respectively) engages staple paradigms of neighboring fields. We witness a strange reversal, on one hand, from the humanities to the sciences and, on the other, from the sciences to the humanities. The data scientist turns to philosophy to show how seemingly unemotional algorithms and statistical models in fact contain embedded opinions and confirmation biases, which, left unchecked, culminate in mass misinformation and a threat to democracy: “The truth is, there is no purely algorithmic process that can infer truth from lies…” To temper algorithmic overreach, the mathematician advances the intervention of “actual human judgement.”[5] In contrast, the philosopher draws on the hard sciences—the “indifferent … objectivity” of natural objects and historical documents—to block the reign of postmodern relativism, which, left unchecked, logically culminates in rampant misinformation and right-wing populism.[6] To temper the escalations of human interpretation, the philosopher advances the intervention of objective ontologies—“the sphere of facts”—insisting on a forceful distinction between “being and knowing.”[7] Where the humanist brings scientific realism to bear on the question of accuracy and authenticity, the scientist brings human interpretation to bear on the question. This twofold interdisciplinary reversal—swapping default disciplinary axioms to supplement a perceived gap produced by internal constraints of method—is both contradictory and symptomatic. We find here a kind of methodological switch-and-bait—an attempt to provide a prognostic methodological corrective to a disciplinary proclivity, catalyzed by a failure diagnostically to map an information disorder that has somehow escaped established checks and balances.
These are interdisciplinary swerves occasioned by a historical era variously characterized as post-fact, post-truth, or even post-reality.

This article claims that neither philosophical realism nor mathematical perspectivism can adequately diagnose the critical mediatic dynamics underwriting the contemporary socio-digital ecosystem. Deference to objective realities alone, on one hand, and deference to human interpretation, on the other, cannot course-correct the inertial tendencies of their respective disciplinary methods. What is required is an intermedial approach that outlines the multi-factorial conditioning ground of the information network today. The paper will not attempt precisely to define concepts such as “misinformation,” “disinformation,” “fake news,” “propaganda,” and so on, but will highlight instead systemic conditions facilitating misleading or false content today. In other words, the argument will not focus on the nature of the deceptive or false information as much as the mediatic conditions for their possibility. Given the structural nature of the argument, I include for consideration both deliberate and nondeliberate disinformation—from intentional false headlines of all stripes (driven by State, corporate, and other actors), deep fakes, pseudo-science, lies and distorted information appearing in social media, blogs, websites, etc. to fictions and deceptions unwittingly produced by computational features and functions (automated prompt-and-response systems, for example) of machine learning and artificial intelligence (AI).

Although it is not the direct subject of this article, any discussion of misinformation today must reckon with the legal framework for regulating the emergent digital network in the United States, which includes a crucial liability shield (or safe harbor provision) for content mounted on telecommunications platforms. In particular, Section 230 of the 1996 Communications Decency Act (CDA) in the USA regarded the digital internet on the historical model of telephonic, rather than broadcast, media. The legal framework guiding online content (and its moderation) intersects with an economic incentive structure, which, in turn, shapes the algorithmic systems and interface designs that curate and generate content on digital platforms. It is this last, more technical, feature of the information ecosystem today that will be the primary object of consideration in this article. With the increase in computing power in the early 2000s and the archiving of petabytes of digital data, computational approaches shifted from rule-governed, symbolic, and human-readable systems to data mining, clustering, and statistical analysis. This statistical turn in the era of the early internet created the conditions for what I call a correlational paradigm for the flow of information within networked communication. In other words, the information ecosystem was increasingly ordered by technical tools for prediction and risk reduction—clustering algorithms, Markov chains, n-grams, neural network methods, language models, and the like. The intersection of the correlational paradigm for computing with the telephonic legal construal of platforms (and the concomitant economic incentive structures) created the conditions for the undermining of documentality imagined by Realists and the possibility of a friction-free flow of disinformation.
Following a brief critique of the realist turn in philosophy, the article will explore the correlational paradigm as a technical diagnostic punctum, central to the reign of “powers of the false” within the digitally networked ecosystem today.

Insects, Algorithms, and Humans: Discontents of Realism

The only chance of emancipation that is given to humankind… [is] realism, against illusion and sorcery.

Maurizio Ferraris[8]

The information network today is grounded in repositories of digital data, statistical processing systems, code logics, and all manner of algorithmic functionality. Yet a strident non-digital (and even anti-digital) strain runs through adjacent scholarly work on the knowledge economy today. This paradox marks a wider turn in philosophy (no less than the theoretically oriented humanities) toward materialism at the end of the twentieth century. This was a kind of materialism—not to be confused with historical materialism or Marxism—that was largely catalyzed by a brand of Deleuze-inspired thought that took root in Anglophone academia in the late 1990s. The new materialism was pitted against analytics directed toward human consciousness, symbolic orders, and the workings of power and political economy. Instead, new materialism was directed toward non-intentionality, physical life, real objects, direct sensations, and empirical reality itself. By insisting on irreducible ground truths, materialism gained considerable traction from its critique of the postmodern idea that reality is somehow socially constructed. In crude terms, this shift marked a revived valuation of empiricism, realism, and pragmatism and the concomitant devaluation of rationalism, abstraction, and reflection. As if to recapitulate European debates of the eighteenth century, empirical studies and ontological explanations gradually gained ascendancy over critical and conceptual ones.

What is interesting about the materialist swerve toward ontology is the striking extent to which it is at odds with the contemporary consumer-electronics model of the world—the material infrastructures of digitality—which are characterized by discretization, abstraction, statistical and symbolic orders, and the modeling of formal systems. In other words, although these kinds of theoretical reflections tend opportunistically to deploy insights from a free market of disciplines—ranging from physics to neuroscience, all entangled in precarious, messy assemblages—the coalescence around a materialist position was curiously delinked from the primary modus of its own technological age: statistical analysis, formal modeling, digital abstraction, and so on. At the height of the digital paradigm, we paradoxically witnessed what Benjamin Boysen and Alexander Galloway call “semiophobia”—a fear of signs, symbols, and forms—and the privileging of embodiments and materialities, exemplified in the innards of real objects.[9] It is as if the orientation toward objects—their ineffable material gravitas—nostalgically resisted assimilation to the digital in the heyday of the digital.

One interesting strain of materialism has come to be called “New Realism.” New Realists, such as Graham Harman and Maurizio Ferraris, plausibly argue that, in the last century, the postmodern knowledge-interest-power linkage became so totalizing that its only counter-power was systematic doubt, negation, and skepticism. Deconstruction, they argue, became a paradoxical ally for contemporary forms of disinformation and fundamentalism. For Realists, the twin ideas that (1) reality is socially constructed (facts are theory-laden) and (2) solidarity trumps objectivity (affect displaces science) are paradoxical hallmarks of contemporary forms of disinformation. As antidote to the de-objectifying tendencies of postmodernism, Realists advance realia—objects and realities not reducible to (and hence manipulable by) social interpretations.

The Realist pushback against the textualities of postmodernism appears to offer a corrective to the proliferation of the false, but on closer examination of digital disinformation today, the recourse to objects encounters a stark diagnostic limit. In particular, the promised political corrective of New Realism occurs principally on the terrain of natural ontologies (“unamendability”) rather than that of social ontologies (“documentality”). Let me explain. For Ferraris, natural ontologies cannot be revised (“corrected or changed”) by social constructions, inventions, or interpretations (“conceptual schemes”).[10] Ontology cannot be confused with epistemology; realia cannot be confused with conceptualizations about them. He sums it up in a disarming example: “The fact remains that what we perceive is unamendable, it cannot be corrected: sunlight is blinding if the sun is up, and the handle of the coffee pot is hot if we leave it on the fire. There is no interpretation to be opposed to these facts: the only alternatives are sunglasses and potholders.”[11]

But what of social ontologies? Against rampant subjectivism and solipsism, Ferraris argues that documentality “precedes and produces” subjective intentionality in the first place.[12] By “documentality” he means “a system of communication, inscription, acknowledgment, coding, filing, and patents” that underwrites human behavior.[13] This is the array of “inscription” practices that “consist in the recording of social acts”:[14] “If it was not possible to keep traces,” he argues, “there would be no mind, and it is not by chance that the mind was traditionally depicted as a tabula rasa, a surface on which impressions and thoughts are inscribed.”[15] In short, social reality is constituted by behaviors that are a function of “laws, rituals, norms, social structures,” and so on.[16] Crucially, for Ferraris, these inscriptions are not regarded as “forms of thought” at all. Instead, they have an independent reality, approximating the condition of “computer operations,” which do not require knowledge of mathematics to execute their operations and calculations: “Both in artificial intelligence and in the natural one the same process occurs, for which organization precedes and produces understanding, and documentality precedes and produces intentionality.”[17] Documentality, then, is a kind of “weak textualism” (or “weak constructivism”) that grounds human intention, much like a “superorganism” of termites or a computer executing a rational program.[18] It is within this paradigm that the realist argues against the perspectivism of the “rule of intentionality,” favoring instead the “admirable regularities” that come under the “rule of documentality.”[19]

For a philosophy apparently committed to pragmatic realism, this generous use of theoretical analogies to the behavior of insects and algorithms requires some scrutiny. Is the ontology of human experience (even remotely) analogous to that of a computer program; or are they precisely set adrift in a kind of parallax mediatic setting? If documentality is to be taken as a guide to social ontologies, what do we make of the profligate inflation of online documentality within the newly networked digital ecology? How does the proliferation of realia execute a “rational program” or enact an “admirable regularity” in the context of information disorder, fake news, and chaotic contradiction (“the nightmare of verba manent”)?[20] If Ferraris, along with realists such as Manuel Delanda and Graham Harman, powerfully exposes the way natural objects seem implausibly to have taken on the character of social objects under the directives of Postmodernism, what of the case when social objects take on the character of natural ones under the directives of Realism? Here New Realism encounters a limit in the context of conceptual amalgams emerging in the fact-fiction assemblages that characterize contemporary socio-technical networks.

The question concerning post-truth and big data requires its own specificity. How does a metaphysics of documentality tally with the physics of digitality? Is false information true data? How is false information converted by analytical systems into behavioral modeling for state and corporate actors, ranging from predictive technologies for political campaigns to advertisement-sales optimization? Perhaps the algorithmic routines and subroutines (toward which documentality is increasingly subject) offer a clue to the new ontological consistencies. But what kind of philosophical object is the algorithm? What becomes of documents that are disassembled into quasi-infinite informatic data sets and then reconstituted under various computational models? How does the machine layer intersect with the social layer? What of the legal protocols and economic imperatives underwriting the chaotic surface of information disorder? In sum, how does one frame the constitutive alliance between the infrastructural physics of big data and its algorithmic functionality—clustering mechanisms, graph theory, transformational matrices, etc.—and the boundless metaphysics of post-truth—externalized desire, decontextualized affect, disinformation, and illusory facts?

Against realia—the apparently unassailable bulwark against the powers of the false—this paper argues that the primary historical conditions of possibility for these powers concern the legal constraints guiding the dissemination of information, the financial imperatives of platform economics, and the technical protocols running computation. As will become apparent, these intermedial factors cannot be assimilated to the documents and analogies of realism. Instead, they realistically engage the systemic processes that ground the flow of networked (dis)information today. The remainder of the paper will examine only the third, more technical, factor at work in the digital ecosystem with reference to the shortcomings of Realism.

The Reign of Correlationism: A Question Concerning the Question

There are three kinds of lies: lies, damned lies, and statistics.

Benjamin Disraeli by way of Mark Twain[21]

The critical commentary on the spread of online misinformation frequently appeals to the part played by bad actors.[22] This is not surprising given that, in legal cases involving defamation (slander, libel, etc.), it is incumbent upon the plaintiff to prove “malicious intent.” In Dominion v. Fox (26 March 2021–18 April 2023), for example, lawyers representing the voting company went to great lengths to prove the mismatch between the private statements made by Fox News hosts and the public pronouncements of falsehood.[23] Although social media companies are immune from analogous defamation charges, practically by definition, it seems tempting to follow the conceptual contours of such standard legal proceedings. However, networks of digital information today are rarely curated by individual actors (good or bad) but automated instead by algorithmic systems and language models beyond the direct supervision of human agents. To grasp the flow of disinformation, these non-human agents, systems, and models require their own analytic scrutiny. On the other hand, the appeal by some academically oriented commentary to materialism—object-oriented analyses, actor networks, affect theory, etc.—equally misses that specificity.

First, materialist approaches to affect, irreducible as they may appear, cannot adequately engage the dialectics of digital surveillance today.[24] As a subjective, practical experience, for example, quotidian online life—characterized by clicking and liking, linking and listening, swiping and scrolling—became enjoined, on an unprecedented scale, to techno-economic platforms that were designed to amplify networked affect. Contemporary technical design—from the screen itself and the blueprints of web-based interfaces and applications to the algorithmic systems that generated constantly refreshed content, timeline-based engagement, reposting, interactive audiovisual feedback, messaging streaks, recommendations, and so on—reflected the affective demands of an economy that came to regard human attention as a revenue-generating resource. Affect, a key driver of behavioral surplus in this kind of surveillance economy, is regarded by materialist theorists as a sub-personal force—independent of reason, elusive of meaning, embodied, non-intentional, free-floating or autonomic, triggered, innate, and so on. As a material embodiment (instead of an idea), affect is thereby kept undefined—as a matter of methodological course—by this brand of materialist critique. A determined disinterest in definitions, symbolic meanings, and semiotic interpretations is a curious interpretive disavowal that effectively cedes the terrain of its digital representation—no less than its statistical monitoring, manipulating, and modifying—to monopolized proprietary centers of computing.

Second, the turn to “objects” and “documentality” (or realia) reaches its limit in the context of automated algorithmic content proliferation today.[25] Graham Harman, for example, argues that objects grounding base reality are complex entities that elude our formulation of them, practically by definition. When it comes to information objects—apparently existing in a “flat ontology” that renders subjects and objects as interactive entities on an equal plane—Realism misses the data-driven systems that mine, analyze, cluster, model and, more recently, generate them in the first place. The Realist resistance to underlying laws, economic systems, mathematical models, and other symbolic orders is of little use here.[26] Indeed, online documentality, one might say, is today but a symptom of an informatic ordering system grounded in opaque statistical inference models. What is the “realistic” philosophical status of the fluid informatic layer optimized for algorithmic functionality? Or, more simply again, what kind of philosophical “object” is the algorithm? Are algorithms human constructions or do they have their own agency? What is the character of such technical agency? How do machinic decision trees and connectivity models—centroid, distributed, graph-based, etc.—proffer human-readable information? Can the computational outputs of mathematical functions driving features like autocomplete, voice-to-text, search, text generation, and so on—central systems for the spread of disinformation today—be construed as themselves anchoring inclinations and predispositions toward the false?

Against Realism, diagnostic tools for grasping misinformation today must consider the elusive metaphysics of information objects in relation to the infrastructural physics of digital mediation. It is important here to distinguish, then, between analytic registers—the machine layer and the social layer—as Alex Galloway convincingly argues in his parallax theory of digital media today.[27] These heterologous layers—while interfacing interactively—nonetheless organize, select, aggregate, hierarchize, discriminate, and process information in fundamentally different ways. The execution of an algorithmic function, which, against O’Neil’s mathematical perspectivism, cannot of itself be construed as biased or misleading, freights an inertial logic that lies in an orthogonal relation to the human proclivities and prompts to which it reacts.[28] Broadly speaking, statistical models and algorithms used to analyze and draw inferences from patterns in data are less a proxy for the reality they appear to represent and more a presentation of statistical significance within a training corpus alone.

Take, for example, one of the more neutral-seeming features of online functionality—search—which nonetheless plays a crucial role in the spread of misinformation.[29] The logic of search is essentially a sorting mechanism that, in philosophical terms, reverses the logic of human perception with which it interacts.[30] By mining petabytes of pre-existing linguistic fragments and then clustering statistical connections between letters, words, and phrases, automated sorting mechanisms systematically calculate all possibilities to identify an optimal selection. In the context of search, commonalities are sorted and ranked by column stochastic matrices and transition matrices, which register the prevalence and importance of links to pages, which have themselves been ranked. Simply put, if a node has k outgoing edges, it passes on 1/kth of its importance to each node it links to. In general, search algorithms use a host of clustering models that group a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups. The algorithms act on a variety of inputs, including the popularity of the search result, the frequency with which it is shared, the context (such as text) around the result, and meta-tagging.[31]
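The column-stochastic ranking logic described above can be illustrated with a minimal sketch, a simplified power iteration in the spirit of PageRank. The function name, toy graph, and damping value are illustrative assumptions, not a description of any production search engine:

```python
import numpy as np

def pagerank(adjacency, damping=0.85, tol=1e-9):
    """Rank nodes by power-iterating a column-stochastic link matrix.

    adjacency[i][j] == 1 means page j links to page i; a page with k
    outgoing links passes 1/k of its importance to each page it links to.
    """
    A = np.array(adjacency, dtype=float)
    n = A.shape[0]
    out = A.sum(axis=0)                       # outgoing links per page (column sums)
    safe = np.where(out > 0, out, 1.0)
    M = np.where(out > 0, A / safe, 1.0 / n)  # dangling pages spread importance uniformly
    rank = np.full(n, 1.0 / n)                # start from a uniform distribution
    while True:
        new = (1.0 - damping) / n + damping * (M @ rank)
        if np.abs(new - rank).sum() < tol:
            return new
        rank = new
```

Note that nothing in the iteration inspects the content of a page: importance is a purely relational, statistical quantity derived from the link structure itself.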

The algorithmic discrimination/selection that is set to work on a dataset like this cannot be conflated, even by association, with modes of human discrimination/selection. Graph traversal in these contexts from one node to another does not resemble either deductive or inductive reasoning. The result of a query leading to false information in a search algorithm, for example, is a brand of abductive reasoning; a matter of statistical significance alone. To simplify a little, search queries such as “a-r-e- -w-o-m-e-”, “a-r-e- -b-l-a-c-k-”, or “a-r-e- -j-e-w-” autocomplete the inquiry by inferring what tokens (character, word, or string) are statistically most likely to follow a word, pair of words, or triplet of words. Google’s search function—an algorithmic analysis of a user’s typed sequence—is presented as a cascading box with ranked possibilities of predictive text. To correlate this string of tokens to a plural form of the ordinary meaning of the word is one thing (“w-o-m-e-n-,” “b-l-a-c-k-s-,” etc.); to suggest further tokens is quite another. In late 2016, this feature, commonly known as “autosuggest” (rebranded as “autocomplete” in 2010), where the double meaning of “suggest” (to influence in a non-logical way) is intended,[32] threw up the following top results: “are women evil,” “are blacks the dumbest race,” and “are jews a race” (“Are jews evil?” was the fourth item on the list).[33]
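The statistical inference behind such autocompletion can be sketched as a toy frequency model over past queries. The query log and function names here are hypothetical; real systems layer on personalization, filtering, and many further signals:

```python
from collections import Counter, defaultdict

def build_suggester(query_log):
    """Return a suggest() function trained on a log of past queries.

    For every prefix seen in the log, count which token followed it;
    suggestions are ranked purely by frequency, i.e. by statistical
    significance within the log, not by truth or falsity.
    """
    following = defaultdict(Counter)
    for query in query_log:
        tokens = query.lower().split()
        for i in range(1, len(tokens)):
            following[tuple(tokens[:i])][tokens[i]] += 1

    def suggest(prefix, k=3):
        counts = following.get(tuple(prefix.lower().split()), Counter())
        return [token for token, _ in counts.most_common(k)]

    return suggest
```

The point of the sketch is structural: whatever phrasing circulates most often in the log, however misleading, is exactly what the mechanism is optimized to propose next.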

How is it that a simple search delivers users to such misleading content, including falsehoods, misinformation, and disinformation? For Cathy O’Neil, this would count as a case of algorithmic bias. And yet, the search results neither reflect the viewpoint of programmers directly nor are they strictly the outcome of a rules-based system that filters data through a set of algorithms representing prior knowledge.[34] Instead, these results are delivered by a statistical analysis of neighboring sequences of items in documents found across the web. For humanists concerned with misinformation, one tactic has been to indict the data upon which these applications are trained (as if better, or more, training data could proffer better results). But again, what is factually false may be statistically accurate (or at least relevant). An optimal result occurs when the search engine throws up links that correlate with the specificity of the query. The question concerns the very question that is itself typed into the inquiry box—a string of clustered tokens that statistically activates those human networks that have previously phrased inquiries in that particular way.[35] In effect, entire swaths of misleading, deceptive, and false information are tethered thereby to strings of letters—idiolects of typing—readily exploited by media manipulators.[36] For all the claims that more data, memory, and parametric annotation (“scaling”) will eventually erode the problem, technical solutions oriented around ever-larger training corpora alone are inadequate to the task at hand.[37] In other words, these are non-ideological statistical enactments—a mathematical set of interacting objects (logic gates and voltages)—of ideologically-charged misinformation.

The World as Statistics Instead of Representation: Objective-All-Too-Objective

The question is not: is it true? But: does it work?

Brian Massumi[38]

As can be seen, computational processes driving features such as search, auto-complete, auto-play, and so on, can—for technical reasons—facilitate the spread of misinformation online today. These systems were not designed with this aim in mind and cannot therefore be construed as a simple reflection of the intentions or views of the software engineers, designers, and programmers that built them. In the context of functions and features like the social media feed, the problem is compounded by a host of additional algorithms, including feature-based algorithms, natural language processing techniques, collaborative filtering, and algorithms for mining, analyzing, and predicting real-time engagement and user behavior—the monitoring of click rates, skipping and labeling items, adjustments in volume control, liking and linking, etc. Unlike search, which still operates as a kind of general recommendation software, these curated feeds are thereby personalized according to behavioral attributes of users.[39] At bottom, these systems are similarly grounded in statistical inferences, but here the correlations are additionally attuned to granular subjective attributes—variously described as a “digital double,”[40] “digital dossier,”[41] “Digital Doppelgänger,”[42] or “digital dividual”[43]—which are computationally clustered into a stream of content (“feed”) grounded in probabilistic group identifications.[44]
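The collaborative-filtering logic behind such personalized feeds can be sketched in miniature. The interaction matrix and function below are illustrative assumptions (production systems combine vastly more signals), but they show how a "digital double" is, computationally, just a row of correlations with other users:

```python
import numpy as np

def recommend(interactions, user, k=2):
    """User-based collaborative filtering on a user-by-item matrix.

    interactions[u][i] = 1 if user u engaged with item i.  Unseen items
    are scored by the engagement of statistically similar users (cosine
    similarity); the feed is a probabilistic group identification.
    """
    X = np.array(interactions, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    U = X / norms                    # normalize each user's behavior vector
    sim = U @ U[user]                # cosine similarity to every other user
    sim[user] = 0.0                  # ignore self-similarity
    scores = sim @ X                 # weight items by what similar users did
    scores[X[user] > 0] = -np.inf    # exclude items already engaged with
    return [int(i) for i in np.argsort(scores)[::-1][:k]]
```

Nothing in the scoring consults the content of an item, only the correlational neighborhood of the user: whatever a user's statistical cohort engaged with, true or false, is what gets surfaced.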

The rampant use of statistical sorting and modeling in the circulation of (dis)information today requires a brief historical contextualization. The combination of unprecedented computing power, vast troves of digitized training data (harvested within a globally networked architecture), and novel algorithmic training protocols has created the conditions for a statistical turn in contemporary computing. Where artificial intelligence (AI), for example, was once construed in terms of generative, symbolic, and human-readable systems that required an explicit and declarative elicitation of rules and procedures, deep neural architectures today learn complex representations, and derive abstractions from their training sets, in a versatile statistical manner. Older forms of AI—dominant until the 1980s—were sometimes referred to as “expert” systems (with a “brute force” design), in which an algorithm followed a lengthy set of instructions and then drew conclusions by applying various combinations of those instructions.[45] When it came to modeling human reasoning or behavior, however, the older rule-governed algorithmic approach was generally met with limited success. Broadly speaking, no set of computational instructions was exhaustive or flexible enough to simulate the unique character of human inquiry, thought, and experience.[46] By the late 1980s, the idea that computers had an internal kind of logic set apart from human-derived expert disciplines became axiomatic. In the memorable words of the electrical engineer Frederick Jelinek, “We thought it was wrong to ask a machine to emulate people. After all, if a machine has to move, it does it with wheels—not by walking. If a machine has to fly, it does so as an airplane does—not by flapping its wings.”[47] Instead of working with syntax, rules, and grammar, Jelinek proposed training computers on large corpora of language data, building statistical language models that recognize repeated patterns and derive probabilities between them.

This disciplinary turn away from rule-governed simulations of linguistic cognition, and toward statistical data processing as such, became the defining hallmark of AI in the early twenty-first century. As a result, a new research paradigm reframed the very idea of machine cognition as a fundamentally computational endeavor, radically untethered from the human faculties it sought to emulate. While the statistical turn in AI was delinked from any explanation of the phenomena it modeled, its strength lay in its power to materialize them. Again, the shift from representational to statistical modeling had less to do with the turn to digitization itself than with a turn toward an old method (statistics) newly innervated by an exponentially expanding set of materially available data. For statistically grounded AI, a cognitive model was less represented than it was realized. Crudely put, the algorithmic routines and sub-routines of even neutral-seeming search or auto-complete functionality do not explain the results of human prompting as much as they execute them.

One final example: the recent release of various generative pre-trained transformers (GPT, Bard, Notion AI, Quillbot, Wordtune, Dall-E, etc.), which are highly optimized algorithms with access to gigabytes of data designed to respond to prompts from users. These transformer models benefit from large architectures and large quantities of data, but as systems trained to predict sequences of words, they are grounded in the same basic statistical operations we find in search, autocomplete, and so on.[48] These analogous technical ecologies create the conditions for the possibility of unsupervised misinformation. How so? For all the appearance of human thought, these systems do not understand in human terms, in that they invert the relation between syntax and semantics we find in classic language acquisition, comprehension, and production. Instead of beginning with ideas or concepts, GPTs calculate statistical probabilities from gigabytes of training data (extracted from a broad range of domains, including online books, encyclopedias, news stories, chat logs, etc.) and then predict a string of letters (and groups of letters) in a sequence that might have the appearance of meaningful conceptualization. Human interlocutors tend to impute meaning to these textual strings even when there is none. Factual (and fictional) relations between entities and properties in the world are rendered by GPTs as syntactic “bindings” between basic word components (subjects, predicates, etc.), often relying on the substitution of synonyms for related phrases (“embeddings”) and other forms of masking to craft an automated paraphrase.[49] To grasp contextual features of language, GPT pre-trains the model to learn general linguistic patterns or structures of language, which it can then correlate to specific tasks with a smaller amount of labeled data.[50]

In short, the pre-trained transformer imitates and interpolates (using insertions and substitutions) a statistical distribution of redundancies and restatements in a vast corpus of training data. Abstract relations are substituted by correlational abstractions. However, generating knowledge-approximating strings from statistically probable objects comes with certain ineradicable defects. Not only is the resulting text potentially ensnared in a kind of redundancy loop—leading commentators like Ted Chiang to describe ChatGPT as a kind of “lossy compressed” pastiche of the world’s data[51]—but even in the context of reliable data, embeddings alone introduce variations in meaning that frequently lead to misinformation (or “hallucinations,” in the anthropomorphic language of AI engineers).[52] Because the system automatically renders (perpetuates?) past data in a string of small variations (or paraphrase) without the capacity to distinguish fact from fiction, the transformer can, of itself, gradually detach the text string from representational resemblance to truth, fact, or veracity.
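These "small variations" can likewise be sketched in miniature. In the toy fragment below, a hand-built synonym table stands in for nearest neighbors in a learned embedding space (an illustrative assumption, not a real embedding model); iterating the substitution shows how individually plausible likenesses compound into a claim detached from the original.

```python
# Hypothetical "embedding neighborhood": each word maps to a nearby
# term it might plausibly be swapped for in automated paraphrase.
neighbors = {
    "suspected": "accused",
    "accused": "convicted",
    "leaked": "published",
    "published": "authored",
}

def paraphrase(tokens):
    """Swap each token for its 'neighbor' if one exists.
    Each swap is a small likeness; none consults the facts."""
    return [neighbors.get(t, t) for t in tokens]

def drift(tokens, steps):
    """Apply paraphrase repeatedly, compounding the variations."""
    for _ in range(steps):
        tokens = paraphrase(tokens)
    return " ".join(tokens)

claim = "the agent suspected of the leaked email".split()
print(drift(claim, 1))  # → "the agent accused of the published email"
print(drift(claim, 2))  # → "the agent convicted of the authored email"
```

One substitution pass turns suspicion into accusation; a second turns it into conviction—each step a defensible "likeness," the sum a different (and false) state of affairs, with no mechanism anywhere in the pipeline to register the difference.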

We find here a recapitulation of an ancient category of the false. In the words of Socrates on sophistry, one might say, the generative transformer enacts a kind of “deception [that] slips in through certain likenesses… incrementally, step by step, through similarities away from the truth to its opposite.”[53] This is the deception, inherent to a technology of bindings and embeddings, incurred by small errors. Beyond this constitutive algorithmic indeterminacy—a computational process that can generate unintended misinformation—it may become harder to keep track of training data characteristics (the relations between entities and properties within data sets) as companies controlling the technology cordon off their training sets from public scrutiny. When predictive algorithms are applied to data outside their training set, for example, their results are often indeterminate.[54] Indeed, with the widespread adoption of these tools, the proliferation of information objects of this sort—stochastic sequences of notation without reference—leads to what Emily Bender, Timnit Gebru, and their co-authors call documentation debt: “putting ourselves in a situation where the datasets are both undocumented and too large to document post hoc.”[55] Without access to the precise data collection methodology, these systems can become vehicles for a kind of unintended misinformation creep.

Without denying the practical value and fluency of automatically generated text in a variety of contexts—from customer chatbots to templates and drafts for menial tasks—generative pre-trained transformers, like their predecessors in text-completion technologies, are inherently error-prone and thereby additionally vulnerable to deliberate misuse and automation bias. Their embedded factual slippages (glitches) can be exploited by bad actors in the spread of disinformation, deep fakes, pseudo-science, and lies. In an information ecology where financialized third parties pay to activate, moderate, and modify attention, could it be that a technological condition interacting with a fantasy universe of affects and decontextualized objects enables affective habitus forged by naked spectacle, charismatic personality, and outright mythology? Could it be that the economic reign of the correlational order plays a role in undermining a sense of shared political, or even factual, reality? Could it be that the era of big data—mutating assemblages interlinked by affects—creates the condition for the possibility of a generalized post-truth? Perhaps the Deleuzian desiring-machine, no longer desirable today, has become the true world picture. We find in the probabilistic calculus of correlation a kind of de-sublimated cluster interpellation.

This is the world as statistical realization instead of representation. Once more, the epigram: The question is not: is it true? But: does it work?