Abstracts
Abstract
This article outlines one of three systemic conditions underwriting misleading, deceptive, and false content in the socio-digital ecosystem today. Against philosophical realism, which purports to offer a diagnostic bulwark against misinformation, the argument takes up both deliberate and nondeliberate misinformation—from intentional disinformation of all stripes (driven by State, corporate, and other actors), deep fakes, pseudo-science, lies and distorted information to fictions and deceptions unwittingly produced by computational features and functions of machine learning and artificial intelligence (AI), such as with automated prompt-and-response systems. The discussion of misinformation today must additionally reckon with the legal framework guiding online content (and its moderation) as it intersects with the economic incentive structures of contemporary platforms. The structure of the legal economy, in turn, shapes the algorithmic systems and interface designs that curate and generate content on digital platforms. It is the computational feature of the information ecosystem today that is the primary object of consideration in this article.
With the increase in computing power in the early 2000s and the archiving of petabytes of digital data, computational approaches shifted from rule-governed, symbolic, and human readable systems to data mining, clustering, and statistical analysis. This statistical turn in the era of the early internet created the conditions for a correlational paradigm for the flow of information within networked communication. The information ecosystem today is ordered by technical tools for prediction and risk reduction—clustering algorithms, Markov chains, n-grams, neural network methods, large language models, and the like. The intersection of the correlational paradigm for computing with the telephonic legal construal of platforms, as well as the concomitant economic incentives, created the conditions for the undermining of documentality imagined by philosophical realism. Following a brief critique of the realist turn in philosophy, the article will explore the correlational paradigm as a technical diagnostic punctum central to the reign of “powers of the false” within the digitally-networked ecosystem today.
Résumé
Cet article s’intéresse à l’une des trois conditions systémiques sous-jacentes à la diffusion de contenus trompeurs, fallacieux et mensongers dans l’écosystème socio-numérique contemporain. À l’encontre du réalisme philosophique — qui tenterait de fournir un diagnostic utile contre la désinformation — notre axe de réflexion aborde d’une part la désinformation intentionnelle (orchestrée par des acteurs étatiques ou corporatifs), telle que les deepfakes, les pseudosciences, les informations mensongères ou déformées, mais également la désinformation involontaire, telle que celle produite par les systèmes de demande et de réponse automatisés de l’apprentissage automatique et de l’intelligence artificielle (IA).
Aujourd’hui, la discussion sur la désinformation doit donc prendre en compte le cadre juridique guidant le contenu en ligne (et sa modération) puisqu’il entre directement en intersection avec les structures d’incitation économique des plateformes contemporaines. De plus, nous constaterons que la configuration de l’économie légale a également un impact sur la conception des systèmes algorithmiques et des interfaces qui régissent et produisent du contenu sur les plateformes numériques. Par conséquent, notre réflexion se centrera principalement sur l’aspect computationnel de l’écosystème de l’information contemporain.
Avec l’augmentation de la puissance de calcul au début des années 2000 et le stockage de pétaoctets de données numériques, les approches informatiques ont évolué de systèmes régis par des règles symboliques compréhensibles par les humains vers des méthodes axées sur l’exploration de données, le regroupement et l’analyse statistique. Ainsi, cette évolution à l’ère de l’Internet créa les conditions propices à l’émergence d’un paradigme corrélationnel concernant la circulation de l’information au sein des réseaux de communication. Aujourd’hui, ce sont des outils techniques de prédiction et de réduction des risques (algorithmes de regroupement, chaînes de Markov, n-grammes, méthodes de réseaux neuronaux, grands modèles de langage, etc.) qui ordonnent l’écosystème de l’information. Finalement, l’intersection du paradigme corrélationnel de l’informatique avec la construction juridique téléphonique des plateformes, ainsi que les incitations économiques concomitantes, contrarient bel et bien la documentalité imaginée par le réalisme philosophique. C’est donc après une brève critique du tournant réaliste en philosophie que cet article vise à explorer ce paradigme corrélationnel en tant que « punctum » diagnostique technique, au cœur du règne des « puissances du faux » dans l’écosystème numérique en réseau d’aujourd’hui.
Article body
Introduction: Disciplinary Switch-and-Bait
Several years ago, I found myself in a public debate at the Data & Society Research Institute in New York City.[1] At the time, a rising tide of misinformation—half-truths, alternative truths, and plain untruths—was circulating in a networked information ecosystem without safeguards to prevent its spread. “FBI Agent Suspected In Hillary Email Leaks Found Dead In Apparent Murder-Suicide” ran one such headline, hosted on a legitimate-sounding domain name—Denverguardian.com—a story shared on Facebook a half million times during an election period.[2] Our panel aimed to address scandals such as this and to offer diagnostic tools for understanding the proliferation of alternative facts, disinformation, and hacking as they intersected with new digital tools, the construction or verification of reality, and issues of power within a new media ecology. The first speaker, a new realist philosopher, began proceedings by offering philosophical realism as antidote to the emergent deceptions and illusions of a post-fact era. His argument strongly rejected post-structuralist assertions about the irreducible textualism of knowledge systems, the relativism of ontology, and the totalizing link between knowledge and power.[3] The second speaker, a mathematician in the audience, argued that algorithms were inherently biased systems, both inaccurately predictive of human behavior and productive of considerable disinformation. Her argument focused on haphazard data gathering, spurious correlations, embedded values, and confirmation biases, which (in contrast to the philosopher) were ultimately linked to political and economic power structures.[4]
What is striking about this pairing is how the interdisciplinary swerve away from their fields (philosophy and mathematics respectively) considerably engages staple paradigms in neighboring fields. We witness a strange reversal, on one hand, from the humanities to the sciences and, on the other, from the sciences to the humanities. The data scientist turns to philosophy to show how seemingly unemotional algorithms and statistical models in fact contain embedded opinions and confirmation biases, which, left unchecked, culminate in mass misinformation and a threat to democracy: “The truth is, there is no purely algorithmic process that can infer truth from lies…” To temper algorithmic overreach, the mathematician advances the intervention of “actual human judgement.”[5] In contrast, the philosopher draws on the hard sciences—the “indifferent … objectivity” of natural objects and historical documents—to block the reign of postmodern relativism, which, left unchecked, logically culminates in rampant misinformation and right-wing populism.[6] To temper the escalations of human interpretation, the philosopher advances the intervention of objective ontologies—“the sphere of facts”—insisting on a forceful distinction between “being and knowing.”[7] Where the humanist brings scientific realism to bear on the question of accuracy and authenticity, the scientist brings human interpretation to bear on the question. This twofold interdisciplinary reversal—swapping default disciplinary axioms to supplement a perceived gap produced by internal constraints of method—is both contradictory and symptomatic. We find here a kind of methodological switch-and-bait—an attempt to provide a prognostic methodological corrective to a disciplinary proclivity, catalyzed by a failure diagnostically to map an information disorder that has somehow escaped established checks and balances. 
These are interdisciplinary swerves occasioned by a historical era variously characterized as post-fact, post-truth, or even post-reality.
This article claims that neither philosophical realism nor mathematical perspectivism can adequately diagnose the critical mediatic dynamics underwriting the contemporary socio-digital ecosystem. Deference to objective realities alone, on one hand, and deference to human interpretation, on the other, cannot course-correct the inertial tendencies of their respective disciplinary methods. What is required is an intermedial approach that outlines the multi-factorial conditioning ground of the information network today. The paper will not attempt precisely to define concepts such as “misinformation,” “disinformation,” “fake news,” “propaganda,” and so on, but will highlight instead systemic conditions facilitating misleading or false content today. In other words, the argument will not focus on the nature of the deceptive or false information as much as the mediatic conditions for their possibility. Given the structural nature of the argument, I include for consideration both deliberate and nondeliberate disinformation—from intentional false headlines of all stripes (driven by State, corporate, and other actors), deep fakes, pseudo-science, lies and distorted information appearing in social media, blogs, websites, etc. to fictions and deceptions unwittingly produced by computational features and functions (automated prompt-and-response systems, for example) of machine learning and artificial intelligence (AI).
Although it is not the direct subject of this article, any discussion of misinformation today must reckon with the legal framework for regulating the emergent digital network in the United States, which includes a crucial liability shield (or safe harbor provision) for content mounted on telecommunications platforms. In particular, Section 230 of the 1996 Communications Decency Act (CDA) in the USA regarded the digital internet on the historical model of telephonic, rather than broadcast, media. The legal framework guiding online content (and its moderation) intersects with an economic incentive structure, which, in turn, shapes the algorithmic systems and interface designs that curate and generate content on digital platforms. It is this last, more technical, feature of the information ecosystem today that will be the primary object of consideration in this article. With the increase in computing power in the early 2000s and the archiving of petabytes of digital data, computational approaches shifted from rule-governed, symbolic, and human readable systems to data mining, clustering, and statistical analysis. This statistical turn in the era of the early internet created the conditions for what I call a correlational paradigm for the flow of information within networked communication. In other words, the information ecosystem was increasingly ordered by technical tools for prediction and risk reduction—clustering algorithms, Markov chains, n-grams, neural network methods, language models, and the like. The intersection of the correlational paradigm for computing with the telephonic legal construal of platforms (and the concomitant economic incentive structures) created the conditions for the undermining of documentality imagined by Realists and the possibility of a friction-free flow of disinformation.
Following a brief critique of the realist turn in philosophy, the article will explore the correlational paradigm as a technical diagnostic punctum, central to the reign of “powers of the false” within the digitally networked ecosystem today.
Insects, Algorithms, and Humans: Discontents of Realism
The only chance of emancipation that is given to humankind… [is] realism, against illusion and sorcery.
Maurizio Ferraris[8]
The information network today is grounded in repositories of digital data, statistical processing systems, code logics, and all manner of algorithmic functionality. Yet a strident non-digital (and even anti-digital) strain runs through adjacent scholarly work on the knowledge economy today. This paradox marks a wider turn in philosophy (no less than the theoretically oriented humanities) toward materialism at the end of the twentieth century. This was a kind of materialism—not to be confused with historical materialism or Marxism—that was largely catalyzed by a brand of Deleuze-inspired thought that took root in Anglophone academia in the late 1990s. The new materialism was pitted against analytics directed toward human consciousness, symbolic orders, and the workings of power and political economy. Instead, new materialism was directed toward non-intentionality, physical life, real objects, direct sensations, and empirical reality itself. By insisting on irreducible ground truths, materialism gained considerable traction from its critique of the postmodern idea that reality is somehow socially constructed. In crude terms, this shift marked a revived valuation of empiricism, realism, and pragmatism and the concomitant devaluation of rationalism, abstraction, and reflection. As if to recapitulate European debates of the eighteenth century, empirical studies and ontological explanations gradually gained ascendancy over critical and conceptual ones.
What is interesting about the materialist swerve toward ontology is the striking extent to which it is at odds with the contemporary consumer-electronics model of the world—the material infrastructures of digitality—which are characterized by discretization, abstraction, statistical and symbolic orders, and the modeling of formal systems. In other words, although these kinds of theoretical reflections tend opportunistically to deploy insights from a free market of disciplines—ranging from physics to neuroscience, all entangled in precarious, messy assemblages—the coalescence around a materialist position was curiously delinked from the primary modus of its own technological age: statistical analysis, formal modeling, digital abstraction, and so on. At the height of the digital paradigm, we paradoxically witnessed what Benjamin Boysen and Alexander Galloway call “semiophobia”—a fear of signs, symbols, and forms—and the privileging of embodiments and materialities, exemplified in the innards of real objects.[9] It is as if the orientation toward objects—their ineffable material gravitas—nostalgically resisted assimilation to the digital in the heyday of the digital.
One interesting strain of materialism has come to be called “New Realism.” New Realists, such as Graham Harman and Maurizio Ferraris, plausibly argue that, in the last century, the postmodern knowledge-interest-power linkage became so totalizing that its only counter-power was systematic doubt, negation, and skepticism. Deconstruction, they argue, became a paradoxical ally for contemporary forms of disinformation and fundamentalism. For Realists, the twin ideas that (1) reality is socially constructed (facts are theory-laden) and (2) solidarity trumps objectivity (affect displaces science) are paradoxical hallmarks of contemporary forms of disinformation. As antidote to the de-objectifying tendencies of postmodernism, Realists advance realia—objects and realities not reducible to (and hence manipulable by) social interpretations.
The Realist pushback against the textualities of postmodernism appears to offer a corrective to the proliferation of the false, but on closer examination of digital disinformation today, the recourse to objects encounters a stark diagnostic limit. In particular, the promised political corrective of New Realism occurs principally on the terrain of natural ontologies (“unamendability”) rather than that of social ontologies (“documentality”). Let me explain. For Ferraris, natural ontologies cannot be revised (“corrected or changed”) by social constructions, inventions, or interpretations (“conceptual schemes”).[10] Ontology cannot be confused with epistemology; realia cannot be confused with conceptualizations about them. He sums it up in a disarming example: “The fact remains that what we perceive is unamendable, it cannot be corrected: sunlight is blinding if the sun is up, and the handle of the coffee pot is hot if we leave it on the fire. There is no interpretation to be opposed to these facts: the only alternatives are sunglasses and potholders.”[11]
But what of social ontologies? Against rampant subjectivism and solipsism, Ferraris argues that documentality “precedes and produces” subjective intentionality in the first place.[12] By “documentality” he means “a system of communication, inscription, acknowledgment, coding, filing, and patents” that underwrites human behavior.[13] This is the array of “inscription” practices that “consist in the recording of social acts”:[14] “If it was not possible to keep traces,” he argues, “there would be no mind, and it is not by chance that the mind was traditionally depicted as a tabula rasa, a surface on which impressions and thoughts are inscribed.”[15] In short, social reality is constituted by behaviors that are a function of “laws, rituals, norms, social structures,” and so on.[16] Crucially, for Ferraris, these inscriptions are not regarded as “forms of thought” at all. Instead, they have an independent reality, approximating the condition of “computer operations,” which do not require knowledge of mathematics to execute their operations and calculations: “Both in artificial intelligence and in the natural one the same process occurs, for which organization precedes and produces understanding, and documentality precedes and produces intentionality.”[17] Documentality, then, is a kind of “weak textualism” (or “weak constructivism”) that grounds human intention, much like a “superorganism” of termites or a computer executing a rational program.[18] It is within this paradigm that the realist argues against the perspectivism of the “rule of intentionality,” favoring instead the “admirable regularities” that come under the “rule of documentality.”[19]
For a philosophy apparently committed to pragmatic realism, this generous use of theoretical analogies to the behavior of insects and algorithms requires some scrutiny. Is the ontology of human experience (even remotely) analogous to that of a computer program; or are they precisely set adrift in a kind of parallax mediatic setting? If documentality is to be taken as a guide to social ontologies, what do we make of the profligate inflation of online documentality within the newly networked digital ecology? How does the proliferation of realia execute a “rational program” or enact an “admirable regularity” in the context of information disorder, fake news, and chaotic contradiction (“the nightmare of verba manent”)?[20] If Ferraris, along with realists such as Manuel Delanda and Graham Harman, powerfully exposes the way natural objects seem implausibly to have taken on the character of social objects under the directives of Postmodernism, what of the case when social objects take on the character of natural ones under the directives of Realism? Here New Realism encounters a limit in the context of conceptual amalgams emerging in the fact-fiction assemblages that characterize contemporary socio-technical networks.
The question concerning post-truth and big data requires its own specificity. How does a metaphysics of documentality tally with the physics of digitality? Is false information true data? How is false information converted by analytical systems into behavioral modeling for state and corporate actors, ranging from predictive technologies for political campaigns to advertisement-sales optimization? Perhaps the algorithmic routines and subroutines (toward which documentality is increasingly subject) offer a clue to the new ontological consistencies. But what kind of philosophical object is the algorithm? What becomes of documents that are disassembled into quasi-infinite informatic data sets and then reconstituted under various computational models? How does the machine layer intersect with the social layer? What of the legal protocols and economic imperatives underwriting the chaotic surface of information disorder? In sum, how does one frame the constitutive alliance between the infrastructural physics of big data and its algorithmic functionality—clustering mechanisms, graph theory, transformational matrices, etc.—and the boundless metaphysics of post-truth—externalized desire, decontextualized affect, disinformation, and illusory facts?
Against realia—the apparently unassailable bulwark against the powers of the false—this paper argues that the primary historical conditions of possibility for these powers concern the legal constraints guiding the dissemination of information, the financial imperatives of platform economics, and the technical protocols running computation. As will become apparent, these intermedial factors cannot be assimilated to the documents and analogies of realism. Instead, they realistically engage the systemic processes that ground the flow of networked (dis)information today. The remaining paper will examine only the third, more technical, factor at work in the digital ecosystem with reference to the shortcomings of Realism.
The Reign of Correlationism: A Question Concerning the Question
There are three kinds of lies: lies, damned lies, and statistics.
Benjamin Disraeli by way of Mark Twain[21]
The critical commentary on the spread of online misinformation frequently appeals to the part played by bad actors.[22] This is not surprising given that, in legal cases involving defamation (slander, libel, etc.), it is incumbent upon the plaintiff to prove “malicious intent.” In Dominion v. Fox (26 March 2021–18 April 2023), for example, lawyers representing the voting company went to great lengths to prove the mismatch between the private statements made by Fox News hosts and the public pronouncements of falsehood.[23] Although social media companies are immune from analogous defamation charges, practically by definition, it seems tempting to follow the conceptual contours of such standard legal proceedings. However, networks of digital information today are rarely curated by individual actors (good or bad) but automated instead by algorithmic systems and language models beyond the direct supervision of human agents. To grasp the flow of disinformation, these non-human agents, systems, and models require their own analytic scrutiny. On the other hand, the appeal by some academically oriented commentary to materialism—object-oriented analyses, actor networks, affect theory, etc.—equally misses that specificity.
First, materialist approaches to affect, irreducible as they may appear, cannot adequately engage the dialectics of digital surveillance today.[24] As a subjective, practical experience, for example, quotidian online life—characterized by clicking and liking, linking and listening, swiping and scrolling—became enjoined, on an unprecedented scale, to techno-economic platforms that were designed to amplify networked affect. Contemporary technical design—from the screen itself and the blueprints of web-based interfaces and applications to the algorithmic systems that generated constantly refreshed content, timeline-based engagement, reposting, interactive audiovisual feedback, messaging streaks, recommendations, and so on—reflected the affective demands of an economy that came to regard human attention as a revenue-generating resource. Affect, a key driver of behavioral surplus in this kind of surveillance economy, is regarded by materialist theorists as a sub-personal force—independent of reason, elusive of meaning, embodied, non-intentional, free-floating or autonomic, triggered, innate, and so on. As a material embodiment (instead of an idea), affect is thereby kept undefined—as a matter of methodological course—by this brand of materialist critique. A determined disinterest in definitions, symbolic meanings, and semiotic interpretations is a curious interpretive disavowal that effectively cedes the terrain of its digital representation—no less than its statistical monitoring, manipulating, and modifying—to monopolized proprietary centers of computing.
Second, the turn to “objects” and “documentality” (or realia) reaches its limit in the context of automated algorithmic content proliferation today.[25] Graham Harman, for example, argues that objects grounding base reality are complex entities that elude our formulation of them, practically by definition. When it comes to information objects—apparently existing in a “flat ontology” that renders subjects and objects as interactive entities on an equal plane—Realism misses the data-driven systems that mine, analyze, cluster, model and, more recently, generate them in the first place. The Realist resistance to underlying laws, economic systems, mathematical models, and other symbolic orders is of little use here.[26] Indeed, online documentality, one might say, is today but a symptom of an informatic ordering system grounded in opaque statistical inference models. What is the “realistic” philosophical status of the fluid informatic layer optimized for algorithmic functionality? Or, more simply again, what kind of philosophical “object” is the algorithm? Are algorithms human constructions or do they have their own agency? What is the character of such technical agency? How do machinic decision trees and connectivity models—centroid, distributed, graph-based, etc.—proffer human-readable information? Can the computational outputs of mathematical functions driving features like autocomplete, voice-to-text, search, text generation, and so on—central systems for the spread of disinformation today—be construed as themselves anchoring inclinations and predispositions toward the false?
Against Realism, diagnostic tools for grasping misinformation today must consider the elusive metaphysics of information objects in relation to the infrastructural physics of digital mediation. It is important here to distinguish, then, between analytic registers—the machine layer and the social layer—as Alex Galloway convincingly argues in his parallax theory of digital media today.[27] These heterologous layers—while interfacing interactively—nonetheless organize, select, aggregate, hierarchize, discriminate, and process information in fundamentally different ways. The execution of an algorithmic function, which, against O’Neil’s mathematical perspectivism, cannot of itself be construed as biased or misleading, freights an inertial logic that lies in an orthogonal relation to the human proclivities and prompts to which it reacts.[28] Broadly speaking, statistical models and algorithms used to analyze and draw inferences from patterns in data are less a proxy for the reality they appear to represent and more a presentation of statistical significance within a training corpus alone.
Take, for example, one of the more neutral-seeming features of online functionality—search—which nonetheless plays a crucial role in the spread of misinformation.[29] The logic of search is essentially a sorting mechanism that, in philosophical terms, reverses the logic of human perception with which it interacts.[30] By mining petabytes of pre-existing linguistic fragments and then clustering statistical connections between letters, words, and phrases, automated sorting mechanisms systematically calculate all possibilities to identify an optimal selection. In the context of search, commonalities are sorted and ranked by column stochastic matrices and transition matrices, which register the prevalence and importance of links to pages, which have themselves been ranked. Simply put, if a node has k outgoing edges, it passes on 1/k of its importance to each of the nodes it links to. In general, search algorithms use a host of clustering models that group a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups. The algorithms are acting on a variety of inputs, including the popularity of the search result, the frequency with which it is shared, the context (such as text) around the result, and meta-tagging.[31]
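The link-importance calculation described here can be sketched in a few lines of Python as a toy power iteration over a column-stochastic link structure. The four-page "web" and the iteration count are illustrative assumptions, not any search engine's actual implementation; the point is only the 1/k sharing rule.

```python
# A toy illustration of link-based ranking: each page divides its
# importance equally among its k outgoing links (1/k per link), and
# importance is recomputed until the values stabilize. The four-page
# "web" below is hypothetical.

links = {            # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

pages = sorted(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start with uniform importance

for _ in range(50):  # power iteration over the column-stochastic matrix
    new_rank = {p: 0.0 for p in pages}
    for page, outgoing in links.items():
        share = rank[page] / len(outgoing)   # 1/k of this page's importance
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

# D receives no incoming links, so its importance drains to zero; the
# mutually linked pages A and C end up ranked highest.
print({p: round(r, 3) for p, r in rank.items()})
```

Note that the ranking stabilizes on the structure of the link graph alone: no page's content, let alone its truthfulness, enters the calculation.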
The algorithmic discrimination/selection that is set to work on a dataset like this cannot be conflated, even by association, with modes of human discrimination/selection. Graph traversal in these contexts from one node to another does not resemble either deductive or inductive reasoning. The result of a query leading to false information in a search algorithm, for example, is a brand of abductive reasoning; a matter of statistical significance alone. To simplify a little, search queries such as “a-r-e- -w-o-m-e-”, “-a-r-e- -b-l-a-c-k-”, or “a-r-e- -j-e-w-” autocomplete the inquiry by inferring what tokens (character, word, or string) are statistically most likely to follow a word, pair of words, or triplet of words. Google’s search function—an algorithmic analysis of a user’s typed sequence—is presented as a cascading box with ranked possibilities of predictive text. To correlate this string of tokens to a plural form of the ordinary meaning of the word is one thing (“w-o-m-e-n-,” “b-l-a-c-k-s-,” etc.); to suggest further tokens is quite another. In late 2016, this feature, commonly known as “autosuggest” (though rebranded as “autocomplete” in 2010)—the double meaning (to influence in a non-logical way) is intended here—[32] threw up the following top results: “are women evil,” “are blacks the dumbest race,” and “are jews a race” (“Are jews evil?” was the fourth item on the list).[33]
How is it that a simple search delivers users to such misleading content, including falsehoods, misinformation, and disinformation? For Cathy O’Neil, this would count as a case of algorithmic bias. And yet, the search results neither reflect the viewpoint of programmers directly nor are they strictly the outcome of a rules-based system that filters data through a set of algorithms representing prior knowledge.[34] Instead, these results are delivered by a statistical analysis of neighboring sequences of items in documents found across the web. For humanists concerned with misinformation, one tactic has been to indict the data upon which these applications are trained (as if better, or more, training data could proffer better results). But again, what is factually false may be statistically accurate (or at least relevant). An optimal result occurs when the search engine throws up links that correlate with the specificity of the query. The question concerns the very question that is itself typed into the inquiry box—a string of clustered tokens that statistically activates those human networks that have previously phrased inquiries in that particular way.[35] In effect, entire swaths of misleading, deceptive, and false information are tethered thereby to strings of letters—idiolects of typing—readily exploited by media manipulators.[36] For all the claims that more data, memory, and parametric annotation (“scaling”) will eventually erode the problem, technical solutions oriented around ever-larger training corpuses alone are inadequate to the task at hand.[37] In other words, these are non-ideological statistical enactments—a mathematical set of interacting objects (logic gates and voltages)—of ideologically-charged misinformation.
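The next-token inference at issue in the autosuggest example can be sketched with a toy bigram model. The miniature corpus below is hypothetical, and production systems operate at web scale with far richer signals, but the core dynamic survives scaling: the top suggestion reflects statistical prevalence in the training text, not the truth of the completed phrase.

```python
from collections import Counter, defaultdict

# A toy bigram model: count which token follows which in a (hypothetical)
# training corpus, then "autocomplete" by ranking the most frequent
# continuations. A claim repeated often enough dominates the suggestion,
# whether or not it is true.

corpus = (
    "the moon landing was televised "
    "the moon landing was staged "
    "the moon landing was staged "
    "the moon landing was staged"
).split()

follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1

def autocomplete(word, k=3):
    """Return the k tokens most frequently observed after `word`."""
    return [w for w, _ in follow[word].most_common(k)]

# The statistically "optimal" continuation of "was" is the falsehood,
# simply because it occurs three times to the truth's one.
print(autocomplete("was"))
```

The same prevalence-over-truth logic extends to longer n-grams and to neural language models trained on the same data: what is factually false can remain statistically accurate.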
The World as Statistics Instead of Representation: Objective-All-Too-Objective
The question is not: is it true? But: does it work?
Brian Massumi[38]
As can be seen, computational processes driving features such as search, auto-complete, auto-play, and so on, can—for technical reasons—facilitate the spread of misinformation online today. These systems were not designed with this aim in mind and cannot therefore be construed as a simple reflection of the intentions or views of the software engineers, designers, and programmers that built them. In the context of functions and features like the social media feed, the problem is compounded by a host of additional algorithms, including feature-based algorithms, natural language processing techniques, collaborative filtering, and algorithms for mining, analyzing, and predicting real-time engagements and user behavior—the monitoring of click rates, skipping and labeling items, adjustments in volume control, liking and linking, etc. Unlike search, which still operates as a kind of general recommendation software, these curated feeds are thereby personalized according to behavioral attributes of users.[39] At bottom, these systems are similarly grounded in statistical inferences, but here the correlations are additionally attuned to granular subjective attributes—variously described as a “digital double,”[40] “digital dossier,”[41] “Digital Doppelgänger,”[42] or “digital dividual”[43]—which are computationally clustered into a stream of content (“feed”) grounded in probabilistic group identifications.[44]
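The probabilistic group identification at stake here can be sketched with a minimal user-based collaborative filter. The users, items, and engagement sets below are invented, and real platforms combine many further behavioral signals, but the core move is the same: a user inherits content from statistically similar users.

```python
# Invented engagement data: each user is reduced to a set of signals.
likes = {
    "ana":  {"clip1", "clip2", "clip3"},
    "ben":  {"clip1", "clip2", "clip4"},
    "cruz": {"clip5"},
}

def jaccard(a, b):
    """Similarity between two users' engagement sets."""
    return len(a & b) / len(a | b)

def recommend(user):
    """Rank unseen items by the similarity of the users who engaged with them."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0) + sim
    return max(scores, key=scores.get) if scores else None

print(recommend("ana"))  # "clip4", inherited from the statistically nearest user
```

The recommendation is a correlational inference about a cluster of users, not a judgment about the content itself—which is why misleading material that engages one cluster propagates to its statistical neighbors.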
The rampant use of statistical sorting and modeling in the circulation of (dis)information today requires a brief historical contextualization. The combination of unprecedented computing power, vast troves of digitized training data (harvested within a globally networked architecture), and novel algorithmic training protocols, has created the conditions for a statistical turn in contemporary computing. Where artificial intelligence (AI), for example, was once construed in terms of generative, symbolic, and human-readable systems that required an explicit and declarative elicitation of rules and procedures, deep neural architectures today learn complex representations, and derive abstractions from their training sets, in a versatile statistical manner. Older forms of AI—dominant until the 1980s—were sometimes referred to as “expert” systems (with a “brute force” design), in which an algorithm followed a lengthy set of instructions and then drew conclusions by applying various combinations of those instructions.[45] When it came to modeling human reasoning or behavior, however, the older rule-governed algorithmic approach was generally met with limited success. Broadly speaking, no set of computational instructions was exhaustive or flexible enough to simulate the unique character of human inquiry, thought, and experience.[46] By the late 1980s, the idea that computers had an internal kind of logic set apart from human-derived expert disciplines became axiomatic. In the memorable words of the electrical engineer Frederick Jelinek, “We thought it was wrong to ask a machine to emulate people. After all, if a machine has to move, it does it with wheels—not by walking. If a machine has to fly, it does so as an airplane does—not by flapping its wings.”[47] Instead of working with syntax, rules, and grammar, Jelinek proposed training computers on large corpora of text to recognize repeated patterns and derive statistical probabilities between them.
This disciplinary turn away from rule-governed simulations of linguistic cognition, and toward statistical data processing as such, became the defining hallmark of AI in the early twenty-first century. As a result, a new research paradigm reframed the very idea of machine cognition as a fundamentally computational endeavor, radically untethered from the human faculties it sought to emulate. While the statistical turn in AI was delinked from any explanation of the phenomena it modeled, its strength lay in its power to materialize them. Again, the shift from representational to statistical modeling had less to do with the turn to digitization itself than with a turn toward an old method (statistics) newly innervated by an exponentially expanding set of materially available data. For statistically grounded AI, a cognitive model was less represented than it was realized. Crudely put, the algorithmic routines and sub-routines of even neutral-seeming search or auto-complete functionality do not explain the results of human prompting as much as they execute them.
One final example—the recent release of various generative pre-trained transformers (GPT, Bard, Notion AI, Quillbot, Wordtune, Dall-E, etc.), which are highly optimized algorithms with access to gigabytes of data designed to respond to prompts from users. These transformer models benefit from large architectures and large quantities of data, but as systems trained to predict sequences of words, they are grounded in the same basic statistical operations we find in search, autocomplete, and so on.[48] These analogous technical ecologies create the conditions for the possibility of unsupervised misinformation. How so? For all the appearance of human thought, these systems do not understand in human terms; rather, they invert the relation between syntax and semantics we find in classic language acquisition, comprehension, and production. Instead of beginning with ideas or concepts, GPTs calculate statistical probabilities from gigabytes of training data (extracted from a broad range of domains, including online books, encyclopedias, news stories, chat logs, etc.) and then predict a string of letters (and groups of letters) in a sequence that might have the appearance of meaningful conceptualization. Human interlocutors tend to impute meaning to these textual strings even when there is none. Factual (and fictional) relations between entities and properties in the world are rendered by GPTs as syntactic “bindings” between basic word components (subjects, predicates, etc.), often relying on the substitution of synonyms for related phrases (“embeddings”) and other forms of masking to craft an automated paraphrase.[49] To grasp contextual features of language, GPT pre-trains the model to learn general linguistic patterns or structures of language, which it can then correlate to specific tasks with a smaller amount of labeled data.[50]
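This inversion can be made concrete with a deliberately tiny bigram generator (the corpus is invented, and real transformers condition on vastly longer contexts with learned weights rather than raw counts): the system emits whatever token is statistically most likely next, with no concept behind the string.

```python
from collections import Counter, defaultdict

# Invented toy corpus; real models train on terabytes of text.
corpus = "the cat sat on the mat . the cat sat on the rug .".split()

# Count next-token frequencies (a bigram model).
model = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    model[a][b] += 1

def generate(start, n=5):
    """Greedily emit the statistically most likely next token."""
    out = [start]
    for _ in range(n):
        out.append(model[out[-1]].most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # "the cat sat on the cat": fluent-looking, meaning-free
```

Syntax precedes semantics throughout: the output is locally fluent and quickly circles back on itself, prediction over token statistics rather than assertion about the world.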
In short, the pre-trained transformer imitates and interpolates (using insertions and substitutions) a statistical distribution of redundancies and restatements in a vast corpus of training data. Abstract relations are substituted by correlational abstractions. However, generating knowledge-approximating strings from statistically probable objects is encoded with certain ineradicable defects. Not only is the resulting text potentially ensnared in a kind of redundancy loop—leading commentators like Ted Chiang to describe ChatGPT as a kind of “lossy compressed” pastiche of the world’s data[51]—but even in the context of reliable data, embeddings alone introduce variations in meaning that frequently lead to misinformation (or “hallucinations,” in the anthropomorphic language of AI engineers).[52] Because the system automatically renders (perpetuates?) past data in a string of small variations (or paraphrase) without the capacity to distinguish fact from fiction, the transformer can, of itself, gradually detach the text string from representational resemblance to truth, fact, or veracity.
We find here a recapitulation of an ancient category of the false. In the words of Socrates on sophistry, one might say, the generative transformer here enacts a kind of “deception [that] slips in through certain likenesses… incrementally, step by step, through similarities away from the truth to its opposite.”[53] This is the deception, inherent to a technology of bindings and embeddings, incurred by small errors. Beyond the constitutive algorithmic indeterminacy—a computational process that can generate unintended misinformation—it may become harder to keep track of training data characteristics (the relations between entities and properties within data sets) as companies controlling the technology cordon off their training sets from public scrutiny. When predictive algorithms are used on data outside of their training set, for example, their results are often indeterminate.[54] Indeed, with the widespread adoption of these tools, the proliferation of information objects of this sort—stochastic sequences of notation without reference—leads to what Emily Bender, Timnit Gebru, and their co-authors call documentation debt: “putting ourselves in a situation where the datasets are both undocumented and too large to document post hoc.”[55] Without access to the precise data collection methodology, these systems can become vehicles for a kind of unintended misinformation creep.
Without denying the practical value and fluency of automatically generated text in a variety of contexts—from customer chatbots to crafting templates and drafts for menial tasks—generative pre-trained transformers, like their predecessors in text completion technologies, are inherently error-prone and thereby additionally vulnerable to deliberate misuse and automation bias. Their embedded factual slippages (glitches) can be exploited by bad actors in the spread of disinformation, deep fakes, pseudo-science, and lies. In an information ecology where financialized third parties pay to activate, moderate, and modify attention, could it be that a technological condition interacting with a fantasy universe of affects and decontextualized objects enables affective habitus forged by naked spectacle, charismatic personality, and outright mythology? Could it be that the economic reign of the correlational order plays a role in undermining a sense of shared political, or even factual, reality? Could it be that the era of big data—mutating assemblages interlinked by affects—creates the condition for the possibility of a generalized post-truth? Perhaps the Deleuzian desiring-machine, no longer desirable today, has become the true world picture. We find in the probabilistic calculus of correlation a kind of de-sublimated cluster interpellation.
This is the world as statistical realization instead of representation. Once more, the epigram: The question is not: is it true? But: does it work?
Appendices
Biographical note
Martin Scherzinger is an Associate Professor in the Department of Media, Culture & Communication at New York University. His research includes the examination of links between political economy and digital sound technologies, poetics of copyright law in diverse sociotechnical environments, relations between aesthetics and censorship, sensory limits of mass-mediation, mathematical geometries of musical time, histories of sound in philosophy, and the politics of biotechnification.
Notes
-
[1]
Maurizio Ferraris and Martin Scherzinger, “Post-Truth and New Realities: Algorithms, Alternative Facts, and Digital Ethics,” Data & Society, New York, 12 April 2017, https://datasociety.net/library/post-truth-and-new-realities-algorithms-alternative-facts-and-digital-ethics/ (accessed 9 October 2023).
-
[2]
For a full account of the company behind the story, see Laura Sydell, “We Tracked Down a Fake News Creator in the Suburbs,” NPR, 23 November 2016, https://www.npr.org/sections/alltechconsidered/2016/11/23/503146770/npr-finds-the-head-of-a-covert-fake-news-operation-in-the-suburbs (accessed 6 October 2023).
-
[3]
Maurizio Ferraris, Manifesto of New Realism, Sarah de Sanctis (trans.), New York, State University of New York Press, 2014.
-
[4]
Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Crown Books, 2016. O’Neil’s book won the Euler Book Prize of the Mathematical Association of America in 2019.
-
[5]
Cathy O’Neil, “Social Media Companies like Facebook need to Hire Human Editors,” New York Times, 26 November 2016.
-
[6]
Ferraris, 2014, p. 4.
-
[7]
Ibid., p. 34.
-
[8]
Ferraris, 2014, p. 84.
-
[9]
Benjamin Boysen, “The Embarrassment of Being Human: A Critique of New Materialism and Object-Oriented Ontology,” Orbis Litterarum, vol. 73, no. 3, June 2018, p. 225–242; and Alexander Galloway, “Peak Analog,” http://cultureandcommunication.org/galloway/peak-analog (accessed 15 March 2023).
-
[10]
Ferraris, 2014, p. 34.
-
[11]
Ibid., p. 35.
-
[12]
Ibid., p. 62.
-
[13]
Ibid., p. 40.
-
[14]
Ibid.
-
[15]
Ibid.
-
[16]
Ibid., p. 61.
-
[17]
Ibid., p. 62.
-
[18]
Ibid., p. 56, 57, 97. Ferraris draws inspiration here from Daniel Dennett’s theory of consciousness in his example of a family of termites—“which exhibits a rational behavior even if none of its components is able to think”, p. 97.
-
[19]
Ibid., p. 60, 63.
-
[20]
Ibid., p. 57.
-
[21]
Benjamin Disraeli in Mark Twain, “Chapters from My Autobiography,” North American Review, Project Gutenberg, 1906 (accessed 12 December 2023).
-
[22]
There are numerous examples of this turn toward bad actors in the attention economy. In his analysis of the role played by Facebook in fomenting violence in Myanmar and Sri Lanka, for example, Max Fisher heaps scorn upon the anti-democratic views of Facebook CEO Mark Zuckerberg and the founder of PayPal and Palantir, Peter Thiel. See, Max Fisher, The Chaos Machine: The Inside Story of How Social Media Rewired Our Minds and Our World, New York: Little Brown and Co., 2022 and Tamsin Shaw’s review of the book in “How Social Media Influences Our Behavior,” New York Times, 1 September 2022.
-
[23]
The coverage of this case is extensive, but see, for example, Jeremy W. Peters and Katie Robertson, “Murdoch acknowledges Fox News Hosts Endorsed Election Fraud Falsehoods,” New York Times, 27 February 2023.
-
[24]
See, for example, Brian Massumi’s iconic description of affect as a kind of asignifying intensity in Parables for the Virtual: Movement, Affect, Sensation, Durham, Duke University Press, 2002.
-
[25]
On “realia,” see Ferraris, 2014; on “flat ontology,” see Graham Harman, Object-Oriented Ontology: A New Theory of Everything, U.K., Penguin, 2018.
-
[26]
Manuel Delanda similarly advances the Realist assemblage as a central analytic referent against systems-oriented thought. The assemblage covers entities of the real world, ranging from natural ones (rocks, humans, diseases, weather) to social ones (corporations, wars, concerts, nation states), without committing to their aprioristic distinction. Assemblages are irreducible; they cannot be further analyzed into abstract or ultimate layers of reality. Elaborating upon Deleuze and Guattari’s “planes of consistency,” Delanda advances a flat ontology; one that blurs the lines of traditional taxonomies. In this account of realism, the atoms of quantum physics have no more claim to reality than do sporting events, say, or the movements of the market. Instead, these diverse phenomena coexist in asynchronous parallel worlds—a thousand plateaus!—interacting only in ways that are argus-eyed and multi-capillaried. Relations between plateaus are mediated less by causes than they are by catalysts. There are neither overarching laws nor predetermined structures, even if there is a degree of interaction and collision between worlds. See Manuel Delanda’s Assemblage Theory, Edinburgh, Edinburgh University Press, 2016.
-
[27]
Alex Galloway, Protocol: How Control Exists after Decentralization, Boston, MIT Press, 2004.
-
[28]
For a contrary view, see Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Crown Books, 2016.
-
[29]
Jonathan Albright shows how search and hyperlinks played a central role in the spread of networked information. Using standard search engine optimization techniques (proliferating links to various websites), fake news sites and conspiracy theorists created a kind of fake news ecosystem within the information ecosystem. See Jonathan Albright, “The #Election2016Micro-Propaganda Machine,” Medium, 18 November 2016, and Carol Cadwalladr’s “Google, Democracy, and the Truth about Internet Search,” Guardian, 4 December 2016.
-
[30]
Take G.W.F. Hegel’s dialectical process, which begins with a set of obvious claims about an intuition—or sense certainty—to reveal an understanding of these claims through an unfolding progression of determinate negations. By tracing the dialectical progression through a series of contradictions and their supersession, Hegel’s project culminates eventually in a kind of world explanation grasped as an objective reality or concrete universal. In contrast, by modeling human sensory experiences, algorithmic search functionality begins with a kind of world explanation—billions of data points readied for algorithmic processing—that culminates in the intuitive result. Sorting through a series of statistical correlations, the sorting mechanism culminates in a kind of human sensory certainty. It is as if machinic techniques activate a dialectics in reverse—certainty without mediation, duplication without comprehension. On sense certainty, see the opening arguments of Georg Wilhelm Friedrich Hegel, Phenomenology of Spirit, A. V. Miller (trans.) with a foreword by J. N. Findlay, Oxford, Clarendon Press, 1977.
-
[31]
Meta-tagging (or metadata) is information about the image provided by the page the image is from or the image itself, so in the case of a trigram embedded in a string like “t-h-r-e-e- -b-l-a-c-k- -t-e-e-n-a-g-e-r-s,” for example, the description is likely coming from sites that posted mugshots, and people are likely clicking on those images. (On the “three black teenagers” scandal, see Antoine Allen, “The ‘three black teenagers’ search shows that it is society, not Google, that is Racist,” Guardian, 10 June 2016).
-
[32]
Martin Moore, director of the Centre for the Study of Media, argues that search results influence the decisions of users: “There’s large-scale, statistically significant research into the impact of search results on political views. And the way in which you see the results and the types of results you see on the page necessarily has an impact on your perspective.” Moore, in Cadwalladr, 2016.
-
[33]
For these (and other) examples, see Cadwalladr, 2016, and Scherzinger, “Post-Truth and New Realities: Algorithms, Alternative Facts, and Digital Ethics,” Data & Society, 12 April 2017, https://datasociety.net/library/post-truth-and-new-realities-algorithms-alternative-facts-and-digital-ethics/ (accessed 18 March 2024). Michael Golebiewski and danah boyd consider autosuggestions with phrases such as this to be a problem of a data void—a computational scenario where little countervailing data is available for search engines. (See their “Data Voids: Where Missing Data Can Easily be Exploited,” New York, Data & Society, 2019, p. 35).
-
[34]
In keeping with O’Neil’s general position, Frank Pasquale, for example, argues that search results delivering users to false information are linked to the bias of Silicon Valley engineers: “There’s all sorts of bias about what counts as a legitimate source of information and how that’s weighted. There’s enormous commercial bias. And when you look at the personnel, they are young, white and perhaps Asian, but not black or Hispanic and they are overwhelmingly men. The worldview of young wealthy white men informs all these judgments.” (Pasquale, in Cadwalladr, 2016). Against this view, Google’s classic, but inadequate, rejoinder reads: “Our search results are a reflection of the content across the web… These results don’t reflect Google’s own opinions or beliefs–as a company, we strongly value a diversity of perspectives, ideas and cultures” (ibid.). Pasquale’s person-centric critique echoes Max Fisher’s critique of the anti-democratic views of Mark Zuckerberg and Peter Thiel mentioned above (see Fisher, 2022, p. 4-6).
-
[35]
For the human user, search responds with almost hyperactive solicitude. A high-speed sorting algorithm is experienced as an instantaneous tap of the thumb. Correlation is therefore experienced as cause. In psychoanalytic terms, one might say, consciousness has been interrupted by machinic inferences, producing a cyborg unconscious. Software intervenes as superego of the id.
-
[36]
Search inquiries that interact with nonexistent or limited representational data create an exploit for media manipulation hoping to direct traffic to misinformation, disinformation, and falsehoods. It is precisely because conspiracy, rumor, and fabricated content, for example, are generally ignored by mainstream sources that terms uniquely associated with them—“pizzagate,” “crisis actor,” “deep state,” “plandemic,” and even “fake news,” for example—drive search results toward disinformation. This first-mover advantage becomes more pronounced in the context of search-adjacent recommendation systems (including functions like autofill, auto-play, and trending topics). There are several ways first-mover advantage enables a media manipulator to establish a kind of recognition and loyalty before additional entrants into the information network. Breaking news, for example, creates a spike in search inquiries that can be strategically optimized away from the truth. Take the case of the 2017 mass shooting at an outdoor concert in Las Vegas: Here, conspiracy theorist Alex Jones circulated false information about the alleged shooter (watched by millions on YouTube) hours before the police issued their official reports. Media manipulators can also exploit the creation of new terms and optimize them in ways that undercut the truth. For example, when the media reported on the unrestrained circulation of a fake news item about the murder-suicide of an FBI agent, mentioned on the eve of the 2016 election, the then president-elect, Donald Trump, strategically optimized a bigram to direct traffic away from this news item and toward mainstream media—the “fake news” of CNN, the “failing” New York Times, and so on. Problematic search strings, too, can direct algorithmic systems to problematic content segmented into information clusters.
Paradoxically, therefore, search engines such as Bing and DuckDuckGo—an “internet search engine that emphasizes protecting searchers’ privacy and used to avoid the filter bubble of personalized search results”—frequently fare worse than Google when it comes to the dissemination of false information. (see Stuart A. Thompson, “Fed Up With Google, Conspiracy Theorists Turn to DuckDuckGo,” New York Times, 23 February 2023).
-
[37]
Techniques for filtering out misinformation, falsehoods, and fake news in the world of data science tend to be subject specific, rather than generalized. For example, in Yochai Benkler’s discussion of NASA clickworkers engaging a collaborative project involving highly modularized individual tasks, Benkler writes, “[the organizers] built in redundancy and automated averaging out of both errors and purposeful erroneous markings” (Yochai Benkler, Wealth of Networks: How Social Production Transforms Markets and Freedom, New Haven and London, Yale University Press, 2006, p. 69). Indeed, Wikipedia’s use of automated bots, guided by algorithmically defined tasks to combat vandalism and false entries, largely facilitates the monumental task of editing, maintenance, and administration of the site (see Stuart Geiger & David Ribes, “The Work of Sustaining Order in Wikipedia: The Banning of a Vandal.” Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, February 2010, p. 117–126, https://dl.acm.org/doi/10.1145/1718918.1718941 (accessed 9 October 2023)). These subject-specific safeguards arguably account for the site’s overall robustness and reliability. On the adage, “there’s no data like more data,” see Emily M. Bender, Timnit Gebru, et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, March 2021.
-
[38]
Brian Massumi, “Preface” to Gilles Deleuze and Félix Guattari’s A Thousand Plateaus: Capitalism and Schizophrenia. Minneapolis, University of Minnesota Press, 1987, p. xv.
-
[39]
The algorithmic spread of misleading information on Facebook and Twitter, for example, has been amply documented in recent years. In Chaos Machine, for example, Max Fisher outlines the role played by social media platforms in disseminating false theories that amplified everything from vaccine denialism to outbreaks of violence in Myanmar, Sri Lanka, and elsewhere.
-
[40]
David Lyon, Pandemic Surveillance: Privacy, Security, and Data Ethics. Cheltenham, U.K. & Northampton, MA: Edward Elgar Publishing, 2020.
-
[41]
Daniel Solove, “Digital Dossiers and the Dissipation of Fourth Amendment Privacy,” 75 S. Cal. L. Rev. 1083, 2002.
-
[42]
“You have doppelgängers. They’re quietly influencing your life,” Seth Stephens-Davidowitz, 2019, YouTube, Big Think, https://www.youtube.com/watch?v=2raflHttBcg (accessed 12 December 2023), 2’45”.
-
[43]
John Cheney-Lippold, We are Data: Algorithms and the Making of Our Digital Selves. New York: New York University Press, 2017.
-
[44]
One way of conceptualizing this additional algorithmic layer in relation to the circulation of false information today involves what I call the cluster interpellation of the biotechnical individual. If the uses of data in features like search, auto-complete, and auto-play has shifted the emphasis from information provision to behavior prediction, then social media and other gamified applications have shifted it from prediction to behavior modification.
-
[45]
For an excellent account of expert systems, see John Haugeland, Artificial Intelligence: The Very Idea, Cambridge, MIT Press, 1985.
-
[46]
The current flourishing of research in AI was the result of a decisive shift away from either the formal modeling of the human sensory-motor system or the generative structure of the human linguistic system and toward large-scale statistical data processing. For a detailed account of how data-driven machine learning superseded traditional linguistic modeling in the context of predictive text and speech recognition systems, see Xiaochang Li, “Divination Engines: A Media History of Text Prediction,” doctoral dissertation, New York University, 2018. For a brief introduction to the distinction between human intuition and machinic pattern recognition, see Clive Thompson, “The Miseducation of Artificial Intelligence,” Wired, December 2018, p. 74–81.
-
[47]
Frederick Jelinek, quoted in Peter Hillyer, “Talking to Terminals,” THINK, IBM Corporate Archives, 1987.
-
[48]
Language models using n-grams can only order relatively local word linkages and dependencies (predicting words based on sequences of five, or fewer, words), whereas the transformer models used by these generative pretrained tools can capture larger spans of text, thereby producing text strings that resemble fluency and coherence across entire paragraphs.
-
[49]
“Word embedding” techniques are built on word vectors using various computational tools, including textual entailment, semantic role labeling (SRL), coreference resolution, named entity recognition (NER), and sentiment analysis. On “bindings” and “embeddings,” see Gary Marcus, “How come Chat GPT can seem so brilliant,” Marcus on IA, 1 December 2022, https://garymarcus.substack.com/p/how-come-gpt-can-seem-so-brilliant (accessed 15 March 2023).
-
[50]
To grasp linguistic contexts, the distributed representation model is learned based on word usage. This model allows words that are used in similar ways to result in similar representations, thereby providing a more effective proxy for their natural meaning. In contrast, word representation in a bag of words model, unless explicitly managed, correlates different words with different representations (regardless of use). The linguistic theory informing this approach—called the “distributional hypothesis”—was elaborated by Zellig Harris in the 1950s. In short, words appearing in similar contexts will overlap in meaning. For more on the subject, see Zellig Harris, “Distributional Structure” [1954], Word, vol. 10, no. 2–3, 4 December 2015, p. 146–152.
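The distributional hypothesis can be made concrete in a few lines. The sentences and the one-word context window below are invented for illustration (real systems learn dense vectors from massive corpora), but the principle is identical: words used in similar contexts receive similar representations.

```python
from collections import Counter
import math

# Invented mini-corpus.
sentences = [
    "the cat drinks milk",
    "the dog drinks milk",
    "the car needs fuel",
]

# Represent each word by the words appearing next to it (window of 1).
vectors = {}
for s in sentences:
    toks = s.split()
    for i, w in enumerate(toks):
        ctx = vectors.setdefault(w, Counter())
        for j in (i - 1, i + 1):
            if 0 <= j < len(toks):
                ctx[toks[j]] += 1

def cosine(u, v):
    """Similarity of two context-count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

# "cat" and "dog" share contexts ("the ... drinks"), "cat" and "car" do not.
print(cosine(vectors["cat"], vectors["dog"]))  # 1.0
print(cosine(vectors["cat"], vectors["car"]))  # 0.5
```

The similarity score is a fact about usage, not about cats or cars—which is why distributional overlap can stand in for meaning without ever touching reference.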
-
[51]
On lossy compression, see Ted Chiang, “ChatGPT is a Blurry Jpeg of the Web,” New Yorker, 9 February 2023, https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web (accessed 15 March 2023).
-
[52]
Chiang uses a disarmingly simple example drawn from arithmetic, noting that the accuracy of ChatGPT decreases considerably when more digits are added to a simple addition or subtraction prompt (ibid.). This is because, unlike a calculator, GPT output does not derive from the principles of mathematics as much as from statistical correlations within pre-existing bodies of text.
-
[53]
Plato, Phaedrus, Stephen Scully (trans.), Hackett, 2003, p. 48, 262b.
-
[54]
This kind of substitution is always possible in computer applications. On algorithmic indeterminacy, see Derek Curry, “Artistic Defamiliarization in the Age of Algorithmic Prediction,” Leonardo, 2023, vol. 56, no. 2, p. 177–182.
-
[55]
Bender and Gebru, 2021, p. 615.