Dr Tiago Sousa Garcia
Research Associate, School of English Literature, Language and Linguistics, Newcastle University
One which my father saw in a hexagon on circuit fifteen ninety-four was made up of the letters MCV, perversely repeated from the first line to the last. Another (very much consulted in this area) is a mere labyrinth of letters, but the next-to-last page says Oh time thy pyramids. This much is already known: for every sensible line of straightforward statement, there are leagues of senseless cacophonies, verbal jumbles and incoherences.
— Jorge Luis Borges, ‘The Library of Babel’ (1941)
Jorge Luis Borges’ famous 1941 short story, ‘The Library of Babel,’ describes an impossibly large library containing every possible 410-page book in the world. The library is divided into hexagonal rooms, each with twenty shelves, each shelf carrying thirty-five books. Roaming its hexagonal rooms are its denizens, some intent on making sense of the impossible library; others, senseless, driven mad by their searches. Because the Library contains every book ever written, and every possible book that could ever be written — Shakespeare’s complete works, the Bible, these very words I write — it also contains every possible meaningless combination of characters. In fact, the vast majority of the library’s contents are gibberish. Though the Library contains the totality of human knowledge and all the secrets of the universe, these are impossible to find, forever lost among mountains of nonsense.
Borges’ total library would be impossible to replicate in the physical world, but not in the digital realm. That’s exactly what Jonathan Basile did: he created a digital version of the Library of Babel online, part plaything, part digital art project. Basile’s Babel realises Borges’ idea (with a few extra constraints) in the digital space, containing everything from Hamlet’s famous soliloquy to the very words of this article. Like Borges’, Basile’s library is composed mostly of gibberish, any and all random combinations of the available characters. Unlike Borges’ library, Basile’s introduces a very useful ‘search’ function that allows the user to find anything they wish amidst the mountains of possible information. With a simple digital function Basile appears to solve the Library’s most fiendish torture: the impossibility of finding what one is looking for.
Except it doesn’t. Consider the paradox: though users are able to search and find virtually anything in the digital Library, they cannot search for a simple phrase like ‘this line is on page x, of the book titled y, which is in hexagon z’ and have the values of page number, book title and hexagon correspond to reality. That book, in which page number, book title and hexagon are all correct, is sure to exist, but it is simply impossible to find. The paradoxical nature of the ‘search’ function in Basile’s digital Babel is not unlike the thousands of catalogues said to exist in Borges’ short story. In the Library there is one true catalogue of all its contents, but there are a near-infinite number of false catalogues that lead the reader astray. Anything can be found in the digital Babel, but only if you already know what to search for.
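One way to see why ‘searching’ such a library is trivial yet unsatisfying is that every possible text can simply be its own address. The sketch below is purely illustrative — it is not Basile’s actual algorithm, which uses an invertible pseudo-random function — but it captures the idea: a page over a 29-character alphabet (the letters, space, comma and full stop) can be read as a number in base 29, and that number recovered back into the page.

```python
# Illustrative sketch: every text over a 29-character alphabet corresponds
# to exactly one integer "address", and every address back to one text.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."  # 29 characters
BASE = len(ALPHABET)

def text_to_address(text: str) -> int:
    """Interpret the text as a number written in base 29."""
    address = 0
    for ch in text:
        address = address * BASE + ALPHABET.index(ch)
    return address

def address_to_text(address: int, length: int) -> str:
    """Invert the mapping: recover a text of the given length."""
    chars = []
    for _ in range(length):
        address, digit = divmod(address, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

phrase = "oh time thy pyramids"
addr = text_to_address(phrase)
assert address_to_text(addr, len(phrase)) == phrase
```

Under such a scheme, ‘finding’ a phrase means nothing more than typing it out: the search can only ever return what you already brought to it.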
It should be clear by now that I am establishing a parallel between Borges’ impossible library (and its reified digital counterpart) and our current data-driven information age. This parallel is not particularly new or insightful, but my point is not about the many comparable features of Borges’ story and the internet. For better or worse, Borges’ vision of a repository of all knowledge rendered unusable by the sheer amount of nonsense is more than ever the reality in which we live today: think about how hard it is to distinguish between the fake (purposefully introduced, or simply propagated misinformation) and the real news; or how Wikipedia pages about nonexistent historical facts appear, live for over ten years, and then become themselves part of the history of that platform.
Rather, the point that I want to make is about the paradoxical nature of the ‘search’ function in Basile’s digital Babel. A traditional search function — finding the exact thing you are looking for — is a crucial and necessary tool, but it fails in one fundamental aspect: the random encounter with the unknown. If Borges’ librarians were to rely on a similar ‘search’ function, they could never have encountered that book that was mostly nonsensical but included the phrase ‘Oh time thy pyramids’ on its next-to-last page, perhaps the most hauntingly beautiful phrase in literature.
In other words, the crucial problem of our age is not access to information, but finding information we don’t already have (or suspect exists). We need the digital equivalent of finding an interesting book next to the book we wanted. A simple ‘search’ function returns exactly what we look for, and nothing else; even its more advanced cousins, such as ‘fuzzy search’ (which finds approximate as well as exact matches), mostly retrieve the information we believe we want, not necessarily the information that we need.
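The limits of fuzzy search can be seen in a tiny example using Python’s standard library (`difflib.get_close_matches`, which ranks strings by approximate similarity). The toy ‘shelf’ of titles below is invented for illustration:

```python
from difflib import get_close_matches

# A toy 'shelf': a handful of titles a searcher might stumble upon.
shelf = [
    "oh time thy pyramids",
    "on time and pyramids",
    "the library of babel",
    "axaxaxas mlo",
]

# An exact search misses near-matches entirely.
print([t for t in shelf if t == "oh time the pyramids"])  # []

# A fuzzy search tolerates small differences in the query...
print(get_close_matches("oh time the pyramids", shelf))
# ...but it can only ever surface variations on what we asked for:
# a title like 'axaxaxas mlo' will never be returned this way.
```

Fuzzy matching widens the net around the query, but the net is still cast where we already chose to look; the genuinely unexpected title stays out of reach.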
The problem with this traditional approach to search (or research) is that it relies on the user’s existing web of knowledge: filling it with detail and enlarging it only very slowly and incrementally. In this regard, it is no different from the traditional humanistic approach to research, which relies heavily on existing encyclopedic knowledge: every new piece of information acquired is compared against every other piece of information already known to establish connections, relationships, causations, parallels; the new piece of information is then added to the existing knowledge bank and used to analyse any subsequent new piece of information. It is a good, sure-footed model for knowledge expansion, but it is also slow and predictable. However, Borges’ library — and our real, data-rich world — is simply too large to be navigated in this way; even if we could speed up the search, as we have done with digital technology, there is only so much information that humans can hold.
Humans cannot read and hold all the information contained in the Library of Babel; but computers can. The challenge is to teach computers to read like humans — making connections, drawing conclusions, establishing relationships — rather than simply memorise and repeat information. I am thinking specifically about the developments in Artificial Intelligence (AI), and particularly in Machine Learning (ML). ML is already part of our lives: it’s how online retailers can offer a discount on the book you wanted to read but have not searched for (instead, you spent two hours reading about the author, their biography and their works); or how you can ask Google to translate anything into virtually any language and be sure to have a readable (if not stellar) translation. In both cases ML is working in the background: examining all the data available, finding patterns, drawing conclusions.
You might be able to see why, as a humanist, I am intrigued by the development of ML: I can’t read the Library of Babel, but ML can, not simply by storing and retrieving information, but by establishing connections and relationships that I could never dream of. It has the potential to change the face of literary history as we know it, destabilise the canon, rewrite the narrative of literary exchange across time and space. There is an argument to be made that we, as humanists, should be more open to ML and other automated methodologies that give us the opportunity to escape ourselves, our limitations, our biases.
However, you might also see why, as a humanist, I am cautious about our growing reliance on ML algorithms and methodologies, not only in research, but in their use in the world at large. Reservations about ML and other AI go deeper than simple technophobia, and the humanistic perspective on the creation, development, and use of these technologies is more important than ever. In fact, despite their near-magical capabilities, most ML methods rely heavily on hidden human labour, which carries with it all the biases, limitations, and errors that all human labour is inevitably prone to — except that they hide it behind a veneer of automation and certainty. Rather than correct our misunderstandings of the world, ML can dangerously and irreversibly confirm them, as seen countless times in its application to facial recognition and other real-world systems. The big problem with ML is that, like humans, it learns from the past but, unlike humans, it cannot be critical of it.
Compounding the error produced by the human-collected historical training data is the fact that ML methods force a binary decision: this object is either a banana or it isn’t a banana (with more or less probabilistic certainty), but the machine cannot consider whether the object could be a plantain. Forcing a binary view of the world means abandoning a nuanced understanding that things — people, ideas, words — can be many at once, none at all, or something completely different. Over-reliance on ML for decision-making leads us to bad decisions that are, inherently, human.
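The forced choice can be made concrete with a caricature of a classifier’s final step. The scores below are entirely made up, not the output of a real model, but the point holds for any probabilistic classifier: however uncertain its scores, an argmax must commit to one of the labels it was trained with.

```python
# A caricature of a binary classifier's final decision step.
# The scores are invented for illustration, not produced by a real model.

def classify(scores: dict[str, float]) -> str:
    """Pick the label with the highest score -- no option to abstain."""
    return max(scores, key=scores.get)

# A clear banana: the forced choice looks reasonable.
print(classify({"banana": 0.97, "not banana": 0.03}))  # banana

# A plantain: the model is nearly undecided, but it must still answer
# 'banana' or 'not banana' -- 'plantain' was never one of its labels.
print(classify({"banana": 0.51, "not banana": 0.49}))  # banana
```

Nothing in the decision rule allows the machine to say ‘neither’, or ‘something else entirely’: the nuance has to be designed in, and by default it is designed out.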
This is where I and others see the humanities and humanists playing a part in our current data-driven world. Humanists are accustomed to working with uncertainty, doubt and nuance in ways that escape the deterministic models that currently guide the development of ML and AI. Humanists and the humanities need to engage with, shape, and define the next generation of ML algorithms, rather than simply sit back and claim the end of the world is nigh. Perhaps a first step in that direction is to include ML and AI in our current humanities research. There are already many interesting projects doing so, and more are sure to follow.
From my perspective, involving the humanities in the development and use of ML has the potential to be beneficial to computer and data science, to humanities research, and to wider society. On the one hand, humanities datasets are usually large, irregular, complex, and human-readable (rather than machine-readable), which creates an interesting challenge for ML at every turn: (relatively) simple tasks that humans perform with little difficulty — such as understanding handwritten documents — can be tricky for machines to replicate; ‘messy’ data that might be useless with today’s ML algorithms can help the development of newer, more refined solutions; and crucially, because we as humanists are rarely interested in clear-cut perspectives, we can help develop algorithms that reflect a more nuanced view of the world.
On the other hand, precisely because humanists often work with very large, very unruly datasets (be it entire libraries of literature, or centuries of handwritten records), the picture we are able to put together is rarely a complete one. As humanists, we are merely the librarians of Babel, roaming its huge expanse, extracting what meaning we can from an infinitesimally small proportion of human knowledge. ML and other AI technologies have the potential to reveal to us that big picture, to read the Library of Babel in its entirety. The potential for ML and AI in humanities research is immense, from the automation of repetitive tasks, to the analysis of large amounts of data, to simply opening up new, unthought-of avenues of humanistic research.
ML and AI more broadly are far from perfect, and nothing like the solution to all the world’s problems that some tech-evangelists would have us believe; but they are here and are not going anywhere. As humanists, we have a decision to make: we either get lost by ourselves in the Library of Babel, or we work together to make sense of it.