Cognition through language
Written by Sil Hamilton in January 2022.

The last four years have seen a deep learning revolution take the field of natural language processing by storm. 2018 saw the emergence of deep learning architectures whose prime operating principle, self-attention, allows them to maintain a sense of coherence when interpreting and interacting with longer texts. Equipped with this ability to infer the semantics of an input string, Transformers have tested our understanding of what it means to comprehend a text. While some academics in the humanities and elsewhere have declined to describe large language models as capable of true understanding, I am here to argue that models like GPT, BERT, and Gopher constitute convincing models of cognition as they currently stand.
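
To make the mechanism concrete, here is a minimal sketch of single-head self-attention, the operation at the heart of the Transformer. The NumPy version below is illustrative only (the weight matrices, dimensions, and token count are made up); it shows how every token is scored against every other token, which is what lets these models keep long-range context in view.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal single-head self-attention: each token attends to every other."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity between all token pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                        # weighted mix of value vectors

# toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```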

Recent popular examples of Transformer-generated content have resulted in significant pushback from academics working in the humanities. Take this article as an example. The authors characterize large language models as mere “stochastic parrots” whose output is coherent only insofar as coherence “is in fact in the eye of the beholder.” As their logic goes, communication is by definition conducted between individuals whose intent to hold a discussion momentarily aligns their world-views, thus allowing semantic information to be exchanged between parties (616). Setting aside the tautological nature of their argument, their focus on communication between multiple agents betrays a misunderstanding of the varied roles language plays in the human experience: while language is certainly often used in the semiotic sense, it is simultaneously critical for deeper logical operations. Language is a wide-ranging tool of many functions. Ted Underwood has presented a convincing argument for understanding language models as manifestations of human culture, but there are other reasons why language models are of value both to the humanities and, more generally, to those seeking a viable model of human cognition through language-enabled methods.

Any paper accompanying a recently released language model will devote numerous pages to auditing the reasoning and comprehension skills of its new creation. Take the following papers on GPT-3 and Gopher as examples. Both examine their models through a similar set of tests: SuperGLUE, RACE, OpenBookQA, and so on. These benchmarks are all designed in a similar manner: a set of multiple-choice questions paired with candidate answers. They are generally presented to the model as-is, without fine-tuning. The model is directed to pick one of the possible answers after having reviewed the question, and this choice is taken as deliberate on the part of the model. It will be readily clear to anyone reviewing the test scores that models like Gopher have achieved human-level performance on reading comprehension tests intended for English-speaking high-school students. This is not achievable by guessing alone. Being able to compete with humans on reading comprehension tests can only mean these models are not “stochastic parrots” but rather cognizant (up to some as-yet-undefined limit) in their own right. This should not be lost on any scholar considering the impact of the present neural revolution.
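
As a rough illustration of how such a zero-shot multiple-choice evaluation can work, the sketch below scores each candidate answer by the log-likelihood a causal language model assigns to it given the question, then picks the highest-scoring option. This is one common scoring convention, not necessarily the exact protocol used in the GPT-3 or Gopher papers; the small "gpt2" checkpoint and the sample question are stand-ins.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: "gpt2" stands in for a large model such as GPT-3 or Gopher.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_logprob(question: str, answer: str) -> float:
    """Sum of log-probabilities the model assigns to the answer tokens,
    conditioned on the question -- no fine-tuning involved."""
    prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # position i in log_probs predicts token i + 1 of the sequence
    return sum(
        log_probs[i, full_ids[0, i + 1]].item()
        for i in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

question = "Q: Which organ pumps blood through the body? A:"
choices = ["the heart", "the liver", "the lungs"]
scores = {c: answer_logprob(question, c) for c in choices}
print(max(scores, key=scores.get))  # the model's "chosen" answer
```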

There is further evidence demonstrating the prowess with which large language models decode text. One interesting result from the 2020 Manning et al. paper on emergent linguistic knowledge in self-supervised models is that bidirectional models like BERT and its derivatives implicitly capture numerous syntactic aspects of text, including dependencies between clauses. Others have found the same model is capable of extracting causal relationships between entities, details invisible to any purely syntactic manner of reading a text. Higher-level semantics are also present. Probing tests have revealed that GPT and other unidirectional models encode significant details about the real world in so-called “knowledge neurons,” artificial neurons strongly correlated with recognizing facts. Those familiar with GPT-3 will know the model is sometimes accurate enough to work as a knowledge-retrieval engine out of the box, a detail not lost on OpenAI.
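
This factual recall is easy to observe with an off-the-shelf checkpoint. The snippet below uses the Hugging Face fill-mask pipeline with bert-base-uncased; the cloze prompt is an illustrative probe of my own, not one drawn from the papers cited above.

```python
from transformers import pipeline

# A cloze-style probe: the masked-language-modelling head is asked to fill in
# a factual slot it was never explicitly taught.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill("The capital of France is [MASK].", top_k=3):
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")
# Typically ranks "paris" first, illustrating factual knowledge stored in the
# weights rather than retrieved from an external database.
```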

Exactly how the above effects emerge from self-supervised learning remains a mystery, but there are a few clues. Those familiar with linguistics will know the field hosts two major families of theories: formal and functional linguistics. Formal linguists, inspired by figures like Noam Chomsky, write generative and combinatory categorial grammars based on observations of language speakers. Those working in functional linguistics and its derivatives (cognitive linguistics, etc.) take an arguably more practical view of what constitutes language: how words are used in real-world contexts. This usage-based perspective does not define a language according to some formulaic understanding of syntax and grammar, but rather leaves it undefined until proven in practice. A usage-based perspective assigns words meaning based on how they are used, not by their syntactic or semantic categorization within some larger Platonic structure. GPT and BERT learn purely by studying how language is used, meaning they begin with no inherent understanding of syntax or grammar. Studies on language acquisition have long shown this is how we learn, so it should come as no surprise that arrangements of Transformer blocks can bootstrap themselves into valid models of cognition.
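
The training signal behind this usage-based learning is simple to state in code: the model is asked only to predict the next token of raw text, with no grammatical annotation anywhere in the loop. The PyTorch sketch below is a toy version under that framing, with a made-up vocabulary, a random "corpus," and a single Transformer block standing in for a full model.

```python
import torch
import torch.nn as nn

# Toy next-token objective: the only supervision signal is the text itself.
# Vocabulary size, dimensions, and the random "corpus" are all illustrative.
vocab_size, d_model, seq_len = 1000, 64, 12
embed = nn.Embedding(vocab_size, d_model)
block = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
to_vocab = nn.Linear(d_model, vocab_size)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, seq_len))   # a "sentence" of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]       # predict each next token
causal = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)

hidden = block(embed(inputs), src_mask=causal)        # one Transformer block
logits = to_vocab(hidden)                             # (1, seq_len - 1, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()   # gradients come from usage alone; no grammar is ever annotated
print(float(loss))
```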