Abstract: Recent advances in natural language processing have given rise to a new kind of AI architecture: the language agent. By repeatedly calling an LLM to perform a variety of cognitive tasks, language agents are able to function autonomously to pursue goals specified in natural language and stored in a human-readable format. Because of their architecture, language agents exhibit behavior that is predictable according to the laws of folk psychology: they have desires and beliefs, and then make and update plans to pursue their desires given their beliefs. We argue that the rise of language agents significantly reduces the probability of an existential catastrophe due to loss of control over an AGI. This is because the probability of such an existential catastrophe is proportional to the difficulty of aligning AGI systems, and language agents significantly reduce that difficulty. In particular, language agents help to resolve three important issues related to aligning AIs: reward misspecification, goal misgeneralization, and uninterpretability.
''Pronouns and Gender'' In The Oxford Handbook of Applied Philosophy of Language (forthcoming). (With Michael Glanzberg.)
Abstract: This chapter introduces readers to the empirical questions at issue in debates over gendered pronouns and assesses the plausibility of various possible answers to these questions. It has two parts. The first is a general introduction to the linguistics and psychology of grammatical gender. The second focuses on the meanings of gendered pronouns in English. It begins with a discussion of some methodological limitations of empirical approaches to the topic and the normative implications of those limitations. It then argues against three simple theories of the semantics of gendered pronouns in English and proposes an alternative that fares better: the Gender-First View. Finally, it discusses the singular use of 'they' and its connection to nonbinary gender identities.
''Do Not Diagonalize'' In The Oxford Handbook of Contemporary Philosophy of Language (forthcoming).
Abstract: Speakers assert in order to communicate information. It is natural, therefore, to hold that the content of an assertion is whatever information it communicates to its audience. In cases involving uncertainty about the semantic values of context-sensitive lexical items, moreover, it is natural to hold that the information an assertion communicates to its audience is whatever information audience members are in a position to recover from it by assuming that the proposition it semantically determines is true. This sort of picture corresponds to an influential and widely endorsed theory of assertoric content: diagonalism. I begin by arguing that, despite its intuitive appeal, diagonalism should be rejected because it conflicts with our intuitive judgments about the circumstances in which the contents of speakers' assertions would be true or false. I then show that the failure of diagonalism requires us to either abandon a familiar way of thinking about information and rational assertion or hold that the content of an assertion is not always the information it communicates. I suggest that we choose the latter horn of this dilemma — assertoric content is better characterized in terms of the commitments speakers undertake than in terms of the information they communicate.
''Covert Mixed Quotation'' Semantics and Pragmatics (accepted pending revisions).
Abstract: The term covert mixed quotation describes cases in which linguistic material is interpreted in the manner of mixed quotation — that is, used in addition to being mentioned — despite the superficial absence of any commonly recognized conventional devices indicating quotation. After developing a novel theory of mixed quotation, I show that positing covert mixed quotation allows us to give simple and unified treatments of a number of puzzling semantic phenomena, including the projective behavior of conventional implicature items embedded in indirect speech reports and propositional attitude ascriptions, so-called 'c-monsters,' metalinguistic negation, metalinguistic negotiation, and 'in a sense' constructions.
''Artificial Intelligence: Arguments for Catastrophic Risk'' Philosophy Compass 19(2): e12964 (2024). (With Adam Bales and William D'Alessandro.)
[Official (Open Access)]
Abstract: Recent progress in artificial intelligence (AI) has drawn attention to the technology’s transformative potential, including what some see as its prospects for causing large-scale harm. We review two influential arguments purporting to show how AI could pose catastrophic risks. The first argument — the Problem of Power-Seeking — claims that, under certain assumptions, advanced AI systems are likely to engage in dangerous power-seeking behavior in pursuit of their goals. We review reasons for thinking that AI systems might seek power, that they might obtain it, that this could lead to catastrophe, and that we might build and deploy such systems anyway. The second argument claims that the development of human-level AI will unlock rapid further progress, culminating in AI systems far more capable than any human — this is the Singularity Hypothesis. Power-seeking behavior on the part of such systems might be particularly dangerous. We discuss a variety of objections to both arguments and conclude by assessing the state of the debate.
Abstract: Existing work on gaslighting ties it constitutively to facts about the intentions or prejudices of the gaslighter and/or his victim’s prior experience of epistemic injustice. I argue that the concept of gaslighting is more broadly applicable than has been appreciated: what is distinctive about gaslighting, on my account, is simply that a gaslighter confronts his victim with a certain kind of choice between rejecting his testimony and doubting her own basic epistemic competence in some domain. I thus hold that gaslighting is a purely epistemic phenomenon — not requiring any particular set of intentions or prejudices on the part of the gaslighter — and also that it can occur even in the absence of any prior experience of epistemic injustice. Appreciating the dilemmatic character of gaslighting allows us to understand its connection with a characteristic sort of epistemic harm, makes it easier to apply the concept of gaslighting in practice, and raises the possibility that we might discover its structure and the associated harm in surprising places.
Abstract: Contextology is the science of the dynamics of the conversational context. Contextology formulates laws governing how the shared information states of interlocutors evolve in response to assertion. More precisely, the contextologist attempts to construct a function which, when provided with just a conversation's pre-update context and the content of an assertion, delivers that conversation's post-update context. Most contextologists have assumed that the function governing the evolution of the context is simple: the post-update context is just the pre-update context intersected with the content of the assertion. We argue that this assumption is wrong: not only is it false, it is also incoherent given standard contextological assumptions. Moreover, it is impossible in principle to revise it to correctly describe the dynamics of context. We conclude that there can be no science of Contextology. The laws governing the evolution of the context in response to assertion must make essential reference to the private information states of interlocutors.
Abstract: Horizontalism is the thesis that what a speaker asserts in literally and sincerely uttering an indicative sentence is some horizontal proposition of her utterance; diagonalism is the thesis that what a speaker asserts in literally and sincerely uttering an indicative sentence is some diagonal proposition of her utterance. Recent work on assertion has reached no clear consensus favoring either horizontalism or diagonalism. I explore a novel strategy for adjudicating between the two views by considering the advantages and disadvantages which would accrue to a linguistic community as a result of adopting different committal practices – that is, practices of associating utterances with the propositions to which speakers undertake assertoric commitments in uttering them – ultimately concluding that a horizontalist practice has important advantages over its competitors.
''Slurs Are Directives'' Philosophers' Imprint 19(48): 1-28 (2019).
[Official (Open Access)]
Abstract: Recent work on the semantics and pragmatics of slurs has explored a variety of ways of explaining their potential to derogate, with the most popular family of approaches appealing to either: (i), the non-cognitive attitudes expressed by — or (ii), the propositions concerning such attitudes semantically or pragmatically communicated by — the speakers who use them. I begin by arguing that no such speaker-oriented approach can be correct. I then propose an alternative treatment of slurs, according to which they are semantically associated with both descriptive and directive content. On the view I defend, when speakers use slurs, they simultaneously propose to add an at-issue proposition to the conversational common ground and issue a not-at-issue directive to their interlocutors to adopt a derogatory perspective toward members of the targeted group. This proposal both avoids the problems faced by other accounts and opens up a novel way of thinking about the phenomenon of appropriation.
Abstract: Sceptical theists attempt to meet the challenge to theism posed by evidential arguments from evil by appealing to the limitations of human cognition. Drawing on an exchange between William Rowe and Michael Bergmann, I argue that consistent sceptical theists must be radically insensitive to certain kinds of evidence about prima facie evils – that is, that they must endorse the claim that not even evidence of extreme and pervasive suffering could justify disbelief in theism. I show that Bergmann’s attempt to respond to this problem does not succeed and argue that no alternative response is forthcoming, concluding that the threat of radical insensitivity constitutes a serious and underappreciated difficulty for sceptical theism.
Abstract: Can rational communication proceed when interlocutors are uncertain which contents utterances contribute to discourse? An influential negative answer to this question is embodied in the Stalnakerian principle of uniformity, which requires speakers to produce only utterances that express the same content in every possibility treated as live for the purposes of the conversation. The principle of uniformity enjoys considerable intuitive plausibility and, moreover, seems to follow from platitudes about assertion; nevertheless, it has recently proven controversial. In what follows, I defend the principle by developing two arguments for it based on premises reflecting the central aims and assumptions of possibility-carving frameworks for modeling inquiry — that is, frameworks which describe the evolution of individuals’ attitudinal states in terms of set-theoretic operations defined over a domain of objects representing possibilities.
Abstract: Greg Ray (2014) believes he has discovered a crucial oversight in Donald Davidson’s semantic programme, recognition of which paves the way for a novel approach to Davidsonian semantics. We disagree: Ray’s novel approach involves a tacit appeal to pre-existing semantic knowledge which vitiates its interest as a development of the Davidsonian programme.
AI Safety Special Issue of Philosophical Studies (forthcoming). (With Dan Hendrycks.)
Unstructured Content. Oxford University Press (forthcoming). (With Peter van Elswyk, Andy Egan, and Dirk Kindermann.)
Abstract: The original essays in this volume present new research on unstructured theories of content, which have traditionally played a central role in linguistics and philosophy of language. The volume explores a wide range of themes related to unstructured content, including both the continued controversy over whether unstructured theories individuate contents too coarsely and various applications of unstructured theories to topics like rationality, epistemic commitment, semantic expressivism, relevance, and propositional attitude ascriptions. It contains contributions from different theoretical perspectives, including both those sympathetic to unstructured theories of content and those who are skeptical, as well as from different methodological backgrounds, with philosophy, logic, and linguistics all represented. With contributions from leading scholars in philosophy and linguistics, this volume will be of interest to anyone working in logic, metaphysics, or the philosophy of mind.
Abstract: Donald Davidson was one of the most famous and influential philosophers of the twentieth century. The Structure of Truth presents his 1970 Locke Lectures in print for the first time. They comprise an invaluable historical document which illuminates how Davidson was thinking about the theory of meaning, the role of a truth theory therein, the ontological commitments of a truth theory, the notion of logical form, and so on, at a pivotal moment in the development of his thought. Unlike Davidson's previously published work, the lectures are written so as to be presented to an audience as a fully organized and coherent exposition of his program in the philosophy of language. Had they been widely available in the years following 1970, the reception of Davidson's work might have been very different. Given the systematic nature of their presentation of Davidson's semantic program, these lectures will be of interest to anyone working in the philosophy of language.
''AI Wellbeing'' (under review). (With Simon Goldstein.)
Abstract: Under what conditions would an artificially intelligent system have wellbeing? Despite its obvious bearing on the ethics of human interactions with artificial systems, this question has received little attention. Because all major theories of wellbeing hold that an individual’s welfare level is partially determined by their mental life, we begin by considering whether artificial systems have mental states. We show that a wide range of theories of mental states, when combined with leading theories of wellbeing, predict that certain existing artificial systems have wellbeing. While we do not claim to demonstrate conclusively that AI systems have wellbeing, we argue that our metaphysical and moral uncertainty about AI wellbeing requires us dramatically to reassess our relationship with the intelligent systems we create.
''The Polarity Problem'' (under review). (With Simon Goldstein.)
Abstract: If it is possible to construct artificial superintelligences, it is likely that they will be extremely powerful. A natural question is how many such superintelligences to expect. Will the future be shaped by a single superintelligence (a unipolar outcome), or will there be multiple superintelligences shaping the future through their cooperative or adversarial interactions (a multipolar outcome)? We refer to this question as the polarity problem. This paper investigates the polarity problem. First, we consider the question of safety, suggesting that multipolar outcomes are likely to be safer for humanity than unipolar outcomes because humans are less likely to be disempowered in multipolar scenarios. Then we develop a series of causal models of AI agents that make predictions about the relative likelihood of unipolar as opposed to multipolar outcomes. Central to our models are three parameters: the time it takes for an AI attacker to develop an attack against an AI defender, the time it takes for the AI defender to develop its defense, and how far the defender lags behind the attacker temporally. We use our models to identify possible interventions which could change the probability of a multipolar outcome, thereby leading to a safer trajectory for AI development.
Recent and Upcoming Presentations
Comments on "Bigoted Beliefs and the Safety Condition" by Gus Turyn. Eastern APA Colloquium, New York. [January 2024]
AI Safety: The View from Philosophy. ITAM AI Futures Fellowship, Mexico City. [January 2024]
Language Agents Reduce the Risk of Existential Catastrophe. AI Ethics and Safety Course, University of Oxford, Digital. [November 2023]
Might Language Agents Be Conscious? AI Agency and Wellbeing Workshop, University of Hong Kong. [November 2023]
Quotation for Dummies. Central APA Symposium, Denver. [February 2023]
Gender Identity and Volition. Social Identities and Cognition in the Desert Workshop, Palm Springs. [February 2023]
Gender First: A Theory of Binary Category Terms. Words Workshop, Digital. [February 2023]
Comments on "The Impact of Slurs" by Edward Schwartz. Eastern APA Symposium, Montreal. [January 2023]