Against “joint action”: all actions are actions in a populated environment

I gave a talk last week at the 6th Joint Action Meeting in Budapest. The program for the conference is here; my abstract appears on page 29. The conference is a biennial affair. It was initially set up ten years ago by Günther Knoblich and Natalie Sebanz as a forum for collegial disagreement about what joint action is. It still retains that atmosphere, I think. I tried to construct my talk in the spirit of open enquiry that Günther and Natalie seem to invite. I like this conference because I believe such open enquiry is necessary for working through the conceptual issues involved here. I used my talk to argue against “joint action” as a piece of terminology, on the grounds that the term leads to an overly restrictive interpretation of what’s going on in the phenomena studied by joint action researchers. The argument is based on James Gibson’s distinction between the environment and the physical world. I will use this post to make a record of the argumentative portion of the talk. (The talk also had an illustrative portion in which I talked about possession in soccer, as blogged here.)

What is a “joint action” anyway?

[Images: ants on an ant mound (left); two people carrying a sofa together (right)]
What do these two images have in common? (sources: 1, 2)

I start with a question: what is it that is supposed to unite all of the different phenomena we talk about when we talk of joint action? Consider the two images above: on the left, a group of ants in the process of building a mound; on the right, two people carrying a sofa together (this is one of the go-to examples that philosophers reach for when discussing joint action).

In fact, it is very difficult to identify anything that unites these phenomena. Philosophers often appeal to the concept of common knowledge (e.g., from the Stanford Encyclopedia of Philosophy: “In order to communicate or otherwise coordinate their behavior successfully, individuals typically require mutual or common understandings or background knowledge”). But few would be willing to attribute common knowledge to the ants. Psychologists like to appeal to hypothetical theory of mind capacities within individuals, but again few would be willing to entertain the idea that the individual ants are engaging in complex mental modelling of the inner lives of their fellows.

One escape route here would be to simply insist that only humans can do proper joint action. (This is the option chosen by Michael Tomasello, for instance. Tomasello asserts that chimpanzees hunting in groups are not really doing joint action: “The apes are engaged in a group activity in ‘I’ mode, not in ‘we’ mode.”) This does not really solve the problem, though. We now just have a new problem: how do we distinguish proper joint action from merely apparent or quasi-joint action? And if non-linguistic animals cannot do joint action, what about human infants? In short, we’ve merely replaced the question of defining joint action with a different question: what exactly is it that makes humans so special?

So it is difficult to identify just what it is that allows us to talk about a bunch of disparate phenomena while labelling them all as instances of “joint action”. Except there is one thing that unites the ants and the sofa-carriers. Consider the situation from the perspective of one of the individual actors. Relative to that individual’s perceptual system, the environment is populated. And it is here that it is necessary to invoke James Gibson’s distinction between the environment and the physical world.

The environment v. the physical world

At the very start of his final book, Gibson introduces his concept of the environment by contrasting it with the standard view of the physical world. The terms “environment” and “physical world”, he claims, should not be treated as synonyms. Understanding this distinction is, I think, key to understanding the radical difference between Gibson’s psychology and the standard representational theory of mind. (I hadn’t really realised how important this was until I was preparing this talk. But the distinction obviously is important to Gibson: he makes it on page 2 of chapter 1, a chapter whose title is “The animal and the environment”.)

The environment, according to Gibson, is that which surrounds an individual animal (1979, p. 8):

[I]t is often neglected that the words animal and environment make an inseparable pair. Each term implies the other. No animal could exist without an environment surrounding it. Equally, although not so obvious, an environment implies an animal (or at least an organism) to be surrounded. This means that the surface of the earth, millions of years ago before life developed on it, was not an environment, properly speaking. The earth was a physical reality, a part of the universe, and the subject matter of geology. […] We might agree to call it a world, but it was not an environment.

Gibson is here insisting that animals, ourselves included, do not perceive the bare physical world, the world of physics. What we perceive is our surroundings: the structured collection of surfaces and objects that are meaningful by virtue of our capacity to act relative to them. The environment ultimately only exists at the approximate scale of our bodies (microscopes and mass spectrometers notwithstanding). When ecological psychologists talk about the “animal–environment system”, this is not some vague device for pointing out that our bodies are situated in space and that we perceive things in that space (a fact which no cognitivist would deny). It is, rather, an assertion that we never in fact perceive pure “space”. What we perceive is our surroundings; we perceive our own unique environment relative to our ability to act in it.

To bring this back to the topic at hand: one feature of every real animal’s environment is that it is populated with other animals. The ants and the sofa-carriers all encounter an environment that is not merely structured by the presence of surfaces and objects. It is also structured by the presence of other actors. A populated environment does not merely react to the exploratory activity of the animal (as with Newtonian conservation of momentum). It is an environment that acts back, one in which the actions that the animal engages in can be perturbed, amplified, reversed, and observed by other animals.

“Joint action” mixes relational categories

When we talk about “joint action” we are in effect mixing two different relational categories.

  • “Jointness” describes an interpersonal relation; it denotes the fact that a group of actors can appear to be doing more than merely acting simultaneously: they can appear to be acting together
  • “Action”, on the other hand, is usually descriptive of a relation between an individual actor and some aspect of its environment; when we act, we bring about change in our environment

Perhaps part of the problem is that it is not immediately obvious that action is about animal–environment relations. Action is more often discussed in terms of the presence or absence of some mental entity or volition, as in Donald Davidson’s claim that an action must be “intentional under some description”. But on Gibson’s definition of the environment, an action must be about animal–environment relations. Even an action as simple as shaking one’s head involves the environment. In order to coordinate that particular action in the first place, or to perceive whether the action has been carried out successfully, one must create the perception of optic flow while keeping static the invariant structure that specifies the layout of the environment (when you shake your head, your visual field moves but the environment doesn’t).

To talk of “joint action” is to suggest a dichotomy between “individual action” and “joint action”. The term invites an explanation of the behaviour of interest primarily in terms of interpersonal relations. Other kinds of actor–environment relations are left out of the analysis, or pushed to the background, taken for granted, described as mere “low-level” action.

The standard disagreement in the field is over exactly what kinds of interpersonal relations must be involved for joint action to be possible. One view says that in order for a group of actors to be able to act together successfully they must in some sense subordinate their own activity to that of the group; they must collectively constitute a “plural subject” of the action, in the phrase of the philosopher Margaret Gilbert. The second view prefers to maintain the boundaries around individual actors. On this view, a joint action requires each actor to engage in mindreading activity of some variety, whether it be through the construction of mental models of the other actors’ intentions or through unconscious processes of interpersonal synchronization. Only when each actor shares the appropriate mental content with the other actors can the joint action be undertaken.

Two standard approaches to joint action: the plural subject model (l) v. the mindreading/interpersonal synchronization model (r); the black arrows represent action.

What these views share is a lack of attention to just what the environment looks like from the perspective of an individual actor. The action itself is treated as mere output from a system. The puzzle is taken to consist in identifying how the jointness of this system arises. It is implicitly assumed that once we have an explanation of this jointness we will also have an explanation both of how the group behaves and of how the group’s individual members behave.

Acting in a populated environment: it’s not all about interpersonal relations

It is worth pausing over this phrase, “low-level action”. What could it mean? Why should actions directed at inanimate objects be considered “low-level” while actions involving other animals are considered high-level or social? Why are we tempted to make this distinction?

This distinction has arisen, I suspect, as an accidental by-product of the way that action has traditionally been studied. Psychologists have tried to study behaviour by isolating individual actors from their natural environments and placing them in carefully constructed artificial environments in which the sensory inputs and outputs can be tightly controlled and measured. It is tempting to believe that by getting a subject to respond to dots on a screen we are ignoring everything that is not action and perception at the most basic level.

But real environments are not like the environments created in the laboratory. An ant’s environment always contains other ants. That is to say, any action that the ant might perform is one that potentially perturbs the environment of another ant. It is this fact that allows the ants as a collective to construct great structures that could not possibly be conceived by any of the ants individually. Each ant merely acts in, and relative to, its own environment, and in doing so it alters the structure of the environment of its fellows.

“Low-level action”, then, is an analyst’s construct. In reality, all action is social because all action has the potential to perturb the environment of another animal. The phrase I propose, “acting in a populated environment”,[1] collapses the dichotomy between joint action and individual action. There is simply no need to establish a joint goal or to construct, ahead of time, a joint actor that is constituted by a set of interpersonal relations. It is sufficient for an individual to act on the available structure; this act can then be amplified or responded to by another individual whose environment has been perturbed.

The pay-off of this move, from talking about “joint action” to talking about “acting in a populated environment”, is that it allows us to avoid asking spurious questions about how groups come to perform tasks together. Existing accounts of joint action attempt to explain such behaviour primarily in terms of interpersonal relations, asking the question: “How do individual actors in some situation combine to make a joint actor or team?” On the alternative view that I have outlined, answering this question is no longer seen as a prerequisite to understanding the behaviour. Instead, we can ask a more tractable kind of question, focusing on some specific task and asking: “How do we identify precisely which structures in the actor’s environment are implicated in the control of the actor’s behaviour?”


The term “joint action” invites us to explain collective behaviour in unnecessarily narrow terms, focusing primarily on interpersonal relations while pushing other kinds of actor–environment relations to the background. As an alternative, I prefer to talk of “acting in a populated environment”. This phrase can be thought of as a device for turning complex questions about collective behaviour into tractable questions about individual behaviour. It also happily allows us to avoid having to grapple with such fraught concepts as shared intentions or shared goals. And yet to talk of acting in a populated environment is not to deny that there is a great deal of complexity that underlies such behaviour. Instead, it forces us to acknowledge the rich structure that exists in populated environments and that pervades an animal’s surroundings; this structure does not merely exist in between actors but surrounds each of us. This richness is in fact an asset; it is what is used by actors to guide and control specific actions. The hope is that by examining this structure in detail it will be possible to explain, in specific cases, how individual actors can successfully act as part of a coordinated collective.


[1] The phrase “populated environment” I take from Gibson’s “Note on perceiving in a populated environment”, published posthumously as part of a set of notes on the concept of affordances (Gibson, 1982).


  • Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin, Boston.
  • Gibson, J. J. (1982). Notes on affordances. In Reed, E. and Jones, R., editors, Reasons for Realism: Selected Essays of James J. Gibson, pages 401–418. Lawrence Erlbaum, Hillsdale, New Jersey.

