Minds do not extend, but activities do

I was intrigued to see that Jeff Wagman and Tony Chemero have a chapter out declaring “the end of the debate over extended cognition” (2014). “Oh, good,” I thought, “finally some sensible souls have stepped in to put an end to this nonsense.” But, alas, this chapter will not end the debate. It is merely the latest in a line of arguments declaring victory for one side.

The trouble is, the extended cognition debate is not one that can be ended by one side proving the other side wrong. The disagreement is not over empirical facts, but over a priori assumptions about what words like “mind” mean, or ought to mean. The problems here can be traced to the very first sentence in Clark and Chalmers’s (1998) paper introducing the concept of the extended mind, where they pose the question: “Where does the mind stop and the rest of the world begin?” This question commits a category mistake (one that Gilbert Ryle would have diagnosed): it assumes that minds are a kind of thing in the world that can be detected, when in fact “minds” are nothing more than labels we use to describe one another’s behaviour. The question is incoherent. To end the debate, we need to give up on this misguided search for the physical substrate of “minds”, and, instead, start to understand the world in terms of activities. I will here set out the beginnings of a case for this.

Otto’s notebook

The standard argument for the extended mind thesis is Clark and Chalmers’s thought experiment, the “Otto’s notebook” argument. Otto, who suffers from Alzheimer’s, has to write down the address for the Museum of Modern Art in his notebook and consult this note every time he wants to go there. Meanwhile, Inga, who does not suffer from Alzheimer’s, does not require this external memory aid but recalls the address using her own “biological memory”. Since Otto’s notebook is, we are assured, playing the same role as Inga’s brain, we have no reason to reject the conclusion that Otto’s notebook is literally part of his mind.

There are some severe issues with this thought experiment. Here is one. We are told that Otto has Alzheimer’s, which means he has to write things down in his notebook. Yet apparently he has no problem remembering to carry the notebook around with him at all times, or remembering how to consult it, or remembering what piece of information he is looking for, or remembering what the task was that originally caused him to consult his notebook. We are to assume, I suppose, that, having once consulted his notebook and seen that MoMA is on 53rd St, Otto will happily set out from his starting point (where? we are not told, although it is stated that Inga is within walking distance), not once forgetting his purpose along the way. Otto, in short, is a perfectly functional, healthy individual, bar an unfortunate inability to remember certain facts. The form of Alzheimer’s at play here is, to say the least, unusual. This is hardly a realistic description of any actual condition.

Of course, it’s easy to be pedantic about thought experiments—to take issue with their simplifying assumptions. In this case, though, the objection is more than merely pedantic. The objection is that this particular thought experiment leaves out certain details that matter. The whole argument relies on Inga and Otto being identical except in one crucial respect: one keeps her memories on the inside (in her brain), the other keeps his memories on the outside (in his notebook). What is left out is any account of the processes by which Otto and Inga supposedly achieve their identical goals. It is merely asserted that both have “access” to their respective store of memory, and the fact that “accessing” a notebook can be described using the same verb as “accessing” one’s own memory is supposed to ensure that these are, in fact, one and the same phenomenon, and that Otto’s notebook is, therefore, part of his mind. Colour me underwhelmed.[1] The point here is: once we start to understand the processes properly, it simply won’t matter which components of the process are part of the “mind” and which components aren’t. Such labels will not be doing any explanatory work.

Jimmie’s Mass

We can contrast the vignette about Otto’s notebook with an account by Oliver Sacks of an actual neurological patient, Jimmie G., whom Sacks dubs “the lost mariner” (chapter 2 in The Man Who Mistook His Wife for a Hat; the chapter was earlier published in the New York Review of Books). Jimmie G. does not have Alzheimer’s. Sacks diagnoses him as having Korsakov’s syndrome, a memory condition caused by alcohol abuse. Jimmie has lost the ability to form new memories, and believes he is still living in 1945 following the end of the Second World War, although Sacks first meets him in the mid-1970s. Jimmie is unable to remember what happened moments before. He is trapped in a perpetual present, and is apparently, for the most part, not even aware of his condition. The turning point in Sacks’s account comes when Sacks asks whether Jimmie has, in some profound sense, lost his soul:

One tended to speak of him, instinctively, as a spiritual casualty—a “lost soul”: was it possible that he had really been “de-souled” by a disease? “Do you think he has a soul?” I once asked the Sisters. They were outraged by my question, but could see why I asked it. “Watch Jimmie in chapel,” they said, “and judge for yourself.”

I did, and I was moved, profoundly moved and impressed, because I saw here an intensity and steadiness of attention and concentration that I had never seen before in him or conceived him capable of. I watched him pray, I watched him at Mass, I watched him kneel and take the Sacrament on his tongue, and could not doubt the fullness and totality of Communion, the perfect alignment of his spirit with the spirit of the Mass. Fully, intensely, quietly, in the quietude of absolute concentration and attention, he entered and partook of the Holy Communion. He was wholly held, absorbed, by a feeling. There was no forgetting, no Korsakov’s then, nor did it seem possible or imaginable that there should be; for he was no longer at the mercy of a faulty and fallible mechanism—that of meaningless sequences and memory traces—but was absorbed in an act, an act of his whole being, which carried feeling and meaning in an organic continuity and unity, a continuity and unity so seamless it could not permit any break.

Sacks hints at an explanation of why it is that Jimmie appears to find his bearing in the taking of the Communion: ‘The same depth of absorption and attention was to be seen in relation to music and art: he had no difficulty, I noticed, “following” music or simple dramas, for every moment in music and art refers to, contains, other moments.’

What Sacks seems to suggest here is that persisting, temporally extended structures in the environment can support and give form to behaviour. The Mass has a standard structure, a structure that was present earlier in Jimmie’s life. And it was in the context of this structure that Jimmie’s behaviour came to be shaped. Jimmie learnt how to conduct himself appropriately in the context of the Mass, and this Mass-specific behaviour remains intact. Even if Jimmie cannot remember how he got into the chapel five minutes ago, he can see that he’s in a Mass, and knows what to do at any moment. But as soon as he leaves the chapel and steps outside he will find himself in a context that does not contain such a formal structure. He will find himself in a novel situation for which his behavioural repertoire is no longer appropriately shaped by past learning. He will, once again, be lost.

It would make little sense to talk about what’s going on here in terms of the extended mind (although this is precisely the kind of situation, one would think, that an extended mind concept should be applicable to). Is the Mass part of Jimmie’s extended cognitive system? No, the Mass predates Jimmie’s existence, and has had the same formal structure throughout Jimmie’s life. It is this very unchanging nature of the Mass which means that Jimmie’s behaviour can remain intact relative to it. To talk of an extended mind is to imply that activities necessarily emanate from within an individual actor. The reality is that activities constitute a setting into which animals find themselves deposited: we encounter ongoing activities from birth or even before. And it is only through exploratory action and engagement that we learn to take part in these activities, to take control of them, and to alter their course. (For a partial discussion of this focused on human emotional development, see Greenwood, 2013.)

Extended activities

Wagman and Chemero do not talk about Masses or notebooks. The activities they discuss are those involved in a particular type of perception research: work on dynamic touch. An important motivating insight for this work comes from Merleau-Ponty: a blind man using a cane does not perceive the cane itself but perceives the world at the end of the cane. A dynamic touch study might go like this: a subject comes into the lab, is blindfolded and laden down with a backpack and is handed a stick; the subject is then asked to use the stick to poke about at a sloped surface and is asked to give a verbal judgement about whether they will be able to stand on that surface. All good fun, and it turns out that we’re quite good at doing this sort of task. Wagman and Chemero want to use this kind of result to claim that minds do indeed extend. The results, they say, “suggest that both the weighted backpack and the handheld wooden dowel are experienced as part of the body.” But, of course, this is of no consequence to cognitivist opponents of the extended mind. It claims only that sticks can be experienced as parts of bodies. But one of the things that is at issue is whether it is reasonable to talk of minds as extending into bodies in the first place. Wagman and Chemero simply assume that it is reasonable, and want their opponents to agree with their assumption.

What these kinds of results do show is that the act of perceiving is one that necessarily spans animal and environment. It is activities that are distributed, not minds. And activities are the kinds of things that we can investigate empirically. We can take away or disrupt some component of the overall system and observe whether or not this affects the outcome of the activity. Indeed, if a given disruption has an effect on the behaviour, this is a good reason to deduce that the thing that was disturbed was an integral part of the activity, and instrumental in the individual’s control of their behaviour. But it is simply not helpful to insist that whatever is integral to an activity is therefore part of an “extended mind” or “extended cognitive system”.

The extended mind metaphor does hit on an essential truth: classical cognitivism’s view of “mind” is inadequate because it simply assumes that all behaviour (or all interesting behaviour) is the output of symbol-manipulation processes. But the fact that cognitivism is too narrow is no reason to replace one problematic use of “mind” with another that only creates new problems. The extended mind metaphor has outlived its usefulness.


  1. Beware philosophers bearing thought experiments. Especially when those thought experiments rely on tenuous claims about real-life neurological conditions. Ask this: Does my philosopher have the appropriate clinical experience to be making pronouncements on these matters?
  2. If we really want to end the debate over extended cognition, we should simply stop arguing about whether “minds” or “cognition” might “extend” into objects. This is a language game that can have no winners because the participants refuse to agree on the rules (see also Sabrina’s old post on this). Instead, we should agree that behaviour—acting in the world, engaging socially with others, and all the rest—involves activities that extend across resources both internal to and external to organisms. The task for psychologists, then, is to identify the structures that are involved in putting these activities together.


[1] For further entertaining discussion of Otto’s notebook, I recommend Jerry Fodor’s (2009) review of Andy Clark’s book.


  • Clark, A. & Chalmers, D. (1998) The extended mind. Analysis 58(1):7–19.
  • Fodor, J. (2009) Where is my mind? London Review of Books 31(3):13–15.
  • Greenwood, J. (2013) Contingent transcranialism and deep functional cognitive integration: The case of human emotional ontogenesis. Philosophical Psychology 26(3):420–436. doi: 10.1080/09515089.2011.633752
  • Sacks, O. (1985) The Man Who Mistook His Wife for a Hat. London: Duckworth.
  • Wagman, J. B. & Chemero, A. (2014) The end of the debate over extended cognition, in Solymosi, T. & Shook, J. R. (Eds.) Neuroscience, Neurophilosophy and Pragmatism: Brains at Work with the World. New York: Palgrave Macmillan. doi: 10.1057/9781137376077.0012

Speaking is a type of acting, but being spoken to is not a type of perceiving

I have a paper in the newly published issue of the journal Ecological Psychology. The paper sets out the case that language-use should be understood as fundamentally a form of acting. I contrast this with a view of language-use as fundamentally a form of perceiving information via conventional patterns.

The argument may come across in the paper less clearly than I had hoped. The paper is part of a special issue on the topic of “Language as a public activity”, which is introduced by the editors, Bert Hodges and Carol Fowler (Hodges and Fowler, 2015). The editors valiantly attempt to provide one-paragraph summaries of all the papers in the issue. And they do an excellent job of summarizing mine (they’ve even highlighted a couple of things about it which I wasn’t aware of, for which I’m grateful). But there are a couple of statements in their summary which I would not endorse, which tells me that I wasn’t clear enough about those claims. So in this post I will try to do a couple of things: (a) summarize the argument I think I tried to make in this paper, and (b) set things out in a more intuitive way.

To understand speaking we need to understand how populated environments are structured

I tried to argue against the view that language-use necessarily involves conventional patterns. Conventions are to be understood here as mappings between invariant patterns (e.g., the word “dandelion”) and events (e.g., the perceiving of a dandelion). I don’t think we need to appeal to any such notion of conventions to explain how speaking works.

The key is to adopt a developmental perspective on language-use.

Children, when they are learning to speak, do so in an environment that contains other actors. And although none of these actors will be constantly present at every moment throughout the child’s growing up, there will nevertheless be a handful of actors who form part of the furniture of the child’s surroundings.

Now, the actors in a child’s environment themselves create structure through their own activities. For instance, sometimes an adult will point to some object and say something like, “Ooh, look at that giraffe!” One of the advantages of being a child (as opposed to being, say, a dictaphone) is that you don’t have to wait for other actors to produce these structures: if you want to explore the structure of your surroundings, you can perform an action that disturbs things and that thus creates potentially meaningful new patterns.

This is something that children do all the time. It happens even in situations where the child’s behaviour seems at first glance to be little more than a series of responses to an adult’s prompting, e.g. in ritualized games of the type where an adult repeatedly asks the child “Where’s your nose?”, “Where’s your mouth?”, etc. See this video of a child interacting with his grandmother.

When the child cannot immediately produce the appropriate response to the prompt “Where’s your mouth?” (a little after 1:30), he does not disengage but continues to attend to what the adult is saying. Notice that this behaviour from the child functions as a means of eliciting further behaviour from the adult. What looks like a failure to respond is in fact a form of engagement. At any moment, the child could choose to wriggle out of the adult’s grip and find some other activity to get involved in. But he doesn’t. In reality, the child is driving the game as much as the adult is. Even an apparent absence of response can in fact be an active attempt by the child to disturb the structure of his surroundings.

It is not always easy to notice that learning to speak is such an active process. (Notice how, a couple of times in the video, the grandmother interprets the child’s attentive non-responding as the child’s way of saying “this is boring”.)

When viewed in the wild, learning to speak does not look like a process of inferring mappings over a set of forms and a set of meanings. It looks like a series of explorations by the child of the environment it finds itself surrounded by—an environment that contains other actors who respond in meaningful ways to the child’s own activity. It is not the case that in learning to speak the child must learn some abstract set of patterns that we call a language (e.g. English). The child’s environment contains mostly just a small set of actors who are intermittently present. The child’s immediate task is to learn how to influence and manage the behaviour of these particular actors. There’s no need to learn an abstract set of conventions that can be used to communicate with an ideal conversation partner: what matters is what’s in the child’s surroundings.

Speaking is acting to alter the layout of a populated environment

A young child points out of a train window

A young speaker manages her environment (image source)

If my claim is that we should not view (early-childhood) speaking as involving conventional mappings, then I need to be able to provide an alternative account of what speaking, in fact, is. This is hard to do but I think it’s worth a try.

Again, let’s start with the infant at the early stages of learning to talk. The infant is born with the ability to make sounds. Different types of verbal action will result in different kinds of responses from the infant’s environment. From the day the infant is born, it can act on its environment by crying. Crying is a type of acting that not only disturbs the ambient sound patterns in the environment; it also perturbs the activity of other actors in the infant’s surroundings (see Thompson et al., 1996). But crying is not a very precise means of eliciting a response: caregivers respond to crying but don’t immediately know exactly what to do to bring the crying to an end.

As the child’s repertoire of sound-making skills expands, the child will be able to elicit increasingly specific responses from its caregiver-containing environment. The crucial developmental shift occurs when the child learns to control its verbal actions with reference to a specific type of relation that exists in its environment: relations that stand between other actors and other objects and events in the environment (this occurs at around nine months of age; Tomasello calls it the “nine-month revolution”). At this stage, the child starts to point to, and to name, objects and events.

How does this work? Well, it works because of the caregiver’s own history of learning. If the infant produces some sort of sound that approximates “doggie”, this orients the caregiver’s attention towards dog-shaped objects in the caregiver’s environment. How exactly does the sound do this? I confess, I have no real idea. But I don’t think it needs to involve conventions. The caregiver does not need to recognize the conventional link between this particular sound pattern and this particular object. The sound orients the caregiver’s attention because the caregiver’s nervous system is configured in such a way that the one immediately triggers the other. The caregiver’s nervous system got that way because of activities the caregiver engaged in in the past. What matters is that this link within the caregiver’s nervous system (between sound pattern and object) is a persisting piece of structure in the infant’s environment. The infant cannot directly perceive the caregiver’s nervous system, but can perceive the behavioural patterns that it generates. The infant can repeatedly produce the sound “doggie” at irregular intervals over weeks and months, and the sound will reliably go along with the caregiver’s attending to the object. This is the kind of structure that the child needs to be able to attend to in order to learn to speak, that is, in order to learn to elicit meaningful responses from its environment.

One difficulty here is that, while it seems reasonable to view speaking as a kind of acting, it is not very clear how we should view the status of the addressee.

Being spoken to is something we do; it is also something that we have done to us

I liked this sentence from Sabrina’s paper, which is also in the special issue (Golonka, 2015): “A tacit assumption in ecological psychology seems to be that anything that has an effect on behavior must be grounded in the perception of an affordance and, therefore, must be guided by law-based information.” I think that’s true: ecological psychology is disposed to overlook anything that’s not information-based, first-person control of an activity.

But suppose I punch you in the leg. This will certainly have an effect on your behaviour. You’ll probably hobble around for a bit and complain. Your behaviour here is not grounded in the perception of an affordance. It is not a response to informational input. It is a response to an internal change in you as a system which I, rather rudely, brought about.

Being spoken to is a bit like being punched in the leg, sometimes. When you are spoken to, your attention is directed towards things that you yourself are not in control of selecting. It is the speaker who selects what you are to attend to, and the speaker achieves this by performing an action that perturbs your nervous system in an appropriate way.

To get things a little clearer here, being spoken to is two things:

  1. Being spoken to is acting: in order to engage with someone in the first place it is necessary to seek that person out and to not run away from them, and in order to keep the situation moving it is necessary to actively attend to the situation you are in and to make appropriate adjustments to your own behaviour (as the kid in the video does)
  2. Being spoken to is something you undergo: it is having your nervous system perturbed by the actions of another actor—the speaker

How these two processes fit together is a central problem for any psychology concerned with the problem of acting in a populated environment, or with collective activity.

What should be evident is that the second item here—having your nervous system acted upon (like being punched in the leg)—cannot really be described as a case either of acting or perceiving. It is not a straightforward case of perception-action because what is going on here is not under the control of the person being spoken to.

As a field, we have not yet come up with a satisfying way to capture this non-voluntary aspect of public language as part of our theorizing. My hunch is that a big reason for this is that we have been trying to extend James Gibson’s account of visual perception to deal with something which it is not equipped to deal with. Gibson’s description scheme treats perception as the activity of a single actor exploring its environment, which, for the purposes of much of his work, he treats as if it were inanimate: the environment is described as a set of objects and surfaces whose structure is projected into energy arrays. The problems of explaining behaviour in a populated environment are, quite reasonably, largely set aside in Gibson’s books. But as a consequence, the description scheme he ends up with is one that cannot straightforwardly be extended to describe what’s going on in multi-actor settings. If we want to understand how speaking works we’re going to have to look beyond the perceiving-acting behaviour of the individual; we’re going to have to try to understand the larger context within which speaking behaviour is situated.


The editors of this special issue are searching for a way of describing language that is compatible with the anti-representational tenets of the ecological approach. This is clearly a difficult project. There are several traps that must be avoided.

One of these is the temptation to simply map the terms “speaking” and “listening” onto the terms “acting” and “perceiving”. Doing so obscures the nature of speaking as a form of action directed at an environment that is structured by the presence of other actors.

Another trap is the temptation to see linguistic meaning as fundamentally a relation between a word form and an event in the world. This is a disembodied view of meaning. In reality, meaning does not need to be disembodied in this way because meaning is distributed throughout the entire system that a child finds itself in. In the paper I wrote that: “The act of speaking is not meaningful because of the words it contains but because of the effect it has on the meaningfully structured environment into which it is directed.”

Clearly there is much more work to do before we can claim that we have a useful description scheme for talking about language in a properly ecological or enactive way. Are we moving towards one? Hard to say. I’d like to see a lot more attention paid by anti-representational researchers to developmental processes and to the peculiarities of the structure of settings that contain multiple actors. Until we have an account of speaking that is anti-representational from the ground up we risk falling back into the familiar conception of speaking as the sending of messages in a speech code. I think we can do better than that.


How direct is social perception? A comment on Abramova and Slors (2015)

Students constructing a staircase, each working separately on a particular part; Abramova and Slors (2015) label such activity as “distributive action coordination”. (Image source)

It has become trendy recently to talk about social perception as a direct process. Or, at least, I gather it has become trendy, because the journal Consciousness and Cognition has a special issue coming out on the subject. Most of the articles appear to be playing the game of taking a novel phrase, in this case “direct social perception”, and trying to cram that phrase into the author’s preferred psychological description scheme, be that Bayesianism, theory-of-mind cognitivism, or some other approach. Each to their own. I don’t have anything much to say about these articles. There was one paper that caught my eye, though. It is written by Ekaterina Abramova and Marc Slors (A&S), and it attempts to give an account of “simple action coordination” in explicitly Gibsonian terms—in terms of individual actors, or agents, who, through their actions, “shape the field of affordances for another agent”.

This strikes me as a move in a useful direction, and one to be encouraged. It is also one that immediately raises a whole set of conceptual issues which demand careful thought.

I do not want to critique the paper itself, as such. The proposals presented are quite preliminary in nature (the authors themselves claim only that they “have tentatively made an indirect case in favor of [direct social perception] in simple action coordination”). Instead, I want to take issue with the way the authors seem to be interpreting the concept of affordances. This will, I hope, be illustrative of some of the issues that arise when we appeal to affordances in trying to explain social phenomena; these issues are general and go well beyond this particular paper. I want to suggest that A&S’s reading of the concept of affordances is unduly narrow. As a consequence of this, they end up maintaining certain notions which I believe we’d be better off simply abandoning. Specifically, they maintain that coordinated activities involving multiple actors must be supported by a “shared intention”; and they appear to be committed to the notion that acting with others necessarily involves “understanding” others in some sense—e.g., “recognizing” that certain objects afford certain actions for others. I want to suggest that there is a way to formulate the notion of affordances which allows an account of coordinated multi-actor activity in which these issues simply do not arise.

It is, however, worth noting a couple of things about the paper up front. Firstly, the phrase “direct social perception” is being appealed to here in a promissory manner. It is being invoked as part of a rejection of the mainstream, cognitivist account of acting with others as involving the making of inferences about others’ intentions. On the mainstream account it is assumed that intentions are a matter of private mental activity and that therefore they are inherently inaccessible to anyone but the individual whose intentions they are. A&S reckon that their affordance-based description scheme allows them to avoid having to appeal to inference-making, although it is not always clear from what they write that they have completely escaped thinking about the actions they describe in terms of inferences. Thus, in their conclusion they claim that acting with another agent requires “perceiving affordances for another person [which] involves seeing that person as an agent”, and that this “does involve attributing some basic kind of intentional relation between a person and her environment”. It is hard to interpret that “attributing” as anything other than a form of inferring.

Secondly, the authors make a distinction between two types of acting with others:

    1. distributive action coordination, in which actors work on their own tasks while avoiding getting in each other’s way, as in the photograph at the top of this post
    2. contributive action coordination, in which multiple actors use one another to complete a task which none would be able to perform independently, as in the ubiquitous sofa-carrying case; A&S write: “In such cases coordinated joint action is usually required to start simultaneously”


Four men carry a sofa together.

A rare four-man sofa-lift; according to A&S this is a case of “contributive action coordination”. (Image source)

The authors think that their affordance-based scheme can straightforwardly explain what’s going on in the case of distributive action coordination (and that this can already be modelled using dynamic field theory), but that some work is required before an account of coordination of the contributive type can be attempted. This distinction is, intuitively and descriptively, a reasonable one to make. However, I think the distinction breaks down quite rapidly when we consider the situation in terms of affordances relative to an individual actor (see this previous post in which I argue against analysing the situation primarily in terms of interpersonal relations; I claim that we should instead focus on the relations between the actor and its populated environment).

Perceiving affordances for others is not like perceiving affordances for oneself

In their attempt to give an account of contributive, sofa-carrying-type action coordination, A&S suggest that it is necessary that the actor perceive their environment not only in terms of what it affords for them, but also in terms of what it affords for others. They cite the following passage from James Gibson’s chapter introducing the “theory of affordances” (Gibson, 1979, p. 141):

The child begins, no doubt, by perceiving the affordances of things for her, (. . .). But she must learn to perceive the affordances of things for other observers as well as [for] herself.

As with a number of the remarks that Gibson makes in that chapter, this one can be read in a couple of different ways:

  1. Perceiving the affordances of things for others requires one to perceive that, from where the other person is standing, that person’s environment affords action X
  2. Perceiving the affordances of things for others requires one to attend to and discriminate a correspondence, or fit, between two pieces of structure in one’s own (the perceiver’s) environment: a correspondence between the object and the other actor

The first reading appeals to what in the cognitive tradition is called other-modelling. A&S appear to be leaning towards the first reading: “in the plank-lifting case, the field of affordances of a person standing at one end of a plank is changed when she sees another person at the other end of the plank. The plank becomes ‘jointly liftable’ for her because she recognizes the availability of exactly the same affordance for the other person as well.” This seems to sit uneasily with any claim to be pursuing a direct perception account of social activity, because other-modelling adds an inferential step between what is seen (what is specified in the optic array) and what is “perceived”.

The second reading is more interesting, I think. Here there is no requirement to put oneself in the place of the other. On this reading, perceiving affordances for others is nothing like perceiving affordances for oneself. And how could it be? If affordances are animal-relative, as we are told is the case, then we can only directly perceive our own environment in terms of what it affords to us as individuals. Any notions I might have about how the world looks to you are a matter of conjecture on my part. What I perceive is an environment (my surroundings) which happens to be populated with other actors, whom I perceive in relation to all the other objects in my environment. I might perceive you and the sofa as objects that I can potentially assemble in a way that affords sofa-carrying. But this is no different in degree from my perceiving the relation between a sofa and, say, one of those flat trolleys you have to push around when you get to the warehouse bit in Ikea. The trolley and sofa can likewise be assembled into a structure for sofa-transportation (over a flat surface, at least). The point is that the other person, like the trolley, is, for the purposes of the task at hand (shifting this sofa to some place else), merely usable structure in the environment.

To be clear, the claim is that perceiving is only ever of an environment that affords actions to the animal doing the perceiving. Consider an example from soccer. A striker bearing down on the opponents’ goal must pick out a particular place within the goalmouth into which to aim the ball such that the goalkeeper will not be able to reach it. We might say that this involves the striker perceiving the affordances for the goalkeeper. And in a sense it does. But this does not mean that the striker must know what it is like to be the goalkeeper. The striker need have had no experience of being a goalkeeper and need have no appreciation of what the situation looks like from the goalkeeper’s point of view. The striker perceives the situation in terms of what kinds of action will most likely result in a goal—in terms of what kinds of shot the goalkeeper is unlikely to reach. The goalkeeper is a mere obstacle, to be negotiated, evaded, got the better of.

Note that on this way of thinking about affordances it does not make sense to talk of what the environment affords for the collective, or to talk of “fields of affordances for dyads or groups”. Groups do not experience their surroundings or perceive affordances; only actors do.

Affordances make “shared intentions” unnecessary

An American beaver in Alaska lifts its head above the dam.

Beavers don’t waste their time worrying about whether or not the other beavers in the colony are committed to a shared intention for dam-maintenance. (Image source)

In the abstract for their paper, A&S write: “Coordination ensues, we argue, when, given a shared intention, the actions of and/or affordances for one agent shape the field of affordances for another agent” (my emphasis). They ask us to simply accept that coordination requires a pre-existing shared intention. The problem here is that, if you share an intention with someone (whatever that means), then you must already be coordinated with that person in some sense. Coordination does not “ensue”; it has already been presupposed. The question of how that coordination came about has been begged.

Given how opaque this notion of “shared intentions” is, there is good reason to want to avoid invoking it in the first place. Fortunately, the affordance concept is powerful enough to allow us to do this. The crucial thing is to note that the field of affordances surrounding an animal is subject to change as the internal structure of the animal changes.

Consider the beaver. Beaver dams are maintained by members of the beaver colony. This might be taken as a case of what A&S call “distributive action coordination”: each individual beaver takes part in building and repairing a structure that is beneficial to all. But there is no need to appeal to shared intentions here. (And since we are talking about a non-human animal here, we are naturally less inclined to do so.) The beaver’s motivation to repair the dam is explained in this delightful Attenborough clip. Beavers are apparently averse to the sound of running water, so when they hear the sound of water rushing out of a damaged section of the dam they are drawn towards that location and immediately start work on the repairs. Put another way, the sound of running water triggers an internal change in the beaver that reorients the animal’s activity.

Or consider the sofa-carrying case again. It is not the case that the actors involved must initiate their actions “simultaneously”. In fact, there is no natural start point to the action. To understand how the actors arrived at the position of being about to lift this particular sofa it is necessary to look backwards and ask: what happened before now that led to the present situation? Perhaps I am moving house and you agreed a couple of weeks ago to help move the furniture. Why not take that as the starting point of the action, then? When you agreed to help, you became oriented towards the furniture in a particular way: there was a change in your internal structure such that you went from being neutral with reference to this furniture to being action-oriented relative to it. The furniture became a set of objects to be moved. And as we now stand at either end of the sofa, the idea that we might need to establish a shared intention before we can get this bulky object off the ground simply does not enter into consideration. We both became object-moving devices some time earlier, and this is manifest in the fact that we are both here, and in the fact that we have probably already shifted a bunch of objects out of the room.

Or we could take the explanation further back and ask how it was that we learned to lift bulky objects in coordination with others in the first place.

The point is that, appropriately understood, the concept of affordances simply obviates the need to think about these situations in terms of shared intentions. It is perhaps useful here to make a distinction between “action” and “task”. Action has no start point. Actors do not simply engage in routine performances, or scripts, that have a clearly defined start and end point. Action is continuous: everything the actor does is part of an ongoing process in which the actor changes its environment and in doing so changes its own internal structure. I have argued (Baggs, 2014) that the task, by contrast, is an analytical device that allows us to artificially break up this stream of action into tractable chunks that can be studied; these chunks need to have a clearly defined start and end point and a characteristic mode of transition between the two. The outfielder problem is a good example: we can define the start (the ball being launched), the end (the catch), and the transition (the parabolic trajectory of the ball through the air), and this is very useful as a guide for what we should measure. But we should not make the mistake of thinking that the task as we have defined it is the same as the action from the perspective of the actor. In the context of action in multi-actor settings, such a narrow definition creates unnecessary problems. If we assume that the action itself starts at the point that we happen to start observing or measuring, i.e. at the point when we are already standing at either end of the sofa, then it looks like our behaviour must be structured by some hidden “shared intention” which is not itself perceivable. What is overlooked is the fact that our internal structure has been shaped by previous action. If the affordances for lifting the sofa are directly perceivable it’s because we perceive the situation we are in relative to the type of animal we have become.

“Direct social perception”: a conceptual blind alley?

A note of caution is in order about this phrase, “direct social perception”. The existing phrase, “direct perception”, at least as Gibson uses the term, is meant to denote a relationship between an actor and its environment. Here is a description from Turvey, Shaw, Reed and Mace (1981, p. 239):

Gibson argued that the proper “objects” of perceiving are the same as those of activity. Standing still, walking, and running are all relations between an animal and its supporting surface. Though not always explicitly recognized … the supporting surface is just as much an essential constituent of these activities as, for instance, legs; and useful perceiving involved in controlling posture and locomotion must be directed toward the same surface. Thus it would seem that a two-term relation involving the same surface or ground can exist in both cases: an animal *runs* on the ground and an animal *sees* the ground. This much should be common sense. There is no thing *between* the animal and the ground in the relation. This is what Gibson has always meant by direct perception and it is the same as what one would mean by direct action if one were discussing activity.

Even if one may be inclined to take issue with some of this, it is at least clear that direct perception means something: it refers to a particular way of thinking about what’s going on between the animal and its environment.

This is not the case with the phrase “direct social perception”. This is because “social” is a descriptive category, introduced by a third-party observer or an analyst. A situation is described as “social” if it involves multiple actors. But just because we can describe a situation as “social”, we are not necessarily licensed to assume that the “socialness” of the situation must play a causal role in our eventual explanation of the actors’ behaviour. (See the parallel argument here for the phrase “joint action”.) We might describe the beaver as a social animal, yet the beaver does not need to appreciate its own status as a social animal in order to fix its dam.

It may be prudent to avoid the phrase “direct social perception” altogether, on the grounds that the inherent mixing of perspectives—descriptive and explanatory—is bound to lead to confusion further along the line. (I will however point out Eric Charles’s claim that there’s a way of reading ecological psychology according to which other minds are perceivable in the sense that the actions of others are perceived in the context of larger, ongoing patterns of behaviour.)

Some devices for avoiding conceptual traps

In lieu of a summary, I conclude with some recommendations for avoiding some of the ever-present conceptual traps surrounding the analyst of multi-actor activity.

  • Shared intentions only seem necessary as a function of the language we use to describe situations. To avoid this, consider non-human animals: the beaver, or, better, the ant. Ants don’t do shared intentions. When proposing an account of multi-actor coordinated activity, ask: how would this apply to ants?
  • Describing something as “social” or “joint” is all well and good as a description, but do not make the mistake of thinking that just because you (the analyst) can describe something as social, the socialness of the activity must necessarily play a causal explanatory role in an account of the behaviour of any given individual actor.
  • Actions do not begin at the point that you start measuring them. Tasks, on the other hand, do. Tasks are a methodological device that establishes some more or less arbitrary start and end point (Baggs, 2014). Do not confuse the action an actor is performing with the task as you have defined it.


The hierarchy of menace among road users and its implications for urban design

A sign in Santa Monica, California instructs drivers to 'Share the road' with cyclists
Cycling infrastructure, Los Angeles style

In recent years a trend has caught on among city planners for redesigning urban spaces in a way that removes traditional features such as kerbs, signage, and light signals. The theory goes that these kinds of structure artificially segregate users of the road and encourage us to respond mechanically to our environment instead of responding attentively to the movements of our fellow citizens. The design approach is known as shared space and it has been pursued with particular fondness by towns and cities in the United Kingdom. (But not just in the UK; see, for instance, this public relations video from Auckland Council in New Zealand, enthusiastically showing off their new shared space.)

There are, however, some serious problems with shared spaces in the way that they have been implemented to date. In short, these spaces tend to create uncertainty at the expense of vulnerable road users: cyclists, the elderly, and the physically and visually impaired (see the resources at the bottom of this post).

My aim in this post is to show that these problems have arisen because the designers have been inattentive to the perceptual strategies being employed by road users (pedestrians, cyclists, motorists). The designers and architects have lacked an appropriate psychology of perception to guide their design decisions. I will suggest that the ecological approach to perception is well-suited to provide the missing theory.

The concept of shared space has recently come under forceful attack in a report by Lord Holmes, a Conservative peer who sits in the UK’s House of Lords. Holmes is himself blind (he is a former Paralympic swimmer) and he is clearly invested in a campaign against shared spaces. His dislike of the concept comes across in almost every sentence of the report, despite its claims to objectivity. The report is based on survey data and it is most effective when Holmes allows his respondents to speak in their own voice:

One respondent summed up the shared space they used as:

“…lethally dangerous. In poor light or glare or shadow, drivers cannot see pedestrians. Disabled people and those with poor sight or mobility cannot protect themselves. The idea behind such spaces depends on every user being 100 per cent able and 100 per cent alert at all times, which just doesn’t happen in real life. I consider this whole idea to be completely (and criminally) insane.”

One blind user unable to access a local shared space independently said:

“…for people with no sight like myself they are a death trap. I cannot express how terrible they are and how they make me feel so angry; to think all the people responsible for them expect us to use it when we cannot see. I use the one in Leek with my husband and never on my own.”

What comes across in these quotes is a sense that these road users feel they have been excluded from their local surroundings. They have lost their sense of control. (An upshot of this is that assessing whether shared spaces are working effectively cannot simply be a case of going down to the space, setting up a video camera, and observing whether traffic is flowing freely and whether pedestrians are successfully crossing the road. If these respondents are at all representative of a portion of the local population, then the actual traffic patterns will only tell part of the story. What is being overlooked is the fact that some people have simply taken to avoiding the area altogether. The users of the shared space are a self-selecting sample.)

Here is another quote from the report, this one apparently from a cyclist:

In spaces where the lanes are very narrow and traffic movements unclear one respondent reported a “resulting tendency towards ‘might is right’ rather than the spontaneous outbreak of courtesy which advocates presume. As a cyclist or pedestrian, you’re never going to win a contest of might against a car or lorry, so it’s just intimidating.”

This identifies what is perhaps the central contradiction in the “shared space” approach. All road users are not created equal; there is a hierarchy of menace among us.

The hierarchy of menace

A much-quoted remark by James Gibson, introducing the concept of affordances, reads as follows (Gibson, 1979, p. 127): “The affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill.” That “or ill” has generally been ignored by researchers, despite the frequency with which it is quoted. If we want to analyse traffic infrastructure in ecological terms we are going to be required to think about that “or ill”.

What the survey respondents to the Holmes report have accurately perceived is that motor vehicles afford danger to cyclists and pedestrians in a way that does not hold in the opposite direction. A truck that pulls into the path of a cyclist affords disaster to the cyclist. But if a cyclist swerves in front of a truck this does not afford serious injury to the truck driver.

This is more than merely a truism because it means that different road users are engaged in different kinds of task (in the sense outlined on Andrew and Sabrina’s blog here). Phenomenally, actors engaged in locomotion are seeking to propel themselves forwards along a path that affords safety to the locomotor (this is described as the “field of safe travel” by Gibson and Crooks, 1938). It is this forward motion, this task, that structures the attentional activity of the actor and that causes certain obstacles to stand out as salient while others recede to the phenomenal background. Obstacles to the actor’s own safe locomotion are of primary concern. And while we should of course all be courteously looking out for each other on the road, it is difficult to monitor the safe locomotion activity of all other road users at the same time, and blind spots and other visual barriers can sometimes make it impossible to do so. To some extent, we have to trust that our fellow road users are looking out for their own safety and that the structure of the road means that they are able to do so.

To propose that all road users should simply share the space, as advocates of the shared space idea do (and as the author of the sign in the photograph at the top of this post does), is to ignore these differences that exist between road users in the tasks that are being engaged in. It is to ignore the relations of danger that obtain between one type of road user and another type further up or further down the hierarchy of menace.

How to build inclusive infrastructure

If we are to be empowered as road users to make decisions about our own locomotion behaviour, then we need to be able to perceive for ourselves that the movements we are engaged in are safe. Designers should abide by the following principle:

Design principle: Make threatening movements perceivable to the threatened

To see why this is the case, consider an example where the threatening movements of more menacing vehicles are difficult to perceive. The photograph below shows a cycle lane in Minneapolis that extends across a junction. For the cyclist travelling straight on in this lane it is necessary to attend to potential motor vehicles attempting to turn right. The right turn takes the vehicle across the path of the cycle lane at a shallow angle. The motorist’s immediate task (the behaviour they perceive themselves to be engaged in, and which is structuring their attention) is to steer the vehicle into the turn. It would be very easy for the motorist, in this situation, to fail to attend to a cyclist travelling alongside, resulting in a turn directly into the cyclist’s path. Consequently, cyclists must check over their left shoulder before proceeding across the junction. But in doing so they take their eyes momentarily off what’s going on ahead, and are thus momentarily unable to monitor for other potentially threatening movements in the junction itself. This is an uncomfortable situation for the cyclist.

A cycle lane in Minneapolis
This cycle lane in Minneapolis is badly designed; cyclists have to look all the way over their left shoulder to check for cars turning right, while also attempting to monitor movements ahead.

The infrastructure that works best is that which allows all road users to easily perceive all the relevant obstacles in their surroundings. This is elegantly illustrated in this short video made by David Hembrow (who blogs at A view from the cycle path), showing a roundabout in the Netherlands that features segregated cycle lanes and pedestrian footpaths.

If various shared space schemes have failed it is because they have created too much ambiguity and they do not provide road users with the structure that would enable them to adequately perceive their own surroundings. If road users do not feel empowered and in control in these spaces it is because they are unable to perceive the threatening movements of vehicles higher up in the hierarchy of menace.

A setting in which motorists can cut across the path of cyclists does not encourage cycling. Far better, from the cyclist’s perspective, that there be segregated infrastructure reserved for bikes. Where motor vehicle traffic has to be crossed it should be crossed on the perpendicular, enabling the cyclist to perceive any threatening movements and respond appropriately. Vulnerable groups of pedestrians, meanwhile, are reliant on light-controlled or, at least, pedestrian-priority crossings which afford them control over the movement of vehicle traffic and allow them to cross safely. Shared space schemes that remove such crossings have failed to provide these road users with an alternative method of control.

These conclusions may seem like nothing more than common sense. Yet the designers of many shared space schemes have evidently failed to accommodate the needs of the road users discussed in this post. This is the case despite shared space advocates’ claims to be informed by “a greater understanding of behavioural psychology” (Hamilton-Baillie, 2008). Perhaps what is needed is a greater understanding of the actual locomotion tasks that different road users are engaged in, along with an understanding of the dangers that must be attended to and the strategies being relied on by actors for taking control of their own behaviour. Only by identifying the sources of danger will it be possible to remove them or mitigate them. An understanding of the tasks that road users are engaged in will only come with a thoroughgoing theory of perception. Gibsonian psychology is well-suited for the job.

Note: I presented some of these ideas last weekend at the International Conference on Perception and Action which took place at the University of Minnesota (hence the photo from Minneapolis). All the talks were videoed. I hope to post a link to the video when it turns up online.



Against “joint action”: all actions are actions in a populated environment

I gave a talk last week at the 6th Joint Action Meeting in Budapest. The program for the conference is here; my abstract appears on page 29. The conference is a biennial affair. It was initially set up ten years ago by Günther Knoblich and Natalie Sebanz as a forum for collegial disagreement about what joint action is. It still retains that atmosphere, I think. I tried to construct my talk in the spirit of open enquiry that Günther and Natalie seem to invite. I like this conference because I believe such open enquiry is necessary for working through the conceptual issues involved here. I used my talk to argue against “joint action” as a piece of terminology, on the grounds that the term leads to an overly restrictive interpretation of what’s going on in the phenomena studied by joint action researchers. The argument is based on James Gibson’s distinction between the environment and the physical world. I will use this post to make a record of the argumentative portion of the talk. (The talk also had an illustrative portion in which I talked about possession in soccer, as blogged here.)

What is a “joint action” anyway?

Ants on an ant mound. Two people carrying a sofa together.
What do these two images have in common? (sources: 1, 2)

I start with a question: what is it that is supposed to unite all of the different phenomena we talk about when we talk of joint action? Consider the two images above: on the left, a group of ants in the process of building a mound; on the right, two people carrying a sofa together (this is one of the go-to examples that philosophers reach for when discussing joint action).

In fact, it is very difficult to identify anything that unites these phenomena. Philosophers often appeal to the concept of common knowledge (e.g., from the Stanford Encyclopedia of Philosophy: “In order to communicate or otherwise coordinate their behavior successfully, individuals typically require mutual or common understandings or background knowledge”). But few would be willing to attribute common knowledge to the ants. Psychologists like to appeal to hypothetical theory of mind capacities within individuals, but again few would be willing to entertain the idea that the individual ants are engaging in complex mental modelling of the inner lives of their fellows.

One escape route here would be to simply insist that only humans can do proper joint action. (This is the option chosen by Michael Tomasello, for instance. Tomasello asserts that chimpanzees hunting in groups are not really doing joint action: “The apes are engaged in a group activity in ‘I’ mode, not in ‘we’ mode.”) This does not really solve the problem, though. We now just have a new problem: how do we distinguish proper joint action from merely apparent or quasi-joint action? And if non-linguistic animals cannot do joint action, what about human infants? In short, we’ve merely replaced the question of defining joint action with a different question: what exactly is it that makes humans so special?

So it is difficult to identify just what it is that allows us to talk about a bunch of disparate phenomena while labelling them all as instances of “joint action”. Except there is one thing that unites the ants and the sofa-carriers. Consider the situation from the perspective of one of the individual actors. Relative to that individual’s perceptual system, the environment is populated. And it is here that it is necessary to invoke James Gibson’s distinction between the environment and the physical world.

The environment v. the physical world

At the very start of his final book, Gibson introduces his concept of the environment by contrasting it with the standard view of the physical world. The terms “environment” and “physical world”, he claims, should not be treated as synonyms. Understanding this distinction is, I think, key to understanding the radical difference between Gibson’s psychology and the standard representational theory of mind. (I hadn’t really realised how important this was until I was preparing this talk. But the distinction obviously is important to Gibson: he makes it on page 2 of chapter 1, a chapter whose title is “The animal and the environment”.)

The environment, according to Gibson, is that which surrounds an individual animal (1979, p. 8):

[I]t is often neglected that the words animal and environment make an inseparable pair. Each term implies the other. No animal could exist without an environment surrounding it. Equally, although not so obvious, an environment implies an animal (or at least an organism) to be surrounded. This means that the surface of the earth, millions of years ago before life developed on it, was not an environment, properly speaking. The earth was a physical reality, a part of the universe, and the subject matter of geology. […] We might agree to call it a world, but it was not an environment.

Gibson is here insisting that animals do not perceive the world of physics; we do not perceive the bare physical world. What we perceive is our surroundings: the structured collection of surfaces and objects that are meaningful by virtue of our capacity to act relative to them. The environment ultimately only exists at the approximate scale of our bodies (microscopes and mass spectrometers notwithstanding). When ecological psychologists talk about the “animal–environment system”, this is not some vague device for pointing out that our bodies are situated in space and that we perceive things in that space (a fact which no cognitivist would deny). It is, rather, an assertion that we never in fact perceive pure “space”. What we perceive is our surroundings; we perceive our own unique environment relative to our ability to act in it.

To bring this back to the topic at hand, it should be pointed out that one of the features of the environments of all animals in reality is that they are populated with other animals. The ants and the sofa-carriers all encounter an environment that is not merely structured by the presence of surfaces and objects. It is also structured by the presence of other actors. A populated environment is one that does not merely react to the exploratory activity of the animal (as with Newtonian conservation of momentum). It is an environment that acts back, one in which the actions that the animal engages in can be perturbed, amplified, reversed, and observed by other animals.

“Joint action” mixes relational categories

When we talk about “joint action” we are in effect mixing two different relational categories.

  • “Jointness” describes an interpersonal relation; it denotes the fact that a group of actors can appear to be doing more than merely acting simultaneously: they can appear to be acting together
  • “Action”, on the other hand, is usually descriptive of a relation between an individual actor and some aspect of its environment; when we act it brings about change in our environment

Perhaps part of the problem is that it is not immediately obvious that action is about animal–environment relations. Action is more often discussed in terms of the presence or absence of some mental entity or volition, as in Donald Davidson’s claim that an action must be “intentional under some description”. But on Gibson’s definition of the environment, an action must be about animal–environment relations. Even an action as simple as shaking one’s head involves the environment. In order to coordinate that particular action in the first place or to perceive whether the action has been carried out successfully one must create the perception of optic flow while keeping static the invariant structure that specifies the layout of the environment (when you shake your head your visual field moves but the environment doesn’t).

To talk of “joint action” is to suggest a dichotomy between “individual action” and “joint action”. The term invites an explanation of the behaviour of interest primarily in terms of interpersonal relations. Other kinds of actor–environment relations are left out of the analysis, or pushed to the background, taken for granted, described as mere “low-level” action.

The standard disagreement in the field is over exactly what kinds of interpersonal relations must be involved for joint action to be possible. One view says that in order for a group of actors to be able to act together successfully they must in some sense subordinate their own activity to that of the group; they must collectively constitute a “plural subject” of the action, in the phrase of the philosopher Margaret Gilbert. The second view prefers to maintain the boundaries around individual actors. On this view, a joint action requires each actor to engage in mindreading activity of some variety, whether it be through the construction of mental models of the other actors’ intentions or through unconscious processes of interpersonal synchronization. Only when each actor shares the appropriate mental content with the other actors can the joint action be undertaken.

Two standard approaches to joint action: the plural subject model (left) v. the mindreading/interpersonal synchronization model (right); the black arrows represent action.

What these views share is a lack of attention to just what the environment looks like from the perspective of an individual actor. The action itself is treated as mere output from a system. The puzzle is taken to consist in identifying how the jointness of this system arises. It is implicitly assumed that once we have an explanation of this jointness we will also have an explanation both of how the group behaves and of how the group’s individual members behave.

Acting in a populated environment: it’s not all about interpersonal relations

It is worth pausing over this phrase, “low-level action”. What could it mean? Why should actions directed at inanimate objects be considered “low-level” while actions involving other animals are considered high-level or social? Why are we tempted to make this distinction?

This distinction has arisen, I suspect, as an accidental by-product of the way that action has traditionally been studied. Psychologists have tried to study behaviour by isolating individual actors from their natural environments and placing them in carefully constructed artificial environments in which the sensory inputs and outputs can be tightly controlled and measured. It is tempting to believe that by getting a subject to respond to dots on a screen we are ignoring everything that is not action and perception at the most basic level.

But real environments are not like the environments created in the laboratory. An ant’s environment always contains other ants. That is to say, any action that the ant might perform is one that potentially perturbs the environment of another ant. It is this fact that allows the ants as a collective to construct great structures that could not possibly be conceived by any of the ants individually. Each ant merely acts in, and relative to, its own environment, and in doing so it alters the structure of the environment of its fellows.

“Low-level action”, then, is an analyst’s construct. In reality, all action is social because all action has the potential to perturb the environment of another animal. The phrase I propose, “acting in a populated environment”,[1] collapses the dichotomy between joint action and individual action. There is simply no need to establish a joint goal or to construct, ahead of time, a joint actor that is constituted by a set of interpersonal relations. It is sufficient for an individual to act on the available structure; this act can then be amplified or responded to by another individual whose environment has been perturbed.

The pay-off of this move, from talking about “joint action” to talking about “acting in a populated environment”, is that it allows us to avoid asking spurious questions about how groups come to perform tasks together. Existing accounts of joint action attempt to explain such behaviour primarily in terms of interpersonal relations, asking the question: “How do individual actors in some situation combine to make a joint actor or team?” On the alternative view that I have outlined, this question is no longer seen as a prerequisite to understanding the behaviour. Instead, we can ask a more tractable kind of question, focusing on some specific task and asking: “How do we identify precisely which structures in the actor’s environment are implicated in the control of the actor’s behaviour?”


The term “joint action” invites us to explain collective behaviour in unnecessarily narrow terms, focusing primarily on interpersonal relations while pushing other kinds of actor–environment relations to the background. As an alternative, I prefer to talk of “acting in a populated environment”. This phrase can be thought of as a device for turning complex questions about collective behaviour into tractable questions about individual behaviour. It also happily allows us to avoid having to grapple with such fraught concepts as shared intentions or shared goals. And yet to talk of acting in a populated environment is not to deny that there is a great deal of complexity that underlies such behaviour. Instead, it forces us to acknowledge the rich structure that exists in populated environments and that pervades an animal’s surroundings; this structure does not merely exist in between actors but surrounds each of us. This richness is in fact an asset; it is what is used by actors to guide and control specific actions. The hope is that by examining this structure in detail it will be possible to explain, in specific cases, how individual actors can successfully act as part of a coordinated collective.


[1] The phrase “populated environment” I take from Gibson’s “Note on perceiving in a populated environment”, published posthumously as part of a set of notes on the concept of affordances (Gibson, 1982).


  • Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton-Mifflin, Boston.
  • Gibson, J. J. (1982). Notes on affordances. In Reed, E. and Jones, R., editors, Reasons for Realism: Selected essays of James J. Gibson, pages 401–418. Lawrence Erlbaum, Hillsdale, New Jersey.