How direct is social perception? A comment on Abramova and Slors (2015)

Students constructing a staircase, each working separately on a particular part; Abramova and Slors (2015) label such activity as “distributive action coordination”.

It has become trendy recently to talk about social perception as a direct process. Or, at least, I gather it has become trendy, because the journal Consciousness and Cognition has a special issue coming out on the subject. Most of the articles appear to be playing the game of taking a novel phrase, in this case “direct social perception”, and trying to cram that phrase into the author’s preferred psychological description scheme, be that Bayesianism, theory-of-mind cognitivism, or some other approach. Each to their own. I don’t have anything much to say about these articles. There was one paper that caught my eye, though. It is written by Ekaterina Abramova and Marc Slors (A&S), and it attempts to give an account of “simple action coordination” in explicitly Gibsonian terms—in terms of individual actors, or agents, who, through their actions, “shape the field of affordances for another agent”.

This strikes me as a move in a useful direction, and one to be encouraged. It is also one that immediately raises a whole set of conceptual issues which demand careful thought.

I do not want to critique the paper itself, as such. The proposals presented are quite preliminary in nature (the authors themselves claim only that they “have tentatively made an indirect case in favor of [direct social perception] in simple action coordination”). Instead, I want to take issue with the way the authors seem to be interpreting the concept of affordances. This will, I hope, be illustrative of some of the issues that arise when we appeal to affordances in trying to explain social phenomena; these issues are general and go well beyond this particular paper. I want to suggest that A&S’s reading of the concept of affordances is unduly narrow. As a consequence of this, they end up maintaining certain notions which I believe we’d be better off simply abandoning. Specifically, they maintain that coordinated activities involving multiple actors must be supported by a “shared intention”; and they appear to be committed to the notion that acting with others necessarily involves “understanding” others in some sense—e.g., “recognizing” that certain objects afford certain actions for others. I want to suggest that there is a way to formulate the notion of affordances which allows an account of coordinated multi-actor activity in which these issues simply do not arise.

It is, however, worth noting a couple of things about the paper up front. Firstly, the phrase “direct social perception” is being appealed to here in a promissory manner. It is being invoked as part of a rejection of the mainstream, cognitivist account of acting with others as involving the making of inferences about others’ intentions. On the mainstream account it is assumed that intentions are a matter of private mental activity and that therefore they are inherently inaccessible to anyone but the individual whose intentions they are. A&S reckon that their affordance-based description scheme allows them to avoid having to appeal to inference-making, although it is not always clear from what they write that they have completely escaped thinking about the actions they describe in terms of inferences. Thus, in their conclusion they claim that acting with another agent requires “perceiving affordances for another person [which] involves seeing that person as an agent”, and that this “does involve attributing some basic kind of intentional relation between a person and her environment”. It is hard to interpret that “attributing” as anything other than a form of inferring.

Secondly, the authors make a distinction between two types of acting with others:

    1. distributive action coordination, in which actors work on their own tasks while avoiding getting in each other’s way, as in the photograph at the top of this post
    2. contributive action coordination, in which multiple actors use one another to complete a task which none would be able to perform independently, as in the ubiquitous sofa-carrying case; A&S write: “In such cases coordinated joint action is usually required to start simultaneously”


Four men carry a sofa together.

A rare four-man sofa-lift; according to A&S this is a case of “contributive action coordination”.

The authors think that their affordance-based scheme can straightforwardly explain what’s going on in the case of distributive action coordination (and that this can already be modelled using dynamic field theory), but that some work is required before an account of coordination of the contributive type can be attempted. This distinction is, intuitively and descriptively, a reasonable one to make. However, I think the distinction breaks down quite rapidly when we consider the situation in terms of affordances relative to an individual actor (see this previous post in which I argue against analysing the situation primarily in terms of interpersonal relations; I claim that we should instead focus on the relations between the actor and its populated environment).

Perceiving affordances for others is not like perceiving affordances for oneself

In their attempt to give an account of contributive, sofa-carrying-type action coordination, A&S suggest that it is necessary that the actor perceive their environment not only in terms of what it affords for them, but also in terms of what it affords for others. They cite the following passage from James Gibson’s chapter introducing the “theory of affordances” (Gibson, 1979, p. 141):

The child begins, no doubt, by perceiving the affordances of things for her, (. . .). But she must learn to perceive the affordances of things for other observers as well as [for] herself.

As with a number of the remarks that Gibson makes in that chapter, this one can be read in a couple of different ways:

  1. Perceiving the affordances of things for others requires one to perceive that, from where the other person is standing, that person’s environment affords action X
  2. Perceiving the affordances of things for others requires one to attend to and discriminate a correspondence, or fit, between two pieces of structure in one’s own (the perceiver’s) environment: a correspondence between the object and the other actor

The first reading appeals to what in the cognitive tradition is called other-modelling. A&S appear to be leaning towards the first reading: “in the plank-lifting case, the field of affordances of a person standing at one end of a plank is changed when she sees another person at the other end of the plank. The plank becomes ‘jointly liftable’ for her because she recognizes the availability of exactly the same affordance for the other person as well.” This seems to sit uneasily with any claim to be pursuing a direct perception account of social activity, because other-modelling adds an inferential step between what is seen (what is specified in the optic array) and what is “perceived”.

The second reading is more interesting, I think. Here there is no requirement to put oneself in the place of the other. On this reading, perceiving affordances for others is nothing like perceiving affordances for oneself. And how could it be? If affordances are animal-relative, as we are told is the case, then we can only directly perceive our own environment in terms of what it affords to us as individuals. Any notions I might have about how the world looks to you are a matter of conjecture on my part. What I perceive is an environment (my surroundings) which happens to be populated with other actors, whom I perceive in relation to all the other objects in my environment. I might perceive you and the sofa as objects that I can potentially assemble in a way that affords sofa-carrying. But this is no different in degree from my perceiving the relation between a sofa and, say, one of those flat trolleys you have to push around when you get to the warehouse bit in Ikea. The trolley and sofa can likewise be assembled into a structure for sofa-transportation (over a flat surface, at least). The point is that the other person, like the trolley, is, for the purposes of the task at hand (shifting this sofa to some place else), merely usable structure in the environment.

To be clear, the claim is that perceiving is only ever of an environment that affords actions to the animal doing the perceiving. Consider an example from soccer. A striker bearing down on the opponents’ goal must pick out a particular place within the goalmouth into which to aim the ball such that the goalkeeper will not be able to reach it. We might say that this involves the striker perceiving the affordances for the goalkeeper. And in a sense it does. But this does not mean that the striker must know what it is like to be the goalkeeper. The striker need have had no experience of being a goalkeeper and need have no appreciation of what the situation looks like from the goalkeeper’s point of view. The striker perceives the situation in terms of what kinds of action will most likely result in a goal, that is, in terms of what kinds of shot the goalkeeper is unlikely to reach. The goalkeeper is a mere obstacle, to be negotiated, evaded, got the better of.

Note that on this way of thinking about affordances it does not make sense to talk of what the environment affords for the collective, or to talk of “fields of affordances for dyads or groups”. Groups do not experience their surroundings or perceive affordances, only actors do.

Affordances make “shared intentions” unnecessary

An American beaver in Alaska lifts its head above the dam.

Beavers don’t waste their time worrying about whether or not the other beavers in the colony are committed to a shared intention for dam-maintenance.

In the abstract for their paper, A&S write: “Coordination ensues, we argue, when, given a shared intention, the actions of and/or affordances for one agent shape the field of affordances for another agent” (my emphasis). They ask us to simply accept that coordination requires a pre-existing shared intention. The problem here is that, if you share an intention with someone (whatever that means), then you must already be coordinated with that person in some sense. Coordination does not “ensue”; it has already been presupposed. The question of how that coordination came about has been begged.

Given how opaque this notion of “shared intentions” is, there is good reason to want to avoid invoking it in the first place. Fortunately, the affordance concept is powerful enough to allow us to do this. The crucial thing is to note that the field of affordances surrounding an animal is subject to change as the internal structure of the animal changes.

Consider the beaver. Beaver dams are maintained by members of the beaver colony. This might be taken as a case of what A&S call “distributive action coordination”: each individual beaver takes part in building and repairing a structure that is beneficial to all. But there is no need to appeal to shared intentions here. (And since we are talking about a non-human animal here, we are naturally less inclined to do so.) The beaver’s motivation to repair the dam is explained in this delightful Attenborough clip. Beavers are apparently averse to the sound of running water, so when they hear the sound of water rushing out of a damaged section of the dam they are drawn towards that location and immediately start work on the repairs. Put another way, the sound of running water triggers an internal change in the beaver that reorients the animal’s activity.

Or consider the sofa-carrying case again. It is not the case that the actors involved must initiate their actions “simultaneously”. In fact, there is no natural start point to the action. To understand how the actors arrived at the position of being about to lift this particular sofa it is necessary to look backwards and ask: what happened before now that led to the present situation? Perhaps I am moving house and you agreed a couple of weeks ago to help move the furniture. Why not take that as the starting point of the action, then? When you agreed to help, you became oriented towards the furniture in a particular way: there was a change in your internal structure such that you went from being neutral with reference to this furniture to being action-oriented relative to it. The furniture became a set of objects to be moved. And as we now stand at either end of the sofa, the idea that we might need to establish a shared intention before we can get this bulky object off the ground simply does not enter into consideration. We both became object-moving devices some time earlier, and this is manifest in the fact that we are both here, and in the fact that we have probably already shifted a bunch of objects out of the room.

Or we could take the explanation further back and ask how it was that we learned to lift bulky objects in coordination with others in the first place.

The point is that, appropriately understood, the concept of affordances simply obviates the need to think about these situations in terms of shared intentions. It is perhaps useful here to make a distinction between “action” and “task”. Action has no start point. Actors do not simply engage in routine performances, or scripts, that have a clearly defined start and end point. Action is continuous: everything the actor does is part of an ongoing process in which the actor changes its environment and in doing so changes its own internal structure. I have argued (Baggs, 2014) that the task, by contrast, is an analytical device that allows us to artificially break up this stream of action into tractable chunks that can be studied; these chunks need to have a clearly defined start and end point and a characteristic mode of transition between the two. The outfielder problem is a good example: we can define the start (the ball being launched), the end (the catch), and the transition (the parabolic trajectory of the ball through the air), and this is very useful as a guide for what we should measure. But we should not make the mistake of thinking that the task as we have defined it is the same as the action from the perspective of the actor. In the context of action in multi-actor settings, such a narrow definition creates unnecessary problems. If we assume that the action itself starts at the point that we happen to start observing or measuring, i.e. at the point when we are already standing at either end of the sofa, then it looks like our behaviour must be structured by some hidden “shared intention” which is not itself perceivable. What is overlooked is the fact that our internal structure has been shaped by previous action. If the affordances for lifting the sofa are directly perceivable it’s because we perceive the situation we are in relative to the type of animal we have become.

“Direct social perception”: a conceptual blind alley?

A note of caution is in order about this phrase, “direct social perception”. The existing phrase, “direct perception”, at least as Gibson uses the term, is meant to denote a relationship between an actor and its environment. Here is a description from Turvey, Shaw, Reed and Mace (1981, p. 239):

Gibson argued that the proper “objects” of perceiving are the same as those of activity. Standing still, walking, and running are all relations between an animal and its supporting surface. Though not always explicitly recognized … the supporting surface is just as much an essential constituent of these activities as, for instance, legs; and useful perceiving involved in controlling posture and locomotion must be directed toward the same surface. Thus it would seem that a two-term relation involving the same surface or ground can exist in both cases: an animal *runs* on the ground and an animal *sees* the ground. This much should be common sense. There is no thing *between* the animal and the ground in the relation. This is what Gibson has always meant by direct perception and it is the same as what one would mean by direct action if one were discussing activity.

Even if one may be inclined to take issue with some of this, it is at least clear that direct perception means something: it refers to a particular way of thinking about what’s going on between the animal and its environment.

This is not the case with the phrase “direct social perception”. This is because “social” is a descriptive category, introduced by a third-party observer or an analyst. A situation is described as “social” if it involves multiple actors. But just because we can describe a situation as “social”, we are not necessarily licensed to assume that the “socialness” of the situation must play a causal role in our eventual explanation of the actors’ behaviour. (See the parallel argument here for the phrase “joint action”.) We might describe the beaver as a social animal, yet the beaver does not need to appreciate its own status as a social animal in order to fix its dam.

It may be prudent to avoid the phrase “direct social perception” altogether, on the grounds that the inherent mixing of perspectives—descriptive and explanatory—is bound to lead to confusion further along the line. (I will however point out Eric Charles’s claim that there’s a way of reading ecological psychology according to which other minds are perceivable in the sense that the actions of others are perceived in the context of larger, ongoing patterns of behaviour.)

Some devices for avoiding conceptual traps

In lieu of a summary, I conclude with some recommendations for avoiding some of the ever-present conceptual traps surrounding the analyst of multi-actor activity.

  • Shared intentions only seem necessary as a function of the language we use to describe situations. To avoid this, consider non-human animals: the beaver, or, better, the ant. Ants don’t do shared intentions. When proposing an account of multi-actor coordinated activity, ask: how would this apply to ants?
  • Describing something as “social” or “joint” is all well and good as a description, but do not make the mistake of thinking that just because you (the analyst) can describe something as social, that the socialness of the activity must necessarily play a causal explanatory role in an account of the behaviour of any given individual actor.
  • Actions do not begin at the point that you start measuring them. Tasks, on the other hand, do. Tasks are a methodological device that establishes some more or less arbitrary start and end point (Baggs, 2014). Do not confuse the action an actor is performing with the task as you have defined it.



Against “joint action”: all actions are actions in a populated environment

I gave a talk last week at the 6th Joint Action Meeting in Budapest. The program for the conference is here; my abstract appears on page 29. The conference is a biennial affair. It was initially set up ten years ago by Günther Knoblich and Natalie Sebanz as a forum for collegial disagreement about what joint action is. It still retains that atmosphere, I think. I tried to construct my talk in the spirit of open enquiry that Günther and Natalie seem to invite. I like this conference because I believe such open enquiry is necessary for working through the conceptual issues involved here. I used my talk to argue against “joint action” as a piece of terminology, on the grounds that the term leads to an overly restrictive interpretation of what’s going on in the phenomena studied by joint action researchers. The argument is based on James Gibson’s distinction between the environment and the physical world. I will use this post to make a record of the argumentative portion of the talk. (The talk also had an illustrative portion in which I talked about possession in soccer, as blogged here.)

What is a “joint action” anyway?

Ants on an ant mound (left); two people carrying a sofa together (right).

What do these two images have in common?

I start with a question: what is it that is supposed to unite all of the different phenomena we talk about when we talk of joint action? Consider the two images above: on the left, a group of ants in the process of building a mound; on the right, two people carrying a sofa together (this is one of the go-to examples that philosophers reach for when discussing joint action).

In fact, it is very difficult to identify anything that unites these phenomena. Philosophers often appeal to the concept of common knowledge (e.g., from the Stanford Encyclopedia of Philosophy: “In order to communicate or otherwise coordinate their behavior successfully, individuals typically require mutual or common understandings or background knowledge”). But few would be willing to attribute common knowledge to the ants. Psychologists like to appeal to hypothetical theory of mind capacities within individuals, but again few would be willing to entertain the idea that the individual ants are engaging in complex mental modelling of the inner lives of their fellows.

One escape route here would be to simply insist that only humans can do proper joint action. (This is the option chosen by Michael Tomasello, for instance. Tomasello asserts that chimpanzees hunting in groups are not really doing joint action: “The apes are engaged in a group activity in ‘I’ mode, not in ‘we’ mode.”) This does not really solve the problem, though. We now just have a new problem: how do we distinguish proper joint action from merely apparent or quasi-joint action? And if non-linguistic animals cannot do joint action, what about human infants? In short, we’ve merely replaced the question of defining joint action with a different question: what exactly is it that makes humans so special?

So it is difficult to identify just what it is that allows us to talk about a bunch of disparate phenomena while labelling them all as instances of “joint action”. Except there is one thing that unites the ants and the sofa-carriers. Consider the situation from the perspective of one of the individual actors. Relative to that individual’s perceptual system, the environment is populated. And it is here that it is necessary to invoke James Gibson’s distinction between the environment and the physical world.

The environment v. the physical world

At the very start of his final book, Gibson introduces his concept of the environment by contrasting it with the standard view of the physical world. The terms “environment” and “physical world”, he claims, should not be treated as synonyms. Understanding this distinction is, I think, key to understanding the radical difference between Gibson’s psychology and the standard representational theory of mind. (I hadn’t really realised how important this was until I was preparing this talk. But the distinction obviously is important to Gibson: he makes it on page 2 of chapter 1, a chapter whose title is “The animal and the environment”.)

The environment, according to Gibson, is that which surrounds an individual animal (1979, p. 8):

[I]t is often neglected that the words animal and environment make an inseparable pair. Each term implies the other. No animal could exist without an environment surrounding it. Equally, although not so obvious, an environment implies an animal (or at least an organism) to be surrounded. This means that the surface of the earth, millions of years ago before life developed on it, was not an environment, properly speaking. The earth was a physical reality, a part of the universe, and the subject matter of geology. […] We might agree to call it a world, but it was not an environment.

Gibson is here insisting that animals do not perceive the world of physics; we do not perceive the bare physical world. What we perceive is our surroundings: the structured collection of surfaces and objects that are meaningful by virtue of our capacity to act relative to them. The environment ultimately only exists at the approximate scale of our bodies (microscopes and mass spectrometers notwithstanding). When ecological psychologists talk about the “animal–environment system”, this is not some vague device for pointing out that our bodies are situated in space and that we perceive things in that space (a fact which no cognitivist would deny). It is, rather, an assertion that we never in fact perceive pure “space”. What we perceive is our surroundings; we perceive our own unique environment relative to our ability to act in it.

To bring this back to the topic at hand, it should be pointed out that one of the features of the environments of all animals in reality is that they are populated with other animals. The ants and the sofa-carriers all encounter an environment that is not merely structured by the presence of surfaces and objects. It is also structured by the presence of other actors. A populated environment does not merely react to the exploratory activity of the animal (as with Newtonian conservation of momentum). It is an environment that acts back, one in which the actions that the animal engages in can be perturbed, amplified, reversed, and observed by other animals.

“Joint action” mixes relational categories

When we talk about “joint action” we are in effect mixing two different relational categories.

  • “Jointness” describes an interpersonal relation; it denotes the fact that a group of actors can appear to be doing more than merely acting simultaneously: they can appear to be acting together
  • “Action”, on the other hand, is usually descriptive of a relation between an individual actor and some aspect of its environment; when we act, we bring about change in our environment

Perhaps part of the problem is that it is not immediately obvious that action is about animal–environment relations. Action is more often discussed in terms of the presence or absence of some mental entity or volition, as in Donald Davidson’s claim that an action must be “intentional under some description”. But on Gibson’s definition of the environment, an action must be about animal–environment relations. Even an action as simple as shaking one’s head involves the environment. In order to coordinate that particular action in the first place or to perceive whether the action has been carried out successfully one must create the perception of optic flow while keeping static the invariant structure that specifies the layout of the environment (when you shake your head your visual field moves but the environment doesn’t).

To talk of “joint action” is to suggest a dichotomy between “individual action” and “joint action”. The term invites an explanation of the behaviour of interest primarily in terms of interpersonal relations. Other kinds of actor–environment relations are left out of the analysis, or pushed to the background, taken for granted, described as mere “low-level” action.

The standard disagreement in the field is over exactly what kinds of interpersonal relations must be involved for joint action to be possible. One view says that in order for a group of actors to be able to act together successfully they must in some sense subordinate their own activity to that of the group; they must collectively constitute a “plural subject” of the action, in the phrase of the philosopher Margaret Gilbert. The second view prefers to maintain the boundaries around individual actors. On this view, a joint action requires each actor to engage in mindreading activity of some variety, whether it be through the construction of mental models of the other actors’ intentions or through unconscious processes of interpersonal synchronization. Only when each actor shares the appropriate mental content with the other actors can the joint action be undertaken.

Two standard approaches to joint action: the plural subject model (l) v. the mindreading/interpersonal synchronization model (r); the black arrows represent action.

What these views share is a lack of attention to just what the environment looks like from the perspective of an individual actor. The action itself is treated as mere output from a system. The puzzle is taken to consist in identifying how the jointness of this system arises. It is implicitly assumed that once we have an explanation of this jointness we will also have an explanation both of how the group behaves and of how the group’s individual members behave.

Acting in a populated environment: it’s not all about interpersonal relations

It is worth pausing over this phrase, “low-level action”. What could it mean? Why should actions directed at inanimate objects be considered “low-level” while actions involving other animals are considered high-level or social? Why are we tempted to make this distinction?

This distinction has arisen, I suspect, as an accidental by-product of the way that action has traditionally been studied. Psychologists have tried to study behaviour by isolating individual actors from their natural environments and placing them in carefully constructed artificial environments in which the sensory inputs and outputs can be tightly controlled and measured. It is tempting to believe that by getting a subject to respond to dots on a screen we are ignoring everything that is not action and perception at the most basic level.

But real environments are not like the environments created in the laboratory. An ant’s environment always contains other ants. That is to say, any action that the ant might perform is one that potentially perturbs the environment of another ant. It is this fact that allows the ants as a collective to construct great structures that could not possibly be conceived by any of the ants individually. Each ant merely acts in, and relative to, its own environment, and in doing so it alters the structure of the environment of its fellows.
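The ant case can be made concrete with a toy simulation. What follows is a minimal sketch of my own, not a model of real ant behaviour and not anything proposed by Gibson or by A&S. It assumes a ring of cells, a handful of ants that wander at random, and a single hypothetical rule: each ant drops a pellet on the tallest pile among the cells immediately around it. The names `step` and `simulate`, the parameters, and the deposition rule are all assumptions made for illustration. No ant holds a plan of the mound and no ant models any other ant; each acts only on the structure in its own surroundings, and in doing so changes the structure that its fellows will later encounter.

```python
import random


def step(heights, ants, rng):
    """Move every ant once; each ant consults only the cells around itself."""
    n = len(heights)
    for i, pos in enumerate(ants):
        # The ant wanders: one cell left, one cell right, or stays put.
        pos = (pos + rng.choice([-1, 0, 1])) % n
        # Its environment, for present purposes, is the three nearest cells.
        nearby = [(pos - 1) % n, pos, (pos + 1) % n]
        tallest = max(heights[c] for c in nearby)
        # Hypothetical rule: add a pellet to the tallest nearby pile
        # (ties broken at random). This alters what other ants will find.
        target = rng.choice([c for c in nearby if heights[c] == tallest])
        heights[target] += 1
        ants[i] = pos


def simulate(n_cells=20, n_ants=5, n_steps=200, seed=0):
    """Return the pile heights after n_steps rounds of purely local action."""
    rng = random.Random(seed)
    heights = [0] * n_cells
    ants = [rng.randrange(n_cells) for _ in range(n_ants)]
    for _ in range(n_steps):
        step(heights, ants, rng)
    return heights


if __name__ == "__main__":
    print(simulate())
```

Nothing in the sketch refers to a group, a joint goal, or another ant's point of view. The only thing the ants share is the ring of piles itself, which is the sense in which an individual's act can be amplified by another individual whose environment has been perturbed.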

“Low-level action”, then, is an analyst’s construct. In reality, all action is social because all action has the potential to perturb the environment of another animal. The phrase I propose, “acting in a populated environment”,[1] collapses the dichotomy between joint action and individual action. There is simply no need to establish a joint goal or to construct, ahead of time, a joint actor that is constituted by a set of interpersonal relations. It is sufficient for an individual to act on the available structure; this act can then be amplified or responded to by another individual whose environment has been perturbed.

The pay-off of this move, from talking about “joint action” to talking about “acting in a populated environment”, is that it allows us to avoid asking spurious questions about how groups come to perform tasks together. Existing accounts of joint action attempt to explain such behaviour primarily in terms of interpersonal relations, asking the question: “How do individual actors in some situation combine to make a joint actor or team?” On the alternative view that I have outlined, this question is no longer seen as a prerequisite to understanding the behaviour. Instead, we can ask a more tractable kind of question, focusing on some specific task and asking: “How do we identify precisely which structures in the actor’s environment are implicated in the control of the actor’s behaviour?”


The term “joint action” invites us to explain collective behaviour in unnecessarily narrow terms, focusing primarily on interpersonal relations while pushing other kinds of actor–environment relations to the background. As an alternative, I prefer to talk of “acting in a populated environment”. This phrase can be thought of as a device for turning complex questions about collective behaviour into tractable questions about individual behaviour. It also happily allows us to avoid having to grapple with such fraught concepts as shared intentions or shared goals. And yet to talk of acting in a populated environment is not to deny that there is a great deal of complexity that underlies such behaviour. Instead, it forces us to acknowledge the rich structure that exists in populated environments and that pervades an animal’s surroundings; this structure does not merely exist in between actors but surrounds each of us. This richness is in fact an asset; it is what is used by actors to guide and control specific actions. The hope is that by examining this structure in detail it will be possible to explain, in specific cases, how individual actors can successfully act as part of a coordinated collective.


[1] The phrase “populated environment” I take from Gibson’s “Note on perceiving in a populated environment”, published posthumously as part of a set of notes on the concept of affordances (Gibson, 1982).


  • Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin, Boston.
  • Gibson, J. J. (1982). Notes on affordances. In Reed, E. and Jones, R., editors, Reasons for Realism: Selected Essays of James J. Gibson, pages 401–418. Lawrence Erlbaum, Hillsdale, New Jersey.