Speaking is a type of acting, but being spoken to is not a type of perceiving

I have a paper in the newly published issue of the journal Ecological Psychology. The paper sets out the case that language-use should be understood as fundamentally a form of acting. I contrast this with a view of language-use as fundamentally a form of perceiving information via conventional patterns.

The argument may come across in the paper less clearly than I had hoped. The paper is part of a special issue on the topic of “Language as a public activity”, which is introduced by the editors, Bert Hodges and Carol Fowler (Hodges and Fowler, 2015). The editors valiantly attempt to provide one-paragraph summaries of all the papers in the issue. And they do an excellent job of summarizing mine (they’ve even highlighted a couple of things about it which I wasn’t aware of, for which I’m grateful). But there are a couple of statements in their summary which I would not endorse, which tells me that I wasn’t clear enough about those claims. So in this post I will try to do a couple of things: (a) summarize the argument I think I tried to make in this paper, and (b) set things out in a more intuitive way.

To understand speaking we need to understand how populated environments are structured

I tried to argue against the view that language-use necessarily involves conventional patterns. Conventions are to be understood here as mappings between invariant patterns (e.g., the word “dandelion”) and events (e.g., the perceiving of a dandelion). I don’t think we need to appeal to any such notion of conventions to explain how speaking works.

The key is to adopt a developmental perspective on language-use.

Children, when they are learning to speak, do so in an environment that contains other actors. And although none of these actors will be constantly present at every moment throughout the child’s growing up, there will nevertheless be a handful of actors who form part of the furniture of the child’s surroundings.

Now, the actors in a child’s environment themselves create structure through their own activities. For instance, sometimes an adult will point to some object and say something like, “Ooh, look at that giraffe!” One of the advantages of being a child (as opposed to being, say, a dictaphone) is that you don’t have to wait for other actors to produce these structures: if you want to explore the structure of your surroundings, you can perform an action that disturbs things and that thus creates potentially meaningful new patterns.

This is something that children do all the time. It happens even in situations where the child’s behaviour seems at first glance to be little more than a series of responses to an adult’s prompting, e.g. in ritualized games of the type where an adult repeatedly asks the child “Where’s your nose?”, “Where’s your mouth?”, etc. See this video of a child interacting with his grandmother.

When the child cannot immediately produce the appropriate response to the prompt “Where’s your mouth?” (a little after 1:30), he does not disengage but continues to attend to what the adult is saying. Notice that this behaviour from the child functions as a means of eliciting further behaviour from the adult. What looks like a failure to respond is in fact a form of engagement. At any moment, the child could choose to wriggle out of the adult’s grip and find some other activity to get involved in. But he doesn’t. In reality, the child is driving the game as much as the adult is. Even an apparent absence of response can in fact be an active attempt by the child to disturb the structure of his surroundings.

It is not always easy to notice that learning to speak is such an active process. (Notice how, a couple of times in the video, the grandmother interprets the child’s attentive non-responding as the child’s way of saying “this is boring”.)

When viewed in the wild, learning to speak does not look like a process of inferring mappings over a set of forms and a set of meanings. It looks like a series of explorations by the child of the environment it finds itself surrounded by—an environment that contains other actors who respond in meaningful ways to the child’s own activity. It is not the case that in learning to speak the child must learn some abstract set of patterns that we call a language (e.g. English). The child’s environment contains mostly just a small set of actors who are intermittently present. The child’s immediate task is to learn how to influence and manage the behaviour of these particular actors. There’s no need to learn an abstract set of conventions that can be used to communicate with an ideal conversation partner: what matters is what’s in the child’s surroundings.

Speaking is acting to alter the layout of a populated environment

Image: a young child points out of a train window. Caption: A young speaker manages her environment.

If my claim is that we should not view (early-childhood) speaking as involving conventional mappings, then I need to be able to provide an alternative account of what speaking, in fact, is. This is hard to do but I think it’s worth a try.

Again, let’s start with the infant at the early stages of learning to talk. The infant is born with the ability to make sounds. Different types of verbal action will result in different kinds of responses from the infant’s environment. From the day the infant is born, it can act on its environment by crying. Crying is a type of acting that not only disturbs the ambient sound patterns in the environment; it also perturbs the activity of other actors in the infant’s surroundings (see Thompson et al., 1996). But crying is not a very accurate means of eliciting a response: caregivers respond to crying but don’t immediately know exactly what to do to bring the crying to an end.

As the child’s repertoire of sound-making skills expands, the child will be able to elicit increasingly specific responses from its caregiver-containing environment. The crucial developmental shift occurs when the child learns to control its verbal actions with reference to a specific type of relation that exists in its environment: the relation that holds between other actors and other objects and events in the environment (this occurs at around nine months of age; Tomasello calls it the “nine-month revolution”). At this stage, the child starts to point to, and to name, objects and events.

How does this work? Well, it works because of the caregiver’s own history of learning. If the infant produces some sort of sound that approximates “doggie”, this orients the caregiver’s attention towards dog-shaped objects in the caregiver’s environment. How exactly does the sound come to have this effect? I confess, I have no real idea. But I don’t think it needs to involve conventions. The caregiver does not need to recognize the conventional link between this particular sound pattern and this particular object. The sound orients the caregiver’s attention because the caregiver’s nervous system is configured in such a way that the one immediately triggers the other. The caregiver’s nervous system got that way because of activities the caregiver engaged in previously. What matters is that this link within the caregiver’s nervous system (between sound pattern and object) is a persisting piece of structure in the infant’s environment. The infant cannot directly perceive the caregiver’s nervous system, but can perceive the behavioural patterns that it generates. The infant can repeatedly produce the sound “doggie” at irregular intervals over weeks and months, and the sound will reliably go along with the caregiver’s attending to the object. This is the kind of structure that the child needs to be able to attend to in order to learn to speak, that is, in order to learn to elicit meaningful responses from its environment.

One difficulty here is that, while it seems reasonable to view speaking as a kind of acting, it is not very clear how we should view the status of the addressee.

Being spoken to is something we do; it is also something that we have done to us

I liked this sentence from Sabrina’s paper, which is also in the special issue (Golonka, 2015): “A tacit assumption in ecological psychology seems to be that anything that has an effect on behavior must be grounded in the perception of an affordance and, therefore, must be guided by law-based information.” I think that’s true: ecological psychology is disposed to overlook anything that’s not information-based, first-person control of an activity.

But suppose I punch you in the leg. This will certainly have an effect on your behaviour. You’ll probably hobble around for a bit and complain. Your behaviour here is not grounded in the perception of an affordance. It is not a response to informational input. It is a response to an internal change in you as a system which I, rather rudely, brought about.

Being spoken to is a bit like being punched in the leg, sometimes. When you are spoken to, your attention is directed towards things that you yourself are not in control of selecting. It is the speaker who selects what you are to attend to, and the speaker achieves this by performing an action that perturbs your nervous system in an appropriate way.

To get things a little clearer here, being spoken to is two things:

  1. Being spoken to is acting: in order to engage with someone in the first place it is necessary to seek that person out and to not run away from them, and in order to keep the situation moving it is necessary to actively attend to the situation you are in and to make appropriate adjustments to your own behaviour (as the kid in the video does).
  2. Being spoken to is something you undergo: it is having your nervous system perturbed by the actions of another actor, the speaker.

How these two processes fit together is a central problem for any psychology concerned with the problem of acting in a populated environment, or with collective activity.

What should be evident is that the second item here, having your nervous system acted upon (like being punched in the leg), cannot really be described as a case either of acting or of perceiving. It is not a straightforward case of perception-action because what is going on here is not under the control of the person being spoken to.

As a field, we have not yet come up with a satisfying way to capture this non-voluntary aspect of public language as part of our theorizing. My hunch is that a big reason for this is that we have been trying to extend James Gibson’s account of visual perception to deal with something which it is not equipped to deal with. Gibson’s description scheme treats perception as the activity of a single actor exploring its environment, which, for the purposes of much of his work, he treats as if it were inanimate: the environment is described as a set of objects and surfaces whose structure is projected into energy arrays. The problems of explaining behaviour in a populated environment are, quite reasonably, largely set aside in Gibson’s books. But as a consequence, the description scheme he ends up with is one that cannot straightforwardly be extended to describe what’s going on in multi-actor settings. If we want to understand how speaking works we’re going to have to look beyond the perceiving-acting behaviour of the individual; we’re going to have to try to understand the larger context within which speaking behaviour is situated.


The editors of this special issue are searching for a way of describing language that is compatible with the anti-representational tenets of the ecological approach. This is clearly a difficult project. There are several traps that must be avoided.

One of these is the temptation to simply map the terms “speaking” and “listening” onto the terms “acting” and “perceiving”. Doing so obscures the nature of speaking as a form of action directed at an environment that is structured by the presence of other actors.

Another trap is the temptation to see linguistic meaning as fundamentally a relation between a word form and an event in the world. This is a disembodied view of meaning. In reality, meaning does not need to be disembodied in this way because meaning is distributed throughout the entire system that a child finds itself in. In the paper I wrote: “The act of speaking is not meaningful because of the words it contains but because of the effect it has on the meaningfully structured environment into which it is directed.”

Clearly there is much more work to do before we can claim that we have a useful description scheme for talking about language in a properly ecological or enactive way. Are we moving towards one? Hard to say. I’d like to see a lot more attention paid by anti-representational researchers to developmental processes and to the peculiarities of the structure of settings that contain multiple actors. Until we have an account of speaking that is anti-representational from the ground up we risk falling back into the familiar conception of speaking as the sending of messages in a speech code. I think we can do better than that.