Speaking is a type of acting, but being spoken to is not a type of perceiving

I have a paper in the newly published issue of the journal Ecological Psychology. The paper sets out the case that language-use should be understood as fundamentally a form of acting. I contrast this with a view of language-use as fundamentally a form of perceiving information via conventional patterns.

The argument may come across in the paper less clearly than I had hoped. The paper is part of a special issue on the topic of “Language as a public activity”, which is introduced by the editors, Bert Hodges and Carol Fowler (Hodges and Fowler, 2015). The editors valiantly attempt to provide one-paragraph summaries of all the papers in the issue. And they do an excellent job of summarizing mine (they’ve even highlighted a couple of things about it which I wasn’t aware of, for which I’m grateful). But there are a couple of statements in their summary which I would not endorse, which tells me that I wasn’t clear enough about those claims. So in this post I will try to do a couple of things: (a) summarize the argument I think I tried to make in this paper, and (b) set things out in a more intuitive way.

To understand speaking we need to understand how populated environments are structured

I tried to argue against the view that language-use necessarily involves conventional patterns. Conventions are to be understood here as mappings between invariant patterns (e.g., the word “dandelion”) and events (e.g., the perceiving of a dandelion). I don’t think we need to appeal to any such notion of conventions to explain how speaking works.

The key is to adopt a developmental perspective on language-use.

Children, when they are learning to speak, do so in an environment that contains other actors. And although none of these actors will be constantly present at every moment throughout the child’s growing up, there will nevertheless be a handful of actors who form part of the furniture of the child’s surroundings.

Now, the actors in a child’s environment themselves create structure through their own activities. For instance, sometimes an adult will point to some object and say something like, “Ooh, look at that giraffe!” One of the advantages of being a child (as opposed to being, say, a dictaphone) is that you don’t have to wait for other actors to produce these structures: if you want to explore the structure of your surroundings, you can perform an action that disturbs things and that thus creates potentially meaningful new patterns.

This is something that children do all the time. It happens even in situations where the child’s behaviour seems at first glance to be little more than a series of responses to an adult’s prompting, e.g. in ritualized games of the type where an adult repeatedly asks the child “Where’s your nose?”, “Where’s your mouth?”, etc. See this video of a child interacting with his grandmother.

When the child cannot immediately produce the appropriate response to the prompt “Where’s your mouth?” (a little after 1:30), he does not disengage but continues to attend to what the adult is saying. Notice that this behaviour from the child functions as a means of eliciting further behaviour from the adult. What looks like a failure to respond is in fact a form of engagement. At any moment, the child could choose to wriggle out of the adult’s grip and find some other activity to get involved in. But he doesn’t. In reality, the child is driving the game as much as the adult is. Even an apparent absence of response can in fact be an active attempt by the child to disturb the structure of his surroundings.

It is not always easy to notice that learning to speak is such an active process. (Notice how, a couple of times in the video, the grandmother interprets the child’s attentive non-responding as the child’s way of saying “this is boring”.)

When viewed in the wild, learning to speak does not look like a process of inferring mappings over a set of forms and a set of meanings. It looks like a series of explorations by the child of the environment it finds itself surrounded by—an environment that contains other actors who respond in meaningful ways to the child’s own activity. It is not the case that in learning to speak the child must learn some abstract set of patterns that we call a language (e.g. English). The child’s environment contains mostly just a small set of actors who are intermittently present. The child’s immediate task is to learn how to influence and manage the behaviour of these particular actors. There’s no need to learn an abstract set of conventions that can be used to communicate with an ideal conversation partner: what matters is what’s in the child’s surroundings.

Speaking is acting to alter the layout of a populated environment

A young child points out of a train window

A young speaker manages her environment

If my claim is that we should not view (early-childhood) speaking as involving conventional mappings, then I need to be able to provide an alternative account of what speaking, in fact, is. This is hard to do but I think it’s worth a try.

Again, let’s start with the infant at the early stages of learning to talk. The infant is born with the ability to make sounds. Different types of verbal action will result in different kinds of responses from the infant’s environment. From the day the infant is born, it can act on its environment by crying. Crying is a type of acting that not only disturbs the ambient sound patterns in the environment; it also perturbs the activity of other actors in the infant’s surroundings (see Thompson et al, 1996). But crying is not a very accurate means of eliciting a response: caregivers respond to crying but don’t immediately know exactly what to do to bring the crying to an end.

As the child’s repertoire of sound-making skills expands, the child will be able to elicit increasingly specific responses from its caregiver-containing environment. The crucial developmental shift occurs when the child learns to control its verbal actions with reference to a specific type of relation that exists in its environment: relations that stand between other actors and other objects and events in the environment (this occurs at around nine months of age; Tomasello calls it the “nine-month revolution”). At this stage, the child starts to point to, and to name, objects and events.

How does this work? Well, it works because of the caregiver’s own history of learning. If the infant produces some sort of sound that approximates “doggie”, this orients the caregiver’s attention towards dog-shaped objects in the caregiver’s environment. How exactly does the sound achieve this? I confess, I have no real idea. But I don’t think it needs to involve conventions. The caregiver does not need to recognize the conventional link between this particular sound pattern and this particular object. The sound orients the caregiver’s attention because the caregiver’s nervous system is configured in such a way that the one immediately triggers the other. The caregiver’s nervous system got that way because of activities the caregiver engaged in in the past. What matters is that this link within the caregiver’s nervous system (between sound pattern and object) is a persisting piece of structure in the infant’s environment. The infant cannot directly perceive the caregiver’s nervous system, but can perceive the behavioural patterns that it generates. The infant can repeatedly produce the sound “doggie” at irregular intervals over weeks and months, and the sound will reliably go along with the caregiver’s attending to the object. This is the kind of structure that the child needs to be able to attend to in order to learn to speak, that is, in order to learn to elicit meaningful responses from its environment.

One difficulty here is that, while it seems reasonable to view speaking as a kind of acting, it is not very clear how we should view the status of the addressee.

Being spoken to is something we do; it is also something that we have done to us

I liked this sentence from Sabrina’s paper, which is also in the special issue (Golonka, 2015): “A tacit assumption in ecological psychology seems to be that anything that has an effect on behavior must be grounded in the perception of an affordance and, therefore, must be guided by law-based information.” I think that’s true: ecological psychology is disposed to overlook anything that’s not information-based, first-person control of an activity.

But suppose I punch you in the leg. This will certainly have an effect on your behaviour. You’ll probably hobble around for a bit and complain. Your behaviour here is not grounded in the perception of an affordance. It is not a response to informational input. It is a response to an internal change in you as a system which I, rather rudely, brought about.

Being spoken to is a bit like being punched in the leg, sometimes. When you are spoken to, your attention is directed towards things that you yourself are not in control of selecting. It is the speaker who selects what you are to attend to, and the speaker achieves this by performing an action that perturbs your nervous system in an appropriate way.

To get things a little clearer here, being spoken to is two things:

  1. Being spoken to is acting: in order to engage with someone in the first place it is necessary to seek that person out and to not run away from them, and in order to keep the situation moving it is necessary to actively attend to the situation you are in and to make appropriate adjustments to your own behaviour (as the kid in the video does)
  2. Being spoken to is something you undergo: it is having your nervous system perturbed by the actions of another actor—the speaker

How these two processes fit together is a central problem for any psychology concerned with the problem of acting in a populated environment, or with collective activity.

What should be evident is that the second item here, having your nervous system acted upon (like being punched in the leg), cannot really be described as a case of either acting or perceiving. It is not a straightforward case of perception-action because what is going on here is not under the control of the person being spoken to.

As a field, we have not yet come up with a satisfying way to capture this non-voluntary aspect of public language as part of our theorizing. My hunch is that a big reason for this is that we have been trying to extend James Gibson’s account of visual perception to deal with something which it is not equipped to deal with. Gibson’s description scheme treats perception as the activity of a single actor exploring its environment, which, for the purposes of much of his work, he treats as if it were inanimate: the environment is described as a set of objects and surfaces whose structure is projected into energy arrays. The problems of explaining behaviour in a populated environment are, quite reasonably, largely set aside in Gibson’s books. But as a consequence, the description scheme he ends up with is one that cannot straightforwardly be extended to describe what’s going on in multi-actor settings. If we want to understand how speaking works we’re going to have to look beyond the perceiving-acting behaviour of the individual; we’re going to have to try to understand the larger context within which speaking behaviour is situated.


The editors of this special issue are searching for a way of describing language that is compatible with the anti-representational tenets of the ecological approach. This is clearly a difficult project. There are several traps that must be avoided.

One of these is the temptation to simply map the terms “speaking” and “listening” onto the terms “acting” and “perceiving”. Doing so obscures the nature of speaking as a form of action directed at an environment that is structured by the presence of other actors.

Another trap is the temptation to see linguistic meaning as fundamentally a relation between a word form and an event in the world. This is a disembodied view of meaning. In reality, meaning does not need to be disembodied in this way because meaning is distributed throughout the entire system that a child finds itself in. In the paper I wrote that: “The act of speaking is not meaningful because of the words it contains but because of the effect it has on the meaningfully structured environment into which it is directed.”

Clearly there is much more work to do before we can claim that we have a useful description scheme for talking about language in a properly ecological or enactive way. Are we moving towards one? Hard to say. I’d like to see a lot more attention paid by anti-representational researchers to developmental processes and to the peculiarities of the structure of settings that contain multiple actors. Until we have an account of speaking that is anti-representational from the ground up we risk falling back into the familiar conception of speaking as the sending of messages in a speech code. I think we can do better than that.



8 thoughts on “Speaking is a type of acting, but being spoken to is not a type of perceiving”

  1. modvs1

    Hi Ed,
    You briefly mentioned addressing the issue of “referring to fictional entities; story telling; metaphor; etc.” in your “Radical Empiricist Theory of,…” paper. Do you have a paper that deals with language use that has no obvious bearing on ‘acting on relations in the immediate environment’? Also, how does your “non-representational” view stack up against the (really, really, really radical) anti-representationalist views of language espoused by the likes of Matthew Harvey and Stephen J. Cowley (which I confess I can make no sense of)?


    1. Ed Baggs Post author

      Hi modvs1,

      Thanks for reading!

      You briefly mentioned addressing the issue of “referring to fictional entities; story telling; metaphor; etc.” in your “Radical Empiricist Theory of,…” paper. Do you have a paper that deals with language use that has no obvious bearing on ‘acting on relations in the immediate environment’?

      I have not published anything else as yet. In my thesis I wrote quite a bit about referential communication games. I will post a link to that once it goes online.

      The ‘theory’ is that speaking, in communication, is always acting with reference to relations in the environment. The object being referred to does not need to be present. If I ask whether you saw the game last night, and you did see it, then there is a relation there that precedes the existence of the question. If you reply ‘What game?’, then I learn that the nature of that relation is not what I thought it was.

      Storytelling maybe creates new relations that weren’t already there. But the story still has to be understandable, so those relations still have to be anchored in some way to existing experience.

      Also, how does your “non-representational” view stack up against the (really, really, really radical) anti-representationalist views of language espoused by the likes of Matthew Harvey and Stephen J. Cowley (which I confess I can make no sense of)?

      I’m not sure. They seem to spend a lot of time arguing against things like common ground and the sender-receiver model. Rejecting cognitivism. I can get behind that. But like you I have had little success working out what they’re proposing to replace it with.


  2. modvs1

    Hutto and Myin (of “Radical Enactivist” fame) concede that denying “content”, which I take to mean “mental imagery/representation”, at the (socio-cultural) level of languaging “…is a bridge too far”. Their view is not that there is no such thing as ‘cognition involving representation’ simpliciter, but that at least there are some (many) varieties of ecologically significant activity that don’t involve the consultation of ‘inner models’ (viz. Brooks’ “the world is its own best model” or Haugeland’s “reliable presence”). Inman Harvey, in a paper (“Misrepresentations”), argues that you only get representation at the behavioural/personal level of communication, i.e., no such thing as representation/symbol manipulation/LOT at the sub-personal level. As someone who pushes an anti-representationalist agenda from the Brooks/Beers camp, he’s OK with Rick Grush’s sensory-motor emulation model. Andy Clark and Josefa Toribio co-authored a tidy little paper (“Doing without Representing”), where they make the eminently sensible claim that the extent to which an agent might depend on ‘(decoupled) cognition involving representation’ is a continuum. At one end the not-so-radical enactivists can have their cake; as you slide along you get an increasing dependence on “bouts of offline reasoning/action oriented reps”; and at the far end you have a more full-blooded reliance on externally formatted varieties: maps, lists, signage, diagrams, instructions, text, tables, graphs, directories, technical drafting, etc. What problem specifically do you have with the existence, or the role, of (anticipation/prediction, goal registration, planning, rehearsal, logical/counterfactual reasoning, imagination, mind wandering, thought, languaging, remembering, etc.), call it what you like: “mental imagery”, “top down sensory motor emulation”, or neural reuse/redeployment accounts thereof (Barsalou, Kosslyn, Anderson, et al.)? There are a few papers knocking about (e.g. Dentico, Cheung, Tononi, et al., “Reversal of cortical information flow during visual imagery,..”, 2014) intending to show that the flow of cortical signal switches direction (occipital/parietal) when presented with visual stimuli vs. mental visualisation tasks. Empirical evidence for neural reuse perhaps? Pace Harvey/Cowley, is there not something fundamentally disingenuous and ironic about using language to “have the thoughts of others coordinated/directed”; to “represent what one takes some phenomenon to consist in”, and then proceed to say that what one said is not a representation of anything, because “Hey, representation is not a thing!”? What exactly (if not mental imagery) is one attending to when “the object being referred to is not present”?


    1. Ed Baggs Post author

      You’re asking a couple of questions here.

      Why even try to study language without at least allowing the possibility that representations are involved? A couple of reasons: 1) It’s not clear what representations are supposed to be, exactly; 2) We won’t know whether it’s possible to properly do without them unless we actually try to properly do without them.

      What are we attending to when we talk about a fictional object, if not a mental image? Well, I’m not sure attending is the right word. If I tell you to think about a unicorn you might have some vague experience of a unicorn. But you are not looking at your own mental image. What is this mental image? Can we not say that you are creating the image? That you are enacting the unicorn? Is a mental image really an image? Is it not behaviour?


  3. modvs1

    “But you are not ‘looking’ at your own mental image”. I thought the neural reuse account covers this? The same infrastructure involved in ‘bottom-up’ perceptual processes is “reused/redeployed” top-down wise (reversal of signal flow in cortex). Accordingly, “Imagining is seeing” (smelling, hearing, palping…), albeit in reverse.

    “Instead a sentence should be thought (“thought”= “represented as” (?)) as being directed at the relation that connects a listener to some aspect of the structure of the world”. So in the absence of something trotting about the environment answering to the name “Ribbons”, what “structure in the world” are we using speech to direct the attention of our listener to? As Haugeland/Brooks point out, “there’s no need to re-present, what is (reliably) present”, but where for art thou unicorn? Are you saying we can’t have a meaningful conversation about spatially and temporally absent objects, events (and their affordances)?

    Another issue: Is the referent or extension of your theory (or anyone’s for that matter) itself some reliably present structure in the environment; one that can be “acted upon”? Is it a “knowledge relation”? If so, you’d have to buy into such things as meta-affordances (?) If we can use language (speech) to express something that is obviously not a structure in the environment- e.g.: theories- then wouldn’t anti-representationalism about language be a non-starter? If you’re wanting to “properly do without representations” you’ll need to give up the activity of theorising, since by your own lights I’m not quite sure what “piece of structure in the environment” that you’re trying to direct my attention to.


    1. Ed Baggs Post author

      So in the absence of something trotting about the environment answering to the name “Ribbons”, what “structure in the world” are we using speech to direct the attention of our listener to? As Haugeland/Brooks point out, “there’s no need to re-present, what is (reliably) present”, but where for art thou unicorn?

      Ribbons – good name for a unicorn 🙂

      I agree that it is odd to describe speaking about an absent unicorn as directing someone’s attention to it. There’s no doubt a better, less misleading way to describe this. The paper was really about the earlier stages of speaking in childhood. Certainly the first words we learn have to be learned in the presence of the objects being referred to. Later on things become rather less constrained by what’s in the immediate environment – through what Vygotsky calls internalization, or what Holt calls the recession of the stimulus. And literacy probably changes things in a qualitative way again. The next step is to try to marry Vygotsky’s psychology with Gibson’s or with the enactivists’. This is a project that simply hasn’t been properly attempted yet, as far as I’m aware.

      And the same goes for speaking about theories. By the way, theories are out in the environment. Where did you learn about Newton’s laws? They didn’t grow inside you. You encountered them in your environment. Go and talk to the physics teacher at your local high school. You’ll find it’s perfectly possible to have a conversation about Newton’s laws without insisting that the teacher climb into your fMRI machine so you can inspect her brain. The teacher is in your environment. But I guess you’re suggesting that the theory isn’t in the environment like a satsuma is in the environment – you can’t point to it. Well, true. And neither can you point to the satsuma’s silence or its zestiness. That kind of speaking can only be carried off once you’ve done a lot of Vygotsky style learning. Call it the ascent to the concrete. Dialectical materialists love that stuff.


  4. modvs1

    I’ve never quite been able to figure it out, but is the ‘anti-representationalism’ (rife) in enactivism really just a ‘direct realist’ stance regarding experience/perception (in contrast to indirect or ‘representative’ realisms)? Or is it a rejection of GOFAI ‘sub-personal symbol manipulation/LOT’? Or is it just straight-up ‘black box’ (neo) behaviourism? Convictions about ‘internal’ representations aside, what, by your lights, would a sensible account of the externally formatted kinds consist in (books, signs, maps, directories, graphs, instructions, recipes, sketches…)?


    1. Ed Baggs Post author

      Late with this reply…

      The second, I’d say – it’s a rejection of symbol juggling.

      As for the other question: I don’t know that anyone particularly objects to the notion of external representations. Is there some argument about that? It seems perfectly reasonable to say that a written word represents the sound of that word. The question is: does learning necessarily result in the formation inside the brain of something analogous to that external representation? Or should we instead conceive of learning as the mastering or growth of behaviour, such that the learner becomes less reliant on such external supports?


