Essay 03 · Talk module

You Have to Speak to Learn to Speak

You can study Spanish grammar for years and still freeze when a waiter in Barcelona asks what you want. Understanding a language and producing it are two different cognitive processes.

By Habla 16 April 2026 6 min read

You can study Spanish grammar for years and still freeze when a waiter in Barcelona asks what you want. This is not a confidence problem. It is a cognitive one. Understanding a language and producing it are two fundamentally different processes in your brain. You can have a large passive vocabulary, recognise complex sentence structures, and follow a conversation at full speed, and still be unable to form a basic sentence when it is your turn to speak.

The research explains why. And it points to a specific solution that most language apps completely ignore.

Swain's output hypothesis

In 1985, Merrill Swain published a paper that challenged the dominant theory of her time. The prevailing view, championed by Stephen Krashen, was that language acquisition happened primarily through comprehensible input. Hear enough of the language at the right level and you will acquire it. Swain looked at Canadian French immersion students who had spent years absorbing high-quality French input in classrooms. They understood French well, yet their grammar when speaking it remained poor.

Swain's conclusion was that input was necessary but not sufficient. Learners also needed to produce language. She called this the output hypothesis, and she identified three specific functions that output serves.

Output may stimulate learners to move from semantic, open-ended, nondeterministic, strategic processing prevalent in comprehension to the complete grammatical processing needed for accurate production.

Swain, M. (1985). "Communicative competence: Some roles of comprehensible input and comprehensible output." In Gass, S. & Madden, C. (Eds.), Input in Second Language Acquisition, 235-253.

First, the noticing function. When you try to say something in Spanish and realise you cannot, you notice the gap between what you want to say and what you are able to say. That noticing is not a failure. It is a critical learning event. It directs your attention to specific structures you need to acquire. Passive listening never produces this. You can listen to a sentence and understand it without ever noticing that you could not have produced it yourself.

Second, the hypothesis-testing function. Every time you construct a sentence, you are testing a hypothesis about how the language works. "I think the adjective goes after the noun, so I will say 'la casa roja.'" If the person you are talking to understands you, the hypothesis is confirmed. If they look confused or correct you, the hypothesis is revised. This is how grammar is internalised in practice, not through rule memorisation but through repeated production and feedback cycles.

Third, the metalinguistic function. Producing language forces you to reflect on language as a system. You start thinking about how verbs conjugate, why certain prepositions pair with certain verbs, where emphasis falls in a sentence. This kind of reflection does not happen when you are passively consuming content. It happens when you are building sentences from scratch.

The evidence for output

Izumi (2003) tested this directly. In a study published in Applied Linguistics, learners were divided into groups that either reconstructed target sentences (production) or were simply exposed to them (comprehension). The production group showed significantly greater noticing of target structures and superior performance on post-tests. The mere act of trying to produce language focused attention in ways that comprehension alone did not.

Izumi, S. (2003). "Comprehension and production processes in second language learning." Applied Linguistics, 24(2), 168-196. Found that output tasks promoted more noticing of target linguistic features than comprehension tasks, and that this noticing correlated with improved learning outcomes.

De Bot (1996) applied Levelt's speech production model to this question. Speaking involves three stages: conceptualising what you want to say, formulating the grammatical and phonological structure, and articulating the sounds. Listening comprehension only exercises the reverse path, from sounds back to meaning. Production and comprehension are literally different circuits. This is why people can live in a Spanish-speaking country for years and still struggle to order coffee.


The social anxiety problem

If speaking practice is so important, why do most learners avoid it? The answer is obvious to anyone who has tried. Speaking a language you are still learning, in front of a person who speaks it natively, is terrifying. You feel slow. You make mistakes. You worry about being judged. So you default to the comfortable option: another vocabulary list, another grammar video, another episode of a Spanish show with English subtitles.

This is a genuine problem, not a character flaw. The affective filter (another Krashen concept) is real. Anxiety reduces the brain's ability to process and produce language. Practising output in a high-stress environment can actually be counterproductive if the stress level is high enough to shut down the cognitive processes that make speaking practice valuable.

The research suggests the solution is low-pressure production practice: environments where you can speak, make mistakes, receive feedback, and try again without social consequences. Language exchange partners help. Talking to yourself helps. But the most scalable version of this is a system that listens to you, evaluates what you said, and responds without judgement.

How Habla's Talk module works

The Talk module uses speech-to-text to evaluate your spoken Spanish in real time. You are not selecting from options. You are not reading a script aloud. You are producing language from scratch in response to prompts and conversational cues from an AI partner.

This triggers exactly the functions Swain identified. You notice gaps when you try to say something and cannot find the word. You test hypotheses about grammar when you construct a sentence and the system confirms or corrects it. You reflect on language structure when you rephrase something that did not come out right the first time.
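Habla has not published the internals of the Talk module, but the core evaluation step can be illustrated. Below is a toy sketch in Python that assumes the learner's speech has already been transcribed to text; the function name, the word-level comparison, and the similarity score are illustrative assumptions, not Habla's actual implementation.

```python
from difflib import SequenceMatcher

def evaluate_attempt(attempt: str, target: str) -> dict:
    """Compare a learner's transcribed attempt against a target sentence
    and surface the gaps -- a toy version of Swain's noticing function."""
    attempt_words = attempt.lower().split()
    target_words = target.lower().split()
    matcher = SequenceMatcher(None, attempt_words, target_words)
    gaps = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # Collect target words the learner replaced or left out entirely.
        if op in ("replace", "delete", "insert"):
            gaps.extend(target_words[j1:j2])
    return {
        "similarity": round(matcher.ratio(), 2),
        "gaps": gaps,  # the structures the learner could not yet produce
    }

# A learner attempts "the red house" but drops the article.
result = evaluate_attempt("casa roja", "la casa roja")
print(result)  # gaps will include the missing article "la"
```

In a real system the transcription would come from a speech-to-text engine and the feedback would be richer than a word diff, but the shape is the same: production, comparison, and a specific pointer to the gap the learner just noticed.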

The AI conversation partner removes the social anxiety that makes real-world speaking practice so difficult for beginners. There is no native speaker raising an eyebrow. There is no waiter waiting impatiently. There is a system that responds to what you said, regardless of how long it took you to say it or how many mistakes you made getting there.

From scripted drills to real conversation

The Talk module is structured as a progression. Early levels use guided prompts: the system sets up a scenario and walks you through it with support. As your level increases, the scaffolding reduces. The prompts become more open-ended. By the intermediate levels, you are having something close to a genuine conversation, with all the improvisation that real conversation involves.

Swain updated her output hypothesis in 2005, twenty years after the original paper. Her conclusion had not changed. "Output pushes learners to process language more deeply than input alone." The evidence published in the intervening decades had only strengthened the case.

You cannot learn to speak a language without speaking it. That sounds obvious. But look at how most language apps are designed: reading, matching, selecting, tapping. Very little actual production. Very little speaking. The research from Swain, Izumi, and de Bot says that is backwards. Production is not the final test of learning. It is part of the learning process itself.

Habla's Talk module exists because the science left no alternative. If you want to speak Spanish, you have to speak Spanish. The question is whether you do it in front of a stranger who makes you nervous, or in a low-pressure environment specifically designed to make speaking practice actually work.