In the second half of this article, I present a detailed curriculum for learning any language that applies all of the concepts outlined here. If that is all you are interested in, scroll down to “A Curriculum for Adult Associative Language Learning”.
I believe we should model our adult language learning strategy after how babies naturally acquire language, despite evidence for a critical period for learning and other differences in the hardware and firmware of the infant/adult brain.
This is because I believe that even in adults, associative learning + lots of training data can sometimes beat out “smarter” rule-based learning for language acquisition.
Think of the first time you saw a trending neologism like “rizz” or “lmao”. How long did it take you to reach native-level proficiency in understanding and using it? Likely, seeing it just one or two times in various contexts was already enough to give you a fluent sense of it. You might not even have had to look up its definition, because the contexts you saw it in (memes, tweets, texts) were rich enough for you to infer all the nuances of meaning.
This is possible because our brains are incredibly efficient at leveraging rich contextual information to acquire new concepts via associative learning. Imagine if a few examples were all it took for you to reach fluency with any word or concept in a foreign language.
This is the core motivation behind comprehensible input (CI), a term coined and popularized by Stephen Krashen, as a method for language learning (it is also the common principle behind Dreaming Spanish, AJATT, Refold, etc.).
Comprehensible input is language input that listeners can understand even though they do not know all the words and structures in it. Input that can only just be understood, one level above the learner’s current level, is what Krashen calls “i+1”.
By building our entire language learning strategy around it, we can maximally leverage the overwhelming efficiency of associative learning + rich contextual samples.
This is not to say that rule-based learning is to be entirely rejected. I think a hybrid approach that uses rule-based learning in order to bootstrap comprehensible input is necessary. This is because, while babies are able to naturally acquire languages entirely through comprehensible input and associative learning, the amount of training data they require to do this is vast, almost impossible for an adult to replicate.
Let’s take a quick detour to substantiate this point with some numbers.
Infant Language Acquisition By the Numbers
We can do some exceedingly blurry back-of-the-napkin math to guess how many hours of comprehensible input babies receive by 5 years of age, assuming that they start to process language sounds at around 6 months of age:
Infants (6–12 months): About 6 hours/day of language exposure (50% of 12 waking hours).
Toddlers (1–3 years): About 7.5 hours/day of language exposure (75% of 10 waking hours).
Preschoolers (3–5 years): About 8 hours/day of language exposure (roughly 67% of 12 waking hours).
In total, this sums to very roughly 12,000 hours of language exposure, as a baby’s full-time job, just to proficiently acquire one language.
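The arithmetic is easy to check. This is a minimal sketch that assumes 365 days per year and treats each stage as lasting exactly the span listed (6 months, then two 2-year spans); the daily hours come straight from the list above.

```python
# Back-of-the-napkin estimate of language exposure from 6 months to 5 years.
DAYS_PER_YEAR = 365  # assumption: ignore leap years

# (stage, length of stage in years, hours of language exposure per day)
stages = [
    ("Infants (6-12 months)", 0.5, 6.0),
    ("Toddlers (1-3 years)", 2.0, 7.5),
    ("Preschoolers (3-5 years)", 2.0, 8.0),
]

total_hours = 0.0
for name, years, hours_per_day in stages:
    stage_hours = years * DAYS_PER_YEAR * hours_per_day
    total_hours += stage_hours
    print(f"{name}: {stage_hours:,.0f} h")

print(f"Total: {total_hours:,.0f} h")  # → Total: 12,410 h
```

The three stages contribute 1,095, 5,475 and 5,840 hours respectively, which is where the “very roughly 12,000” figure comes from.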
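Let’s compare this to traditional measurements of adult language acquisition, using as a baseline the official estimates from language learning institutions (FSI, ACTFL, ILR, CEFR) of the total hours an English native needs to reach fluency. The FSI, for example, estimates roughly 600–750 class hours for the languages closest to English and about 2,200 for the hardest ones (such as Japanese or Arabic) to reach professional working proficiency.
Even the high end is a fraction of the ~12,000 hours above, so natural language acquisition is not as efficient a process as it seems.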
However, while babies need a lot of hours, we must note that they both achieve native-level mastery (in pronunciation, cultural nuances, and subtle grammar) AND do so without ANY structured rule-based learning. They do so entirely through context-rich, interactive input and associative learning.
While adults can neither do this nor afford to spend this much time, adults can leverage explicit learning strategies that babies cannot. I think by combining both strategies, we can do much better than the official adult learning estimates given above.
A Note to CI Purists
I want to quickly respond to those who believe that comprehensible input is the ONLY thing you need, and that any structured, rule-based learning is counterproductive (Dreaming Spanish is one platform built on this idea).
I think this is extreme. The real danger in rule-based learning is relying only on isolated vocabulary and grammar study. It’s true that relying too much on native-language translations of words and grammar can prevent the deeper acquisition process from taking place. This matches the experience of many learners who find themselves still mentally translating even at intermediate levels.
However, I’d argue that using translations strategically (especially for high-frequency words and grammar concepts, and for meanings that are hard to guess) while engaging with large amounts of CI strikes a balance that beats either method alone[1]. While translation does create initial connections to the native language, it doesn’t necessarily form a hard-to-undo pathway. Instead, learners can transition from translating to direct understanding, a claim supported by research on how we move from declarative (conscious) knowledge to procedural (automatic) knowledge in language learning[2].
Studies show that this translation-based understanding fades naturally as the learner encounters more and more words in context[3]. Laufer & Girsai (2008) found that translation combined with explicit teaching outperformed other methods for initial vocabulary retention, suggesting that translation can serve as a valuable anchor in the early stages, particularly for vocabulary items that lack immediate context[4]. The same is true for explicit grammar study. Done early, it provides anchoring points that help learners recognize and comprehend basic grammatical concepts during CI. Repeated contextual exposure then internalizes the grammar, so that it becomes automatic and “felt” rather than consciously processed[5].
Thus, I want to introduce the idea of using rule-based learning as a starting point to bootstrap (massively jumpstart) the process of associative language learning.
The optimal approach is presented below as a curriculum that addresses each fundamental area of language learning: vocab, grammar, input and output.
A Curriculum for Adult Associative Language Learning
Let’s try to prescribe a detailed curriculum that you can apply to learning any new language. We’ll anchor the method mainly in the ideas about comprehensible input discussed above.
We’ll also utilize select insights from Cognitive Load Theory / Information Processing Theory.
Cognitive load theory argues that our working memory has limited capacity, and that too much unfamiliar information can quickly overwhelm it, hindering learning efficiency[6]. By initially relying on native translations to quickly learn the gist of new words, learners reduce cognitive load, freeing up mental resources to focus on recognizing and processing the structure and usage of each word in the new language. This provides a quick "hook" for understanding that makes it possible to engage with more complex content sooner, enabling faster vocabulary expansion and comprehension[7].
Learning resources like graded readers, or platforms like Dreaming Spanish, are carefully leveled to provide a fine-grained progression in CI difficulty (that is, in cognitive load). In this sense, you could call them a sort of CI ladder: you stay on each rung until you are ready for the next. The way I see it, native translation scaffolding adds intermediate rungs to this ladder, so learners can climb in smaller, quicker steps and build more momentum (as we know, language acquisition builds on itself).
Cognitive Load Theory also argues that forming schemas, “mental frameworks stored in long-term memory that organizes and categorizes information”, makes learning new concepts more efficient.
Applied here, I believe that an initial, shallow study of translations creates a basic semantic schema (a mental “placeholder”) for new words or structures, which reduces the number of CI repetitions the brain’s pattern-recognition machinery needs to deeply acquire the concept.
The second benefit of relying on associative learning to reach proficiency, once translation has bootstrapped a vague sense of a concept’s denotation and connotation, is that associative learning acts as an “averaging function”: it generalizes across many samples rather than overfitting to specific ones (e.g., dictionary definitions or memorized rules).
In practice, this means studying vocab and grammar just enough to get a rough, vague understanding: just the gist. I’ll refer to this as having “loosely acquired” the concept. You usually end up in this state after memorizing something and then forgetting most of it. This is okay! From here, associative learning using rich contexts will take us the rest of the way. So, when doing CI, if you don’t understand a word, don’t immediately reach for a dictionary. Let it go. Trust that it will come up again, and let the pattern recognition build over time.
Okay, all that said, here’s the curriculum.
A Curriculum for Vocab:
Use Anki cards scheduled with FSRS to loosely acquire the 1–2k most commonly used words in your target language. FSRS (Free Spaced Repetition Scheduler) is a modern variant of the famed spaced repetition technique, with enough anecdotal support that I prefer it.
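For the curious, here is a minimal sketch of the forgetting curve FSRS uses to decide when a card is due, based on my understanding of the published FSRS-4.5 formulas. It assumes fixed constants (newer FSRS versions change the curve's shape) and leaves out how FSRS updates stability and difficulty after each review; Anki does all of that for you.

```python
# Sketch of the FSRS-4.5 forgetting curve and the interval it implies.
# Stability S (days) is defined as the delay at which recall probability falls to 90%.
DECAY = -0.5
FACTOR = 19 / 81  # equals 0.9 ** (1 / DECAY) - 1, so that R(S, S) == 0.9


def retrievability(elapsed_days: float, stability: float) -> float:
    """Predicted probability of recalling a card `elapsed_days` after its last review."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def next_interval(stability: float, desired_retention: float = 0.9) -> float:
    """Days until predicted recall drops to `desired_retention` (the curve solved for t)."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


for s in (1, 10, 100):
    print(
        f"S={s:>3} days: next review in {next_interval(s):6.1f} days at 90% retention, "
        f"{next_interval(s, 0.8):6.1f} days at 80% retention"
    )
```

At 90% desired retention the next interval simply equals S; dropping the target to 80% stretches it to about 2.4 × S, which is the trade-off Anki's desired-retention setting exposes.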
Here is the card structure:
Front: the word in your native language
Back: 2–4 example sentences in the target language, with an image and an audio file for each sentence.
Showing the native language first forces active recall rather than mere recognition. It also avoids something you may have experienced: after enough repetitions with a set of flashcards, you find yourself overfitting to the cards themselves. You can recite the back of a card not because you’ve deeply acquired the concept, but because you’ve memorized the shape of the words or the sequence of the set; in essence, you’re being clued in by extraneous information.
We want to maximize the richness of the context so that the brain can pick up on many sources of information to associate the unknown with, which is optimal for committing something to long-term memory. Multimodality makes encoding more efficient because each extra modality (image, sound, text) gives the new item more associations to attach to. Example sentences also carry far more information than lone words: grammar, connotative and denotative nuances, and so on.
If possible, find a deck of the 2k most common words in the language that follows this exact format (otherwise, create one), and start studying it in Anki with the FSRS algorithm.
There is a difference between learning vocab and acquiring it. Learning is just memorization: a single connection between a word and a translation, flimsy and incomplete. Acquiring covers the pronunciation, the range of denotative and connotative nuances, grammaticality, usage, etc. It happens through pattern recognition over many exposures to the word in rich contexts, and this is what CI is for.
Thus, the work we do here with FSRS cards is not to acquire the vocab; it is simply to prime our brain for acquiring each word during CI. Ideally, then, you would align your vocab study with the distribution of words you actually encounter in your CI. Statistically, though, picking the top 1–2k words in the language should be “good enough”. The more important thing is not to overprioritize this. CI repetitions matter more!
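If you end up building the deck yourself, one low-tech route is to generate a tab-separated text file and bring it in with Anki's File > Import, mapping the two columns to the Front and Back fields of a Basic note. The sketch below assumes Anki's usual conventions for media in fields (an HTML img tag for images and a [sound:filename] tag for audio, with the files copied into Anki's collection.media folder) and that HTML is allowed when importing; the Example class and the function names are hypothetical.

```python
import csv
import html
import io
from dataclasses import dataclass


@dataclass
class Example:
    sentence: str    # example sentence in the target language
    image_file: str  # file name only; the file must be in Anki's media folder
    audio_file: str  # file name only; the file must be in Anki's media folder


def back_field(examples: list[Example]) -> str:
    """Render 2-4 example sentences as HTML, each followed by its image and audio."""
    if not 2 <= len(examples) <= 4:
        raise ValueError("expected 2-4 example sentences per card")
    parts = []
    for ex in examples:
        # Single-quoted attributes keep '"' out of the field, so csv won't wrap it in quotes.
        parts.append(
            f"<div>{html.escape(ex.sentence)}<br>"
            f"<img src='{html.escape(ex.image_file)}'> [sound:{ex.audio_file}]</div>"
        )
    return "".join(parts)


def write_deck(cards: dict[str, list[Example]], out) -> None:
    """Write one tab-separated row per card: front = native word, back = examples."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for native_word, examples in cards.items():
        writer.writerow([html.escape(native_word), back_field(examples)])


cards = {
    "house": [
        Example("Mi casa es pequeña.", "casa_1.jpg", "casa_1.mp3"),
        Example("Vamos a casa.", "casa_2.jpg", "casa_2.mp3"),
    ],
}
buf = io.StringIO()
write_deck(cards, buf)
print(buf.getvalue())
```

To produce an actual file, pass open("deck.txt", "w", encoding="utf-8", newline="") instead of the in-memory buffer.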
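One way to align vocab study with your CI is to count word frequencies in the transcripts or subtitles of the content you actually consume, then study the most frequent words you don't know yet. This is a minimal sketch assuming plain-text transcripts and a naive regex tokenizer; the function name is hypothetical, and it does not group inflected forms (e.g. “voy” and “vamos”), which a real lemmatizer would.

```python
import re
from collections import Counter


def top_unknown_words(transcripts: list[str], known: set[str], n: int = 10) -> list[tuple[str, int]]:
    """Most frequent words across your CI transcripts that aren't already in `known`."""
    counts = Counter()
    for text in transcripts:
        # \w+ matches Unicode letters in Python 3, so "está" stays one token.
        counts.update(word.lower() for word in re.findall(r"\w+", text))
    return [(w, c) for w, c in counts.most_common() if w not in known][:n]


transcripts = [
    "Hoy vamos a la playa. La playa está cerca.",
    "En la playa hay mucha gente. Hoy hace sol.",
]
known = {"hoy", "a", "la", "en"}
print(top_unknown_words(transcripts, known, n=3))  # → [('playa', 3), ('vamos', 1), ('está', 1)]
```

Run over enough hours of your own transcripts, the top of this list tells you which words are worth a flashcard first.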
A Curriculum for Grammar:
The same reasoning that motivates learning vocab to bootstrap our way to acquiring it also applies to structured grammar study.
The key is to keep it lightweight. Rather than learning every grammar point, it is most efficient to focus on the high-frequency grammar structures that enable basic comprehension. Pick the top N most common grammatical concepts and study them explicitly to further enable CI.
Additionally, we will pick up a “feel” for grammar through CI, because our brains are pattern-learning machines. From Dreaming Spanish:
“A learner will repeatedly be exposed to certain words appearing attached to certain other words. After that being reinforced over many times, the different grammatical categories are subconsciously separated in our brain. Nouns are separated from verbs and adjectives. Countable and uncountable nouns are figured out, as well as transitive and intransitive verbs, and many other categories and distinctions. It even figures out the grammar that grammarians themselves haven’t been able to. The brain figures out these patterns as a way to reduce the amount of effort needed to acquire new words and store them.”
Once the pattern has been figured out for countable plural nouns (like “chairs”, “houses” and “people”) our brain may only need one instance of the usage of a new word to figure out how to use it grammatically. For example, in the sentence “I found so many lice on my son’s hair!” the word many tells us that the word lice has to be a noun, has to be countable and has to be plural. By connecting the new word lice to the existing brain structure that represents a word’s “plural-ness”, the brain avoids having to figure out from zero how to use that particular word, saving time and storage space in our brain.
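As a cartoon of the mechanism described above, the toy sketch below guesses the grammatical category of a word seen only once by comparing the words that precede it with the words preceding already-categorized words. This is a deliberately simplified distributional-similarity heuristic, not a model of what the brain actually does; the sentences, category labels and function names are made up for illustration.

```python
from collections import defaultdict


def left_contexts(sentences: list[str]) -> dict[str, set[str]]:
    """Map each word to the set of words that appeared immediately before it."""
    ctx: dict[str, set[str]] = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            ctx[word].add(prev)
    return ctx


def guess_category(word: str, ctx: dict[str, set[str]], categories: dict[str, list[str]]) -> str:
    """Choose the known category whose members share the most left-contexts with `word`."""
    def shared_contexts(members: list[str]) -> int:
        return sum(len(ctx.get(word, set()) & ctx.get(m, set())) for m in members)

    return max(categories, key=lambda cat: shared_contexts(categories[cat]))


sentences = [
    "there are many chairs here",
    "we saw many houses today",
    "so many people came",
    "she is very happy",
    "he is very tired",
    "i found so many lice",  # the only sentence containing "lice"
]
categories = {
    "countable plural noun": ["chairs", "houses", "people"],
    "adjective": ["happy", "tired"],
}
ctx = left_contexts(sentences)
print(guess_category("lice", ctx, categories))  # → countable plural noun
```

A single exposure is enough here because “many” already links to an established category, which is the point the quote makes about reusing existing structure.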
A Curriculum for Input:
This should be the MAJORITY of your time!
It is important to first build an internal sense of what proper speech sounds like before attempting any output (which, because of subvocalization, includes reading). Hours of raw audio input give us this.
When reading comprehensible input, Krashen advocates knowing around 95% or more of the words. For audiovisual input, this threshold is likely somewhat lower, maybe around 80–90%. This is why video (images + audio) is the best delivery for CI:
Provides the richest contextual information for associative learning to hook into
Because of the richer context, you can learn more unknowns at once (it tolerates a lower ratio of known to unknown words)
Imitates how babies learn (they receive aligned streams of visual and audio input; think of Mom pointing at the car and saying “Car!”)
Pairs CI audio with a nonstop stream of images as context
Multimodal
MUCH higher bandwidth and throughput of pure information than flashcards
Must watch without native subtitles! CI by definition should be comprehensible without subtitles.
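If you have a transcript (or subtitles) and a list of words you already know, you can estimate coverage before committing to a piece of content. A minimal sketch, assuming regex tokenization and a hand-maintained known-word set; the 80% video threshold is an assumption taken from the low end of the rough 80–90% range above, and real coverage studies usually count lemmas or word families rather than raw tokens, so treat the number as approximate.

```python
import re


def known_word_coverage(text: str, known: set[str]) -> float:
    """Share of word tokens in `text` that appear in the learner's known-word set."""
    tokens = [w.lower() for w in re.findall(r"\w+", text)]
    if not tokens:
        return 0.0
    return sum(w in known for w in tokens) / len(tokens)


# Assumed thresholds: 95% for reading, 80% (low end of the 80-90% guess) for video.
THRESHOLDS = {"reading": 0.95, "video": 0.80}


def is_comprehensible(text: str, known: set[str], medium: str = "reading") -> bool:
    return known_word_coverage(text, known) >= THRESHOLDS[medium]


transcript = "El gato come y el perro duerme."
known = {"el", "gato", "come", "y", "perro"}
print(f"{known_word_coverage(transcript, known):.0%}")  # 6 of 7 tokens known
print(is_comprehensible(transcript, known, "reading"), is_comprehensible(transcript, known, "video"))
```

The same sentence can fall below the bar as reading material yet clear it as video, which mirrors the argument for video as the default CI format.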
The best videos are those where the imagery is maximally aligned with the meaning of the language. Think of a baby’s POV watching Mom talk: lots of pointing, hand gestures, multimodal delivery of CI, slow speech, and repetition. This seems optimal for picking up vocab.
Look for “graded videos” (like graded readers, but for video). Dreaming Spanish is one platform built entirely around this concept for Spanish, but hopefully you can find others. For Korean, there’s this YouTube channel.
A Curriculum for Output:
As soon as you understand what correct pronunciation sounds like in the language, however, you should start practicing some output. Don’t do this before you have a strong feel for it, though! Otherwise, you risk forming a lasting foreign accent.
Output offers repeated retrieval practice, builds vocal muscle memory, and forms connections between words and structures. Crucially, a conversation provides both corrective feedback and rich, contextual, interactive CI.
Finally, and interestingly, language learning speeds up as you get better at it. When you encounter a new word in your native language, it might take only a few instances, or even just one, to attain a native-level “feel” for it.
Why? When you get better at the language, you’ll be more used to the sounds of the language, so you’ll be more likely to recognize the sounds that form each word. Your brain will have a much easier time remembering a word, since it will just have to put together the sounds that already exist in your head, instead of having to record the sound of each word as a completely new concept. Besides becoming familiar with the sounds of the language, you’ll also intuitively get used to what combinations of letters are likely to appear and which ones aren’t. That will make it even easier to learn new words.
CI is a cumulative process: reaching higher fluency expands the range of input you can comprehend, which in turn increases the amount of input you can usefully be exposed to.
1. Ellis, N. C., & Ferreira-Junior, F. (2009). "Construction Learning as a Function of Frequency, Frequency Distribution, and Function." Modern Language Journal, 93(3), 370–385.
2. Kroll, J. F., & Sunderman, G. (2003). "Cognitive Processes in Second Language Learners and Bilinguals: The Development of Lexical and Conceptual Representations." In The Handbook of Second Language Acquisition (pp. 104–129).
3. Jiang, N. (2000). "Lexical Representation and Development in a Second Language." Applied Linguistics, 21(1), 47–77.
4. Laufer, B., & Girsai, N. (2008). "Form-Focused Instruction in Second Language Vocabulary Learning: A Case for Contrastive Analysis and Translation." Applied Linguistics, 29(4), 694–716.
5. Hulstijn, J. H. (2001). "Intentional and Incidental Second-Language Vocabulary Learning: A Reappraisal of Elaboration, Rehearsal, and Automaticity." In P. Robinson (Ed.), Cognition and Second Language Instruction. Cambridge University Press.
6. DeKeyser, R. M. (2007). "Skill Acquisition Theory and the Role of Practice in L2 Learning." In B. VanPatten & J. Williams (Eds.), Theories in Second Language Acquisition. Routledge.
7. Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive Load Theory. Springer.