Italian Fluency System

Solo build · 2026 – present · in-progress

The job: actually speak Italian. Not collect another year of flashcards. About fifteen minutes a day.

I’d done the comfortable half for years. Reading, listening, vocabulary apps. Input. The thing I’d never really done was open my mouth, so the language just sat there, inert.

So the honest diagnosis wasn’t “do more hours.” It was: you don’t produce, and Italian doesn’t live anywhere in your week. That’s a different problem, and it has a different fix. Not more effort. Reallocation. Trade one passive session for one where you actually talk, and give the language somewhere to show up.

Motivation is where most people fail, and I’ve already got that part covered: real reasons to use the language, family ties to Italy, and trips I actually take. So the interesting problem isn’t wanting it. It’s the method. And the method is a system-design problem, which happens to be the part I enjoy.

Here’s the whole thing, including the parts that didn’t work.

One job per tool

Recognition and production are different skills. And you get good at exactly what you practice, nothing else. The research name for this is transfer-appropriate processing: drill a recognition card and you get better at recognizing the word, not at yanking it out of your head while you’re mid-sentence. Those two don’t transfer to each other nearly as much as you’d hope.

Most apps smear the two together and end up doing both badly. So I gave every tool exactly one job:

Job	What it means	Tool
Recognition	Understand a word or form when you read or hear it	Anki, recognition only
Production	Pull a word or form out of your head to say it	Clozemaster (cloze in context) and real conversation
Input at volume	Make comprehensible input the engine	Easy Italian, reading, podcasts
Output with correction	Where grammar actually proceduralizes	AI voice partner and a debrief loop

Everything below is just an instance of that split.

The decks, concretely

The spaced-repetition layer is Anki. But I never touch the Anki UI. The cards live in a handful of tab-separated text files, and a small Python script shoves them into Anki over AnkiConnect. It’s idempotent, so it only adds what’s new and skips the dupes. I add a row, run one command, the card exists. Everything is plain text in git, so my whole vocabulary is greppable, diffable, and mine. I am not letting years of my own learning get trapped inside some app’s database.
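A minimal sketch of what that sync step could look like, not the actual script. It assumes three tab-separated columns (front, back, note), Anki's stock "Basic (and reversed card)" note type with its Front and Back fields, and AnkiConnect listening on its default local port. The function names (`read_rows`, `to_note`, `sync`) are made up for illustration. The idempotency comes from asking AnkiConnect's `canAddNotes` which rows it would accept and sending only those to `addNotes`.

```python
import csv
import io
import json
import urllib.request

ANKI_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address (assumed unchanged)


def invoke(action, **params):
    """Send one request to AnkiConnect and return its result."""
    payload = json.dumps({"action": action, "version": 6, "params": params}).encode("utf-8")
    request = urllib.request.Request(ANKI_URL, payload, {"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]


def read_rows(tsv_text):
    """Parse tab-separated (front, back, note) rows; skip blank lines and # comments."""
    rows = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        front, back, note = (row + ["", ""])[:3]  # pad rows that omit the note column
        rows.append((front.strip(), back.strip(), note.strip()))
    return rows


def to_note(row, deck, model="Basic (and reversed card)"):
    """Build the note dict AnkiConnect expects. Deck and model names are assumptions."""
    front, back, note = row
    back_html = f"{back}<br><small>{note}</small>" if note else back
    return {
        "deckName": deck,
        "modelName": model,
        "fields": {"Front": front, "Back": back_html},
        "options": {"allowDuplicate": False},
    }


def sync(tsv_text, deck, call=invoke):
    """Add only the rows Anki does not already have; return how many were added."""
    notes = [to_note(row, deck) for row in read_rows(tsv_text)]
    if not notes:
        return 0
    addable = call("canAddNotes", notes=notes)  # one boolean per note
    new_notes = [note for note, ok in zip(notes, addable) if ok]
    if new_notes:
        call("addNotes", notes=new_notes)
    return len(new_notes)
```

Running `sync(open("core.tsv", encoding="utf-8").read(), "Core")` twice in a row should add cards the first time and nothing the second (the file name is hypothetical). The `call` parameter exists so the logic can be exercised without Anki running.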

Three decks. Each one has a job.

Core is everything that isn’t a verb: nouns (with the article and gender baked right in, because gender is the part that bites), adjectives, adverbs, set-phrases. Bidirectional, Italian to English and back. Real rows:

la mano        the hand               f! despite the -o ending
il problema    the problem            m! Greek -ma ending
la gente       the people             f singular; takes a singular verb
magari         maybe / if only        adv
addirittura    even / actually        adv
ferito         injured / wounded      past participle of ferire

Verbs is infinitives, with the irregularities that actually trip me up, also bidirectional:

rimanere   to stay / to remain   irregular (rimango); aux essere; pp rimasto
capire     to understand         -isc verb (capisco)
spiegare   to explain            keeps the hard g; spiegalo = explain it

Then two read-only sub-decks for stuff I only need to recognize, not produce. A conjugation deck drills a whole paradigm as one card, because one-card-per-person is how you drown:

fare - present   →   faccio, fai, fa, facciamo, fate, fanno

And a recognition deck targets the opaque forms I’d actually trip over in the wild, mapping each one back to its infinitive:

vorrei   →   volere: I would like (conditional)
vanno    →   andare: they go (present)
messo    →   mettere: put (past participle)

Regular verbs barely show up in any of this. Six cards teach the regular endings, and after that every regular verb is free. Only genuine irregularity earns a card. The deck stays small on purpose. A big deck is a deck you stop reviewing.
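To make "every regular verb is free" concrete, here is a small sketch that derives a regular present-tense paradigm from the infinitive alone and formats it as one card, in the same shape as the fare card above. The present tense is only an illustration, and I'm assuming nothing about which six cards are in the real deck. It deliberately ignores -isc verbs like capire, spelling adjustments (-care, -gare, -iare), and every irregular verb, which is exactly the set that does earn cards.

```python
# Regular present-tense endings, in the order io, tu, lui/lei, noi, voi, loro.
PRESENT_ENDINGS = {
    "are": ("o", "i", "a", "iamo", "ate", "ano"),
    "ere": ("o", "i", "e", "iamo", "ete", "ono"),
    "ire": ("o", "i", "e", "iamo", "ite", "ono"),
}


def conjugate_present(infinitive):
    """Return the six present-tense forms of a fully regular verb."""
    stem, ending = infinitive[:-3], infinitive[-3:]
    if ending not in PRESENT_ENDINGS:
        raise ValueError(f"not an -are/-ere/-ire infinitive: {infinitive}")
    return [stem + suffix for suffix in PRESENT_ENDINGS[ending]]


def paradigm_card(infinitive):
    """One card for the whole paradigm: (front, back)."""
    return f"{infinitive} - present", ", ".join(conjugate_present(infinitive))


print(paradigm_card("parlare"))
```

Anything this function gets right needs no card of its own. Anything it gets wrong is a candidate for the Verbs or recognition decks.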

The newest deck, and the bug it fixes

Same problem kept happening. I could recognize a form fine while reading. I could recite the whole table if you put me on the spot. And I’d still freeze mid-sentence on the dumb little ones: do, devo, vado, faccio, so, sto.

They’re a confusion cluster. Short, similar-sounding, completely unrelated in meaning, impossible to derive from anything. Bumping into them once every few hundred Clozemaster sentences never fixed it. And reciting the table is the wrong skill anyway. It builds table-reciting, not talking.

The fix is almost embarrassingly simple, and it falls straight out of the research on lexical chunks: don’t store the form as a cell in a grid, store it inside a phrase you’d actually say. The proof was already in my own mouth. I never get “so” wrong inside “non lo so.” Never. Because it lives in a chunk.

So I built a deck that welds each annoying form to one short phrase, cued from English:

"it's cold"            →   fa freddo        (fa · lui/lei)
"I have to go"         →   devo andare      (devo · io)
"I don't know"         →   non lo so        (so · io)
"let's go!"            →   andiamo!         (andiamo · noi)
"what are you doing?"  →   cosa fai?        (fai · tu)
"I'll do it myself"    →   lo faccio io     (faccio · io)

Yes, this breaks my own “Anki is recognition only” rule. On purpose, for exactly the one case the rest of the system left uncovered. Sixteen irregular verbs, about five forms each, drilled both directions so that reading the Italian out loud also trains pronunciation. The grey hint on the back ties the chunk back to the grammar, because glancing at the chart mid-sentence isn’t cheating. It’s how declarative knowledge turns into automatic knowledge. The goal is to need the chart less, not never.
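One way the chunk deck's rows could be modeled, as a sketch rather than the real data format. The `Chunk` class and its field names are hypothetical. Each row yields two cards (English to Italian and back) with the same grammar hint on both, and a small check refuses any row whose target form doesn't actually appear in its phrase, which is the whole point of the deck.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    cue: str      # English prompt
    phrase: str   # short Italian phrase that carries the form
    form: str     # the irregular form being drilled
    person: str   # io, tu, lui/lei, noi, voi, loro

    def __post_init__(self):
        # The form must occur as a whole word in the phrase, ignoring punctuation.
        words = re.findall(r"\w+", self.phrase.lower())
        if self.form.lower() not in words:
            raise ValueError(f"{self.form!r} does not appear in {self.phrase!r}")

    def cards(self):
        """Return both directions as (front, back) pairs, each with the hint."""
        hint = f"({self.form} · {self.person})"
        return [
            (self.cue, f"{self.phrase}\n{hint}"),
            (self.phrase, f"{self.cue}\n{hint}"),
        ]


CHUNKS = [
    Chunk("it's cold", "fa freddo", "fa", "lui/lei"),
    Chunk("I have to go", "devo andare", "devo", "io"),
    Chunk("I don't know", "non lo so", "so", "io"),
    Chunk("let's go!", "andiamo!", "andiamo", "noi"),
]

for chunk in CHUNKS:
    for front, back in chunk.cards():
        print(front, "->", back.replace("\n", "  "))
```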

Clozemaster: what I use, and what I threw away

Clozemaster blanks a word inside a real sentence and makes you fill it in. That’s producing the word with all the scaffolding still standing. Good tool. The sentences come from Tatoeba, an open corpus, so they’re real, not generated.

Now the honest part. I tried to be clever first. I generated my own custom collections: tense-targeted decks of cloze sentences, written with an LLM, one per tense, uploaded as files. On paper it was perfect. The ideal production engine, tuned to exactly what I wanted to drill.

I didn’t use it. The sentences ran long and felt a little off, weird in a way that made every rep a chore, so I stopped opening it. The clever pipeline lost to the boring built-in tool I’d actually do.

So I deleted my collections and switched to Clozemaster’s own Fast Track and Grammar Challenges. Real sentences, zero maintenance, and the one property that actually matters: I play them. A clever thing you avoid loses to a boring thing you do, every single time. I keep having to relearn this.

Reading turns into vocabulary on its own

I read native material in Readlang. Click any word, the translation shows up inline, and it keeps a running list of everything I looked up. Those words export, and a short routine turns the inflected forms back into their base (gira, siede, abbracciano become girare, sedersi, abbracciare), dedupes them against what’s already in the decks, files them into Core or Verbs, and syncs. The reading is the input. The words I tripped on become tomorrow’s cards. No retyping. The reading does double duty.
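The shape of that routine, sketched under loud assumptions: the lookups arrive as a plain list of words (I'm not assuming anything about Readlang's export format), and the lemma step is a hand-written lookup table standing in for whatever actually does the lemmatizing. `LEMMAS` and `file_lookups` are hypothetical names. The part worth seeing is the order of operations: resolve to a base form, dedupe against what the decks already hold, then route to Core or Verbs.

```python
# Hypothetical lemma table: inflected form -> (base form, part of speech).
LEMMAS = {
    "gira": ("girare", "verb"),
    "siede": ("sedersi", "verb"),
    "abbracciano": ("abbracciare", "verb"),
    "mani": ("la mano", "noun"),
}


def file_lookups(lookups, known, lemmas=LEMMAS):
    """Turn looked-up words into new deck entries.

    Returns (queued, unresolved): queued maps deck name to new base forms,
    unresolved lists the words the lemma table could not handle.
    """
    queued = {"Core": [], "Verbs": []}
    unresolved = []
    seen = {entry.lower() for entry in known}  # everything already in the decks
    for word in lookups:
        key = word.strip().lower()
        if key not in lemmas:
            unresolved.append(word)
            continue
        lemma, pos = lemmas[key]
        if lemma.lower() in seen:
            continue  # already a card, or already queued in this run
        seen.add(lemma.lower())
        queued["Verbs" if pos == "verb" else "Core"].append(lemma)
    return queued, unresolved


queued, unresolved = file_lookups(
    ["gira", "Siede", "abbracciano", "gira", "mani", "boh"],
    known={"la mano"},
)
print(queued)
print(unresolved)
```

Unresolved words are returned rather than dropped, so a lookup the table can't handle still gets a manual pass instead of silently vanishing.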

The speaking gym

People skip speaking because it’s scary. So I took the people out of it. An AI voice partner has infinite patience, is available at any hour, and won’t judge me. I give it a persona and a leash (“you’re an Italian friend, keep it simple, correct my single biggest mistake after each reply, then keep going”), and it corrects one thing at a time instead of every flaw.

Then I close the loop, which is the part everyone skips. After a conversation I paste the transcript into a debrief routine. It pulls out the words I fumbled or that the AI fed me, writes them into my vocabulary layer with a real example sentence lifted from the actual chat, and flags any grammar mistake that’s now shown up enough times to be worth a real look. The throwaway conversation feeds the decks. The decks make the next conversation easier. That’s the loop.
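The "shown up enough times" rule is easy to pin down in code. A sketch, assuming each debriefed conversation has already been reduced to a list of mistake tags. The tag names and the threshold of three are invented for the example. A mistake counts once per conversation, so one bad evening doesn't trip the flag by itself.

```python
from collections import Counter


def recurring_mistakes(sessions, threshold=3):
    """Return mistake tags seen in at least `threshold` separate conversations."""
    counts = Counter(tag for session in sessions for tag in set(session))
    return sorted(tag for tag, n in counts.items() if n >= threshold)


# Hypothetical tags from three debriefs.
sessions = [
    ["aux-essere", "gender"],
    ["aux-essere", "aux-essere"],
    ["aux-essere", "prepositions"],
]
print(recurring_mistakes(sessions))
```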

Turning a video into a deck

The most fun piece. I wrote a Claude Code skill that eats a native-Italian episode transcript (I use Easy Italian, real unscripted conversations, with transcripts) and spits out a spaced-repetition deck for the whole episode, plus a shortlist of the distinctive vocabulary worth knowing on sight.

Then the loop runs in one direction:

Generate the deck from the transcript.
Drill the sentences in Clozemaster (produce them).
Harvest the distinctive vocab into Anki (recognize them).
Watch the episode again and understand all of it, no subtitles.
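The first and third steps can be approximated without an LLM, which is useful as a sanity check on whatever the skill produces. This is a swapped-in heuristic, not the skill itself: "distinctive" here just means "frequent in the transcript and not in a set of words I already know", and the cloze step blanks one chosen word per sentence. Function names and the sample text are made up.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, accents included


def distinctive_vocab(transcript, known, top=10):
    """Rank words not in `known` by frequency, ties broken alphabetically."""
    counts = Counter(word.lower() for word in WORD.findall(transcript))
    candidates = [(word, n) for word, n in counts.items() if word not in known]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in candidates[:top]]


def make_cloze(sentence, target, blank="_____"):
    """Blank the first whole-word occurrence of `target`; None if it is absent."""
    pattern = re.compile(rf"\b{re.escape(target)}\b", re.IGNORECASE)
    blanked, hits = pattern.subn(blank, sentence, count=1)
    return blanked if hits else None


transcript = "Oggi andiamo al mercato. Il mercato è addirittura chiuso!"
known = {"oggi", "andiamo", "al", "il", "è"}
print(distinctive_vocab(transcript, known))
print(make_cloze("Il mercato è addirittura chiuso!", "addirittura"))
```

Elided articles such as l' split into a one-letter token with this tokenizer, so in practice the known-words set would need to include those fragments.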

The episode is the unit of study. “I followed it without reading” is the win. And the jump always lands on the rewatch, when you already know the words and the audio finally just clicks.

Why each piece is shaped the way it is

None of this is vibes. It’s built on second-language-acquisition research, and every design choice maps to a principle:

Comprehensible input is the engine. Most of what you acquire comes from a high volume of mostly-understandable input, so reading and listening at around 80–90% comprehension is the spine, not a warmup.
Transfer-appropriate processing. You improve at the exact thing you practice, which is the entire reason recognition and production live in separate tools instead of one deck pretending to do both.
Chunks over paradigms. The highest-frequency verbs are irregular, and you’re better off storing their forms as memorized units (ho fatto, non lo so, devo andare) than rebuilding them from a table in real time. That’s the whole basis of the production deck.
Declarative to procedural. Knowing a rule and being able to use it are different stages, and the second one only comes from real communicative practice. So grammar tables are a reference shelf, and conversation-with-correction is the actual engine.
Intensive vs extensive input. Short, decode-every-word clips when I’ve got focus. A comfortable volume of familiar, enjoyable stuff when I’m tired. Running the intensive method on a whole video is the classic way to burn out, so I don’t.
Focus on form. Leave grammar alone until something you keep hearing finally bugs you, then go look up that one thing. Grammar you study to scratch an itch sticks. Grammar you study cold evaporates.

Every choice points back to one of these. That’s what keeps the thing from turning into busywork that feels like progress and isn’t.

The actual bet

It’s the same bet I keep making everywhere: the leverage is in the system design, not in grinding harder at the human part. The machine does the boring stuff. The retyping, the dedup, the deck generation, the tireless 11pm conversation. I do the one thing that doesn’t transfer to a machine, and doesn’t transfer between skills either: producing the language, out loud, under a little pressure.

Same move as the Agent Pipeline and Work Context Protocol, just pointed at a hard human skill instead of at code. The augmentation thinking behind all of it is over in /writing.

Tags: ai, claude-code, language-learning, spaced-repetition, augmentation