A small, scriptable pipeline that generates beginner-friendly sentence cards with audio and exports them as an Anki deck. Example sentences are composed by an LLM, while the information about each word on each card is based primarily on Wiktionary data, curated and adjusted by an LLM. The current repo targets Tagalog, but the same process can be retargeted to other languages with an LLM edit pass: ask an AI (e.g., DeepSeek) to revise each script in turn. See L2change.md for an example prompt.
The pipeline has four steps: (1) the prompts in gen_lists.txt are run through an LLM to produce frequency-based word buckets (WORD_BUCKETS) and a curated set of beginner grammar points (GRAMMAR_POINTS); (2) gen_sentence.py composes example sentences, based largely on Wiktionary data, validates them (ensuring coverage of the target words/points), adds short English notes, and writes a timestamped JSON; (3) gen_voice.py synthesizes per-item WAV audio via system TTS; and (4) gen_card.py builds an Anki .apkg with both directions (L2→EN and EN→L2), bundling audio and metadata.
Files:

- gen_lists.txt: LLM prompts to (re)build word/grammar lists (step 1).
- word_list.py, grammar_list.py: seed data generated by the gen_lists.txt prompts.
- gen_sentence.py: generate json/tl_cards_*.json (step 2).
- gen_voice.py: synthesize WAV audio per card (step 3).
- gen_card.py: package the Anki deck from JSON + audio (step 4).
- L2change.md: example prompt for retargeting any single script to another language.

Requirements:

- genanki, requests, and an OpenAI-compatible client
- Balabolka (bal4web.exe) for TTS using Microsoft voices

pip install genanki requests openai
Provide an API key via OPENAI_API_KEY (env) or a local config.json.
If needed, edit BAL4WEB and voice names in gen_voice.py.
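The key lookup described above can be sketched as follows. This is a minimal illustration, not the code in gen_sentence.py: the function name `load_api_key`, the `api_key` field inside config.json, and the choice to check the environment first are all assumptions.

```python
import json
import os
from pathlib import Path


def load_api_key(config_path="config.json"):
    """Return the API key from the environment, falling back to config.json."""
    # Assumption: the environment variable takes precedence over the file.
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key

    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            config = json.load(fh)
        # "api_key" is a hypothetical field name; check the script for the real one.
        key = config.get("api_key")
        if key:
            return key

    raise RuntimeError("No API key found: set OPENAI_API_KEY or create config.json")
```

Failing early with a clear message avoids starting a long generation run that would only fail on the first request.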
Step 1 (gen_lists.txt)
Use gen_lists.txt to run the three prompts and create WORD_BUCKETS (greetings/expressions/grammar_drills) and the GRAMMAR_POINTS module.
Save the results to word_list.py / grammar_list.py. (The scripts import these modules at runtime.)

python gen_sentence.py
This script imports your word_list.py / grammar_list.py, calls an OpenAI-compatible chat endpoint, and writes a timestamped JSON like json/tl_cards_YYYYMMDD_HHMM.json plus a small usage summary. A partial checkpoint JSON file is written during generation. For convenience, you can point later steps at the current file:
cp json/tl_cards_YYYYMMDD_HHMM.json json/tl_cards.json
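If you would rather not type the timestamp by hand, the copy step can be scripted. The sketch below is an assumption-laden convenience, not part of the repo: `promote_latest` is a hypothetical helper, and it relies on the YYYYMMDD_HHMM suffix sorting chronologically as plain text.

```python
import shutil
from pathlib import Path


def promote_latest(json_dir="json"):
    """Copy the newest tl_cards_YYYYMMDD_HHMM.json to tl_cards.json."""
    folder = Path(json_dir)
    # The pattern requires an underscore after "tl_cards", so tl_cards.json
    # itself is never picked up as a candidate.
    candidates = sorted(folder.glob("tl_cards_*.json"))
    if not candidates:
        raise FileNotFoundError(f"no tl_cards_*.json files in {folder}")

    # Sorted by name, the last entry carries the latest timestamp.
    latest = candidates[-1]
    shutil.copyfile(latest, folder / "tl_cards.json")
    return latest
```

If the partial checkpoint file also matches the `tl_cards_*.json` pattern, tighten the glob or remove the checkpoint before running this.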
Notes:
- The script requires WORD_BUCKETS and GRAMMAR_POINTS to be defined; it validates outputs and enforces that targets actually appear in the Tagalog sentences.
- TEST_MODE defaults to True to produce a tiny sample; set it to False for full generation.
- The usage summary is written to json/openai_usage_summary_*.json.

python gen_voice.py
Audio is currently generated via Balabolka’s bal4web.exe using Microsoft voices (Tagalog, fil-PH). The script probes for the Blessica and Angelo voices and saves one WAV per card per voice as readings/{id}_Blessica.wav and/or readings/{id}_Angelo.wav. This step is fairly straightforward to adapt to other TTS services and voices.
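After a voice run, it can help to check which cards actually received audio. The sketch below only illustrates the readings/{id}_Voice.wav naming convention; `find_audio` and `cards_without_audio` are hypothetical helpers, and the card ids are assumed to be the `id` values used in the filenames.

```python
from pathlib import Path

# Voice labels taken from the filenames described above.
VOICES = ("Blessica", "Angelo")


def find_audio(card_id, readings_dir="readings", voices=VOICES):
    """Map each voice to its WAV path, keeping only files that exist."""
    found = {}
    for voice in voices:
        path = Path(readings_dir) / f"{card_id}_{voice}.wav"
        if path.is_file():
            found[voice] = path
    return found


def cards_without_audio(card_ids, readings_dir="readings"):
    """Return the ids that have no WAV file for any voice."""
    return [cid for cid in card_ids if not find_audio(cid, readings_dir)]
```

A card with only one of the two voices is still counted as having audio here.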
python gen_card.py
This reads json/tl_cards.json, looks for matching audio in readings/, and creates an .apkg named like AITagalogIntro_YYYYMMDD_HHMM.apkg. Cards are added in both directions per 30-item block, tagged v1 (TL→EN) and v2 (EN→TL), and include dual-audio fields if present. Import into Anki via File → Import….
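One reading of "both directions per 30-item block" is that each block of 30 items is added first as TL→EN notes and then as EN→TL notes, before the next block starts. The sketch below illustrates that ordering only; `ordered_notes` is a hypothetical helper, and the actual interleaving in gen_card.py may differ.

```python
def ordered_notes(cards, block_size=30):
    """Yield (card, direction, tag) tuples, one block at a time."""
    for start in range(0, len(cards), block_size):
        block = cards[start:start + block_size]
        # First pass over the block: Tagalog prompt, English answer.
        for card in block:
            yield card, "TL->EN", "v1"
        # Second pass over the same block: English prompt, Tagalog answer.
        for card in block:
            yield card, "EN->TL", "v2"
```

With this ordering, a learner reviewing new cards in sequence sees each item in the reverse direction at most one block after first meeting it.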
LLM stage (gen_sentence.py)
- API key: config.json or OPENAI_API_KEY.
- word_list.py/grammar_list.py (must exist and define the expected symbols).
- TEST_MODE (defaults to True, which produces only a tiny sample).

Voice stage (gen_voice.py)
- Audio is written to readings/. You might need to adjust BAL4WEB, voice labels, or LANG_CODE. If your system doesn’t have Tagalog voices, either skip audio (the deck still builds) or switch to another TTS provider.

Deck stage (gen_card.py)
- The script reads json/tl_cards.json; if you generated a timestamped file, copy it to that path first.
- Audio files must be named {id}_Angelo.wav / {id}_Blessica.wav (filenames matter). Missing audio is tolerated: the note is created without that channel.

Other languages / services
- Adapt gen_voice.py for another service (e.g., Azure TTS, ElevenLabs, Coqui-TTS, gTTS) and keep the output naming convention (readings/{id}_VoiceName.wav). The deck builder will pick up any files that match those names.

License: GPL.