anki_ai

Autogenerated Anki Decks for Language Introduction

A small, scriptable pipeline that generates beginner-friendly sentence cards with audio and exports them as an Anki deck. Example sentences are composed by an LLM, while the information about each word on a card is based primarily on Wiktionary data, curated and adjusted via an LLM. The current repo targets Tagalog, but the same process can be retargeted to other languages fairly quickly with an LLM editing pass: ask an AI (e.g., DeepSeek) to revise each script for the new language. See L2change.md for an example prompt.

What the pipeline does

The pipeline (1) uses the prompts in gen_lists.txt with an LLM to produce frequency-based word buckets (WORD_BUCKETS) and a curated set of beginner grammar points (GRAMMAR_POINTS); (2) runs gen_sentence.py, which composes validated example sentences (ensuring coverage of the target words and grammar points), adds short English notes drawing largely on Wiktionary data, and writes a timestamped JSON file; (3) runs gen_voice.py, which synthesizes per-item WAV audio via system TTS; and (4) runs gen_card.py, which builds an Anki .apkg with both directions (L2→EN and EN→L2), bundling audio and metadata.
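The coverage check in step (2) could be approximated by a whole-token test like the one below. This is a hypothetical sketch, not the repo's validator: the function names are illustrative, it only handles target words (grammar points would need LLM review or richer rules), and exact matching will miss inflected Tagalog forms such as kumain for the root kain.

```python
from __future__ import annotations

import re
from typing import Iterable


def tokenize(sentence: str) -> list[str]:
    # Lowercase, then keep runs of word characters, apostrophes and hyphens
    # (hyphens matter for Tagalog forms such as "mag-aral")
    return re.findall(r"[\w'-]+", sentence.lower())


def missing_targets(sentence: str, targets: Iterable[str]) -> list[str]:
    """Return the target words that do not appear as whole tokens in the sentence."""
    tokens = set(tokenize(sentence))
    return [t for t in targets if t.lower() not in tokens]


def is_valid(sentence: str, targets: Iterable[str]) -> bool:
    """A sentence passes only if every target word occurs in it."""
    return not missing_targets(sentence, targets)
```

For example, missing_targets("Kumain ako ng isda.", ["ako", "isda", "tubig"]) returns ["tubig"], so a generator could re-prompt the LLM for the missing word.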

Repo layout

Requirements

Setup

pip install genanki requests openai

Provide an API key via the OPENAI_API_KEY environment variable or a local config.json. If needed, edit the BAL4WEB setting and the voice names in gen_voice.py.
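One way to implement the key lookup is sketched below. It assumes the environment variable takes precedence and that config.json stores the key under an "api_key" field; both the precedence and the field name are assumptions, so check the scripts for the actual behavior.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def load_api_key(config_path: str | Path = "config.json", key_name: str = "api_key") -> str:
    """Return the API key from OPENAI_API_KEY, else from a local JSON config (hypothetical helper)."""
    # 1) Prefer the environment variable if it is set and non-empty
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    # 2) Fall back to config.json; the "api_key" field name is an assumption
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        key = data.get(key_name)
        if key:
            return key
    raise RuntimeError("No API key: set OPENAI_API_KEY or add it to config.json")
```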

Usage

  1. Prepare lists (via gen_lists.txt): run the three prompts in gen_lists.txt to create the list files that gen_sentence.py imports (word_list.py and grammar_list.py).
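The exact contents of word_list.py and grammar_list.py depend on the prompts in gen_lists.txt. Purely as an illustration, the hypothetical shape below stores WORD_BUCKETS as a list of word lists (most frequent first) and GRAMMAR_POINTS as a list of labels, plus a hypothetical helper that flattens the buckets in order.

```python
from __future__ import annotations

# word_list.py (hypothetical shape): frequency-ordered buckets of target words
WORD_BUCKETS = [
    ["ako", "ikaw", "siya"],        # bucket 1: pronouns
    ["kain", "inom", "tulog"],      # bucket 2: common verb roots
    ["tubig", "bahay", "pagkain"],  # bucket 3: everyday nouns
]

# grammar_list.py (hypothetical shape): short labels for beginner grammar points
GRAMMAR_POINTS = [
    "ang / ng / sa case markers",
    "ay-inversion (subject before predicate)",
    "mag- vs -um- actor-focus verbs",
]


def all_words(buckets: list[list[str]]) -> list[str]:
    """Flatten buckets in order, keeping only the first occurrence of each word."""
    seen: set[str] = set()
    ordered: list[str] = []
    for bucket in buckets:
        for word in bucket:
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered
```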
  2. Generate cards (JSON)
python gen_sentence.py

This script imports your word_list.py and grammar_list.py, calls an OpenAI-compatible chat endpoint, and writes a timestamped JSON file such as json/tl_cards_YYYYMMDD_HHMM.json, plus a small usage summary. It also writes a partial checkpoint JSON file during generation. For convenience, you can point later steps at the current file:

cp json/tl_cards_YYYYMMDD_HHMM.json json/tl_cards.json
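If you prefer not to copy by hand, a small Python helper could find the newest timestamped file and copy it to json/tl_cards.json. This is a sketch that is not part of the repo; the function names are hypothetical, and it assumes the json/ directory and the YYYYMMDD_HHMM naming described above. The strict pattern skips tl_cards.json itself and any file not named exactly this way.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path

# Matches names like tl_cards_20240131_0945.json (YYYYMMDD_HHMM)
STAMPED = re.compile(r"^tl_cards_(\d{8}_\d{4})\.json$")


def latest_cards_file(json_dir: str | Path = "json") -> Path:
    """Return the tl_cards_YYYYMMDD_HHMM.json file with the newest timestamp."""
    candidates = []
    for path in Path(json_dir).iterdir():
        m = STAMPED.match(path.name)
        if m:
            candidates.append((m.group(1), path))
    if not candidates:
        raise FileNotFoundError(f"no tl_cards_YYYYMMDD_HHMM.json in {json_dir}")
    # Zero-padded YYYYMMDD_HHMM strings sort chronologically as plain text
    return max(candidates, key=lambda c: c[0])[1]


def promote_latest(json_dir: str | Path = "json") -> Path:
    """Copy the newest timestamped file to tl_cards.json (same effect as the cp command)."""
    src = latest_cards_file(json_dir)
    dst = Path(json_dir) / "tl_cards.json"
    shutil.copyfile(src, dst)
    return dst
```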

Notes:

  3. Synthesize audio
python gen_voice.py

Audio is currently generated via Balabolka’s bal4web.exe using Microsoft voices (Tagalog, fil-PH). The script probes for the Blessica and Angelo voices and saves one WAV per card per voice as readings/{id}_Blessica.wav and/or readings/{id}_Angelo.wav. It is fairly straightforward to adapt this to other TTS services and voices.
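Because the TTS engine is the part most likely to change, one option is to hide it behind a single callable. The sketch below is hypothetical: it does not call bal4web.exe itself, it assumes each card has "id" and "tl" (Tagalog text) fields, and it treats a backend that returns False as an unavailable voice, loosely mirroring the probing described above. It keeps the readings/{id}_{voice}.wav naming.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable

VOICES = ("Blessica", "Angelo")  # voices named in this README

# Hypothetical backend signature: (text, voice, out_path) -> True if a WAV was written.
# bal4web.exe, or any other engine, would be wrapped behind this.
Synthesize = Callable[[str, str, Path], bool]


def reading_path(card_id: str, voice: str, out_dir: Path) -> Path:
    """Path of the WAV for one card and one voice, e.g. readings/001_Blessica.wav."""
    return out_dir / f"{card_id}_{voice}.wav"


def synthesize_all(cards: Iterable[dict], synthesize: Synthesize,
                   out_dir: str | Path = "readings",
                   voices: Iterable[str] = VOICES) -> list[Path]:
    """Create one WAV per card per available voice; return the newly written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    voices = tuple(voices)  # materialize so it can be reused for every card
    written: list[Path] = []
    for card in cards:
        for voice in voices:
            path = reading_path(card["id"], voice, out)
            if path.exists():
                continue  # already synthesized on a previous run
            if synthesize(card["tl"], voice, path):
                written.append(path)
    return written
```

Swapping TTS services then only means writing a new function with the same three arguments.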

  4. Build the Anki deck
python gen_card.py

This reads json/tl_cards.json, looks for matching audio in readings/, and creates an .apkg file named like AITagalogIntro_YYYYMMDD_HHMM.apkg. Cards are added in both directions for each 30-item block, tagged v1 (TL→EN) and v2 (EN→TL), and include both voices' audio fields when the files are present. Import the deck into Anki via File → Import….
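The block ordering and audio fields can be sketched in plain Python. This assumes that "per 30-item block" means all TL→EN cards of a block are added before the EN→TL cards of the same block; the function names and the "id" field are hypothetical. With genanki, each yielded pair could become a genanki.Note(..., tags=[tag]), with the referenced WAVs listed in the package's media_files.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator

BLOCK = 30  # items per block, as described above


def sound_field(card_id: str, voice: str, readings: Path) -> str:
    """Anki audio tag for one reading, or "" if the WAV is missing."""
    name = f"{card_id}_{voice}.wav"
    return f"[sound:{name}]" if (readings / name).exists() else ""


def ordered_notes(items: list[dict], block: int = BLOCK) -> Iterator[tuple[str, dict]]:
    """Yield (tag, item): each block as v1 (TL→EN), then the same block as v2 (EN→TL)."""
    for start in range(0, len(items), block):
        chunk = items[start:start + block]
        for item in chunk:
            yield "v1", item
        for item in chunk:
            yield "v2", item
```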

More details & caveats

Notes

License

GPL.