Karukan: Offline neural kana–kanji IME with llama.cpp and live conversion

An offline neural kana–kanji IME for Linux and macOS that uses llama.cpp for context-aware, live candidate generation. It integrates Sudachi and Mozc assets, provides frontends and dictionary tools, and suits privacy-conscious power users and researchers.

GitHub: togatoga/karukan · Updated: 2026-07-02 · Branch: main · Stars: 580 · Forks: 35

Tags: Rust/Swift/C (FFI) · Japanese input (IME) · Offline neural inference · Live conversion / dictionary tools

💡 Deep Analysis

How does karukan collaborate with Sudachi/Mozc dictionaries and candidate rewriters to compensate for neural model shortcomings?

Core Analysis

Collaboration Goal: Combine the neural model’s contextual judgment with traditional dictionary/rewriter strengths in formatting and candidate coverage to mitigate each approach’s weaknesses.

Collaboration Mechanism

Sudachi as base dictionary and morphological analyzer: Sudachi data is used to build the system dictionaries (improving coverage) and supplies lemma and part-of-speech (POS) information for word-boundary detection.
Neural engine for contextual candidates: A GPT-2-style model produces candidates that read more naturally and fit the surrounding context, especially over longer inputs.
Mozc-derived candidate rewriter as post-processing: Generates variants of neural or dictionary candidates (half-width/full-width, letter case, numeric forms such as kanji numerals 漢数字, circled digits 丸数字, and hexadecimal) and attaches annotations that tell the user what each variant is.
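A minimal Rust sketch of how the three sources could be combined into one candidate list. The ordering policy (neural first, then dictionary, then rewriter variants), the `Candidate`/`Source` types, and the toy `rewrite` function are assumptions for illustration, not karukan's code.

```rust
use std::collections::HashSet;

/// Where a candidate came from (hypothetical labels for this sketch, not karukan's types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Neural,
    Dictionary,
    Rewriter,
}

#[derive(Debug, Clone)]
struct Candidate {
    surface: String,
    source: Source,
    annotation: Option<&'static str>,
}

/// Toy stand-in for a rewriter: emits a full-width variant of all-ASCII-digit candidates.
fn rewrite(c: &Candidate) -> Vec<Candidate> {
    if c.surface.is_empty() || !c.surface.chars().all(|ch| ch.is_ascii_digit()) {
        return Vec::new();
    }
    // '0'..'9' (U+0030..U+0039) map to '０'..'９' (U+FF10..U+FF19).
    let full: String = c
        .surface
        .chars()
        .map(|ch| char::from_u32(ch as u32 - 0x30 + 0xFF10).unwrap())
        .collect();
    vec![Candidate { surface: full, source: Source::Rewriter, annotation: Some("full-width") }]
}

/// Assumed policy: neural candidates first (context-aware order), then dictionary
/// candidates (coverage), then rewriter variants; duplicates keep their first position.
fn merge(neural: &[&str], dictionary: &[&str]) -> Vec<Candidate> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let base = neural
        .iter()
        .map(|s| (*s, Source::Neural))
        .chain(dictionary.iter().map(|s| (*s, Source::Dictionary)));
    for (surface, source) in base {
        if seen.insert(surface.to_string()) {
            out.push(Candidate { surface: surface.to_string(), source, annotation: None });
        }
    }
    // Collect variants first so `out` is not borrowed while we push into it.
    let variants: Vec<Candidate> = out.iter().flat_map(rewrite).collect();
    for v in variants {
        if seen.insert(v.surface.clone()) {
            out.push(v);
        }
    }
    out
}

fn main() {
    for c in merge(&["12時", "12"], &["12", "十二"]) {
        println!("{} {:?} {:?}", c.surface, c.source, c.annotation);
    }
}
```

Putting rewriter output last keeps the model's context-aware ranking at the top while still guaranteeing that formatted variants are available.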

Practical Recommendations

Regularly build and update Sudachi system dictionaries via karukan-cli to keep coverage and morphological data fresh.
For formatting or special representations (code, numbering, dates), prefer rewriter-generated candidates, whose output is predictable, over neural ones.
When updating models, validate that rewriter coverage remains adequate so neural candidate variability doesn’t reduce final candidate quality.
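For the numbering case, the kind of labeled variants a rewriter produces can be sketched as below. This is not Mozc's or karukan's rewriter code; the function names are hypothetical, and only the digit-by-digit kanji style (2026 → 二〇二六) and circled digits 1 to 20 are covered.

```rust
/// Digit-by-digit kanji numerals, e.g. 2026 -> 二〇二六 (the 二千二十六 style is not covered).
fn kanji_digits(n: u64) -> String {
    const DIGITS: [char; 10] = ['〇', '一', '二', '三', '四', '五', '六', '七', '八', '九'];
    n.to_string()
        .chars()
        .map(|c| DIGITS[c.to_digit(10).unwrap() as usize])
        .collect()
}

/// Full-width digits: '0'..'9' (U+0030..) map to '０'..'９' (U+FF10..).
fn full_width(n: u64) -> String {
    n.to_string()
        .chars()
        .map(|c| char::from_u32(c as u32 - 0x30 + 0xFF10).unwrap())
        .collect()
}

/// This sketch only handles 1..=20, which sit contiguously at ① U+2460 to ⑳ U+2473.
fn circled(n: u64) -> Option<char> {
    if (1..=20).contains(&n) {
        char::from_u32(0x2460 + n as u32 - 1)
    } else {
        None
    }
}

/// Variants paired with a short annotation, in the spirit of a rewriter's labeled candidates.
fn numeric_variants(n: u64) -> Vec<(String, &'static str)> {
    let mut v = vec![
        (full_width(n), "full-width"),
        (kanji_digits(n), "kanji numerals"),
        (format!("0x{:X}", n), "hexadecimal"),
    ];
    if let Some(c) = circled(n) {
        v.push((c.to_string(), "circled digit"));
    }
    v
}

fn main() {
    for (text, note) in numeric_variants(12) {
        println!("{text}\t({note})");
    }
}
```

Because these variants are generated by fixed rules, they come out the same every time, which is why they are the safer choice for formatted input.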

Important Notice: The Mozc-derived rewriter and dictionary data are under BSD-3 terms, so their license notices must be preserved; confirm usage rights for all bundled data before deployment.

Summary: karukan combines Sudachi for morphology/coverage, the neural model for contextual selection, and Mozc’s rewriter for formatted candidates—yielding balanced accuracy and candidate richness.


Why does karukan use `llama.cpp` with a GPT-2-style model for inference, and what are the advantages of that choice?

Core Analysis

Rationale: karukan runs a GPT-2-style model through llama.cpp so that inference stays local and controllable (private, offline), while the autoregressive model generates kana→kanji candidates token by token, conditioned on the preceding text and the kana input.
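Assuming the model is prompted with the left context and the kana input and then decodes the kanji output token by token (a common setup for decoder-only models, not confirmed here for karukan), a candidate's score factorizes as follows, with c the left context, k the kana input, and y the candidate's tokens (notation ours):

```latex
\log P(y \mid c, k) = \sum_{t=1}^{|y|} \log P\left(y_t \mid c, k, y_{<t}\right)
```

Candidates can then be ranked by this sum, for example during beam search, which is how the surrounding context influences which kanji spelling wins.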

Technical Advantages

Local inference & privacy: llama.cpp allows inference without cloud services, reducing data leakage risk.
Feasibility: GPT-2 style autoregressive architecture suits mapping kana sequences to kanji candidates; training and deployment are well-understood.
Portability and footprint: llama.cpp runs on CPUs across platforms, can offload work to GPUs, and supports quantized weights that lower memory use and latency.
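To make the memory side of quantization concrete, the arithmetic below estimates weight storage for a hypothetical 100-million-parameter model (karukan's actual model size is not stated here). In llama.cpp's Q8_0 and Q4_0 formats, each block of 32 weights carries one 16-bit scale, giving 8.5 and 4.5 bits per weight.

```rust
/// Approximate bits per weight, including per-block scales.
/// Q8_0 and Q4_0 store blocks of 32 weights plus one f16 scale per block.
fn bits_per_weight(format: &str) -> Option<f64> {
    match format {
        "F16" => Some(16.0),
        "Q8_0" => Some((32.0 * 8.0 + 16.0) / 32.0), // 8.5
        "Q4_0" => Some((32.0 * 4.0 + 16.0) / 32.0), // 4.5
        _ => None,
    }
}

/// Weight storage only, in megabytes (10^6 bytes); KV cache and runtime buffers are ignored.
fn weight_megabytes(params: u64, format: &str) -> Option<f64> {
    bits_per_weight(format).map(|bits| params as f64 * bits / 8.0 / 1e6)
}

fn main() {
    let params = 100_000_000; // hypothetical parameter count, not karukan's actual model
    for format in ["F16", "Q8_0", "Q4_0"] {
        println!("{format}: ~{:.2} MB", weight_megabytes(params, format).unwrap());
    }
}
```

For 10^8 parameters this works out to 200 MB (F16), 106.25 MB (Q8_0), and 56.25 MB (Q4_0) of weights; the KV cache, runtime buffers, and any tensors kept at higher precision come on top.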

Practical Recommendations

Use quantized models or llama.cpp acceleration options on desktop machines to ensure responsive live conversion.
For stricter latency or concurrency requirements, consider GPU offload (llama.cpp supports backends such as Metal and CUDA) or a more efficient inference backend.
Keep model and Sudachi dictionary versions aligned to avoid candidate quality degradation from mismatches.

Important Notice: Pure CPU inference may still be slow on low-power devices; quantization or hardware acceleration is often necessary.

Summary: The choice balances privacy, deployability, and sequence prediction capability—suitable for local, context-aware conversion but requires performance tuning for real-time responsiveness.


What is the learning curve and common issues when using karukan, and how can they be avoided?

Core Analysis

Learning Curve:
- End users (using packaged binaries/system bundles): Low to moderate; they mainly need to switch to the IME and enable live conversion.
- Developers/ops: Building from source, preparing model weights, and generating Sudachi dictionaries require mid-to-high skill (Rust/C builds, converting models to llama.cpp's GGUF format, the dictionary pipeline).

Common Issues and Mitigations

Initial model download time and disk usage: The first run fetches the model from Hugging Face; pre-download and cache the model files or provide prepackaged images.
Latency on low-power devices: Use quantized models, enable llama.cpp acceleration options, or fall back to dictionary-only mode.
Dictionary/model mismatches: Use karukan-cli to build and verify Sudachi system dictionaries; include integrity checks in CI.
Platform integration problems: Follow karukan-fcitx5 and karukan-macos READMEs; test plugin loading, signing, and permissions ahead of time.
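The CI integrity check can be as simple as comparing each artifact against a checksum recorded when the model and dictionary were built together. A minimal sketch, assuming a hypothetical manifest of expected checksums; it uses 64-bit FNV-1a so that only the standard library is needed. FNV-1a catches accidental mismatches but is not tamper-resistant, so a cryptographic hash such as SHA-256 is the better choice when that matters.

```rust
use std::fs;
use std::path::Path;

/// 64-bit FNV-1a: cheap and deterministic, enough to catch a stale or mismatched file,
/// but not a cryptographic hash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

fn verify_bytes(name: &str, bytes: &[u8], expected: u64) -> Result<(), String> {
    let got = fnv1a64(bytes);
    if got == expected {
        Ok(())
    } else {
        Err(format!("{name}: expected {expected:016x}, got {got:016x}"))
    }
}

/// What a CI step might call for each artifact listed in the (hypothetical) manifest.
fn verify_file(path: &Path, expected: u64) -> Result<(), String> {
    let bytes = fs::read(path).map_err(|e| format!("{}: {e}", path.display()))?;
    verify_bytes(&path.display().to_string(), &bytes, expected)
}

fn main() {
    let dict = b"system dictionary bytes";
    let expected = fnv1a64(dict); // in CI this value would come from the manifest
    println!("{:?}", verify_bytes("system.dict", dict, expected));
    println!("{:?}", verify_bytes("system.dict", b"rebuilt with other data", expected));
    // A missing artifact is reported as an error rather than a panic.
    println!("{:?}", verify_file(Path::new("missing.gguf"), expected));
}
```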

Practical Recommendations

Non-developers should use official binaries/packages; developers should script model download/quantization/dictionary building.
Enable and keep “conversion learning” active so the engine adapts to personal usage over time.
Perform end-to-end tests before production use, including latency measurement and candidate-accuracy sampling.
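For the latency part of end-to-end testing, a small harness that reports percentiles (rather than only the mean) shows whether occasional slow conversions will be felt while typing. A sketch using only the standard library; the closure passed to `measure` is a hypothetical stand-in for one conversion request.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Nearest-rank percentile over an ascending-sorted, non-empty slice.
fn percentile(sorted: &[Duration], p: u32) -> Duration {
    assert!(!sorted.is_empty() && p <= 100);
    let n = sorted.len();
    // rank = ceil(p/100 * n), computed in integers; p = 0 maps to the minimum.
    let rank = (p as usize * n + 99) / 100;
    sorted[rank.max(1) - 1]
}

/// Times `runs` calls of `f` and returns the samples sorted ascending.
fn measure<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples
}

fn main() {
    // Hypothetical stand-in for one live-conversion request; replace with the real call.
    let samples = measure(200, || {
        black_box((0..50_000u64).sum::<u64>());
    });
    println!(
        "p50 = {:?}, p95 = {:?}, max = {:?}",
        percentile(&samples, 50),
        percentile(&samples, 95),
        samples.last().unwrap()
    );
}
```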

Important Notice: On constrained devices, prepare a fallback (dictionary-only) to maintain typing responsiveness.

Summary: End users can quickly try karukan’s core features; for stable, high-performance operation, developers must automate model quantization, dictionary builds, and platform integration.


How can karukan be optimized on resource-constrained devices (low-end CPU or no GPU) to achieve usable real-time conversion?

Core Analysis

Problem: Local inference via llama.cpp on low-power devices can introduce latency that breaks live conversion responsiveness.

Optimization Strategies (Actionable)

Model quantization: Quantize the model to 8-bit or 4-bit (where supported) to substantially cut memory use and inference time. Use karukan-cli or existing quantization tools, such as llama.cpp's llama-quantize, to produce the quantized weights.
Tweak inference parameters: Match llama.cpp's thread count and batch size to the target CPU, and shorten the context window so fewer tokens are processed per inference, which lowers latency.
Tiered neural conversion: Enable full neural conversion only at sentence end or on specific triggers; rely on dictionary candidates during rapid typing.
Warmup and resident process: Keep the engine running as a resident process to avoid cold-start delays, or run a hidden warmup inference when the IME loads.
Prepare a fallback: Auto-fallback to dictionary/rewriter mode when inference is too slow or resources are constrained.
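The tiered-conversion and fallback strategies can be combined into one small policy. A sketch under assumed thresholds (a latency budget, a 120 ms keystroke gap for "rapid typing", and an exponential moving average of measured latency); none of these names or values come from karukan.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Neural,
    DictionaryOnly,
}

/// Hypothetical policy object; names and thresholds are illustrative, not karukan settings.
struct Tiering {
    budget: Duration,          // acceptable neural latency per request
    fast_typing_gap: Duration, // keystrokes closer than this count as rapid typing
    ema_ms: Option<f64>,       // exponential moving average of measured neural latency
    alpha: f64,                // EMA smoothing factor in (0, 1]
}

impl Tiering {
    fn new(budget: Duration) -> Self {
        Tiering { budget, fast_typing_gap: Duration::from_millis(120), ema_ms: None, alpha: 0.3 }
    }

    fn record_neural_latency(&mut self, latency: Duration) {
        let ms = latency.as_secs_f64() * 1000.0;
        self.ema_ms = Some(match self.ema_ms {
            None => ms,
            Some(prev) => self.alpha * ms + (1.0 - self.alpha) * prev,
        });
    }

    fn choose(&self, keystroke_gap: Duration, at_sentence_end: bool) -> Mode {
        // Sentence end tolerates latency best, and always running the model there keeps
        // the latency estimate fresh so the engine can recover from a fallback.
        if at_sentence_end {
            return Mode::Neural;
        }
        let budget_ms = self.budget.as_secs_f64() * 1000.0;
        if self.ema_ms.map_or(false, |ms| ms > budget_ms) {
            return Mode::DictionaryOnly; // engine currently too slow: fall back
        }
        if keystroke_gap < self.fast_typing_gap {
            Mode::DictionaryOnly // rapid typing: keep candidates instant
        } else {
            Mode::Neural
        }
    }
}

fn main() {
    let mut t = Tiering::new(Duration::from_millis(50));
    println!("{:?}", t.choose(Duration::from_millis(300), false)); // no measurements yet
    t.record_neural_latency(Duration::from_millis(200));
    println!("{:?}", t.choose(Duration::from_millis(300), false));
    println!("{:?}", t.choose(Duration::from_millis(300), true));
}
```

Always running the model at sentence end is a deliberate choice in this sketch: without fresh measurements, a fallback triggered by one slow period would never be lifted.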

Practical Recommendations

Run benchmarks (AJIMEE-Bench or custom) on target devices to measure latency/quality trade-offs for different quantization/settings.
Expose performance modes (high-precision vs resource-saver) for users to choose.
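Performance modes can be expressed as presets that bundle the knobs discussed above: weight quantization, thread count, and how much preceding text is fed to the model. A sketch with hypothetical names and placeholder values meant to be replaced by benchmark results; trimming counts characters rather than bytes so multi-byte kana and kanji are never split.

```rust
use std::num::NonZeroUsize;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq)]
enum PerfMode {
    HighPrecision,
    ResourceSaver,
}

/// Hypothetical settings bundle; values are placeholders to be tuned by benchmarking.
#[derive(Debug, PartialEq)]
struct InferenceSettings {
    weights: &'static str,    // which quantized weight file to load
    threads: usize,           // inference threads
    max_context_chars: usize, // how much preceding text is fed to the model
}

fn settings_for(mode: PerfMode, cores: usize) -> InferenceSettings {
    match mode {
        PerfMode::HighPrecision => {
            InferenceSettings { weights: "Q8_0", threads: cores.max(1), max_context_chars: 64 }
        }
        PerfMode::ResourceSaver => {
            InferenceSettings { weights: "Q4_0", threads: (cores / 2).max(1), max_context_chars: 16 }
        }
    }
}

/// Keeps only the last `max_chars` characters (not bytes) of the left context.
fn trim_left_context(context: &str, max_chars: usize) -> &str {
    let n = context.chars().count();
    if n <= max_chars {
        return context;
    }
    if max_chars == 0 {
        return "";
    }
    // Byte offset of the first character to keep; always a valid char boundary.
    let (start, _) = context.char_indices().nth(n - max_chars).unwrap();
    &context[start..]
}

fn main() {
    let cores = thread::available_parallelism().map(NonZeroUsize::get).unwrap_or(1);
    let s = settings_for(PerfMode::ResourceSaver, cores);
    println!("{s:?}");
    println!("{}", trim_left_context("昨日は雨でしたが今日はいい天気", s.max_context_chars));
}
```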

Important Notice: Quantization costs some accuracy, usually more at 4-bit than at 8-bit; validate candidate quality on realistic text samples.

Summary: By quantization, parameter tuning, tiered usage, and fallback mechanisms, karukan can be made usable on constrained devices, though trade-offs between latency and accuracy remain.


✨ Highlights

Offline neural kana–kanji conversion powered by llama.cpp
Supports live input with context‑aware candidate generation
The first run downloads the model from Hugging Face, which is slow and disk-heavy
Limited community contributions and releases; long-term maintenance is uncertain

🔧 Engineering

Neural kana→kanji conversion using a GPT‑2‑style model that leverages context and learns user selections
Provides frontends for Linux (fcitx5) and macOS (InputMethodKit), plus dictionary build and CLI tools
Candidate rewriter ported from Mozc generates varied candidates (half/full width, numeric formats) with annotations
Supports emoji and :trigger style inputs and builds system dictionaries from Sudachi data
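Learning from user selections can be approximated by counting which surface form the user picks for each reading and promoting it next time. The sketch below is a simple frequency-based stand-in, not karukan's actual learning mechanism; the type and method names are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Counts how often the user picked `surface` for a given `reading`.
/// Hypothetical frequency stand-in, not karukan's learning mechanism.
#[derive(Default)]
struct SelectionHistory {
    counts: HashMap<(String, String), u32>,
}

impl SelectionHistory {
    fn record(&mut self, reading: &str, surface: &str) {
        *self.counts.entry((reading.to_string(), surface.to_string())).or_insert(0) += 1;
    }

    fn count(&self, reading: &str, surface: &str) -> u32 {
        self.counts
            .get(&(reading.to_string(), surface.to_string()))
            .copied()
            .unwrap_or(0)
    }

    /// Moves previously chosen candidates up; `sort_by_key` is stable, so ties keep model order.
    fn rerank(&self, reading: &str, candidates: &mut [String]) {
        candidates.sort_by_key(|c| Reverse(self.count(reading, c)));
    }
}

fn main() {
    let mut history = SelectionHistory::default();
    history.record("きしゃ", "汽車");
    let mut cands = vec!["記者".to_string(), "汽車".to_string(), "貴社".to_string()];
    history.rerank("きしゃ", &mut cands);
    println!("{cands:?}");
}
```

Relying on a stable sort means that, with no history, the model's original ranking is left untouched.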

⚠️ Risks

Inference relies on local CPU/GPU and memory — poor UX on low‑spec machines
Initial model download is time‑consuming and requires network access, hindering immediate use
Includes Mozc‑derived data (BSD‑3) — third‑party license notices must be preserved to meet legal obligations
Sparse community activity and release history make commercial deployment and long-term support uncertain

👥 For who?

Japanese power users and privacy‑minded typists on Linux/macOS
NLP/IME researchers and developers evaluating offline neural conversion and candidate rewriting
Open‑source users wanting offline operation, dictionary extension, and frontend customization