From 93d1a67c3668fe36f5379467fdedf219c083d687 Mon Sep 17 00:00:00 2001
From: Gu
Date: Thu, 6 Nov 2025 12:28:12 -0800
Subject: [PATCH] docs: update tools/translate/README.md to reflect current system
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Major corrections:
- Remove outdated info about not overwriting files (we now have incremental updates)
- Fix language directories (zh-hans → cn, ja-jp → jp)
- Add surgical reconciliation documentation (move/rename detection)
- Add PR workflow explanation (execute vs update)
- Add move/rename testing examples with expected log output
- Add troubleshooting for move/rename issues
- Update key files list with correct workflow names
- Remove completed TODO about supporting updates
- Remove deployment section (repo-specific)
---
 tools/translate/README.md | 222 +++++++++++++++++++++++++-------------
 1 file changed, 146 insertions(+), 76 deletions(-)

diff --git a/tools/translate/README.md b/tools/translate/README.md
index 169cc72d..9c1730b5 100644
--- a/tools/translate/README.md
+++ b/tools/translate/README.md
@@ -1,125 +1,195 @@
 # Automatic Document Translation
 
-Multi-language document auto-translation system based on GitHub Actions and Dify AI, supporting English, Chinese, and Japanese trilingual translation.
+Multi-language document auto-translation system based on GitHub Actions and Dify AI, supporting English, Chinese, and Japanese.
 
-> **Other Languages**: [中文](README.md) | [日本語](README_JA.md)
+> **Other Languages**: [中文](README_CN.md) | [日本語](README_JA.md)
 
 ## How It Works
 
-1. **Trigger Condition**: Automatically runs when pushing to non-main branches
-2. **Smart Detection**: Automatically identifies modified `.md/.mdx` files and determines source language
-3. **Translation Logic**:
-   - ✅ Translates new documents to other languages
-   - ❌ Skips existing translation files (avoids overwriting manual edits)
-4. **Auto Commit**: Translation results are automatically pushed to the current branch
+### Workflow Triggers
+
+1. **Execute Workflow** (New PRs):
+   - Triggers when PR is opened with `.md/.mdx` changes in `en/` directory
+   - Creates translation PR with fresh translations for all changed files
+   - Translation PR tracks the source PR
+
+2. **Update Workflow** (Incremental Changes):
+   - Triggers on new commits to source PR
+   - Updates existing translation PR with incremental changes
+   - **Context-aware translation**: Uses existing translation + git diff for modified files
+   - **Surgical reconciliation**: Detects and applies move/rename operations
+
+### Translation Operations
+
+- ✅ **New files**: Fresh translation to all target languages
+- ✅ **Modified files**: Context-aware update using existing translation + git diff
+- ✅ **Deleted files**: Removed from all language sections + physical files
+- ✅ **Moved files**: Detected via `group_path` changes, applied with index-based navigation
+- ✅ **Renamed files**: Detected when deleted+added in same location, preserves file extensions
+
+### Surgical Reconciliation
+
+Automatically detects structural changes in `docs.json`:
+
+- **Move detection**: Same file, different `group_path` → moves cn/jp files to same nested location using index-based navigation
+- **Rename detection**: File deleted+added in same location → renames cn/jp files with extension preserved
+- **Index-based navigation**: Groups matched by position, not name (works across translations: "Nodes" ≠ "节点")
 
 ## System Features
 
-- 🌐 **Multi-language Support**: Configuration-based language mapping, theoretically supports any language extension
-- 📚 **Terminology Consistency**: Built-in professional terminology database, LLM intelligently follows terminology to ensure unified technical vocabulary translation
-- 🔄 **Concurrent Processing**: Smart concurrency control, translates multiple target languages simultaneously
-- 🛡️ **Fault Tolerance**: 3-retry mechanism with exponential backoff strategy
-- ⚡ **Incremental Translation**: Only processes changed files, avoids redundant work
-- 🧠 **High-Performance Models**: Uses high-performance LLM models to ensure translation quality
+- 🌐 **Multi-language Support**: Configuration-based language mapping (`config.json`)
+- 📚 **Terminology Consistency**: Built-in professional terminology database (`termbase_i18n.md`)
+- 🔄 **Incremental Updates**: Context-aware translation using git diff for modified files
+- 🎯 **Surgical Reconciliation**: Automatic detection and application of move/rename operations
+- 🛡️ **Fault Tolerance**: Retry mechanism with exponential backoff
+- ⚡ **Efficient Processing**: Only processes changed files since last commit
+
+## Language Directories
+
+- **General docs**: `en/` (source) → `cn/`, `jp/` (targets)
+- **Plugin dev docs**: `plugin-dev-en/` → `plugin-dev-zh/`, `plugin-dev-ja/`
+- **Versioned docs**: `versions/{version}/en-us/` → `versions/{version}/zh-cn/`, `versions/{version}/jp/`
+
+Configuration in `tools/translate/config.json`.
 
 ## Usage
 
 ### For Document Writers
 
-1. Write/modify documents in any language directory
-2. Push to branch (non-main)
-3. Wait 0.5-1 minute for automatic translation completion
-4. **View Translation Results**:
-   - Create Pull Request for local viewing and subsequent editing
-   - Or view Actions push commit details on GitHub to directly review translation quality
+1. Create branch from main
+2. Add/modify/delete files in `en/` directory
+3. Update `docs.json` if adding/removing/moving/renaming files
+4. Push to branch → workflow creates translation PR automatically
+5. Make additional changes → workflow updates translation PR incrementally
+6. Review and merge translation PR
 
-### Supported Language Directories
+### Testing Moves & Renames
 
-- **General Documentation**: `en/` ↔ `zh-hans/` ↔ `ja-jp/`
-- **Plugin Development Documentation**: `plugin-dev-en/` ↔ `plugin-dev-zh/` ↔ `plugin-dev-ja/`
+**Move**: Edit `docs.json` to move file between groups (e.g., Getting Started → Nodes)
+```json
+// Before: en/test-file in "Getting Started" group
+// After: en/test-file in "Nodes" group
+```
 
-Note: System architecture supports extending more languages, just modify configuration files
+**Rename**: Rename file + update `docs.json` entry
+```bash
+git mv en/old-name.md en/new-name.md
+# Update docs.json: "en/old-name" → "en/new-name"
+```
 
-## Important Notes
+Logs will show:
+```
+INFO: Detected 1 moves, 0 renames, 0 adds, 0 deletes
+INFO: Moving en/test-file from 'Dropdown > GroupA' to 'Dropdown > GroupB'
+SUCCESS: Moved cn/test-file to new location
+SUCCESS: Moved jp/test-file to new location
+```
 
-- System only translates new documents, won't overwrite existing translations
-- To update existing translations, manually delete target files then retrigger
-- Terminology translation follows professional vocabulary in `termbase_i18n.md`, LLM has intelligent terminology recognition capabilities
-- Translation quality depends on configured high-performance models, recommend using high-performance base models in Dify Studio
+## Configuration
 
-### System Configuration
+### Language Settings
 
-#### Terminology Database
+Edit `tools/translate/config.json`:
 
-Edit `tools/translate/termbase_i18n.md` to update professional terminology translation reference table.
+```json
+{
+  "source_language": "en",
+  "target_languages": ["cn", "jp"],
+  "languages": {
+    "en": {"code": "en", "name": "English", "directory": "en"},
+    "cn": {
+      "code": "cn",
+      "name": "Chinese",
+      "directory": "cn",
+      "translation_notice": "⚠️ AI translation..."
+    }
+  }
+}
+```
 
-#### Translation Model
+### Terminology Database
 
-Visit Dify Studio to adjust translation prompts or change base models.
+Edit `tools/translate/termbase_i18n.md` to update professional terminology translations.
 
----
+### Translation Model
 
-## 🔧 Development and Deployment Configuration
+Configure in Dify Studio - adjust prompts or change base models.
 
-### Local Development Environment
+## Local Development
 
-#### 1. Create Virtual Environment
+### Setup
 
 ```bash
 # Create virtual environment
 python -m venv venv
+source venv/bin/activate # macOS/Linux
+# venv\Scripts\activate # Windows
 
-# Activate virtual environment
-# macOS/Linux:
-source venv/bin/activate
-# Windows:
-# venv\Scripts\activate
-```
-
-#### 2. Install Dependencies
-
-```bash
+# Install dependencies
 pip install -r tools/translate/requirements.txt
+
+# Configure API key
+echo "DIFY_API_KEY=your_key" > tools/translate/.env
 ```
 
-#### 3. Configure API Key
-
-Create `.env` file in `tools/translate/` directory:
+### Run Translation
 
 ```bash
-DIFY_API_KEY=your_dify_api_key_here
-```
-
-#### 4. Run Translation
-
-```bash
-# Interactive mode (recommended for beginners)
+# Interactive mode
 python tools/translate/main.py
 
-# Command line mode (specify file)
-python tools/translate/main.py path/to/file.mdx [DIFY_API_KEY]
+# Specify file
+python tools/translate/main.py path/to/file.mdx
 ```
 
-> **Tip**: Right-click in IDE and select "Copy Relative Path" to use as parameter
+### Test Surgical Reconciliation
 
-### Deploy to Other Repositories
+```bash
+# Test locally with git refs
+cd tools/translate
+python -c "
+from sync_and_translate import DocsSynchronizer
+import asyncio
+import os
 
-1. **Copy Files**:
-   - `.github/workflows/translate.yml`
-   - `tools/translate/` entire directory
+api_key = os.getenv('DIFY_API_KEY')
+sync = DocsSynchronizer(api_key)
 
-2. **Configure GitHub Secrets**:
-   - Repository Settings → Secrets and variables → Actions
-   - Add `DIFY_API_KEY` secret
+# Test with specific commits
+logs = sync.reconcile_docs_json_structural_changes('base_sha', 'head_sha')
+for log in logs:
+    print(log)
+"
+```
 
-3. **Test**: Modify documents in branch to verify automatic translation functionality
+## Troubleshooting
 
-### Technical Details
+### Translation Issues
 
-- Concurrent translation limited to 2 tasks to avoid excessive API pressure
+- **HTTP 504**: Verify `response_mode: "streaming"` in `main.py`
+- **Missing output**: Check Dify workflow has output variable `output1`
+- **Failed workflow**: Review Dify workflow logs for node errors
+
+### Move/Rename Issues
+
+- **Not detected**: Check logs for "INFO: Detected X moves, Y renames" - verify `group_path` changed
+- **Wrong location**: Structure mismatch between languages - verify group indices align
+- **File not found**: Ensure file has .md or .mdx extension
+
+## Key Files
+
+- `config.json` - Language configuration (single source of truth)
+- `termbase_i18n.md` - Translation terminology database
+- `sync_and_translate.py` - Core translation + surgical reconciliation logic
+- `main.py` - Local translation tool with Dify API integration
+- `translate_pr.py` - PR workflow orchestration
+- `.github/workflows/sync_docs_execute.yml` - Execute workflow (new PRs)
+- `.github/workflows/sync_docs_update.yml` - Update workflow (incremental changes)
+
+## Technical Details
+
+- Concurrent translation limited to 2 tasks for API stability
 - Supports `.md` and `.mdx` file formats
-- Based on Dify API workflow mode
-
-## TODO
-
-- [ ] Support updating existing translations
+- Based on Dify API streaming mode
+- Index-based navigation for language-independent group matching
+- Extension detection and preservation for rename operations
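
Note on index-based navigation: the README text above says groups are matched by position rather than by name, so that "Nodes" in `en` lines up with "节点" in `cn`. The sketch below is one way that idea could work and is not taken from `sync_and_translate.py`. The navigation shape (`{"group": ..., "pages": [...]}` with nested groups inside `pages`) and the helper names `find_index_path`, `resolve_group`, `remove_page`, and `mirror_move` are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical navigation shape (an assumption, not the real docs.json schema):
# each group is {"group": <name>, "pages": [<page path str> | <nested group>]}.


def find_index_path(groups: list, page: str, prefix: tuple = ()) -> Optional[tuple]:
    """Return the tuple of group indices leading to `page`, or None if absent.

    Indices at nested levels count only nested groups, not plain page entries,
    so differing page counts between languages do not shift the path.
    """
    for i, group in enumerate(groups):
        path = prefix + (i,)
        nested = []
        for entry in group["pages"]:
            if entry == page:
                return path
            if isinstance(entry, dict):
                nested.append(entry)
        found = find_index_path(nested, page, path)
        if found is not None:
            return found
    return None


def resolve_group(groups: list, index_path: tuple) -> dict:
    """Follow `index_path` through nested groups, ignoring group names."""
    group = groups[index_path[0]]
    for i in index_path[1:]:
        nested = [e for e in group["pages"] if isinstance(e, dict)]
        group = nested[i]
    return group


def remove_page(groups: list, page: str) -> bool:
    """Remove the first occurrence of `page` anywhere in the tree."""
    for group in groups:
        if page in group["pages"]:
            group["pages"].remove(page)
            return True
        nested = [e for e in group["pages"] if isinstance(e, dict)]
        if remove_page(nested, page):
            return True
    return False


def mirror_move(en_groups: list, target_groups: list, en_page: str, target_page: str) -> tuple:
    """Place `target_page` in the group at the same index path as `en_page`."""
    path = find_index_path(en_groups, en_page)
    if path is None:
        raise KeyError(en_page)
    remove_page(target_groups, target_page)
    resolve_group(target_groups, path)["pages"].append(target_page)
    return path


# English tree after the move; the Chinese tree still has the old layout.
en = [
    {"group": "Getting Started", "pages": ["en/intro"]},
    {"group": "Nodes", "pages": ["en/llm", "en/test-file"]},
]
cn = [
    {"group": "快速开始", "pages": ["cn/intro", "cn/test-file"]},
    {"group": "节点", "pages": ["cn/llm"]},
]
path = mirror_move(en, cn, "en/test-file", "cn/test-file")
```

After the call, `cn/test-file` sits in the second group of the Chinese tree even though the group names differ. Matching by position only works while every language keeps the same group layout, which fits the "Wrong location" troubleshooting entry about verifying that group indices align.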