Bug: new_target includes file extension (.mdx), but docs.json entries
don't store extensions. This caused entries like:
- cn/documentation/pages/nodes/test-final-step2.mdx (WRONG)
Instead of:
- cn/documentation/pages/nodes/test-final-step2 (CORRECT)
Fixed by stripping extension before calling add_file_to_navigation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
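For illustration, the extension stripping described above could look like the following minimal sketch. The helper name `strip_doc_extension` and the `DOC_EXTENSIONS` set are hypothetical and not taken from the repository; the sketch assumes only `.md` and `.mdx` need to be removed.

```python
from pathlib import PurePosixPath

# Hypothetical set of documentation extensions that docs.json never stores.
DOC_EXTENSIONS = {".md", ".mdx"}


def strip_doc_extension(path: str) -> str:
    """Return the path without a trailing .md/.mdx; other paths are unchanged."""
    p = PurePosixPath(path)
    if p.suffix.lower() in DOC_EXTENSIONS:
        return str(p.with_suffix(""))
    return path
```

A caller would apply this to `new_target` before handing it to `add_file_to_navigation`.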
When files_to_sync is empty (e.g., R100 rename with no content changes),
detect_file_changes doesn't run, so renamed_files is empty.
Previously, reconcile ALWAYS skipped rename detection, so it treated
renames as separate delete+add operations.
Now, reconcile only skips rename detection if git actually found renames.
Otherwise, it uses heuristic-based detection to handle the rename properly.
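A rough sketch of the decision described above, assuming rename information comes from `git diff --name-status -M` output. The function names are hypothetical; the real reconcile code may obtain `renamed_files` differently.

```python
def parse_git_renames(name_status_output: str) -> list[tuple[str, str]]:
    """Return (old_path, new_path) pairs from `git diff --name-status -M` output."""
    renames = []
    for line in name_status_output.splitlines():
        fields = line.split("\t")
        # Rename records look like "R100<TAB>old<TAB>new"; the number is the
        # similarity score, so R100 means identical content.
        if len(fields) == 3 and fields[0].startswith("R"):
            renames.append((fields[1], fields[2]))
    return renames


def should_use_heuristic_detection(git_renames: list[tuple[str, str]]) -> bool:
    """Fall back to heuristic matching only when git reported no renames."""
    return not git_renames
```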
The bug: _handle_rename was looking up file location using paths WITH extensions (e.g., 'en/.../file.mdx'), but docs.json entries don't include extensions.
This caused location lookup to fail, skipping the docs.json update step entirely.
The reconcile function was detecting deleted files from structural
changes but not processing them. This caused orphaned files in cn/jp
when English files were deleted.
Changes:
- Added delete loop after rename handling
- Removes deleted files from docs.json navigation
- Deletes physical files from cn/jp directories
- Properly handles file extensions (.md, .mdx)
Fixes the issue where renamed files (with content changes) were
treated as delete+add but the delete was never processed.
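The navigation part of the delete loop might be sketched as below. This assumes a navigation shape where a `pages` list holds either path strings or nested group dicts with their own `pages` key; `remove_page` is a hypothetical name, and deleting the physical cn/jp file is left out.

```python
def remove_page(pages: list, target: str) -> bool:
    """Remove `target` from a pages list, descending into nested groups.

    Returns True if an entry was removed, False if it was not found.
    """
    for index, entry in enumerate(pages):
        if isinstance(entry, str):
            if entry == target:
                del pages[index]  # safe: we return immediately after mutating
                return True
        elif isinstance(entry, dict) and remove_page(entry.get("pages", []), target):
            return True
    return False
```

The target passed in should already have its `.md`/`.mdx` extension removed, since docs.json entries do not include one.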
Move operations were incorrectly placing files at the end of target groups
instead of at the specified index position from the English section.
Root cause:
- extract_file_locations() captured page position (idx) but didn't store it
- add_file_to_navigation() used append() instead of insert()
Fix:
- Add "page_index" field to location dict to capture position
- Use insert(page_index, file_path) to preserve ordering from source
This ensures cn/jp files are placed at the exact same position as their
English counterparts when moved between groups.
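As an illustration of the first half of this fix, a location extractor that records `page_index` might look like this. It is a simplified sketch, not the actual `extract_file_locations()`: it assumes `pages` lists contain path strings or nested group dicts, and the `group_path` tuple is an illustrative representation.

```python
def extract_locations(pages: list, group_path: tuple = ()) -> dict:
    """Map each page path to its group path and its index within that group."""
    locations = {}
    for idx, entry in enumerate(pages):
        if isinstance(entry, str):
            # Store the position so a later insert() can reproduce the ordering.
            locations[entry] = {"group_path": group_path, "page_index": idx}
        elif isinstance(entry, dict):
            nested_path = group_path + (entry.get("group"),)
            locations.update(extract_locations(entry.get("pages", []), nested_path))
    return locations
```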
Previously, if a file was added in one commit and deleted in another
within the same PR, it could still appear in the sync plan because
git diff --diff-filter=AM would show it as added when comparing
base to an intermediate commit.
Now we verify that each file actually exists at head_sha before
including it in files_to_sync. This prevents sync PRs from containing
files that were ultimately deleted.
Fixes issue discovered in Test 7 where temp-test-doc.mdx was added
then deleted but still appeared in the sync PR.
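One way to perform the existence check is `git cat-file -e <sha>:<path>`, which exits with status 0 only when the object exists. The sketch below is an assumption about how such a check could be written, not the code from this commit; the function names and the injectable `run` parameter are hypothetical.

```python
import subprocess


def exists_at_commit(path: str, sha: str, run=subprocess.run) -> bool:
    """Return True if `path` exists in the tree of commit `sha`."""
    result = run(["git", "cat-file", "-e", f"{sha}:{path}"], capture_output=True)
    return result.returncode == 0


def filter_existing(paths: list, head_sha: str, run=subprocess.run) -> list:
    """Keep only the paths that are still present at head_sha."""
    return [p for p in paths if exists_at_commit(p, head_sha, run)]
```

The `run` parameter exists only so the check can be exercised without a real repository.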
Changes:
- PR title: "Auto-translations" → "Sync PR #X to cn/jp"
- PR body: More concise, clearer
- Workflow comments: Drastically simplified
- "Translation PR" → "Sync PR"
- Removed verbose sections, kept essential info only
- Success: Link + file count + failures (if any)
- Cancellation: Brief explanation + manual re-run link
- Back-link: One sentence
Before: 20+ line comments with multiple sections
After: 2-5 line comments with just the facts
This makes the multi-language sync feature more straightforward for
doc writers without overwhelming them with information.
- Track group indices in extract_file_locations for language-independent navigation
- Replace name-based group matching with index-based navigation in add_file_to_navigation
- Fixes issue where English group names couldn't match translated names (e.g., 'Nodes' vs '节点')
- Detect and preserve whitespace-only group names (e.g., ' ')
- Use None as marker for whitespace groups during parsing
- Match whitespace groups correctly when navigating structure
Fixes file placement in structures with space-named parent groups.
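A minimal sketch of index-based navigation follows. It assumes, as a simplification, that each index counts positions among the group entries (dicts) of a `pages` list; the actual indexing scheme in `add_file_to_navigation` may differ, and `find_group_by_indices` is a hypothetical name.

```python
def find_group_by_indices(pages: list, group_indices: list) -> list:
    """Follow a path of group positions and return the target pages list.

    Group names are never compared, so 'Nodes' and '节点' resolve to the same
    location as long as they sit at the same position.
    """
    current = pages
    for position in group_indices:
        groups = [entry for entry in current if isinstance(entry, dict)]
        current = groups[position].setdefault("pages", [])
    return current
```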
1. Fix add_file_to_navigation() to navigate through group path
   - Parse group_path to find nested groups within dropdowns
   - Navigate recursively to correct group location
   - Insert file at proper nesting level instead of dropdown root
2. Fix file extension handling in rename operations
   - Check for common extensions (.md, .mdx, no extension)
   - Preserve file extension when renaming
   - Handle docs.json entries that don't include extensions
Fixes:
- Files now placed in correct groups after move operations
- File renames now work correctly for translation files
- Track full group path in extract_file_locations() instead of just dropdown name
- Add rename detection by matching files in same location with different paths
- Implement file rename handling for translation files (cn/jp)
- Update docs.json entries when files are renamed
- Use group_path for accurate location comparison in move operations
Fixes issues where:
1. Moves within same dropdown weren't detected (e.g., Getting Started -> Nodes)
2. File renames weren't replicated to translation files and docs.json
- Add extract_file_locations() to map files to their navigation positions
- Add reconcile_docs_json_structural_changes() to detect and apply moves
- Modify sync_docs_json_incremental() to handle structural changes when no file adds/deletes
- Pass base_sha and head_sha for structural reconciliation
Fixes location changes that don't involve file system changes.
Problem: Smart merge was using json.dump() which reformatted docs.json,
creating huge diffs that made translation PRs hard to review.
Solution: Use save_json_with_preserved_format() which detects and preserves
the original formatting style from the translation branch's docs.json.
Impact:
- Translation PR diffs now show only actual content changes
- Formatting (indentation, spacing) preserved from original
- Much easier to review what actually changed
Problem: Incremental updates were losing existing cn/jp entries because
checking out docs.json from PR HEAD overwrote the translation branch's
entries with main branch state.
Solution: Instead of checking out docs.json from PR HEAD:
1. Get English section from PR HEAD (latest structure)
2. Get cn/jp sections from translation branch (preserve existing)
3. Merge and write to working directory
Impact:
- Incremental updates now preserve all previous translation entries
- Only new files are added to cn/jp navigation
- Existing translations remain intact
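The three-step merge could be sketched roughly as follows. The data shape is an assumption: each language section is modelled as a dict with a `language` key, and `merge_language_sections` is a hypothetical helper, not the function used in the workflow.

```python
import copy


def merge_language_sections(head_sections, branch_sections, source_language="en"):
    """Take the source-language section from PR HEAD and the rest from the branch.

    A target language that does not exist on the translation branch yet falls
    back to whatever PR HEAD contains for it.
    """
    branch_by_lang = {section["language"]: section for section in branch_sections}
    merged = []
    for section in head_sections:
        lang = section["language"]
        if lang == source_language or lang not in branch_by_lang:
            merged.append(copy.deepcopy(section))
        else:
            merged.append(copy.deepcopy(branch_by_lang[lang]))
    return merged
```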
Problem: Update workflow was not syncing docs.json changes to translation PRs.
Root Cause: setup_translation_branch() only checked out en/ files from PR HEAD,
leaving docs.json at the old version from translation branch. When
sync_docs_json_incremental() tried to find new files in navigation, they weren't
there, causing "WARNING: Could not find ... in English navigation" errors.
Solution: Also checkout docs.json from PR HEAD SHA (alongside en/ files) so that
sync_docs_json_incremental() has the correct navigation structure.
Impact:
- docs.json changes now properly sync to translation PRs
- Execute and update workflows now have identical docs.json handling
- Consolidates docs.json sync logic as requested
Consolidates sync_plan.json generation logic from analyze workflow into
reusable SyncPlanGenerator class to ensure both execute and update workflows
use identical file filtering and docs.json change detection.
**Problems Solved:**
1. **Unnecessary Re-translations**: Update workflow was re-translating
unchanged files because it hardcoded structure_changed: true and got
ALL files from PR instead of only A/M files.
2. **docs.json Not Syncing**: Update workflow lacked proper docs.json
change analysis, causing navigation structure to not update correctly.
**Changes:**
- **pr_analyzer.py**: Added SyncPlanGenerator class with:
  - get_changed_files_with_status(): Uses git diff --diff-filter=AM
  - generate_sync_plan(): Identical logic to analyze workflow
  - Proper docs.json change detection (not always true)
- **translate_pr.py**: Updated run_translation_from_pr_analysis() to:
  - Use SyncPlanGenerator instead of hardcoded sync_plan
  - Remove structure_changed: true assumption
  - Add detailed logging of sync plan contents
- **sync_docs_analyze.yml**: Simplified from 70 lines to 40 lines:
  - Replaced inline Python script with SyncPlanGenerator call
  - Maintains all functionality with cleaner code
  - Single source of truth for sync logic
**Benefits:**
✅ Only A/M files are translated (no re-translation of unchanged files)
✅ docs.json changes are properly analyzed (not assumed)
✅ Both workflows use identical sync logic
✅ ~30 lines of duplicate code eliminated
✅ Easier to test and maintain
✅ Consistent behavior across execute and update workflows
* refactor: extract translation logic into reusable Python script
- Created tools/translate/translate_pr.py to consolidate translation workflow
- Refactored sync_docs_execute.yml: reduced from 941 to 513 lines
- Refactored sync_docs_update.yml: reduced from 552 to 381 lines
- Total reduction: ~600 lines of duplicated workflow YAML
Benefits:
- Single source of truth for translation logic
- All fixes (English file removal, branch handling) automatically apply to both workflows
- Easier to test and maintain
- Reusable Python module with proper error handling
The script handles:
- Branch setup (create new or checkout existing)
- Translation of documentation files
- English file removal (fixes the leak bug)
- Committing and pushing changes
- Creating/updating translation PRs
- JSON output for workflow integration
* touch ups
* more touch ups
Root cause: Dify's API gateway has ~60-90s timeout, but modified file
workflows take 2-3 min. The gateway returns 504 before workflow completes,
even though backend processes successfully.
Solution: Switch from blocking to streaming mode
- Streaming responses bypass gateway timeout limits
- Parse Server-Sent Events (SSE) format
- Track workflow_started, node_started, workflow_finished events
- Extract output1 from workflow_finished event
- Fail fast on error events
This eliminates the retry storm (10 requests = 5 retries × 2 langs)
and allows workflows to complete successfully.
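The event handling above can be sketched as a small parser over already-decoded SSE lines. The payload layout used here (`data.status` and `data.outputs` inside the `workflow_finished` event) is an assumption about the response shape, and `extract_workflow_output` is a hypothetical name; the network call itself is omitted.

```python
import json


def extract_workflow_output(lines, output_key="output1"):
    """Scan SSE lines and return the requested output of the finished workflow."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        kind = event.get("event")
        if kind == "error":
            # Fail fast instead of waiting for the stream to end.
            raise RuntimeError(f"workflow error: {event}")
        if kind == "workflow_finished":
            data = event.get("data", {})
            if data.get("status") != "succeeded":
                raise RuntimeError(f"workflow ended with status {data.get('status')!r}")
            return data.get("outputs", {}).get(output_key)
    raise RuntimeError("stream ended without a workflow_finished event")
```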
## Changes
### config.json
- Added `versioned_docs` section with version mappings
- Now includes 2-8-x, 3-0-x, and 3-1-x version paths
- Maps language codes to versioned directory paths
### main.py
- Updated `build_docs_structure()` to load versioned docs from config
- Added dynamic version key conversion (e.g., "2-8-x" → "version_28x")
- Maps language codes to names using config
- Maintains fallback for backward compatibility
## Benefits
- All versioned docs paths now in single config file
- Adding new versions only requires config.json update
- No code changes needed for new doc versions
- Consistent with general docs configuration pattern
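The version key conversion mentioned under main.py is only shown by one example ("2-8-x" → "version_28x"). A sketch consistent with that example, assuming the rule is simply to drop the hyphens and add a `version_` prefix, would be the following; `version_key` is a hypothetical name.

```python
def version_key(version_dir: str) -> str:
    """Convert a versioned directory name such as '2-8-x' into a structure key.

    Assumed rule: remove hyphens and prefix with 'version_'.
    """
    return "version_" + version_dir.replace("-", "")
```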
Added detailed debug output to diagnose translation failures:
- Log response status and structure
- Check workflow execution status explicitly
- Fail fast on failed/stopped/unknown statuses
- Only proceed when status is 'succeeded'
- Enhanced output1 extraction debugging
This will help identify why translations are failing despite Dify
returning successful responses with output1.
## Changes
### Unified Configuration
- Centralized all language configuration in `tools/translate/config.json`
- Added `source_language`, `target_languages`, and `languages` structure
- Merged translation notices from `notices.json` into language configs
- Each language now has: code, name, directory, and translation_notice
### Updated sync_and_translate.py
- Removed hardcoded LANGUAGES dict and TARGET_LANGUAGES list
- Enhanced load_config() with validation
- Added helper methods for language info access
- All methods now use config-based language properties
### Updated main.py
- Added config loading at module level
- Dynamically builds docs_structure from config
- Keeps plugin-dev/versioned paths hardcoded as requested
### Updated workflow
- .github/workflows/sync_docs_execute.yml now loads config
- Replaced all hardcoded language references with config values
### Cleanup
- Removed deprecated notices.json
### Test File
- Added en/testing/config-refactor-test.mdx to test the refactoring
- Added to docs.json to trigger auto-translation workflow
## Benefits
- Single source of truth for language configuration
- Adding new languages requires only config.json changes
- No code changes needed to add/modify languages
- Better validation and error handling
Modified files take 2-3 minutes to translate (much longer than new files)
due to additional context processing. Updated retry strategy:
- Timeout: 180s → 420s (7 minutes)
- Backoff delays: 5s/10s/20s/40s/80s → 30s/60s/120s/240s/300s
- Backoff cap: 120s → 300s (5 minutes)
This prevents premature timeouts and excessive retry pressure on the API
when processing modified files with existing translations and diffs.
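Both delay sequences listed above are consistent with a capped exponential backoff that doubles on each attempt. The sketch below reproduces them under that assumption; `backoff_delays` is a hypothetical helper, and the real retry code may compute delays differently.

```python
def backoff_delays(base: int, cap: int, retries: int) -> list:
    """Return the wait in seconds before each retry: base * 2**attempt, capped."""
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]


new_schedule = backoff_delays(base=30, cap=300, retries=5)
old_schedule = backoff_delays(base=5, cap=120, retries=5)
```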
Improved auto-translation system to better handle updates to existing documentation:
**Core Changes:**
- Updated translate_text() to accept optional `the_doc_exist` and `diff_original` parameters
- Added get_file_diff() helper method to retrieve git diffs for specific files
- Enhanced translate_file_with_notice() to pass optional parameters to translation API
**Modified File Handling:**
- Refactored translate_new_and_modified_files() to distinguish between added and modified files
- For modified files: loads existing translation and retrieves git diff
- For added files: continues with existing flow (no additional inputs needed)
- Passes both existing translation and diff to Dify API for context-aware updates
**Workflow Updates:**
- Updated sync_docs_execute.yml inline secure_sync.py script:
  - Detects file status (added vs modified) using git diff
  - Loads existing translations for modified files
  - Retrieves diffs for modified files
  - Passes appropriate inputs based on file status
**Benefits:**
- Translations of modified files are now generated from the existing translation and the diff
- Maintains translation consistency across updates
- Reduces re-translation of unchanged content
- Improves translation quality for incremental changes
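The added-versus-modified distinction could be sketched as an input builder like the one below. Only the parameter names `the_doc_exist` and `diff_original` come from this commit; the assumption that they carry the existing translation and the git diff respectively, the `original_text` key, and the function name are all illustrative.

```python
def build_translation_inputs(source_text, status, existing_translation=None, diff=None):
    """Build the inputs for one translation request.

    `status` is the git status letter: 'A' for added, 'M' for modified.
    """
    inputs = {"original_text": source_text}  # hypothetical key name
    if status == "M" and existing_translation is not None:
        # Modified files send extra context so only the changed parts are redone.
        inputs["the_doc_exist"] = existing_translation
        inputs["diff_original"] = diff or ""
    return inputs
```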
Previously, translated files were appended to the end of their group's
pages array, resulting in different orderings across languages.
Now files are inserted at the same index position as in the English
structure, maintaining consistent ordering across all languages.
Example:
- English: shortcut-key (pos 0), keyboard-shortcuts-advanced (pos 1)
- Before fix: cn/jp appended keyboard-shortcuts-advanced at end
- After fix: cn/jp insert keyboard-shortcuts-advanced at position 1
Changes:
- Modified add_page_at_location() to extract insertion index from
file_location and use insert() instead of append()
- Maintains fallback to append if index exceeds current array length
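A minimal sketch of the insert-with-fallback behaviour follows; `insert_page` is a hypothetical name and not the body of `add_page_at_location()`. Python's `list.insert` already clamps out-of-range indices, so the explicit branch mainly documents the intent.

```python
def insert_page(pages: list, file_path: str, page_index=None) -> None:
    """Insert file_path at page_index, appending when the index is unusable."""
    if page_index is None or page_index > len(pages):
        pages.append(file_path)
    else:
        pages.insert(page_index, file_path)


cn_pages = ["cn/shortcut-key", "cn/other-page"]
insert_page(cn_pages, "cn/keyboard-shortcuts-advanced", page_index=1)
```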
Improvements:
- Add detailed logging for git diff detection of deleted files
- Add comprehensive logging in sync_docs_json_incremental for deletions
- Track each step of the deletion process to aid debugging
- Add error handling and traceback output for failed operations
- Log dropdown search process and removal attempts
This will help diagnose why cn/jp entries are not being removed from
docs.json when English files are deleted. The test in test_delete_logic.py
proves the remove_page_from_structure method works correctly, so the issue
must be in how deleted_files are detected or how the workflow calls the sync.
Moved padding logic inside group check to only pad when navigating through groups.
This prevents creating null placeholders for file positions that aren't groups.
Fixes issue where docs.json had null entries when adding files to nested groups.
Add add_page_at_location() method that navigates through nested groups
using the file_location path to add files at the correct nested position.
Fixes the issue where files were being added at the top level of dropdowns
instead of inside their proper nested groups (e.g., 'Getting Started').
Test coverage: All existing tests pass + new nested placement test added
Fixed bug where incremental sync couldn't find files in docs.json
because it was searching with extensions (.mdx) but docs.json stores
paths without extensions.
Changes:
- Updated find_file_in_dropdown_structure() to strip extensions
- Updated add_page_to_structure() to strip extensions
- Updated remove_page_from_structure() to strip extensions
- Added comprehensive implementation documentation
This fixes the issue where files weren't added to cn/jp navigation.
Critical bug fix: sync_and_translate.py was using save_json_with_preserved_format()
but didn't import it from json_formatter.py. This caused docs.json structure sync
to fail silently, resulting in CN/JP navigation entries not being added to docs.json
in translation PRs.
Root cause: When I implemented the format-preserving JSON serialization in the
previous commit, I added the call to save_json_with_preserved_format() inside
save_docs_json() but forgot to add the import statement at the top of the file.
Impact: Translation workflow was creating translated files but not updating
docs.json with corresponding CN/JP navigation entries.
Fix: Added missing import statement for save_json_with_preserved_format.
Tested: Manual sync test confirms CN and JP entries are now correctly added.
Completely resolves the massive docs.json diff issue by detecting and
preserving the exact formatting of the original file.
Problem:
- docs.json uses mixed indentation (4→6→8→10 spaces, not 4→8→12→16)
- Previous json.dump() with indent=4 reformatted entire file
- Translation PRs showed 11,000+ line changes instead of 3-6 lines
Solution:
- New json_formatter.py module with format detection
- Detects: indent pattern, spaces per level, trailing newlines
- Custom serializer respects detected formatting exactly
- Handles both consistent (2/4-space) and mixed patterns
Implementation:
- detect_json_format(): Analyzes existing file structure
- format_preserving_json_dump(): Custom JSON serializer
- save_json_with_preserved_format(): High-level save function
- Updated sync_and_translate.py to use new formatter
Testing:
✓ Detects docs.json pattern: [4, 2, 2, 2, ...]
✓ Generates correct indents for all levels
✓ Perfect match on full file serialization
✓ Reference file parameter for temp file writes
Result:
Translation PRs will now show minimal, surgical diffs
(only the actual navigation changes, not 11k reformatted lines)
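To illustrate the detection step, the sketch below derives the per-level indent increments from a pretty-printed JSON text. It is a simplified stand-in for `detect_json_format()`, assuming space-only indentation and that every nesting depth uses a single distinct indent width; `detect_indent_steps` is a hypothetical name.

```python
def detect_indent_steps(text: str) -> list:
    """Return the indent increment for each nesting level, e.g. [4, 2, 2]."""
    widths = set()
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if stripped and len(stripped) != len(line):
            widths.add(len(line) - len(stripped))
    ordered = sorted(widths)
    # Difference between consecutive indent widths, starting from column 0.
    return [current - previous for previous, current in zip([0] + ordered, ordered)]


sample = '{\n    "navigation": {\n      "pages": [\n        "en/intro"\n      ]\n    }\n}\n'
steps = detect_indent_steps(sample)
```

A serializer can then emit `sum(steps[:depth])` spaces at each depth instead of a fixed `indent` value.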
The translation system was using 2-space indentation while docs.json
uses 4-space indentation. This caused the entire file to be reformatted
on every sync, creating massive diffs (11k+ line changes).
Changes:
- Update save_docs_json() to use indent=4
- Add trailing newline for consistency
- Prevents unnecessary reformatting of existing structure
This ensures minimal, surgical changes to docs.json when adding
translations, making PR reviews manageable.