@lobechat/prompts

This package contains prompt chains and templates for the LobeHub application, with comprehensive testing using promptfoo.

Features

  • Prompt Chains: Reusable prompt templates for various AI tasks
  • AI Testing: Comprehensive testing using promptfoo for prompt quality assurance
  • Multi-language Support: Prompts and tests for multiple languages
  • Type Safety: Full TypeScript support with proper type definitions

Available Prompt Chains

  • chainSummaryTitle - Generate conversation titles
  • chainLangDetect - Detect language of input text
  • chainTranslate - Translate content between languages
  • chainPickEmoji - Select appropriate emojis for content
  • chainAnswerWithContext - Answer questions using knowledge base context
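
Each chain is a pure function that assembles a chat payload; it does not call a model itself. A minimal usage sketch (signatures are simplified here and may not match src/chains/ exactly):

import { chainLangDetect, chainSummaryTitle } from '@lobechat/prompts';

// Build the messages for a title-generation request; no network call happens here.
const titlePayload = chainSummaryTitle(
  [{ content: 'How do I deploy LobeChat with Docker?', role: 'user' }],
  'en-US',
);

// Language detection is assembled the same way from raw text.
const langPayload = chainLangDetect('Bonjour tout le monde');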

Testing with promptfoo

This package uses promptfoo for AI-powered testing of prompts. The testing suite evaluates prompt quality, consistency, and performance across different AI models.

Prerequisites

Set up your API keys in your environment:

export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key" # optional

Running Tests

# Run all prompt tests
pnpm test:prompts

# Run tests in watch mode for development
pnpm test:prompts:watch

# Generate summary report
pnpm test:prompts:summary

# Run tests for CI (no cache, structured output)
pnpm test:prompts:ci

# View test results in web UI
pnpm promptfoo:view

Test Configuration

Tests are organized by prompt type in the promptfoo/ directory:

promptfoo/
├── summary-title/
│   ├── eval.yaml      # Test configuration
│   └── prompt.ts      # Prompt wrapper
├── translation/
│   ├── eval.yaml
│   └── prompt.ts
├── language-detection/
│   ├── eval.yaml
│   └── prompt.ts
├── emoji-picker/
│   ├── eval.yaml
│   └── prompt.ts
└── knowledge-qa/
    ├── eval.yaml
    └── prompt.ts

Each test configuration includes:

  • Multiple test cases with different inputs
  • Assertions for output validation (regex, JSON, custom logic; a custom-assertion sketch follows this list)
  • LLM-based rubric evaluation for semantic correctness
  • Performance and cost monitoring
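
For the custom-logic case, promptfoo can load a JavaScript/TypeScript assertion from a file via the javascript assertion type. A minimal sketch (the file name and pass criterion are hypothetical):

// promptfoo/asserts/title-length.ts (hypothetical)
// Referenced from eval.yaml as:
//   - type: javascript
//     value: file://asserts/title-length.ts
export default (output: string) => {
  // Pass only when the generated title is non-empty and reasonably short.
  return output.trim().length > 0 && output.length <= 60;
};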

Test Structure

Tests directly use the actual prompt chain functions from src/chains/. The TypeScript wrapper files in promptfoo/prompts/ import and call the real chain functions, so the prompts under test can never drift from the shipped implementation.
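
A wrapper is typically just a promptfoo prompt function that forwards the test case's vars to the chain. The sketch below is illustrative; the exact chainSummaryTitle signature and return shape are assumptions, not copied from src/chains/:

// promptfoo/prompts/summary-title.ts (illustrative sketch)
import { chainSummaryTitle } from '../../src/chains/summaryTitle';

// promptfoo invokes the default export with the test case's vars.
export default async function ({ vars }: { vars: { messages: any[]; locale: string } }) {
  // await tolerates both sync and async chain implementations
  const payload = await chainSummaryTitle(vars.messages, vars.locale);
  // Returning OpenAI-style messages lets promptfoo send them as a chat request.
  return payload.messages;
}

The corresponding eval.yaml then wires the wrapper to providers, test cases, and assertions: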

description: Test description
providers:
  - openai:gpt-4o-mini
  - anthropic:claude-3-5-haiku-latest
prompts:
  - file://prompts/summary-title.ts # Imports and uses src/chains/summaryTitle.ts
tests:
  - vars:
      messages: [...]
      locale: 'en-US'
    assert:
      - type: llm-rubric
        value: 'Expected behavior description'
        provider: openai:gpt-4o # Specify grader model for LLM rubric
      - type: contains
        value: 'expected text'
      - type: not-contains
        value: 'unwanted text'

Adding New Tests

  1. Create a test configuration file in promptfoo/
  2. Create a TypeScript wrapper in promptfoo/prompts/ that imports and calls your chain function from src/chains/
  3. Add the test to promptfooconfig.yaml
  4. Run tests to validate

Advantage: The wrapper files automatically stay in sync with source code changes since they directly import and use the actual chain functions.

Performance Monitoring

Tests include performance monitoring:

  • Response time tracking
  • Cost per request monitoring
  • Quality score evaluation
  • Cross-model consistency checks
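
These metrics can also be consumed outside the web UI by post-processing a JSON eval output. A rough sketch, assuming the eval was written with -o results.json and that result rows carry latencyMs and cost fields (names may vary across promptfoo versions):

// scripts/report-perf.ts (illustrative; field names are assumptions)
import { readFileSync } from 'node:fs';

const data = JSON.parse(readFileSync('results.json', 'utf8'));
const rows = data.results?.results ?? data.results ?? []; // layout differs by version

let totalLatency = 0;
let totalCost = 0;
for (const row of rows) {
  totalLatency += row.latencyMs ?? 0;
  totalCost += row.cost ?? 0;
}
console.log(`avg latency: ${(totalLatency / rows.length).toFixed(0)} ms`);
console.log(`total cost: $${totalCost.toFixed(4)}`);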

CI Integration

The test:prompts:ci script is designed for continuous integration:

  • Structured JSON output for parsing
  • No interactive prompts
  • Clear pass/fail status codes
  • Detailed error reporting
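
The simplest CI gate is the eval command's own exit code; a thin wrapper can additionally enforce a pass-rate threshold from the same JSON output (same field-name assumptions as the performance sketch above):

// scripts/ci-gate.ts (illustrative sketch)
import { readFileSync } from 'node:fs';

const data = JSON.parse(readFileSync('results.json', 'utf8'));
const rows: any[] = data.results?.results ?? data.results ?? [];
const passed = rows.filter((r) => r.success).length;

console.log(`pass rate: ${((passed / rows.length) * 100).toFixed(1)}% (${passed}/${rows.length})`);
if (passed < rows.length) process.exit(1); // any failure fails the CI job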

Development

# Install dependencies
pnpm install

# Run unit tests
pnpm test

# Run prompt tests
pnpm test:prompts

# Run all tests
pnpm test && pnpm test:prompts

Contributing

When adding new prompt chains:

  1. Implement the prompt function in src/chains/
  2. Add unit tests in src/chains/__tests__/ (see the vitest sketch after this list)
  3. Create promptfoo tests in promptfoo/
  4. Update this README with the new chain description
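
For step 2, a unit test can pin the chain's structural contract without calling a model. A minimal vitest sketch (the chain signature is assumed, as in the wrapper example above):

// src/chains/__tests__/summaryTitle.test.ts (illustrative)
import { describe, expect, it } from 'vitest';
import { chainSummaryTitle } from '../summaryTitle';

describe('chainSummaryTitle', () => {
  it('builds a system prompt followed by the conversation', () => {
    const payload = chainSummaryTitle([{ content: 'hi', role: 'user' }], 'en-US');
    expect(payload.messages?.[0].role).toBe('system');
    expect(payload.messages?.length).toBeGreaterThan(1);
  });
});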

Architecture

The package follows a layered architecture:

src/
├── chains/           # Prompt chain implementations
├── prompts/          # Prompt templates and utilities
└── index.ts          # Main exports

promptfoo/
├── prompts/          # Prompt implementations for testing
├── *.yaml            # Test configurations
└── results/          # Test output directory

Best Practices

  1. Test Coverage: Every prompt chain should have comprehensive promptfoo tests
  2. Multi-language: Test prompts with multiple languages when applicable
  3. Edge Cases: Include tests for edge cases and error conditions
  4. Performance: Monitor cost and response time in tests
  5. Consistency: Use consistent assertion patterns across tests
  6. Prompt Optimization: Use test results to iteratively improve prompts (see CLAUDE.md for optimization workflow)

Prompt Optimization Workflow

This package follows an iterative prompt optimization process using promptfoo test results:

Example: Translation Prompt Optimization

Initial State: 85% pass rate with issues:

  • Claude models added explanatory text ("以下是翻译...", i.e. "Here is the translation...")
  • GPT models over-translated technical terms (e.g. API_KEY_12345 became API 密钥_12345)

Optimization Process:

  1. Identify Failures: Run tests and analyze specific failure patterns
  2. Update Prompts: Modify prompt rules based on failure analysis (see the sketch after this list)
    • Added: "Output ONLY the translated text, no explanations"
    • Added: "Preserve technical terms, code identifiers, API keys exactly as they appear"
  3. Re-run Tests: Validate improvements across all models
  4. Iterate: Repeat until 100% pass rate achieved
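
In code, step 2 amounts to extra lines in the chain's system prompt. A hypothetical sketch (the rule wording mirrors the bullets above; the surrounding template is invented for illustration):

// Hypothetical excerpt from a translation chain's system prompt
const systemPrompt = [
  'You are a professional translator.',
  // Rules added after failure analysis:
  'Output ONLY the translated text, no explanations.',
  'Preserve technical terms, code identifiers, API keys exactly as they appear.',
].join('\n');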

Final Result: 100% pass rate (14/14 tests) across GPT-5-mini, Claude-3.5-Haiku, and Gemini-Flash

Example: Knowledge Q&A Optimization

Initial State: 71.43% pass rate with context handling issues

Optimization Journey:

  • Round 1 (80.95%): Clarified context relevance checking
  • Round 2 (90.48%): Distinguished between "no context" vs "irrelevant context"
  • Round 3 (92.86%): Added explicit rules for partial context
  • Round 4 (96.43%): Emphasized supplementing with general knowledge
  • Final (100%): Added concrete example and MUST/SHOULD directives

Key Learning: When context is topic-relevant but information-limited, models should:

  • Use context as foundation
  • Supplement with general knowledge
  • Provide practical, actionable guidance

See CLAUDE.md for detailed prompt engineering guidelines.