Mirror of https://github.com/lobehub/lobehub.git, synced 2026-03-27 13:29:15 +07:00
♻️ refactor: Migrating Firecrawl to v2 (#9850)
* ✨ feat: Update the Firecrawl API version; enhance search functionality and result mapping
* ✨ feat: Update the FirecrawlMetadata and FirecrawlResults interfaces; add optional fields and improve error handling
* fix typo (native -> naive)
* ✨ feat: Update model configuration; remove the Claude 3.5 Sonnet entries; add the MiniMax M2 and KAT-Dev 32B models
* ✨ feat: Add the MiniMax M2 model and update proxy URLs
* fix test
@@ -2,9 +2,11 @@
 title: >-
   Configuring Online Search Functionality - Enhancing AI's Ability to Access Web Information
 description: >-
   Learn how to configure the SearXNG online search functionality for LobeChat, enabling AI to access the latest web information.
 tags:
   - Online Search
   - SearXNG
@@ -17,7 +19,10 @@ tags:
 LobeChat supports configuring **web search functionality** for AI, enabling it to retrieve real-time information from the internet to provide more accurate and up-to-date responses. Web search supports multiple search engine providers, including [SearXNG](https://github.com/searxng/searxng), [Search1API](https://www.search1api.com), [Google](https://programmablesearchengine.google.com), and [Brave](https://brave.com/search/api), among others.
 
 <Callout type="info">
-  Web search allows AI to access time-sensitive content, such as the latest news, technology trends, or product information. You can deploy the open-source SearXNG yourself, or choose to integrate mainstream search services like Search1API, Google, Brave, etc., combining them freely based on your use case.
+  Web search allows AI to access time-sensitive content, such as the latest news, technology trends,
+  or product information. You can deploy the open-source SearXNG yourself, or choose to integrate
+  mainstream search services like Search1API, Google, Brave, etc., combining them freely based on
+  your use case.
 </Callout>
 
 By setting the search service environment variable `SEARCH_PROVIDERS` and the corresponding API Keys, LobeChat will query multiple sources and return the results. You can also configure crawler service environment variables such as `CRAWLER_IMPLS` (e.g., `browserless`, `firecrawl`, `tavily`, etc.) to extract webpage content, enhancing the capability of search + reading.
@@ -29,20 +34,20 @@ By setting the search service environment variable `SEARCH_PROVIDERS` and the co
 Configure available web crawlers for structured extraction of webpage content.
 
 ```env
-CRAWLER_IMPLS="native,search1api"
+CRAWLER_IMPLS="naive,search1api"
 ```
 
 Supported crawler types are listed below:
 
 | Value | Description | Environment Variable |
 | --- | --- | --- |
 | `browserless` | Headless browser crawler based on [Browserless](https://www.browserless.io/), suitable for rendering complex pages. | `BROWSERLESS_TOKEN` |
 | `exa` | Crawler capabilities provided by [Exa](https://exa.ai/), API required. | `EXA_API_KEY` |
 | `firecrawl` | [Firecrawl](https://firecrawl.dev/) headless browser API, ideal for modern websites. | `FIRECRAWL_API_KEY` |
 | `jina` | Crawler service from [Jina AI](https://jina.ai/), supports fast content summarization. | `JINA_READER_API_KEY` |
-| `native` | Built-in general-purpose crawler for standard web structures. | |
+| `naive` | Built-in general-purpose crawler for standard web structures. | |
 | `search1api` | Page crawling capabilities from [Search1API](https://www.search1api.com), great for structured content extraction. | `SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY` |
 | `tavily` | Web scraping and summarization API from [Tavily](https://www.tavily.com/). | `TAVILY_API_KEY` |
 
 > 💡 Setting multiple crawlers increases success rate; the system will try different ones based on priority.
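For instance, a priority chain that tries a headless browser first and falls back to an API crawler and then the built-in one could look like the sketch below (the token and key values are placeholders, not real credentials):

```env
# Tried in order: Browserless first, then Firecrawl, then the built-in crawler
CRAWLER_IMPLS="browserless,firecrawl,naive"
BROWSERLESS_TOKEN=your-browserless-token
FIRECRAWL_API_KEY=fc-your-api-key
```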
@@ -58,19 +63,19 @@ SEARCH_PROVIDERS="searxng"
 
 Supported search engines include:
 
 | Value | Description | Environment Variable |
 | --- | --- | --- |
 | `anspire` | Search service provided by [Anspire](https://anspire.ai/). | `ANSPIRE_API_KEY` |
 | `bocha` | Search service from [Bocha](https://open.bochaai.com/). | `BOCHA_API_KEY` |
 | `brave` | [Brave](https://search.brave.com/help/api), a privacy-friendly search source. | `BRAVE_API_KEY` |
 | `exa` | [Exa](https://exa.ai/), a search API designed for AI. | `EXA_API_KEY` |
 | `firecrawl` | Search capabilities via [Firecrawl](https://firecrawl.dev/). | `FIRECRAWL_API_KEY` |
 | `google` | Uses [Google Programmable Search Engine](https://programmablesearchengine.google.com/). | `GOOGLE_PSE_API_KEY` `GOOGLE_PSE_ENGINE_ID` |
 | `jina` | Semantic search provided by [Jina AI](https://jina.ai/). | `JINA_READER_API_KEY` |
 | `kagi` | Premium search API by [Kagi](https://kagi.com/), requires a subscription key. | `KAGI_API_KEY` |
 | `search1api` | Aggregated search capabilities from [Search1API](https://www.search1api.com). | `SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY` |
 | `searxng` | Use a self-hosted or public [SearXNG](https://searx.space/) instance. | `SEARXNG_URL` |
 | `tavily` | [Tavily](https://www.tavily.com/), offers fast web summaries and answers. | `TAVILY_API_KEY` |
 
 > ⚠️ Some search providers require you to apply for an API Key and configure it in your `.env` file.
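As a concrete sketch, enabling a self-hosted SearXNG instance together with an API-key provider could look like this (the URL and key are placeholders for your own values):

```env
SEARCH_PROVIDERS="searxng,brave"
SEARXNG_URL=https://searxng.example.com
BRAVE_API_KEY=your-brave-api-key
```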
@@ -139,7 +144,7 @@ GOOGLE_PSE_ENGINE_ID=your-google-cx-id
 Sets the access URL for the [Firecrawl](https://firecrawl.dev/) API, used for web content scraping. Default value:
 
 ```env
-FIRECRAWL_URL=https://api.firecrawl.dev/v1
+FIRECRAWL_URL=https://api.firecrawl.dev/v2
 ```
 
 > ⚙️ Usually does not need to be changed unless you’re using a self-hosted version or a proxy service.
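For a self-hosted deployment, the variable would simply point at your own instance instead; the host and port below are illustrative (3002 is commonly used by self-hosted Firecrawl, but verify against your deployment):

```env
FIRECRAWL_URL=http://localhost:3002
```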
@@ -13,7 +13,9 @@ tags:
 LobeChat 支持为 AI 配置**联网搜索功能**,使其能够实时获取互联网信息,从而提供更准确、最新的回答。联网搜索支持多个搜索引擎提供商,包括 [SearXNG](https://github.com/searxng/searxng)、[Search1API](https://www.search1api.com)、[Google](https://programmablesearchengine.google.com)、[Brave](https://brave.com/search/api) 等。
 
 <Callout type="info">
-  联网搜索可以让 AI 获取时效性内容,如最新新闻、技术动态或产品信息。你可以使用开源的 SearXNG 自行部署,也可以选择集成主流搜索引擎服务,如 Search1API、Google、Brave 等,根据你的使用场景自由组合。
+  联网搜索可以让 AI 获取时效性内容,如最新新闻、技术动态或产品信息。你可以使用开源的 SearXNG
+  自行部署,也可以选择集成主流搜索引擎服务,如 Search1API、Google、Brave
+  等,根据你的使用场景自由组合。
 </Callout>
 
 通过设置搜索服务环境变量 `SEARCH_PROVIDERS` 和对应的 API Key,LobeChat 将在多个搜索源中查询并返回结果。你还可以搭配配置爬虫服务环境变量 `CRAWLER_IMPLS`(如 `browserless`、`firecrawl`、`tavily` 等)以提取网页内容,实现搜索 + 阅读的增强能力。

@@ -25,20 +27,20 @@ LobeChat 支持为 AI 配置**联网搜索功能**,使其能够实时获取互
 配置可用的网页爬虫,用于对网页进行结构化内容提取。
 
 ```env
-CRAWLER_IMPLS="native,search1api"
+CRAWLER_IMPLS="naive,search1api"
 ```
 
 支持的爬虫类型如下:
 
 | 值 | 说明 | 环境变量 |
 | --- | --- | --- |
 | `browserless` | 基于 [Browserless](https://www.browserless.io/) 的无头浏览器爬虫,适合渲染复杂页面。 | `BROWSERLESS_TOKEN` |
 | `exa` | 使用 [Exa](https://exa.ai/) 提供的爬虫能力,需申请 API。 | `EXA_API_KEY` |
 | `firecrawl` | [Firecrawl](https://firecrawl.dev/) 无头浏览器 API,适合现代网站抓取。 | `FIRECRAWL_API_KEY` |
 | `jina` | 使用 [Jina AI](https://jina.ai/) 的爬虫服务,支持快速提取摘要信息。 | `JINA_READER_API_KEY` |
-| `native` | 内置通用爬虫,适用于标准网页结构。 | |
+| `naive` | 内置简易通用爬虫,适用于标准网页结构。 | |
 | `search1api` | 利用 [Search1API](https://www.search1api.com) 提供的页面抓取能力,适合结构化内容提取。 | `SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY` |
 | `tavily` | 使用 [Tavily](https://www.tavily.com/) 的网页抓取与摘要 API。 | `TAVILY_API_KEY` |
 
 > 💡 设置多个爬虫可提升成功率,系统将根据优先级尝试不同爬虫。
@@ -54,19 +56,19 @@ SEARCH_PROVIDERS="searxng"
 
 支持的搜索引擎如下:
 
 | 值 | 说明 | 环境变量 |
 | --- | --- | --- |
 | `anspire` | 基于 [Anspire(安思派)](https://anspire.ai/) 提供的搜索服务。 | `ANSPIRE_API_KEY` |
 | `bocha` | 基于 [Bocha(博查)](https://open.bochaai.com/) 提供的搜索服务。 | `BOCHA_API_KEY` |
 | `brave` | [Brave](https://search.brave.com/help/api),隐私友好的搜索源。 | `BRAVE_API_KEY` |
 | `exa` | [Exa](https://exa.ai/),面向 AI 的搜索 API。 | `EXA_API_KEY` |
 | `firecrawl` | 支持 [Firecrawl](https://firecrawl.dev/) 提供的搜索服务。 | `FIRECRAWL_API_KEY` |
 | `google` | 使用 [Google Programmable Search Engine](https://programmablesearchengine.google.com/)。 | `GOOGLE_PSE_API_KEY` `GOOGLE_PSE_ENGINE_ID` |
 | `jina` | 使用 [Jina AI](https://jina.ai/) 提供的语义搜索服务。 | `JINA_READER_API_KEY` |
 | `kagi` | [Kagi](https://kagi.com/) 提供的高级搜索 API,需订阅 Key。 | `KAGI_API_KEY` |
 | `search1api` | 使用 [Search1API](https://www.search1api.com) 聚合搜索能力。 | `SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY` |
 | `searxng` | 使用自托管或公共 [SearXNG](https://searx.space/) 实例。 | `SEARXNG_URL` |
 | `tavily` | [Tavily](https://www.tavily.com/),快速网页摘要与答案返回。 | `TAVILY_API_KEY` |
 
 > ⚠️ 某些搜索提供商需要单独申请 API Key,并在 `.env` 中配置相关凭证。

@@ -135,7 +137,7 @@ GOOGLE_PSE_ENGINE_ID=your-google-cx-id
 设置 [Firecrawl](https://firecrawl.dev/) API 的访问地址。用于网页内容抓取,默认值如下:
 
 ```env
-FIRECRAWL_URL=https://api.firecrawl.dev/v1
+FIRECRAWL_URL=https://api.firecrawl.dev/v2
 ```
 
 > ⚙️ 一般无需修改,除非你使用的是自托管版本或代理服务。
@@ -197,69 +197,6 @@ const anthropicChatModels: AIChatModelCard[] = [
     },
     type: 'chat',
   },
-  {
-    abilities: {
-      functionCall: true,
-      search: true,
-      vision: true,
-    },
-    contextWindowTokens: 200_000,
-    description:
-      'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-    displayName: 'Claude 3.5 Sonnet (New)',
-    id: 'claude-3-5-sonnet-20241022',
-    maxOutput: 8192,
-    pricing: {
-      units: [
-        { name: 'textInput_cacheRead', rate: 0.3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textInput', rate: 3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textOutput', rate: 15, strategy: 'fixed', unit: 'millionTokens' },
-        {
-          lookup: { prices: { '1h': 6, '5m': 3.75 }, pricingParams: ['ttl'] },
-          name: 'textInput_cacheWrite',
-          strategy: 'lookup',
-          unit: 'millionTokens',
-        },
-      ],
-    },
-    releasedAt: '2024-10-22',
-    settings: {
-      extendParams: ['disableContextCaching'],
-      searchImpl: 'params',
-    },
-    type: 'chat',
-  },
-  {
-    abilities: {
-      functionCall: true,
-      vision: true,
-    },
-    contextWindowTokens: 200_000,
-    description:
-      'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-    displayName: 'Claude 3.5 Sonnet (Old)',
-    id: 'claude-3-5-sonnet-20240620',
-    maxOutput: 8192,
-    pricing: {
-      units: [
-        { name: 'textInput_cacheRead', rate: 0.3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textInput', rate: 3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textOutput', rate: 15, strategy: 'fixed', unit: 'millionTokens' },
-        {
-          lookup: { prices: { '1h': 6, '5m': 3.75 }, pricingParams: ['ttl'] },
-          name: 'textInput_cacheWrite',
-          strategy: 'lookup',
-          unit: 'millionTokens',
-        },
-      ],
-    },
-    releasedAt: '2024-06-20',
-    settings: {
-      extendParams: ['disableContextCaching'],
-      searchImpl: 'params',
-    },
-    type: 'chat',
-  },
   {
     abilities: {
       functionCall: true,
@@ -2393,61 +2393,6 @@ const higressChatModels: AIChatModelCard[] = [
     releasedAt: '2024-11-05',
     type: 'chat',
   },
-  {
-    abilities: {
-      functionCall: true,
-      vision: true,
-    },
-    contextWindowTokens: 200_000,
-    description:
-      'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-    displayName: 'Claude 3.5 Sonnet',
-    enabled: true,
-    id: 'claude-3-5-sonnet-20241022',
-    maxOutput: 8192,
-    pricing: {
-      units: [
-        { name: 'textInput_cacheRead', rate: 0.3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textInput', rate: 3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textOutput', rate: 15, strategy: 'fixed', unit: 'millionTokens' },
-        {
-          lookup: { prices: { '5m': 3.75 }, pricingParams: ['ttl'] },
-          name: 'textInput_cacheWrite',
-          strategy: 'lookup',
-          unit: 'millionTokens',
-        },
-      ],
-    },
-    releasedAt: '2024-10-22',
-    type: 'chat',
-  },
-  {
-    abilities: {
-      functionCall: true,
-      vision: true,
-    },
-    contextWindowTokens: 200_000,
-    description:
-      'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-    displayName: 'Claude 3.5 Sonnet 0620',
-    id: 'claude-3-5-sonnet-20240620',
-    maxOutput: 8192,
-    pricing: {
-      units: [
-        { name: 'textInput_cacheRead', rate: 0.3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textInput', rate: 3, strategy: 'fixed', unit: 'millionTokens' },
-        { name: 'textOutput', rate: 15, strategy: 'fixed', unit: 'millionTokens' },
-        {
-          lookup: { prices: { '5m': 3.75 }, pricingParams: ['ttl'] },
-          name: 'textInput_cacheWrite',
-          strategy: 'lookup',
-          unit: 'millionTokens',
-        },
-      ],
-    },
-    releasedAt: '2024-06-20',
-    type: 'chat',
-  },
   {
     abilities: {
       functionCall: true,
@@ -3,6 +3,27 @@ import { AIChatModelCard } from '../types/aiModel';
 // https://cloud.infini-ai.com/genstudio/model
 
 const infiniaiChatModels: AIChatModelCard[] = [
+  {
+    abilities: {
+      functionCall: true,
+      reasoning: true,
+    },
+    contextWindowTokens: 200_000,
+    description:
+      'MiniMax-M2 是一款专为编码与智能体工作流优化的专家混合(MoE)语言模型,具有约 230B 总参数与约 10B 活跃参数。它在保持强通用智能的同时,针对多文件编辑、代码-运行-修复闭环、测试校验修复等开发者场景进行深度增强,在终端、IDE 与 CI 等真实环境中表现稳定、高效。',
+    displayName: 'MiniMax M2',
+    enabled: true,
+    id: 'minimax-m2',
+    maxOutput: 200_000,
+    pricing: {
+      currency: 'CNY',
+      units: [
+        { name: 'textInput', rate: 2.1, strategy: 'fixed', unit: 'millionTokens' },
+        { name: 'textOutput', rate: 8.4, strategy: 'fixed', unit: 'millionTokens' },
+      ],
+    },
+    type: 'chat',
+  },
   {
     abilities: {
       functionCall: true,
@@ -1,6 +1,19 @@
 import { AIChatModelCard } from '../types/aiModel';
 
 const ollamaCloudModels: AIChatModelCard[] = [
+  {
+    abilities: {
+      functionCall: true,
+      reasoning: true,
+    },
+    contextWindowTokens: 200_000,
+    description:
+      'MiniMax M2 是专为编码和代理工作流程构建的高效大型语言模型。',
+    displayName: 'MiniMax M2',
+    enabled: true,
+    id: 'minimax-m2',
+    type: 'chat',
+  },
   {
     abilities: {
       functionCall: true,
@@ -145,6 +145,25 @@ const siliconcloudChatModels: AIChatModelCard[] = [
     releasedAt: '2025-09-01',
     type: 'chat',
   },
+  {
+    abilities: {
+      functionCall: true,
+    },
+    contextWindowTokens: 131_072,
+    description:
+      'KAT-Dev(32B)是一款专为软件工程任务设计的开源 32B 参数模型。在 SWE-Bench Verified 基准测试中,它取得了 62.4% 的解决率,在所有不同规模的开源模型中排名第五。该模型通过多个阶段进行优化,包括中间训练、监督微调(SFT)与强化学习(RL),旨在为代码补全、缺陷修复、代码评审等复杂编程任务提供强大支持。',
+    displayName: 'KAT-Dev 32B',
+    id: 'Kwaipilot/KAT-Dev',
+    pricing: {
+      currency: 'CNY',
+      units: [
+        { name: 'textInput', rate: 1, strategy: 'fixed', unit: 'millionTokens' },
+        { name: 'textOutput', rate: 4, strategy: 'fixed', unit: 'millionTokens' },
+      ],
+    },
+    releasedAt: '2025-09-27',
+    type: 'chat',
+  },
   {
     abilities: {
       functionCall: true,
@@ -145,25 +145,6 @@ exports[`OpenAIResponsesStream > Reasoning > summary 1`] = `
 ]
 `;
 
-exports[`OpenAIResponsesStream > should handle chunk errors in catch block 1`] = `
-[
-  "id: resp_error_catch
-",
-  "event: data
-",
-  "data: "in_progress"
-
-",
-  "id: undefined
-",
-  "event: reasoning
-",
-  "data: undefined
-
-",
-]
-`;
-
 exports[`OpenAIResponsesStream > should handle chunks with undefined values gracefully 1`] = `
 [
   "id: resp_undefined_vals

@@ -546,25 +527,6 @@ exports[`OpenAIResponsesStream > should handle response.reasoning_summary_text.d
 ]
 `;
 
-exports[`OpenAIResponsesStream > should handle stream chunk transformation error with null access 1`] = `
-[
-  "id: resp_error_test
-",
-  "event: data
-",
-  "data: "in_progress"
-
-",
-  "id: null
-",
-  "event: text
-",
-  "data: "test"
-
-",
-]
-`;
-
 exports[`OpenAIResponsesStream > should handle unknown chunk type as data 1`] = `
 [
   "id: resp_unknown
@@ -17,11 +17,11 @@ export const LobeMinimaxAI = createOpenAICompatibleRuntime({
 
   const minimaxTools = enabledSearch
     ? [
        ...(tools || []),
        {
          type: 'web_search',
        },
      ]
     : tools;
 
   // Resolve parameters with constraints
@@ -830,9 +830,9 @@ describe('LobeSearch1API - custom features', () => {
   it('should handle mix of known and unknown models', async () => {
     mockClient.models.list.mockResolvedValue({
       data: [
-        { id: 'gpt-4o-mini' },
-        { id: 'gpt-4o' },
-        { id: 'unknown-model-1' },
+        { id: 'claude-3-5-sonnet-20241022' },
+        { id: 'gpt-4o-mini' },
+        { id: 'unknown-model-2' },
       ],
     });
@@ -3,36 +3,55 @@ import { NetworkConnectionError, PageNotFoundError, TimeoutError } from '../util
 import { DEFAULT_TIMEOUT, withTimeout } from '../utils/withTimeout';
 
 interface FirecrawlMetadata {
-  description: string;
-  keywords: string;
-  language: string;
+  description?: string;
+  error?: string;
+  keywords?: string;
+  language?: string;
   ogDescription?: string;
   ogImage?: string;
   ogLocaleAlternate?: string[];
   ogSiteName?: string;
   ogTitle?: string;
   ogUrl?: string;
-  robots: string;
-  statusCode: number;
+  robots?: string;
   sourceURL: string;
-  title: string;
+  statusCode: number;
+  title?: string;
 }
 
 interface FirecrawlResults {
   actions?: {
     javascriptReturns?: Array<{ type: string; value: any }>;
     pdfs?: string[];
     scrapes?: Array<{ html: string; url: string }>;
     screenshots?: string[];
   };
   changeTracking?: {
     changeStatus?: string;
     diff?: string;
     json?: Record<string, any>;
     previousScrapeAt?: string;
    visibility?: string;
   };
   html?: string;
   links?: string[];
   markdown?: string;
   metadata: FirecrawlMetadata;
   rawHtml?: string;
   screenshot?: string;
   summary?: string;
   warning?: string;
 }
 
 interface FirecrawlResponse {
-  success: boolean;
   data: FirecrawlResults;
+  success: boolean;
 }
 
 export const firecrawl: CrawlImpl = async (url) => {
   // Get API key from environment variable
   const apiKey = process.env.FIRECRAWL_API_KEY;
-  const baseUrl = process.env.FIRECRAWL_URL || 'https://api.firecrawl.dev/v1';
+  const baseUrl = process.env.FIRECRAWL_URL || 'https://api.firecrawl.dev/v2';
 
   let res: Response;

@@ -40,7 +59,7 @@ export const firecrawl: CrawlImpl = async (url) => {
   res = await withTimeout(
     fetch(`${baseUrl}/scrape`, {
       body: JSON.stringify({
-        formats: ["markdown"], // ["markdown", "html"]
+        formats: ['markdown'], // ["markdown", "html"]
         url,
       }),
       headers: {

@@ -75,6 +94,14 @@ export const firecrawl: CrawlImpl = async (url) => {
   try {
     const data = (await res.json()) as FirecrawlResponse;
 
+    if (data.data.warning) {
+      console.warn('[Firecrawl] Warning:', data.data.warning);
+    }
+
+    if (data.data.metadata.error) {
+      console.error('[Firecrawl] Metadata error:', data.data.metadata.error);
+    }
+
     // Check if content is empty or too short
     if (!data.data.markdown || data.data.markdown.length < 100) {
       return;

@@ -83,14 +110,14 @@ export const firecrawl: CrawlImpl = async (url) => {
     return {
       content: data.data.markdown,
       contentType: 'text',
-      description: data.data.metadata.description,
+      description: data.data.metadata.description || '',
       length: data.data.markdown.length,
      siteName: new URL(url).hostname,
-      title: data.data.metadata.title,
+      title: data.data.metadata.title || '',
       url: url,
     } satisfies CrawlSuccessResult;
   } catch (error) {
-    console.error(error);
+    console.error('[Firecrawl] Parse error:', error);
   }
 
   return;
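The normalization in the hunk above (skip results whose markdown is missing or shorter than 100 characters, and fall back to empty strings for the now-optional v2 metadata fields) can be sketched as a pure helper. The names below are illustrative, not identifiers from the LobeChat codebase:

```typescript
// Trimmed-down stand-in for the v2 scrape payload used in the crawler.
interface ScrapeData {
  markdown?: string;
  metadata: { description?: string; title?: string };
}

// Returns a normalized result, or undefined when the markdown payload is
// missing or too short to be useful (mirroring the `< 100` character guard).
function normalizeScrape(url: string, data: ScrapeData) {
  if (!data.markdown || data.markdown.length < 100) return undefined;
  return {
    content: data.markdown,
    description: data.metadata.description ?? '',
    length: data.markdown.length,
    siteName: new URL(url).hostname,
    title: data.metadata.title ?? '',
    url,
  };
}
```

Keeping this logic total (always returning either a full result or `undefined`) is what lets the caller chain to the next crawler in `CRAWLER_IMPLS` on failure.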
@@ -48,29 +48,6 @@ const Anthropic: ModelProviderCard = {
       maxOutput: 8192,
       releasedAt: '2024-11-05',
     },
-    {
-      contextWindowTokens: 200_000,
-      description:
-        'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-      displayName: 'Claude 3.5 Sonnet',
-      enabled: true,
-      functionCall: true,
-      id: 'claude-3-5-sonnet-20241022',
-      maxOutput: 8192,
-      releasedAt: '2024-10-22',
-      vision: true,
-    },
-    {
-      contextWindowTokens: 200_000,
-      description:
-        'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-      displayName: 'Claude 3.5 Sonnet 0620',
-      functionCall: true,
-      id: 'claude-3-5-sonnet-20240620',
-      maxOutput: 8192,
-      releasedAt: '2024-06-20',
-      vision: true,
-    },
     {
       contextWindowTokens: 200_000,
       description:
@@ -1298,29 +1298,6 @@ const Higress: ModelProviderCard = {
       maxOutput: 8192,
       releasedAt: '2024-11-05',
     },
-    {
-      contextWindowTokens: 200_000,
-      description:
-        'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-      displayName: 'Claude 3.5 Sonnet',
-      enabled: true,
-      functionCall: true,
-      id: 'claude-3-5-sonnet-20241022',
-      maxOutput: 8192,
-      releasedAt: '2024-10-22',
-      vision: true,
-    },
-    {
-      contextWindowTokens: 200_000,
-      description:
-        'Claude 3.5 Sonnet 提供了超越 Opus 的能力和比 Sonnet 更快的速度,同时保持与 Sonnet 相同的价格。Sonnet 特别擅长编程、数据科学、视觉处理、代理任务。',
-      displayName: 'Claude 3.5 Sonnet 0620',
-      functionCall: true,
-      id: 'claude-3-5-sonnet-20240620',
-      maxOutput: 8192,
-      releasedAt: '2024-06-20',
-      vision: true,
-    },
     {
       contextWindowTokens: 200_000,
       description:
@@ -12,7 +12,7 @@ const Minimax: ModelProviderCard = {
   settings: {
     disableBrowserRequest: true, // CORS error
     proxyUrl: {
-      placeholder: 'https://api.minimax.chat/v1',
+      placeholder: 'https://api.minimaxi.com/v1',
     },
     responseAnimation: {
       speed: 2,
@@ -28,7 +28,7 @@ const Qiniu: ModelProviderCard = {
   name: 'Qiniu',
   settings: {
     proxyUrl: {
-      placeholder: 'https://api.qnaigc.com/v1',
+      placeholder: 'https://openai.qiniu.com/v1',
     },
     sdkType: 'openai',
     showModelFetcher: true,
@@ -26,7 +26,7 @@ export class FirecrawlImpl implements SearchServiceImpl {
 
   private get baseUrl(): string {
     // Assuming the base URL is consistent with the crawl endpoint
-    return process.env.FIRECRAWL_URL || 'https://api.firecrawl.dev/v1';
+    return process.env.FIRECRAWL_URL || 'https://api.firecrawl.dev/v2';
   }
 
   async query(query: string, params: SearchParams = {}): Promise<UniformSearchResponse> {

@@ -34,13 +34,14 @@ export class FirecrawlImpl implements SearchServiceImpl {
     const endpoint = urlJoin(this.baseUrl, '/search');
 
     const defaultQueryParams: FirecrawlSearchParameters = {
-      limit: 15,
+      limit: 20,
       query,
       /*
       scrapeOptions: {
        formats: ["markdown"]
       },
       */
+      sources: [{ type: 'web' }, { type: 'news' }],
     };
 
     let body: FirecrawlSearchParameters = {

@@ -95,25 +96,64 @@ export class FirecrawlImpl implements SearchServiceImpl {
 
     log('Parsed Firecrawl response: %o', firecrawlResponse);
 
-    const mappedResults = (firecrawlResponse.data || []).map(
+    // V2 API returns data as object with web/images/news arrays
+    const webResults = firecrawlResponse.data.web || [];
+    const imageResults = firecrawlResponse.data.images || [];
+    const newsResults = firecrawlResponse.data.news || [];
+
+    // Map web results
+    const mappedWebResults = webResults.map(
       (result): UniformSearchResult => ({
-        category: 'general', // Default category
-        content: result.description || '', // Prioritize content, fallback to snippet
-        engines: ['firecrawl'], // Use 'firecrawl' as the engine name
-        parsedUrl: result.url ? new URL(result.url).hostname : '', // Basic URL parsing
-        score: 1, // Default score to 1
+        category: 'general',
+        content: result.description || result.markdown || '',
+        engines: ['firecrawl'],
+        parsedUrl: result.url ? new URL(result.url).hostname : '',
+        score: 1,
         title: result.title || '',
         url: result.url,
       }),
     );
 
-    log('Mapped %d results to SearchResult format', mappedResults.length);
+    // Map news results
+    const mappedNewsResults = newsResults.map(
+      (result): UniformSearchResult => ({
+        category: 'news',
+        content: result.snippet || result.markdown || '',
+        engines: ['firecrawl'],
+        parsedUrl: result.url ? new URL(result.url).hostname : '',
+        score: 1,
+        title: result.title || '',
+        url: result.url,
+      }),
+    );
+
+    // Map image results
+    const mappedImageResults = imageResults.map(
+      (result): UniformSearchResult => ({
+        category: 'images',
+        content: result.title || '',
+        engines: ['firecrawl'],
+        parsedUrl: result.url ? new URL(result.url).hostname : '',
+        score: 1,
+        title: result.title || '',
+        url: result.imageUrl, // Use imageUrl for images
+      }),
+    );
+
+    // Combine all results
+    const allResults = [...mappedWebResults, ...mappedNewsResults, ...mappedImageResults];
+
+    log('Mapped %d results to SearchResult format', allResults.length);
 
     if (firecrawlResponse.warning) {
       log.extend('warn')('Firecrawl warning: %s', firecrawlResponse.warning);
     }
 
     return {
       costTime,
       query: query,
-      resultNumbers: mappedResults.length,
-      results: mappedResults,
+      resultNumbers: allResults.length,
+      results: allResults,
     };
   } catch (error) {
     log.extend('error')('Error parsing Firecrawl response: %o', error);
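A minimal sketch of how the three v2 buckets collapse into one uniform list; the fixture data and the trimmed-down types are illustrative only:

```typescript
type Uniform = { category: string; content: string; title: string; url: string };

// Stand-in for a parsed v2 `data` object with one result per bucket.
const data = {
  images: [{ imageUrl: 'https://img.example/a.png', title: 'A', url: 'https://example.com/a' }],
  news: [{ snippet: 'breaking', title: 'N', url: 'https://example.com/n' }],
  web: [{ description: 'desc', title: 'W', url: 'https://example.com/w' }],
};

// Each bucket maps to the same shape; only category, content source,
// and (for images) the URL field differ.
const uniform: Uniform[] = [
  ...(data.web ?? []).map((r) => ({ category: 'general', content: r.description ?? '', title: r.title, url: r.url })),
  ...(data.news ?? []).map((r) => ({ category: 'news', content: r.snippet ?? '', title: r.title, url: r.url })),
  ...(data.images ?? []).map((r) => ({ category: 'images', content: r.title ?? '', title: r.title, url: r.imageUrl })),
];
```

Order is deterministic: web results first, then news, then images, which matches how `allResults` is assembled above.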
@@ -1,35 +1,86 @@
+// V2 API Types
 interface FirecrawlScrapeOptions {
-  formats: string[];
+  blockAds?: boolean;
+  formats?: string[];
+  maxAge?: number;
+  onlyMainContent?: boolean;
+  removeBase64Images?: boolean;
 }
 
+type FirecrawlSource =
+  | { location?: string; tbs?: string; type: 'web' }
+  | { type: 'images' }
+  | { type: 'news' };
+
+type FirecrawlCategory = { type: 'github' } | { type: 'research' } | { type: 'pdf' };
+
 export interface FirecrawlSearchParameters {
-  country?: string;
-  lang?: string;
+  categories?: FirecrawlCategory[];
+  ignoreInvalidURLs?: boolean;
   limit?: number;
   location?: string;
   query: string;
   scrapeOptions?: FirecrawlScrapeOptions;
+  sources?: FirecrawlSource[];
   tbs?: string;
   timeout?: number;
 }
 
 interface FirecrawlMetadata {
   description?: string;
   error?: string | null;
   sourceURL?: string;
   statusCode?: number;
-  title: string;
+  title?: string;
 }
 
-interface FirecrawlData {
-  description?: string;
-  html?: string;
+// Web search result
+interface FirecrawlWebResult {
+  description: string;
+  html?: string | null;
   links?: string[];
-  markdown?: string;
+  markdown?: string | null;
   metadata?: FirecrawlMetadata;
-  title?: string;
+  rawHtml?: string | null;
+  screenshot?: string | null;
+  title: string;
   url: string;
 }
 
-export interface FirecrawlResponse {
-  data: FirecrawlData[];
-  success?: boolean;
-}
+// Image search result
+interface FirecrawlImageResult {
+  imageHeight: number;
+  imageUrl: string;
+  imageWidth: number;
+  position: number;
+  title: string;
+  url: string;
+}
 
+// News search result
+interface FirecrawlNewsResult {
+  date: string;
+  html?: string | null;
+  imageUrl?: string;
+  links?: string[];
+  markdown?: string | null;
+  metadata?: FirecrawlMetadata;
+  position: number;
+  rawHtml?: string | null;
+  screenshot?: string | null;
+  snippet: string;
+  title: string;
+  url: string;
+}
+
+// V2 Response structure
+export interface FirecrawlResponse {
+  data: {
+    images?: FirecrawlImageResult[];
+    news?: FirecrawlNewsResult[];
+    web?: FirecrawlWebResult[];
+  };
+  success: boolean;
+  warning?: string | null;
+}
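Under these types, a v2 `/search` request body might be assembled as below; the query string is illustrative, and the interface is trimmed to the fields this sketch uses:

```typescript
// Subset of the FirecrawlSearchParameters shape defined above.
interface SearchParameters {
  limit?: number;
  query: string;
  sources?: ({ type: 'web' } | { type: 'news' } | { type: 'images' })[];
}

const body: SearchParameters = {
  limit: 20,
  query: 'lobehub firecrawl v2',
  sources: [{ type: 'web' }, { type: 'news' }],
};

// Serialized exactly as it would be sent in the POST request body.
const payload = JSON.stringify(body);
```

Omitting `sources` would fall back to the server's default buckets, which is why the search implementation above sets it explicitly to `web` and `news`.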