Audio-Scribe: Transcribe audio messages with Chrome's multimodal Prompt API
This sample demonstrates how to use Chrome's built-in AI APIs to transcribe audio messages directly in the browser. It uses:
- Prompt API with multimodal audio input (Gemini Nano) for on-device speech-to-text transcription
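The following is a minimal sketch of how an audio `Blob` could be transcribed with streaming output. It is not taken from the sample. It assumes the current `LanguageModel` global of the Prompt API (`availability()`, `create()` with `expectedInputs`, `promptStreaming()`, `destroy()`). The names `transcribeAudio` and `onText` and the prompt wording are hypothetical. The model object is passed in as a parameter so the function can also be exercised with a stub outside Chrome.

```javascript
// A minimal sketch, assuming the current `LanguageModel` global of Chrome's
// Prompt API. `transcribeAudio` and `onText` are hypothetical names; the model
// object is injected (pass `LanguageModel` in an extension page) so the
// function can also be exercised with a stub.
async function transcribeAudio(LM, audioBlob, onText) {
  // Declare up front that prompts will contain audio.
  const options = { expectedInputs: [{ type: 'audio' }] };

  // Only 'unavailable' stops here; other states let create() proceed
  // (it may first have to download the model).
  const availability = await LM.availability(options);
  if (availability === 'unavailable') {
    throw new Error('Prompt API with audio input is unavailable on this device');
  }

  const session = await LM.create(options);
  try {
    // One user message that mixes a text instruction with the audio Blob.
    const stream = session.promptStreaming([
      {
        role: 'user',
        content: [
          { type: 'text', value: 'Transcribe this audio message. Reply with the transcript only.' },
          { type: 'audio', value: audioBlob },
        ],
      },
    ]);

    let transcript = '';
    // Assumes each chunk is a new piece of text, not the cumulative response.
    for await (const chunk of stream) {
      transcript += chunk;
      onText(transcript); // e.g. refresh the side panel with the text so far
    }
    return transcript;
  } finally {
    // Release the on-device model resources held by this session.
    session.destroy();
  }
}
```

In a side panel page, a call might look like `transcribeAudio(LanguageModel, blob, (text) => { output.textContent = text; })`, where `output` is a hypothetical element. Passing the accumulated text rather than each chunk keeps the UI update a simple overwrite.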
Overview
Audio-Scribe adds a side panel that automatically transcribes audio messages from chat applications. When activated, it:
- Monitors the page for audio blobs created via `URL.createObjectURL`.
- Detects audio content and sends it to Gemini Nano for transcription.
- Streams the transcribed text to the side panel in real time.
- Works with messaging apps like WhatsApp Web that use blob URLs for audio messages.
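One way to implement blob monitoring is to wrap `URL.createObjectURL` so that every Blob the page turns into a URL passes through a check. This is a sketch under several assumptions. `installAudioBlobMonitor` is a hypothetical name, and audio is detected from the Blob's MIME type alone. The window object is passed in as a parameter so the wrapper can be tested outside a browser. In a real extension, the wrapper would have to run in the page's main world (for example, a content script declared with `"world": "MAIN"`), because a content script's isolated world has its own copy of `URL`.

```javascript
// Hypothetical helper, not taken from the sample. `win` is the page's window
// (or any object with a `URL.createObjectURL` method).
function installAudioBlobMonitor(win, onAudioBlob) {
  const original = win.URL.createObjectURL;

  win.URL.createObjectURL = function (obj) {
    // Always let the page get its URL exactly as before.
    const url = original.call(this, obj);
    // Assumption: a Blob whose MIME type starts with "audio/" is an audio message.
    // MediaSource objects and non-audio Blobs are ignored.
    if (obj instanceof Blob && obj.type.startsWith('audio/')) {
      onAudioBlob(obj, url);
    }
    return url;
  };

  // Return an uninstall function that restores the original method.
  return () => {
    win.URL.createObjectURL = original;
  };
}
```

Because the callback receives the Blob itself, the audio stays readable even if the page later calls `URL.revokeObjectURL` on the URL.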
Running this extension
- Clone this repository.
- Load this directory in Chrome as an unpacked extension.
- Open a chat app in the browser, for example https://web.whatsapp.com/. You can also serve the included demo chat app locally with `npx serve demo-chat-app`.
- Open the Audio-Scribe side panel by clicking the extension icon or pressing `Alt+A`.
- Play or load audio messages in the chat; they are transcribed automatically in the side panel.
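Opening the panel from the icon or `Alt+A` could be wired up as shown below. This is an assumed setup, not the sample's code. It relies on the `chrome.sidePanel.setPanelBehavior()` API and a manifest `commands` entry for `_execute_action`. The helper name `enableSidePanelOnActionClick` is hypothetical, and the `chrome` object is passed in so the function can be tested with a stub.

```javascript
// Assumed wiring, not copied from the sample. manifest.json would also need:
//   "permissions": ["sidePanel"],
//   "side_panel": { "default_path": "<side panel page>" },
//   "commands": { "_execute_action": { "suggested_key": { "default": "Alt+A" } } }
// `enableSidePanelOnActionClick` is a hypothetical helper name.
function enableSidePanelOnActionClick(chromeApi) {
  // Open the side panel whenever the action runs: a toolbar-icon click or the
  // _execute_action keyboard shortcut declared in the manifest.
  return chromeApi.sidePanel
    .setPanelBehavior({ openPanelOnActionClick: true })
    .catch((error) => console.error('Could not set side panel behavior:', error));
}

// In the extension's service worker:
// enableSidePanelOnActionClick(chrome);
```

Routing the shortcut through `_execute_action` means the keyboard and the icon share one code path, so there is no separate command listener to maintain.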
