# Audio-Scribe: Transcribe audio messages with Chrome's multimodal Prompt API

This sample demonstrates how to use Chrome's built-in AI APIs to transcribe audio messages directly in the browser. It uses:

- **[Prompt API](https://developer.chrome.com/docs/extensions/ai/prompt-api)** with multimodal audio input (Gemini Nano) for on-device speech-to-text transcription

## Overview

Audio-Scribe adds a side panel that automatically transcribes audio messages from chat applications. When activated, it:

1. Monitors the page for audio blobs created via `URL.createObjectURL`.
2. Detects audio content and sends it to Gemini Nano for transcription.
3. Streams the transcribed text in real-time to the side panel.
4. Works with messaging apps like WhatsApp Web that use blob URLs for audio messages.

## Running this extension

1. Clone this repository.
2. Load this directory in Chrome as an [unpacked extension](https://developer.chrome.com/docs/extensions/get-started/tutorial/hello-world#load-unpacked).
3. Open a chat app in the browser, for example https://web.whatsapp.com/. You can also run the included demo chat app:
   ```
   npx serve demo-chat-app
   ```
4. Open the Audio-Scribe side panel by clicking the extension icon or pressing `Alt+A`.
5. Play or load audio messages in the chat - they will be automatically transcribed in the side panel.

![Screenshot displaying a demo chat app with a few audio messages. On the right, there is the audio-scribe extension's sidepanel which displayes the transcribed text messages](assets/screenshot.png)