Linconwaves Unified Docs
AI Workers

Auto (multimodal router)

One endpoint that routes text, vision, audio, and document tasks to the right Workers AI models.

Auto is a multimodal “router” endpoint that inspects each request (text plus any attachments), dispatches it to the most appropriate Workers AI model, and composes a single assistant reply.

Endpoint

  • URL: POST /auto
  • Auth: Authorization: Bearer <api_key>
  • Content-Type: application/json
  • Base URL: https://aiworker.linconwaves.com (or your deployment)

Capabilities

  • Text reasoning: uses @cf/openai/gpt-oss-120b for intent, planning, and replies.
  • Vision: describes images with llava-1.5-7b-hf, falling back to uform-gen2-qwen-500m.
  • Audio (STT): transcribes audio files with whisper-large-v3-turbo, falling back to whisper.
  • Document parsing: extracts text from PDFs and plaintext uploads; summarizes and proposes follow-ups.
  • File generation (on request): can generate Markdown, PDF, DOCX, XLSX, or CSV from the latest assistant content.
  • TTS (on request): can render the latest assistant content as audio with aura-2-en.
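The dispatch described above can be sketched as a simple switch on the first attachment's declared type. The model IDs come from the capability list; `pickModel` itself is a hypothetical helper, not the service's actual routing code:

```typescript
// Illustrative sketch of Auto's routing: choose a model family from
// the first attachment's type. Text-only requests go straight to the
// reasoning model.
type AttachmentType = "image" | "audio" | "video" | "file";

interface Attachment {
  name: string;
  mime: string;
  type: AttachmentType;
  data: string; // data URL or base64
}

function pickModel(attachments: Attachment[]): string {
  const first = attachments[0];
  if (!first) return "@cf/openai/gpt-oss-120b"; // pure text reasoning
  switch (first.type) {
    case "image":
      return "llava-1.5-7b-hf"; // falls back to uform-gen2-qwen-500m
    case "audio":
      return "whisper-large-v3-turbo"; // falls back to whisper
    default:
      return "@cf/openai/gpt-oss-120b"; // documents: extract text, then reason
  }
}
```

Whatever a media model produces (an image description, a transcript, extracted document text) is then handed to the text model for the final reply.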

When to use

Use /auto when you want a single endpoint to handle:

  • Pure text chat and reasoning.
  • Images that need a description or follow-up questions.
  • Audio files that need transcription and next-step guidance.
  • Documents (PDF/Doc/Excel/text) that need quick summaries or extractions.
  • On-demand file generation or TTS after the assistant has produced content.

Inputs

Send a JSON body with:

{
  "messages": [{ "role": "user", "content": "your text" }],
  "attachments": [
    {
      "name": "file.png",
      "mime": "image/png",
      "type": "image",
      "data": "<data-url-or-base64>"
    }
  ],
  "conversationId": "optional-conversation-id"
}
  • messages: chat-style array (roles: user, assistant, system).
  • attachments: optional array of uploaded files. Supported type values: image, audio, video, file. PDFs and common office docs go in type: "file".
  • conversationId: optional; when provided, Auto will thread memory via chat history.
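A minimal client-side sketch of building this body in TypeScript (the interface mirrors the schema above; `buildAutoRequest` is a hypothetical helper):

```typescript
// Build a request body for POST /auto. Field names follow the schema
// documented above; omit `attachments` entirely when there are none.
interface AutoRequest {
  messages: { role: "user" | "assistant" | "system"; content: string }[];
  attachments?: {
    name: string;
    mime: string;
    type: "image" | "audio" | "video" | "file";
    data: string; // data URL or base64
  }[];
  conversationId?: string;
}

function buildAutoRequest(
  text: string,
  attachments: AutoRequest["attachments"] = []
): AutoRequest {
  const body: AutoRequest = { messages: [{ role: "user", content: text }] };
  if (attachments.length > 0) body.attachments = attachments;
  return body;
}
```

The body is then POSTed with an `Authorization: Bearer <api_key>` header and `Content-Type: application/json`, as shown in the curl example below.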

Outputs

Auto returns a standard chat-like JSON response with:

  • response: assistant text.
  • generatedAttachments (optional): files produced on request (md/pdf/docx/xlsx/csv or audio).
  • conversationId / conversationSlug: conversation tracking.
  • memoryMessages: recent messages used for summarization.

Binary media (e.g., generated audio) is uploaded to storage and returned as attachment metadata (type, mime, size, r2Key, url).
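A sketch of consuming the response. The interface below follows the Outputs list; it is an assumed shape for illustration, not a complete schema:

```typescript
// Assumed response shape based on the Outputs list above.
interface GeneratedAttachment {
  type: string;
  mime: string;
  size: number;
  r2Key: string;
  url: string;
}

interface AutoResponse {
  response: string;
  generatedAttachments?: GeneratedAttachment[];
  conversationId?: string;
  conversationSlug?: string;
  memoryMessages?: { role: string; content: string }[];
}

// Binary media lives in storage, so clients fetch each attachment's
// `url` rather than decoding inline bytes from the JSON body.
function downloadableUrls(res: AutoResponse): string[] {
  return (res.generatedAttachments ?? []).map((a) => a.url);
}
```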

Behavior notes

  • Vision payloads are sent as compressed image: number[], matching Workers AI vision schema.
  • Audio is transcoded to 16 kHz mono WAV for STT; TTS uses aura-2-en.
  • Document parsing uses PDF text extraction (where possible) or a safe text preview, then summarizes.
  • File generation and TTS only happen when explicitly requested in the user text (e.g., “generate a pdf”, “make audio”), and only after the assistant produces content.
  • If vision fails to read an image, the assistant will ask for clarification/re-upload instead of guessing.
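The “only when explicitly requested” rule for file generation and TTS could be implemented with simple trigger-phrase matching. The exact phrase list the service uses is not documented; the patterns below are an assumption for illustration:

```typescript
// Hypothetical trigger detection for on-demand artifacts. The real
// service's phrase list is undocumented; these regexes are a sketch.
const GENERATE_FILE = /\b(generate|make|create|export)\b.*\b(pdf|docx|xlsx|csv|markdown|md)\b/i;
const GENERATE_AUDIO = /\b(generate|make|create|read)\b.*\b(audio|speech|tts)\b/i;

function wantsArtifact(userText: string): "file" | "audio" | null {
  if (GENERATE_FILE.test(userText)) return "file";
  if (GENERATE_AUDIO.test(userText)) return "audio";
  return null; // ordinary chat: no artifact generation
}
```

Note that even when a trigger matches, generation only runs after the assistant has produced content to export, per the behavior notes above.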

Example request

curl -X POST https://aiworker.linconwaves.com/auto \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "What is in this image?" }
    ],
    "attachments": [
      {
        "name": "photo.png",
        "type": "image",
        "mime": "image/png",
        "data": "data:image/png;base64,iVBORw0KGgo..."
      }
    ]
  }'

Auto will:

  1. Describe the image via LLaVA (falling back to UForm).
  2. Pass the description into GPT-OSS reasoning.
  3. Reply with a concise description and a follow-up question.
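The same request expressed in TypeScript with the standard `fetch` API (Node 18+ or browsers). `buildImageBody` and `sendAuto` are illustrative helpers; the API key and image bytes are placeholders:

```typescript
// Construct the JSON payload from the curl example above.
function buildImageBody(base64Png: string) {
  return {
    messages: [{ role: "user" as const, content: "What is in this image?" }],
    attachments: [
      {
        name: "photo.png",
        type: "image" as const,
        mime: "image/png",
        data: `data:image/png;base64,${base64Png}`,
      },
    ],
  };
}

// POST it to /auto and return the assistant text (`response` field,
// per the Outputs section).
async function sendAuto(apiKey: string, base64Png: string): Promise<string> {
  const res = await fetch("https://aiworker.linconwaves.com/auto", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildImageBody(base64Png)),
  });
  const json = await res.json();
  return json.response;
}
```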