Voice Channel (Phone)

Connect a phone number to your agent via Vapi — setup guide, voice modes, model options, and tips.

Tip: Your agent handles phone calls with the same tools, skills, credentials, and context it has in the dashboard. The only difference is the interface — voice instead of text.

Connecting Voice

The Voice channel uses Vapi to handle phone calls. Vapi manages telephony and the speech pipeline (STT → LLM → TTS) while your agent provides the intelligence.

Step 1: Create a Vapi Account

  1. Go to vapi.ai and create an account
  2. In the Vapi dashboard, navigate to Dashboard → API Keys
  3. Copy your API key — you'll need it when connecting in Communa

Step 2: Add a Phone Number in Vapi

You need a phone number for callers to reach your agent:

  1. In the Vapi dashboard, go to Phone Numbers
  2. Click Add Phone Number
  3. Choose a provider — Vapi offers built-in numbers, or you can connect your own via Twilio or Vonage
  4. Follow the prompts to provision a number

Info: Vapi's built-in numbers are the easiest way to get started. For production use with specific area codes or international numbers, connect a Twilio or Vonage account.

Step 3: Connect in Communa

  1. Go to your agent → Channels tab
  2. Click Connect Channel → select the Voice tab
  3. Paste your Vapi API key
  4. Select a phone number from the dropdown (Communa fetches your available numbers automatically)
  5. Optionally configure:
    • Greeting message — The first thing callers hear (default: "Hi! How can I help you today?")
    • STT language — The language for speech-to-text recognition (default: English)
    • Conversation model — The LLM that powers the voice conversation (default: GPT-4o)
  6. Click Connect Voice

Communa automatically registers the webhook on your Vapi phone number — no manual webhook configuration needed (unlike WhatsApp).
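
As a rough illustration of the options collected in Step 3, the sketch below models them as a small Python dataclass with the documented defaults. The class name, field names, the "en" language code, and the "gpt-4o" identifier string are assumptions made for this example; they are not a Communa or Vapi API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceConnectionConfig:
    """Hypothetical container for the Step 3 options (illustration only)."""

    vapi_api_key: str
    phone_number: str
    # Documented default greeting.
    greeting_message: str = "Hi! How can I help you today?"
    # Documented default is English; the "en" code is an assumption.
    stt_language: str = "en"
    # Documented default is GPT-4o; the identifier string is an assumption.
    conversation_model: str = "gpt-4o"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config looks usable."""
        problems = []
        if not self.vapi_api_key.strip():
            problems.append("Vapi API key is required")
        if not self.phone_number.strip():
            problems.append("a phone number must be selected")
        if not self.greeting_message.strip():
            problems.append("greeting message must not be empty")
        return problems
```

Only the API key and phone number are required here, which mirrors the form: everything else falls back to a default if left untouched.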

Test Your Connection

Call the phone number shown on your connection card. Your agent should pick up, speak the greeting message, and be ready to converse.


How It Works

The Voice channel uses Vapi's native LLM integration with smart turn-taking for natural conversations:

  1. Caller speaks → Vapi transcribes speech to text (STT)
  2. Vapi sends text directly to the selected LLM (low latency, native integration)
  3. LLM responds → Vapi converts to speech (TTS) → played to caller
  4. When the caller needs the agent to do something (check emails, run a script, look up data), the LLM invokes the agent_action tool — your full agent pipeline runs and returns the result

This gives you natural conversational flow for simple interactions, with the full power of your agent's tools (sandbox, credentials, skills, email, web search, etc.) available when needed.
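
The flow above can be sketched in a few lines of Python. This is a simplified model, not Communa's implementation: the reply dictionary shape, `handle_turn`, `toy_model`, and `toy_pipeline` are hypothetical names, and the tool result is handed straight to speech here for brevity.

```python
from typing import Callable, Dict

# Hypothetical reply shape: either {"text": ...} for a direct answer,
# or {"tool": "agent_action", "task": ...} when the agent must act.
Reply = Dict[str, str]


def handle_turn(
    transcript: str,
    conversation_model: Callable[[str], Reply],
    run_agent_pipeline: Callable[[str], str],
) -> str:
    """Return the text that would be passed to TTS for one caller turn."""
    reply = conversation_model(transcript)
    if reply.get("tool") == "agent_action":
        # The full agent pipeline runs only when the model asks for it.
        return run_agent_pipeline(reply["task"])
    return reply["text"]


def toy_model(transcript: str) -> Reply:
    """Stand-in for the LLM: requests the tool when the caller mentions email."""
    if "email" in transcript.lower():
        return {"tool": "agent_action", "task": "check unread emails"}
    return {"text": "Sure, happy to help."}


def toy_pipeline(task: str) -> str:
    """Stand-in for the agent pipeline: echoes the task it was given."""
    return f"Done: {task}"
```

Simple turns never touch the pipeline, which is why small talk stays fast while tool-backed requests take longer.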


Model Options

Choose which LLM powers the voice conversation:

OpenAI

Model | Notes
GPT-5.4 | Latest flagship
GPT-5.4 Mini | Fast, cost-effective
GPT-5.4 Nano | Ultra-lightweight
GPT-5.2 | Previous generation
GPT-5.1 | Previous generation
GPT-5 | First GPT-5 release
GPT-5 Mini | Compact GPT-5
GPT-5 Nano | Lightweight GPT-5
o3 | Reasoning model
o4 Mini | Compact reasoning
GPT-4o | Reliable all-rounder
GPT-4o Mini | Fast and affordable

Anthropic

Model | Notes
Claude Sonnet 4 | Balanced performance
Claude Sonnet 4.5 | Enhanced capabilities
Claude Haiku 4.5 | Fast, cost-effective
Claude Opus 4 | Most capable
Claude 3.5 Sonnet | Previous generation
Claude 3.5 Haiku | Previous generation

Google

Model | Notes
Gemini 2.5 Flash | Fast and capable
Gemini 2.5 Pro | Most capable
Gemini 2.0 Flash | Previous generation
Gemini 1.5 Flash | Lightweight
Gemini 1.5 Pro | Previous generation

Info: The conversation model handles turn-taking and general responses. When tools are invoked, the agent uses its configured model from the Chat settings.


Voice Settings

The Settings tab includes a Voice section where you can customize the AI's behavior during phone calls.

Voice Processing Instructions

These are the primary instructions your agent receives when handling calls. They control:

  • How the agent greets and interacts with callers
  • When and how to use tools during calls
  • Language preferences and multilingual behavior
  • Conversation style and tone

The default instructions configure a natural, multilingual conversational style. You can customize them for your specific use case — for example, making the agent always respond in a specific language, follow a script, or prioritize certain tools.

Tip: Voice output rules (no markdown, spell out numbers, no emojis) are added automatically as guardrails. You don't need to include those in your custom instructions.
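
To make the guardrails concrete, here is a toy Python sketch of what "no markdown, spell out numbers, no emojis" could look like as a text filter. It is an assumption-laden illustration, not how Communa enforces these rules: the function names are hypothetical, only whole numbers from 0 to 999 are spelled out, and the emoji ranges are deliberately coarse.

```python
import re

_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]

# Common markdown punctuation and a coarse set of emoji code points.
_MARKDOWN = re.compile(r"[*_`#>]+")
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def spell_number(n: int) -> str:
    """Spell out a whole number in the range 0..999 in English."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else "-" + _ONES[ones])
    hundreds, rest = divmod(n, 100)
    head = _ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + spell_number(rest)


def to_voice_text(text: str) -> str:
    """Strip markdown and emojis, then spell out short standalone numbers."""
    text = _MARKDOWN.sub("", text)
    text = _EMOJI.sub("", text)
    # Digit runs of up to three characters are replaced; longer runs are left alone.
    text = re.sub(r"\b\d{1,3}\b", lambda m: spell_number(int(m.group())), text)
    return re.sub(r"\s+", " ", text).strip()
```

A real filter would also need to handle decimals, thousands separators, dates, and phone numbers, which this sketch does not attempt.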

Reset to Defaults

If you've customized the instructions and want to start fresh, click the Reset button to restore the default voice processing instructions.


Features

Auto-Wake

When your agent is sleeping and a call comes in:

  1. The sandbox is automatically provisioned — no dashboard visit needed
  2. The greeting message plays while the sandbox warms up
  3. The agent is ready to handle the call with full capabilities

This means your agent is effectively always reachable by phone, even when its sandbox is shut down to save resources.
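
The overlap between the greeting and the sandbox warm-up can be pictured with a small asyncio sketch. The coroutine names and the delays are invented for illustration; the real provisioning and audio playback are handled by Communa and Vapi.

```python
import asyncio
from typing import List


async def provision_sandbox(log: List[str], delay: float = 0.02) -> None:
    """Stand-in for sandbox provisioning (hypothetical timing)."""
    log.append("sandbox: provisioning")
    await asyncio.sleep(delay)
    log.append("sandbox: ready")


async def play_greeting(log: List[str], delay: float = 0.01) -> None:
    """Stand-in for greeting playback (hypothetical timing)."""
    log.append("greeting: start")
    await asyncio.sleep(delay)
    log.append("greeting: end")


async def answer_call(agent_sleeping: bool) -> List[str]:
    """Return the ordered list of events for one incoming call."""
    log: List[str] = []
    if agent_sleeping:
        # Greeting and warm-up run concurrently, so the caller hears
        # something right away instead of waiting on the sandbox.
        await asyncio.gather(provision_sandbox(log), play_greeting(log))
    else:
        await play_greeting(log)
    log.append("agent: handling call")
    return log


if __name__ == "__main__":
    for event in asyncio.run(answer_call(agent_sleeping=True)):
        print(event)
```

The agent only starts handling the call once both tasks have finished, so a slow warm-up delays the first real response but not the greeting.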

Greeting Message

The first thing callers hear when the call connects. Configurable during setup or in the connection settings. Supports personalization — if the caller's name is available (from caller ID), it's included automatically.

Speech-to-Text (STT)

Powered by Deepgram Nova-3. Configure the primary language during setup:

  • English (default), Hebrew, Spanish, French, German, Arabic, and many more
  • Language selection optimizes recognition accuracy for the primary spoken language
  • The agent itself can respond in any language based on its instructions

Text-to-Speech (TTS)

Configurable voice provider and voice ID. Default: OpenAI Alloy — a natural, conversational voice. You can change the TTS provider and voice in the connection configuration.

Call Duration

Default maximum: 30 minutes per call. Configurable in the connection settings. After the maximum duration, the call ends gracefully.

Call Recording

Enabled by default. Call recordings are captured by Vapi and available for review. Useful for quality assurance and training.

End-Call Function

The agent can hang up the call when appropriate — for example, after saying goodbye or when the caller's needs are fully addressed. This is handled automatically by Vapi's end-call function.

Silence Timeout

If neither party speaks for 30 seconds, the call ends automatically. This prevents abandoned calls from consuming resources.
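
The two automatic end conditions described above (maximum call duration and silence timeout) amount to a pair of time comparisons. The sketch below uses the documented defaults; `CallLimits`, `end_reason`, and the reason strings are hypothetical names, and the actual enforcement happens on the Vapi side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallLimits:
    """Documented defaults, expressed in seconds (names are illustrative)."""

    max_duration_s: float = 30 * 60  # 30 minutes per call
    silence_timeout_s: float = 30.0  # 30 seconds with no speech


def end_reason(
    limits: CallLimits,
    call_started_at: float,
    last_speech_at: float,
    now: float,
) -> Optional[str]:
    """Return why the call should end, or None if it should continue."""
    if now - call_started_at >= limits.max_duration_s:
        return "max_duration"
    if now - last_speech_at >= limits.silence_timeout_s:
        return "silence_timeout"
    return None
```

All timestamps are plain seconds on the same clock, for example values from `time.monotonic()`.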


How Voice Differs from Text Channels

Feature | Telegram / WhatsApp | Voice
Message format | Text + attachments | Spoken audio (STT/TTS)
File attachments | ✅ Photos, docs, videos | ❌ Audio only
send_channel_message | ✅ Used for outbound messages | ❌ Vapi handles audio delivery
Bot commands | ✅ Telegram: /start, /stop, /help | ❌ Not applicable
Group chats | ✅ Telegram groups | ❌ 1:1 calls only
Conversation history | Messages appear in dashboard chat | Call transcript appears after call ends
Latency | Near-instant text delivery | Low latency with smart turn-taking

Tips & Best Practices

  • Choose the right model — GPT-4o is a great default. For simpler use cases, GPT-4o Mini offers faster responses at lower cost.
  • Test your greeting message — Call your agent and listen to the first impression. A good greeting sets the tone for the entire call.
  • Keep voice instructions focused — Unlike text chat, callers can't scroll back. Instruct your agent to be concise and confirm understanding.
  • Set the right STT language — If your callers primarily speak a non-English language, set the STT language accordingly for better recognition accuracy.
  • Combine with other channels — An agent can handle phone calls during business hours and process Telegram/WhatsApp messages anytime. Use the voice channel for high-touch interactions and text channels for async communication.
  • Monitor from the dashboard — While a call is in progress, you can observe the agent's actions in the dashboard chat in real time.
