Voice Channel (Phone)
Connect a phone number to your agent via Vapi — setup guide, voice modes, model options, and tips.
Tip: Your agent handles phone calls with the same tools, skills, credentials, and context it has in the dashboard. The only difference is the interface — voice instead of text.
Connecting Voice
The Voice channel uses Vapi to handle phone calls. Vapi manages the telephony layer (STT → LLM → TTS) while your agent provides the intelligence.
Step 1: Create a Vapi Account
- Go to vapi.ai and create an account
- In the Vapi dashboard, navigate to Dashboard → API Keys
- Copy your API key — you'll need it when connecting in Communa
Step 2: Add a Phone Number in Vapi
You need a phone number for callers to reach your agent:
- In the Vapi dashboard, go to Phone Numbers
- Click Add Phone Number
- Choose a provider — Vapi offers built-in numbers, or you can connect your own via Twilio or Vonage
- Follow the prompts to provision a number
Info: Vapi's built-in numbers are the easiest way to get started. For production use with specific area codes or international numbers, connect a Twilio or Vonage account.
Step 3: Connect in Communa
- Go to your agent → Channels tab
- Click Connect Channel → select the Voice tab
- Paste your Vapi API key
- Select a phone number from the dropdown (Communa fetches your available numbers automatically)
- Optionally configure:
  - Greeting message: the first thing callers hear (default: "Hi! How can I help you today?")
  - STT language: the language for speech-to-text recognition (default: English)
  - Conversation model: the LLM that powers the voice conversation (default: GPT-4o)
- Click Connect Voice
Communa automatically registers the webhook on your Vapi phone number — no manual webhook configuration needed (unlike WhatsApp).
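As a rough illustration of how the phone number dropdown could be populated, the Python sketch below builds (without sending) a request for the numbers on a Vapi account and turns a response body into dropdown labels. It assumes that Vapi's REST API lists numbers at https://api.vapi.ai/phone-number with bearer-token auth and that each entry carries a number field; verify both against Vapi's API reference. The function names are hypothetical and are not part of Communa.

```python
import json
import urllib.request

# Assumption: Vapi lists phone numbers at this URL using bearer-token auth.
VAPI_PHONE_NUMBERS_URL = "https://api.vapi.ai/phone-number"


def build_list_numbers_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists the account's phone numbers."""
    return urllib.request.Request(
        VAPI_PHONE_NUMBERS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def dropdown_options(response_body: str) -> list[str]:
    """Turn a JSON list of phone-number objects into dropdown labels.

    Assumption: each object has a "number" field; entries without one are skipped.
    """
    return [item["number"] for item in json.loads(response_body) if item.get("number")]
```

To actually fetch the list you would pass the request to urllib.request.urlopen and feed the decoded body to dropdown_options.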
Test Your Connection
Call the phone number shown on your connection card. Your agent should pick up, speak the greeting message, and be ready to converse.
How It Works
The Voice channel uses Vapi's native LLM integration with smart turn-taking for natural conversations:
- Caller speaks → Vapi transcribes speech to text (STT)
- Vapi sends text directly to the selected LLM (low latency, native integration)
- LLM responds → Vapi converts to speech (TTS) → played to caller
- When the caller needs the agent to do something (check emails, run a script, look up data), the LLM invokes the agent_action tool. Your full agent pipeline runs and returns the result
This gives you natural conversational flow for simple interactions, with the full power of your agent's tools (sandbox, credentials, skills, email, web search, etc.) available when needed.
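The flow above can be sketched as a small routing function. This is a simplified model, not Communa's or Vapi's actual code: the LlmReply shape, handle_turn, and the two stand-in functions are all hypothetical, and the sketch returns the agent's result directly instead of passing it back through the conversation model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LlmReply:
    """What the conversation model returns for one caller turn (hypothetical shape)."""
    text: str                           # words to speak if no tool is needed
    tool_name: Optional[str] = None     # "agent_action" when the agent must act
    tool_request: Optional[str] = None  # what the caller asked the agent to do


def handle_turn(
    transcript: str,
    conversation_llm: Callable[[str], LlmReply],
    agent_pipeline: Callable[[str], str],
) -> str:
    """Return the text that would be handed to TTS for one caller turn."""
    reply = conversation_llm(transcript)
    if reply.tool_name == "agent_action" and reply.tool_request:
        # Tool path: the full agent pipeline runs and its result comes back.
        return agent_pipeline(reply.tool_request)
    # Fast path: the conversation model answers directly.
    return reply.text


# Stand-ins for the real model and agent, for illustration only.
def fake_llm(transcript: str) -> LlmReply:
    if "email" in transcript.lower():
        return LlmReply(text="", tool_name="agent_action", tool_request="check unread emails")
    return LlmReply(text="Sure, happy to help.")


def fake_agent(request: str) -> str:
    return f"Agent result for: {request}"
```

Small talk stays on the fast path, while a request such as "Can you check my email?" is routed to the agent pipeline.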
Model Options
Choose which LLM powers the voice conversation:
OpenAI
| Model | Notes |
|---|---|
| GPT-5.4 | Latest flagship |
| GPT-5.4 Mini | Fast, cost-effective |
| GPT-5.4 Nano | Ultra-lightweight |
| GPT-5.2 | Previous generation |
| GPT-5.1 | Previous generation |
| GPT-5 | First GPT-5 release |
| GPT-5 Mini | Compact GPT-5 |
| GPT-5 Nano | Lightweight GPT-5 |
| o3 | Reasoning model |
| o4 Mini | Compact reasoning |
| GPT-4o | Reliable all-rounder |
| GPT-4o Mini | Fast and affordable |
Anthropic
| Model | Notes |
|---|---|
| Claude Sonnet 4 | Balanced performance |
| Claude Sonnet 4.5 | Enhanced capabilities |
| Claude Haiku 4.5 | Fast, cost-effective |
| Claude Opus 4 | Most capable |
| Claude 3.5 Sonnet | Previous generation |
| Claude 3.5 Haiku | Previous generation |
Google
| Model | Notes |
|---|---|
| Gemini 2.5 Flash | Fast and capable |
| Gemini 2.5 Pro | Most capable |
| Gemini 2.0 Flash | Previous generation |
| Gemini 1.5 Flash | Lightweight |
| Gemini 1.5 Pro | Previous generation |
Info: The conversation model handles turn-taking and general responses. When tools are invoked, the agent uses its configured model from the Chat settings.
Voice Settings
The Settings tab includes a Voice section where you can customize the AI's behavior during phone calls.
Voice Processing Instructions
These are the primary instructions your agent receives when handling calls. They control:
- How the agent greets and interacts with callers
- When and how to use tools during calls
- Language preferences and multilingual behavior
- Conversation style and tone
The default instructions configure a natural, multilingual conversational style. You can customize them for your specific use case — for example, making the agent always respond in a specific language, follow a script, or prioritize certain tools.
Tip: Voice output rules (no markdown, spell out numbers, no emojis) are added automatically as guardrails. You don't need to include those in your custom instructions.
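To give a feel for what such guardrails do, here is a minimal Python sketch that makes a text reply speakable: it drops emojis, strips common markdown characters, and spells out small numbers. It is an illustration under stated assumptions, not the actual guardrail implementation. The function names are hypothetical, the emoji ranges are approximate, and numbers above 99 are simply read digit by digit.

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
         "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
         "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Rough emoji and symbol ranges; not exhaustive.
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Characters commonly used for markdown emphasis, code, headings, and quotes.
_MARKDOWN = re.compile(r"[*_`#>]+")


def spell_number(n: int) -> str:
    """Spell out 0..99 in English; larger values are read digit by digit."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] if ones == 0 else f"{_TENS[tens]}-{_ONES[ones]}"
    return " ".join(_ONES[int(d)] for d in str(n))


def to_speakable(text: str) -> str:
    """Remove emojis and markdown, spell out digits, and collapse whitespace."""
    text = _EMOJI.sub("", text)
    text = _MARKDOWN.sub("", text)
    text = re.sub(r"\d+", lambda m: spell_number(int(m.group())), text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, to_speakable("**You have 3 new emails** 🎉") yields "You have three new emails".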
Reset to Defaults
If you've customized the instructions and want to start fresh, click the Reset button to restore the default voice processing instructions.
Features
Auto-Wake
When your agent is sleeping and a call comes in:
- The sandbox is automatically provisioned — no dashboard visit needed
- The greeting message plays while the sandbox warms up
- The agent is ready to handle the call with full capabilities
This means your agent is effectively always reachable by phone, even when its sandbox is shut down to save resources.
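The key idea in auto-wake is that provisioning and the greeting overlap instead of running one after the other. The asyncio sketch below models only that overlap; the function names are hypothetical and the sleep calls stand in for real warm-up and audio playback times.

```python
import asyncio


async def provision_sandbox(log: list[str]) -> None:
    log.append("sandbox: provisioning started")
    await asyncio.sleep(0.02)  # stands in for real warm-up time
    log.append("sandbox: ready")


async def play_greeting(log: list[str]) -> None:
    log.append("greeting: started")
    await asyncio.sleep(0.01)  # stands in for audio playback
    log.append("greeting: finished")


async def answer_call() -> list[str]:
    """Simulate an incoming call to a sleeping agent and return the event log."""
    log: list[str] = []
    # Run both concurrently so the caller hears the greeting during warm-up.
    await asyncio.gather(provision_sandbox(log), play_greeting(log))
    log.append("agent: handling call")
    return log
```

Running asyncio.run(answer_call()) shows the greeting starting before the sandbox reports ready, with call handling beginning only after both have finished.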
Greeting Message
The first thing callers hear when the call connects. Configurable during setup or in the connection settings. Supports personalization — if the caller's name is available (from caller ID), it's included automatically.
Speech-to-Text (STT)
Powered by Deepgram Nova-3. Configure the primary language during setup:
- English (default), Hebrew, Spanish, French, German, Arabic, and many more
- Language selection optimizes recognition accuracy for the primary spoken language
- The agent itself can respond in any language based on its instructions
Text-to-Speech (TTS)
Configurable voice provider and voice ID. Default: OpenAI Alloy — a natural, conversational voice. You can change the TTS provider and voice in the connection configuration.
Call Duration
Default maximum: 30 minutes per call. Configurable in the connection settings. After the maximum duration, the call ends gracefully.
Call Recording
Enabled by default. Call recordings are captured by Vapi and available for review. Useful for quality assurance and training.
End-Call Function
The agent can hang up the call when appropriate — for example, after saying goodbye or when the caller's needs are fully addressed. This is handled automatically by Vapi's end-call function.
Silence Timeout
If neither party speaks for 30 seconds, the call ends automatically. This prevents abandoned calls from consuming resources.
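The defaults described in the feature sections above can be collected in one place, together with the two rules that end a call automatically. The Python sketch below is only a summary aid: the field names and string identifiers (such as "gpt-4o" and "en") are assumptions chosen for illustration, not Communa's or Vapi's actual configuration keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceDefaults:
    """Default voice settings as documented; field names are hypothetical."""
    greeting: str = "Hi! How can I help you today?"
    stt_language: str = "en"            # English
    conversation_model: str = "gpt-4o"  # assumed identifier for GPT-4o
    tts_provider: str = "openai"
    tts_voice: str = "alloy"
    max_call_seconds: int = 30 * 60     # 30 minutes per call
    silence_timeout_seconds: int = 30
    recording_enabled: bool = True


def end_reason(
    elapsed: float,
    silent_for: float,
    cfg: VoiceDefaults = VoiceDefaults(),
) -> Optional[str]:
    """Return why the call should end, or None if it should continue."""
    if elapsed >= cfg.max_call_seconds:
        return "max_duration"
    if silent_for >= cfg.silence_timeout_seconds:
        return "silence_timeout"
    return None
```

A call two minutes in with five seconds of silence continues, while thirty seconds of silence or thirty minutes of total duration ends it.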
How Voice Differs from Text Channels
| Feature | Telegram / WhatsApp | Voice |
|---|---|---|
| Message format | Text + attachments | Spoken audio (STT/TTS) |
| File attachments | ✅ Photos, docs, videos | ❌ Audio only |
| send_channel_message | ✅ Used for outbound messages | ❌ Vapi handles audio delivery |
| Bot commands | ✅ Telegram: /start, /stop, /help | ❌ Not applicable |
| Group chats | ✅ Telegram groups | ❌ 1:1 calls only |
| Conversation history | Messages appear in dashboard chat | Call transcript appears after call ends |
| Latency | Near-instant text delivery | Low latency with smart turn-taking |
Tips & Best Practices
- Choose the right model — GPT-4o is a great default. For simpler use cases, GPT-4o Mini offers faster responses at lower cost.
- Test your greeting message — Call your agent and listen to the first impression. A good greeting sets the tone for the entire call.
- Keep voice instructions focused — Unlike text chat, callers can't scroll back. Instruct your agent to be concise and confirm understanding.
- Set the right STT language — If your callers primarily speak a non-English language, set the STT language accordingly for better recognition accuracy.
- Combine with other channels — An agent can handle phone calls during business hours and process Telegram/WhatsApp messages anytime. Use the voice channel for high-touch interactions and text channels for async communication.
- Monitor from the dashboard — While a call is in progress, you can observe the agent's actions in the dashboard chat in real time.
What's Next?
- Channels Overview — Shared channel features, auto-wake, and connection management
- Telegram Channel — Connect your agent via Telegram
- WhatsApp Channel — Connect your agent via WhatsApp
- Chat & Sandbox — The dashboard workspace for direct agent interaction