
The Per-Minute Calling Myth: The Hidden Economics of Voice and Video Calling APIs

Why do most communication platforms charge per-minute fees even when peer-to-peer calls never traverse their infrastructure? A technical look at how modern WebRTC systems actually work, and why per-minute billing for internet communication is increasingly difficult to justify — technically and ethically.

One of the most common questions we get from developers evaluating mesibo is:

“Why do you not charge for voice and video calls like other platforms? What is the catch?”

There is no catch.

We do not charge for peer-to-peer voice and video calls because, in the majority of calls, the media never traverses our infrastructure.

Therefore, charging per-minute for traffic we are not carrying is not a pricing model — it is margin.

This is also why platforms like WhatsApp and Telegram can support billions of calls globally without charging users per minute. Modern internet communication is architecturally different from traditional telecom networks, but many CPaaS pricing models still resemble telecom-era billing.

Let’s look at why.

How Peer-to-Peer Calling Actually Works

Most modern calling systems use WebRTC.

A call starts with signaling:

Phase 1: Signaling

 User A  <--->  Signaling Server  <--->  User B

 Exchange:
 - SDP negotiation
 - ICE candidates
 - Presence

This phase is lightweight. It mainly involves metadata exchange and connection setup. Users are generally authenticated before signaling starts.
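To make the signaling payload concrete, here is a minimal Python sketch that parses one ICE candidate line of the kind peers exchange during this phase. The field layout follows the standard SDP candidate attribute. The `IceCandidate` class, the `parse_candidate` helper, and the sample addresses (taken from documentation ranges) are illustrative assumptions, not part of any mesibo API.

```python
from dataclasses import dataclass


@dataclass
class IceCandidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str  # "host", "srflx", "prflx", or "relay"


def parse_candidate(line: str) -> IceCandidate:
    """Parse the required fields of an SDP ICE candidate attribute."""
    # Accept both the SDP form ("a=candidate:...") and the bare form
    # ("candidate:...") commonly sent with trickle ICE.
    if line.startswith("a="):
        line = line[2:]
    if not line.startswith("candidate:"):
        raise ValueError("not an ICE candidate line")
    fields = line[len("candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate line")
    return IceCandidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        cand_type=fields[7],
    )


# Hypothetical sample lines; the addresses are from documentation ranges.
host = parse_candidate("candidate:1 1 udp 2130706431 192.0.2.10 54321 typ host")
relay = parse_candidate(
    "a=candidate:3 1 udp 16777215 198.51.100.7 3478 typ relay "
    "raddr 203.0.113.5 rport 40000"
)
```

The `typ` field is the part that matters for this discussion: `host`, `srflx`, and `prflx` candidates describe direct paths, while `relay` means the candidate was allocated on a TURN server.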

Once negotiation succeeds, the media path typically becomes direct:

Phase 2: Direct Peer-to-Peer Media

 User A  <=============================>  User B

 Audio/video flows directly between devices.
 No media passes through provider infrastructure.

None of this is visible in a per-minute pricing model.

In a true peer-to-peer session, the provider is not carrying any audio or video. They helped establish the connection. That’s it. The actual media bypasses their servers entirely.

What About TURN Servers?

Direct peer-to-peer connections succeed roughly 75% to 80% of the time in standard consumer internet environments. In the remaining cases, media must be relayed because of NAT restrictions, enterprise firewalls, carrier networks, or restrictive routers.

Phase 2 (Fallback): TURN Relay

 User A  <=====>  TURN Server  <=====>  User B

 Media is relayed through infrastructure.

TURN uses real resources — bandwidth, relay capacity, compute, server infrastructure. Charging for it is completely reasonable.
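One way to see which of the two Phase 2 paths a call took is to look at the candidate pair that ICE selected: if either side's candidate is of type `relay`, the media is going through TURN. A minimal sketch, assuming the two candidate types have already been read from the client's connection statistics; `media_path` is a hypothetical helper name.

```python
VALID_TYPES = {"host", "srflx", "prflx", "relay"}


def media_path(local_type: str, remote_type: str) -> str:
    """Classify a selected ICE candidate pair as 'direct' or 'relayed'."""
    for cand_type in (local_type, remote_type):
        if cand_type not in VALID_TYPES:
            raise ValueError(f"unknown candidate type: {cand_type!r}")
    # A pair is relayed if at least one end uses a TURN-allocated candidate.
    if "relay" in (local_type, remote_type):
        return "relayed"
    return "direct"


print(media_path("srflx", "host"))   # direct
print(media_path("relay", "srflx"))  # relayed
```

Only calls in the second category consume relay bandwidth on the provider's side.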

Note that TURN is not just a failure fallback. Used intelligently, it is also a performance tool. At mesibo, we can route a call through TURN even when a direct connection exists — for example, when the BGP-selected path between two peers is congested or suboptimal, and our TURN infrastructure offers a faster, more reliable route. We also use TURN briefly at call start to reduce setup latency, dropping the relay as soon as the best direct path is confirmed.

However, even in TURN-relayed scenarios, the real cost driver is bandwidth — not call duration — and, unless TURN is forced, the active TURN duration is often only a small fraction of the total call duration.

 Scenario                         Active TURN Duration    Typical Frequency
 Direct Peer-to-Peer Success      No TURN usage           ~75% of calls
 Trickle ICE / Race Conditions    First 2–5 seconds       ~5–10% of calls
 Network Handoff                  Temporary               Common on mobile
 Strict Symmetric NAT             Entire call duration    ~10–15% of calls
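
As a rough worked example of what these figures imply, the sketch below estimates the share of total call time during which TURN is active. It assumes midpoints of the ranges above (7.5% of calls relaying for 3.5 seconds, 12.5% relaying for the whole call) and a hypothetical 5-minute call, and it ignores network handoffs because no frequency is given for them. The function and variable names are illustrative.

```python
def expected_relay_fraction(scenarios, call_seconds):
    """Expected fraction of call time spent on TURN.

    scenarios: list of (share_of_calls, relay_seconds) pairs, where
    relay_seconds is None when the relay stays active for the whole call.
    """
    if call_seconds <= 0:
        raise ValueError("call_seconds must be positive")
    total = 0.0
    for share, relay_seconds in scenarios:
        if relay_seconds is None:
            active = call_seconds
        else:
            # A relay cannot be active for longer than the call itself.
            active = min(relay_seconds, call_seconds)
        total += share * active / call_seconds
    return total


# Assumed midpoints of the ranges in the table; not measured data.
SCENARIOS = [
    (0.75, 0.0),    # direct peer-to-peer, no TURN
    (0.075, 3.5),   # TURN only during the first few seconds
    (0.125, None),  # strict symmetric NAT, TURN for the entire call
]

fraction = expected_relay_fraction(SCENARIOS, call_seconds=300)
print(f"{fraction:.1%}")
```

Under these assumptions TURN is active for roughly 12.6% of total call time, and almost all of that comes from the symmetric-NAT calls.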

Media bandwidth is also adaptive — codecs, resolution, and bitrate scale continuously.
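
Because bitrate adapts, the traffic a relay carries is better described as a sum over bitrate segments than as a number of minutes. A minimal sketch of that arithmetic follows; the bitrates in the example are illustrative assumptions rather than measured values, and `relayed_bytes` is a hypothetical helper.

```python
def relayed_bytes(segments):
    """Bytes carried for one media stream in one direction.

    segments: iterable of (bitrate_bps, seconds) pairs covering the time
    the relay was active; the bitrate may differ from segment to segment.
    """
    total_bits = 0
    for bitrate_bps, seconds in segments:
        if bitrate_bps < 0 or seconds < 0:
            raise ValueError("bitrate and duration must be non-negative")
        total_bits += bitrate_bps * seconds
    return total_bits / 8


# Two ten-minute calls with assumed bitrates.
audio_call = relayed_bytes([(32_000, 600)])
# Video that starts at 1.5 Mbps, then adapts down to 500 kbps.
video_call = relayed_bytes([(1_500_000, 120), (500_000, 480)])

print(audio_call)  # 2400000.0 bytes
print(video_call)  # 52500000.0 bytes
```

Both calls last ten minutes, yet the video call moves about 22 times as many bytes through the relay, which is why duration alone says little about relay cost.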

Even for relayed calls, a flat per-minute fee is only a rough proxy for actual relay costs. Charging per minute for fully peer-to-peer calls, where no media relay exists at all, is harder still to justify technically.

Per-Minute Billing: Blind-Copied From Cellular

Many CPaaS platforms inherited per-minute pricing models from legacy telecom operators. Cellular carriers at least have infrastructure that justifies that model: cell towers, licensed spectrum, radio access networks, nationwide mobility systems, and carrier interconnects.

A peer-to-peer WebRTC call is fundamentally different. The provider’s infrastructure requirements are comparatively modest, and for much of the call duration they may not be transporting any media at all.

Yet much of the video SDK market — Sendbird, Agora, and others — still charges per participant per minute.

The underlying routing mechanisms differ: some platforms always relay media, while others relay conditionally based on network topology. But none of that is visible in the bill. You get charged per minute regardless of whether your call traversed provider infrastructure for two seconds or the full duration.

Interestingly, Twilio’s now-discontinued peer-to-peer video rooms did make this distinction: one-to-one peer-to-peer calls were free — effectively the same position as mesibo.

The Ethical Question

Companies absolutely deserve to charge for what they build and operate:

  • SDK development and maintenance
  • Signaling infrastructure
  • TURN relay bandwidth (when actually used)
  • Cloud recording and storage
  • PSTN connectivity
  • Moderation and compliance tools
  • AI processing and analytics
  • Support

Those are real costs.

But charging recurring per-minute fees for direct peer-to-peer communication — where the media never traverses provider infrastructure — becomes increasingly difficult to justify technically.

At mesibo, we do not believe it is ethical to charge developers for infrastructure that is not actually being consumed.

So we do not charge for peer-to-peer voice and video calls. We charge for TURN bandwidth only when you use our TURN servers, and only for the duration and volume of traffic actually relayed. If you bring your own TURN infrastructure, there is no charge from us at all.

There is no catch.