Wired for sound: How SIP won the VoIP protocol wars

Update: We’re in the last throes of winter break 2019, which means most Ars’ home office phones can stay dormant for a few more days. As such, we’ve been resurfacing a few classics from the archives—the latest being this look at how SIP (Session Initiation Protocol) won the VoIP protocol wars once upon a time. This story first appeared on December 8, 2009, and it appears unchanged below.

As an industry grows, it is quite common to find multiple solutions that all attempt to address similar requirements. This evolution dictates that these proposed standards go through a stage of selection—over time, we see some become more dominant than others. Today, the Session Initiation Protocol (SIP) is clearly one of the dominant VoIP protocols, but that obviously didn’t happen overnight. In this article, the first of a series of in-depth articles exploring SIP and VoIP, we’ll look at the main factors that led to this outcome.

A brief history of VoIP

Let’s go back to 1995, in the days prior to Google, IM, and even broadband. Cell phones were large and bulky, Microsoft had developed a new Windows interface with a “Start” button, and Netscape had the most popular Web browser. The growth of the Internet and data networks prompted many to realize that it was possible to use the new networks to serve our voice communication needs while substantially lowering the associated cost. The first commercial Internet VoIP product came from a company called VocalTec; their software allowed two people to talk with each other over the Internet. One would make a local call to an ISP via a 28.8K or 33.6K modem and be able to talk with friends even if they lived far away. I remember trying out this software, and the sound was definitely below acceptable quality. (It frequently sounded like you were attempting to speak while submerged in a swimming pool.) However, the software successfully connected two people and introduced real-time voice conversation for a bandwidth-constrained network.

It was immediately apparent to the first VoIP implementors that there are several differences between the telephone network and the data network. One of them is the message exchange design. The phone system is circuit-switched, where a circuit is the complete path between two endpoints. Thus, it is possible to guarantee a single path for all messages in a single communication. The data network works with packets, where various hops along the way help to route the packets to their final destination, and this path may change from one packet to the next. Because of this structure, the data network cannot guarantee that the packets of a single session will traverse the same path. VoIP therefore required some innovations before it could really get off the ground.

To start a call, you need a VoIP signaling protocol. The term “signaling” comes from the circuit-switched telephone world. In this system, signals are sent from one end to the other in order to establish communication and allow us to talk over vast distances. The role of a signaling protocol is to define the way these messages are structured and the rules that let us start, configure, and end a conversation. It is worth pointing out that signaling messages do not include the voice one hears (the media of the call). The signaling protocol may carry information about the media streams and their attributes, but the speech itself in a voice call is not a signaling message. If you’re looking for a very high-level explanation, just think of signaling as the messages a device sends when you dial or hang up the phone.

So the race was on to create a new signaling protocol. Some of these protocol specifications were open for everyone to implement, and others were vendor-proprietary solutions. And that race still isn’t quite over, as we’re constantly seeing new proposals that attempt to convince everyone that there’s a better way to do things. A VoIP signaling protocol must show how it integrates with the data network; this includes aspects such as defining a method of locating the communication devices, specifying server behavior, introducing new services, and security design.

SIP protocol design

SIP is an Internet Engineering Task Force (IETF) protocol, and as such it was designed to be an open Internet protocol. Its first release was in 1999, defined by RFC 2543, but its early drafts date back to 1996. The specification was revised in 2002 by RFC 3261.

Let’s look at a simple SIP request:

INVITE sip:hannibal@arstechnica.com SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f
To: Jon Stokes <sip:hannibal@arstechnica.com>
From: Gilad <sip:gilad@voxisoft.com>;tag=n23ycs
Call-ID: nbo34tsggvsqap@home.mynetwork.org
CSeq: 59164 INVITE
Contact: sip:gilad@voxisoft.com
Max-Forwards: 70

SIP is text-based. Notice that the addresses are very similar to email addresses. Although SIP can support telephone numbers, the basic idea is that addresses do not have to be phone numbers, just as you would not expect your email address to look like your home or work address. Compare the request above with the following (partial) HTTP request:

GET /reviews/ HTTP/1.1
Host: arstechnica.com
User-Agent: Gecko/Firefox/3.5.5

Thus, SIP is quite similar to HTTP. The first line is the request line, which contains the type of request (GET in HTTP and INVITE in SIP for these examples) and the intended address, while subsequent lines are headers with additional information. Naturally, responses in SIP also look very similar to HTTP responses. The idea is to use the structure of one of the most popular Internet protocols and make it easier for software developers and network managers to work with SIP. Even without prior HTTP knowledge, learning this message structure is a very easy task.

These attempts to make SIP as easy as HTTP worked out to some extent, but the requirements SIP addresses are more complex than HTTP’s, so the protocol is more complex. For example, it is a basic requirement in SIP to be able to have two-way symmetric communication, whereas a typical HTTP scenario would be a client making requests to a server and the server sending a response.
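
To make the shared structure concrete, here is a minimal Python sketch that reads both example requests with the same few lines of code. The function name `parse_request` is made up for this illustration, and the parser is deliberately naive: it assumes one value per header name, which real SIP traffic does not guarantee (a message can carry several Via headers, for instance).

```python
def parse_request(text):
    """Split a SIP- or HTTP-style request into its request line and headers."""
    lines = text.strip().splitlines()
    # Request line: request type, intended address, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        # Split at the first colon only; values such as SIP URIs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"method": method, "target": target,
            "version": version, "headers": headers}


SIP_INVITE = """\
INVITE sip:hannibal@arstechnica.com SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f
To: Jon Stokes <sip:hannibal@arstechnica.com>
From: Gilad <sip:gilad@voxisoft.com>;tag=n23ycs
Call-ID: nbo34tsggvsqap@home.mynetwork.org
CSeq: 59164 INVITE
Contact: sip:gilad@voxisoft.com
Max-Forwards: 70
"""

HTTP_GET = """\
GET /reviews/ HTTP/1.1
Host: arstechnica.com
User-Agent: Gecko/Firefox/3.5.5
"""

for raw in (SIP_INVITE, HTTP_GET):
    request = parse_request(raw)
    print(request["method"], request["target"], request["version"])
```

The same function handles both messages because the request line and the header lines follow the same layout in each protocol.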

For those who are wondering, the SIP example above is the first packet one might send when calling from a SIP phone to Ars Technica’s Deputy Editor, Jon Stokes. I will refrain from going into the technical details of the message contents at this time, as this is a subject for a separate article.

Reuse, and keeping it simple


Another important factor in SIP’s design was the decision to reuse other existing Internet standards as much as possible. Address location uses DNS, user authentication uses HTTP digest authentication, setting the call media streams uses the Session Description Protocol (SDP), encryption uses TLS and, when applicable, users send each other XML information. This integration further helped establish SIP as part of the Internet protocol world, and vendors could reuse existing implementations in their SIP applications. On the other hand, in some cases the IETF had to make additional definitions in other protocols in order to serve SIP needs.
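
As one illustration of this reuse, the core of HTTP digest authentication is just a few MD5 hashes, so a vendor with an HTTP stack already has most of the pieces. The Python sketch below uses the basic digest formula without the optional qop extension. The function names are invented for this example, and the username, realm, password, and nonce are made-up values rather than anything from a real deployment.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute a basic digest response (no qop), as a client would."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server's nonce


# Hypothetical values: the server supplies the realm and nonce in its challenge.
print(digest_response("gilad", "voxisoft.com", "secret",
                      "REGISTER", "sip:voxisoft.com", "abc123"))
```

The password never travels over the wire; the server, which knows the same secret, repeats the calculation and compares the results.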

Keeping the complexity of the servers along the call path, especially the proxies, as low as possible is also an emphasis in SIP’s design. SIP proxies route the messages between the calling parties. The proxies defined in the standard are not aware of the call state, but rather operate on the transaction level and may also be stateless. This helps with scalability, because fewer devices can serve more calls. To do that, the protocol itself was separated into several distinct layers, a common practice programmers use to break down a complex system. This design helps to further simplify SIP and make it easier to implement. At times, keeping this minimal state forced some limitations (and later, some changes in the protocol), but these byproducts were kept to a minimum.
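
A rough Python sketch of what stateless forwarding can look like: the proxy remembers nothing, so everything it adds is derived from the request in front of it. This is a simplification under stated assumptions, not a conforming proxy. The function name and the host `proxy.example.net` are hypothetical, routing decisions are left out entirely, and hashing the whole request to build the branch value is just one simple way to make retransmissions of the same request produce the same result.

```python
import hashlib

BRANCH_COOKIE = "z9hG4bK"  # prefix RFC 3261 requires on branch values


def forward_statelessly(request_text, proxy_host):
    """Return the request as a stateless proxy might pass it along."""
    lines = request_text.strip().splitlines()
    request_line, headers = lines[0], lines[1:]

    forwarded = []
    for line in headers:
        name, _, value = line.partition(":")
        if name.strip().lower() == "max-forwards":
            hops = int(value)
            if hops <= 0:
                # A real proxy would answer 483 (Too Many Hops) instead.
                raise ValueError("Max-Forwards exhausted")
            line = f"Max-Forwards: {hops - 1}"
        forwarded.append(line)

    # No stored state: the branch is a pure function of the received request.
    digest = hashlib.md5(request_text.encode("utf-8")).hexdigest()[:16]
    via = f"Via: SIP/2.0/UDP {proxy_host};branch={BRANCH_COOKIE}{digest}"

    # The proxy's own Via goes on top so responses can retrace the path.
    return "\r\n".join([request_line, via] + forwarded) + "\r\n\r\n"


INVITE = """\
INVITE sip:hannibal@arstechnica.com SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f
To: Jon Stokes <sip:hannibal@arstechnica.com>
From: Gilad <sip:gilad@voxisoft.com>;tag=n23ycs
Call-ID: nbo34tsggvsqap@home.mynetwork.org
CSeq: 59164 INVITE
Max-Forwards: 70
"""

print(forward_statelessly(INVITE, "proxy.example.net"))
```

Because the output depends only on the input, any number of identical proxy instances could handle any message, which is the scalability benefit described above.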

Finally, and perhaps most importantly, SIP was not built solely as a replacement for the telephone system. It allows extensions, and it relies on them to provide additional services beyond just simple calls. For example, you can use SIP to maintain user status information in an IM client as well as to set up IM sessions. Another extension enables transferring a call to a third party, something that was simply not defined by the basic SIP specification. This is possible because SIP provides the necessary basic constructs and restricts them only when necessary. SIP defines the concept of a “dialog,” which is a two-way communication, but does not limit dialogs to calls. Two-way communication also includes setting your IM status and receiving your IM friends’ updates. Extensions can also easily define new request or response types and new headers when needed.
