A newcomer's guide to SIP

Introduction

Session Initiation Protocol (SIP) has become the de-facto standard for call and session control in next-generation networks. This article provides an introduction to SIP for the technical audience that is familiar with Internet Protocol networking but has not been exposed to SIP in any level of detail. The intent of the article is to introduce the key concepts and mechanisms of the SIP protocol. While the reader will not become a "SIP expert" after reading this article, the information provided should be sufficient to enable one:


Overview

SIP is an IETF standards-track protocol specified in RFC 3261. Generally speaking, SIP is a protocol designed to create and manage communication sessions between parties that wish to exchange various forms of media, such as voice and video, over an IP network. It is often described as a "signaling" protocol, in the sense that it is concerned mostly with the information flows that are necessary to set up a "call" or "session", rather than the media flows that will be exchanged during that session. By design, it is only loosely coupled to the actual media exchanged during the session; because it is not tied to any specific media format, SIP can be used to initiate sessions that exchange any type of media, including formats that will be invented in the future.

SIP also includes a mechanism for users to signal their availability to engage in a communication session, and to indicate the capabilities of their communication devices. This gives parties that want to communicate with each other the ability to rendezvous with each other, and it allows communication services to be tailored to the specific environmental capabilities of the participants.

These characteristics of flexibility and personalization make SIP well-suited for the growing desire of consumers and businesses for converged communication services. As rapidly as consumers and service providers dream up new ways of delivering information and multimedia content, SIP is there to provide a framework for delivering those services.

Basic terms

Let's get some common terms defined at the outset. Above, we used the terms call and session more or less interchangeably. Let's say that a SIP session represents a communication dialog between two parties that has a defined start time and end time, during which information is exchanged using one or more media formats. A call, then, in the sense of a phone call, is generally meant to describe a session where the media exchanged uses an audio format.

Furthermore, sometimes the terms SIP session and SIP dialog are used interchangeably. For the remainder of this article -- and in the SCE itself -- we choose to use the term SIP dialog. The reason for this is to avoid the possible confusion that arises from the multiple uses of the term session -- we frequently use the term application session to refer to a session of an executing XTML application, and it might be confusing to be referring to a SIP session at the same time. Thus, for the sake of clarity, we will speak of application sessions and SIP dialogs.

We can look at the terms we will use to describe SIP communications as a hierarchy. Starting from the most granular level and working our way up, we have:

SIP Message structure

Methods

SIP messages are transmitted over the network in ASCII format, which makes them easy to read and troubleshoot. A variety of transport protocols may be used, including UDP, TCP, or TLS, although UDP is still the most commonly used transport. As mentioned earlier, every message can be classified as either a request or a response. A response always corresponds to a request (we'll describe the rules for matching requests and responses later in this article). Requests have a specific type, or method, which indicates the purpose of the request. The table below describes the various SIP method types that can appear in a request message.

SIP Methods
Method       Purpose
INVITE       Establish a SIP dialog between two parties.
BYE          Terminate a SIP dialog that was previously established via an INVITE transaction.
CANCEL       Cancel an INVITE request for which a final response has not yet been received.
ACK          Completes an INVITE transaction by indicating that the final response to the INVITE request has been received.
REGISTER     Add, update, or remove information from a SIP location server pertaining to the location and availability of a user or device.
SUBSCRIBE    Indicate interest in receiving certain types of events.
NOTIFY       Send information about an event, typically to a party that has previously subscribed to that event.
PRACK        (PRovisional ACKnowledgement); Provide an indication that a provisional response has been received.
UPDATE       Modify some aspect of the media sources or formats offered during an INVITE transaction.
INFO         Send an informational message of some kind.
OPTIONS      Request information about capabilities from the remote party.
REFER        Transfer the remote party during a SIP dialog to another party.
PUBLISH      Provide information that may be useful to parties that want to establish a SIP dialog; e.g., presence-related information such as privacy, environmental factors, etc.
MESSAGE      Send a text message.
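Because SIP messages are plain text, the request/response distinction and the method can be read straight off the first line of a message. The following is a minimal sketch, assuming the RFC 3261 start-line layouts (method, request URI, version for a request; version, status code, reason phrase for a response). The function name `parse_start_line` and the sample lines are illustrative only and do not come from any particular SIP library.

```python
def parse_start_line(line):
    """Classify a SIP start line as a request or a response."""
    if line.startswith("SIP/"):
        # Response: <version> <status code> <reason phrase>
        version, code, reason = line.split(" ", 2)
        return {"kind": "response", "code": int(code), "reason": reason}
    # Request: <method> <request URI> <version>
    method, uri, version = line.split(" ")
    return {"kind": "request", "method": method, "uri": uri}


# Illustrative start lines (the request URI is assumed for the example).
print(parse_start_line("INVITE sip:99@10.10.200.147 SIP/2.0"))
print(parse_start_line("SIP/2.0 200 OK"))
```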

Responses

SIP Response Codes
Range      Meaning
100-199    Provisional responses.
200-299    Success final responses.
300-399    Non-success response; includes information to redirect request to another server.
400-499    Non-success response; client error.
500-599    Non-success response; server error.
600-699    Non-success response; global error.

Response messages are generated in reply to requests, and indicate the status of the requested operation. Response messages can be classified into two categories:

A provisional response provides information about the current state of the operation requested, provided while the operation is still in progress. A final response describes the outcome of the operation once it has completed. A request may result in zero or more provisional responses, and one and only one final response.

Final responses can further be classified as:

A successful response indicates the requested operation was performed and the desired result was achieved. A non-success response indicates that the requested operation could not be performed, or that the desired result was not achieved. A non-success response is not an error condition -- it is a normally occurring fact of life, which is quite simply that sometimes a request made by one party cannot be fulfilled by another party.

Response messages contain an integer response code to make it easier to automate their processing. The response code alone shows whether the response is provisional or final, and whether it is a success or non-success. These codes are summarized in the SIP Response Codes table.
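The classification by response code can be automated in a few lines. This is a small sketch that follows the ranges in the SIP Response Codes table; the function name `classify_response` and the wording of the returned labels are assumptions made for illustration.

```python
def classify_response(code):
    """Return (stage, outcome) for a SIP response code."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    if code < 200:
        return ("provisional", "in progress")
    if code < 300:
        return ("final", "success")
    # 3xx-6xx are all final, non-success responses.
    kinds = {3: "redirect", 4: "client error", 5: "server error", 6: "global error"}
    return ("final", "non-success: " + kinds[code // 100])


for code in (180, 200, 486):
    print(code, classify_response(code))
```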

And now for something completely different

Time for a brief digression. Early on we discussed how SIP can be used to set up multimedia conversations, and how it is loosely-coupled to the underlying media flows. Now let's examine how that works in a bit more detail.

SIP by itself does not contain a mechanism for describing media formats. Instead, it uses Session Description Protocol (SDP) for this purpose. SDP is an IETF standard that was originally defined in RFC 2327 (later revised by RFC 4566). It does just what its name implies -- it is a protocol for describing the media characteristics of a session. SIP and SDP work together to establish a multimedia dialog between two parties -- SIP is used to describe how to route signaling-related messages and to carry the SDP descriptions, and SDP is used to describe the specific media-related capabilities of each side.

Establishing the media flows that will be used in a SIP dialog is a negotiation. The side initiating the dialog offers a set of media types that it is capable of handling, and the receiving side responds with an answer containing the set of media types that it is willing to handle. This process is referred to as the offer/answer model, and is itself codified in RFC 3264, "An Offer/Answer Model with the Session Description Protocol".

Let's look at that example INVITE again, this time at the body, which contains an SDP.

v=0
o=291 2894407 2894407 IN IP4 10.10.250.233
s=ATA186 Call
c=IN IP4 10.10.250.233
t=0 0
m=audio 16386 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000/1
a=rtpmap:8 PCMA/8000/1
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15

Let's review some of the lines in this SDP; in keeping with our intended level of detail, we'll focus only on the most significant ones.
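To make the structure of the SDP concrete, here is a minimal parsing sketch. It reads only three line types: `c=` (the address where media should be sent), `m=` (a media line giving the media type, port, transport, and offered payload types), and `a=rtpmap` (which maps a payload type number to a codec name and clock rate). The function name `parse_sdp` and the dictionary layout are assumptions for illustration; a real parser would also handle media-level `c=` lines and the many other attributes that this sketch ignores.

```python
SDP = """\
v=0
o=291 2894407 2894407 IN IP4 10.10.250.233
s=ATA186 Call
c=IN IP4 10.10.250.233
t=0 0
m=audio 16386 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000/1
a=rtpmap:8 PCMA/8000/1
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
"""


def parse_sdp(text):
    """Return the connection address and a list of media descriptions."""
    address = None
    media = []
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<network type> <address type> <address>
            address = value.split()[2]
        elif kind == "m":
            # m=<media> <port> <transport> <payload types...>
            name, port, proto, *payloads = value.split()
            media.append({"media": name, "port": int(port), "proto": proto,
                          "payloads": [int(p) for p in payloads], "codecs": {}})
        elif kind == "a" and value.startswith("rtpmap:") and media:
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            pt, encoding = value[len("rtpmap:"):].split(" ", 1)
            media[-1]["codecs"][int(pt)] = encoding
    return address, media


address, media = parse_sdp(SDP)
print(address)
print(media[0]["media"], media[0]["port"], media[0]["codecs"])
```

Read this way, the example offers a single audio stream to be received at 10.10.250.233 port 16386, with G.711 mu-law (PCMU), G.711 A-law (PCMA), and telephone events as the offered formats.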

An SDP is not restricted to containing only a single media line. A device capable of simultaneously transmitting audio and video would offer an SDP with two media lines: one offering a set of audio codecs and a second offering a set of video codecs. If the INVITE is received by a device of similar capabilities, the SDP in the answer would also accept both media lines, and the call would include both audio and video. On the other hand, if the INVITE is received by an audio-only device, the answer would accept only the audio media line and decline the video one (RFC 3264 has the answerer keep the declined line in the SDP with its port set to zero), and the call would proceed with audio only.
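The audio-plus-video scenario can be sketched as a small function that builds an answer from an offer. This is an illustration under stated assumptions, not a complete RFC 3264 implementation: the `local` capability table, the port numbers, and the video payload type are all hypothetical. It follows the RFC 3264 convention that the answer carries one media line per offered line, with a declined stream marked by port 0.

```python
def build_answer(offer, local):
    """Answer each offered media line, keeping the offer's order.

    offer: list of {"media", "port", "payloads"} dicts, one per m= line.
    local: hypothetical capability table, media name -> (port, payload types).
    """
    answer = []
    for line in offer:
        port, supported = local.get(line["media"], (0, []))
        common = [pt for pt in line["payloads"] if pt in supported]
        if common:
            # Accept the stream with the formats both sides support.
            answer.append({"media": line["media"], "port": port, "payloads": common})
        else:
            # Decline the stream: port 0, formats echoed from the offer.
            answer.append({"media": line["media"], "port": 0,
                           "payloads": list(line["payloads"])})
    return answer


offer = [
    {"media": "audio", "port": 16386, "payloads": [0, 8, 101]},
    {"media": "video", "port": 16388, "payloads": [34]},
]
audio_only_device = {"audio": (20000, [0, 101])}

for line in build_answer(offer, audio_only_device):
    print(line)
```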

Message retransmission

We mentioned earlier that UDP is the most widely used transport protocol for carrying SIP messages. Because UDP is a connectionless protocol, there is no guarantee that messages will arrive at their destination, or that if they do they will arrive in the same order as sent. SIP was therefore designed with a message retransmission scheme to overcome this. Message retransmission only comes into play when UDP is used as the transport; if a TCP connection is used, then messages are not retransmitted. At a high level, message retransmission works like this:

Retransmissions should be rare on any production network with reasonable quality of service. Therefore, repeated retransmissions are generally an indication of a problem of some kind: either a network outage, the failure of a far-end SIP server, or a local application error that is causing a request to be sent to an invalid address. When you spot repeated retransmissions in your application server log file when developing or testing a new application, it's a sign that one of these types of errors has occurred and it should be investigated immediately.
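To give a feel for what repeated retransmissions look like in a trace, the sketch below computes the times at which an unanswered INVITE would be sent over UDP. It assumes the default timer values from RFC 3261 as I understand them: the retransmission interval starts at T1 (500 ms) and doubles after each send, and the transaction gives up after 64*T1. The function name is illustrative, and other request types follow a somewhat different schedule that is not modeled here.

```python
def invite_send_times(t1=0.5):
    """Times (in seconds) at which an unanswered INVITE is sent over UDP."""
    timeout = 64 * t1          # give up after this long without a response
    times = []
    now, interval = 0.0, t1
    while now < timeout:
        times.append(now)
        now += interval        # wait for the current interval...
        interval *= 2          # ...then double it for the next wait
    return times, timeout


times, timeout = invite_send_times()
print("sent at:", times)
print("timed out at:", timeout)
```

With the default T1, the same INVITE appears seven times in the log before the transaction fails at 32 seconds.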

Establishing a SIP dialog

Now let's revisit the actions required to establish a SIP dialog, this time in a bit more detail.

Routes for messages within a dialog

In the description above, we actually glossed over a rather important point: the ACK request that is sent from UAC to UAS to complete the INVITE transaction may actually take a different route through the network than the INVITE request did. More generally, one of the aspects of the SIP dialog that is negotiated during the INVITE transaction is the route that requests sent during the dialog will take from one side to the other. The ACK request is simply the first request sent after the dialog is established, and other requests that follow it will take the same route as well.

The route that requests take during a dialog might differ from the route the INVITE request took because some SIP proxy servers may want to be in the message path only for the initial INVITE request. For instance, a SIP proxy that helps locate a user would want to be in the route of the initial INVITE, but would like to "drop out" after that. On the other hand, a different SIP proxy that provides a service of writing call detail records would like to be in the message path for all messages, so that it can write a call detail record when the call ends (i.e., when it proxies a SIP BYE request).

To address this, SIP provides a means for proxies to signal to both sides whether or not they want to be included in messages sent within a dialog. Each side -- the UAC and UAS -- then uses this information to construct a route that they will use to send messages once the dialog is established. Here is how it works:

Note: If the final response is a non-success response, then the UAC simply sends the ACK on the same path as the INVITE. Only in the case where the final response is a success response does the UAC send the ACK on the new route calculated as described above.

So once the SIP dialog is established, the route for requests within the dialog is established for both the UAC and UAS. When either side sends a request during the dialog, it writes the route list to the Route header of the request it is sending. This is done so that as each proxy server receives the request, it knows the next hop to forward the request on to. In a similar fashion to the handling of the Via header, the proxy removes itself from the Route header and sends the request on to the next hop.
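The route handling can be sketched in a few lines. The sketch assumes the RFC 3261 mechanism as I understand it: proxies that want to stay in the path add themselves to a Record-Route header on the INVITE, the UAS uses that list in the order received, and the UAC uses it in reverse. The proxy and target URIs are hypothetical, and the sketch ignores details such as strict routing.

```python
def route_sets(record_route):
    """Derive each side's route list from the Record-Route entries.

    record_route: URIs in the order they appear, top to bottom, in the
    INVITE received by the UAS (the proxy nearest the UAS comes first).
    """
    uas_route = list(record_route)
    uac_route = list(reversed(record_route))
    return uac_route, uas_route


def forward(route, remote_target):
    """Follow a request hop by hop.

    Returns (hop, Route entries left after that hop removed itself).
    """
    remaining = list(route)
    hops = []
    while remaining:
        proxy = remaining.pop(0)       # the receiving proxy removes its own entry
        hops.append((proxy, list(remaining)))
    hops.append((remote_target, []))   # finally delivered to the far end
    return hops


# Hypothetical proxies: p1 is nearest the UAC, p2 is nearest the UAS.
record_route = ["sip:p2.example.com;lr", "sip:p1.example.com;lr"]
uac_route, uas_route = route_sets(record_route)

for hop, left in forward(uac_route, "sip:99@10.10.200.150"):
    print(hop, "-> Route still to visit:", left)
```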

Identifying a SIP dialog

When either the UAC or the UAS receives a request message, it needs to determine whether or not this request pertains to an existing SIP dialog. A dialog is uniquely identified by the combination of:

Taken together, those three pieces of information uniquely identify a SIP dialog. Of course, we need to define exactly what is meant by the local and remote tag values, and how they can be matched to an incoming message. To do that, let's consider again that initial INVITE request we examined earlier -- this time, let's look at the From and To headers:

From: <sip:291@10.10.200.147>;tag=2084442460
To: <sip:99@10.10.200.147>

Note that the From header has a tag parameter, while the To header does not. This is characteristic of the initial INVITE that is sent by a UAC to establish a SIP dialog. The UAC is essentially providing "half" of the dialog ID, by including a tag on the From header. For the UAC, this represents a tag for the local side of the dialog -- i.e., the "local tag". There is no tag on the To header, because that must be assigned by the other side -- the UAS. However, in the final response that comes back to the UAC, the UAS will have added a tag of its own to the To header. The 200 OK might then look like this:

SIP/2.0 200 OK 
From: <sip:291@10.10.200.147>;tag=2084442460
To: <sip:99@10.10.200.147>;tag=72634535

At this point, the UAC takes note of the tag parameter on the To header as the "remote tag". The UAC then uniquely identifies this dialog as follows:

Call-ID: 3705165591@10.10.250.233
local tag: 2084442460
remote tag: 72634535

On the UAS side, the tags are reversed. The local tag is the one assigned by the UAS, and the remote tag is the one assigned by the UAC, so the UAS uniquely identifies this dialog as follows:

Call-ID: 3705165591@10.10.250.233
local tag: 72634535
remote tag: 2084442460

Either side can now match an incoming request to the dialog by comparing the Call-ID header and the tag values from the From and To headers. If all three pieces of information match, then this is a request within a SIP dialog; otherwise it is a request outside of a SIP dialog.
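The matching logic can be sketched as follows. The helper names are illustrative, and messages are modeled as plain dictionaries of header values rather than with any real SIP stack. The BYE in the example is hypothetical; the point to notice is that for an incoming request the sender's tag is in the From header, so the receiver's local tag is found in the To header.

```python
import re


def tag_of(header_value):
    """Extract the tag parameter from a From or To header value, if any."""
    m = re.search(r";tag=([^;>\s]+)", header_value)
    return m.group(1) if m else None


def dialog_id_from_response(headers):
    """Dialog ID as seen by the UAC receiving the response to its INVITE."""
    return (headers["Call-ID"], tag_of(headers["From"]), tag_of(headers["To"]))


def dialog_id_from_request(headers):
    """Dialog ID as seen by the receiver of a request."""
    return (headers["Call-ID"], tag_of(headers["To"]), tag_of(headers["From"]))


# The UAC records the dialog (Call-ID, local tag, remote tag) from the 200 OK.
ok = {"Call-ID": "3705165591@10.10.250.233",
      "From": "<sip:291@10.10.200.147>;tag=2084442460",
      "To": "<sip:99@10.10.200.147>;tag=72634535"}
known_dialogs = {dialog_id_from_response(ok)}

# Later, a hypothetical BYE arrives at the UAC from the far end.
bye = {"Call-ID": "3705165591@10.10.250.233",
       "From": "<sip:99@10.10.200.147>;tag=72634535",
       "To": "<sip:291@10.10.200.147>;tag=2084442460"}
print(dialog_id_from_request(bye) in known_dialogs)
```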

Matching requests and responses

SIP provides rules for matching requests to responses. For the most part, when you're reviewing a SIP trace you can match requests to responses simply by looking for response messages which match the Call-ID and CSeq header of a request. This method is not completely accurate, since the SIP rules for matching requests are a bit more detailed, but in the general case, and particularly when you are simply tracing a call in a log file, this method is usually sufficient and accurate.

The "true" (i.e., completely accurate) rule is that a response matches a request when:

  1. The "branch" parameter of the topmost Via header of the response matches the branch parameter of the topmost Via in the request, and
  2. The method parameter of the CSeq header matches as well.

For instance, consider the Via header and CSeq of the INVITE request shown earlier:

Via: SIP/2.0/UDP 10.10.250.233:5060;branch=z9hG4bK2df7b9194cd51e25
CSeq: 1 INVITE

The matching response to this INVITE would have to have a topmost Via with a branch parameter of z9hG4bK2df7b9194cd51e25 and a CSeq method of INVITE.
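The two-part rule translates directly into code. This is a minimal sketch in which each message is a dictionary holding its Via headers as a list (topmost first) and its CSeq value; the function names and the second, non-matching response are made up for the example.

```python
import re


def transaction_key(headers):
    """Return (branch of the topmost Via, CSeq method) for a message."""
    m = re.search(r";branch=([^;,\s]+)", headers["Via"][0])
    branch = m.group(1) if m else None
    method = headers["CSeq"].split()[1]
    return (branch, method)


def response_matches(request, response):
    """True when the response belongs to the same transaction as the request."""
    req_key = transaction_key(request)
    return req_key[0] is not None and req_key == transaction_key(response)


invite = {"Via": ["SIP/2.0/UDP 10.10.250.233:5060;branch=z9hG4bK2df7b9194cd51e25"],
          "CSeq": "1 INVITE"}
ok = {"Via": ["SIP/2.0/UDP 10.10.250.233:5060;branch=z9hG4bK2df7b9194cd51e25"],
      "CSeq": "1 INVITE"}
# Hypothetical response to a different request in the same call.
other = {"Via": ["SIP/2.0/UDP 10.10.250.233:5060;branch=z9hG4bK0000000000000000"],
         "CSeq": "2 BYE"}

print(response_matches(invite, ok))
print(response_matches(invite, other))
```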

Session timers

There are many different features and extensions to SIP, but we will address one of the most important here -- SIP session timers. We've described how SIP dialogs are established, and how things like media flows and signaling paths are negotiated during call setup. SIP session timers are another aspect of a SIP dialog that is negotiated during the initial INVITE transaction.

The basic concept is that a SIP dialog can be established with a specific duration negotiated as part of the call setup. Both UAC and UAS agree that the dialog is valid only for that time interval, but the dialog can be "refreshed" at any time prior to the end of that interval in order to extend it. If, however, the interval elapses without a refresh, then each side has a responsibility to terminate the dialog.

The motivation behind the SIP session timer feature once again has to do with the unreliable nature of the UDP transport protocol. Because it is a connectionless protocol, it is possible that two servers can establish a SIP dialog, and then one of the servers can crash or become unreachable due to a network failure, without the other server receiving any indication of a problem. In such a scenario, resources could continue to be tied up on the active server, since no dialog termination message (i.e., BYE request) would ever be received from the far end.

The solution to this problem is to establish a finite duration for a SIP dialog, and then require this duration to be extended periodically by an operation that requires action on the part of both servers. This is what the SIP session timer feature does. Here is how it works:

If the session is established with a session timer, then one side will have been designated as the refresher. The responsibility of the refresher is to send a "refreshing re-INVITE" roughly halfway through the session timer interval, to extend the dialog. The processing of the re-INVITE will renegotiate the session timer value and refresher all over again. There is no requirement that the same interval and refresher are renegotiated, but this is typically the case. Finally, if the session timer expires without a refreshing re-INVITE, then a BYE request should be sent to terminate the session.
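The refresher's behavior can be modeled as a small timer object. This is a simplified sketch of the rules just described, not an implementation of the session timer specification: the class name, the use of plain seconds for time, and the 1800-second interval are illustrative assumptions.

```python
class SessionTimer:
    """Tracks when a dialog should be refreshed or torn down."""

    def __init__(self, interval, is_refresher, now=0):
        self.is_refresher = is_refresher
        self.restart(interval, now)

    def restart(self, interval, now):
        """Call when the dialog is established or successfully refreshed."""
        self.interval = interval
        self.refresh_at = now + interval / 2   # roughly halfway through
        self.expires_at = now + interval

    def action(self, now):
        """What this side should do at time `now`."""
        if now >= self.expires_at:
            return "BYE"            # no refresh arrived in time
        if self.is_refresher and now >= self.refresh_at:
            return "re-INVITE"      # time to extend the dialog
        return "wait"


timer = SessionTimer(interval=1800, is_refresher=True)
print(timer.action(100))    # early in the interval
print(timer.action(900))    # halfway: refresh is due
timer.restart(1800, now=900)
print(timer.action(1000))   # interval has been extended
```

The side that is not the refresher never sends the re-INVITE itself; it only restarts its timer when a refresh succeeds and sends a BYE if the interval runs out.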

Session timers are used on many networks and are supported by most gateways and session border controllers, so it is important that applications can be written to support them.

Using the SCE to establish and manage SIP dialogs

This article has described what SIP dialogs are, how they are established to enable communications between two parties, and how some of the key attributes are negotiated between the parties during call setup. The SCE is a developer tool that allows you to build applications which create and manage SIP dialogs.

In the SCE, the concept of the "SIP dialog" is central to the environment. The SCE includes a data type to represent a SIP Dialog -- the SipDialog object. The SCE also includes a set of PACs that allow you to create, modify, and terminate SIP dialog objects. These PACs -- the Session/Call Control PACs -- automate much of the behavior described in this article that is required from SIP dialogs. The developer needs only to specify the policies that he or she wants to apply to the dialog, and the platform handles the implementation details.

A developer establishes a SIP dialog in the SCE by using either the Respond to a SIP dialog request PAC or the Generate a SIP dialog request PAC. If the application is acting as either a UAS or a proxy, then it will be receiving a SIP INVITE (i.e., a SIP dialog request), and then it would use the "Respond..." PAC. This PAC allows the developer to choose from several options about how to respond to the incoming request; the developer can choose to:

On the other hand, if the application is acting as a UAC, or as a B2BUA where it has first connected the call to a media server for IVR and now wants to connect to a third party, then the application will use the "Generate SIP dialog request" PAC. This allows the developer to generate an outgoing SIP INVITE and choose from several options about how to connect the local end of the media connection for the call:

In either case, using the "Respond..." or "Generate..." PAC, the developer establishes a SIP dialog. Both PACs give the developer full control over whether or not to use SIP session timers on the call and, if so, what preferred values to use for session timer duration and refresher. If session timers are negotiated with the far end, the underlying application server platform handles all the details of managing refreshes for the application.

Once a SIP dialog is established, the developer can modify aspects of it by using the Modify SIP dialog PAC. This allows the developer to modify the underlying media streams in any of the following ways:

When it's time to terminate the SIP dialog, the developer simply uses the Terminate SIP dialog PAC. If the far end chooses to terminate the dialog first (i.e., the far end sends a BYE request to the application), then the application will be notified by the 'SipDialogTerminated' event. No special application processing of this event is required; it is simply a notification that the dialog is now terminated.

