A newcomer's guide to SIP
Introduction
Session Initiation Protocol (SIP) has become the de facto standard for call and session control in next-generation networks. This article provides an introduction to SIP for a technical audience that is familiar with Internet Protocol networking but has not been exposed to SIP in any level of detail. The intent of the article is to introduce the key concepts and mechanisms of the SIP protocol. While the reader will not become a "SIP expert" after reading this article, the information provided should be sufficient to enable one:
- to understand how SIP is used to establish and manage sessions in VoIP networks, and
- to build SIP-based applications using the Pactolus SCE service creation environment.
Overview
SIP is an IETF standards-track protocol specified in RFC 3261. Generally speaking, SIP is designed to create and manage communication sessions between parties that wish to exchange various forms of media, such as voice and video, over an IP network. It is often described as a "signaling" protocol, in the sense that it is concerned mostly with the information flows that are necessary to set up a "call" or "session", rather than the media flows that will be exchanged during that session. By design, it is only loosely coupled to the actual media exchanged during the session; because it is not tied to any specific media format, SIP can initiate sessions that exchange any type of media, including formats that will be invented in the future.
SIP also includes a mechanism for users to signal their availability to engage in a communication session, and to indicate the capabilities of their communication devices. This gives parties that want to communicate with each other the ability to rendezvous with each other, and it allows communication services to be tailored to the specific environmental capabilities of the participants.
These characteristics of flexibility and personalization make SIP well-suited for the growing desire of consumers and businesses for converged communication services. As rapidly as consumers and service providers dream up new ways of delivering information and multimedia content, SIP is there to provide a framework for delivering those services.
Basic terms
Let's get some common terms defined at the outset. Above, we used the terms call and session more or less interchangeably. Let's say that a SIP session represents a communication dialog between two parties that has a defined start time and end time, during which information is exchanged using one or more media formats. A call, then, in the sense of a phone call, is generally meant to describe a session where the media exchanged uses an audio format.
Furthermore, sometimes the terms SIP session and SIP dialog are used interchangeably. For the remainder of this article -- and in the SCE itself -- we choose to use the term SIP dialog. The reason for this is to avoid the possible confusion that arises from the multiple uses of the term session -- we frequently use the term application session to refer to a session of an executing XTML application, and it might be confusing to be referring to a SIP session at the same time. Thus, for the sake of clarity, we will speak of application sessions and SIP dialogs.
We can look at the terms we will use to describe SIP communications as a hierarchy. Starting from the most granular level and working our way up, we have:
- messages; a SIP message must be either a request or a response.
- transactions; a transaction consists of a request together with the responses it elicits (any provisional responses plus one final response).
- dialogs; a useful (though imprecise) definition of a SIP dialog was given above ("a communication dialog between two parties that has a defined start time and end time, during which information is exchanged using one or more media formats"). A SIP dialog can also be said to consist of one or more transactions.
SIP Message structure
Methods
SIP messages are transmitted over the network as plain text, which makes them easy to read and troubleshoot. A variety of transport protocols may be used, including UDP, TCP, or TLS, although UDP is still the most commonly used transport. As mentioned earlier, every message can be classified as either a request or a response. A response always matches a request (we'll describe the rules for matching requests and responses later in this article). Requests have a specific type, or method, which indicates the purpose of the request. The table below describes the various SIP method types that can appear in a request message.
| Method | Purpose |
|---|---|
| INVITE | Establish a SIP dialog between two parties. |
| BYE | Terminate a SIP dialog that was previously established via an INVITE transaction. |
| CANCEL | Cancel an INVITE request for which a final response has not yet been received. |
| ACK | Completes an INVITE transaction by indicating that the final response to the INVITE request has been received. |
| REGISTER | Add, update, or remove information from a SIP location server pertaining to the location and availability of a user or device. |
| SUBSCRIBE | Indicate interest in receiving certain types of events. |
| NOTIFY | Send information about an event, typically to a party that has previously subscribed to that event. |
| PRACK | (PRovisional ACKnowledgement); Provide an indication that a provisional response has been received. |
| UPDATE | Modify some aspect of the media sources or formats offered during an INVITE transaction. |
| INFO | Send an informational message of some kind. |
| OPTIONS | Request information about capabilities from the remote party. |
| REFER | Transfer the remote party during a SIP dialog to another party. |
| PUBLISH | Provide information that may be useful to parties that want to establish a SIP dialog; e.g., presence-related information such as privacy, environmental factors, etc. |
| MESSAGE | Send a text message. |
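Because SIP messages are plain text, the request/response distinction can be read straight off the first line of a message. The following is a minimal sketch, not a full parser: the function name `parse_start_line` and the returned dictionary layout are illustrative assumptions, and only the `SIP/2.0` version string is handled.

```python
# Method names copied from the table above.
SIP_METHODS = {
    "INVITE", "BYE", "CANCEL", "ACK", "REGISTER", "SUBSCRIBE", "NOTIFY",
    "PRACK", "UPDATE", "INFO", "OPTIONS", "REFER", "PUBLISH", "MESSAGE",
}


def parse_start_line(line):
    """Classify the first line of a SIP message as a request or a response.

    Hypothetical helper for illustration; real stacks validate far more.
    """
    parts = line.strip().split(" ", 2)
    if parts[0] == "SIP/2.0":
        # Status line: SIP/2.0 <code> <reason phrase>
        if len(parts) < 2 or not parts[1].isdigit():
            raise ValueError(f"malformed status line: {line!r}")
        reason = parts[2] if len(parts) == 3 else ""
        return {"kind": "response", "code": int(parts[1]), "reason": reason}
    # Request line: <method> <request URI> SIP/2.0
    if len(parts) != 3 or parts[2] != "SIP/2.0":
        raise ValueError(f"malformed request line: {line!r}")
    method, uri, _version = parts
    return {
        "kind": "request",
        "method": method,
        "uri": uri,
        "known_method": method in SIP_METHODS,
    }


if __name__ == "__main__":
    print(parse_start_line("INVITE sip:99@10.10.200.147 SIP/2.0"))
    print(parse_start_line("SIP/2.0 200 OK"))
```

The `known_method` flag is only a convenience for log analysis; a request with an unlisted method is still a syntactically valid request.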
Responses
| Range | Meaning |
|---|---|
| 100-199 | Provisional responses. |
| 200-299 | Success final responses. |
| 300-399 | Non-success response; includes information to redirect the request to another server. |
| 400-499 | Non-success response; client error. |
| 500-599 | Non-success response; server error. |
| 600-699 | Non-success response; global error. |
Response messages are generated in reply to requests, and indicate the status of the requested operation. Response messages can be classified into two categories:
- provisional responses, and
- final responses
A provisional response provides information about the current state of the operation requested, provided while the operation is still in progress. A final response describes the outcome of the operation once it has completed. A request may result in zero or more provisional responses, and one and only one final response.
Final responses can further be classified as:
- success responses, and
- non-success responses
A successful response indicates the requested operation was performed and the desired result was achieved. A non-success response indicates that the requested operation could not be performed, or that the desired result was not achieved. A non-success response is not an error condition -- it is a normally-occurring fact of life, which is quite simply that sometimes a request made by one party can not be fulfilled by another party.
Response messages contain an integer response code to make them easier to process automatically. The response code makes it evident whether the response is provisional or final, and whether it is a success or a non-success. These codes are summarized in the table above.
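The ranges in the response table translate directly into a small classification routine. This is a sketch for illustration; the function name `classify_response` and the label strings are assumptions, not part of any SIP library.

```python
def classify_response(code):
    """Map a SIP response code to (provisional/final, outcome).

    The ranges follow the response table in this article.
    """
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    if code < 200:
        return ("provisional", "in progress")
    if code < 300:
        return ("final", "success")
    # Everything from 300 upward is a final, non-success response.
    reasons = {3: "redirection", 4: "client error",
               5: "server error", 6: "global error"}
    return ("final", "non-success: " + reasons[code // 100])


if __name__ == "__main__":
    for code in (100, 180, 200, 302, 486, 503, 603):
        print(code, classify_response(code))
```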
And now for something completely different
Time for a brief digression. Early on we discussed how SIP can be used to set up multimedia conversations, and how it is loosely-coupled to the underlying media flows. Now let's examine how that works in a bit more detail.
SIP by itself does not contain a mechanism for describing media formats. Instead, it uses Session Description Protocol (SDP) for this purpose. Session Description Protocol is an IETF standard that was originally defined in RFC 2327 (later revised by RFC 4566 and RFC 8866). Session Description Protocol does just what its name implies -- it is a protocol for describing the media characteristics of a session. SIP and SDP work together to establish a multimedia dialog between two parties -- SIP is used to describe how to route signaling-related messages and to carry the SDP descriptions, and SDP is used to describe the specific media-related capabilities of each side.
Establishing the media flows that will be used in a SIP dialog is a negotiation. The side initiating the dialog offers a set of media types that it is capable of handling, and the receiving side responds with an answer containing the set of media types that it is willing to handle. This process is referred to as the offer/answer model, and is itself codified in RFC 3264, "An Offer/Answer Model with the Session Description Protocol".
Let's look at that example INVITE again, this time at the body, which contains an SDP.
    v=0
    o=291 2894407 2894407 IN IP4 10.10.250.233
    s=ATA186 Call
    c=IN IP4 10.10.250.233
    t=0 0
    m=audio 16386 RTP/AVP 0 8 101
    a=rtpmap:0 PCMU/8000/1
    a=rtpmap:8 PCMA/8000/1
    a=rtpmap:101 telephone-event/8000
    a=fmtp:101 0-15
Let's review some of the lines in this SDP; in keeping with our intended level of detail, we'll focus only on the most significant ones.
- The connection line (c=IN IP4 10.10.250.233) specifies the address where the media should be sent. In this case, the INVITE was generated by a SIP phone, and the phone is saying that it expects to receive incoming media on ip address 10.10.250.233.
- The media line (m=audio 16386 RTP/AVP 0 8 101) describes a type of media that the sender is willing to exchange (in this case, audio), the port on which it expects to receive that media (16386), and one or more payload types for that media. In this case, the phone is offering several different audio codecs: PCMU (G.711 uLaw), PCMA (G.711 aLaw), as well as a payload type (101) which is used to transmit telephony events, such as DTMF events.
- The rtpmap describes each of the payload types in more detail.
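To make the connection, media, and rtpmap lines concrete, here is a minimal sketch that pulls those three pieces out of the SDP shown above. The function name `parse_sdp` and the dictionary layout are assumptions made for illustration; a real SDP parser handles many more line types and attribute forms.

```python
SDP_OFFER = """\
v=0
o=291 2894407 2894407 IN IP4 10.10.250.233
s=ATA186 Call
c=IN IP4 10.10.250.233
t=0 0
m=audio 16386 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000/1
a=rtpmap:8 PCMA/8000/1
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
"""


def parse_sdp(text):
    """Extract the connection address, media lines, and rtpmap entries."""
    session = {"connection": None, "media": []}
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip anything that is not a <type>=<value> line
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<network type> <address type> <address>
            address = value.split()[2]
            if session["media"]:
                session["media"][-1]["address"] = address  # media-level c=
            else:
                session["connection"] = address            # session-level c=
        elif kind == "m":
            # m=<media> <port> <transport> <payload type> ...
            media, port, proto, *payloads = value.split()
            session["media"].append({
                "type": media,
                "port": int(port),
                "proto": proto,
                "payloads": [int(p) for p in payloads],
                "rtpmap": {},
                "address": session["connection"],
            })
        elif kind == "a" and value.startswith("rtpmap:") and session["media"]:
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            payload, encoding = value[len("rtpmap:"):].split(" ", 1)
            session["media"][-1]["rtpmap"][int(payload)] = encoding
    return session


if __name__ == "__main__":
    offer = parse_sdp(SDP_OFFER)
    audio = offer["media"][0]
    print(audio["address"], audio["port"], audio["rtpmap"])
```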
An SDP is not restricted to containing only a single media line. A device capable of simultaneously handling audio and video would offer an SDP with two media lines: one offering a set of audio codecs and a second offering a set of video codecs. If the INVITE is received by a device of similar capabilities, the SDP in the answer would accept both media lines, and the call would include both audio and video. On the other hand, if the INVITE is received by an audio-only device, the SDP in the answer would accept only the audio media line and decline the video line (RFC 3264 has the answerer do this by setting that line's port to zero), and the call would proceed with audio only.
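The audio-plus-video scenario can be sketched as a codec intersection per media line. This is a simplified illustration, not the RFC 3264 algorithm in full: the function name `build_answer`, the data layout, and the `H264` video codec are assumptions, and a declined stream (which RFC 3264 expresses as an m= line with port 0) is recorded here simply as `accepted: False`.

```python
def build_answer(offered_media, local_codecs):
    """Answer each offered media line with the codecs both sides support.

    offered_media: list of (media type, [codec names]) in offer order.
    local_codecs:  dict mapping media type -> set of codec names supported.
    """
    answer = []
    for media_type, codecs in offered_media:
        supported = local_codecs.get(media_type, set())
        # Keep the offerer's preference order for the codecs we share.
        common = [codec for codec in codecs if codec in supported]
        answer.append({
            "type": media_type,
            "accepted": bool(common),  # False models a port-0 (declined) line
            "codecs": common,
        })
    return answer


if __name__ == "__main__":
    offer = [
        ("audio", ["PCMU", "PCMA", "telephone-event"]),
        ("video", ["H264"]),  # hypothetical video codec for the example
    ]
    audio_only_phone = {"audio": {"PCMU", "telephone-event"}}
    for line in build_answer(offer, audio_only_phone):
        print(line)
```

Keeping one answer entry per offered line, even for declined media, lets both sides refer to streams by position.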
Message retransmission
We mentioned earlier that UDP is the most widely used transport protocol for carrying SIP messages. Because UDP is a connectionless protocol, there is no guarantee that messages will arrive at their destination, or that if they do they will arrive in the same order as sent. The SIP protocol was therefore designed with a message retransmission scheme to overcome this. Message retransmission only comes into play when UDP is used as the transport; if a TCP connection is used, then messages are not retransmitted. At a high level, message retransmission works like this:
- When a SIP UAC sends a request, it temporarily stores a copy of that request.
- If a response (provisional or final) is not received within 500 ms, the UAC retransmits the request.
- If a response is still not received 1 second later (i.e., a doubling of the initial retransmission timer) the UAC will again retransmit the request.
- The UAC will similarly retransmit the request at intervals increasing by doubling the retransmission timer, until it reaches a maximum of 4 seconds.
- The UAC stops retransmitting the request as soon as a response is received, or after 7 requests have been sent without receiving a response.
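The timing described in the steps above can be worked out with a few lines of arithmetic. This sketch follows the high-level description in this article (500 ms initial timer, doubling, 4 second cap, 7 transmissions); the function name and parameters are illustrative assumptions, and RFC 3261 defines separate timers for INVITE and non-INVITE requests, so real stacks differ in detail.

```python
def retransmit_schedule(initial=0.5, cap=4.0, max_sends=7):
    """Return the time (in seconds after the first send) of each transmission
    when no response ever arrives."""
    times = [0.0]           # the original request
    interval = initial
    while len(times) < max_sends:
        times.append(times[-1] + interval)
        interval = min(interval * 2, cap)  # double, but never beyond the cap
    return times


if __name__ == "__main__":
    print(retransmit_schedule())
    # -> [0.0, 0.5, 1.5, 3.5, 7.5, 11.5, 15.5]
```

Seeing the whole schedule makes log analysis easier: several copies of the same request at these spacings point to an unanswered request rather than to several distinct requests.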
Retransmissions should be rare on any production network with reasonable quality of service. Therefore, repeated retransmissions are generally an indication of a problem of some kind: either a network outage, the failure of a far-end SIP server, or a local application error that is causing a request to be sent to an invalid address. When you spot repeated retransmissions in your application server log file when developing or testing a new application, it's a sign that one of these types of errors has occurred and it should be investigated immediately.
Establishing a SIP dialog
Now let's revisit the actions required to establish a SIP dialog, this time in a bit more detail.
- First, as described before, the entity acting as the SIP UAC for this call generates a SIP INVITE. The INVITE contains an SDP offer that describes the media formats and addresses that the UAC is capable of handling.
- Next, the INVITE may pass through zero or more proxy servers before it reaches its final destination. Each proxy server adds itself to the Via header of the INVITE as it proxies it on.
- Finally, the INVITE arrives at the entity acting as the SIP UAS for this call. The UAS may not be able to immediately provide a final response to the INVITE -- for instance, it may have to alert the called party by ringing the phone and wait for the phone to be answered. Therefore, upon receiving the INVITE, the UAS immediately sends a provisional response -- a 100 Trying response. This response will be sent back to the UAC, traversing the servers in the Via headers in reverse order. As each proxy receives the response, it removes itself from the Via header and passes the response to the next server in the Via header.
- The UAC receives the 100 Trying response and stops retransmitting the INVITE, since it now has received confirmation that the request was received.
- Back at the UAS side, the called party picks up the phone. The UAS now transmits a final success response -- a 200 OK -- containing an SDP answer in the body of the response. This SDP contains the media formats and addresses that the UAS is capable of handling. This response wends its way back to the UAC in the same manner as the 100 Trying.
- The UAC receives the 200 OK and completes the INVITE transaction by sending an ACK to the UAS.
- Both sides now begin exchanging media using the negotiated formats.
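The Via handling in these steps behaves like a stack: each hop pushes itself on the way out, and each proxy pops itself on the way back. The sketch below models only that bookkeeping. The proxy host names are hypothetical, and the function names are assumptions made for illustration.

```python
def vias_after_forwarding(uac, proxies):
    """Return the Via list (topmost first) as seen by the UAS.

    The UAC puts its own address in the first Via; every proxy that
    forwards the request adds itself on top.
    """
    vias = [uac]
    for proxy in proxies:
        vias = [proxy] + vias
    return vias


def response_path(vias):
    """Return the hops a response visits, in order, given the request's Vias.

    The response is always sent to the topmost Via; a proxy that receives
    it removes its own entry before passing it on.
    """
    remaining = list(vias)
    hops = []
    while remaining:
        hops.append(remaining[0])
        remaining = remaining[1:]  # that hop strips its own Via
    return hops


if __name__ == "__main__":
    uac = "10.10.250.233:5060"
    proxies = ["proxy1.example.com", "proxy2.example.com"]  # hypothetical
    vias = vias_after_forwarding(uac, proxies)
    print(vias)
    print(response_path(vias))
```

The last hop in the response path is the UAC itself, which is how a response finds its way back without any routing lookup.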
Routes for messages within a dialog
In the description above, we actually glossed over a rather important point: the ACK request that is sent from UAC to UAS to complete the INVITE transaction may actually take a different route through the network than the INVITE request did. More generally, one of the aspects of the SIP dialog that is negotiated during the INVITE transaction is the route that requests sent during the dialog will take from one side to the other. The ACK request is simply the first request sent after the dialog is established, and other requests that follow it will take the same route as well.
The reason that requests sent during a dialog might take a different route from the initial INVITE is that some SIP proxy servers may want to be in the message path only for the initial INVITE request. For instance, a SIP proxy that helps locate a user would want to be in the route of the initial INVITE, but would like to "drop out" after that. On the other hand, a different SIP proxy that provides a service of writing call detail records would like to be in the message path for all messages, so that it can write a call detail record when the call ends (i.e., when it proxies a SIP BYE request).
To address this, SIP provides a means for proxies to signal to both sides whether or not they want to be included in messages sent within a dialog. Each side -- the UAC and UAS -- then uses this information to construct a route that they will use to send messages once the dialog is established. Here is how it works:
- If a proxy wants to remain in the path for messages sent during a dialog, then it adds its own address to the Record-Route SIP header of the initial SIP INVITE as it proxies it along. Otherwise, it proxies the message without adding itself to the Record-Route header.
- When the UAS receives the INVITE, it constructs a route for requests that it will send during the dialog by taking all of the SIP addresses in the Record-Route header, in the order in which they appear, and adding the SIP address from the Contact header of the INVITE onto the end of that list. Because each proxy inserts itself at the top of the Record-Route header, the proxy nearest the UAS is already first in the list. This is the route that requests sent by the UAS after the dialog is established will follow.
- When the UAC receives the final 200 OK response to the INVITE, it constructs a route for requests that it will send during the dialog by taking all of the SIP addresses in the Record-Route header of the response, reversing them (so that the proxy nearest the UAC comes first), and adding the SIP address from the Contact header of the response onto the end of that list. This is the route that requests sent by the UAC after the dialog is established will follow. It sends the ACK along this route, as the ACK is considered a request within a dialog.
Note: If the final response is a non-success response, then the UAC simply sends the ACK on the same path as the INVITE. Only in the case where the final response is a success response does the UAC send the ACK on the new route calculated as described above.
So once the SIP dialog is established, the route for requests within the dialog is established for both the UAC and UAS. When either side sends a request during the dialog, it writes the route list to the Route header of the request it is sending. This is done so that as each proxy server receives the request, it knows the next hop to forward the request on to. In a similar fashion to the handling of the Via header, the proxy removes itself from the Route header and sends the request on to the next hop.
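The route construction can be sketched in a few lines. Following RFC 3261 section 12.1, the UAS takes the Record-Route entries in the order received and the UAC takes them in reverse order; each side then appends the other party's Contact, as this article describes. The proxy names and Contact addresses below are hypothetical, and the function names are assumptions.

```python
def uas_route(record_route, uac_contact):
    """Route for requests sent by the UAS: Record-Route as received
    in the INVITE, followed by the UAC's Contact address."""
    return list(record_route) + [uac_contact]


def uac_route(record_route, uas_contact):
    """Route for requests sent by the UAC: Record-Route from the 200 OK
    in reverse order, followed by the UAS's Contact address."""
    return list(reversed(record_route)) + [uas_contact]


if __name__ == "__main__":
    # p1 is nearest the UAC, p2 nearest the UAS. Each proxy inserts itself
    # at the top of Record-Route, so the header reads p2 first.
    record_route = ["sip:p2.example.com;lr", "sip:p1.example.com;lr"]
    uac_contact = "sip:291@10.10.250.233"   # hypothetical Contact values
    uas_contact = "sip:99@10.10.200.150"
    print(uas_route(record_route, uac_contact))
    print(uac_route(record_route, uas_contact))
```

In both cases the first entry is the proxy closest to the sender, which is what lets each proxy simply strip itself and forward to the next hop.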
Identifying a SIP dialog
When either the UAC or the UAS receives a request message, it needs to determine whether or not this request pertains to an existing SIP dialog. A dialog is uniquely identified by the combination of:
- the Call-ID from the initial INVITE,
- the local tag value, and
- the remote tag value.
Taken together, those three pieces of information uniquely identify a SIP dialog. Of course, we need to define exactly what is meant by the local and remote tag values, and how they can be matched to an incoming message. To do that, let's consider again that initial INVITE request we examined earlier -- this time, let's look at the From and To headers:
    From: <sip:291@10.10.200.147>;tag=2084442460
    To: <sip:99@10.10.200.147>
Note that the From header has a tag parameter, while the To header does not. This is characteristic of the initial INVITE that is sent by a UAC to establish a SIP dialog. The UAC is essentially providing "half" of the dialog ID, by including a tag on the From header. For the UAC, this represents a tag for the local side of the dialog -- i.e., the "local tag". There is no tag on the To header, because that must be assigned by the other side -- the UAS. However, in the final response that comes back to the UAC, the UAS will have added a tag of its own to the To header. The 200 OK might then look like this:
    SIP/2.0 200 OK
    From: <sip:291@10.10.200.147>;tag=2084442460
    To: <sip:99@10.10.200.147>;tag=72634535
At this point, the UAC takes note of the tag parameter on the To header as the "remote tag". The UAC then uniquely identifies this dialog as follows:
    Call-ID: 3705165591@10.10.250.233
    local tag: 2084442460
    remote tag: 72634535
On the UAS side, the tags are reversed. The local tag is the one assigned by the UAS, and vice-versa, so the UAS then uniquely identifies this dialog as follows:
    Call-ID: 3705165591@10.10.250.233
    local tag: 72634535
    remote tag: 2084442460
Either side can now match an incoming request to the dialog by comparing the Call-ID header and the tag values from the From and To headers. If all three pieces of information match, then this is a request within a SIP dialog; otherwise it is a request outside of a SIP dialog.
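The matching rule can be sketched as a lookup keyed on the three values. This is a minimal illustration: `DialogId`, `tag_of`, and `dialog_id_for_incoming_request` are hypothetical names, and the tag extraction uses a simple regular expression rather than a full header parser. Note that for a request we receive, the sender's tag (From) is the remote tag and the To tag is our local tag.

```python
import re
from typing import NamedTuple, Optional


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def tag_of(header_value: str) -> Optional[str]:
    """Return the tag parameter of a From or To header value, if any."""
    match = re.search(r";\s*tag=([^;\s>]+)", header_value)
    return match.group(1) if match else None


def dialog_id_for_incoming_request(call_id, from_value, to_value):
    """Build the dialog ID an incoming request would belong to.

    Returns None when a tag is missing, which means the request is
    outside of any dialog (for example, an initial INVITE).
    """
    local, remote = tag_of(to_value), tag_of(from_value)
    if local is None or remote is None:
        return None
    return DialogId(call_id, local, remote)


if __name__ == "__main__":
    # The dialog as stored by the UAC in the example above.
    known = {DialogId("3705165591@10.10.250.233", "2084442460", "72634535")}
    # A later request (say, a BYE) arriving at the UAC from the UAS.
    incoming = dialog_id_for_incoming_request(
        "3705165591@10.10.250.233",
        "<sip:99@10.10.200.147>;tag=72634535",
        "<sip:291@10.10.200.147>;tag=2084442460",
    )
    print(incoming in known)  # True: the request is within the dialog
```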
Matching requests and responses
SIP provides rules for matching requests to responses. For the most part, when you're reviewing a SIP trace you can match requests to responses simply by looking for response messages which match the Call-ID and CSeq header of a request. This method is not completely accurate, since the SIP rules for matching requests are a bit more detailed, but in the general case, and particularly when you are simply tracing a call in a log file, this method is usually sufficient and accurate.
The "true" (i.e., completely accurate) method for matching requests and responses is as follows:
- The "branch" parameter of the topmost Via header of the response matches the branch parameter of the topmost Via in the request, and
- The method parameter of the CSeq header matches as well.
For instance, consider the Via header and CSeq of the INVITE request shown earlier:
    Via: SIP/2.0/UDP 10.10.250.233:5060;branch=z9hG4bK2df7b9194cd51e25
    CSeq: 1 INVITE
The matching response to this INVITE would have to have a topmost Via with a branch parameter of z9hG4bK2df7b9194cd51e25 and a CSeq method of INVITE.
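The two-part matching rule can be expressed as a short predicate. This is a sketch under simple assumptions: messages are represented as dictionaries with a list of Via values (topmost first) and a CSeq string, and the helper names are illustrative rather than taken from any SIP library.

```python
import re


def top_via_branch(message):
    """Return the branch parameter of the topmost Via, or None."""
    match = re.search(r";\s*branch=([^;,\s]+)", message["Via"][0])
    return match.group(1) if match else None


def cseq_method(message):
    """Return the method part of a CSeq value such as '1 INVITE'."""
    return message["CSeq"].split()[1]


def response_matches(request, response):
    """True when the response belongs to the same transaction as the request."""
    branch = top_via_branch(request)
    return (
        branch is not None
        and branch == top_via_branch(response)
        and cseq_method(request) == cseq_method(response)
    )


if __name__ == "__main__":
    invite = {
        "Via": ["SIP/2.0/UDP 10.10.250.233:5060;branch=z9hG4bK2df7b9194cd51e25"],
        "CSeq": "1 INVITE",
    }
    ok = {
        "Via": ["SIP/2.0/UDP 10.10.250.233:5060;branch=z9hG4bK2df7b9194cd51e25"],
        "CSeq": "1 INVITE",
    }
    print(response_matches(invite, ok))  # True
```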
Session timers
There are many different features and extensions to SIP, but we will address one of the most important here -- SIP session timers. We've described how SIP dialogs are established, and how things like media flows and signaling paths are negotiated during call setup. SIP session timers are another aspect of a SIP dialog that is negotiated during the initial INVITE transaction.
The basic concept is that a SIP dialog can be established with a specific duration negotiated as part of the call setup. Both UAC and UAS agree that the dialog is valid only for that time interval, but can also "refresh" the dialog at any time prior to the end of that interval if it is desired to extend it. If, however, the interval elapses without a refresh, then each side has a responsibility to terminate the dialog.
The motivation behind the SIP session timer feature once again has to do with the unreliable nature of the UDP transport protocol. Because it is a connectionless protocol, it is possible that two servers can establish a SIP dialog, and then one of the servers can crash or become unreachable due to a network failure, without the other server receiving any indication of a problem. In such a scenario, resources could continue to be tied up on the active server, since no dialog termination message (i.e., BYE request) would ever be received from the far end.
The solution to this problem is to establish a finite duration for a SIP dialog, and then require this duration to be extended periodically by an operation that requires action on the part of both servers. This is what the SIP session timer feature does. Here is how it works:
- If the UAC supports session timers, it includes a Supported: timer header in the INVITE it sends to request the dialog.
- Furthermore, if the UAC would like a session timer to be attached to this dialog, it includes a Session-Expires header with a value that includes suggestions both for the session duration, and for the designation of which side (UAC or UAS) will be the "refresher". An example would be Session-Expires: 1800; refresher=uac, which requests a session timer interval of 1800 seconds and indicates that the sender (the UAC) will be the one to refresh the dialog.
- Finally, if the UAC requires that a session timer be attached to this dialog, it includes a Require: timer header on the INVITE request.
- When the UAS receives the INVITE, it can respond to the session timer request in any of the following ways:
  - Reject the INVITE, because the UAC indicated a session timer was required, and the UAS does not support session timers.
  - Reject the INVITE, because the proposed session timer interval was shorter than the UAS would like (the UAS will include information about the minimum session timer interval it will accept in its response, so that the UAC can try again with an interval that will be acceptable).
  - Accept the session timer, and possibly reduce the Session-Expires interval (to a defined minimum specified by the UAC in a Min-SE header included with the INVITE).
  - Indicate that it will not implement a session timer, but that the UAC can act as a refresher and employ a session timer if it desires.
If the session is established with a session timer, then one side will have been designated as the refresher. The responsibility of the refresher is to send a "refreshing re-INVITE" roughly halfway through the session timer interval, to extend the dialog. The processing of the re-INVITE will renegotiate the session timer value and refresher all over again. There is no requirement that the same interval and refresher are renegotiated, but this is typically the case. Finally, if the session timer expires without a refreshing re-INVITE, then a BYE request should be sent to terminate the session.
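The session timer negotiation can be illustrated with three small helpers: one to read a Session-Expires value, one for the UAS's accept-or-reject decision on the interval, and one for the refresher's halfway point. This is a simplified sketch with hypothetical function names. The 422 response code used for a too-short interval comes from the session timer specification (RFC 4028, "Session Interval Too Small") and is not described in this article.

```python
def parse_session_expires(value):
    """Split a Session-Expires value into (interval in seconds, refresher).

    The refresher is 'uac', 'uas', or None when the parameter is absent.
    """
    interval_text, *params = [part.strip() for part in value.split(";")]
    refresher = None
    for param in params:
        name, _, param_value = param.partition("=")
        if name.strip().lower() == "refresher":
            refresher = param_value.strip().lower()
    return int(interval_text), refresher


def uas_decide(requested_interval, local_minimum):
    """Return (status code, headers) for the interval check at the UAS."""
    if requested_interval < local_minimum:
        # Too short: tell the UAC the smallest interval we will accept.
        return 422, {"Min-SE": str(local_minimum)}
    return 200, {"Session-Expires": str(requested_interval)}


def refresh_delay(interval):
    """Seconds after establishment at which the refresher re-INVITEs."""
    return interval / 2


if __name__ == "__main__":
    interval, refresher = parse_session_expires("1800; refresher=uac")
    print(interval, refresher)         # 1800 uac
    print(uas_decide(interval, 90))
    print(refresh_delay(interval))     # 900.0
```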
Session timers are used on many networks and are supported by most gateways and session border controllers, so it is important that applications can be written to support them.
Using the SCE to establish and manage SIP dialogs
This article has described what SIP dialogs are, how they are established to enable communications between two parties, and how some of the key attributes are negotiated between the parties during call setup. The SCE is a developer tool that allows you to build applications which create and manage SIP dialogs.
In the SCE, the concept of the "SIP dialog" is central to the environment. The SCE includes a data type to represent a SIP Dialog -- the SipDialog object. The SCE also includes a set of PACs that allow you to create, modify, and terminate SIP dialog objects. These PACs -- the Session/Call Control PACs -- automate much of the behavior described in this article that is required from SIP dialogs. The developer needs only to specify the policies that he or she wants to apply to the dialog, and the platform handles the implementation details.
A developer establishes a SIP dialog in the SCE by using either the Respond to a SIP dialog request PAC or the Generate a SIP dialog request PAC. If the application is acting as either a UAS or a proxy, then it will be receiving a SIP INVITE (i.e., a SIP dialog request), and then it would use the "Respond..." PAC. This PAC allows the developer to choose from several options about how to respond to the incoming request; the developer can choose to:
- reject the request with a specified final response code
- redirect the request to a different server
- proxy the request on to another server
- accept the request and connect the call to a media server for IVR
- accept the request and connect the call into a conference
- accept the request and act as a B2BUA, generating a new, related SIP dialog
On the other hand, if the application is acting as a UAC, or as a B2BUA where it has first connected the call to a media server for IVR and now wants to connect to a third party, then the application will use the "Generate SIP dialog request" PAC. This allows the developer to generate an outgoing SIP INVITE and choose from several options about how to connect the local end of the media connection for the call:
- connect the local end to a media server for IVR
- connect the local end to a conference
- connect the local end to another caller
In either case, using the "Respond..." or "Generate..." PAC, the developer establishes a SIP dialog. Both PACs give the developer full control over whether or not to use SIP session timers on the call and, if so, what preferred values to use for session timer duration and refresher. If session timers are negotiated with the far end, the underlying application server platform handles all the details of managing refreshes for the application.
Once a SIP dialog is established, the developer can modify aspects of it by using the Modify SIP dialog PAC. This allows the developer to modify the underlying media streams in any of the following ways:
- put the far end on hold
- connect the far end into a conference
- connect the far end to a media server for IVR
When it's time to terminate the SIP dialog, the developer simply uses the Terminate SIP dialog PAC. If the far end chooses to terminate the dialog first (i.e., the far end sends a BYE request to the application), then the application will be notified by the 'SipDialogTerminated' event. No special application processing of this event is required; it is simply a notification that the dialog is now terminated.
See Also
- A quick tour of the SCE
- Session/Call Control PACs
- Tutorial:Building a back-to-back user agent
- Tutorial:Building a SIP proxy
- Tutorial:Building a SIP registrar
- Application server
External links
- RFC 1889 - RTP: A Transport Protocol for Real-Time Applications
- RFC 1890 - RTP Profile for Audio and Video Conferences with Minimal Control
- RFC 2327 - SDP: Session Description Protocol
- RFC 2833 - RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals
- RFC 3261 - SIP: Session Initiation Protocol
- RFC 3264 - An Offer/Answer Model with the Session Description Protocol (SDP)
