Abstract

The Session Initiation Protocol (SIP) is an application-layer control (signaling) protocol for creating, modifying and terminating sessions with one or more participants. These sessions include Internet multimedia conferences, Internet telephone calls and multimedia distribution. Members in a session can communicate via multicast or via a mesh of unicast relations, or a combination of these. SIP invitations used to create sessions carry session descriptions which allow participants to agree on a set of compatible media types. SIP supports user mobility by proxying and redirecting requests to the user's current location. Users can register their current location. SIP is not tied to any particular conference control protocol. SIP is designed to be independent of the lower-layer transport protocol and can be extended with additional capabilities.

What is SIP?

Introduction

SIP (Session Initiation Protocol) is a protocol developed to assist in providing advanced telephony services across the Internet. Internet telephony is evolving from its use as a cheap (but low quality) way to make international phone calls to a serious business telephony capability. SIP is one of a group of protocols required to ensure that this evolution can occur.

SIP is part of the IETF standards process and is modeled upon other Internet protocols such as SMTP (Simple Mail Transfer Protocol) and HTTP (Hypertext Transfer Protocol). It is used to establish, change and tear down (end) calls between one or more users in an IP-based network. Besides SIP, we give a short description of RTP (Real-Time Transport Protocol), which carries voice and video data. We also mention the Real Time Streaming Protocol (RTSP) for control of streaming media. At the end of the examination of SIP there is a comparison between SIP and today's most common IP telephony protocol, H.323.

SIP is described as a control protocol for creating, modifying and terminating sessions with one or more participants. These sessions include Internet multimedia conferences, Internet (or any IP Network) telephone calls and multimedia distribution. Members in a session can communicate via multicast or via a mesh of unicast relations, or via a combination of these. SIP supports session descriptions that allow participants to agree on a set of compatible media types. It also supports user mobility by proxying and redirecting requests to the user's current location. SIP is not tied to any particular conference control protocol.

In essence, SIP has to provide or enable the following functions:

Name Translation and User Location - Ensuring that the call reaches the called party wherever they are located. Carrying out any mapping of descriptive information to location information. Ensuring that details of the nature of the call (Session) are supported.

Feature Negotiation - This allows the group involved in a call (this may be a multi-party call) to agree on the features supported, recognizing that not all the parties can support the same level of features. For example, video may or may not be supported; as any form of MIME type is supported by SIP, there is plenty of scope for negotiation.

Call Participant Management - During a call a participant can bring other users onto the call or cancel connections to other users. In addition, users could be transferred or placed on hold.

Call Feature Changes - A user should be able to change the call characteristics during the course of the call. For example, a call may have been set up as voice-only, but in the course of the call, the users may need to enable a video function. A third party joining a call may require different features to be enabled in order to participate in the call.

Definitions

The following terms have special significance for SIP.

Call: A call consists of all participants in a conference invited by a common source. A SIP call is identified by a globally unique call-id. Thus, if a user is, for example, invited to the same multicast session by several people, each of these invitations will be a unique call. A point-to-point Internet telephony conversation maps into a single SIP call. In a multiparty conference unit (MCU) based call-in conference, each participant uses a separate call to invite himself to the MCU.

Call leg: A call leg is identified by the combination of the Call-ID header field and the addr-spec and tag of the To and From header fields. Within the same Call-ID, requests with From A and To value B belong to the same call leg as the requests in the opposite direction, i.e., From B and To A.
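
The direction-independence of a call leg can be illustrated with a minimal Python sketch. The function name call_leg_key and the sample addresses and tags are hypothetical and not part of SIP; the sketch only assumes that the Call-ID, addr-spec and tag values have already been extracted from the headers.

```python
def call_leg_key(call_id, from_addr, from_tag, to_addr, to_tag):
    """Return a hashable key that is identical for both directions of a call leg."""
    # The two endpoints are stored as an unordered pair, so swapping the
    # From and To values (a request in the opposite direction) gives the same key.
    endpoints = frozenset([(from_addr, from_tag), (to_addr, to_tag)])
    return (call_id, endpoints)


# Hypothetical header values, for illustration only.
a_to_b = call_leg_key("abc123@host.example", "sip:a@example.com", "1a",
                      "sip:b@example.com", "2b")
b_to_a = call_leg_key("abc123@host.example", "sip:b@example.com", "2b",
                      "sip:a@example.com", "1a")
other_call = call_leg_key("zzz999@host.example", "sip:a@example.com", "1a",
                          "sip:b@example.com", "2b")

print(a_to_b == b_to_a)      # same call leg, opposite directions
print(a_to_b == other_call)  # different Call-ID, different call leg
```

A key of this kind could be used to index per-leg state in a dictionary, so that requests from either party find the same entry.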

Client: An application program that sends SIP requests. Clients may or may not interact directly with a human user. User agents and proxies contain clients (and servers).

Conference: A multimedia session identified by a common session description. A conference can have zero or more members and includes the cases of a multicast conference, a full-mesh conference and a two-party telephone call, as well as combinations of these. Any number of calls can be used to create a conference.

Downstream: Requests sent in the direction from the caller to the callee (i.e., user agent client to user agent server).

Final response: A response that terminates a SIP transaction, as opposed to a provisional response that does not. The response has a response code and response message. The codes fall into classes 100 through 600, similar to HTTP. Unlike other requests, invitations cannot be answered immediately, as locating the callee and waiting for a human to answer may take several seconds. Call requests may also be queued, e.g., if the callee is busy. Responses of the 100 class (denoted as 1xx) indicate call progress; they are always followed by other responses indicating the final outcome of the request. While the 1xx responses are provisional, the other classes indicate the final status of the request: 2xx for success, 3xx for redirection, 4xx, 5xx and 6xx for client, server and global failures, respectively.
A list of all SIP messages is given in Appendix XXXX.
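
As a rough illustration of the response classes described above, the following Python sketch maps a status code to its class and reports whether the response is final. The names RESPONSE_CLASSES and classify_response are hypothetical helpers chosen for this example.

```python
# Class descriptions follow the text above: 1xx is provisional, the rest are final.
RESPONSE_CLASSES = {
    1: "call progress",
    2: "success",
    3: "redirection",
    4: "client failure",
    5: "server failure",
    6: "global failure",
}


def classify_response(code):
    """Return (class description, is_final) for a three-digit status code."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a valid status code: {code}")
    first_digit = code // 100
    # Only 1xx responses are provisional; every other class ends the transaction.
    return RESPONSE_CLASSES[first_digit], first_digit != 1


for code in (180, 200, 302, 486, 503, 603):
    print(code, classify_response(code))
```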

Initiator, calling party, caller: The party initiating a session invitation. Note that the calling party does not have to be the same as the one creating the conference.

Invitation: A request sent to a user (or service) requesting participation in a session. A successful SIP invitation consists of two transactions: an INVITE request followed by an ACK request.

Invitee, invited user, called party, callee: The person or service that the calling party is trying to invite to a conference.

Location service: A location service is used by a SIP redirect or proxy server to obtain information about a callee's possible location(s). Examples of sources of location information include SIP registrars, databases or mobility registration protocols. Location services are offered by location servers. Location servers MAY be part of a SIP server, but the manner in which a SIP server requests location services is beyond the scope of this document.

Proxy, proxy server: An intermediary program that acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them on, possibly after translation, to other servers. A proxy interprets, and, if necessary, rewrites a request message before forwarding it. Proxy servers are, for example, used to route requests, enforce policies, control firewalls.

Redirect server: A redirect server is a server that accepts a SIP request, maps the address into zero or more new addresses and returns these addresses to the client. Unlike a proxy server, it does not initiate its own SIP request. Unlike a user agent server, it does not accept calls.

Registrar: A registrar is a server that accepts REGISTER requests. A registrar is typically co-located with a proxy or redirect server and MAY make its information available through the location server.

Server: A server is an application program that accepts requests in order to service requests and sends back responses to those requests. Servers are either proxy, redirect or user agent servers or registrars.

Stateless Proxy: A logical entity that does not maintain state for a SIP transaction. A stateless proxy forwards every request it receives downstream and every response it receives upstream.

Stateful Proxy: A logical entity that maintains state information at least for the duration of a SIP transaction.

User agent client (UAC): A user agent client is a client application that initiates a SIP request.

User agent server (UAS): A user agent server is a server application that contacts the user when a SIP request is received and that returns a response on behalf of the user. The response accepts, rejects or redirects the request.

User agent (UA): An application which can act both as a user agent client and user agent server.

An application program MAY be capable of acting both as a client and a server. For example, a typical multimedia conference control application would act as a user agent client to initiate calls or to invite others to conferences and as a user agent server to accept invitations. The role of UAC and UAS as well as proxy and redirect servers are defined on a request-by-request basis. For example, the user agent initiating a call acts as a UAC when sending the initial INVITE request and as a UAS when receiving a BYE request from the callee. Similarly, the same software can act as a proxy server for one request and as a redirect server for the next request.

Protocol Components

There are two components within SIP: the SIP User Agent and the SIP Network Server. The User Agent is effectively the end system component for the call and the SIP Server is the network device that handles the signaling associated with multiple calls.

The User Agent itself has a client element, the User Agent Client (UAC), and a server element, the User Agent Server (UAS). The client element initiates the calls and the server element answers the calls. This allows peer-to-peer calls to be made using a client-server protocol.

The SIP server element also provides for more than one type of server. There are effectively three forms of server that can exist in the network: the SIP stateful proxy server, the SIP stateless proxy server and the SIP redirect server. The main function of the SIP servers is to provide name resolution and user location, since the caller is unlikely to know the IP address or host name of the called party. What will be available is perhaps an email-like address or a telephone number associated with the called party. Using this information, the caller's user agent can identify a specific server to resolve the address information; it is likely that this will involve many servers in the network.

A SIP proxy server receives requests, determines where to send them, and passes them on to the next server (using next-hop routing principles). There can be many server hops in the network.

The difference between a stateful and stateless proxy server is that a stateful proxy server remembers the incoming requests it receives, along with the responses it sends back and the outgoing requests it sends on. A stateless proxy server forgets all information once it has sent on a request. Because it keeps this state, a stateful proxy server can fork requests to try multiple possible user locations in parallel and send only the best responses back. Stateless proxy servers are most likely to be the fast backbone of the SIP infrastructure. Stateful proxy servers are then most likely to be the local devices close to the User Agents, controlling domains of users and becoming the prime platform for the application services.

A redirect server receives requests, but rather than passing these on to the next server it sends a response to the caller indicating the address for the called user. The caller can then use that address to contact the called party, or the next server, directly.
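
The contrast between the two server types can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a SIP implementation: the LOCATIONS table, the dictionary-shaped responses and the send callback are all hypothetical stand-ins for a location service and real network I/O.

```python
# Hypothetical location table: published address -> current contact address.
LOCATIONS = {"sip:bob@example.com": "sip:bob@pc42.example.com"}

NOT_FOUND = {"status": 404, "reason": "Not Found"}


def redirect_server(request_uri):
    """Answer the caller with the new address instead of forwarding the request."""
    contact = LOCATIONS.get(request_uri)
    if contact is None:
        return dict(NOT_FOUND)
    # The caller is expected to retry the request at the returned contact.
    return {"status": 302, "reason": "Moved Temporarily", "contact": contact}


def proxy_server(request_uri, send):
    """Forward the request to the next hop on the caller's behalf.

    `send` is an assumed callback that delivers the request to the given
    address and returns whatever response comes back.
    """
    next_hop = LOCATIONS.get(request_uri)
    if next_hop is None:
        return dict(NOT_FOUND)
    return send(next_hop)


def fake_send(hop):
    # Stand-in for the network: pretend the next hop answered the call.
    return {"status": 200, "reason": "OK", "answered_by": hop}


print(redirect_server("sip:bob@example.com"))
print(proxy_server("sip:bob@example.com", fake_send))
```

In the redirect case the caller receives an address and does the next step itself; in the proxy case the caller only ever sees the final answer.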

SIP Protocol

SIP provides the necessary protocol mechanisms so that end systems and proxy servers can provide services:

User location
User capabilities
User availability
Call set-up
Call handling
Call forwarding, including

                - The equivalent of 700-, 800- and 900- type calls
                - Call-forwarding no answer
                - Call-forwarding busy
                - Call-forwarding unconditional
                - Other address-translation services

Callee and calling "number" delivery, where numbers can be any (preferably unique) naming scheme
Personal mobility, i.e., the ability to reach a called party under a single, location-independent address even when the user changes terminals
Terminal-type negotiation and selection: a caller can be given a choice how to reach the party, e.g., via Internet telephony, mobile phone, an answering service, etc.
Terminal capability negotiation
Caller and callee authentication
Blind and supervised call transfer
Invitations to multicast conferences

When a user wants to call another user, the caller initiates the call with an INVITE request. The request contains enough information for the called party to join the session. If the client knows the location of the other party, it can send the request directly to their IP address. If not, the client can send it to a locally configured SIP network server. If that server is a proxy server, it will attempt to resolve the called user's location and send the request to them. There are many ways it can do this, such as searching the DNS or accessing databases. Alternatively, the server may be a redirect server that returns the called user's location to the calling client for it to try directly. During the course of locating a user, one SIP network server can, of course, proxy or redirect the call to additional servers until it arrives at one that definitely knows the IP address where the called user can be found.

Once the user is found, the request is sent to the user, and from there several options arise. In the simplest case, the user's telephony client receives the request; that is, the user's phone rings. If the user takes the call, the client responds to the invitation with the designated capabilities* of the client software and a connection is established. If the user declines the call, the session can be redirected to a voice mail server or to another user.

* "Designated capabilities" refers to the functions that the user wants to invoke.
The client software might support video conferencing, for example, but the user may only want to use audio conferencing. Regardless, the user can always add functions such as video conferencing, whiteboarding, or a third user by issuing another INVITE request to other users on the link.

SIP has two additional significant features. The first is a stateful SIP proxy server's ability to split or fork an incoming call so that several extensions can be rung at once. The first extension to answer takes the call. This feature is handy if a user is working between two locations (a lab and an office, for example), or where someone is ringing both a boss and their secretary.
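
The forking behavior can be modeled with a small Python simulation. It is only a sketch: the function fork_invite and the answer-time table are hypothetical, no network traffic is involved, and the assumption that the proxy cancels the branches that did not win is an illustration rather than something stated above.

```python
def fork_invite(answer_times):
    """Simulate a forked call.

    `answer_times` maps each extension to the number of seconds until it
    answers, or None if it never answers. Returns (winner, cancelled), where
    `cancelled` lists the branches the proxy would stop ringing (assumed).
    """
    answered = {ext: t for ext, t in answer_times.items() if t is not None}
    if not answered:
        # Nobody picked up: there is no winner and every branch is abandoned.
        return None, sorted(answer_times)
    # The first extension to answer takes the call.
    winner = min(answered, key=answered.get)
    cancelled = sorted(ext for ext in answer_times if ext != winner)
    return winner, cancelled


print(fork_invite({"lab": 4.0, "office": 2.5}))
print(fork_invite({"boss": None, "secretary": 3.0}))
print(fork_invite({"lab": None, "office": None}))
```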

The second significant feature is SIP's unique ability to return different media types. Take the example of a user contacting a company. When the SIP server receives the client's connection request, it can return to the customer's phone client a Web page that acts as an Interactive Voice Response (IVR) menu (the term Interactive Web Response, or IWR, could also be used), listing the extensions of the available departments or users. Clicking the appropriate link sends an invitation to that user to set up a call.

Addressing and Naming

To be invited and identified, the called party has to be named. SIP chose an email-like identifier of the form user@domain, user@host, user@IP address or phone-number@gateway, since this is the most common form of user addressing in the Internet. The identifier can refer to the name of the host that a user is logged in to at the time, an email address or the name of a domain-specific name translation service. Addresses of the form phone-number@gateway designate GSTN phone numbers reachable via the named gateway. SIP provides its own reliability mechanism and is therefore independent of the packet layer and only requires an unreliable datagram service. SIP is typically used over UDP or TCP.

SIP uses these addresses as part of SIP URLs, such as sip:j.doe@example.com. This URL may well be placed in a web page, so that clicking on the link initiates a call to that address, similar to a mailto URL today.
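
A minimal Python sketch of splitting such a URL into its user and host parts is shown below. The helper parse_sip_url is hypothetical and deliberately simplified: it ignores ports, passwords, URL parameters and headers, which a real parser would have to handle.

```python
def parse_sip_url(url):
    """Split a simple SIP URL of the form sip:user@host into (user, host)."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "sip":
        raise ValueError(f"not a SIP URL: {url!r}")
    user, at, host = rest.rpartition("@")
    if not at:
        # No user part was given, e.g. sip:example.com
        return None, rest
    return user, host


print(parse_sip_url("sip:j.doe@example.com"))
print(parse_sip_url("sip:example.com"))
```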

We anticipate that most users will be able to use their email address as their published SIP address. Email addresses already offer a basic location-independent form of addressing, in that the host part does not have to designate a particular Internet host, but can be a domain, which is then resolved into one or more possible domain mail server hosts via Domain Name System (DNS) MX (mail exchange) records.

For email, finding the mail exchange host is often sufficient to deliver mail, as the user either logs in to the mail exchange host or uses protocols such as the Internet Mail Access Protocol (IMAP) or the Post Office Protocol (POP) to retrieve their mail. For interactive audio and video communications, however, participants are typically sending and receiving data on the workstation, PC or Internet appliance in their immediate physical proximity. Thus, SIP has to be able to resolve name@domain to user@host. A user at a specific host will be derived through zero or more translations. A single externally visible address may well lead to a different host depending on time of day, media to be used, and any number of other factors. Also, hosts that connect via dial-up modems may acquire a different IP address each time.
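
The idea of reaching user@host through zero or more translations can be sketched as a simple table walk in Python. The TRANSLATIONS table, the addresses in it and the resolve function are hypothetical; a real deployment would consult location services rather than a fixed dictionary.

```python
# Hypothetical translation table: each entry maps an address to a more specific one.
TRANSLATIONS = {
    "j.doe@example.com": "jdoe@sales.example.com",
    "jdoe@sales.example.com": "jdoe@pc17.sales.example.com",
}


def resolve(address, table, max_hops=10):
    """Follow translations until no further mapping exists.

    Returns (final address, list of every address visited).
    """
    seen = [address]
    while address in table:
        address = table[address]
        # Guard against tables that loop back on themselves or never settle.
        if address in seen or len(seen) > max_hops:
            raise RuntimeError("translation loop or too many hops")
        seen.append(address)
    return address, seen


final, path = resolve("j.doe@example.com", TRANSLATIONS)
print(final)
print(path)
```

An address with no entry in the table is returned unchanged, which corresponds to the zero-translation case.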

SIP Messages

A SIP message is either a request from a client to a server, or a response from a server to a client.

SIP uses message structures borrowed from HTTP. The messages are in text format using ISO 10646 in UTF-8 encoding. As in HTTP, client requests invoke methods on the server. A message consists of a start-line specifying the method and the protocol, a number of header fields specifying call properties and service information, and an optional message body which can contain a session description. The following methods are applicable in SIP:

INVITE - invites a user to join a call
BYE - terminates the call between two of the users on a call
OPTIONS - requests information on the capabilities of a server
ACK - confirms that a client has received a final response to an INVITE
REGISTER - provides the map for address resolution, letting a server know the location of other users
CANCEL - ends a pending request, but does not end the call
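
To make the message layout concrete, here is a minimal Python sketch that splits a request into its start-line, header fields and body. The parse_message helper and the sample addresses are hypothetical, and the sample request is deliberately incomplete: it carries only a few headers, and the parser ignores details such as repeated or folded header lines.

```python
def parse_message(text):
    """Split a SIP message into (start_line, headers, body)."""
    # An empty line separates the header section from the optional body.
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values such as SIP URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


# Illustrative request with made-up addresses.
invite = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: sip:alice@example.org\r\n"
    "To: sip:bob@example.com\r\n"
    "Call-ID: 12345@alice-pc.example.org\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
    "v=0\r\n"
)

start_line, headers, body = parse_message(invite)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, version)
print(headers["From"], "->", headers["To"])
```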

The syntax of the response codes is similar to HTTP. The three-digit codes are hierarchically organized, with the first digit representing the result class and the other two digits providing additional information. The first digit controls the protocol operation and the other two give useful but non-critical information. A textual description and even a whole HTML document can be attached to the result message.
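
One practical consequence of this hierarchy is that a client which does not recognize a specific code can fall back to the generic x00 code of the same class. The Python sketch below illustrates this; KNOWN_CODES is a hypothetical subset chosen for the example, and both function names are made up for illustration.

```python
# Hypothetical subset of codes that a simple client understands.
KNOWN_CODES = {100, 180, 200, 300, 302, 400, 404, 486, 500, 600}


def parse_status_line(line):
    """Split a status line such as 'SIP/2.0 180 Ringing' into its three parts."""
    # The reason phrase may itself contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def effective_code(code, known=KNOWN_CODES):
    """Treat an unrecognized code as the x00 code of its class."""
    if code in known:
        return code
    return (code // 100) * 100


version, code, reason = parse_status_line("SIP/2.0 481 Call Leg/Transaction Does Not Exist")
print(code, reason)
print(effective_code(code))  # 481 is not in KNOWN_CODES, so it is handled as 400
```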

SIP takes the same approach to extensibility as the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP). New headers can be added to SIP messages. Unknown headers and values are ignored by default. Using the Require header, the client can require that specific features be understood by the other endpoint. If the endpoint does not support the named features, an error message listing the unsupported features is returned and the client can fall back to simpler operation.
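
The Require mechanism can be sketched as a small server-side check in Python. The feature names, the SUPPORTED set and the check_require function are hypothetical; the sketch assumes that the error used is 420 Bad Extension (listed in the appendix) and that the unsupported features are reported in an Unsupported header.

```python
# Hypothetical set of features this endpoint implements.
SUPPORTED = {"com.example.billing"}


def check_require(headers, supported=SUPPORTED):
    """Return (status, extra_headers); status is None when the request may proceed."""
    raw = headers.get("Require", "")
    required = [tag.strip() for tag in raw.split(",") if tag.strip()]
    unsupported = [tag for tag in required if tag not in supported]
    if unsupported:
        # Tell the client exactly which features were not understood.
        return 420, {"Unsupported": ", ".join(unsupported)}
    return None, {}


print(check_require({"Require": "com.example.billing"}))
print(check_require({"Require": "com.example.billing, org.example.video"}))
print(check_require({}))
```

A client receiving the 420 response could drop the listed features from its Require header and retry the request in a simpler form.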

RTP

RTP consists of the actual Real-time Transport Protocol, which is used to carry data with real-time properties, and the RTP Control Protocol (RTCP), which is used to monitor QoS and to convey information about the participants in an ongoing conference.

RTP will often be integrated into the application rather than being implemented as a separate protocol layer (see Figure 4-3). Applications typically run RTP on top of UDP to make use of its port numbers and checksums. The RTP framework is relatively "loose", allowing modifications and tailoring depending on the application.
Additionally, a complete specification for a particular application will require a payload format and a profile specification. The payload format defines how a particular payload is to be carried in RTP. A profile specification defines how a set of payload type codes is mapped to payload formats.

Figure. Location of RTP in IP stack.

RTP session setup consists of defining a pair of destination transport addresses: one IP address and a pair of UDP ports, one for RTP and another for RTCP. In the case of a multicast conference the IP address is a class D multicast address. In a multimedia session each medium is carried in a separate RTP session with its own RTCP packets, which report only the quality of that session. Usually additional media are allocated additional port pairs, and only one multicast address is used for the conference.
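
The port-pair allocation can be illustrated with a short Python sketch. It relies on one assumption that is not stated above: the common RTP convention that RTP uses an even port and RTCP the next higher odd port. The function names, the sample address 224.2.0.1 and the starting port 49170 are hypothetical.

```python
import ipaddress


def rtp_session_addresses(ip, rtp_port):
    """Return the RTP and RTCP transport addresses for one medium."""
    # Assumed convention: RTP on an even port, RTCP on the next higher port.
    if rtp_port % 2 != 0:
        raise ValueError("RTP port is expected to be even")
    return {
        "rtp": (ip, rtp_port),
        "rtcp": (ip, rtp_port + 1),
        "multicast": ipaddress.ip_address(ip).is_multicast,
    }


def allocate_media(ip, first_port, media):
    """Give each medium its own port pair on the same conference address."""
    return {
        name: rtp_session_addresses(ip, first_port + 2 * index)
        for index, name in enumerate(media)
    }


sessions = allocate_media("224.2.0.1", 49170, ["audio", "video"])
for name, session in sessions.items():
    print(name, session)
```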

SIP vs H.323

SIP's smaller footprint makes the protocol more scalable and faster than existing H.323 implementations. The catch? The protocol is still in its early stages, making products hard to come by.

Until recently, network managers looking to roll out intelligent networks have relied heavily on the H.323 suite of protocols. With H.323, a compliant client queries an H.323 gatekeeper for the address of a new user. The gatekeeper retrieves the address and forwards it to the client, which then establishes a session with the new client using H.225, one of the H.323 protocols. Once the session is established, another H.323 protocol, H.245, negotiates the available features of each client.

The key strength of H.323 is its maturity, which has allowed a number of software vendors to develop robust implementations. The standard's maturity has also allowed the various vendors to eliminate interoperability issues, permitting the deployment of a wide range of H.323-capable devices into the market. Since the H.323 standard includes an adaptation of the Q.931 protocol for call-control, many developers with experience in existing ISDN telephony are familiar with the call control model. In fact, the events and parameters can often be directly passed from H.323 into applications that previously operated with ISDN.

It may sound simple, but H.323 suffers from some key problems. At the top of the list is call setup time. Since H.323 first establishes a session and only then negotiates the features and capabilities of that session, call setup can take significantly longer than an average PSTN call.

Just how long depends on the particular network and the distance between locations, but the total time for someone to answer the call can reach up to 8 seconds.

What's more, H.323 doesn't scale well. A case in point is H.323 addressing. Creating separate phone-numbering schemes complicates interconnecting carrier networks. The H.323 standard itself is too large and complex to make deployment easy.

Finally, H.323 doesn't provide a simple way for connecting two circuit-switched networks across an IP network.

All of these problems are addressed by SIP. With SIP, each user is identified through a hierarchical URL that's built around elements such as a user's phone number or host name (for example, sip:user@company.com). The similarity to an e-mail address makes SIP URLs easy to guess from a user's e-mail address.

It's easy to see where SIP fills in some of H.323's holes. First there is the issue of call setup time. By including a client's available features within the INVITE request, SIP negotiates the features and capabilities of the call within a single transaction. SIP can set up a call within about 100 ms, depending on the network.

SIP also scales better than H.323. It is simple and easy to embed into inexpensive end-user devices. The expandable nature of the protocol allows future capabilities to be easily defined and quickly implemented.

The protocol was designed to ensure interoperability and enable different devices to communicate.

Another SIP strength is that non-telephony developers find the protocol easy to understand.

Besides the fact that the protocol is very new, a weakness of SIP is its narrow scope, which limits its applications when it is used by itself; however, it gains flexibility when used with other protocols.

Another weakness is that SIP is only a small piece of a complete solution. Numerous other software components are required to build a complete IP telephony product.

Low-cost end devices are natural applications for SIP. Devices such as wireless phones, set-top cable boxes, Ethernet phones, and other devices with limited computing and memory resources are suited to this protocol.

Table: Protocol Interactivity

Capability           H.323   SIP
Complexity           High    Low
Cost                 High    Low
Maturity             Good    Poor
Scope of Definition  Full    Limited
Interoperability     Good    Some
Similar to ISDN      Yes     No

References

http://www.cs.columbia.edu/~hgs/sipc/
General description on SIP i.e. SIP software, Registering and Making Calls

http://www.ptotocols.com/voip/sip_methods.html
SIP Methods & Response Codes

http://www.cs.columbia.edu/~hgs/sip/
Official homepage of SIP

http://www.cs.columbia.edu/~hgs/sip/faq.html
Questions and answers about SIP functionality, protocol operation, etc.

ftp://ftp.rfc-editor.org/in-notes/rfc2543.txt
The SIP RFC 2543. This document specifies the SIP protocol.

Appendix XXXX

SIP Response Messages, a list extract from the VOCAL administration guide

SIP Response Messages Category
The VOCAL system supports all SIP response messages:

1xx Responses - Information Responses
2xx Responses - Successful Responses
3xx Responses - Redirection Responses
4xx Responses - Request Failures Responses
5xx Responses - Server Failure Responses
6xx Responses - Global Failure Responses

For More Information
Refer to the SIP RFC 2543 for a list of the status codes and their reason
codes:
http://www.ietf.org/rfc/rfc2543.txt

1xx and 2xx Responses
1xx SIP response messages are informational responses; 200 OK is the successful (2xx) response:

100 Trying
180 Ringing
181 Call Is Being Forwarded
182 Queued
183 Session Progress
200 OK

3xx Responses
3xx SIP response messages are redirection responses:

300 Multiple Choices
301 Moved Permanently
302 Moved Temporarily
303 See Other
305 Use Proxy
380 Alternative Service

4xx Responses
4xx SIP response messages are client error responses:

400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
413 Request Entity Too Large
414 Request-URI Too Large
415 Unsupported Media Type
420 Bad Extension
480 Temporarily not available
481 Call Leg/Transaction Does Not Exist
482 Loop Detected
483 Too Many Hops
484 Address Incomplete
485 Ambiguous
486 Busy Here

5xx Responses
5xx SIP response messages are server error responses:

500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Time-out
505 SIP Version not supported

6xx Responses
6xx SIP response messages are global failure responses:

600 Busy Everywhere
603 Decline
604 Does not exist anywhere
606 Not Acceptable
