The Session Initiation Protocol (SIP) is an application-layer control (signaling) protocol for creating, modifying and terminating sessions with one or more participants. These sessions include Internet multimedia conferences, Internet telephone calls and multimedia distribution. Members in a session can communicate via multicast or via a mesh of unicast relations, or a combination of these. SIP invitations used to create sessions carry session descriptions which allow participants to agree on a set of compatible media types. SIP supports user mobility by proxying and redirecting requests to the user's current location. Users can register their current location. SIP is not tied to any particular conference control protocol. SIP is designed to be independent of the lower-layer transport protocol and can be extended with additional capabilities.
What is SIP?
Introduction
SIP (Session Initiation Protocol) is a protocol developed to assist in providing advanced telephony services across the Internet. Internet telephony is evolving from its use as a cheap (but low quality) way to make international phone calls to a serious business telephony capability. SIP is one of a group of protocols required to ensure that this evolution can occur.
SIP is part of the IETF standards process and is modeled upon other Internet protocols such as SMTP (Simple Mail Transfer Protocol) and HTTP (Hypertext Transfer Protocol). It is used to establish, change and tear down (end) calls between one or more users in an IP-based network. Besides SIP, we give a short description of RTP (Real-Time Transport Protocol), which carries voice and video data. We also mention the Real Time Streaming Protocol (RTSP) for control of streaming media. At the end of the examination of SIP there is a comparison between SIP and today's most common IP telephony protocol, H.323.
SIP is described as a control protocol for creating, modifying and terminating sessions with one or more participants. These sessions include Internet multimedia conferences, Internet (or any IP Network) telephone calls and multimedia distribution. Members in a session can communicate via multicast or via a mesh of unicast relations, or via a combination of these. SIP supports session descriptions that allow participants to agree on a set of compatible media types. It also supports user mobility by proxying and redirecting requests to the user's current location. SIP is not tied to any particular conference control protocol.
In essence, SIP has to provide or enable the following functions:
Name Translation and User Location - Ensuring that the call reaches the called party wherever they are located, carrying out any mapping of descriptive information to location information, and ensuring that details of the nature of the call (session) are supported.
Feature Negotiation - This allows the group involved in a call (which may be a multi-party call) to agree on the features supported, recognizing that not all the parties can support the same level of features. For example, video may or may not be supported; since a SIP message body can be of any MIME type, there is plenty of scope for negotiation.
Call Participant Management - During a call a participant can bring other users onto the call or cancel connections to other users. In addition, users could be transferred or placed on hold.
Call feature changes - A user should be able to change the call characteristics during the course of the call. For example, a call may have been set up as voice-only, but in the course of the call, the users may need to enable a video function. A third party joining a call may require different features to be enabled in order to participate in the call.
Definitions
The following terms have special significance for SIP.
Call: A call consists of all participants in a conference invited by a common source. A SIP call is identified by a globally unique call-id. Thus, if a user is, for example, invited to the same multicast session by several people, each of these invitations will be a unique call. A point-to-point Internet telephony conversation maps into a single SIP call. In a multiparty conference unit (MCU) based call-in conference, each participant uses a separate call to invite himself to the MCU.
Call leg: A call leg is identified by the combination of the Call-ID header field and the addr-spec and tag of the To and From header fields. Within the same Call-ID, requests with From value A and To value B belong to the same call leg as the requests in the opposite direction, i.e., From B and To A.
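To make the direction-independence of a call leg concrete, here is a minimal Python sketch that builds one key for both directions. The `Party` type and `call_leg_key` function are hypothetical names chosen for illustration, and header parsing is assumed to have already happened; the sketch only shows that (From A, To B) and (From B, To A) under the same Call-ID land on the same key.

```python
from typing import NamedTuple


class Party(NamedTuple):
    """Hypothetical container: the addr-spec and tag from a From or To header."""
    addr_spec: str
    tag: str = ""


def call_leg_key(call_id: str, from_party: Party, to_party: Party) -> tuple:
    """Return a key that is identical for both directions of a call leg.

    Sorting the two parties means a request From A To B and a request
    From B To A (with the same Call-ID) produce the same key.
    """
    first, second = sorted((from_party, to_party))
    return (call_id, first, second)


# Illustrative parties; the addresses and tags are made up.
alice = Party("sip:alice@example.com", "1234")
bob = Party("sip:bob@example.org", "5678")
```

Sorting is just one convenient way to get an order-independent key; a frozenset of the two parties would work equally well.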
Client: An application program that sends SIP requests. Clients may or may not interact directly with a human user. User agents and proxies contain clients (and servers).
Conference: A multimedia session identified by a common session description. A conference can have zero or more members and includes the cases of a multicast conference, a full-mesh conference and a two-party telephone call, as well as combinations of these. Any number of calls can be used to create a conference.
Downstream: Requests sent in the direction from the caller to the callee (i.e., user agent client to user agent server).
Final response: A response that terminates a SIP transaction, as opposed to a provisional response that does not. A response carries a three-digit response code and a textual reason phrase. The codes fall into six classes, 1xx through 6xx, similar to HTTP. Unlike other requests, invitations cannot be answered immediately, as locating the callee and waiting for a human to answer may take several seconds. Call requests may also be queued, e.g., if the callee is busy. Responses of the 100 class (denoted as 1xx) indicate call progress; they are always followed by other responses indicating the final outcome of the request. While the 1xx responses are provisional, the other classes indicate the final status of the request: 2xx for success, 3xx for redirection, and 4xx, 5xx and 6xx for client, server and global failures, respectively.
A list of all SIP messages is given in Appendix XXXX.
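The class rules above can be captured in a few lines. This is a minimal Python sketch; the `classify` function and the class labels are illustrative names, not taken from any particular SIP library.

```python
# Response classes keyed by the first digit of the status code.
RESPONSE_CLASSES = {
    1: "provisional",     # 1xx: call progress, always followed by a final response
    2: "success",
    3: "redirection",
    4: "client failure",
    5: "server failure",
    6: "global failure",
}


def classify(status_code: int) -> tuple[str, bool]:
    """Return (class name, is_final) for a three-digit SIP status code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    response_class = status_code // 100
    return RESPONSE_CLASSES[response_class], response_class != 1
```

For example, `classify(180)` gives `("provisional", False)`, so a client keeps waiting, while `classify(486)` gives `("client failure", True)` and ends the transaction.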
Initiator, calling party, caller: The party initiating a session invitation. Note that the calling party does not have to be the same as the one creating the conference.
Invitation: A request sent to a user (or service) requesting participation in a session. A successful SIP invitation consists of two transactions: an INVITE request followed by an ACK request.
Invitee, invited user, called party, callee: The person or service that the calling party is trying to invite to a conference.
Location service: A location service is used by a SIP redirect or proxy server to obtain information about a callee's possible location(s). Examples of sources of location information include SIP registrars, databases or mobility registration protocols. Location services are offered by location servers. Location servers MAY be part of a SIP server, but the manner in which a SIP server requests location services is beyond the scope of this document.
Proxy, proxy server: An intermediary program that acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them on, possibly after translation, to other servers. A proxy interprets, and, if necessary, rewrites a request message before forwarding it. Proxy servers are, for example, used to route requests, enforce policies and control firewalls.
Redirect server: A redirect server is a server that accepts a SIP request, maps the address into zero or more new addresses and returns these addresses to the client. Unlike a proxy server, it does not initiate its own SIP request. Unlike a user agent server, it does not accept calls.
Registrar: A registrar is a server that accepts REGISTER requests. A registrar is typically co-located with a proxy or redirect server and MAY make its information available through the location server.
Server: A server is an application program that accepts requests in order to service them and sends back responses to those requests. Servers are either proxy, redirect or user agent servers or registrars.
Stateless Proxy: A logical entity that does not maintain state for a SIP transaction. A stateless proxy forwards every request it receives downstream and every response it receives upstream.
Stateful Proxy: A logical entity that maintains state information at least for the duration of a SIP transaction.
User agent client (UAC): A user agent client is a client application that initiates a SIP request.
User agent server (UAS): A user agent server is a server application that contacts the user when a SIP request is received and that returns a response on behalf of the user. The response accepts, rejects or redirects the request.
User agent (UA): An application which can act both as a user agent client and user agent server.
An application program MAY be capable of acting both as a client and a server. For example, a typical multimedia conference control application would act as a user agent client to initiate calls or to invite others to conferences and as a user agent server to accept invitations. The role of UAC and UAS as well as proxy and redirect servers are defined on a request-by-request basis. For example, the user agent initiating a call acts as a UAC when sending the initial INVITE request and as a UAS when receiving a BYE request from the callee. Similarly, the same software can act as a proxy server for one request and as a redirect server for the next request.
Protocol Components
There are two components within SIP: the SIP User Agent and the SIP Network Server. The User Agent is effectively the end-system component for the call, and the SIP Server is the network device that handles the signaling associated with multiple calls.
The User Agent itself has a client element, the User Agent Client (UAC), and a server element, the User Agent Server (UAS). The client element initiates calls and the server element answers them. This allows peer-to-peer calls to be made using a client-server protocol.
The SIP server element also provides for more than one type of server. There are effectively three forms of server that can exist in the network: the SIP stateful proxy server, the SIP stateless proxy server and the SIP redirect server. The main function of the SIP servers is to provide name resolution and user location, since the caller is unlikely to know the IP address or host name of the called party. What will be available is perhaps an email-like address or a telephone number associated with the called party. Using this information, the caller's user agent can contact a specific server to resolve the address information; this will likely involve many servers in the network.
A SIP proxy server receives requests, determines where to send them, and passes them on to the next server (using next-hop routing principles). There can be many server hops in the network.
The difference between a stateful and a stateless proxy server is that a stateful proxy server remembers the incoming requests it receives, along with the responses it sends back and the outgoing requests it sends on. A stateless proxy server forgets all information once it has sent on a request. This allows a stateful proxy server to fork requests, trying multiple possible user locations in parallel and sending only the best response back. Stateless proxy servers are most likely to form the fast backbone of the SIP infrastructure. Stateful proxy servers are then most likely to be the local devices close to the User Agents, controlling domains of users and becoming the prime platform for application services.
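A forking proxy has to decide which of the collected final responses is "best". The rule below is an assumption for illustration, similar in spirit to the selection rule later written down in RFC 3261 (any success wins; otherwise a global failure; otherwise the lowest class); `select_best_response` is a hypothetical name, and real proxies also forward 2xx responses immediately instead of waiting for all branches.

```python
def select_best_response(final_codes: list[int]) -> int:
    """Pick the final response a forking proxy sends upstream (simplified).

    Assumed rule: any 2xx wins; otherwise any 6xx (a global failure means
    no branch can succeed); otherwise the lowest-numbered response.
    """
    if not final_codes:
        raise ValueError("no final responses collected")
    successes = [code for code in final_codes if 200 <= code < 300]
    if successes:
        return successes[0]
    global_failures = [code for code in final_codes if 600 <= code < 700]
    if global_failures:
        return global_failures[0]
    return min(final_codes)
```

In the lab-and-office example, if the lab phone answers (200) while the office phone is busy (486), the caller only ever sees the 200.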
A redirect server receives requests, but rather than passing these onto the next server it sends a response to the caller indicating the address for the called user. This provides the address for the caller to contact the called party at the next server directly.
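The redirect behavior can be sketched as a lookup followed by a response listing the new addresses. In this minimal Python sketch the `LOCATION_DB` dictionary and `redirect` function are hypothetical stand-ins for a location service; the 302 Moved Temporarily and 404 Not Found status lines are real SIP responses, but a complete response would also copy headers such as Via, From, To, Call-ID and CSeq from the request.

```python
# Hypothetical location data, e.g. filled in from REGISTER requests.
LOCATION_DB = {
    "sip:j.doe@example.com": ["sip:jdoe@lab.example.com", "sip:jdoe@office.example.com"],
}


def redirect(request_uri: str, location_db: dict[str, list[str]]) -> str:
    """Return a simplified redirect-server response for request_uri.

    Only the status line and Contact headers are produced; the headers a
    real response copies from the request are omitted for brevity.
    """
    contacts = location_db.get(request_uri, [])
    if not contacts:
        return "SIP/2.0 404 Not Found\r\n\r\n"
    lines = ["SIP/2.0 302 Moved Temporarily"]
    lines += [f"Contact: <{uri}>" for uri in contacts]
    return "\r\n".join(lines) + "\r\n\r\n"
```

The caller then sends a new INVITE to one of the Contact addresses itself; the redirect server never forwards the request.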
SIP Protocol
SIP provides the necessary protocol mechanisms so that end systems and proxy servers can provide services:
Once the called user has been located, the request is sent to the user, and from there several options arise. In the simplest case, the user's telephony client receives the request; that is, the user's phone rings. If the user takes the call, the client responds to the invitation with the designated capabilities* of the client software and a connection is established. If the user declines the call, the session can be redirected to a voice mail server or to another user.
* "Designated capabilities" refers to the functions that the user wants to invoke. The client software might support video conferencing, for example, but the user may only want to use audio conferencing. Regardless, the user can always add functions such as video conferencing, white boarding, or a third user by issuing another INVITE request to other users on the link.
SIP has two additional significant features. The first is a stateful SIP proxy server's ability to split or fork an incoming call so that several extensions can be rung at once. The first extension to answer takes the call. This feature is handy if a user is working between two locations (a lab and an office, for example), or where someone is ringing both a boss and their secretary.
The second significant feature is SIP's ability to return different media types. Take the example of a user contacting a company. When the SIP server receives the client's connection request, it can return to the customer's phone client a Web-based Interactive Voice Response (IVR) page, also called an Interactive Web Response (IWR) page, listing the extensions of the available departments or users. Clicking the appropriate link sends an invitation to that user to set up a call.
Addressing and Naming
To be invited and identified, the called party has to be named. Since it is the most common form of user addressing in the Internet, SIP chose an email-like identifier of the form user@domain, user@host, user@IP address or phone-number@gateway. The identifier can refer to the name of the host that a user is logged in to at the time, an email address or the name of a domain-specific name translation service. Addresses of the form phone-number@gateway designate GSTN phone numbers reachable via the named gateway. SIP provides its own reliability mechanism and is therefore independent of the packet layer, requiring only an unreliable datagram service. SIP is typically used over UDP or TCP.
SIP uses these addresses as part of SIP URLs, such as sip:j.doe@example.com. This URL may well be placed in a web page, so that clicking on the link initiates a call to that address, similar to a mailto URL today.
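Splitting a SIP URL into its user and host parts is the first step before any name translation. The following is a deliberately simplified Python sketch; `parse_sip_url` is a hypothetical helper, and it does not separate out passwords, ports or URL parameters (those stay inside the returned strings).

```python
def parse_sip_url(url: str) -> tuple[str, str]:
    """Split a simple SIP URL such as 'sip:j.doe@example.com' into (user, host).

    Simplified: user:password, :port and ;parameters are not split out.
    """
    scheme, sep, rest = url.partition(":")
    if scheme.lower() != "sip" or not sep:
        raise ValueError(f"not a SIP URL: {url}")
    user, at, host = rest.rpartition("@")
    if not at:
        # No user part: the URL names a host or domain directly.
        return "", rest
    return user, host
```

The same split works for the phone-number@gateway form, where the user part is a telephone number and the host is the gateway.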
We anticipate that most users will be able to use their email address as their published SIP address. Email addresses already offer a basic location-independent form of addressing, in that the host part does not have to designate a particular Internet host, but can be a domain, which is then resolved into one or more possible domain mail server hosts via Domain Name System (DNS) MX (mail exchange) records.
For email, finding the mail exchange host is often sufficient to deliver mail, as the user either logs in to the mail exchange host or uses protocols such as the Internet Mail Access Protocol (IMAP) or the Post Office Protocol (POP) to retrieve their mail. For interactive audio and video communications, however, participants are typically sending and receiving data on the workstation, PC or Internet appliance in their immediate physical proximity. Thus, SIP has to be able to resolve name@domain to user@host. A user at a specific host will be derived through zero or more translations. A single externally visible address may well lead to a different host depending on time of day, media to be used, and any number of other factors. Also, hosts that connect via dial modems may acquire a different IP address each time.
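The "zero or more translations" from name@domain to user@host can be pictured as following a chain of mappings until none applies. In this minimal Python sketch the `translations` dictionary is a hypothetical stand-in for whatever location services a server consults (registrars, databases and so on), and `resolve` is an illustrative name; a loop check guards against misconfigured mappings.

```python
def resolve(address: str, translations: dict[str, str]) -> str:
    """Follow zero or more translations until an address has no further mapping."""
    seen = {address}
    while address in translations:
        address = translations[address]
        if address in seen:
            raise ValueError(f"translation loop at {address}")
        seen.add(address)
    return address


# Hypothetical mappings: public address -> department server -> current host.
TRANSLATIONS = {
    "j.doe@example.com": "jdoe@sales.example.com",
    "jdoe@sales.example.com": "jdoe@pc42.sales.example.com",
}
```

Because the table can change at any moment (time of day, a new dial-up IP address), the same public address may resolve to a different host on each call.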
SIP Messages
A SIP message is either a request from a client to a server, or a response from a server to a client.
SIP uses message structures similar to those of HTTP. The messages are in text format using ISO 10646 in UTF-8 encoding. As in HTTP, client requests invoke methods on the server. A message consists of a start-line specifying the method (or, in a response, the status) and the protocol version, a number of header fields specifying call properties and service information, and an optional message body which can contain a session description. The following methods are applicable in SIP:
INVITE - invites a user to join a call.
BYE - terminates the call between two of the users on a call.
OPTIONS - requests information on the capabilities of a server.
ACK - confirms that a client has received a final response to an INVITE.
REGISTER - registers a user's current location with a server, providing the mapping the server uses for address resolution.
CANCEL - ends a pending request, but does not end a call that has already been established.
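The message layout described above (start-line, header fields, blank line, optional body) can be assembled as plain text. This minimal Python sketch uses a hypothetical `build_request` helper and made-up header values; a real INVITE would normally carry further headers (for example tags and a Contact header) and an SDP body.

```python
# The six methods listed above; SIP extensions may define more.
METHODS = {"INVITE", "ACK", "OPTIONS", "BYE", "CANCEL", "REGISTER"}


def build_request(method: str, request_uri: str,
                  headers: list[tuple[str, str]], body: str = "") -> str:
    """Assemble a SIP request: start-line, header fields, blank line, body."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    lines = [f"{method} {request_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Content-Length counts the bytes of the UTF-8 encoded body.
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # Lines end in CRLF; an empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


# Illustrative values only.
invite = build_request(
    "INVITE",
    "sip:bob@example.org",
    [
        ("Via", "SIP/2.0/UDP pc33.example.com"),
        ("From", "<sip:alice@example.com>"),
        ("To", "<sip:bob@example.org>"),
        ("Call-ID", "a84b4c76e66710@pc33.example.com"),
        ("CSeq", "1 INVITE"),
    ],
)
```

Because the format is plain text, the result can be read directly in a packet trace, which is one reason SIP is considered approachable by developers from the web world.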
The syntax of response codes is similar to that of HTTP. The three-digit codes are hierarchically organized, with the first digit representing the result class and the other two digits providing additional information. The first digit controls the protocol operation, while the other two give useful but non-critical information. A textual description, and even a whole HTML document, can be attached to the response message.
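Because the first digit is what controls protocol operation, a client can act sensibly on codes it has never seen. This Python sketch parses a status line and, following the HTTP-style convention, treats an unrecognized code as the x00 code of its class. `parse_status_line`, `effective_code` and the contents of `KNOWN_CODES` are hypothetical choices for illustration.

```python
# Hypothetical subset of codes this client understands in detail.
KNOWN_CODES = {100, 180, 200, 302, 404, 486, 500, 600}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split e.g. 'SIP/2.0 486 Busy Here' into (version, code, reason)."""
    parts = line.split(" ", 2)
    if len(parts) < 2 or len(parts[1]) != 3 or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return parts[0], int(parts[1]), reason


def effective_code(code: int) -> int:
    """Treat an unrecognized code as the x00 code of its class (first digit)."""
    return code if code in KNOWN_CODES else (code // 100) * 100
```

So an unfamiliar 183 response is handled like 100 (still in progress) and an unfamiliar 488 like 400 (client failure), while the reason phrase remains available for display.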
SIP takes the same approach to extensibility as the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP): new headers can be added to SIP messages, and unknown headers and values are ignored by default. Using the Require header, a client can require the other endpoint to understand specific extensions. If the endpoint does not support the named extensions, it returns an error message listing the unsupported features, and the client can fall back to simpler operation.
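The server side of the Require mechanism amounts to a set comparison. In this minimal Python sketch, `check_require` is a hypothetical helper and the option tags in the usage are made up; the 420 Bad Extension status and the Unsupported header are the SIP response elements used for this case, but only the status line and that header are produced here.

```python
from typing import Optional


def check_require(require_value: str, supported: set[str]) -> Optional[str]:
    """Return None if every required option tag is supported, else a 420 response.

    Simplified: only the status line and the Unsupported header are produced.
    """
    required = [tag.strip() for tag in require_value.split(",") if tag.strip()]
    missing = [tag for tag in required if tag not in supported]
    if not missing:
        return None
    return ("SIP/2.0 420 Bad Extension\r\n"
            f"Unsupported: {', '.join(missing)}\r\n\r\n")
```

For instance, with `supported={"org.example.video"}`, a request carrying `Require: org.example.video, org.example.hold` gets a 420 naming only `org.example.hold`, so the client knows exactly which feature to drop before retrying.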
RTP
RTP consists of the actual Real-time Transport Protocol, which is used to carry data with real-time properties, and the RTP Control Protocol (RTCP), which is used to monitor QoS and to convey information about the participants in an ongoing conference.
An RTP implementation will often be integrated into the application rather than being implemented as a separate protocol layer (see Figure 4-3). In applications, RTP typically runs on top of UDP to make use of its port numbers and checksums. The RTP framework is relatively "loose", allowing modifications and tailoring depending on the application.
Additionally, a complete specification for a particular application requires a payload format specification and a profile specification. The payload format specification defines how a particular payload is to be carried in RTP. The profile specification defines how a set of payload type codes is mapped to payload formats.
Figure. Location of RTP in IP stack.
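Every RTP packet starts with a 12-byte fixed header that carries, among other fields, the payload type code interpreted by the profile. This minimal Python sketch packs that header with the standard `struct` module, following the field layout of the RTP specification (RFC 1889, later RFC 3550); `rtp_header` is a hypothetical helper, and padding, header extensions and contributing-source lists are left out.

```python
import struct

RTP_VERSION = 2


def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
               marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (no padding, extension or CSRC list)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type is a 7-bit field")
    byte0 = RTP_VERSION << 6                     # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type    # M bit, then 7-bit payload type
    return struct.pack(
        "!BBHII",                  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,              # 16-bit sequence number wraps around
        timestamp & 0xFFFFFFFF,    # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,         # 32-bit synchronization source identifier
    )
```

In the common audio/video profile, payload type 0 denotes G.711 mu-law (PCMU) audio; the header itself only carries the number, and the profile gives it meaning. The resulting bytes would precede the media data in a UDP datagram.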
RTP session setup consists of defining a pair of destination transport addresses: one IP address and a pair of UDP ports, one for RTP and another for RTCP. In the case of a multicast conference, the IP address is a class D multicast address. In a multimedia session, each medium is carried in a separate RTP session, with its own RTCP packets reporting only the quality of that session. Usually additional media are allocated additional port pairs, and only one multicast address is used for the conference.
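The allocation just described (one multicast address, one port pair per medium) can be sketched in Python. The `allocate_ports` helper, the group address 224.2.17.12 and the base port 5004 are example assumptions; the even-RTP / odd-RTCP pairing follows the convention recommended in the RTP specification.

```python
import ipaddress


def allocate_ports(media: list[str], group: str, base_port: int = 5004) -> dict:
    """Map each medium to (multicast address, RTP port, RTCP port).

    All media share one multicast address; each gets its own port pair,
    with RTP on an even port and RTCP on the next (odd) port.
    """
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if base_port % 2:
        base_port += 1  # RTP conventionally uses an even port number
    plan = {}
    for i, medium in enumerate(media):
        rtp_port = base_port + 2 * i
        plan[medium] = (group, rtp_port, rtp_port + 1)
    return plan
```

With `["audio", "video"]` this yields two separate RTP sessions on ports 5004/5005 and 5006/5007 of the same group, so each RTCP stream reports only on its own medium.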
SIP vs H.323
SIP's smaller footprint makes the protocol more scalable and faster than existing H.323 implementations. The catch? The protocol is still in its early stages, making products hard to come by.
Until recently, network managers looking to roll out intelligent networks have relied heavily on the H.323 suite of protocols. With H.323, a compliant client queries an H.323 gatekeeper for the address of a new user. The gatekeeper retrieves the address and forwards it to the client, which then establishes a session with the new client using H.225, one of the H.323 protocols. Once the session is established, another H.323 protocol, H.245, negotiates the available features of each client.
The key strength of H.323 is its maturity, which has allowed a number of software vendors to develop robust implementations. The standard's maturity has also allowed the various vendors to eliminate interoperability issues, permitting the deployment of a wide range of H.323-capable devices into the market. Since the H.323 standard includes an adaptation of the Q.931 protocol for call-control, many developers with experience in existing ISDN telephony are familiar with the call control model. In fact, the events and parameters can often be directly passed from H.323 into applications that previously operated with ISDN.
It may sound simple, but H.323 suffers from some key problems. At the top of the list is call setup time. Since H.323 first establishes a session and only then negotiates the features and capabilities of that session, call setup can take significantly longer than an average PSTN call.
Just how long depends on the particular network and the distance between locations, but the total time for someone to answer the call can reach up to 8 seconds.
What's more, H.323 doesn't scale well. A case in point is H.323 addressing. Creating separate phone-numbering schemes complicates interconnecting carrier networks. The H.323 standard itself is too large and complex to make deployment easy.
Finally, H.323 doesn't provide a simple way for connecting two circuit-switched networks across an IP network.
All of these problems are addressed by SIP. With SIP, each user is identified through a hierarchical URL built around elements such as the user's phone number or host name (for example, sip:user@company.com). Because SIP URLs resemble e-mail addresses, a user's SIP URL is easy to guess from his or her e-mail address.
It's easy to see where SIP fills in some of H.323's holes. First there is the issue of call setup time. By including a client's available features within the invite request, SIP negotiates the features and capabilities of the call within a single transaction. SIP can set up a call within about 100 ms, depending on the network.
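The single-transaction negotiation works because the callee can compute its answer directly from the capabilities listed in the INVITE. In practice these travel as an SDP session description; in this Python sketch, plain dictionaries stand in for parsed SDP (an assumption), `negotiate` is a hypothetical helper, and the codec names are only examples.

```python
def negotiate(offer: dict[str, list[str]],
              supported: dict[str, set[str]]) -> dict[str, list[str]]:
    """Answer an offer: keep, in the caller's order, the formats the callee supports.

    A medium whose list comes back empty is effectively declined.
    """
    return {
        medium: [fmt for fmt in formats if fmt in supported.get(medium, set())]
        for medium, formats in offer.items()
    }


# Illustrative offer from the INVITE and callee capabilities.
OFFER = {"audio": ["PCMU", "GSM"], "video": ["H261"]}
AUDIO_ONLY_CALLEE = {"audio": {"GSM", "PCMU"}}
```

An audio-only callee answers with both audio codecs and an empty video list, so the call comes up as voice-only without a separate capability exchange after the session is established.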
SIP also scales better than H.323. It is simple and easy to embed into inexpensive end-user devices. The expandable nature of the protocol allows future capabilities to be easily defined and quickly implemented.
The protocol was designed to ensure interoperability and enable different devices to communicate.
Another SIP strength is that non-telephony developers find the protocol easy to understand.
Besides being very new, SIP has a narrow scope and thus limited applications by itself; however, it gains flexibility when used with other protocols.
Another weakness is that SIP is only a small piece of a complete solution. Numerous other software components are required to build a complete IP telephony product.
Low-cost end devices are natural applications for SIP. Devices such as wireless phones, set-top cable boxes, Ethernet phones, and other devices with limited computing and memory resources are suited to this protocol.
Table: Protocol Interactivity
Capability | H.323 | SIP |
Complexity | High | Low |
Cost | High | Low |
Maturity | Good | Poor |
Scope of Definition | Full | Limited |
Interoperability | Good | Some |
Similar to ISDN | Yes | No |
References
http://www.cs.columbia.edu/~hgs/sipc/
General description of SIP, i.e. SIP software, registering and making calls
http://www.ptotocols.com/voip/sip_methods.html
SIP Methods & Response Codes
http://www.cs.columbia.edu/~hgs/sip/
Official homepage of SIP
http://www.cs.columbia.edu/~hgs/sip/faq.html
Frequently asked questions about SIP functionality, protocol operation, etc.
ftp://ftp.rfc-editor.org/in-notes/rfc2543.txt
The SIP RFC 2543. This document specifies the SIP protocol.
Appendix XXXX
SIP Response Messages, a list extract from the VOCAL administration guide
SIP Response Messages Category
The VOCAL system supports all SIP response messages:
For More Information
Refer to the SIP RFC 2543 for a list of the status codes and their reason phrases:
http://www.ietf.org/rfc/rfc2543.txt
1xx and 2xx Responses
1xx SIP response messages are informational responses:
3xx Responses
3xx SIP response messages are redirection responses:
4xx Responses
4xx SIP response messages are client error responses: