High Performance Video Streaming on Low End Systems

 

Literature Report

Abstract

This literature report presents a brief background and some of the aspects that need to be considered when transferring digital video over networks.

 

Video compression

The main purpose of compression is to reduce the size of video files in order to save bandwidth and storage space. Video takes up a huge amount of space. The television standard PAL, for example, would need a bit rate of 216 Mbps (25 frames per second, 864 x 625 luminance samples, 432 x 625 x 2 chrominance samples, 8 bits per sample: bit rate = 25 x 8 x ((864 x 625) + (432 x 625 x 2)) = 216 Mbps) [22]. For a video studio this can be handled, but it is too much for transmission over practical networks.
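The figure above can be verified with a few lines of arithmetic; this is simply the calculation from the text, not reference code from any standard:

```python
# Uncompressed PAL bit rate, reproducing the calculation in the text.
FRAME_RATE = 25                 # frames per second
LUMA_SAMPLES = 864 * 625        # luminance samples per frame
CHROMA_SAMPLES = 432 * 625 * 2  # two chrominance components per frame
BITS_PER_SAMPLE = 8

bit_rate = FRAME_RATE * BITS_PER_SAMPLE * (LUMA_SAMPLES + CHROMA_SAMPLES)
print(bit_rate / 1_000_000, "Mbps")  # 216.0 Mbps
```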

There are two types of compression: lossy and lossless. Lossless compression removes statistically redundant information. The process is reversible, and compression ratios of up to about 1:3 can be achieved. To compress the data further, lossy compression has to be used. Ratios of over 1:20 are possible, at the price of reduced quality. Lossy compression removes subjectively redundant information; this process destroys some of the data and is therefore not reversible. Lossless compression is used almost only for medical and scientific purposes, where no distortion of the image can be tolerated.

A drawback of removing most of the redundant information is that compressed video becomes very sensitive to losses and errors. One way to improve the quality when losses and errors are frequent is to use forward error correction (FEC). FEC may be a good choice for reducing the cell loss rate, but the redundant information that FEC adds means that more packets need to be transmitted. For satellite traffic this may be acceptable due to the long round-trip time, but in other networks the extra packets might in fact increase the congestion and cause even more cell losses.

 

Video coding techniques

There are several different coding techniques for digital video. Some of the most widely used are MPEG1, MPEG2, H.261 and H.263.

The H.261 standard is a real-time codec system with low delay. It was developed for videoconferencing and video telephony over ISDN. The coding algorithm allows transmission rates in multiples (1-30) of 64 kbps. The image size is based on the CIF format (Common Intermediate Format), which has a resolution of 352 x 288 pixels. When transmitting over 64 or 128 kbps ISDN, low bit rates are desirable. To achieve this, the frame rate is restricted to about 10 frames per second, and quarter CIF (QCIF), with a resolution of 176 x 144, is usually used. The basic idea of the algorithm is to divide each picture into macroblocks of 16 x 16 pixels and to exploit similarities between consecutive images in order to reduce the information. Motion prediction is used to find out how parts of the image move.
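The macroblock matching behind motion prediction can be sketched in a few lines. This is a toy exhaustive search over a small window using the sum of absolute differences as the match criterion; the block size and search range here are illustrative, not those of an actual H.261 encoder:

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(frame, y, x, n):
    """Extract the n x n block whose top-left corner is at (y, x)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def best_motion_vector(prev, cur, y, x, n=2, search=1):
    """Find the displacement (dy, dx) in the previous frame that best
    matches the n x n block of the current frame at (y, x), searching
    within +/- search pixels."""
    target = block(cur, y, x, n)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if 0 <= py and 0 <= px and py + n <= len(prev) and px + n <= len(prev[0]):
                cost = sad(block(prev, py, px, n), target)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1]
```

For example, if a bright feature has moved one pixel to the right between two frames, the search finds the vector pointing one pixel back to the left in the previous frame.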

The H.263 standard is an enhancement of H.261. It was developed to support the V.34 modem standard, and many improvements have been made to support the low bit rate of V.34 (28.8 kbps): better motion compensation, where part of a predicted block is allowed to be outside the picture, an advanced prediction mode, and a PB frames mode, in which a P and a B frame are coded as one unit. H.263 also supports a new format for low-bandwidth applications, sub-QCIF (128 x 96), which has half as many pixels as QCIF. For higher resolutions there are new optional formats, 4CIF and 16CIF, with resolutions of 704 x 576 and 1408 x 1152.

The MPEG1 standard is similar to H.261, but MPEG1 has better coding for motion prediction. The recommended image size is 360 x 240 pixels, and at a bit rate of 1.5 Mbps the quality is comparable to VHS. An MPEG video sequence is divided into groups of pictures (GOP). There are three different types of coded pictures/frames in a GOP: I frames, which are coded independently of other frames; P frames, which are predicted from a previous I or P frame; and B frames, which are predicted bidirectionally from both preceding and following frames.

With MPEG, at least two or three frames have to be buffered at the destination in order to decode the inter-frame dependencies.
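Because B frames depend on anchor frames on both sides, an MPEG stream is transmitted in decode order rather than display order, and the receiver reorders it. A small sketch of that reordering (the frame labels are hypothetical, and real decoders track explicit frame numbers rather than labels):

```python
def display_order(decode_order):
    """Reorder a GOP received in decode order into display order.
    Frames are labelled 'I', 'P' or 'B' plus a display number. Each
    I or P anchor frame must be buffered until the B frames that
    precede it in display order (but follow it in decode order)
    have been shown."""
    shown, pending = [], None
    for frame in decode_order:
        if frame.startswith("B"):
            shown.append(frame)       # B frames are displayed immediately
        else:
            if pending is not None:
                shown.append(pending) # release the buffered anchor frame
            pending = frame           # hold this I/P frame for later
    if pending is not None:
        shown.append(pending)
    return shown
```

This is exactly why a couple of frames must sit in the destination buffer: the anchor frame arrives before the B frames that are shown ahead of it.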

The MPEG2 standard was developed to meet the demands of high-quality video coding. It is much like MPEG1, but it has a greater collection of coding techniques to rely on. For broadcast-quality applications, streams of 4-6 Mbps are usually used, and MPEG2 has been adapted to support the future demands of high-definition TV (HDTV), with bit rates of up to 40 Mbps.

 

Layered coding

Layered coding is used to divide a video into different layers. First there is a base layer, which offers only rudimentary quality; then there are several add-on layers that increase the quality and, in some cases, the resolution. The advantage of layered coding is that the base layer can be sent with priority and the add-on layers with best effort. This means that the client is guaranteed an image of basic quality, and gets a better image if the resources allow it. When there are clients with different amounts of available bandwidth, each can receive as many layers as it can handle. When the net is congested, the sender can choose to send only a few layers.
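The receiver-side choice of layers can be sketched as follows. The per-layer rates are hypothetical; the only rule taken from the text is that the base layer is always delivered, while add-on layers are taken only if the bandwidth allows:

```python
def layers_to_send(layer_rates, available_bw):
    """Decide how many layers fit in the available bandwidth.
    layer_rates: bit rate of each layer, base layer first (same units
    as available_bw). The base layer is always sent, with priority;
    each add-on layer is added only if it still fits."""
    sent, used = 0, 0
    for rate in layer_rates:
        if sent > 0 and used + rate > available_bw:
            break                     # this add-on layer no longer fits
        used += rate
        sent += 1
    return sent
```

A client with 160 kbps of capacity and layers of 100, 50 and 50 kbps would thus receive the base layer plus one add-on layer.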

The drawback of layered coding is that it increases the complexity of encoding, decoding and transmission, and it might reduce the degree of compression.

 

Error concealment

Using error concealment techniques in the decoder can reduce the damage caused by a lost packet. The more advanced the technique, the better the result. A common way to repair a lost part of an image is to use an equivalent spatially or temporally adjacent area. If a whole frame is lost, there are several ways to handle it (see fig. 1). The easiest way (Alt. 1) is to skip the lost frame and pretend that it was never there. A better way (Alt. 2) might be to show the previous frame once more. The most advanced way (Alt. 3) is to always keep the latest frames in a buffer and, if a loss occurs, show a couple of frames before and after the loss a bit longer.
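The first two alternatives can be sketched directly on a list of frame identifiers (Alt. 3 additionally needs a playout buffer and display-timing control, so it is omitted here):

```python
def conceal(frames, lost_index, strategy):
    """Rebuild a playback sequence around one lost frame.
    frames: list of frame identifiers; lost_index: position of the loss.
    'skip'   -> Alt. 1: drop the lost frame entirely
    'repeat' -> Alt. 2: show the previous frame once more"""
    before = frames[:lost_index]
    after = frames[lost_index + 1:]
    if strategy == "skip":
        return before + after
    if strategy == "repeat":
        return before + [frames[lost_index - 1]] + after
    raise ValueError("unknown strategy: " + strategy)
```

With 'skip' the sequence simply becomes one frame shorter; with 'repeat' the previous frame fills the gap, so the playback duration is preserved.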

Fig. 1. Handling of frame loss.

 

Jitter

Jitter can be described as the variation in transmission time between consecutive packets. Jitter occurs because of delays that appear for many different reasons. Depending on how a video stream is multiplexed with other traffic, timing glitches will appear and jitter will increase.

Jitter can be reduced by smoothing the output from the server. This can be done either by using a previously smoothed video file (CBR) or by using buffers to send a VBR file as CBR. Jitter resulting from variable queuing delay can be accommodated by buffering at least as many packets as are required to cover the duration of the expected jitter delay.
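The definition above can be illustrated with timestamps. This is the plain difference-of-consecutive-delays reading of jitter from the text, not the smoothed estimator a particular protocol might specify:

```python
def interarrival_jitter(send_times, recv_times):
    """Per-packet delay variation: the absolute difference between
    consecutive packets' one-way delays (all times in ms)."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return [abs(b - a) for a, b in zip(delays, delays[1:])]
```

Three packets sent at 0, 40 and 80 ms that arrive at 10, 55 and 90 ms have delays of 10, 15 and 10 ms, giving 5 ms of jitter at each step even though the average delay is stable.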

 

Problems with VBR

One problem that appears when coding video is that the bit rate required to represent different frames varies heavily depending on the content. This behavior is called VBR. When transmitting data over networks, a constant bit rate is preferable, because sudden peaks in bit rate can congest the net, overflow buffers and cause packet loss.

 

 

Solutions

There are several techniques for handling VBR behavior; some of them are presented below under smoothing techniques and rate control.

 

Smoothing techniques

There are several ways to smooth a video stream, some more advanced than others:

[15] and [17] have developed and evaluated work-ahead smoothing techniques that reduce the rate variability when transmitting stored video from a server to a client across a network. The problem they focus on is how to transmit a video as smoothly as possible to a client with a fixed buffer, without starving or overflowing the receiver's buffer. Their solution is to schedule the transmission so that the variance and the peak rate are minimized.

The scheduler has an algorithm that takes the frame sizes of the movie and the receiver's buffer size as input. The algorithm divides the movie into segments by looking at the bandwidth demand of different frames. Each segment is then transmitted at its own calculated constant bit rate (CBR). With a variable bit rate (VBR) video as source, the peak rate and standard deviation can be reduced by 70-80% using this technique (with a buffer of only 1 MB and a video stream of 1.25 Mbps).
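A heavily simplified, single-segment version of the idea can be sketched as follows: the smallest constant rate that never starves the client is the largest average demand over any prefix of the movie. The full algorithms in [15] and [17] additionally respect the client's buffer-size upper bound and split the movie into several CBR segments; that part is omitted here.

```python
def min_cbr_rate(frame_sizes):
    """Smallest constant rate (bits per frame interval) at which the
    client is never starved: the largest average cumulative demand
    over any prefix of the frame-size trace."""
    cumulative, rate = 0, 0.0
    for i, size in enumerate(frame_sizes, start=1):
        cumulative += size
        rate = max(rate, cumulative / i)
    return rate
```

For a (made-up) frame-size trace like [5, 1, 1, 9], the smoothed rate is 5 units per interval, well below the peak of 9; this is the same kind of peak-rate reduction the 70-80% figures refer to.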

Fig. 2. Bandwidth allocation

When one segment is being transmitted, the bandwidth is renegotiated as early as possible (depending on the buffer capacity) in order to meet the demands of the next segment (see fig. 2). By prefetching part of the next segment concurrently with the current one, the bandwidth demands of the next segment can be reduced. A network manager keeps track of the network resources so that they are not overbooked. [15] tries to increase the bandwidth in smaller steps than [17]; the advantage of this principle is that the network manager is more likely to grant a small additional amount of bandwidth than a huge one. The drawback with small increases is that requests have to be made more often. [17], therefore, wants to present all its bandwidth negotiations in advance and have them approved before transmitting. The drawback with more frequent allocation requests is the overhead that they cause.

If an increase in bandwidth is denied, the sender has to take further action: dynamically decrease the quantization level to reduce the bit rate, choose another path, or allocate more buffer space in the client.

To be able to cope with transmission jitter, the algorithm must not totally empty or fill the buffer. High and low watermarks are calculated to handle early packets and the worst expected delay. If the jitter characteristics are known, it is possible to set the watermarks optimally and achieve better performance.

Network performance depends basically on the peak rate and burst characteristics; therefore, the smooth streams generated by scheduling and prefetching ought to consume fewer resources.

[15] has compared the use of smooth streams on two different kinds of network services: deterministic guaranteed service and renegotiated constant bit rate (RCBR).

Deterministic guaranteed service increases performance by using temporal multiplexing. Statistical multiplexing is not allowed, because of the hard quality guarantees. The advantage of deterministic guaranteed service is that the performance is guaranteed, but at the cost of introduced delay and potentially low utilization.

RCBR is CBR with added functionality for bandwidth renegotiation. RCBR does not use temporal multiplexing; performance is instead increased by statistical multiplexing of network resources via a bandwidth renegotiation mechanism. The advantages of RCBR are low delay and potentially high utilization; the drawbacks are renegotiation overhead and no hard guarantees. When RCBR is used together with the scheduling algorithm, capacity is reserved to handle the peak in each interval. For every new interval, the bandwidth is renegotiated.

 

 

Rate Control

[11] has worked on rate control mechanisms. Rate control mechanisms (see fig. 3) are useful for two different things: sending a VBR video stream over a CBR channel, and adapting to congestion on the net. The main idea is to decrease the quality of the more complex scenes and increase it for the less complex ones, in order to achieve a constant bit rate. To adapt to congestion, the quality is likewise decreased.

There are different ways to adjust the output rate of an encoder: decreasing the sampling rate, increasing the quantization step size, decreasing the number of bits used for each pixel, or increasing the motion detection threshold. Common to all these reduction techniques is that the quality is reduced.
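One step of such a feedback loop can be sketched via the quantization step size. The step-by-one policy is an illustrative assumption, not the controller from [11]; the bounds 1-31 are the usual H.261/MPEG quantizer range:

```python
def adjust_quantizer(q, produced_bits, target_bits, q_min=1, q_max=31):
    """One step of a simple feedback rate controller: use coarser
    quantization when the encoder overshoots its per-frame bit
    budget, finer quantization when it undershoots."""
    if produced_bits > target_bits:
        q = min(q_max, q + 1)   # coarser step -> fewer bits, lower quality
    elif produced_bits < target_bits:
        q = max(q_min, q - 1)   # finer step -> more bits, better quality
    return q
```

Run once per frame, this drives the output toward the target rate at the cost of quality fluctuations on complex scenes, which is exactly the trade-off described above.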

Fig. 3. Rate Controller

 

New protocols for real-time traffic

Lately, several new protocols have been developed to support the transfer of real-time video:

Real-time Stream Transfer Protocol (RSTP) offers client-server network transfer functions built for transmitting data such as audio or video. The main idea of RSTP is to adapt the transfer rate to the conditions on the net. The video stream is first divided into cyclic timeslots. Depending on the situation in the network, in the server and in the client, the data rate can then be adjusted.

Real-time Transport Protocol (RTP) consists of two different protocols: RTP itself, which transports the payload data over UDP, and the Real-time Transport Control Protocol (RTCP), which is used to report traffic conditions from the clients to the server. Functions supported by RTP include loss detection for quality estimation and rate adaptation, sequencing of data, intra- and inter-media synchronization, source identification, and basic membership information.

The protocols use two different channels, and the transmission is best-effort. Best-effort delivery means that packets can be lost without retransmission (because UDP is used). All members on the RTCP channel send regular messages about how much data they have sent (if any) and how well the data on the RTP channel is being received. When some of these regular packets get lost, it is an indication of general packet loss. When this happens, RTCP requests that the sender adapt its transmission rate to avoid further packet loss.
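The sender's reaction to such receiver reports can be sketched as a simple feedback rule. The thresholds and the AIMD-style (additive-increase, multiplicative-decrease) policy below are illustrative assumptions, not taken from the RTP specification or from any particular implementation:

```python
def adapt_rate(rate, loss_fraction, min_rate=64, max_rate=2048):
    """Adjust the sending rate (kbps) from a reported loss fraction:
    back off multiplicatively when the reports signal congestion,
    probe upward additively when the path looks loss-free."""
    if loss_fraction > 0.05:      # significant loss reported: congestion
        rate = max(min_rate, rate // 2)
    elif loss_fraction < 0.01:    # essentially loss-free: probe for more
        rate = min(max_rate, rate + 64)
    return rate
```

A sender at 1000 kbps that sees 10% loss would drop to 500 kbps, then creep back up in 64 kbps steps once the reports show the loss has cleared.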

Resource Reservation Protocol (RSVP) makes it possible for applications to allocate resources along the path from the source to the destinations. RSVP has to be implemented in each router along the way and in the end systems. The end system reserves resources, such as bandwidth and buffer space, in each router that has to be passed from source to destination. In this way, a certain QoS can be guaranteed. Both unicast and multicast delivery of data are supported by RSVP.

 

Traffic shaping

When sending a stored video, it is desirable to do it in a smooth way, and not all at once, because:

  1. It could congest the net.
  2. The whole file would need to be buffered before it could be viewed.
  3. In many cases, the capacity is so low that it would take a lot of time to first download the file and then view it.

The solution to all these problems is to send the file at the same rate that is needed to view it. A way to accomplish this is to use traffic shaping.

Shaping the traffic can save a lot of bandwidth, especially when the sources are bursty. However, this comes at the expense of increased delay, and the shaping is therefore limited by how much delay is tolerable.

One common way to limit the burst size and transmission rate is to use the leaky bucket algorithm. The leaky bucket algorithm consists of two parts (see fig. 4): a token pool of size N, and tokens that are generated at a fixed rate of R cells per ms and stored in the pool.

Fig. 4. Leaky bucket

The token pool cannot contain more than N tokens. When a packet arrives, it consumes one token; if there are no tokens left in the pool, the packet is queued. If the queue is full, the packet is lost. With a pool size of N = 1, the packet transmission rate becomes equal to R.

In order to save buffer space (i.e., where the queue is stored), there are two options: increase the pool size N or increase the token generation rate R. But since the pool size must not be too large if bursts are to be smoothed out, it is then better to increase R.
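The algorithm described above can be sketched in a few lines. The one-token-per-packet granularity follows the text; the queue limit and the millisecond time step are illustrative choices:

```python
class LeakyBucket:
    """Token-based shaper: tokens accrue at rate R (tokens per ms) into
    a pool of at most N tokens; a packet needs one token to be sent,
    otherwise it waits in a bounded queue and is lost on overflow."""

    def __init__(self, rate_r, pool_n, queue_limit):
        self.rate, self.pool_n, self.queue_limit = rate_r, pool_n, queue_limit
        self.tokens, self.queue, self.dropped = pool_n, [], 0

    def arrive(self, packet):
        """Offer a packet; True if sent at once, False if queued or lost."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        if len(self.queue) < self.queue_limit:
            self.queue.append(packet)
        else:
            self.dropped += 1          # queue full: the packet is lost
        return False

    def tick(self, ms=1):
        """Advance time: generate tokens (capped at N), then drain the
        queue as far as the tokens allow; returns the packets sent."""
        self.tokens = min(self.pool_n, self.tokens + self.rate * ms)
        sent = []
        while self.queue and self.tokens >= 1:
            self.tokens -= 1
            sent.append(self.queue.pop(0))
        return sent
```

With N = 1 and R = 1 token per ms, a burst of four packets sends one immediately, queues two, drops one, and the queued packets drain at exactly one per millisecond, matching the N = 1 behavior described above.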

 

 

Comments

The articles that I studied did not offer much of an opening to the problem that I am faced with, and I had expected more from them. Most of them were either introductory or described a very specific case. Although some of the solutions look very good, I do not have the time to implement them on my own in the short period that the M.Sc. thesis consists of.
Instead, I will focus on building a simpler video streamer using CBR-coded videos.
The streamer will rely on the leaky bucket algorithm to send video files in a smooth way.

 

 

Abbreviations

CBR Constant Bit Rate

CODEC Encoder/Decoder

CRC Cyclic Redundancy Check

FEC Forward Error Correction

H.261 Video coding standard

H.263 Video coding standard

IP Internet Protocol

MPEG Moving Pictures Expert Group

MPEG2 Generic video coding standard

QoS Quality of Service

PAL Phase Alternation Line television format

RCBR Renegotiated Constant Bit-rate

RSTP Real-time Stream Transfer Protocol

RSVP Resource Reservation Protocol

RTP Real-time Transport Protocol

RTCP Real-time Transport Control Protocol

TCP Transmission Control Protocol

TDM Time Division Multiplexing

VBR Variable Bit Rate

 

References

  1. T. C. Kwok: Residential Broadband Internet Services and Applications Requirements. IEEE Communications Magazine, June 1997.
  2. A. Banerjea: On the Use of Dispersity Routing for Fault Tolerant Real-time Channels. European Transactions on Telecommunications, Vol. 8, No. 4, July-August 1997.
  3. C. Perkins, J. Crowcroft: Real-time Audio and Video Transmissions of IEEE GLOBECOM '96 over the Internet. IEEE Communications Magazine, April 1997.
  4. T.-H. Wu, I. Korpeoglu, B.-C. Cheng: Distributed Interactive Video System Design and Analysis. IEEE Communications Magazine, March 1997.
  5. G. Karlsson: Asynchronous Transfer of Video. IEEE Communications Magazine, August 1996.
  6. N. E. Andersen, P. M. N. Nordeste, A. M. O. Duarte, H. E. Lassen, A. Ekblad, A. R. Pach, K. Amborski, L. Dittmann: Broadbandloop: A Full-Service Access for Residential and Small Business Users. IEEE Communications Magazine, December 1997.
  7. D. J. Wright: Assessment of Alternative Transport Options for Video Distribution and Retrieval over ATM in Residential Broadband. IEEE Communications Magazine, December 1997.
  8. S. Kalyanaraman, R. Jain, S. Fahmy, R. Goyal: Performance and Buffering Requirements of Internet Protocols over ATM, ABR and UBR Services. IEEE Communications Magazine, June 1998.
  9. T.-H. Lee, K.-C. Lai: Characterization of Delay-Sensitive Traffic. IEEE/ACM Transactions on Networking, Vol. 6, No. 4, August 1998.
  10. G. Chiruvolu, R. Sankar, N. Ranganathan: Adaptive VBR Video Traffic Management for Higher Utilization of ATM Networks. ACM Sigcomm Computer Communication Review, Vol. 28, No. 3, July 1998.
  11. J.-C. Bolot, T. Turletti: Experience with Control Mechanisms for Packet Video in the Internet. ACM Sigcomm Computer Communication Review, Vol. 28, No. 1, January 1998.
  12. S. Gringeri, B. Khasnabish, A. Lewis, K. Shuaib, R. Egorov, B. Basch: Transmission of MPEG-2 Video Streams over ATM. IEEE Multimedia, January-March 1998.
  13. C. A. Fulton, S.-q. Li: Delay Jitter First-Order and Second-Order Statistical Functions of General Traffic on High-Speed Multimedia Networks. IEEE/ACM Transactions on Networking, Vol. 6, No. 2, April 1998.
  14. R. Gopalakrishnan, G. M. Parulkar: Efficient User-Space Protocol Implementations with QoS Guarantees Using Real-time Upcalls. IEEE/ACM Transactions on Networking, Vol. 6, No. 4, August 1998.
  15. J. D. Salehi, Z.-L. Zhang, J. Kurose, D. Towsley: Supporting Stored Video: Reducing Rate Variability and End-to-End Resource Requirements Through Optimal Smoothing. IEEE/ACM Transactions on Networking, Vol. 6, No. 4, August 1998.
  16. E. C. Lin: The Effects of Scheduling Contention on Workstation Traffic. Licentiate Thesis, Department of Teleinformatics, Royal Institute of Technology, 1996.
  17. W.-C. Feng, S. Sechrest: Critical Bandwidth Allocation for the Delivery of Compressed Video. Computer Communications, Vol. 18, No. 10, October 1995.
  18. H. Jinzenji, K. Hagishima: Real-time Audio and Video Transmissions of IEEE GLOBECOM '96 over the Internet. IEEE Communications Magazine, April 1997.
  19. R. O. Onvural: Asynchronous Transfer Mode Networks - Performance Issues, 2nd ed. Artech House Publishers, 1995.
  20. D. E. Comer, D. L. Stevens: Internetworking with TCP/IP, Volume II. Prentice-Hall International, 1991.
  21. D. E. Comer: Internetworking with TCP/IP, Volume I, 3rd ed. Prentice-Hall International, 1995.
  22. M. J. Riley, I. E. G. Richardson: Digital Video Communications. Artech House Publishers, 1997.

Drafts:

  1. ITU-T Recommendation H.263, 1995