Markus Flierl - Dissertation

Video Coding with Superimposed Motion-Compensated Signals

Markus Helmut Flierl

Abstract

This work discusses video coding with superimposed motion-compensated signals. We build on the theory of multihypothesis motion-compensated prediction for video coding and introduce the concept of motion compensation with complementary hypotheses. Multihypothesis motion compensation linearly combines more than one motion-compensated signal to form the superimposed motion-compensated signal. The motion-compensated signals used for the superposition are referred to as hypotheses. Further, a displacement error that captures the inaccuracy of motion compensation is associated with each hypothesis. This work proposes that the multiple displacement errors are jointly distributed and, in particular, correlated. We investigate the efficiency of superimposed motion compensation as a function of the displacement error correlation coefficient and observe that decreasing this coefficient improves the efficiency. We conclude that motion compensation with complementary hypotheses results in maximally negatively correlated displacement errors. Motion compensation with complementary hypotheses implies two major results for the efficiency of superimposed motion-compensated prediction. First, the slope of the rate difference reaches up to 2 bits per sample per motion inaccuracy step, whereas for single-hypothesis motion-compensated prediction this slope is limited to 1 bit per sample per motion inaccuracy step. Here, we measure the rate difference with respect to optimum intra-frame encoding and use a high-rate approximation. Second, this slope of 2 bits per sample per motion inaccuracy step is already achieved for N=2 complementary hypotheses.
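
As a rough intuition for why negatively correlated errors help, consider a much simpler setting than the one analyzed in this work: N additive errors of equal variance and equal pairwise correlation rho are averaged. The variance of the average is sigma^2 (1 + (N-1) rho) / N, and a valid covariance matrix requires rho >= -1/(N-1). The sketch below evaluates this toy formula. It is an assumption-laden analogy, not the rate-difference model of this work, in which the displacement errors act on the motion vectors rather than on the signal directly. The function names are hypothetical.

```python
def min_correlation(n):
    """Smallest pairwise correlation that n equicorrelated variables can share."""
    if n < 2:
        raise ValueError("at least two variables are needed")
    return -1.0 / (n - 1)


def averaged_error_variance(n, rho, sigma2=1.0):
    """Variance of the mean of n equal-variance errors with pairwise correlation rho.

    Toy model only: additive errors, equal weights 1/n.
    """
    if n < 2:
        raise ValueError("at least two errors are needed")
    if not (min_correlation(n) - 1e-12 <= rho <= 1.0):
        raise ValueError("rho is outside the range of a valid covariance matrix")
    return sigma2 * (1.0 + (n - 1) * rho) / n


if __name__ == "__main__":
    for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
        print(f"N=2, rho={rho:+.1f}: variance={averaged_error_variance(2, rho):.3f}")
```

In this toy setting the variance shrinks monotonically as rho decreases and vanishes at the lower bound, which for N=2 is rho = -1.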

Further, we investigate motion compensation with complementary hypotheses by integrating superimposed motion-compensated prediction into ITU-T Rec. H.263. We linearly combine up to 4 motion-compensated blocks chosen from up to 20 previous reference frames to improve the performance of inter-predicted pictures. To determine the best N-hypothesis for each predicted block, we utilize an iterative algorithm that successively improves conditionally optimal hypotheses. In addition, we discuss motion compensation with complementary hypotheses for B-pictures in the emerging ITU-T Rec. H.264. We focus on reference picture selection and linearly combined motion-compensated prediction signals. We show that bidirectional prediction only partially exploits the efficiency of combined prediction signals. Superimposed prediction chooses hypotheses from an arbitrary set of reference pictures and thereby outperforms bidirectional prediction. That is, superimposed motion-compensated prediction with multiple reference frames allows a more general form of B-pictures.
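
The idea of successively improving conditionally optimal hypotheses can be sketched as a coordinate-wise search: each hypothesis is re-selected in turn while the others are held fixed, and the passes repeat until nothing changes. The sketch below is a minimal illustration under strong assumptions that are not taken from this work: blocks are flat lists of samples, the candidates are given as an explicit list, the hypotheses are averaged with equal weights, and the cost is the plain sum of squared differences without any rate term. All function names are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def superimpose(blocks):
    """Average the given blocks sample by sample (equal weights 1/N)."""
    n = len(blocks)
    return [sum(samples) / n for samples in zip(*blocks)]


def select_hypotheses(target, candidates, n_hyp, max_iter=20):
    """Pick n_hyp candidate indices by iterated conditional search (assumed cost: SSD)."""

    def cost(indices):
        return ssd(target, superimpose([candidates[i] for i in indices]))

    # Start from the best single hypothesis, repeated n_hyp times.
    best_single = min(range(len(candidates)),
                      key=lambda i: ssd(target, candidates[i]))
    chosen = [best_single] * n_hyp
    best_cost = cost(chosen)

    for _ in range(max_iter):
        changed = False
        for pos in range(n_hyp):
            # Re-select one hypothesis while all others stay fixed.
            for c in range(len(candidates)):
                trial = chosen[:pos] + [c] + chosen[pos + 1:]
                trial_cost = cost(trial)
                if trial_cost < best_cost:
                    chosen, best_cost, changed = trial, trial_cost, True
        if not changed:
            break  # no single replacement lowers the cost any further
    return chosen, best_cost


if __name__ == "__main__":
    target = [0, 0, 0, 0]
    candidates = [[1, 1, 1, 1], [-1, -1, -1, -1], [2, 0, 2, 0]]
    print(select_hypotheses(target, candidates, n_hyp=2))
```

Because a replacement is accepted only when it lowers the cost, the cost never increases and the loop terminates, although it may stop at a local optimum. In the small example, the two selected candidates have errors of opposite sign that cancel in the average.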

Finally, we discuss superimposed motion-compensated signals for motion-compensated 3D subband coding of video. We investigate motion-compensated lifted wavelet transforms for the temporal subband decomposition both experimentally and theoretically. The experiments show that the 5/3 wavelet kernel outperforms the Haar kernel and, in many cases, also the reference scheme utilizing motion-compensated predictive coding. Based on the motion-compensated lifting scheme, we develop an analytical model describing motion compensation for groups of K pictures. The theoretical discussion is based on a signal model for K motion-compensated pictures that are decorrelated by a linear transform. The dyadic decomposition of K pictures with motion-compensated lifted wavelets is replaced by an equivalent coding scheme with K motion-compensated pictures and a dyadic wavelet decomposition without motion compensation. We generalize the model and employ the Karhunen-Loeve Transform to obtain theoretical performance bounds at high bit-rates for motion-compensated 3D transform coding. For a very large group of pictures and negligible residual noise, the slope of the rate difference is limited to 1 bit per sample per inaccuracy step. The slope of the rate difference for motion-compensated prediction is also limited to 1 bit per sample per inaccuracy step, but motion-compensated 3D transform coding outperforms motion-compensated prediction by at most 0.5 bits per sample.
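
For readers unfamiliar with lifting, the Haar and 5/3 kernels mentioned above can be sketched as predict and update steps along the temporal axis. The sketch below is a simplified illustration, not the scheme of this work: it omits motion compensation entirely (each list element stands in for one picture), assumes an even number of samples, skips subband normalization, and uses a mirrored boundary, which is one common choice among several. The function names are hypothetical.

```python
def split(x):
    """Split a sequence into even- and odd-indexed samples."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even number of samples")
    return x[0::2], x[1::2]


def merge(even, odd):
    """Interleave even- and odd-indexed samples again."""
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


def haar_forward(x):
    even, odd = split(x)
    high = [o - e for e, o in zip(even, odd)]        # predict odd from one even neighbor
    low = [e + 0.5 * h for e, h in zip(even, high)]  # update even with the residual
    return low, high


def haar_inverse(low, high):
    even = [l - 0.5 * h for l, h in zip(low, high)]  # undo update
    odd = [h + e for e, h in zip(even, high)]        # undo predict
    return merge(even, odd)


def lift53_forward(x):
    even, odd = split(x)
    n = len(odd)
    # Predict each odd sample from both even neighbors (mirrored at the end).
    high = [odd[k] - 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    # Update each even sample from both neighboring residuals (mirrored at the start).
    low = [even[k] + 0.25 * (high[max(k - 1, 0)] + high[k]) for k in range(n)]
    return low, high


def lift53_inverse(low, high):
    n = len(high)
    even = [low[k] - 0.25 * (high[max(k - 1, 0)] + high[k]) for k in range(n)]
    odd = [high[k] + 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    return merge(even, odd)


if __name__ == "__main__":
    ramp = [float(v) for v in range(8)]
    print("Haar high band:", haar_forward(ramp)[1])
    print("5/3 high band: ", lift53_forward(ramp)[1])
```

Both transforms are inverted by running the same steps in reverse order with opposite signs, so reconstruction is exact regardless of the boundary rule. On a linear ramp, the bidirectional predict step of the 5/3 kernel leaves a zero high band away from the boundary, whereas the Haar predict step leaves a constant residual. This toy behavior only hints at the benefit of the longer kernel and does not reproduce the experimental comparison reported here.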

Keywords: Video coding, compression, motion compensation, multihypothesis motion-compensated prediction, multiframe prediction, B-pictures, motion-compensated 3-dimensional subband coding, wavelets, adaptive wavelets, lifting scheme, Gaussian random fields, Wiener filter, Karhunen-Loeve transform.