BIT RATE OPTIMIZATION FOR MULTI-POINT VIDEO CONFERENCING USING EMBEDDED ZEROTREE CODING

This work implements bit rate optimization of composited video frames for multi-point video conferencing. Rate control problems can generally be characterized as determining the appropriate coding parameters, through pre-coding and decoding, so that the decoded video quality is optimized for a given bit rate. Using embedded zerotree coding, we obtain the exact target bit rate. To obtain the best video quality, the target bit rate is allocated optimally to each video frame using a convex rate-distortion model and Lagrangian optimization.


INTRODUCTION
Multi-point video conferencing basically consists of decoding and resizing the incoming videos from the different points, compositing the resized videos, and finally re-encoding the composited video, which has as many subframes as there are incoming videos. Rate control for composited video generally differs from rate control for a single video stream, because the joint effect of all incoming video streams must be considered in the composited video. For instance, if one of the video streams contains more activity than the others, the number of bits assigned to this stream should be larger than those given to the others.
Depending on the channel conditions, there are several bit rate control schemes for video coding. Most of them adjust the quantization step based on buffer occupancy. Some methods encode each image block several times with different quantization parameters (QP) and then select the best one [1], [2]; because of their high computational complexity, however, these methods are not suitable for real-time applications [3]. Another method, given in [4], selects the quantizers according to a formula derived from a model of the encoder. However, this approach does not achieve the exact target bit rate, and it can suffer from frequent frame skipping and wasted channel bandwidth in real-time applications [3]. The embedded property of zerotree coding greatly simplifies rate control, since the coding control parameter is the bit rate allocated to each frame rather than the quantization parameter [10]. Additionally, embedded zerotree coding gives a better rate-distortion tradeoff, and the encoded bit stream can be stopped at any point without significant distortion [5], [8], [9], [10]. We therefore use the flexibility of embedded zerotree coding for bit rate control.
Most video coding standards transform the pixels of each video frame into Discrete Cosine Transform (DCT) coefficients and quantize them with quantization tables, which we call regular quantization here. A different quantization scheme, zerotree coding, exploits the dependencies among DCT coefficients well. Zerotree coding was first introduced for wavelets [5] and later applied to DCT coefficients [6], [7]. The main idea of DCT zerotree coding is to rearrange the DCT coefficients into a hierarchical subband structure similar to that of a wavelet transform. The top subband of the hierarchy contains all DC (lowest-frequency) coefficients, in which the content of the video frame is still visible. All other subbands contain AC (higher-frequency) coefficients, which carry the horizontal, vertical and diagonal details of the frame. Since the DC coefficients contain most of the energy of the frame, the quality of the decoded frame depends mostly on them, and then on the AC coefficients from low to high frequency. The main objective of zerotree coding is therefore to code the DC coefficients precisely first, and then the AC coefficients from low to high frequency. Exploiting the similarities across subbands, the zerotree coder produces output symbols that are encoded by an adaptive arithmetic coder, which gives the bit stream its embedded property. This embedded property allows the bit rate of each frame to be controlled precisely. A detailed explanation of zerotree coding can be found in [5].
It is therefore easy to adapt the bit rate of a group of pictures (GOP) to a given constant or variable channel bit rate, basically by allocating a fixed number of bits to each intraframe (I-frame) and interframe (P-frame). However, this scheme does not necessarily give the best average Peak Signal-to-Noise Ratio (PSNR) for the decoded video, since it does not consider the rate-distortion performance of each frame. To improve the decoded video quality, the bit rate control problem can be formulated as a constrained optimization problem, which can be solved by the Lagrangian method as explained in Section 4. First, however, an R-D model must be defined and the frame dependency problem must be solved.
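The coefficient rearrangement can be illustrated with a short sketch. It assumes 8×8 block DCT and one common mapping, in which coefficient (u, v) of block (p, q) moves to position (u·H/8 + p, v·W/8 + q); all coefficients sharing a frequency index then form one subband, and the DC coefficients gather in the top-left subband. The function names are hypothetical, the exact layout used in this work is not given in the text, and grouping the 64 resulting subbands into parent-child zerotrees is not shown.

```python
def rearrange_to_subbands(coeffs, block=8):
    """Rearrange block-DCT coefficients into a subband layout (hypothetical helper).

    coeffs is an H x W list of lists holding the DCT coefficients of
    consecutive block x block tiles. Coefficient (u, v) of tile (p, q) is
    moved to (u * BH + p, v * BW + q), where BH = H // block and
    BW = W // block, so every frequency index (u, v) forms one BH x BW
    subband and all DC coefficients (u = v = 0) end up in the top-left one.
    """
    h, w = len(coeffs), len(coeffs[0])
    if h % block or w % block:
        raise ValueError("frame dimensions must be multiples of the block size")
    bh, bw = h // block, w // block
    out = [[0] * w for _ in range(h)]
    for p in range(bh):
        for q in range(bw):
            for u in range(block):
                for v in range(block):
                    out[u * bh + p][v * bw + q] = coeffs[p * block + u][q * block + v]
    return out


def dc_subband(subbands, block=8):
    """Return the top-left subband, which holds one DC coefficient per block."""
    bh, bw = len(subbands) // block, len(subbands[0]) // block
    return [row[:bw] for row in subbands[:bh]]


# Usage: a 16 x 16 "frame" (2 x 2 blocks) whose entries encode their position.
frame = [[r * 16 + c for c in range(16)] for r in range(16)]
layout = rearrange_to_subbands(frame)
print(dc_subband(layout))  # [[0, 8], [128, 136]]
```

The mapping is a pure permutation, so no coefficient is lost; only its position changes so that the zerotree coder can visit the DC subband first.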

RATE-DISTORTION MODEL AND RATE CONTROL PROBLEM
To solve the rate control problem, one first needs the rate-distortion (R-D) model of each video frame. For this purpose, each video frame is encoded and decoded at several bit rates, and an approximating function for the R-D curve of each frame is fitted to the resulting distortion-versus-bit-rate data. Since the R-D model of a frame is convex [10], [11], it can be formulated as
$$D(R) = \sigma^2 e^{-\gamma R} \tag{1}$$
where $D$ is the distortion, $R$ is the bit rate, $\sigma^2$ is the variance of the DCT coefficients, and $\gamma$ is the coding efficiency parameter. The variance $\sigma^2$ is also the coding distortion when the bit rate $R$ equals zero. This model is easily verified with experimental data for any video sequence; Figure 1 shows an example of the convex R-D model for an I-frame of a composited video with four subframes. The coding efficiency parameter $\gamma$ specifies how fast the distortion decays as the bit rate increases: the larger $\gamma$, the faster the distortion decays and hence the more efficient the coding. Generally, the coding efficiency parameter $\gamma_I$ of I-frames is larger than the parameter $\gamma_P$ of P-frames [10]. The R-D characteristics and coding efficiency parameters of the I- and P-frames of a composited video sequence are shown in Figure 2.
If the channel bit rate is $B$ bits/sec and the duration of a GOP is $T$ seconds, the target bit rate of the GOP is
$$R_{\mathrm{TARGET}} = BT \ \text{bits} \tag{2}$$
The rate control problem for a GOP of $N$ frames $f_1, \dots, f_N$ is then
$$\min_{R_1,\dots,R_N} \sum_{i=1}^{N} D_i \quad \text{subject to} \quad \sum_{i=1}^{N} R_i = R_{\mathrm{TARGET}} \tag{3}$$
where $D_i$ is the distortion of frame $f_i$ and $R_i$, the coding control parameter, is the corresponding bit rate.
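Fitting the model of Eq. 1 to the pre-coding measurements can be sketched as follows. Because $\ln D = \ln\sigma^2 - \gamma R$ is a straight line in $R$, an ordinary least-squares line fit in the log domain is one simple choice; the text does not state which fitting procedure was used, so this choice, the function names and the sample numbers are assumptions for illustration (the samples are synthetic, not measurements from this work).

```python
import math


def fit_convex_rd(rates, distortions):
    """Fit D(R) = sigma2 * exp(-gamma * R) to measured (R, D) pairs.

    Taking logs gives ln D = ln sigma2 - gamma * R, a straight line in R,
    so an ordinary least-squares line fit yields both parameters.
    rates: bit rates at which the frame was pre-encoded and decoded
    distortions: the corresponding MSE values (must all be > 0)
    Returns (sigma2, gamma).
    """
    if len(rates) != len(distortions) or len(rates) < 2:
        raise ValueError("need at least two (R, D) pairs")
    logs = [math.log(d) for d in distortions]
    n = len(rates)
    mean_r = sum(rates) / n
    mean_y = sum(logs) / n
    sxx = sum((r - mean_r) ** 2 for r in rates)
    if sxx == 0:
        raise ValueError("rates must not all be equal")
    sxy = sum((r - mean_r) * (y - mean_y) for r, y in zip(rates, logs))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_r), -slope


def gop_target_bits(channel_bps, gop_seconds):
    """R_TARGET = B * T bits for a GOP lasting T seconds on a B bit/s channel."""
    return channel_bps * gop_seconds


# Usage with synthetic samples generated from sigma2 = 400, gamma = 3.
rs = [0.1, 0.25, 0.5, 1.0]
ds = [400 * math.exp(-3 * r) for r in rs]
print(fit_convex_rd(rs, ds))         # approximately (400.0, 3.0)
print(gop_target_bits(64000, 0.5))   # 32000.0
```

The units of $\gamma$ follow whatever unit the rates are given in (for example bits/pixel, as in the figures).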

FRAME DEPENDENCY PROBLEM
To solve the bit-rate-constrained optimization problem in Eq. 3, one first performs experiments to find the R-D characteristics of each frame in the form of Eq. 1, and then solves the problem via Lagrangian optimization. However, the R-D curve of each frame depends on the R-D curves of the previously coded frames: the prediction error of the current frame depends on how its reference frame has been encoded. In fact, for each $(R_i, D_i)$ point of the currently encoded frame, there is a different R-D curve for the next frame [12], [13]. This is called the frame dependency problem.
Typically, allocating more bits to an I-frame improves the quality of motion compensation, which reduces the bit rates needed by the following P-frames [10]. The bit distribution must nevertheless be optimized to obtain the best average PSNR over a GOP. First consider the motion compensation error with respect to the original reference frame,
$$e(i,j) = c(i,j) - r(m[i,j]) \tag{4}$$
where $r(m[i,j])$ is the motion-compensated reference frame, $c(i,j)$ is the current frame to be predictively coded, and $m[i,j]$ denotes the motion vectors. The variance of the motion-compensated residue (assuming zero-mean residues) is then
$$\sigma^2 = E\{e^2(i,j)\} \tag{5}$$
Since the decoder only has the encoded reference frame $\hat{r}(i,j)$, the actual residual variance is
$$\hat{\sigma}^2 = E\{\hat{e}^2(i,j)\}, \qquad \hat{e}(i,j) = c(i,j) - \hat{r}(m[i,j]) \tag{6}$$
where $\hat{e}(i,j)$ is the actual residual error. This error can be written as the sum of the motion compensation residue with respect to the original reference frame and the error of the motion-compensated reference frame:
$$\hat{e}(i,j) = e(i,j) + \left[ r(m[i,j]) - \hat{r}(m[i,j]) \right] \tag{7}$$
In the same manner, assuming the two terms are uncorrelated, the variance of the actual residue can be rewritten as
$$\hat{\sigma}^2 = \sigma^2 + E\{[r(m[i,j]) - \hat{r}(m[i,j])]^2\} \tag{8}$$
where the second term is the mean square error of the motion-compensated reference frame. This mean square error is linearly related to that of the original reference frame:
$$E\{[r(m[i,j]) - \hat{r}(m[i,j])]^2\} = \alpha \, E\{[r(i,j) - \hat{r}(i,j)]^2\} = \alpha D \tag{9}$$
where $\alpha$ is the frame dependency parameter [10]. Finally, Eq. 8 can be rewritten as
$$\hat{\sigma}^2 = \sigma^2 + \alpha D \tag{10}$$
where $D$ is the coding distortion, i.e., the mean square error of the original reference frame in Eq. 9. The linear relationship in Eq. 10 between the variance of the actual residual error and the mean square error of the original reference frame was also verified experimentally, as shown in Figure 3.
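Estimating the frame dependency parameter of Eq. 10 amounts to fitting a straight line through measured (D, residual variance) pairs: the reference frame is encoded at several rates, and for each decoded reference the variance of the actual residual of the current frame is measured. A minimal sketch under that assumption follows; least squares as the fitting method, the function names and the synthetic numbers are illustrative choices, not taken from [10].

```python
def fit_frame_dependency(ref_distortions, residual_variances):
    """Fit the linear model var_hat = sigma2 + alpha * D (Eq. 10) by least squares.

    ref_distortions: MSE D of the reference frame, one value per trial
        encoding of the reference at a different bit rate
    residual_variances: measured variance of the actual residual of the
        current frame when predicted from each of those decoded references
    Returns (sigma2, alpha): intercept and frame dependency parameter.
    """
    n = len(ref_distortions)
    if n != len(residual_variances) or n < 2:
        raise ValueError("need at least two (D, var_hat) pairs")
    mean_d = sum(ref_distortions) / n
    mean_v = sum(residual_variances) / n
    sxx = sum((d - mean_d) ** 2 for d in ref_distortions)
    if sxx == 0:
        raise ValueError("reference distortions must not all be equal")
    sxy = sum((d - mean_d) * (v - mean_v)
              for d, v in zip(ref_distortions, residual_variances))
    alpha = sxy / sxx
    return mean_v - alpha * mean_d, alpha


def mean_square(a, b):
    """Mean of squared differences between two equally long pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Usage with synthetic points lying on var_hat = 50 + 0.8 * D.
d_ref = [10.0, 20.0, 40.0, 80.0]
var_hat = [50 + 0.8 * d for d in d_ref]
print(fit_frame_dependency(d_ref, var_hat))  # approximately (50.0, 0.8)
```

`mean_square` can serve both measurements: D is the mean square error between the original and decoded reference frames, and, for zero-mean residues, the residual variance is the mean square difference between the current frame and its motion-compensated prediction.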

BIT RATE ALLOCATION BY LAGRANGIAN OPTIMIZATION
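Since the R-D function of each frame is convex, the optimization problem in Eq. 3 can be solved by Lagrangian optimization. The problem can be rewritten as
$$\min_{R_1,\dots,R_N} J = \sum_{i=1}^{N} D_i + \lambda \sum_{i=1}^{N} R_i \tag{11}$$
where $J$ is called the Lagrangian or R-D cost and $\lambda$ is the Lagrangian multiplier. $\lambda$ is the magnitude of the slope of the R-D curve at the tangency point where the minimum distortion for the given target bit rate is achieved. If $\lambda$ is fixed in Eq. 11, the rates that minimize it can be found, so the constrained optimization of Eq. 3 becomes an unconstrained problem, which is easier to solve [10], [11], [12], [13]. Given the rates $R_i(\lambda)$ obtained from the R-D models, Eq. 11 is solved by searching for a $\lambda_0$ such that
$$\sum_{i=1}^{N} R_i(\lambda_0) = R_{\mathrm{TARGET}} \tag{12}$$
Our aim is therefore to find the optimized bit rates $R_i^* = R_i(\lambda_0)$. Consider the partial derivatives of the Lagrangian cost with respect to the bit rates $R_i$, which are zero at the optimum:
$$\frac{\partial J}{\partial R_i} = 0, \qquad i = 1, \dots, N \tag{13}$$
For each individual frame, the R-D equation of Eq. 1 can be written as
$$D_i = \hat{\sigma}_i^2 e^{-\gamma_i R_i} \tag{14}$$
where the actual residual variance of frame $f_i$ follows from Eq. 10:
$$\hat{\sigma}_i^2 = \sigma_i^2 + \alpha_i D_{i-1} \tag{15}$$
with $\hat{\sigma}_1^2 = \sigma_1^2$ for the I-frame $f_1$, which has no reference frame.
Now consider Eq. 13 for the last frame $f_N$ of the GOP. Since $f_N$ is not a reference frame for any other frame, only $D_N$ depends on $R_N$:
$$\frac{\partial J}{\partial R_N} = \frac{\partial D_N}{\partial R_N} + \lambda = 0 \tag{16}$$
From Eq. 14,
$$\frac{\partial D_N}{\partial R_N} = -\gamma_N D_N \tag{17}$$
so the distortion of frame $f_N$ is
$$D_N = \frac{\lambda}{\gamma_N} \tag{18}$$
Similarly to Eq. 16, the partial derivative of the Lagrangian cost with respect to $R_{N-1}$ is
$$\frac{\partial J}{\partial R_{N-1}} = \frac{\partial D_{N-1}}{\partial R_{N-1}} \left( 1 + \frac{\partial D_N}{\partial D_{N-1}} \right) + \lambda = 0 \tag{19}$$
and, since Eq. 14 and Eq. 15 give $\partial D_N / \partial D_{N-1} = \alpha_N e^{-\gamma_N R_N} = \alpha_N D_N / \hat{\sigma}_N^2$, we get
$$\gamma_{N-1} D_{N-1} \left( 1 + \frac{\alpha_N D_N}{\sigma_N^2 + \alpha_N D_{N-1}} \right) = \lambda \tag{20}$$
Multiplying out gives a quadratic equation in $D_{N-1}$:
$$a_{N-1} D_{N-1}^2 + b_{N-1} D_{N-1} - \lambda \sigma_N^2 = 0, \qquad a_{N-1} = \gamma_{N-1}\alpha_N, \quad b_{N-1} = \gamma_{N-1}\left(\sigma_N^2 + \alpha_N D_N\right) - \lambda \alpha_N \tag{21}$$
and therefore the distortion of frame $f_{N-1}$ is its positive root:
$$D_{N-1} = \frac{-b_{N-1} + \sqrt{b_{N-1}^2 + 4 a_{N-1} \lambda \sigma_N^2}}{2 a_{N-1}} \tag{22}$$
For the remaining frames, a change in $D_{k-1}$ propagates to $D_k$ through Eq. 15. Since
$$\frac{\partial D_k}{\partial R_i} = \frac{\partial D_k}{\partial D_{k-1}} \, \frac{\partial D_{k-1}}{\partial R_i}, \qquad k > i \tag{23}$$
the partial derivative of the Lagrangian cost with respect to $R_i$, for $i \le N-2$, is
$$\frac{\partial J}{\partial R_i} = \frac{\partial D_i}{\partial R_i} \left( 1 + \sum_{k=i+1}^{N} \prod_{m=i+1}^{k} \frac{\partial D_m}{\partial D_{m-1}} \right) + \lambda = 0 \tag{24}$$
Using $\partial D_i / \partial R_i = -\gamma_i D_i$ from Eq. 14, we will have that
$$\gamma_i D_i \Phi_i = \lambda \tag{25}$$
where
$$\Phi_i = 1 + \sum_{k=i+1}^{N} \prod_{m=i+1}^{k} \frac{\partial D_m}{\partial D_{m-1}}, \qquad \Phi_N = 1 \tag{26}$$
Eq. 25 holds for every frame of the GOP: it reduces to Eq. 18 for $f_N$ and to Eq. 20 for $f_{N-1}$. From Eq. 14 and Eq. 15, $\partial D_m / \partial D_{m-1} = \alpha_m e^{-\gamma_m R_m}$, so $\Phi_i = 1 + \alpha_{i+1} e^{-\gamma_{i+1} R_{i+1}} \Phi_{i+1}$ and Eq. 24 becomes
$$\gamma_i D_i \left( 1 + \theta_{i+1} \right) = \lambda \tag{27}$$
where
$$\theta_{i+1} = \alpha_{i+1} e^{-\gamma_{i+1} R_{i+1}} \Phi_{i+1} \tag{28}$$
If we put Eq. 14 (i.e., $e^{-\gamma_{i+1} R_{i+1}} = D_{i+1} / \hat{\sigma}_{i+1}^2$) and Eq. 25 for frame $f_{i+1}$ (i.e., $D_{i+1}\Phi_{i+1} = \lambda / \gamma_{i+1}$) into Eq. 28, we will have
$$\theta_{i+1} = \frac{\alpha_{i+1} \lambda}{\gamma_{i+1} \hat{\sigma}_{i+1}^2} \tag{29}$$
and from Eq. 27, $D_i$ will be
$$D_i = \frac{\lambda}{\gamma_i \left( 1 + \theta_{i+1} \right)} \tag{30}$$
Finally, by putting Eq. 15 and Eq. 29 into Eq. 30, a second-order distortion function is obtained:
$$a_i D_i^2 + b_i D_i + c_i = 0 \tag{31}$$
where $a_i = \gamma_i \alpha_{i+1}$, $b_i = \gamma_i \left( \sigma_{i+1}^2 + \alpha_{i+1} \lambda / \gamma_{i+1} \right) - \lambda \alpha_{i+1}$ and $c_i = -\lambda \sigma_{i+1}^2$. (For $i = N-1$, Eq. 31 reduces to Eq. 21, since $\lambda / \gamma_N = D_N$ by Eq. 18.) Solving Eq. 31, and noting that $a_i > 0$ and $c_i < 0$, the distortions are given by the positive root
$$D_i = \frac{-b_i + \sqrt{b_i^2 - 4 a_i c_i}}{2 a_i} \tag{32}$$
Now that the distortions are known, the bit rates $R_i^* = R_i(\lambda_0)$ of the frames in a GOP follow from Eq. 14 and Eq. 15:
$$R_i^* = \frac{1}{\gamma_i} \ln \frac{\hat{\sigma}_i^2}{D_i} = \frac{1}{\gamma_i} \ln \frac{\sigma_i^2 + \alpha_i D_{i-1}}{D_i} \tag{33}$$
To obtain the bit rates, one first needs the distortions given by Eq. 18, Eq. 22 and Eq. 32, which in turn require the Lagrangian multiplier $\lambda$. There are several simple algorithms to find $\lambda$; one of them is the bisection method, whose details can be found in [10].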
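The closed-form solution and the search for $\lambda$ fit in a few functions. This is a minimal sketch assuming a GOP in which the first frame is the I-frame and every P-frame is predicted from the previous frame, the exponential model of Eq. 14 with Eq. 15, and per-frame parameters already fitted. The function names, the log-scale bisection bounds and the iteration count are illustrative assumptions, not values from [10]. The positive quadratic root is computed as $-2c/(b + \sqrt{b^2 - 4ac})$, which is algebraically equal to the root in Eqs. 22 and 32 but avoids cancellation, and it also yields $D_i = \lambda/\gamma_i$ when the next frame does not depend on frame $i$ ($\alpha_{i+1} = 0$).

```python
import math


def distortions_for_lambda(lam, sigma2, gamma, alpha):
    """Optimal distortions D_1..D_N for a fixed lambda (Eqs. 18, 22 and 32).

    Lists are indexed from 0: frame 0 is the I-frame and frame i >= 1 is
    predicted from frame i - 1. sigma2[i], gamma[i], alpha[i] are the fitted
    model parameters of frame i; alpha[0] is unused.
    """
    n = len(sigma2)
    d = [0.0] * n
    d[-1] = lam / gamma[-1]                  # last frame: D_N = lambda / gamma_N
    for i in range(n - 1):                   # other frames: quadratic a D^2 + b D + c = 0
        a = gamma[i] * alpha[i + 1]
        b = (gamma[i] * (sigma2[i + 1] + alpha[i + 1] * lam / gamma[i + 1])
             - lam * alpha[i + 1])
        c = -lam * sigma2[i + 1]
        # Positive root, written in a cancellation-free form that also covers a == 0.
        d[i] = -2 * c / (b + math.sqrt(b * b - 4 * a * c))
    return d


def rates_for_lambda(lam, sigma2, gamma, alpha):
    """Bit rates from inverting Eq. 14: R_i = ln(var_hat_i / D_i) / gamma_i."""
    d = distortions_for_lambda(lam, sigma2, gamma, alpha)
    rates = []
    for i, d_i in enumerate(d):
        var_hat = sigma2[i] + (alpha[i] * d[i - 1] if i > 0 else 0.0)  # Eq. 15
        rates.append(math.log(var_hat / d_i) / gamma[i])
    return rates, d


def allocate_gop(r_target, sigma2, gamma, alpha, lam_lo=1e-9, lam_hi=1e9, iters=100):
    """Bisection on lambda until sum(R_i) equals r_target.

    The total rate decreases as lambda grows, so lambda is raised when too
    many bits are spent. The closed form assumes every R_i > 0; a negative
    entry in the result means that assumption fails for this target.
    """
    for _ in range(iters):
        lam = math.sqrt(lam_lo * lam_hi)     # bisect on a log scale
        if sum(rates_for_lambda(lam, sigma2, gamma, alpha)[0]) > r_target:
            lam_lo = lam
        else:
            lam_hi = lam
    return rates_for_lambda(math.sqrt(lam_lo * lam_hi), sigma2, gamma, alpha)


def gop_distortion(rates, sigma2, gamma, alpha):
    """Total GOP distortion evaluated forward with Eqs. 14-15."""
    total, prev = 0.0, 0.0
    for i, r in enumerate(rates):
        var_hat = sigma2[i] + (alpha[i] * prev if i > 0 else 0.0)
        prev = var_hat * math.exp(-gamma[i] * r)
        total += prev
    return total


# Usage with made-up parameters for one I-frame and two P-frames.
s2, g, a = [100.0, 20.0, 30.0], [2.0, 1.5, 1.2], [0.0, 0.5, 0.6]
rates, dist = allocate_gop(4.0, s2, g, a)
print([round(r, 3) for r in rates], round(gop_distortion(rates, s2, g, a), 3))
```

`gop_distortion` re-evaluates the model for any allocation, which gives a convenient sanity check: moving a small number of bits from one frame to another should never lower the total distortion of the allocation returned by `allocate_gop`.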
This rate control scheme, combined with the solution to the frame dependency problem described above, has been shown to be very efficient for wavelet zerotree coders [10].
In the next section, we compare this method with a piecewise linear R-D model scheme to show its effectiveness when used with a DCT-based embedded zerotree coder.

SIMULATIONS AND COMPARISON OF CONVEX AND PIECEWISE LINEAR R-D MODELS
In [12], Silva et al. investigate the rate control problem using a piecewise linear R-D model for embedded wavelet zerotree coding. In this section, we apply their method to DCT-based embedded zerotree coding and compare it with the convex R-D model method we use.
To solve the Lagrangian rate control optimization problem of Eq. 11 with the piecewise linear model, the R-D characteristics of each frame are estimated first. An example of a piecewise linear R-D model is shown in Figure 4. The breakpoints of each linear curve are the boundaries between consecutive dominant and subordinate passes of zerotree coding [12]. Therefore, to estimate the R-D characteristics of a frame, the decoder decodes the encoded frame at the rates corresponding to the breakpoints. The following algorithm is then used to find the optimum bit rates for a given GOP:
1. For each frame, find the tangency point $(R_i(\lambda), D_i(\lambda))$ for the given $\lambda$.
2. Compute the total bit rate $R(\lambda) = \sum_i R_i(\lambda)$.
3. If $R(\lambda)$ is not equal to the target bit rate $R_{\mathrm{TARGET}}$, vary $\lambda$ and go to Step 1; otherwise the optimal bit rates are $R_i^* = R_i(\lambda)$, and stop.
The candidate values of $\lambda$ are the negatives of the slopes of all linear pieces of the R-D curves of every frame [12]. In [12], Silva et al. also propose an iterative method that copes with the frame dependency problem. They apply the rate control strategy described above and obtain the reconstructed frames for iteration $n$; the rate allocation for iteration $n+1$ is then computed, which gives the reconstructed frames for iteration $n+1$. This process continues until the change in distortion falls below a threshold. Since this method requires encoding and decoding the frames of a GOP several times, we instead use the frame dependency parameter explained in the previous section.
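For convex piecewise linear curves, and ignoring frame dependency, sweeping $\lambda$ over the slopes of the linear pieces (Steps 1-3) selects the same operating points as spending the bit budget on the pieces in order of decreasing slope magnitude, with the last piece truncated so the target is met exactly (which an embedded bit stream permits). The sketch below uses that greedy form; the function names and example curves are hypothetical, and the iterative frame-dependency handling of [12] is not included.

```python
def allocate_piecewise(r_target, curves):
    """Greedy equivalent of Steps 1-3 for convex piecewise linear R-D curves.

    curves[i] lists the breakpoints [(R0, D0), (R1, D1), ...] of frame i,
    sorted by rate, with distortion decreasing and slopes flattening
    (convexity). Frames are treated as independent. Returns (rates, lam),
    where lam is the slope magnitude of the last piece used.
    """
    rates = [curve[0][0] for curve in curves]
    budget = r_target - sum(rates)
    if budget < 0:
        raise ValueError("target is below the smallest achievable rate")
    pieces = []
    for i, curve in enumerate(curves):
        for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
            pieces.append(((d0 - d1) / (r1 - r0), i, r1 - r0))
    # Steepest pieces (largest distortion drop per bit) first; the sort is
    # stable, so pieces of one frame with equal slopes keep their order.
    pieces.sort(key=lambda piece: piece[0], reverse=True)
    lam = 0.0
    for slope, i, length in pieces:
        if budget <= 0:
            break
        take = min(length, budget)   # the last piece may be used only partly
        rates[i] += take
        budget -= take
        lam = slope
    return rates, lam


def distortion_at(curve, rate):
    """Linear interpolation of a piecewise linear R-D curve at the given rate."""
    for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
        if r0 <= rate <= r1:
            return d0 + (d1 - d0) * (rate - r0) / (r1 - r0)
    raise ValueError("rate outside the measured range")


# Usage with two made-up curves (not data from this work).
frame_a = [(0, 100), (1, 40), (2, 20), (3, 10)]
frame_b = [(0, 50), (1, 30), (2, 20)]
rates, lam = allocate_piecewise(3, [frame_a, frame_b])
print(rates, lam)  # [2, 1] 20.0
```

In the usage example, the returned split gives a total distortion of 20 + 30 = 50, lower than the alternatives (3, 0) or (1, 2), which both give 60.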
The comparison between the rate control method that uses the convex R-D model and the one that uses the piecewise linear R-D model is given in Figure 5, which compares the PSNR of the reconstructed composited video frames with four subframes. As shown in Table 1, the average PSNR values obtained with the convex R-D model are slightly higher than those obtained with the piecewise linear R-D model. The convex R-D model combined with the solution to the frame dependency problem therefore generally achieves slightly better PSNR performance.

Bit Rate Allocation at Subframe Layer
In this subsection, we investigate whether subframe-layer bit rate allocation is necessary for the proposed DCT-based embedded zerotree coding in the DCT compositing system. For this purpose, after allocating the bit rates $R_i^*$ to each frame in a GOP, we divide the bits of each frame among its subframes according to their variances:
$$R_{i,j} = w_{i,j} R_i^* \tag{34}$$
where $R_{i,j}$ is the bit rate allocated to subframe $f_{i,j}$ of frame $f_i$, and $w_{i,j}$ is the weight of subframe $f_{i,j}$, obtained from the subframe variances as
$$w_{i,j} = \frac{\sigma_{i,j}^2}{\sum_{k=1}^{K} \sigma_{i,k}^2} \tag{35}$$
where $\sigma_{i,j}^2$ is the variance of subframe $f_{i,j}$ of frame $f_i$, which consists of $K$ subframes. In this way, the bits are distributed among the subframes according to their activity. The average PSNR results for composited videos with four subframes are shown in Table 2. As the table shows, subframe-layer bit rate allocation does not improve the quality of the composited videos. The reason is that embedded zerotree coding uses successive approximation quantization, so the DCT coefficients are encoded in order of significance regardless of which subframe they belong to, which makes a separate subframe-layer bit rate allocation unnecessary.
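The variance-weighted split tested here can be sketched as follows: each subframe receives the frame's bit budget multiplied by its variance divided by the sum of all subframe variances of that frame. The equal split for all-zero variances is an added assumption not discussed in the text, the function names are hypothetical, and whether variances are computed on pixels or on DCT coefficients is not specified (`variance` accepts either). According to Table 2, this step brings no PSNR gain with the embedded zerotree coder; the sketch only shows what was tested.

```python
def variance(values):
    """Population variance of a subframe's pixel (or coefficient) values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def subframe_rates(frame_rate, subframe_variances):
    """Split a frame's bit budget among its K subframes in proportion to variance.

    R_ij = w_ij * R_i with w_ij = var_ij / sum_k var_ik. If every subframe is
    flat (all variances zero), the budget is split equally instead.
    """
    total = sum(subframe_variances)
    k = len(subframe_variances)
    if total == 0:
        return [frame_rate / k] * k
    return [frame_rate * v / total for v in subframe_variances]


# Usage: four subframes, one of them much more active than the others.
print(subframe_rates(1000, [1.0, 1.0, 2.0, 4.0]))  # [125.0, 125.0, 250.0, 500.0]
```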

CONCLUSIONS
We use a convex R-D model for bit rate control, and Lagrangian optimization to distribute the bits optimally among the frames. The frame dependency problem is solved by computing the frame dependency parameter, which is obtained from the linear relationship between the variance of the actual residual error and the distortion of the original reference frame [10]. Since the embedded zerotree coder, unlike regular quantization, does not require a quantization parameter to be evaluated, the bit rate problem only needs to be solved at the frame layer. Also unlike regular quantization, the coding control parameter is the bit rate allocated to each frame in a GOP, so the exact target bit rate is obtained. Regular quantization, in contrast, requires feedback to re-evaluate the quantization parameters in order to reach the target bit rate, and even then it does not guarantee that the target is met precisely, which requires the use of a buffer. We also show that there is no need to collect the statistics of each subframe in order to distribute the bit rate allocated to a frame among its subframes. The reason is that embedded zerotree coding uses successive approximation quantization, which encodes the most significant DCT coefficients first regardless of their location, and then the remaining DCT coefficients in descending order of significance. Finally, we compare the performance of bit rate control methods using the convex [10] and piecewise linear [12] models: bit rate control with the convex model achieves slightly higher PSNRs than that with the piecewise linear model.

Figure 2. Rate-Distortion characteristics of first I and P frames (composited videos with four subframes; horizontal axis: bit rate in bits/pixel)

Figure 3. Relationship between the variance of the actual residue error and the mean square error of the original reference frame


Figure 4. Piecewise linear R-D model of an I-frame from composited videos with four subframes: experimental data and approximation (horizontal axis: bit rate in bits/pixel)

Figure 5. Comparison of the R-D performance of the proposed convex model and the piecewise linear model for composited videos with four subframes: (a) at 0.25 bits/pixel, (b) at 0.50 bits/pixel.

Table 1. Average PSNR comparison of rate control with the convex R-D model and with the piecewise linear R-D model for composited videos with four subframes

Table 2. Average PSNR comparison of rate control with and without subframe-layer bit rate allocation for composited videos with four subframes