Parallel Zigzag Scanning and Huffman Coding for a GPU-based MPEG-2 Encoder

GPUs excel at parallel computation, so they are very efficient at calculating the discrete cosine transform (DCT) of spatial-domain images, as required for video encoding. The last steps of MPEG-2 compression, however, are inherently sequential, since they require serial processing of the resulting DCT coefficients. Because this can easily become a bottleneck in GPU-based video encoders, in this paper we analyze the problem of computing the zigzag scan and Huffman encoding of an MPEG-2 coefficient block on a GPU. We observed that simply maximizing the parallelism of the serialization and compression algorithm is not enough: inefficient memory usage can dramatically slow down the computation, so a parallel version can actually perform worse than a simple approach with no parallelism. This paper describes three different techniques for calculating the final bit stream for an MPEG-2 quantized coefficient matrix: a simple serial implementation, a fully parallel implementation, and a combination that beats them both when the cost of transferring the result to the CPU is taken into account.
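To make the two final stages concrete, the following is a minimal CPU-side sketch in Python, not the implementation evaluated in the paper. It assumes the classic zigzag order over an 8x8 block and a simplified run-level pass in which trailing zeros are dropped. The function names `zigzag_order`, `zigzag_scan`, and `run_level_pairs` are hypothetical, and the actual MPEG-2 variable-length code tables are omitted.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    # Cells on one anti-diagonal share row + col. Odd diagonals are walked
    # with the row increasing, even diagonals with the column increasing.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def zigzag_scan(block):
    """Serialize a square block into a flat list in zigzag order."""
    # Each output element depends on a single input element, so this
    # gather step could be done by one thread per coefficient.
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_level_pairs(scanned):
    """Convert a scanned list into (zero_run, level) pairs.

    Simplified assumption: every coefficient is treated alike and trailing
    zeros are dropped, standing in for an end-of-block marker.
    """
    pairs = []
    run = 0
    for level in scanned:
        # The run counter carries state from one coefficient to the next,
        # which is what makes this stage sequential in its naive form.
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


# Example: a mostly empty quantized block with three non-zero coefficients.
example_block = [[0] * 8 for _ in range(8)]
example_block[0][0] = 12
example_block[0][1] = 5
example_block[2][0] = -3
example_pairs = run_level_pairs(zigzag_scan(example_block))
```

In this sketch the scan is a pure gather and parallelizes trivially, whereas the run counting and the variable-length output that would follow it depend on everything emitted before, which is the sequential dependency the abstract refers to.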
