Video coding in WebRTC

Introduction to layered video coding

Video coding is the process of encoding a stream of uncompressed video frames into a compressed bitstream, whose bitrate is lower than that of the original stream.
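To give a sense of scale, the bitrate of an uncompressed stream can be computed directly from its resolution and framerate. The sketch below assumes 8-bit 4:2:0 video (12 bits per pixel on average); the function name and the 1.5 Mbps target are hypothetical and chosen only for illustration.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=12):
    """Bitrate of uncompressed video; 12 bits/pixel assumes 8-bit 4:2:0 sampling."""
    return width * height * bits_per_pixel * fps


raw = raw_bitrate_bps(1280, 720, 30)  # 331_776_000 bits per second
hypothetical_target_bps = 1_500_000   # illustrative target, not a WebRTC default
compression_ratio = raw / hypothetical_target_bps
```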

Block-based hybrid video coding

All video codecs in WebRTC are based on the block-based hybrid video coding paradigm. The encoder predicts the original video frame using information from either previously encoded frames or previously encoded portions of the current frame, subtracts the prediction from the original video, and then transforms and quantizes the resulting difference. The output of the quantization process, the quantized transform coefficients, is losslessly entropy coded along with other encoder parameters (e.g., those related to the prediction process). The encoder then builds a reconstruction by inverse quantizing and inverse transforming the quantized transform coefficients and adding the result to the prediction. Finally, in-loop filtering is applied and the resulting reconstruction is stored as a reference frame, which is used to form predictions for future frames.
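The loop described above can be sketched on a toy signal. This is a minimal illustration, not how any real codec is implemented: each "frame" is a single block of four samples, the transform is a 4-point Walsh-Hadamard transform, the first frame is predicted from a flat mid-gray block as a stand-in for intra prediction, and entropy coding and in-loop filtering are omitted. All names and the step size `QSTEP` are hypothetical.

```python
QSTEP = 8  # hypothetical quantizer step size


def hadamard4(v):
    """Unnormalized 4-point Walsh-Hadamard transform; applying it twice scales by 4."""
    a, b, c, d = v
    return [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]


def encode_block(original, prediction, qstep=QSTEP):
    """Subtract the prediction, transform the residual, and quantize it."""
    residual = [o - p for o, p in zip(original, prediction)]
    return [round(c / qstep) for c in hadamard4(residual)]


def reconstruct_block(levels, prediction, qstep=QSTEP):
    """Inverse quantize, inverse transform, and add the result to the prediction."""
    coeffs = [level * qstep for level in levels]
    residual = [round(c / 4) for c in hadamard4(coeffs)]
    return [p + r for p, r in zip(prediction, residual)]


def encode_sequence(frames):
    """Closed-loop encoder: predicts from its own reconstruction, not the original."""
    reference = [128] * 4  # flat block standing in for intra prediction
    bitstream, reconstructions = [], []
    for frame in frames:
        levels = encode_block(frame, reference)
        reference = reconstruct_block(levels, reference)  # stored as reference frame
        bitstream.append(levels)
        reconstructions.append(reference)
    return bitstream, reconstructions


def decode_sequence(bitstream):
    """The decoder repeats only the reconstruction half of the loop."""
    reference = [128] * 4
    decoded = []
    for levels in bitstream:
        reference = reconstruct_block(levels, reference)
        decoded.append(reference)
    return decoded
```

Because the encoder predicts from its own reconstruction rather than from the original frames, the decoder output matches the encoder's reference frames exactly and quantization error does not accumulate from frame to frame.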

Frame types

When an encoded frame depends on previously encoded frames (i.e., it has one or more inter-frame dependencies), the prior frames must be available at the receiver before the current frame can be decoded. In order for a receiver to start decoding an encoded bitstream, a frame which has no prior dependencies is required. Such a frame is called a “key frame”. For real-time-communications encoding, key frames typically compress less efficiently than “delta frames” (i.e., frames whose predictions are derived from previously encoded frames).
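The decodability rule can be made concrete with a small sketch. The representation below (a dict mapping each frame id to the ids it depends on, listed in encode order) and the function name are assumptions made for illustration only.

```python
def decodable_frames(frames, received):
    """Return the ids of received frames whose dependencies were all decoded.

    frames:   dict of frame id -> list of frame ids it depends on, in encode order.
    received: set of frame ids that actually reached the receiver.
    """
    decoded = set()
    for frame_id, dependencies in frames.items():
        if frame_id in received and all(d in decoded for d in dependencies):
            decoded.add(frame_id)
    return decoded


# Frames 0 and 4 are key frames; every other frame depends on its predecessor.
stream = {0: [], 1: [0], 2: [1], 3: [2], 4: [], 5: [4]}

# A receiver that joins at frame 2 cannot decode anything until key frame 4.
late_joiner = decodable_frames(stream, received={2, 3, 4, 5})
```

Here `late_joiner` contains only frames 4 and 5: frames 2 and 3 arrived, but the chain back to key frame 0 is missing.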

Single-layer coding

In 1:1 calls, the encoded bitstream has a single recipient. Using end-to-end bandwidth estimation, the target bitrate can thus be well tailored for the intended recipient. The number of key frames can be kept to a minimum and the compressibility of the stream can be maximized. One way of achieving this is by using “single-layer coding”, where each delta frame only depends on the frame that was most recently encoded.

Scalable video coding

In multiway conferences, on the other hand, the encoded bitstream has multiple recipients, each of whom may have a different downlink bandwidth. In order to tailor the encoded bitstreams to a heterogeneous network of receivers, scalable video coding can be used. The idea is to introduce structure into the dependency graph of the encoded bitstream, such that layers of the full stream can be decoded using only available lower layers. This structure allows a selective forwarding unit to discard upper layers of the bitstream in order to achieve the intended downlink bandwidth.

There are multiple types of scalability:

Temporal scalability: lower layers have a lower framerate (and bitrate) than the upper layer(s)
Spatial scalability: lower layers have a lower resolution (and bitrate) than the upper layer(s)
Quality scalability: lower layers have a lower bitrate than the upper layer(s)

WebRTC supports temporal scalability for VP8, VP9 and AV1, and spatial scalability for VP9 and AV1.
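As an illustration of temporal scalability, the sketch below uses a common three-layer pattern in which temporal layer ids repeat as 0, 2, 1, 2. The reference assignment shown is one possible choice, and the function names are hypothetical; this does not reproduce the ScalabilityStructure classes in WebRTC.

```python
TEMPORAL_PATTERN = [0, 2, 1, 2]        # temporal layer id per frame, repeating
REFERENCE_DISTANCE = {0: 4, 1: 2, 2: 1}  # how far back each layer looks


def temporal_id(frame_index):
    return TEMPORAL_PATTERN[frame_index % len(TEMPORAL_PATTERN)]


def reference_of(frame_index):
    """Index of the frame this frame predicts from, or None for the first frame."""
    if frame_index == 0:
        return None
    return frame_index - REFERENCE_DISTANCE[temporal_id(frame_index)]


def forward(frame_indices, max_temporal_id):
    """What a selective forwarding unit keeps when it discards upper layers."""
    return [i for i in frame_indices if temporal_id(i) <= max_temporal_id]


base_only = forward(range(8), max_temporal_id=0)   # quarter framerate
two_layers = forward(range(8), max_temporal_id=1)  # half framerate
```

In this pattern a frame never references a frame from a higher temporal layer, so every frame that is forwarded still has its reference available after the upper layers are dropped.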

Simulcast

Simulcast is another approach for multiway conferencing, where multiple independent bitstreams are produced by the encoder.

In cases where multiple encodings of the same source are required (e.g., uplink transmission in a multiway call), spatial scalability with inter-layer prediction generally offers superior coding efficiency compared with simulcast. When a single encoding is required (e.g., downlink transmission in any call), simulcast generally provides better coding efficiency for the upper spatial layers. The K-SVC concept, where spatial inter-layer dependencies are only used to encode key frames, for which inter-layer prediction is typically significantly more effective than it is for delta frames, can be seen as a compromise between full spatial scalability and simulcast.
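The difference between the three approaches comes down to which references each frame is allowed to use. The following sketch makes that explicit under simplifying assumptions: frames are identified by a hypothetical `(spatial_id, picture_index)` pair, picture 0 is the only key picture, and each delta frame has a single temporal reference within its own layer.

```python
def references(mode, spatial_id, picture_index):
    """List the (spatial_id, picture_index) frames a given frame predicts from.

    mode is one of "simulcast", "k_svc", or "full_svc".
    """
    is_key_picture = picture_index == 0
    refs = []
    # Temporal prediction within the same layer (not available on key pictures).
    if not is_key_picture:
        refs.append((spatial_id, picture_index - 1))
    # Inter-layer prediction from the next lower spatial layer of the same picture.
    uses_inter_layer = mode == "full_svc" or (mode == "k_svc" and is_key_picture)
    if spatial_id > 0 and uses_inter_layer:
        refs.append((spatial_id - 1, picture_index))
    return refs
```

With K-SVC, an upper-layer delta frame has the same references as it would in simulcast, so once the key picture has been delivered the upper layer can be forwarded without the lower layers.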

Overview of implementation in `modules/video_coding`

Given the general introduction to video coding above, we now describe some specifics of the modules/video_coding folder in WebRTC.

Built-in software codecs in `modules/video_coding/codecs`

This folder contains WebRTC-specific classes that wrap software codec implementations for different video coding standards:

libaom for AV1
libvpx for VP8 and VP9
OpenH264 for H.264 constrained baseline profile

Users of the library can also inject their own codecs, using the VideoEncoderFactory and VideoDecoderFactory interfaces. This is how platform-supported codecs, such as hardware-backed codecs, are implemented.

Video codec test framework in `modules/video_coding/codecs/test`

This folder contains a test framework that can be used to evaluate video quality performance of different video codec implementations.

SVC helper classes in `modules/video_coding/svc`

ScalabilityStructure* - different standardized scalability structures
ScalableVideoController - instructs the video encoder on how to create a scalable stream
SvcRateAllocator - bitrate allocation to different spatial and temporal layers

Utility classes in `modules/video_coding/utility`

FrameDropper - drops incoming frames when the encoder systematically overshoots its target bitrate
FramerateController - drops incoming frames to achieve a target framerate
QpParser - parses the quantization parameter from a bitstream
QualityScaler - signals when an encoder generates encoded frames whose quantization parameter is outside the window of acceptable values
SimulcastRateAllocator - bitrate allocation to simulcast layers

General helper classes in `modules/video_coding`

FecControllerDefault - provides a default implementation for rate allocation to forward error correction
VideoCodecInitializer - converts between different encoder configuration structs

Receiver buffer classes in `modules/video_coding`

PacketBuffer - (re-)combines RTP packets into frames
RtpFrameReferenceFinder - determines dependencies between frames based on information in the RTP header, payload header and RTP extensions
FrameBuffer - orders frames based on their dependencies to be fed to the decoder
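The idea of ordering frames by their dependencies can be sketched as follows. This toy buffer is an assumption-laden illustration and does not mirror the FrameBuffer class in WebRTC: the class name and interface are hypothetical, frames are plain integer ids, and timing, loss handling, and buffer limits are ignored.

```python
class ToyFrameBuffer:
    """Holds frames until every frame they depend on has been released."""

    def __init__(self):
        self.pending = {}      # frame id -> set of frame ids it depends on
        self.released = set()  # frame ids already handed to the decoder

    def insert(self, frame_id, dependencies):
        """Add a frame; return the ids that became decodable, in release order."""
        self.pending[frame_id] = set(dependencies)
        newly_released = []
        progress = True
        while progress:
            progress = False
            for candidate in sorted(self.pending):
                if self.pending[candidate] <= self.released:
                    del self.pending[candidate]
                    self.released.add(candidate)
                    newly_released.append(candidate)
                    progress = True
                    break  # restart the scan: this release may unblock others
        return newly_released


buffer = ToyFrameBuffer()
first = buffer.insert(2, [1])   # waits for frame 1
second = buffer.insert(1, [0])  # waits for frame 0
third = buffer.insert(0, [])    # key frame unblocks the whole chain
```

Frames that arrive out of order are held back, and the arrival of the key frame releases the full chain to the decoder in dependency order.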
