| <!-- go/cmark --> |
| <!--* freshness: {owner: 'brandtr' reviewed: '2021-04-15'} *--> |
| |
| # Video coding in WebRTC |
| |
| ## Introduction to layered video coding |
| |
| [Video coding][video-coding-wiki] is the process of encoding a stream of |
| uncompressed video frames into a compressed bitstream, whose bitrate is lower |
| than that of the original stream. |
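
To give a sense of scale, the following minimal sketch computes the bitrate of an uncompressed stream and compares it with an encoder target. The 12 bits per pixel figure corresponds to 8-bit 4:2:0 video; the resolution, framerate, and 1.5 Mbps target are illustrative assumptions, not values taken from WebRTC.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=12):
    """Bitrate of uncompressed video; 12 bits/pixel matches 8-bit 4:2:0."""
    return width * height * bits_per_pixel * fps


raw = raw_bitrate_bps(1280, 720, 30)  # 331_776_000 bits per second
target = 1_500_000                    # hypothetical encoder target bitrate
ratio = raw / target                  # roughly 221x reduction
```

Under these assumptions the encoder has to remove more than 99% of the raw bits, which is why the prediction and quantization steps described next matter.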
| |
| ### Block-based hybrid video coding |
| |
| All video codecs in WebRTC are based on the block-based hybrid video coding |
| paradigm, which entails prediction of the original video frame using either |
| [information from previously encoded frames][motion-compensation-wiki] or |
| information from previously encoded portions of the current frame, subtraction |
| of the prediction from the original video, and |
| [transform][transform-coding-wiki] and [quantization][quantization-wiki] of the |
| resulting difference. The output of the quantization process, the quantized |
| transform coefficients, is losslessly [entropy coded][entropy-coding-wiki] along |
| with other encoder parameters (e.g., those related to the prediction process). |
| The encoder then builds a reconstruction by inverse quantizing and inverse |
| transforming the quantized transform coefficients and adding the result to the |
| prediction. Finally, in-loop filtering is applied and the resulting |
| reconstruction is stored as a reference frame, from which predictions for |
| future frames are derived. |
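
The loop described above can be sketched on a single 4-sample block. This is a toy illustration, not how any WebRTC codec is implemented: the 4-point Walsh-Hadamard transform stands in for a real codec transform, entropy coding and in-loop filtering are left out, and all function names are hypothetical.

```python
# 4-point Walsh-Hadamard matrix: symmetric, and H * H == 4 * I,
# so the inverse transform is the same matrix scaled by 1/4.
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]


def transform(vec):
    return [sum(h * v for h, v in zip(row, vec)) for row in H]


def reconstruct(prediction, levels, step):
    """Inverse quantize, inverse transform, and add the prediction."""
    coeffs = [level * step for level in levels]
    residual = [c / 4 for c in transform(coeffs)]
    return [p + r for p, r in zip(prediction, residual)]


def encode_block(original, prediction, step):
    """Return quantized coefficients and the encoder-side reconstruction."""
    residual = [o - p for o, p in zip(original, prediction)]
    levels = [round(c / step) for c in transform(residual)]
    # The encoder runs the same reconstruction as the decoder, so both
    # sides hold an identical reference for predicting future frames.
    return levels, reconstruct(prediction, levels, step)


reference = [100, 102, 104, 106]  # block from a previously reconstructed frame
original = [101, 105, 103, 110]   # block of the frame being encoded
levels, encoder_recon = encode_block(original, reference, step=4)
decoder_recon = reconstruct(reference, levels, step=4)
```

Only `levels` (plus prediction parameters) would be transmitted. Quantization is the lossy step: a larger `step` yields smaller levels to entropy code, at the cost of a reconstruction that deviates further from `original`.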
| |
| ### Frame types |
| |
| When an encoded frame depends on previously encoded frames (i.e., it has one or |
| more inter-frame dependencies), the prior frames must be available at the |
| receiver before the current frame can be decoded. In order for a receiver to |
| start decoding an encoded bitstream, a frame which has no prior dependencies is |
| required. Such a frame is called a "key frame". For real-time-communications |
| encoding, key frames typically compress less efficiently than "delta frames" |
| (i.e., frames whose predictions are derived from previously encoded frames). |
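
The role of key frames can be illustrated with a small decodability check. This is a sketch under simplifying assumptions: frames are identified by integers, listed in encode order, and the hypothetical `decodable_frames` helper is not part of WebRTC.

```python
def decodable_frames(frames, received):
    """Return the ids of frames a receiver can decode.

    frames: dict mapping frame id -> list of frame ids it depends on,
            listed in encode order.
    received: set of frame ids that reached the receiver.
    """
    decoded = set()
    for frame_id, dependencies in frames.items():
        if frame_id in received and all(d in decoded for d in dependencies):
            decoded.add(frame_id)
    return decoded


# Frames 0 and 3 are key frames; the others are delta frames.
frames = {0: [], 1: [0], 2: [1], 3: [], 4: [3]}

# A receiver that joins after frame 0 was sent cannot use frames 1 and 2.
late_joiner = decodable_frames(frames, received={1, 2, 3, 4})
```

Here `late_joiner` contains only frames 3 and 4: decoding starts at the first key frame that arrives, and every delta frame before it is unusable.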
| |
| ### Single-layer coding |
| |
| In 1:1 calls, the encoded bitstream has a single recipient. Using end-to-end |
| bandwidth estimation, the target bitrate can thus be well tailored for the |
| intended recipient. The number of key frames can be kept to a minimum and the |
| compressibility of the stream can be maximized. One way of achieving this is by |
| using "single-layer coding", where each delta frame only depends on the frame |
| that was most recently encoded. |
| |
| ### Scalable video coding |
| |
| In multiway conferences, on the other hand, the encoded bitstream has multiple |
| recipients, each of whom may have a different downlink bandwidth. In order to |
| tailor the encoded bitstreams to a heterogeneous network of receivers, |
| [scalable video coding][svc-wiki] can be used. The idea is to introduce |
| structure into the dependency graph of the encoded bitstream, such that _layers_ of |
| the full stream can be decoded using only available lower layers. This structure |
| allows for a [selective forwarding unit][sfu-webrtc-glossary] to discard upper |
| layers of the bitstream in order to achieve the intended downlink |
| bandwidth. |
| |
| There are multiple types of scalability: |
| |
| * _Temporal scalability_ uses layers whose framerate (and bitrate) is lower than that of the upper layer(s) |
| * _Spatial scalability_ uses layers whose resolution (and bitrate) is lower than that of the upper layer(s) |
| * _Quality scalability_ uses layers whose bitrate is lower than that of the upper layer(s), at the same resolution and framerate |
| |
| WebRTC supports temporal scalability for `VP8`, `VP9` and `AV1`, and spatial |
| scalability for `VP9` and `AV1`. |
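
As an illustration of temporal scalability, the sketch below models a stream with three temporal layers and shows what a selective forwarding unit keeps when it discards upper layers. The repeating `[0, 2, 1, 2]` layer pattern and the referencing rule are assumptions chosen to resemble a common three-layer structure; the function names are hypothetical and do not correspond to WebRTC classes.

```python
PATTERN = [0, 2, 1, 2]  # assumed temporal layer id per frame, repeating


def temporal_id(frame_index):
    return PATTERN[frame_index % len(PATTERN)]


def reference(frame_index):
    """Return the frame this one predicts from, or None for the key frame.

    Assumed rule: a base-layer frame references the previous base-layer
    frame; any other frame references the most recent frame of a lower layer.
    """
    tid = temporal_id(frame_index)
    for earlier in range(frame_index - 1, -1, -1):
        earlier_tid = temporal_id(earlier)
        if earlier_tid < tid or earlier_tid == tid == 0:
            return earlier
    return None


def forward(frame_indices, max_temporal_id):
    """Keep only frames at or below the requested temporal layer."""
    return [i for i in frame_indices if temporal_id(i) <= max_temporal_id]


full = forward(range(8), max_temporal_id=2)     # all 8 frames
half = forward(range(8), max_temporal_id=1)     # [0, 2, 4, 6]
quarter = forward(range(8), max_temporal_id=0)  # [0, 4]
```

Each forwarded subset is self-contained: every kept frame references only frames that were also kept, so dropping the top layer halves the framerate without breaking decoding.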
| |
| ### Simulcast |
| |
| Simulcast is another approach for multiway conferencing, where multiple |
| _independent_ bitstreams are produced by the encoder. |
| |
| In cases where multiple encodings of the same source are required (e.g., uplink |
| transmission in a multiway call), spatial scalability with inter-layer |
| prediction generally offers superior coding efficiency compared with simulcast. |
| When a single encoding is required (e.g., downlink transmission in any call), |
| simulcast generally provides better coding efficiency for the upper spatial |
| layers. The `K-SVC` concept, where spatial inter-layer dependencies are only |
| used to encode key frames, can be seen as a compromise between full spatial |
| scalability and simulcast, because inter-layer prediction is typically |
| significantly more effective for key frames than it is for delta frames. |
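
The difference between the three approaches can be summarized by which spatial layers a receiver needs in order to decode a given target layer. The following is a minimal sketch of that rule only; the mode names and the `layers_needed` helper are hypothetical labels for this illustration, not WebRTC identifiers.

```python
def layers_needed(mode, target_layer, is_key_frame):
    """Spatial layers (0 = lowest resolution) needed to decode target_layer."""
    lower_and_target = list(range(target_layer + 1))
    if mode == "simulcast":
        # Independent bitstreams: only the chosen encoding is needed.
        return [target_layer]
    if mode == "full_svc":
        # Inter-layer prediction on every frame.
        return lower_and_target
    if mode == "k_svc":
        # Inter-layer prediction on key frames only.
        return lower_and_target if is_key_frame else [target_layer]
    raise ValueError(f"unknown mode: {mode}")


key = layers_needed("k_svc", target_layer=2, is_key_frame=True)     # [0, 1, 2]
delta = layers_needed("k_svc", target_layer=2, is_key_frame=False)  # [2]
```

On delta frames a `K-SVC` receiver is served like a simulcast receiver, while on key frames the sender still benefits from inter-layer prediction.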
| |
| ## Overview of implementation in `modules/video_coding` |
| |
| Given the general introduction to video coding above, we now describe some |
| specifics of the [`modules/video_coding`][modules-video-coding] folder in WebRTC. |
| |
| ### Built-in software codecs in [`modules/video_coding/codecs`][modules-video-coding-codecs] |
| |
| This folder contains WebRTC-specific classes that wrap software codec |
| implementations for different video coding standards: |
| |
| * [libaom][libaom-src] for [AV1][av1-spec] |
| * [libvpx][libvpx-src] for [VP8][vp8-spec] and [VP9][vp9-spec] |
| * [OpenH264][openh264-src] for [H.264 constrained baseline profile][h264-spec] |
| |
| Users of the library can also inject their own codecs, using the |
| [VideoEncoderFactory][video-encoder-factory-interface] and |
| [VideoDecoderFactory][video-decoder-factory-interface] interfaces. This is how |
| platform-supported codecs, such as hardware-backed codecs, are implemented. |
| |
| ### Video codec test framework in [`modules/video_coding/codecs/test`][modules-video-coding-codecs-test] |
| |
| This folder contains a test framework that can be used to evaluate video quality |
| performance of different video codec implementations. |
| |
| ### SVC helper classes in [`modules/video_coding/svc`][modules-video-coding-svc] |
| |
| * [`ScalabilityStructure*`][scalabilitystructure] - different |
| [standardized scalability structures][scalability-structure-spec] |
| * [`ScalableVideoController`][scalablevideocontroller] - instructs the video encoder on how |
| to create a scalable stream |
| * [`SvcRateAllocator`][svcrateallocator] - allocates bitrate to different spatial and temporal |
| layers |
| |
| ### Utility classes in [`modules/video_coding/utility`][modules-video-coding-utility] |
| |
| * [`FrameDropper`][framedropper] - drops incoming frames when the encoder systematically |
| overshoots its target bitrate |
| * [`FramerateController`][frameratecontroller] - drops incoming frames to achieve a target framerate |
| * [`QpParser`][qpparser] - parses the quantization parameter from a bitstream |
| * [`QualityScaler`][qualityscaler] - signals when an encoder generates encoded frames whose |
| quantization parameter is outside the window of acceptable values |
| * [`SimulcastRateAllocator`][simulcastrateallocator] - allocates bitrate to simulcast layers |
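
To make the idea of dropping frames to reach a target framerate concrete, here is a minimal sketch. It is not the `FramerateController` implementation; the class name, its method, and the minimum-interval rule are assumptions for illustration. Timestamps are in units of a 90 kHz clock, the rate commonly used for video RTP timestamps.

```python
class SimpleFramerateLimiter:
    """Drops frames that arrive too soon after the last kept frame."""

    def __init__(self, target_fps, clock_hz=90_000):
        self.min_interval = clock_hz // target_fps
        self.last_kept = None

    def should_drop(self, timestamp):
        too_soon = (self.last_kept is not None
                    and timestamp - self.last_kept < self.min_interval)
        if too_soon:
            return True
        self.last_kept = timestamp
        return False


limiter = SimpleFramerateLimiter(target_fps=15)
timestamps = [i * 3000 for i in range(6)]  # 30 fps input
kept = [t for t in timestamps if not limiter.should_drop(t)]
```

With a 30 fps input and a 15 fps target, every second frame is kept, so `kept` holds the timestamps 0, 6000 and 12000.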
| |
| ### General helper classes in [`modules/video_coding`][modules-video-coding] |
| |
| * [`FecControllerDefault`][feccontrollerdefault] - provides a default implementation for rate |
| allocation to [forward error correction][fec-wiki] |
| * [`VideoCodecInitializer`][videocodecinitializer] - converts between different encoder configuration |
| structs |
| |
| ### Receiver buffer classes in [`modules/video_coding`][modules-video-coding] |
| |
| * [`PacketBuffer`][packetbuffer] - (re-)combines RTP packets into frames |
| * [`RtpFrameReferenceFinder`][rtpframereferencefinder] - determines dependencies between frames based on information in the RTP header, payload header and RTP extensions |
| * [`FrameBuffer`][framebuffer] - orders frames based on their dependencies before they are fed to the decoder |
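
The ordering step at the end of this pipeline can be sketched as follows. This is a toy model rather than the `FrameBuffer` API: it assumes that frames are already complete, that their references are known, and that a frame is handed to the decoder as soon as all of its references have been handed over. The class and method names are hypothetical.

```python
class SimpleFrameBuffer:
    """Holds complete frames until all of their references were released."""

    def __init__(self):
        self.pending = {}     # frame id -> set of frame ids it references
        self.released = set()

    def insert(self, frame_id, references):
        """Add a frame; return the frame ids that are now decodable, in order."""
        self.pending[frame_id] = set(references)
        output = []
        progress = True
        while progress:
            progress = False
            for candidate in sorted(self.pending):
                if self.pending[candidate] <= self.released:
                    del self.pending[candidate]
                    self.released.add(candidate)
                    output.append(candidate)
                    progress = True
                    break
        return output


buffer = SimpleFrameBuffer()
first = buffer.insert(2, references=[1])  # waits for frame 1
second = buffer.insert(1, references=[0]) # waits for frame 0
third = buffer.insert(0, references=[])   # key frame unblocks the chain
```

Frames 2 and 1 arrive before the key frame and are held back; once frame 0 is inserted, all three are released in dependency order.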
| |
| [video-coding-wiki]: https://en.wikipedia.org/wiki/Video_coding_format |
| [motion-compensation-wiki]: https://en.wikipedia.org/wiki/Motion_compensation |
| [transform-coding-wiki]: https://en.wikipedia.org/wiki/Transform_coding |
| [motion-vector-wiki]: https://en.wikipedia.org/wiki/Motion_vector |
| [mpeg-wiki]: https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group |
| [svc-wiki]: https://en.wikipedia.org/wiki/Scalable_Video_Coding |
| [sfu-webrtc-glossary]: https://webrtcglossary.com/sfu/ |
| [libvpx-src]: https://chromium.googlesource.com/webm/libvpx/ |
| [libaom-src]: https://aomedia.googlesource.com/aom/ |
| [openh264-src]: https://github.com/cisco/openh264 |
| [vp8-spec]: https://tools.ietf.org/html/rfc6386 |
| [vp9-spec]: https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf |
| [av1-spec]: https://aomediacodec.github.io/av1-spec/ |
| [h264-spec]: https://www.itu.int/rec/T-REC-H.264-201906-I/en |
| [video-encoder-factory-interface]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video_codecs/video_encoder_factory.h;l=27;drc=afadfb24a5e608da6ae102b20b0add53a083dcf3 |
| [video-decoder-factory-interface]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video_codecs/video_decoder_factory.h;l=27;drc=49c293f03d8f593aa3aca282577fcb14daa63207 |
| [scalability-structure-spec]: https://w3c.github.io/webrtc-svc/#scalabilitymodes* |
| [fec-wiki]: https://en.wikipedia.org/wiki/Error_correction_code#Forward_error_correction |
| [entropy-coding-wiki]: https://en.wikipedia.org/wiki/Entropy_encoding |
| [modules-video-coding]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/ |
| [modules-video-coding-codecs]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/codecs/ |
| [modules-video-coding-codecs-test]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/codecs/test/ |
| [modules-video-coding-svc]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/ |
| [modules-video-coding-utility]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/ |
| [scalabilitystructure]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/create_scalability_structure.h?q=CreateScalabilityStructure |
| [scalablevideocontroller]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/scalable_video_controller.h?q=ScalableVideoController |
| [svcrateallocator]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/svc_rate_allocator.h?q=SvcRateAllocator |
| [framedropper]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/frame_dropper.h?q=FrameDropper |
| [frameratecontroller]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/framerate_controller.h?q=FramerateController |
| [qpparser]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/qp_parser.h?q=QpParser |
| [qualityscaler]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/quality_scaler.h?q=QualityScaler |
| [simulcastrateallocator]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/simulcast_rate_allocator.h?q=SimulcastRateAllocator |
| [feccontrollerdefault]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/fec_controller_default.h?q=FecControllerDefault |
| [videocodecinitializer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/include/video_codec_initializer.h?q=VideoCodecInitializer |
| [packetbuffer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/packet_buffer.h?q=PacketBuffer |
| [rtpframereferencefinder]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/rtp_frame_reference_finder.h?q=RtpFrameReferenceFinder |
| [framebuffer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video/frame_buffer.h |
| [quantization-wiki]: https://en.wikipedia.org/wiki/Quantization_(signal_processing) |