PeerConnection level framework fixture architecture

Overview

The main implementation of webrtc::webrtc_pc_e2e::PeerConnectionE2EQualityTestFixture is webrtc::webrtc_pc_e2e::PeerConnectionE2EQualityTest. Internally it owns the following main pieces:

It also keeps a reference to webrtc::TimeController, which is used to create all required threads, task queues, task queue factories and time-related objects.

TestPeer

Call participants are represented by instances of the TestPeer object, which are created by TestPeerFactory. TestPeer owns all instances related to the webrtc::PeerConnection, including the required listeners and callbacks. It also provides an API for the offer/answer exchange and the ICE candidate exchange. For this purpose it internally uses an instance of webrtc::PeerConnectionWrapper.

The TestPeer also owns the PeerConnection worker thread. The signaling thread for all PeerConnections is owned by PeerConnectionE2EQualityTestFixture and shared between all participants in the call. The network thread is owned by the network layer (it may be either an emulated network provided by the Network Emulation Framework or a network thread and rtc::NetworkManager provided by the user) and is provided when the peer is added to the fixture via the AddPeer(...) API.

GetStats API based metrics reporters

PeerConnectionE2EQualityTestFixture lets the user provide different QualityMetricsReporters, which listen to the PeerConnection GetStats API. Such reporters can then report whatever metrics the user wants to measure.

PeerConnectionE2EQualityTestFixture itself also uses this mechanism to measure:

The framework also provides a StatsBasedNetworkQualityMetricsReporter to measure network-related WebRTC metrics and print raw debug statistics of the emulated network. This reporter should be added by the user via the AddQualityMetricsReporter(...) API if required.

Internally, stats gathering is done by StatsPoller. Stats are requested once per second for each PeerConnection, and the resulting object is then passed to each stats listener.
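The polling pattern described above can be sketched as follows. This is a minimal illustration only, not the real StatsPoller API: StatsPollerSketch, StatsReportSketch, PeerEntrySketch and the callback signatures are hypothetical names introduced here, and the once-per-second scheduling is left to the caller.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stats report: metric name -> value.
using StatsReportSketch = std::map<std::string, double>;

// Hypothetical listener signature: (peer name, report for that peer).
using StatsListenerSketch =
    std::function<void(const std::string&, const StatsReportSketch&)>;

// Hypothetical record of one polled PeerConnection.
struct PeerEntrySketch {
  std::string name;
  std::function<StatsReportSketch()> get_stats;
};

// Illustrative poller; it only shows the fan-out of reports to listeners.
class StatsPollerSketch {
 public:
  void AddPeer(std::string name, std::function<StatsReportSketch()> get_stats) {
    assert(get_stats);  // A peer must be able to produce a report.
    peers_.push_back({std::move(name), std::move(get_stats)});
  }

  void AddListener(StatsListenerSketch listener) {
    assert(listener);
    listeners_.push_back(std::move(listener));
  }

  // One polling round. A caller would invoke this once per second.
  void PollOnce() const {
    for (const PeerEntrySketch& peer : peers_) {
      const StatsReportSketch report = peer.get_stats();
      for (const StatsListenerSketch& listener : listeners_) {
        listener(peer.name, report);
      }
    }
  }

 private:
  std::vector<PeerEntrySketch> peers_;
  std::vector<StatsListenerSketch> listeners_;
};
```

With two peers and two listeners, a single PollOnce() call requests two reports and makes four listener calls, so every listener sees every peer's report.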

Offer/Answer exchange

PeerConnectionE2EQualityTest provides the ability to test Simulcast and SVC for video. These features aren't supported by a P2P call and in general require a Selective Forwarding Unit (SFU), so special logic is applied to mimic SFU behavior in a P2P call. This logic is located inside SignalingInterceptor, QualityAnalyzingVideoEncoder and QualityAnalyzingVideoDecoder and consists of SDP modification during the offer/answer exchange and special handling of video frames from unrelated Simulcast/SVC streams during decoding.

Simulcast

With Simulcast there is a single video track that internally contains multiple video streams, for example low, medium and high resolution. The WebRTC client doesn't support receiving an offer with multiple streams in it, because an SFU usually keeps only a single stream for the client. To work around this, the framework modifies the offer by converting a single track with three video streams into three independent video tracks. The sender then thinks that it sends simulcast, while the receiver thinks that it receives three independent tracks.

To achieve such behavior some extra tweaks are required:

  • The MID RTP header extension from the original offer has to be removed.
  • The RID RTP header extension from the original offer is replaced with the MID RTP header extension, so the ID that the sender uses for RID will be parsed as MID on the receiver.
  • The answer has to be modified in the opposite way.
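The header extension tweaks for the offer can be sketched as below. This is an illustration under stated assumptions, not the SignalingInterceptor code: RtpExtensionSketch and RewriteOfferExtensions are hypothetical names, and the sketch works on a plain list of (URI, id) pairs rather than on a real session description. The two URIs are the standard SDES MID and RTP stream ID header extension URIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Standard header extension URIs for MID and RID.
const std::string kMidUri = "urn:ietf:params:rtp-hdrext:sdes:mid";
const std::string kRidUri = "urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id";

// Hypothetical, simplified view of one "a=extmap" entry.
struct RtpExtensionSketch {
  std::string uri;
  int id;
};

// Drops the original MID extension and re-labels the RID extension as MID,
// keeping the RID's numeric id so the receiver parses RID values as MID.
std::vector<RtpExtensionSketch> RewriteOfferExtensions(
    const std::vector<RtpExtensionSketch>& offer_extensions) {
  std::vector<RtpExtensionSketch> result;
  for (const RtpExtensionSketch& ext : offer_extensions) {
    assert(ext.id >= 1);  // Header extension ids start at 1.
    if (ext.uri == kMidUri) {
      continue;  // Original MID extension is removed.
    }
    if (ext.uri == kRidUri) {
      result.push_back({kMidUri, ext.id});  // RID is advertised as MID.
      continue;
    }
    result.push_back(ext);  // Everything else is left untouched.
  }
  return result;
}
```

The answer would need the reverse mapping, which is omitted here because it requires remembering the extension ids from the original offer.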

The described modifications are illustrated in the figure below.

Figure: VP8 Simulcast offer modification

The exchange will look like this:

  1. Alice creates an offer
  2. Alice sets offer as local description
  3. The framework applies the described offer modification
  4. Alice sends modified offer to Bob
  5. Bob sets modified offer as remote description
  6. Bob creates answer
  7. Bob sets answer as local description
  8. The framework applies the reverse modifications to the answer
  9. Bob sends modified answer to Alice
  10. Alice sets modified answer as remote description
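The ordering of these steps determines which version of each description every side ends up with. The sketch below models it with plain strings. All names (PeerSketch, SdpTransformSketch, ExchangeSketch) are hypothetical and descriptions are reduced to strings, so this shows the sequence only, not real SDP handling.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical peer that only remembers the descriptions it was given.
struct PeerSketch {
  std::string local_description;
  std::string remote_description;
};

// Hypothetical transform over a description, modeled as a plain string.
using SdpTransformSketch = std::function<std::string(const std::string&)>;

// Runs the exchange in the order listed above. `offer` is the result of
// step 1 (Alice creates an offer).
void ExchangeSketch(PeerSketch& alice, PeerSketch& bob,
                    const std::string& offer,
                    const SdpTransformSketch& create_answer,
                    const SdpTransformSketch& modify_offer,
                    const SdpTransformSketch& modify_answer) {
  assert(create_answer && modify_offer && modify_answer);
  alice.local_description = offer;                              // Step 2.
  const std::string modified_offer = modify_offer(offer);       // Step 3.
  bob.remote_description = modified_offer;                      // Steps 4-5.
  const std::string answer = create_answer(modified_offer);     // Step 6.
  bob.local_description = answer;                               // Step 7.
  const std::string modified_answer = modify_answer(answer);    // Step 8.
  alice.remote_description = modified_answer;                   // Steps 9-10.
}
```

Note that Alice keeps the unmodified offer as her local description, while Bob only ever sees the modified one; the same asymmetry applies to the answer in the other direction.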

This mechanism imposes the constraint that RTX streams are not supported, because they don't have the RID RTP header extension in their packets.

SVC

In the case of SVC, the framework updates the sender's offer even before setting it as the local description on the sender side. No changes to the answer are then required.

An SSRC is a 32-bit random value generated in RTP to identify a specific source used to send media in an RTP connection. In the original offer the video track section looks like this:

m=video 9 UDP/TLS/RTP/SAVPF 98 100 99 101
...
a=ssrc-group:FID <primary ssrc> <retransmission ssrc>
a=ssrc:<primary ssrc> cname:...
....
a=ssrc:<retransmission ssrc> cname:...
....

To enable SVC for such a video track, the framework adds extra SSRCs for each required SVC stream, like this:

a=ssrc-group:FID <Low resolution primary ssrc> <Low resolution retransmission ssrc>
a=ssrc:<Low resolution primary ssrc> cname:...
....
a=ssrc:<Low resolution retransmission ssrc> cname:....
...
a=ssrc-group:FID <Medium resolution primary ssrc> <Medium resolution retransmission ssrc>
a=ssrc:<Medium resolution primary ssrc> cname:...
....
a=ssrc:<Medium resolution retransmission ssrc> cname:....
...
a=ssrc-group:FID <High resolution primary ssrc> <High resolution retransmission ssrc>
a=ssrc:<High resolution primary ssrc> cname:...
....
a=ssrc:<High resolution retransmission ssrc> cname:....
...

The following line is also added to the video track section of the offer:

a=ssrc-group:SIM <Low resolution primary ssrc> <Medium resolution primary ssrc> <High resolution primary ssrc>

This line tells PeerConnection that the track should be configured as SVC. The approach relies on the WebRTC Plan B offer structure to achieve SVC behavior, and it modifies the offer before setting it as the local description, which violates the WebRTC standard. It also adds a limitation: on lossy networks only the top resolution streams can be analyzed, because WebRTC won't try to restore lower resolution streams after a loss while it still receives a higher one.
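The SSRC lines shown above can be generated mechanically. The following sketch builds them for an arbitrary number of layers; LayerSsrcsSketch and BuildSvcSsrcLines are hypothetical names, the SSRC values and cname are supplied by the caller, and the position of the SIM line after the per-layer groups is an assumption of this example rather than something the framework is known to guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical pair of SSRCs for one SVC layer, ordered low to high.
struct LayerSsrcsSketch {
  uint32_t primary;
  uint32_t retransmission;
};

// Builds the "a=ssrc-group:FID", "a=ssrc" and "a=ssrc-group:SIM" lines for
// the given layers. Other attributes of the section are not produced.
std::vector<std::string> BuildSvcSsrcLines(
    const std::vector<LayerSsrcsSketch>& layers, const std::string& cname) {
  assert(!layers.empty());
  std::vector<std::string> lines;
  std::string sim_group = "a=ssrc-group:SIM";
  for (const LayerSsrcsSketch& layer : layers) {
    const std::string primary = std::to_string(layer.primary);
    const std::string rtx = std::to_string(layer.retransmission);
    lines.push_back("a=ssrc-group:FID " + primary + " " + rtx);
    lines.push_back("a=ssrc:" + primary + " cname:" + cname);
    lines.push_back("a=ssrc:" + rtx + " cname:" + cname);
    sim_group += " " + primary;  // SIM lists primary SSRCs only.
  }
  lines.push_back(sim_group);
  return lines;
}
```

For three layers this yields three FID groups with their SSRC lines, followed by a single SIM group that lists the three primary SSRCs from low to high resolution.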

Handling in encoder/decoder

In the encoder, for each encoded video frame the framework propagates the information required for the fake SFU to know whether the frame belongs to an interesting simulcast stream/spatial layer or should be “discarded”.

On the decoder side, frames that should be “discarded” by the fake SFU are automatically decoded into single-pixel images, and only the interesting simulcast stream/spatial layer goes into the real decoder and is then analyzed.
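The encoder/decoder cooperation can be illustrated with a small model. This is a sketch under assumptions: EncodedFrameSketch, DecodedImageSketch, MarkOnEncode and DecodeSketch are hypothetical names, the "discard" decision is carried as a plain flag on the frame, and real decoding is replaced by copying the frame dimensions.

```cpp
#include <cassert>

// Hypothetical encoded frame with the metadata the fake SFU needs.
struct EncodedFrameSketch {
  int stream_index;  // Simulcast stream or spatial layer index.
  int width;
  int height;
  bool discard;      // Set on the encoder side.
};

// Hypothetical decoder output.
struct DecodedImageSketch {
  int width;
  int height;
  bool analyzed;  // True only for frames that reached the "real" decoder.
};

// Encoder side: mark every frame that is not from the interesting stream.
EncodedFrameSketch MarkOnEncode(EncodedFrameSketch frame,
                                int interesting_index) {
  frame.discard = frame.stream_index != interesting_index;
  return frame;
}

// Decoder side: discarded frames become 1x1 images, others are "decoded".
DecodedImageSketch DecodeSketch(const EncodedFrameSketch& frame) {
  assert(frame.width > 0 && frame.height > 0);
  if (frame.discard) {
    return {1, 1, false};  // Single-pixel placeholder, skipped by analysis.
  }
  return {frame.width, frame.height, true};  // Stands in for real decoding.
}
```

Producing a placeholder image instead of dropping the frame keeps the decoder's output cadence intact while still excluding uninteresting streams from analysis.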