| # Absolute Capture Time |
| |
| The Absolute Capture Time extension is used to stamp RTP packets with an NTP |
| timestamp showing when the first audio or video frame in a packet was originally |
| captured. The intent of this extension is to provide a way to accomplish |
| audio-to-video synchronization when RTCP-terminating intermediate systems (e.g. |
| mixers) are involved. |
| |
| **Name:** |
| "Absolute Capture Time"; "RTP Header Extension for Absolute Capture Time" |
| |
| **Formal name:** |
| <http://www.webrtc.org/experiments/rtp-hdrext/abs-capture-time> |
| |
| **Status:** |
| This extension is defined here to allow for experimentation. Once experience has |
| shown that it is useful, we intend to make a proposal based on it for |
| standardization in the IETF. |
| |
| Contact <chxg@google.com> for more info. |
| |
| ## RTP header extension format |
| |
| ### Data layout overview |
| Data layout of the shortened version of `abs-capture-time` with a 1-byte header |
| \+ 8 bytes of data: |
| |
|      0                   1                   2                   3 |
|      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |  ID   | len=7 |     absolute capture timestamp (bit 0-23)     | |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |            absolute capture timestamp (bit 24-55)             | |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |  ... (56-63)  | |
|     +-+-+-+-+-+-+-+-+ |
| |
| Data layout of the extended version of `abs-capture-time` with a 1-byte header + |
| 16 bytes of data: |
| |
|      0                   1                   2                   3 |
|      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |  ID   | len=15|     absolute capture timestamp (bit 0-23)     | |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |            absolute capture timestamp (bit 24-55)             | |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |  ... (56-63)  |   estimated capture clock offset (bit 0-23)   | |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |          estimated capture clock offset (bit 24-55)           | |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |  ... (56-63)  | |
|     +-+-+-+-+-+-+-+-+ |
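
As a rough illustration of the two layouts above, the following Python sketch decodes the data bytes of the extension, that is, the 8 or 16 bytes that follow the one-byte header. In the one-byte header form, `len` stores the data length minus one, which is why the diagrams show `len=7` and `len=15`. The function name `parse_abs_capture_time` and its return shape are illustrative assumptions, not part of any library, and the fields are assumed to be in network byte order (big-endian), as is usual for RTP.

```python
import struct
from typing import Optional, Tuple


def parse_abs_capture_time(data: bytes) -> Tuple[int, Optional[int]]:
    """Decode the extension data (one-byte header already stripped).

    Returns (absolute_capture_timestamp, estimated_capture_clock_offset)
    as raw Q32.32 integers. The offset is None for the shortened version.
    Hypothetical helper for illustration only.
    """
    if len(data) == 8:
        # Shortened version: unsigned 64-bit timestamp only.
        (timestamp,) = struct.unpack("!Q", data)
        return timestamp, None
    if len(data) == 16:
        # Extended version: unsigned timestamp followed by a signed offset.
        timestamp, offset = struct.unpack("!Qq", data)
        return timestamp, offset
    raise ValueError(f"unexpected abs-capture-time length: {len(data)}")
```

Returning the raw integers keeps the sketch lossless; converting to seconds is left to the caller.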
| |
| ### Data layout details |
| #### Absolute capture timestamp |
| |
| Absolute capture timestamp is the NTP timestamp of when the first frame in a |
| packet was originally captured. This timestamp MUST be based on the same clock |
| as the clock used to generate NTP timestamps for RTCP sender reports on the |
| capture system. |
| |
| It's not always possible to read the NTP clock at the exact moment a media |
| frame is captured. A capture system MAY postpone the readout until a more |
| convenient time. A capture system SHOULD subtract known delays (e.g. from |
| hardware buffers) from the readout to make the final timestamp as close to the |
| actual capture time as possible. |
| |
| This field is encoded as a 64-bit unsigned fixed-point number with the high 32 |
| bits for the timestamp in seconds and low 32 bits for the fractional part. This |
| is also known as the UQ32.32 format and is what the RTP specification defines as |
| the canonical format to represent NTP timestamps. |
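
To make the UQ32.32 encoding concrete, here is a minimal Python sketch that converts between the raw 64-bit field and seconds, and subtracts a known capture delay from a clock readout. All three function names are hypothetical. Note that a Python `float` carries about 53 significant bits, so converting present-day NTP values to float seconds drops some of the 32 fractional bits; code that needs full precision would stay with the raw integers.

```python
def uq32x32_to_seconds(raw: int) -> float:
    """Interpret a raw UQ32.32 value as seconds (high 32 bits) plus fraction."""
    return raw / 2**32


def seconds_to_uq32x32(seconds: float) -> int:
    """Encode a non-negative number of seconds as a raw UQ32.32 value."""
    raw = round(seconds * 2**32)
    if not 0 <= raw < 2**64:
        raise ValueError("value does not fit in UQ32.32")
    return raw


def capture_timestamp(readout: int, known_delay_s: float) -> int:
    """Subtract a known delay (e.g. hardware buffering) from a late readout.

    `readout` is the raw UQ32.32 NTP clock value taken after capture.
    """
    return readout - seconds_to_uq32x32(known_delay_s)
```

For example, `uq32x32_to_seconds(0x0000000180000000)` gives `1.5`: the high word is 1 second and the low word is half of 2^32.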
| |
| #### Estimated capture clock offset |
| |
| Estimated capture clock offset is the sender's estimate of the offset between |
| its own NTP clock and the capture system's NTP clock. The sender is here defined |
| as the system that owns the NTP clock used to generate the NTP timestamps for |
| the RTCP sender reports on this stream. The sender system is typically either |
| the capture system or a mixer. |
| |
| This field is encoded as a 64-bit two’s complement **signed** fixed-point number |
| with the high 32 bits for the seconds and low 32 bits for the fractional part. |
| It’s intended to make it easy for a receiver that knows how to estimate the |
| sender system’s NTP clock to also estimate the capture system’s NTP clock: |
| |
| Capture NTP Clock = Sender NTP Clock + Capture Clock Offset |
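
The formula above can be sketched in Python as follows. Both helper names are hypothetical, the offset field is assumed to arrive as 8 big-endian bytes, and all values are kept as raw Q32.32 integers so that the addition is exact.

```python
def decode_capture_clock_offset(field: bytes) -> int:
    """Decode the 8-byte two's complement signed Q32.32 offset field."""
    if len(field) != 8:
        raise ValueError("offset field must be 8 bytes")
    return int.from_bytes(field, byteorder="big", signed=True)


def estimate_capture_ntp_clock(sender_ntp_clock: int,
                               capture_clock_offset: int) -> int:
    """Capture NTP Clock = Sender NTP Clock + Capture Clock Offset.

    `sender_ntp_clock` is the receiver's estimate of the sender's NTP clock
    as raw UQ32.32; the offset is raw signed Q32.32.
    """
    return sender_ntp_clock + capture_clock_offset
```

With a sender clock estimate of 10 s and an offset field of `ff ff ff ff 80 00 00 00` (that is, -0.5 s), the estimated capture clock is 9.5 s.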
| |
| ### Further details |
| |
| #### Capture system |
| |
| A receiver MUST treat the first CSRC in the CSRC list of a received packet as if |
| it belongs to the capture system. If the CSRC list is empty, then the receiver |
| MUST treat the SSRC as if it belongs to the capture system. Mixers SHOULD put |
| the most prominent CSRC as the first CSRC in a packet’s CSRC list. |
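
The receiver-side rule above is small enough to express directly. In this sketch the function name `capture_source` is an assumption, and the SSRC and CSRC list are taken to be already parsed from the RTP header.

```python
from typing import Sequence


def capture_source(ssrc: int, csrcs: Sequence[int]) -> int:
    """Return the identifier a receiver treats as the capture system.

    The first CSRC wins if the CSRC list is non-empty; otherwise the SSRC.
    """
    return csrcs[0] if csrcs else ssrc
```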
| |
| #### Intermediate systems |
| |
| An intermediate system (e.g. mixer) MAY adjust these timestamps as needed. It |
| MAY also choose to rewrite the timestamps completely, using its own NTP clock as |
| reference clock, if it wants to present itself as a capture system for A/V-sync |
| purposes. |
| |
| #### Timestamp interpolation |
| |
| A sender SHOULD save bandwidth by not sending `abs-capture-time` with every |
| RTP packet. It SHOULD still send them at regular intervals (e.g. every second) |
| to help mitigate the impact of clock drift and packet loss. Mixers SHOULD always |
| send `abs-capture-time` with the first RTP packet after changing capture system. |
| |
| A receiver SHOULD memorize the capture system (i.e. CSRC/SSRC), capture |
| timestamp, and RTP timestamp of the most recently received `abs-capture-time` |
| packet on each received stream. It can then use that information, in combination |
| with RTP timestamps of packets without `abs-capture-time`, to extrapolate |
| missing capture timestamps. |
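
One way a receiver might do this is sketched below in Python. The class and function names are hypothetical, and the RTP clock rate is assumed to be known from the negotiated payload type. The RTP timestamp difference is treated as a signed 32-bit value so that wrap-around and slightly older packets are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastAbsCaptureTime:
    """State remembered from the most recent `abs-capture-time` packet."""
    source: int             # CSRC/SSRC treated as the capture system
    capture_timestamp: int  # raw UQ32.32
    rtp_timestamp: int      # 32-bit RTP timestamp


def extrapolate_capture_timestamp(last: Optional[LastAbsCaptureTime],
                                  source: int,
                                  rtp_timestamp: int,
                                  rtp_clock_rate: int) -> Optional[int]:
    """Estimate the capture timestamp of a packet without the extension.

    Returns None when nothing usable is remembered, e.g. after the
    capture system has changed.
    """
    if last is None or last.source != source:
        return None
    # Signed difference modulo 2^32 to cope with RTP timestamp wrap-around.
    diff = (rtp_timestamp - last.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    # Convert RTP ticks to Q32.32 seconds using integer arithmetic.
    return last.capture_timestamp + (diff << 32) // rtp_clock_rate
```

For a 48 kHz audio stream, a packet whose RTP timestamp is 48000 ticks past the remembered one is estimated to have been captured exactly one second later.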
| |
| Timestamp interpolation works well only as long as NTP/RTP clock drift is |
| reasonably low, which is not always the case. A sender that detects a "jump" in |
| the mapping between its NTP and RTP clocks SHOULD send `abs-capture-time` with |
| the first RTP packet after the jump. |
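
The sender-side recommendations in this section could be combined into a single decision function, sketched here in Python. The function name, its parameters, and the one-second default are illustrative. How a capture-system change or a clock jump is detected is outside this sketch, and sending the extension on the very first packet is an added assumption (a receiver has nothing to extrapolate from before then).

```python
from typing import Optional


def should_send_abs_capture_time(now_s: float,
                                 last_sent_s: Optional[float],
                                 capture_system_changed: bool,
                                 clock_jump_detected: bool,
                                 interval_s: float = 1.0) -> bool:
    """Decide whether the next RTP packet should carry `abs-capture-time`."""
    if last_sent_s is None:
        # Assumption: nothing has been sent yet on this stream.
        return True
    if capture_system_changed or clock_jump_detected:
        # First packet after a capture-system change or an NTP/RTP jump.
        return True
    # Otherwise send only at regular intervals to save bandwidth.
    return now_s - last_sent_s >= interval_s
```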