Allow full SVC to reference T0 frame only after it has been encoded

The VP9 encoder may drop a frame internally which will not advance the
frame pattern. Consider the following scenario where only spatial layer
0 and temporal layer 0 is active:

1. Key frame encoded
2. Spatial layer 1 is activated
3. Delta T0 dropped
4. Delta T0 encoded

No S1T0 frame is encoded in (1) since it's not active. When
NextFrameConfig is called in (3) it will say that future frames may
reference T0 on both S0 and S1, but it's then dropped.

On step (4), the SVC controller essentially thinks it's encoding a new
picture and will happily reference the T0 on what it thinks is the first
delta frame. However, this is actually still the key frame and since
there was no S1T0 frame produced it will reference an invalid buffer.

To fix this, only say it's possible to reference a T0 frame after it has
been successfully encoded.

Bug: webrtc:11999, webrtc:13142, chromium:1178444
Change-Id: Iab3d2042ce0b3fa7d952b2831d1a36b1a6613a86
Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/231695
Reviewed-by: Erik Språng <sprang@webrtc.org>
Reviewed-by: Danil Chapovalov <danilchap@webrtc.org>
Commit-Queue: Emil Lundmark <lndmrk@webrtc.org>
Cr-Commit-Position: refs/heads/main@{#34982}
diff --git a/modules/video_coding/svc/scalability_structure_full_svc.cc b/modules/video_coding/svc/scalability_structure_full_svc.cc
index 9f66a5a..7177427 100644
--- a/modules/video_coding/svc/scalability_structure_full_svc.cc
+++ b/modules/video_coding/svc/scalability_structure_full_svc.cc
@@ -173,7 +173,6 @@
           config.Update(BufferIndex(sid, /*tid=*/0));
         }
 
-        can_reference_t0_frame_for_spatial_id_.set(sid);
         spatial_dependency_buffer_id = BufferIndex(sid, /*tid=*/0);
       }
       break;
@@ -256,6 +255,9 @@
   // pattern defered here from the `NextFrameConfig`.
   // In particular creating VP9 references rely on this behavior.
   last_pattern_ = static_cast<FramePattern>(config.Id());
+  if (config.TemporalId() == 0) {
+    can_reference_t0_frame_for_spatial_id_.set(config.SpatialId());
+  }
   if (config.TemporalId() == 1) {
     can_reference_t1_frame_for_spatial_id_.set(config.SpatialId());
   }
diff --git a/modules/video_coding/svc/scalability_structure_full_svc_unittest.cc b/modules/video_coding/svc/scalability_structure_full_svc_unittest.cc
index 9ccbe21..1c0a8be 100644
--- a/modules/video_coding/svc/scalability_structure_full_svc_unittest.cc
+++ b/modules/video_coding/svc/scalability_structure_full_svc_unittest.cc
@@ -21,6 +21,27 @@
 using ::testing::IsEmpty;
 using ::testing::SizeIs;
 
+TEST(ScalabilityStructureL3T3Test, SkipT0FrameByEncoderKeepsReferencesValid) {
+  std::vector<GenericFrameInfo> frames;
+  ScalabilityStructureL3T3 structure;
+  ScalabilityStructureWrapper wrapper(structure);
+
+  // Only S0T0 decode target is enabled.
+  structure.OnRatesUpdated(EnableTemporalLayers(/*s0=*/1, /*s1=*/0));
+  // Encoder generates S0T0 key frame.
+  wrapper.GenerateFrames(/*num_temporal_units=*/1, frames);
+  EXPECT_THAT(frames, SizeIs(1));
+  // Spatial layers 1 is enabled.
+  structure.OnRatesUpdated(EnableTemporalLayers(/*s0=*/1, /*s1=*/1));
+  // Encoder tries to generate S0T0 and S1T0 delta frames but they are dropped.
+  structure.NextFrameConfig(/*restart=*/false);
+  // Encoder successfully generates S0T0 and S1T0 delta frames.
+  wrapper.GenerateFrames(/*num_temporal_units=*/1, frames);
+  EXPECT_THAT(frames, SizeIs(3));
+
+  EXPECT_TRUE(wrapper.FrameReferencesAreValid(frames));
+}
+
 TEST(ScalabilityStructureL3T3Test, SkipS1T1FrameKeepsStructureValid) {
   ScalabilityStructureL3T3 structure;
   ScalabilityStructureWrapper wrapper(structure);
diff --git a/test/fuzzers/BUILD.gn b/test/fuzzers/BUILD.gn
index 8300fca..019b845 100644
--- a/test/fuzzers/BUILD.gn
+++ b/test/fuzzers/BUILD.gn
@@ -648,6 +648,7 @@
       "//third_party/abseil-cpp/absl/base:core_headers",
       "//third_party/abseil-cpp/absl/container:inlined_vector",
     ]
+    seed_corpus = "corpora/vp9-encoder-references-corpus"
     defines = [ "RTC_ENABLE_VP9" ]
   }
 }
diff --git a/test/fuzzers/corpora/vp9-encoder-references-corpus/0cee4d5fd2905dc1fb2979f10a9724265b7075e2 b/test/fuzzers/corpora/vp9-encoder-references-corpus/0cee4d5fd2905dc1fb2979f10a9724265b7075e2
new file mode 100644
index 0000000..febe4ad
--- /dev/null
+++ b/test/fuzzers/corpora/vp9-encoder-references-corpus/0cee4d5fd2905dc1fb2979f10a9724265b7075e2
Binary files differ
diff --git a/test/fuzzers/corpora/vp9-encoder-references-corpus/a8b3fb7be82395c9462684c766841d668dc0029f b/test/fuzzers/corpora/vp9-encoder-references-corpus/a8b3fb7be82395c9462684c766841d668dc0029f
new file mode 100644
index 0000000..1bd0937
--- /dev/null
+++ b/test/fuzzers/corpora/vp9-encoder-references-corpus/a8b3fb7be82395c9462684c766841d668dc0029f
Binary files differ