Avoid unnecessary HW video encoder reconfiguration

This change reduces the number of times the Android hardware video
encoder is reconfigured when making an outgoing call. With this change,
the encoder should only be initialized once as opposed to the ~3 times
it happens currently.

Before the fix, the following sequence of events caused the extra
reconfigurations:

 1. After the SetLocalDescription() call, the WebRtcVideoSendStream is created.
    All frames from the camera are dropped until the corresponding
    VideoSendStream is created.

 2. SetRemoteDescription() triggers the VideoSendStream creation. At
    this point, the encoder is configured for the first time, with the
    frame dimensions set to a low resolution default (176x144).

 3. When the first video frame is received from the camera after the
    VideoSendStream is created, the encoder is reconfigured to the correct
    dimensions. If we are using the Android hardware encoder, the default
    configuration is set to encode from a memory buffer (use_surface=false).

 4. When the frame is passed down to the encoder in
    androidmediaencoder_jni.cc EncodeOnCodecThread(), it may be stored in
    a texture instead of a memory buffer. In this case, yet another
    reconfiguration takes place to enable encoding from a texture.

 5. Even if the resolution and texture flag were known at the start of
    the call, there would be a reconfiguration involved if the camera is
    rotated (such as when making a call from a phone in portrait orientation).
    This is because, at construction time, WebRtcVideoEngine2
    sets the VideoSinkWants structure parameter to request frames rotated
    by the source; the early frames will then arrive in portrait resolution.
    When the remote description is finally set, if the rotation RTP extension
    is supported by the remote receiver, the source is asked to provide
    non-rotated frames. The very next frame will then arrive in landscape
    resolution with a non-zero rotation value to be applied by the receiver.
    Since the encoder was configured with the last (portrait) frame size,
    it's going to need to be reconfigured again.
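
As a rough illustration of steps 2-4 above, the following C++ sketch models
an encoder that reconfigures whenever an incoming frame disagrees with its
current settings. Frame, EncoderModel, and their members are hypothetical
names used only for this sketch; they are not WebRTC types, and the
1280x720 frame size in the note below is just an example value.

```cpp
#include <cassert>

// Hypothetical stand-in for a captured frame; not a WebRTC type.
struct Frame {
  int width;
  int height;
  bool is_texture;
};

// Toy model of an encoder wrapper that counts (re)configurations.
class EncoderModel {
 public:
  void Configure(int width, int height, bool use_surface) {
    assert(width > 0 && height > 0);
    width_ = width;
    height_ = height;
    use_surface_ = use_surface;
    ++configure_count_;
  }

  void Encode(const Frame& frame) {
    // Step 3: a size mismatch forces a reconfiguration that keeps the
    // current input mode (memory buffer by default).
    if (frame.width != width_ || frame.height != height_) {
      Configure(frame.width, frame.height, use_surface_);
    }
    // Step 4: a texture/memory-buffer mismatch forces another one.
    if (frame.is_texture != use_surface_) {
      Configure(width_, height_, frame.is_texture);
    }
  }

  int configure_count() const { return configure_count_; }

 private:
  int width_ = 0;
  int height_ = 0;
  bool use_surface_ = false;
  int configure_count_ = 0;
};
```

In this model, configuring with the 176x144 default and then encoding one
1280x720 texture frame results in three configurations. Configuring up
front with the real dimensions and texture flag results in one.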

The fix makes the following changes:

 1. WebRtcVideoSendStream::OnFrame() now caches the last seen frame
    dimensions, and whether the frame was stored in a texture.

 2. When the encoder is configured for the first time
    (WebRtcVideoSendStream::SetCodec()), the last seen frame dimensions
    are used instead of the default dimensions.

 3. A flag that indicates if encoding is to be done from a texture has
    been added to the webrtc::VideoStream and webrtc::VideoCodec structs,
    and it's been wired up to be passed down all the way to the JNI code in
    androidmediaencoder_jni.cc.

 4. MediaCodecVideoEncoder::InitEncode now reads the
    expect_encode_from_texture flag from the VideoCodec structure instead
    of assuming a default of false. This way we end up with the correct
    encoder configuration the first time around.

 5. WebRtcVideoSendStream now takes an optimistic guess and requests
    non-rotated frames when the supported RtpExtensions list is not
    available. This makes the "early" frames arrive non-rotated, and the
    cached dimensions will be correct for the common case when the rotation
    extension is supported. If the other side is an older endpoint which
    does not support rotation, the encoder will have to be reconfigured,
    but it's better to penalize the uncommon case than the common one.
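
The caching and the optimistic rotation guess described above could be
sketched roughly as follows. This is a simplified illustration under
stated assumptions, not the code in this change: FrameInfo, EncoderConfig,
SendStreamSketch, RotationSupport, and SourceShouldApplyRotation are
hypothetical names. Only expect_encode_from_texture and the 176x144
default come from this change.

```cpp
#include <cassert>

// Hypothetical description of a captured frame; not a WebRTC type.
struct FrameInfo {
  int width;
  int height;
  bool is_texture;
};

// Hypothetical initial encoder settings.
struct EncoderConfig {
  int width = 176;  // Low-resolution default used when nothing is cached.
  int height = 144;
  bool expect_encode_from_texture = false;
};

class SendStreamSketch {
 public:
  // Called for every captured frame, even before the encoder exists.
  void OnFrame(const FrameInfo& frame) {
    assert(frame.width > 0 && frame.height > 0);
    last_frame_ = frame;
    has_last_frame_ = true;
  }

  // Builds the first encoder configuration from the cached frame info,
  // falling back to the defaults only when no frame has been seen.
  EncoderConfig InitialConfig() const {
    EncoderConfig config;
    if (has_last_frame_) {
      config.width = last_frame_.width;
      config.height = last_frame_.height;
      config.expect_encode_from_texture = last_frame_.is_texture;
    }
    return config;
  }

 private:
  FrameInfo last_frame_{0, 0, false};
  bool has_last_frame_ = false;
};

// Whether the remote side supports the rotation RTP extension.
enum class RotationSupport { kUnknown, kSupported, kUnsupported };

// Returns true when the source should rotate frames itself. While
// support is unknown, optimistically assume the extension is available.
bool SourceShouldApplyRotation(RotationSupport support) {
  return support == RotationSupport::kUnsupported;
}
```

With this shape, a frame seen before the encoder is created determines
the first configuration, and the source only rotates frames itself once
the remote side is known not to support the rotation extension.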

Review-Url: https://codereview.webrtc.org/2067103002
Cr-Original-Commit-Position: refs/heads/master@{#13173}
Cr-Mirrored-From: https://chromium.googlesource.com/external/webrtc
Cr-Mirrored-Commit: 3abb7644001d264c402184705950111d3fb8f181
diff --git a/common_types.h b/common_types.h
index 4c77c77..13d0c3f 100644
--- a/common_types.h
+++ b/common_types.h
@@ -701,6 +701,7 @@
   SpatialLayer spatialLayers[kMaxSpatialLayers];
 
   VideoCodecMode      mode;
+  bool                expect_encode_from_texture;
 
   bool operator==(const VideoCodec& other) const = delete;
   bool operator!=(const VideoCodec& other) const = delete;