Optimize RoboCaller::AddReceiver() for code size

Previously, the inlined UntypedFunction::Create(f) returned an
UntypedFunction, which was then passed as an argument to the
non-inlined RoboCallerReceivers::AddReceiverImpl(). Now we let
UntypedFunction::PrepareArgs(f) return one of a few kinds of trivial
struct (depending on the type of f), which is passed as an argument
to the non-inlined RoboCallerReceivers::AddReceiver(). That function
then converts it to an UntypedFunction by calling
UntypedFunction::Create(). These structs are smaller than
UntypedFunction and optimized for argument passing, so each call site
needs far fewer instructions.
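
The idea can be illustrated with a minimal, self-contained sketch. It
is not the code in this CL: the names ErasedFn, TrivialArgs,
CallInline and Receivers are hypothetical stand-ins, the callable
signature is fixed to void(int), and only small trivially copyable
callables are handled. The point is that the call site only fills in
a two-word struct, while the conversion to the larger type-erased
object happens once, inside the out-of-line function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the "fat" type-erased callable.
struct ErasedFn {
  std::uintptr_t inline_storage[4];
  void (*call)(const std::uintptr_t* storage, int arg);
  void (*destroy)(std::uintptr_t* storage);  // null for trivial callables
};

// Hypothetical small argument struct: one word of captured state plus
// the trampoline that knows how to invoke it.
struct TrivialArgs {
  std::uintptr_t word;
  void (*call)(const std::uintptr_t* storage, int arg);
};

static_assert(sizeof(TrivialArgs) < sizeof(ErasedFn),
              "the argument struct should be smaller than the fat object");

// Trampoline: rebuilds the callable from the stored bytes and invokes it.
template <typename F>
void CallInline(const std::uintptr_t* storage, int arg) {
  alignas(F) unsigned char buf[sizeof(F)];
  std::memcpy(buf, storage, sizeof(F));
  (*reinterpret_cast<const F*>(buf))(arg);
}

// Cheap to inline at the call site: copies at most one word and takes
// the address of a function.
template <typename F>
TrivialArgs PrepareArgs(const F& f) {
  static_assert(std::is_trivially_copyable<F>::value,
                "sketch only handles trivially copyable callables");
  static_assert(sizeof(F) <= sizeof(std::uintptr_t),
                "sketch only handles callables that fit in one word");
  TrivialArgs args{0, &CallInline<F>};
  std::memcpy(&args.word, &f, sizeof(F));
  return args;
}

class Receivers {
 public:
  // Meant to be compiled out of line (in a real build, in a .cc file),
  // so the conversion code below exists once rather than per call site.
  void AddReceiver(TrivialArgs args);

  void Send(int arg) const {
    for (const ErasedFn& r : receivers_) r.call(r.inline_storage, arg);
  }

 private:
  std::vector<ErasedFn> receivers_;
};

void Receivers::AddReceiver(TrivialArgs args) {
  ErasedFn fn{};  // zero-initialized storage and null pointers
  fn.inline_storage[0] = args.word;
  fn.call = args.call;
  fn.destroy = nullptr;  // nothing to destroy for a trivial callable
  receivers_.push_back(fn);
}

}  // namespace sketch
```

A caller would write something like
receivers.AddReceiver(sketch::PrepareArgs([p](int x) { *p += x; }));
where p is a pointer captured by value. On common ABIs a struct this
small is typically passed in registers, which is what lets the call
site shrink to a handful of instructions.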

Example code:

  struct Foo {
    void Receive(int, float, int, float);
    void TestAddLambdaReceiver();
    webrtc::RoboCaller<int, float, int, float> rc;
  };

  void Foo::TestAddLambdaReceiver() {
    rc.AddReceiver([this](int a, float b, int c, float d){
        Receive(a, b, c, d);});
  }

On arm32, before this CL we get:

  Foo::TestAddLambdaReceiver():
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #24
        ldr     r1, .LCPI0_0
        mov     r2, #0
        stm     sp, {r0, r2}
        add     r1, pc, r1
        str     r2, [sp, #20]
        str     r1, [sp, #16]
        mov     r1, sp
        bl      RoboCallerReceivers::AddReceiverImpl
        mov     sp, r11
        pop     {r11, pc}
  .LCPI0_0:
        .long   CallInlineStorage<Foo::TestAddLambdaReceiver()::$_0>
  CallInlineStorage<Foo::TestAddLambdaReceiver()::$_0>:
        ldr     r0, [r0]
        b       Foo::Receive(int, float, int, float)

After this CL:

  Foo::TestAddLambdaReceiver():
        ldr     r3, .LCPI0_0
        mov     r2, r0
        add     r3, pc, r3
        b       RoboCallerReceivers::AddReceiver<1u>
  .LCPI0_0:
        .long   CallInlineStorage<Foo::TestAddLambdaReceiver()::$_0>
  CallInlineStorage<Foo::TestAddLambdaReceiver()::$_0>:
        ldr     r0, [r0]
        b       Foo::Receive(int, float, int, float)

(Symbol names abbreviated so that they'll fit on one line.)

That is a reduction from 64 to 28 bytes, counting the caller, its
literal pool entry, and the CallInlineStorage thunk. The improvements
on arm64 and x86_64 are similar.

Bug: webrtc:11943
Change-Id: I93fbba083be0235051c3279d3e3f6852a4a9fdad
Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/185960
Commit-Queue: Karl Wiberg <kwiberg@webrtc.org>
Reviewed-by: Mirko Bonadei <mbonadei@webrtc.org>
Cr-Commit-Position: refs/heads/master@{#32244}
diff --git a/rtc_base/BUILD.gn b/rtc_base/BUILD.gn
index a09c06e..489a5c6 100644
--- a/rtc_base/BUILD.gn
+++ b/rtc_base/BUILD.gn
@@ -57,6 +57,7 @@
     ":untyped_function",
     "../api:function_view",
     "system:assume",
+    "system:inline",
   ]
 }