[xiph-rtp] P2P Theora Header delivery; why not SDP?

Discussion:

David Barrett

2005-05-11 00:57:25 UTC

Hi, I'm a longtime fan of Theora, and am starting an RTP implementation with:

http://svn.xiph.org/branches/theora-mmx/doc/draft-kerr-avt-theora-rtp-00.txt

Am I correct in understanding there are only two ways to deliver
configuration headers?

1) "in-band" using RTP
2) "out-of-band" by downloading from a URI specified in the SDP using TCP

If so, how would you recommend implementing this in a P2P setting in light
of lossy UDP transmission and NAT piercing? I see the following
complications:

- With respect to #1, UDP is lossy, and RTP has no standard retransmission
technique. Furthermore, due to NAT-piercing issues, the initial RTP packets
have the highest probability of being lost (because your NAT will block my
RTP packets until you "punch a hole" by sending a packet back to me).

- With respect to #2, TCP cannot pierce NATs near to the same degree as UDP.
Thus option #2 limits the range of deployment to those clients between which
TCP connections can be established.

As such, the following semi-compliant techniques come to mind:

A) Use "in-band" transmission and keep resending the headers until the first
RTCP packet has been received (effectively using it to ACK the session). The
spec says "Clients MUST be capable of dealing with periodic re-transmission
of the configuration headers", so this should work in theory, but it
certainly isn't intended.

B) Use "out-of-band" transmission with the help of a third party (publish
the headers to some TCP-enabled third party, and download from there). This
adds centralization to an otherwise decentralized problem, along with its
resultant complications (scalability, authentication, etc.).

C) Use (B), but pre-populate it with a large library of headers from which
clients can index in a read-only fashion. This is better, but only works if
header-generation is deterministic (ie, headers generated with the same
settings are exactly the same). I assume it is -- can anyone confirm this?
In this way, I can just generate my headers locally, and use its CRC32 to
index into the server's library.

A and C seem like the best option to me so far, but both still rather suck
(A is unreliable, and C imposes a central solution into the mix).

What I would prefer (and actually expected but was surprised not to find)
would be a third option where the inputs into the header-generation process
are simply specified in the SDP itself (on the assumption that
header-generation is deterministic from these, and can be computed locally).
For example:

c=IN IP4/6
m=video RTP/AVP 98
a=rtpmap:98 theora/90000
a=fmtp:98 sampling=YCbCr-4:2:2; width=1280; height=720;
header=<URI of configuration header>
a=theora: <frame_width>x<frame_height>;
<offset_x>,<offset_y>;
<width>x<height>;
<fps_numerator>/<fps_denominator>;
<aspect_numerator>/<aspect_denominator>;
<colorspace>;
<target_bitrate>;
<quality>;
<dropframes_p>;
<quick_p>;
<keyframe_auto_p>;
<keyframe_frequency>;
<keyframe_frequency_force>;
<keyframe_mindistance>;
<keyframe_data_target_bitrate>;
<keyframe_auto_threshold>;
<noise_sensitivity>

When spelled out in prose it looks like a lot of data, but in practice it'd
actually look something like:

c=IN IP4/6
m=video RTP/AVP 98
a=rtpmap:98 theora/90000
a=fmtp:98 sampling=YCbCr-4:2:2; width=1280; height=720;
header=<URI of configuration header>
a=theora: 96x64; 0,0; 96x64; 15/1; 0; 45000; 0; 0; 1; 1; 64; 64; 8; 67500;
80; 2

Is this possible? Looking back over the [xiph-rtp] list I see a lot of
discussion about static, cached, and downloadable codebooks, but I don't see
where SDP is mentioned as an option. Has this option already been
considered and discounted?

-david

Ralph Giles

2005-05-11 02:08:26 UTC

Post by David Barrett
http://svn.xiph.org/branches/theora-mmx/doc/draft-kerr-avt-theora-rtp-00.txt

Hi! Always good to have feedback from another implementer.

Note that most of the details of that draft have been superseded.
Unfortunately no one has fed the new design back into creating a new
draft. If you're curious, you can troll through the recent discussions
on the vorbis mapping; all the same issues apply and we intend to make
the drafts as similar as possible.

Post by David Barrett
Am I correct in understanding there are only two ways to deliver
configuration headers?
1) "in-band" using RTP
2) "out-of-band" by downloading from a URI specified in the SDP using TCP

Well, "out-of-band" can be however you want, but there are a couple of
proposals for how to do the TCP reference in the SDP.

Post by David Barrett
If so, how would you recommend implementing this in a P2P setting in light
of lossy UDP transmission and NAT piercing? I see the following
complications:
- With respect to #1, UDP is lossy, and RTP has no standard retransmission
technique. Furthermore, due to NAT-piercing issues, the initial RTP packets
have the highest probability of being lost (because your NAT will block my
RTP packets until you "punch a hole" by sending a packet back to me).

Right, this isn't going to work reliably.

Post by David Barrett
- With respect to #2, TCP cannot pierce NATs near to the same degree as UDP.
Thus option #2 limits the range of deployment to those clients between which
TCP connections can be established.

If you have a p2p infrastructure, can you use that to achieve lossless
out-of-band transmission? Some sort of send+ack over UDP, or an
outgoing channel to a non-NAT node like you'd use for file transfer?

Post by David Barrett
C) Use (B), but pre-populate it with a large library of headers from which
clients can index in a read-only fashion. This is better, but only works if
header-generation is deterministic (ie, headers generated with the same
settings are exactly the same). I assume it is -- can anyone confirm this?

It's entirely up to the encoder. The current reference implementation
uses a fixed setup for all inputs. This is the same as the VP3 decode
config, so if you control the clients well enough, you could just
standardize on that, and add other, better fixed general sets as they
become available.

Post by David Barrett
In this way, I can just generate my headers locally, and use its CRC32 to
index into the server's library.

Right. We've abandoned the CRC32 because of the risk of collisions
causing random failures. The new draft will still have a setup id
in the RTP payload header, but it is only 16 bits. The idea is that this
is an arbitrary mapping to either in-band header packets with the
same id, or something arranged out-of-band, e.g. with the SDP.

So, for example, you could put a longer (MD5 or SHA1) hash of the
setup packet in the SDP to indicate to the decoder which one you
used, and then hardwire a set into the clients, so it's the only
one used and no one has to fetch anything.
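
The hash-in-SDP idea above could be sketched roughly as follows. This is only an illustration: the table of built-in setup packets, the function names, and the placeholder packet bytes are all hypothetical, and how the digest would actually be carried in the SDP is not defined by anything in this thread.

```python
import hashlib

def setup_digest(setup_packet: bytes) -> str:
    """Return the hex SHA-1 digest of a raw setup (codebook) packet."""
    return hashlib.sha1(setup_packet).hexdigest()

# Hypothetical setup packets hardwired into every client. Real entries would
# be the complete packets produced by an encoder; these are placeholders.
BUILTIN_SETUPS = [
    b"\x82theora placeholder setup A",
    b"\x82theora placeholder setup B",
]

# Index the built-in packets by digest once, at start-up.
KNOWN_SETUPS = {setup_digest(p): p for p in BUILTIN_SETUPS}

def resolve_setup(digest_from_sdp: str):
    """Return the built-in setup packet the sender announced.

    None means the digest is unknown, so the packet would have to be
    obtained some other way (in-band or out-of-band).
    """
    return KNOWN_SETUPS.get(digest_from_sdp.strip().lower())
```

If every client ships the same table, the decoder never fetches anything: it only compares the announced digest against what it already has.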

I guess this is something you'd want to be able to negotiate, so that in the
future heterogeneous clients could choose the best common header.

Post by David Barrett
What I would prefer (and actually expected but was surprised not to find)
would be a third option where the inputs into the header-generation process
are simply specified in the SDP itself (on the assumption that
header-generation is deterministic from these, and can be computed locally).

That's not possible, except in the sense described above. The whole
point is for future encoders to be able to make better choices by
reconfiguring the decoder. This has been very successful with vorbis.

Note that, of the following, keyframe_frequency_force is the only
one that actually appears in the info header; the rest are
(confusingly) part of the encoder config api.

Post by David Barrett
<dropframes_p>;
<quick_p>;
<keyframe_auto_p>;
<keyframe_frequency>;
<keyframe_frequency_force>;
<keyframe_mindistance>;
<keyframe_data_target_bitrate>;
<keyframe_auto_threshold>;
<noise_sensitivity>

Anyway, something to chew on I hope. I'd like to hear what you think of
the new draft as an implementor.

Cheers,
-r

Ralph Giles

2005-05-11 04:53:28 UTC

Post by David Barrett
Thanks, you and Aaron have given me a good start. Perhaps I'm confused on
terminology. I'll define:
- Codebook: The header data generated by the encoder, and required by the
decoder. In other words, the sum of the Ogg packets output by:
theora_encode_header( )
theora_encode_comment( )
theora_encode_tables( )
- Codebook parameters: The parameters that the encoder uses to generate the
codebook. In other words, the input to:
theora_encode_init( )
I was assuming that for a given set of "codebook parameters", there is
exactly one "codebook". Thus I proposed that rather than send over the
codebook (which is big), just send the codebook parameters (which are small)
and generate the codebook "just in time" before decoding.
However, I think you're saying this assumption is wrong, and that there are
many possible codebooks from the same set of codebook parameters. (Ie, the
"codebook parameters" are directly tied to a specific version of the
encoding engine; different versions or different encoders might have
different codebook parameters, or use them in different ways.) If that's
the case, then my proposal obviously won't work.
Is this correct?

That is correct.

To clarify a bit, there are three standard headers in a Theora
bitstream. Each is a separate packet. We refer to:

1. the 'info' or 'ident' header, the output of theora_encode_header()
2. the 'comment' or 'metadata' header, the output of theora_encode_comment()
3. the 'setup' or 'codebook' header, the output of theora_encode_tables()

These are sometimes referred to collectively as 'the codebooks', but
this is obviously imprecise. The spec also allows additional optional
headers (like an ICC profile), but these must be ignorable and so don't
concern us here.

Of these, 1 and 3 are actually required to properly decode data packets.
The comment header is required for completeness, but the client can
construct an empty (or custom) packet if necessary and substitute. None
of the current implementations actually require it.

So only two of the three have to be transmitted reliably. 1 is what you
were thinking of as the input to theora_encode_init(). It is very small,
and as you and Aaron suggest can be included directly in the SDP. The
idea of this header is the same as with SDP; to identify the stream as
Theora, and give the externally interesting parameters like frame size
and rate. The third header is much larger and contains data for the
configurable parts of the decoder: quantizers, Huffman tables and so on.
So each can be changed independently of the other, and an encoder or
decoder needs both to function properly.

So yes, while the reference implementation uses the same setup header
for all input, and only varies the info header, other encoders (can)
generate different setup headers based not just on the elements of the
info header, but also on the content itself.
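
As a rough illustration of the split described above, a sender could classify each packet and treat only the ident and setup headers as needing reliable delivery. The sketch assumes the header layout from the Theora bitstream format (a type byte of 0x80, 0x81 or 0x82 followed by the ASCII string "theora"); the function names are hypothetical.

```python
# Header type byte -> name, assuming the Theora bitstream header layout.
HEADER_TYPES = {0x80: "ident", 0x81: "comment", 0x82: "setup"}

# Only these two are needed to decode data packets; a client can
# substitute an empty comment header if it never receives one.
REQUIRED_FOR_DECODE = {"ident", "setup"}

def classify_packet(packet: bytes) -> str:
    """Return 'ident', 'comment', 'setup', or 'data' for a raw packet."""
    if (
        len(packet) >= 7
        and packet[0] in HEADER_TYPES
        and packet[1:7] == b"theora"
    ):
        return HEADER_TYPES[packet[0]]
    return "data"

def needs_reliable_delivery(packet: bytes) -> bool:
    """True for the headers a decoder cannot do without."""
    return classify_packet(packet) in REQUIRED_FOR_DECODE
```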

HTH,
-r

Aaron Colwell

2005-05-11 02:50:42 UTC

Hi David,

I'll try to address as many of your concerns as I can.

Post by David Barrett
http://svn.xiph.org/branches/theora-mmx/doc/draft-kerr-avt-theora-rtp-00.txt
Am I correct in understanding there are only two ways to deliver
configuration headers?
1) "in-band" using RTP
2) "out-of-band" by downloading from a URI specified in the SDP using TCP

Yes. The second method should be reworded to say something along the lines of
"out-of-band delivery". The URI via SDP mechanism is really only one way to
do things. There are many other ways that are possible. All you
need is a reliable way to transmit the ident and codebook information
between the two endpoints. How that is done doesn't really matter.
The SDP work that has been discussed on the list mainly targets the RTSP + RTP
use case.

While it is true that there isn't an IETF standard yet for retransmission,
a draft is in the works.

http://www.ietf.org/internet-drafts/draft-ietf-avt-rtp-retransmission-11.txt

You could also just use RFC 3611 to signal lost packets and the packets
could just be retransmitted.

Post by David Barrett
Furthermore, due to NAT-piercing issues, the initial RTP packets
have the highest probability of being lost (because your NAT will block my
RTP packets until you "punch a hole" by sending a packet back to me).

What NAT scenario are you intending to support? Can both peers be behind a
NAT? If so then usually something like STUN is used to deal with the NAT
traversal problems. Your description of a NAT traversal problem confuses me.
It is usually the client behind the NAT that does the hole punching. This
should be done BEFORE media is sent so that you don't lose anything. You're
basically telling the NAT how to route packets from the outside to you. If
both peers are behind a NAT you'll need to do something like what STUN does
because there is no way to know for sure what your port will be on the other
side of the NAT.

Like I said above, #2 does not necessarily imply TCP. It does imply reliable
delivery between the 2 endpoints. I'm assuming that you already have some sort
of protocol to communicate between the peers. I'm also assuming that it is
reliable in some form. If that is the case, then you can transmit the codebook
and ident info over that. The URL model that has been discussed on the
mailing list basically just allows you to specify where the codebook and ident
info is located. If both peers are behind NATs then they would likely have to
post their codebook and ident info to a server outside their NATs. Then they
could use HTTP to retrieve each other's info.

One other possibility if you are using a fixed set of ident and codebooks is
to use some sort of offer/answer model. The ident info is small enough that
it could be sent as it is. For the codebooks you could just send an MD5 hash
of the codebook. Once they agree on the codebook to use, you're done. You don't
need to send the codebook because the negotiation of MD5 hashes told each
peer which codebook to use.
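
A minimal sketch of that offer/answer step, assuming each peer holds a fixed set of codebooks and the offerer lists MD5 digests in order of preference. The function names and the placeholder codebook bytes are hypothetical; how the digests travel between the peers is left open.

```python
import hashlib

def codebook_digest(codebook: bytes) -> str:
    """Hex MD5 digest used to name a codebook during negotiation."""
    return hashlib.md5(codebook).hexdigest()

def choose_codebook(offered, supported):
    """Return the first offered digest the answerer also has, else None.

    `offered` is the offerer's list in preference order; `supported` is
    any iterable of digests the answerer has built in.
    """
    supported_set = set(supported)
    for digest in offered:
        if digest in supported_set:
            return digest
    return None

# Placeholder codebooks standing in for real setup packets.
book_a = codebook_digest(b"placeholder codebook A")
book_b = codebook_digest(b"placeholder codebook B")
book_c = codebook_digest(b"placeholder codebook C")

# The offerer prefers A, then B; the answerer only has B and C.
agreed = choose_codebook([book_a, book_b], [book_b, book_c])
```

Here `agreed` ends up naming codebook B. A result of None would mean the peers share no codebook and would have to fall back to actually transferring one.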

Post by David Barrett
A) Use "in-band" transmission and keep resending the headers until the first
RTCP packet has been received (effectively using it to ACK the session). The
spec says "Clients MUST be capable of dealing with periodic re-transmission
of the configuration headers", so this should work in theory, but it
certainly isn't intended.

This solution is fine. Reception of an RTCP packet doesn't imply that the
codebook was received though. You would either need to send RFC 3611
packets so that you could signal what packets have been received, or just
periodically send the info throughout the duration of the stream.

Post by David Barrett
B) Use "out-of-band" transmission with the help of a third party (publish
the headers to some TCP-enabled third party, and download from there). This
adds centralization to an otherwise decentralized problem, along with its
resultant complications (scalability, authentication, etc.).

This would only be needed in the worst case scenario where both peers were
behind a NAT and their NATs assigned outside ports based on
<src IP, src port, dest IP, dest port> tuples. In that case you have to use
a third party anyways. If you use STUN, then you don't have to worry about
this because it takes care of establishing the link between the peers and
it already establishes an outside third party, the STUN server.

I think it would be good to have a central codebook server. This way you
could have a well known source for codebooks. It could be mirrored and
existing HTTP proxy and caching infrastructure could help with scalability.
Personally I'd use MD5 or SHA for the codebook hash just to be extra sure that
the IDs are unique.
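
To put rough numbers on the uniqueness concern, the usual birthday approximation can be evaluated for different identifier widths. This is a back-of-the-envelope sketch that assumes identifiers behave like uniformly random values and only covers accidental collisions; the function name is hypothetical.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share an id.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    space = 2.0 ** hash_bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))

p16 = collision_probability(300, 16)       # 16-bit id, 300 codebooks
p32 = collision_probability(10_000, 32)    # CRC32-sized id
p128 = collision_probability(10_000, 128)  # MD5-sized digest
```

Under these assumptions a 16-bit id is close to a coin flip at 300 codebooks, a 32-bit value is at roughly one percent for ten thousand, and a 128-bit digest is negligible at any realistic library size.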

Post by David Barrett
A and C seem like the best option to me so far, but both still rather suck
(A is unreliable, and C imposes a central solution into the mix).
What I would prefer (and actually expected but was surprised not to find)
would be a third option where the inputs into the header-generation process
are simply specified in the SDP itself (on the assumption that
header-generation is deterministic from these, and can be computed locally).

I'm not sure if you are asking whether a finite set of parameters will dictate
what the codebook will be. Right now in the reference code the same codebook
is used no matter what the encoding parameters are. This will likely not
be true in the future. Different encoders may accept different parameters
for encoding and make codebook selections based on different criteria. That
is why the codebook needs to be sent in the first place. It IS the minimal
information needed to tell the other side how the video is encoded.

Post by David Barrett
c=IN IP4/6
m=video RTP/AVP 98
a=rtpmap:98 theora/90000
a=fmtp:98 sampling=YCbCr-4:2:2; width=1280; height=720;
header=<URI of configuration header>
a=theora: <frame_width>x<frame_height>;
<offset_x>,<offset_y>;
<width>x<height>;
<fps_numerator>/<fps_denominator>;
<aspect_numerator>/<aspect_denominator>;
<colorspace>;
<target_bitrate>;
<quality>;
<dropframes_p>;
<quick_p>;
<keyframe_auto_p>;
<keyframe_frequency>;
<keyframe_frequency_force>;
<keyframe_mindistance>;
<keyframe_data_target_bitrate>;
<keyframe_auto_threshold>;
<noise_sensitivity>
When spelled out in prose it looks like a lot of data, but in practice it'd
actually look something like:
c=IN IP4/6
m=video RTP/AVP 98
a=rtpmap:98 theora/90000
a=fmtp:98 sampling=YCbCr-4:2:2; width=1280; height=720;
header=<URI of configuration header>
a=theora: 96x64; 0,0; 96x64; 15/1; 0; 45000; 0; 0; 1; 1; 64; 64; 8; 67500;
80; 2
Is this possible? Looking back over the [xiph-rtp] list I see a lot of
discussion about static, cached, and downloadable codebooks, but I don't see
where SDP is mentioned as an option. Has this option already been
considered and discounted?

You could do this for the ident info because that data is small, but the
codebook itself would be too large to put in the SDP. That is why we have been
discussing the downloading mechanisms. If I were to put the ident info in
the SDP I'd just Base64 or hex encode the ident packet. That would make
it more compact and eliminate a translation step.
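
Carrying the ident packet as Base64 text in the SDP could look something like the sketch below. The `ident=` parameter name and the choice of an `a=fmtp` line are assumptions made for illustration only; no attribute of that name has been agreed in this thread.

```python
import base64

def ident_to_sdp_attribute(ident_packet: bytes, payload_type: int = 98) -> str:
    """Build a hypothetical fmtp line carrying the ident packet as Base64."""
    encoded = base64.b64encode(ident_packet).decode("ascii")
    return f"a=fmtp:{payload_type} ident={encoded}"

def ident_from_sdp_attribute(line: str) -> bytes:
    """Recover the raw ident packet from a line built as above."""
    # Drop the "a=fmtp:<pt>" prefix, then scan the ';'-separated parameters.
    _, _, params = line.partition(" ")
    for param in params.split(";"):
        # Split on the first '=' only, so Base64 padding stays intact.
        key, sep, value = param.strip().partition("=")
        if key == "ident" and sep:
            return base64.b64decode(value)
    raise ValueError("no ident parameter found")
```

Base64 spends four characters per three bytes where hex spends two per byte, and the receiver gets the exact packet back without any field-by-field translation.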

I hope this provided answers for most of your concerns.

Aaron


David Barrett

2005-05-11 03:31:15 UTC

Thanks, you and Aaron have given me a good start. Perhaps I'm confused on
terminology. I'll define:

- Codebook: The header data generated by the encoder, and required by the
decoder. In other words, the sum of the Ogg packets output by:
theora_encode_header( )
theora_encode_comment( )
theora_encode_tables( )

- Codebook parameters: The parameters that the encoder uses to generate the
codebook. In other words, the input to:
theora_encode_init( )

I was assuming that for a given set of "codebook parameters", there is
exactly one "codebook". Thus I proposed that rather than send over the
codebook (which is big), just send the codebook parameters (which are small)
and generate the codebook "just in time" before decoding.

However, I think you're saying this assumption is wrong, and that there are
many possible codebooks from the same set of codebook parameters. (Ie, the
"codebook parameters" are directly tied to a specific version of the
encoding engine; different versions or different encoders might have
different codebook parameters, or use them in different ways.) If that's
the case, then my proposal obviously won't work.

Is this correct?

-david

David Barrett

2005-05-11 05:28:30 UTC

Excellent detail, thanks.

With all that in mind, chalk up one more vote for standard codebooks
that can be specified in the SDP description.

I mean, I acknowledge the optimal configurability of dynamic codebooks.
But for my needs, I'd prefer a couple pre-configured options that can be
easily selected. I don't really have the expertise to tweak the
settings too closely, so I'd like a small selection of preconfigured
options, like:

- Option #1: High quality, high framerate
- Option #2: Low quality, high framerate
- Option #3: High quality, low framerate
- Option #4: Low quality, low framerate

In the meantime, I'll probably just oversend the header packets inline
(maybe resend with an exponential falloff in frequency) and skip the
complexity of out-of-band codebook delivery.
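
The "exponential falloff" resend could be scheduled roughly as below. This is only a sketch: the function name, the starting interval, the growth factor and the cap are all hypothetical values chosen for illustration, not anything taken from the draft.

```python
def resend_times(first_delay=0.25, factor=2.0, max_delay=30.0, duration=120.0):
    """Return times (seconds from stream start) at which to send the headers.

    The gap between sends starts at first_delay and is multiplied by
    factor after every send, but never grows beyond max_delay, so a
    late joiner still sees the headers periodically.
    """
    times = []
    t = 0.0
    delay = first_delay
    while t <= duration:
        times.append(t)
        t += delay
        delay = min(delay * factor, max_delay)
    return times

# Sends in the first four seconds with the default settings.
early = resend_times(duration=4.0)
```

With the defaults, `early` is `[0.0, 0.25, 0.75, 1.75, 3.75]`: the headers go out densely while hole punching and initial loss are most likely, then settle to one resend per `max_delay` seconds.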

-david
