[xiph-rtp] header caching and chaining

Hi,

I am not an expert on protocols, only a consumer with interest in this
field.

Those of us using vorbis streams so far rely on the ability to chain
streams for providing metadata updates. So streaming title and artist
information, etc. Now I don't think we should lose this ability unless
there's a good reason for losing it. Whether this is the best mechanism
for conveying this info is another question, and if it can be carried
without support for chained bitstreams then I guess that's fine.

I do also like the idea of being able to change audio format, as it's a big
plus over MP3, but I realise I'm a voice of 1 and may get shouted down on
this one. Icecast has this nice ability to switch to and from backup
feeds, and I'd be concerned if this was not available in a future RTP mode.

I also presume that scaling down to a lower bitrate version of a stream due
to bandwidth limitations might also be affected if chaining is not
supported.

Just my thoughts,
Geoff.
--
Geoff Shang <***@hitsandpieces.net>
Phone: +61-418-96-5590
MSN: ***@acbradio.org

Make sure your E-mail can be read by everyone!
http://www.betips.net/etc/evilmail.html

Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html

Phil Kerr

2005-02-03 12:02:14 UTC

Hi Geoff,

Comments are always welcomed.

Cheers

Phil

Post by Geoff Shang
Hi,
I am not an expert on protocols, only a consumer with interest in this
field.
Those of us using vorbis streams so far rely on the ability to chain
streams for providing metadata updates. So streaming title and artist
information, etc. Now I don't think we should lose this ability
unless there's a good reason for losing it. Whether this is the best
mechanism for conveying this info is another question, and if it can
be carried without support for chained bitstreams then I guess that's
fine.
I do also like the idea of being able to change audio format, as it's
a big plus over MP3, but I realise I'm a voice of 1 and may get
shouted down on this one. Icecast has this nice ability to switch to
and from backup feeds, and I'd be concerned if this was not available
in a future RTP mode.
I also presume that scaling down to a lower bitrate version of a
stream due to bandwidth limitations might also be affected if chaining
is not supported.
Just my thoughts,
Geoff.

Ralph Giles

2005-02-03 22:21:54 UTC

Post by Geoff Shang
Those of us using vorbis streams so far rely on the ability to chain
streams for providing metadata updates. So streaming title and artist
information, etc. Now I don't think we should lose this ability unless
there's a good reason for losing it. Whether this is the best mechanism
for conveying this info is another question, and if it can be carried
without support for chained bitstreams then I guess that's fine.

The plan is to replace this with a separate metadata stream through
which the updates can be sent. (We could also use in-band transmission
of metadata header packets.) So this functionality won't go away, but is
outside the scope of the vorbis/theora RTP draft.

We'd kind of always intended to do this anyway. The simple flat
classification the in-stream metadata headers provide is more limited
than what some people want, and the precedence rules for multiplexed
streams are kind of hacky.

The Ogg requirement to transmit new codebooks at a chaining boundary
whenever the metadata changes also causes a nasty bandwidth spike in
very low bitrate streams, which a separate stream also helps with.

Post by Geoff Shang
I do also like the idea of being able to change audio format, as it's a big
plus over MP3, but I realise I'm a voice of 1 and may get shouted down on
this one.

It's not that we don't appreciate the feature, but it's important to
make tradeoffs against implementation complexity.

Post by Geoff Shang
I also presume that scaling down to a lower bitrate version of a stream due
to bandwidth limitations might also be affected if chaining is not
supported.

Correct. Without chaining it works more like an analog feed: you pick
the parameters you want the channel to support and they're fixed for the
duration of the connection.

-r

Phil Kerr

2005-02-03 12:00:28 UTC

Hi Ralph, comments inline.

Post by Ralph Giles
This is based on a conversation derf and I had last week on irc.
There's been some debate about chaining support in RTP transmission. If
Chaining is (much more of) a pain to support than in Ogg
Many streams are live encodings or only use one set of codebooks anyway
Being able to switch bitrates on the fly without the clients having
to reconnect is a nice feature that requires chaining. Not being
able to do this would be a regression for both Real and Icecast.
For theora, being able to at least switch framerate when cutting
between material is a big win. Resampling and re-encoding, as
can be done with audio, is prohibitively expensive for the medium
term.
Personally, I come down on the side of no chaining for audio, and yes
chaining for video. Lovely.

I think the most compelling argument you give is the last pro point
concerning re-encoding. But the frequency of re-encoding needed for
audio should be a lot higher so the re-encoding strain for audio may be
as high as for video.

Post by Ralph Giles
Phil's proposal for doing chaining is to have a 32-bit crc of the
bitstream header in each RTP packet. This lets the decoder know
when a 'chain boundary' has passed, and a new context applies.
Using a CRC lets the client cache headers, optimizing latency and
bandwidth usage. Headers can be sent in-band (not recommended) or
packaged up for out-of-band retrieval, broadcast or otherwise
distributed.
So far, so good. But what exactly should be hashed? Derf and I have
verified that with theora the initial 'info' header with the frame
size, rate, and so on, as passed in the SDP is completely orthogonal
to the contents of the third 'setup' header. There may be an issue
in the future when we add interlaced support, but orthogonality
can in general be restored by using that flag from the info header
as an additional cache key. I believe the two headers are also
orthogonal in vorbis, though I haven't double checked.
So, the cleanest thing would be to hash only the setup header and
rely on the SDP for the info header details.
BUT, this means that things like sample rate, image size, and so
on can't change mid-stream, only the codebooks themselves. We've
only supported one of the two motivations for chaining. This may
still be a reasonable compromise: it preserves quality/bandwidth
scalability while simplifying a lot of the things people have
complained about with chaining because their code assumed image
size et al. couldn't change mid-stream.
If we do want to support info header changes, we have to transmit
and cache both of them. We could, for example, just concatenate
the two and hash them together. This will make for a lot more
cache entries differing only by a few bytes, but does simplify/
obsolete defining those same fields in the SDP.
The crc hash is just of the 3rd setup header data, but the
info headers are also transmitted in the same package...indexed
by the same crc as the stream data itself. I don't particularly
like this as described, because all three headers for each
segment are packed together, there's no scope for redundancy,
and the bandwidth spike is going to be much worse than in-line
retrieval.
Opinions?

So, just to get this straight, the core objection is which header is
being hashed, not that hashing is being done?

If it's the former then extending what is hashed to what is packed into
the full header block for a chained file is a reasonable change, if the
metadata is left out.

I can look at altering the draft to cover this extension.

If the whole chaining mechanism is to go then, although it does make the
whole structure a lot easier, it does lose a significant amount of
functionality. People will compare the RTP implementations to Icecast,
and if it isn't on par people will not be happy. Also as you pointed
out in the pros section above re-encoding puts a lot of strain on the
server-side processes, either on-line processor overhead or off-line
file processing.

-P

Ralph Giles

2005-02-03 22:44:05 UTC

Post by Phil Kerr
I think the most compelling argument you give is the last pro point
concerning re-encoding. But the frequency of re-encoding needed for
audio should be a lot higher so the re-encoding strain for audio may be
as high as for video.

Not really. Video's two extra dimensions tend to make up for the
smaller sampling rate. Or, looking at it another way, it's not 24 frames
a second, it's 8 (or 50) million pixels a second.
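
As a rough worked example of the pixel-rate comparison above, the arithmetic can be sketched as follows. The frame sizes (720x480 and 1920x1080 at 24 fps) and the 44.1 kHz stereo audio figure are illustrative assumptions, not numbers taken from this thread.

```python
# Illustrative arithmetic only: the frame sizes and the audio format
# below are assumptions chosen to match the rough figures in the text.

def video_samples_per_second(width, height, fps):
    """Pixels an encoder has to process per second of video."""
    return width * height * fps


def audio_samples_per_second(sample_rate, channels):
    """PCM samples an encoder has to process per second of audio."""
    return sample_rate * channels


sd = video_samples_per_second(720, 480, 24)     # standard-definition frame
hd = video_samples_per_second(1920, 1080, 24)   # high-definition frame
audio = audio_samples_per_second(44100, 2)      # CD-style stereo audio

print("SD pixels/s:", sd)
print("HD pixels/s:", hd)
print("audio samples/s:", audio)
print("SD/audio ratio:", sd // audio, "HD/audio ratio:", hd // audio)
```

Raw sample counts are only a proxy for encoder cost, but they show why the lower frame rate does not make video cheaper to re-encode.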

Post by Phil Kerr
So, just to get this straight, the core objection is which header is
being hashed, not that hashing is being done?

Well, I'm trying to frame the decision. We have:

0. No hashing. All decode parameters are fixed for the duration of the
RTP stream.
1. Hash the setup header only: codebooks can change during a stream,
but not parameters like sample rate, framerate, image size, etc.
2. Hash the info and setup headers together: all decoder parameters
can change.
3. Hash all headers. The setup ident carries metadata changes as well.

Of these, I think (0) and (2) are the only reasonable options for the
reasons we've been discussing. I think 2 makes chaining work reasonably
for the use cases I've posited, so we're just down to the feature cost
and aesthetics.
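
To make the four options concrete, here is a minimal Python sketch of how the per-packet ident could be derived in each case. It assumes the CRC32 variant implemented by `zlib.crc32` and plain concatenation of the header packets; the draft may specify a different polynomial or framing, and the header byte strings are placeholders, not real Vorbis or Theora headers.

```python
import zlib


def crc32(data: bytes) -> int:
    """32-bit CRC as computed by zlib (an assumption about the CRC variant)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def ident_for(option: int, info: bytes, comment: bytes, setup: bytes):
    """Return the per-packet ident for options 0-3, or None for option 0."""
    if option == 0:
        return None                      # no hashing, parameters are fixed
    if option == 1:
        return crc32(setup)              # codebooks only
    if option == 2:
        return crc32(info + setup)       # all decoder parameters
    if option == 3:
        return crc32(info + comment + setup)  # metadata as well
    raise ValueError("unknown option")


# Placeholder headers; each pair differs in a single byte.
setup = b"setup:codebooks"
info_a = b"info:rate=44100,channels=2"
info_b = b"info:rate=44100,channels=1"
comment_a = b"comment:title=A"
comment_b = b"comment:title=B"

# Option 1 cannot see an info header change: the ident is identical.
print(ident_for(1, info_a, comment_a, setup) == ident_for(1, info_b, comment_a, setup))
# Option 2 changes the ident when the info header changes.
print(ident_for(2, info_a, comment_a, setup) != ident_for(2, info_b, comment_a, setup))
# Option 2 ignores metadata; option 3 forces a new ident for every title change.
print(ident_for(2, info_a, comment_a, setup) == ident_for(2, info_a, comment_b, setup))
print(ident_for(3, info_a, comment_a, setup) != ident_for(3, info_a, comment_b, setup))
```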

A question: is there now or can we define some method to direct the
client (through rtsp or whatever) to connect to a different stream?
I ask in all ignorance. That would handle the case of the number of
streams changing, which is I think quite important for conference
webcasts and the like.

Post by Phil Kerr
If the whole chaining mechanism is to go then, although it does make the
whole structure a lot easier, it does lose a significant amount of
functionality. People will compare the RTP implementations to Icecast,
and if it isn't on par people will not be happy.

Well, I don't actually know anyone who does the bitrate-fallback thing
with icecast. That just leaves needing to transcode when doing 'fake'
live streaming from a mixed-encoding set of files. The tools can do that
automatically, so most people won't notice. I think the additional cpu
cost is worth the simplification as far as icecast regression goes.

-r

Aaron Colwell

2005-02-03 15:11:17 UTC

I think one other Pro is that you would be able to stream a larger set of
valid .ogg files over RTP. Unfortunately this still won't let you stream all
valid .ogg files, but it is getting closer.

Post by Ralph Giles
Personally, I come down on the side of no chaining for audio, and yes
chaining for video. Lovely.

Why is it more beneficial for video than audio? I'm assuming you want chaining
for video to allow frame rate changes and codebook changes to allow better
bitrate characteristics. Couldn't these same arguments be held for audio as
well? If you have audio that only has low frequency components it may make
sense to use a lower sample rate.

I'm not sure if the 2 headers are orthogonal in Vorbis. You need to know the
channels value from the info header to properly decode the setup header. This
will constrain you from changing the number of channels across chain boundaries.
The block sizes are also stored in the info header so that wouldn't be able to
change across chain boundaries either. These 2 constraints seem to me to
severely limit the usefulness of chaining support if you can only update the
setup header.

Post by Ralph Giles
So, the cleanest thing would be to hash only the setup header and
rely on the SDP for the info header details.
BUT, this means that things like sample rate, image size, and so
on can't change mid-stream, only the codebooks themselves. We've
only supported one of the two motivations for chaining. This may
still be a reasonable compromise: it preserves quality/bandwidth
scalability while simplifying a lot of the things people have
complained about with chaining because their code assumed image
size et al. couldn't change mid-stream.

To properly handle local and Icecast playback they have to solve these problems
anyways so I don't really see much savings here.

Post by Ralph Giles
If we do want to support info header changes, we have to transmit
and cache both of them. We could, for example, just concatenate
the two and hash them together. This will make for a lot more
cache entries differing only by a few bytes, but does simplify/
obsolete defining those same fields in the SDP.

This seems reasonable to me. This allows the CRC to represent the (info,setup)
tuple uniquely which is what we need.

Post by Ralph Giles
The crc hash is just of the 3rd setup header data, but the
info headers are also transmitted in the same package...indexed
by the same crc as the stream data itself. I don't particularly
like this as described, because all three headers for each
segment are packed together, there's no scope for redundancy,
and the bandwidth spike is going to be much worse than in-line
retrieval.

I haven't read his proposals yet (I'll try to get to this today), but the
way you describe it doesn't sound like it would work. What happens in the case
where you use the same setup header, but a different frame size or frame rate?
The CRC wouldn't change so the client wouldn't know that these parameters
changed. Depending on how the frame size changed, frame decode could fail
because there are either too many or too few coded coeff, block coded flags,
etc.

Aaron

Post by Ralph Giles
Opinions?
-r
_______________________________________________
xiph-rtp mailing list
http://lists.xiph.org/mailman/listinfo/xiph-rtp

Ralph Giles

2005-02-03 22:06:48 UTC

Post by Aaron Colwell
I think one other Pro is that you would be able to stream a larger set of
valid .ogg files over RTP. Unfortunately this still won't let you stream all
valid .ogg files, but it is getting closer.

Do you mean Ogg files that change the number of multiplexed logical
bitstreams between chain boundaries? That's the only thing I can think
of that we're not covering.

Post by Aaron Colwell
Why is it more beneficial for video than audio? I'm assuming you want chaining
for video to allow frame rate changes and codebook changes to allow better
bitrate characteristics. Couldn't these same arguments be held for audio as
well? If you have audio that only has low frequency components it may make
sense to use a lower sample rate.

My analysis was based on transcoding being much more expensive and
difficult for video. Remember I'm against RTP chaining support in
the first place; the video framerate issue was pretty much the lever
argument for me.

Post by Aaron Colwell
I'm not sure if the 2 headers are orthogonal in Vorbis. You need to know the
channels value from the info header to properly decode the setup header. This
will constrain you from changing the number of channels across chain boundaries.
The block sizes are also stored in the info header so that wouldn't be able to
change across chain boundaries either. These 2 constraints seem to me to
severely limit the usefulness of chaining support if you can only update the
setup header.

Right. Teaches me to not do my homework. In light of this I agree with
you that hashing both the info and setup packets and using that as the
ident key is the best approach.

Post by Aaron Colwell
I haven't read his proposals yet (I'll try to get to this today), but the
way you describe it doesn't sound like it would work. What happens in the case
where you use the same setup header, but a different frame size or frame rate?
The CRC wouldn't change so the client wouldn't know that these parameters
changed. Depending on how the frame size changed, frame decode could fail
because there are either too many or too few coded coeff, block coded flags,
etc.

Insightful as always. So unless I missed something in my description,
this is just a more expensive version of the fixed-info packet config.

Thanks,
-r

Aaron Colwell

2005-02-03 22:41:05 UTC

Do you mean Ogg files that change the number of multiplexed logical
bitstreams between chain boundaries? That's the only thing I can think
of that we're not covering.

Yes. Since SDP forces the number of streams to be constant, you can't support
changing the number of logical streams unless you do SSRC multiplexing over a
single RTP channel. I'm fine with not supporting those types of streams since
hopefully they are rare. For on-demand content you could support these files
by figuring out the max number of Vorbis and Theora streams needed and
advertise that many streams. You just don't send data on some of the streams
when they aren't needed. This is basically how I support these types of files
in the Helix Plugins.
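
The "advertise the maximum" idea can be sketched in a few lines of Python. This is only an illustration of the counting step, not the Helix implementation; the function name and the representation of chain segments as lists of codec names are assumptions made for the example.

```python
from collections import Counter


def streams_to_advertise(segments):
    """Given the codecs present in each chain segment, return the
    per-codec maximum, i.e. how many streams of each type the SDP
    would have to describe to cover every segment."""
    needed = Counter()
    for codecs in segments:
        for codec, count in Counter(codecs).items():
            needed[codec] = max(needed[codec], count)
    return dict(needed)


# Hypothetical on-demand file with three chain segments.
segments = [
    ["vorbis"],
    ["vorbis", "theora"],
    ["vorbis", "vorbis", "theora"],
]
print(streams_to_advertise(segments))
```

Streams that a given segment does not use simply carry no data while that segment plays.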

For live this could be ok, but for on-demand, transcoding won't scale
even though audio is less expensive to transcode than video.

Right. Teaches me to not do my homework. In light of this I agree with
you that hashing both the info and setup packets and using that as the
ident key is the best approach.

Insightful as always. So unless I missed something in my description,
this is just a more expensive version of the fixed-info packet config.

I'm not sure what you mean about "a more expensive version of the fixed-info
packet config". I think it just means that if you want to support changes in
the info header then you need to have the CRC cover the info header and
codebook. If you just want to switch out the codebook then it will work fine.
Since the frame-rate info is in the ident header this would mean fixed
frame rate for video and fixed sample-rate and channels for audio.

Aaron

Post by Ralph Giles
Thanks,
-r

Ralph Giles

2005-02-03 22:50:50 UTC

Post by Aaron Colwell
Yes. Since SDP forces the number of streams to be constant, you can't support
changing the number of logical streams unless you do SSRC multiplexing over a
single RTP channel. I'm fine with not supporting those types of streams since
hopefully they are rare. For on-demand content you could support these files
by figuring out the max number of Vorbis and Theora streams needed and
advertise that many streams. You just don't send data on some of the streams
when they aren't needed. This is basically how I support these types of files
in the Helix Plugins.

Ok, thanks. That helps.

Post by Aaron Colwell
For live this could be ok, but for on-demand, transcoding won't scale
even though audio is less expensive to transcode than video.

Is there a reason to use RTP instead of HTTP for on-demand work??

Post by Aaron Colwell
I'm not sure what you mean about "a more expensive version of the fixed-info
packet config". I think it just means that if you want to support changes in
the info header then you need to have the CRC cover the info header and
codebook. If you just want to switch out the codebook then it will work fine.
Since the frame-rate info is in the ident header this would mean fixed
frame rate for video and fixed sample-rate and channels for audio.

That's what I mean. :-)

-r

Aaron Colwell

2005-02-04 06:02:51 UTC

Ok, thanks. That helps.

Post by Aaron Colwell
For live this could be ok, but for on-demand, transcoding won't scale
even though audio is less expensive to transcode than video.

Is there a reason to use RTP instead of HTTP for on-demand work??

I don't really want to start a huge thread about this. I just want to say that
there are several streaming servers out there that use RTP for on-demand.
They can serve all sorts of other datatypes fine. Why make Vorbis and Theora
exceptions? A server can choose not to support these types of files if it
doesn't want to deal with the added complexity. If chained streams are somehow
marked in the SDP, then the client can opt not to play them back. It just
seems to me that making all streaming methods have equivalent functionality
is beneficial and doesn't limit the distribution mechanisms you can use.

Aaron

That's what I mean. :-)
-r

Ramón García

2005-02-04 18:29:44 UTC

There is a strong problem with supporting chaining: it breaks encapsulation.

Single stream           | Several Streams
------------------------+----------------
Vorbis streams          | OGG Container
Different RTP sessions  | RTSP

Aaron, why are you saying that not supporting chaining would imply
not serving some OGG files? Don't RealMedia, QuickTime and other
formats use different RTP sessions for each stream contained?

On the other hand, there is an important argument in favour of
supporting chaining: allowing dynamic changes as a response to changes
in the network conditions. So I think that limited support for this
case would be justified.

Aaron, how does RealAudio handle changes in network bandwidth?
Different RTP sessions, or just one?

Ralph Giles

2005-02-04 18:38:35 UTC

Post by Ramón García
There is a strong problem with supporting chaining: it breaks encapsulation.
Single stream           | Several Streams
------------------------+----------------
Vorbis streams          | OGG Container
Different RTP sessions  | RTSP

I think you may be confusing chaining and grouping here?

-r

Aaron Colwell

2005-02-04 19:05:35 UTC

Post by Ramón García
There is a strong problem with supporting chaining: it breaks encapsulation.

How so? In the sane case (i.e. the number of logical streams doesn't change) all
chaining does is allow you to change the codec parameters in the middle of
playback.

Post by Ramón García
Single stream           | Several Streams
------------------------+----------------
Vorbis streams          | OGG Container
Different RTP sessions  | RTSP
Aaron, why are you saying that not supporting chaining would imply
not serving some OGG files? Don't RealMedia, QuickTime and other
formats use different RTP sessions for each stream contained?

Yes they do. The problem is that there is a subset of valid ogg files that
allow the number of streams to change throughout the duration of the clip. For
on-demand clips you can figure out how many logical streams you need at SDP generation
time. For a live or simulated live session that does chaining, you don't have
enough info at SDP generation time to generate the right number of streams.
The only way to make it work inside the framework is to describe 1 stream in
the SDP and specify rtpmap info for both Vorbis and Theora. You can then use
SSRC multiplexing on that single RTP channel to keep the streams separate.

Post by Ramón García
On the other hand, there is an important argument in favour of
supporting chaining: allowing dynamic changes as a response to changes
in the network conditions. So I think that limited support for this
case would be justified.

Right. That is one reason.

Post by Ramón García
Aaron, how does RealAudio handle changes in network bandwidth?
Different RTP sessions, or just one?

The different bitrates are multiplexed on the same RTP channel. We basically
have substream IDs in the packets. Audio and video are on separate RTP
channels.

Aaron


Ralph Giles

2005-02-05 20:55:16 UTC

Derf raised the question of CRC32 cache collisions on irc today and I'm
afraid he has a point.

The idea was to CRC32 hash the info+setup headers, and include that in
every packet to identify the associated decoder setup. Since this ident
is generated in a well defined way from the stream headers, clients
could cache the results and avoid having to do an out-of-band retrieval
for headers it's already seen. It also makes it easy to verify correct
retrieval of the headers before use.
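
A minimal sketch of the client-side behaviour described above, assuming the ident is `zlib.crc32` over the concatenated info and setup headers. The class and function names are hypothetical, and `fetch` stands in for whatever out-of-band retrieval the draft ends up defining.

```python
import zlib


class HeaderCache:
    """Maps a 32-bit ident to the (info, setup) headers it was derived from."""

    def __init__(self):
        self._by_ident = {}

    def lookup(self, ident):
        return self._by_ident.get(ident)

    def store(self, ident, info, setup):
        """Accept the headers only if they hash to the ident; return True if stored."""
        if zlib.crc32(info + setup) & 0xFFFFFFFF != ident:
            return False
        self._by_ident[ident] = (info, setup)
        return True


def headers_for_packet(cache, ident, fetch):
    """Return the headers for a packet's ident, fetching them only on a cache miss."""
    cached = cache.lookup(ident)
    if cached is not None:
        return cached
    info, setup = fetch(ident)
    if not cache.store(ident, info, setup):
        raise ValueError("retrieved headers do not match the packet ident")
    return info, setup


# Usage with placeholder headers and a stand-in for out-of-band retrieval.
info, setup = b"info (placeholder)", b"setup (placeholder)"
ident = zlib.crc32(info + setup) & 0xFFFFFFFF
fetches = []


def fetch(wanted_ident):
    fetches.append(wanted_ident)
    return info, setup


cache = HeaderCache()
headers_for_packet(cache, ident, fetch)   # miss: triggers one retrieval
headers_for_packet(cache, ident, fetch)   # hit: served from the cache
print("retrievals:", len(fetches))
```

Note that the check in `store` only verifies that the retrieved headers hash to the ident. It cannot tell two different header sets with the same CRC apart.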

The problem is the risk of a collision causing the decoder to use
the wrong set of headers. If you believe in deterministically correct
software, that's bad, and even if you don't the probability with a 32
bit value is high enough to occur occasionally in practice.
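
To put a rough number on that risk, the standard birthday approximation can be evaluated directly. This sketch assumes idents are uniformly distributed over the 32-bit space, which real encoder output may not satisfy, and the cache sizes are arbitrary examples.

```python
import math


def collision_probability(n, bits=32):
    """Approximate probability that at least two of n distinct header
    sets share the same ident, using the birthday approximation
    1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2.0 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


# Number of distinct (info, setup) pairs a population of clients might cache.
for n in (100, 1000, 10000, 100000):
    print(n, collision_probability(n))
```

The probability grows roughly with the square of the number of distinct header sets, so a widely deployed cache is far more exposed than a single session.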

So, the only way around this is to not cache the headers. We could just
say that, but doing so breaks the broadcast use case where play-time
retrieval isn't an option.

The other thing we can do is just make it an arbitrary number and say
that the sender and receiver MUST negotiate a mapping between the ident
and the decoder setup out of band, and just leave it out of the scope of
the RTP mapping (beyond the in-band transmission option anyway.) This
should work fine for the SDP uri and so on.

Thoughts?

-r

Phil Kerr

2005-02-06 23:16:27 UTC

Hi Ralph,

Post by Ralph Giles
Derf raised the question of CRC32 cache collisions on irc today and I'm
afraid he has a point.
The idea was to CRC32 hash the info+setup headers, and include that in
every packet to identify the associated decoder setup. Since this ident
is generated in a well defined way from the stream headers, clients
could cache the results and avoid having to do an out-of-band retrieval
for headers it's already seen. It also makes it easy to verify correct
retrieval of the headers before use.

This is the plus side of using this mechanism.

Post by Ralph Giles
The problem is the risk of a collision causing the decoder to use
the wrong set of headers. If you believe in deterministically correct
software, that's bad, and even if you don't the probability with a 32
bit value is high enough to occur occasionally in practice.

I think Derf was talking about Theora, not Vorbis, but from what I
gathered he listed the maximum permutations for all the config fields.
In practice there will be a large number of permutations that will never
be seen.

When the idea came up for using a CRC32 field there seemed to be no real
objections to it. Perhaps we should find out what the frequency of hash
collisions is in real life?

Post by Ralph Giles
So, the only way around this is to not cache the headers. We could just
say that, but doing so breaks the broadcast use case where play-time
retrieval isn't an option.

This is a solution, but does it harm functionality too much?

Post by Ralph Giles
The other thing we can do is just make it an arbitrary number and say
that the sender and receiver MUST negotiate a mapping between the ident
and the decoder setup out of band, and just leave it out of the scope of
the RTP mapping (beyond the in-band transmission option anyway.) This
should work fine for the SDP uri and so on.

If there is no caching then this will work as there only needs to be a
unique ident for each file for the session life. How do you ensure that
the arbitrary numbers are unique? Would ov_serialnumber be unique enough?

-P

Post by Ralph Giles
Thoughts?
-r

Ralph Giles

2005-02-06 23:31:32 UTC

Post by Phil Kerr
I think Derf was talking about Theora, not Vorbis, but from what I
gathered he listed the maximum permutations for all the config fields.
In practice there will be a large number of permutations that will never
be seen.

He was. I didn't necessarily agree with his feelings about the relative
danger of collision, so I generalized the argument. :)

Post by Phil Kerr
When the idea came up for using a CRC32 field there seemed to be no real
objections to it. Perhaps we should find out what the frequency of hash
collisions is in real life?

The problem is it's hard to judge because this depends on the behavior
of future encoders. I'm more worried about custom codebooks generated by
more advanced multipass theora encoders. I guess I do agree the issue is
probably more important for theora than for vorbis.

OTOH, caching seemed like a major win on the whole chaining thing.

Post by Ralph Giles
So, the only way around this is to not cache the headers. We could just
say that, but doing so breaks the broadcast use case where play-time
retrieval isn't an option.

This is a solution, but does it harm functionality too much?

I think so. We have people interested in satellite broadcast of
vorbis+theora right now. That use case is important.

Aaron Colwell

2005-02-10 16:13:42 UTC

ok. I sort of lost track of what has happened here. Could someone provide a
summary of the problem and solutions being proposed?

Here is what I've gotten from this discussion.

- There are concerns about collisions in the CRC

- The use of a unique ID is being proposed so that collisions are avoided,
but this brings up a problem of mapping the unique ID to the codebooks and
ident header.

- There are some concerns that using a unique ID adversely affects the player's
ability to cache codebooks and avoid retrieval when it isn't necessary.

Is this correct?

If so here is a possible solution.

- Generate a unique ID for each chain. This could simply be the chain index
(ie the Nth chain in the file will have a unique ID of N)

- The server publishes a base URL in the SDP that allows the client to retrieve
hashes and URLs for the codebook and ident header associated with each
unique ID. The url for info associated with a particular unique ID is
constructed by appending the unique ID to the base URL.

- When it sees a new ID in the stream, the client can request the hash and URL
info from the server. If the hashes match stuff it already has in its cache
then it just uses the info from its cache. If it doesn't have the info then
it can use the specified URLs to retrieve what it needs.

- If the unique IDs are known at SDP generation time, then they can be
advertised in the SDP. The client then can prefetch all the info associated
with these IDs.

- Since the hash will no longer be in the payload we should probably use MD5
so that the collision worry just goes away.

- For one-way scenarios, nothing special needs to be done because the codebooks
and ident header will be transmitted inline. If you are dealing with a
completely closed system the server and client could agree on specific
ID -> ident, codebook mappings and not even bother transmitting the codebooks
inline. It might be useful to reserve some of the ID space for this purpose
in general.
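
The client side of the scheme above could look roughly like this Python sketch. It simplifies the proposal by treating the ident header and codebooks as a single blob with one MD5 hash and one URL. The names `info_url`, `CodebookCache`, `fetch_index` and `fetch_blob`, and the `example.invalid` URLs, are all hypothetical; the two fetch callables stand in for HTTP requests.

```python
import hashlib


def info_url(base_url, chain_id):
    """Per-ID URL, built by appending the unique ID to the base URL from the SDP."""
    return base_url + str(chain_id)


class CodebookCache:
    """Caches header blobs by MD5 so a new chain ID need not mean a new download."""

    def __init__(self):
        self._by_md5 = {}

    def resolve(self, chain_id, base_url, fetch_index, fetch_blob):
        """fetch_index(url) -> (md5_hex, blob_url); fetch_blob(url) -> bytes."""
        md5_hex, blob_url = fetch_index(info_url(base_url, chain_id))
        blob = self._by_md5.get(md5_hex)
        if blob is None:
            blob = fetch_blob(blob_url)
            if hashlib.md5(blob).hexdigest() != md5_hex:
                raise ValueError("downloaded headers do not match advertised hash")
            self._by_md5[md5_hex] = blob
        return blob


# Usage with in-memory stand-ins for the two HTTP requests.
headers = b"ident header + codebooks (placeholder bytes)"
digest = hashlib.md5(headers).hexdigest()
base = "http://example.invalid/chain/"
index = {
    base + "0": (digest, "http://example.invalid/blob/a"),
    base + "1": (digest, "http://example.invalid/blob/a"),
}
blobs = {"http://example.invalid/blob/a": headers}
downloads = []


def fetch_index(url):
    return index[url]


def fetch_blob(url):
    downloads.append(url)
    return blobs[url]


cache = CodebookCache()
first = cache.resolve(0, base, fetch_index, fetch_blob)
second = cache.resolve(1, base, fetch_index, fetch_blob)  # same hash, no download
print("blob downloads:", len(downloads))
```

The small per-ID index request still happens for every new chain ID. Only the larger header download is avoided on a cache hit.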

Aaron

He was. I didn't necessarily agree with his feelings about the relative
danger of collision, so I generalized the argument. :)

Post by Phil Kerr
When the idea came up for using a CRC32 field there seemed to be no real
objections to it. Perhaps we should find out what the frequency of hash
collisions is in real life?

The problem is it's hard to judge because this depends on the behavior
of future encoders. I'm more worried about custom codebooks generated by
more advanced multipass theora encoders. I guess I do agree the issue is
probably more important for theora than for vorbis.
OTOH, caching seemed like a major win on the whole chaining thing.

Post by Ralph Giles
So, the only way around this is to not cache the headers. We could just
say that, but doing so breaks the broadcast use case where play-time
retrieval isn't an option.

This is a solution, but does it harm functionality too much?

I think so. We have people interested in satellite broadcast of
vorbis+theora right now. That use case is important.

Well, it would be up to the server to make sure it didn't reuse an ident
for two different decoder configs within a given RTP session. Otherwise,
it's arbitrary. That enforces no caching based on the ident, and isn't
much of a burden for the server.
IMHO,
-r
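
The server-side rule quoted above (never reuse an ident for two different decoder configs within one RTP session) can be sketched as a simple per-session allocator. The class name and the sequential numbering are assumptions for illustration; any assignment that stays unique within the session would do.

```python
class SessionIdents:
    """Hands out arbitrary idents that are unique per decoder config
    for the lifetime of one RTP session."""

    def __init__(self, bits=32):
        self._limit = 1 << bits
        self._by_config = {}

    def ident_for(self, info: bytes, setup: bytes) -> int:
        key = (info, setup)
        if key not in self._by_config:
            if len(self._by_config) >= self._limit:
                raise OverflowError("ident space exhausted for this session")
            # Sequential numbering: the ident carries no meaning outside
            # the session, so clients cannot cache on it.
            self._by_config[key] = len(self._by_config)
        return self._by_config[key]


idents = SessionIdents()
a = idents.ident_for(b"info-a", b"setup-a")
b = idents.ident_for(b"info-b", b"setup-a")
again = idents.ident_for(b"info-a", b"setup-a")
print(a, b, again)
```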

Ralph Giles

2005-02-10 21:58:40 UTC

Post by Aaron Colwell
ok. I sort of lost track of what has happened here. Could someone provide a
summary of the problem and solutions being proposed?

Oops. That's what I was trying to do when I started the thread.

Post by Aaron Colwell
Here is what I've gotten from this discussion.
- There are concerns about collisions in the CRC

Yes. And we don't want to spend more per-packet bits on the ident.

Post by Aaron Colwell
- The use of a unique ID is being proposed so that collisions are avoided,
but this brings up a problem of mapping the unique ID to the codebooks and
ident header.

It makes the mapping definition more pressing anyway.

Post by Aaron Colwell
- There are some concerns that using a unique ID adversely affects the player's
ability to cache codebooks and avoid retrieval when it isn't necessary.

Yes.

Post by Aaron Colwell
If so here is a possible solution.
- Generate a unique ID for each chain. This could simply be the chain index
(ie the Nth chain in the file will have a unique ID of N)
- The server publishes a base URL in the SDP that allows the client to retrieve
hashes and URLs for the codebook and ident header associated with each
unique ID. The url for info associated with a particular unique ID is
constructed by appending the unique ID to the base URL.

How many bits are you proposing for this id?

Using a regularized URL scheme also means clients can at least take
advantage of HTTP caching mechanisms. That might be simpler than
specifying some way to retrieve the hash separately.

I'd really like to see what the SDP would actually look like for
this codebook transmission stuff. Can someone work up some examples? If
we can somehow include a larger hash for the chain idents in the
SDP, that would restore caching.

-r

Aaron Colwell

2005-02-10 22:49:32 UTC

Post by Aaron Colwell
ok. I sort of lost track of what has happened here. Could someone provide a
summary of the problem and solutions being proposed?

Oops. That's what I was trying to do when I started the thread.

Post by Aaron Colwell
Here is what I've gotten from this discussion.
- There are concerns about collisions in the CRC

Yes. And we don't want to spend more per-packet bits on the ident.

Post by Aaron Colwell
- The use of a unique ID is being proposed so that collisions are avoided,
but this brings up a problem of mapping the unique ID to the codebooks and
ident header.

It makes the mapping definition more pressing anyway.

Post by Aaron Colwell
- There are some concerns that using a unique ID adversely affects the player's
ability to cache codebooks and avoid retrieval when it isn't necessary.

Yes.

How many bits are you proposing for this id?

I figure we could use the 32 bits that were proposed for the CRC field or
maybe even 24 or 16 bits. This basically just limits how many chains with
different ident,codebook pairs you want to support in a single broadcast.

Post by Ralph Giles
Using a regularized URL scheme also means clients can at least take
advantage of HTTP caching mechanisms. That might be simpler than
specifying some way to retrieve the hash separately.

I'm not sure I follow you here. The reason I am proposing the base URL scheme
is because you may not always know what the ident,codebooks pairs are at
SDP generation time. By using the base URL mechanism the client knows how to
take a new ID it gets and retrieve information about the codebooks involved.
Unless you want to send this mapping info periodically in the data stream I'm
not sure how else you can do this.

Post by Ralph Giles
I'd really like to see what the SDP would actually look like for
this codebook transmission stuff. Can someone work up some examples? If
we can somehow include a larger hash for the chain idents in the
SDP, that would restore caching.

I'll take a stab at this.

Here is the case where all the chains are known at SDP generation time

v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 chain-ids=0,1,2;
a=chain-info:0 ident=42; codebook=98
a=chain-info:1 ident=43; codebook=98
a=chain-info:2 ident=45; codebook=23
a=ident-info:42 url="http://foo.com/ident-441k"; hash=987234BC8D92DFE2987234BC8D92DFE2
a=ident-info:43 url="http://foo.com/ident-8k"; hash=2186461716517578792145688D92DFE2
a=ident-info:45 url="http://foo.com/ident-11k"; hash=218646687642f4AEFD2145688D92DFE2
a=codebook-info:98 url="http://foo.com/codebook-lowBW"; hash=309573098520975ABEFC34768D92DFE2
a=codebook-info:23 url="http://foo.com/codebook-speech"; hash=4567319735186778271C34768D92DFE2
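
As a rough illustration of how a client might read these attributes, here is a small sketch that builds the chain, ident, and codebook tables and resolves a chain ID to its two headers. The attribute names come from the example; the function names are hypothetical, and the embedded sample assumes the third ident-info line carries ID 45, which is the ident that chain 2 refers to.

```python
import re

SDP = """\
a=chain-info:0 ident=42; codebook=98
a=chain-info:1 ident=43; codebook=98
a=chain-info:2 ident=45; codebook=23
a=ident-info:42 url="http://foo.com/ident-441k"; hash=987234BC8D92DFE2987234BC8D92DFE2
a=ident-info:43 url="http://foo.com/ident-8k"; hash=2186461716517578792145688D92DFE2
a=ident-info:45 url="http://foo.com/ident-11k"; hash=218646687642f4AEFD2145688D92DFE2
a=codebook-info:98 url="http://foo.com/codebook-lowBW"; hash=309573098520975ABEFC34768D92DFE2
a=codebook-info:23 url="http://foo.com/codebook-speech"; hash=4567319735186778271C34768D92DFE2
"""

LINE = re.compile(r'^a=(chain|ident|codebook)-info:(\d+)\s+(.*)$')

def parse_params(text):
    """Split 'k=v; k2="v2"' into a dict, stripping optional quotes."""
    params = {}
    for part in text.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip().strip('"')
    return params

def parse_chain_sdp(sdp):
    """Collect the three attribute families into tables keyed by numeric ID."""
    tables = {"chain": {}, "ident": {}, "codebook": {}}
    for line in sdp.splitlines():
        m = LINE.match(line.strip())
        if m:
            kind, num, rest = m.groups()
            tables[kind][int(num)] = parse_params(rest)
    return tables

def headers_for_chain(tables, chain_id):
    """Follow the chain -> (ident, codebook) indirection."""
    chain = tables["chain"][chain_id]
    return (tables["ident"][int(chain["ident"])],
            tables["codebook"][int(chain["codebook"])])
```

The indirection is what lets chains 0 and 1 share codebook 98 without repeating its URL and hash.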

If you allow the URLs to be relative, then you could also do this

v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 chain-ids=0,1,2; baseURL="http://foo.com/ident-441k/"
a=chain-info:0 ident=42; codebook=98
a=chain-info:1 ident=43; codebook=98
a=chain-info:2 ident=45; codebook=23
a=ident-info:42 url="ident-441k"; hash=987234BC8D92DFE2987234BC8D92DFE2
a=ident-info:43 url="ident-8k"; hash=2186461716517578792145688D92DFE2
a=ident-info:45 url="ident-11k"; hash=218646687642f4AEFD2145688D92DFE2
a=codebook-info:98 url="codebook-lowBW"; hash=309573098520975ABEFC34768D92DFE2
a=codebook-info:23 url="codebook-speech"; hash=4567319735186778271C34768D92DFE2

If the chain ID set is not known at SDP generation time then you could have
an SDP that looks like this

v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 chainIDBaseURL="http://foo.com/chainIDs/"

If the client got a packet with an ID of 63 then it would request
http://foo.com/chainIDs/63

This would contain a chunk of data that would look like

+--------------+
| Ident MD5 |
+--------------+
| Codebook MD5 |
+--------------+
| Ident URL | <- Null terminated string
+--------------+
| Codebook URL | <- Null terminated string
+--------------+

The client can differentiate the 2 cases by what fields are present in the
fmtp header.
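
A minimal sketch of the client side of this lookup, assuming the chunk holds two raw 16-byte MD5 digests followed by two NUL-terminated ASCII URLs, and that the chain ID is appended to the base URL in decimal. The field widths, the text encoding, and both function names are assumptions; the diagram above does not pin them down.

```python
def chain_info_url(base_url: str, chain_id: int) -> str:
    """Append the decimal chain ID to the chainIDBaseURL from the fmtp line."""
    return f"{base_url}{chain_id}"

def parse_chain_info(blob: bytes) -> dict:
    """Parse 16-byte ident MD5, 16-byte codebook MD5, then two NUL-terminated URLs."""
    if len(blob) < 34:                  # 32 digest bytes plus two terminators
        raise ValueError("chain info too short")
    ident_md5, codebook_md5 = blob[:16], blob[16:32]
    urls = blob[32:].split(b"\x00")
    if len(urls) < 3:                   # two terminators leave a trailing piece
        raise ValueError("missing NUL terminator")
    return {
        "ident_md5": ident_md5.hex(),
        "codebook_md5": codebook_md5.hex(),
        "ident_url": urls[0].decode("ascii"),
        "codebook_url": urls[1].decode("ascii"),
    }
```

For a packet carrying ID 63, `chain_info_url("http://foo.com/chainIDs/", 63)` gives the URL quoted above.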

Aaron

Ralph Giles

2005-04-10 18:42:23 UTC

Post by Aaron Colwell
I figure we could use the 32 bits that were proposed for the CRC field or
maybe even 24 or 16 bits. This basically just limits how many chains with
different ident,codebook pairs you want to support in a single broadcast.

So you'd be ok with the 16 bits in my proposal?

Also, it occurred to me that while storing actual info in the SSRC would
be an abuse, switching on SSRC,ident pairs might be ok. I don't suggest
that this is a good idea, but it does leave lots of headroom.

Post by Aaron Colwell
I'm not sure I follow you here. The reason I am proposing the base URL scheme
is because you may not always know what the ident,codebooks pairs are at
SDP generation time. By using the base URL mechanism the client knows how to
take a new ID it gets and retrieve information about the codebooks involved.
Unless you want to send this mapping info periodically in the data stream I'm
not sure how else you can do this.

Ok, I understand now. I don't have enough experience with SDP to have an
opinion on the style issue, so if you think this is reasonable, I'm ok
with it. Saves typing too.

The idea would be simple concatenation, as in your example, or some kind
of template string? There are lots of ways to map ident to url.

Post by Aaron Colwell
Here is the case where all the chains are known at SDP generation time
[...]
If you allow the URLs to be relative, then you could also do this
v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 chain-ids=0,1,2; baseURL="http://foo.com/ident-441k/"
a=chain-info:0 ident=42; codebook=98
a=chain-info:1 ident=43; codebook=98
a=chain-info:2 ident=45; codebook=23
a=ident-info:42 url="ident-441k"; hash=987234BC8D92DFE2987234BC8D92DFE2
a=ident-info:43 url="ident-8k"; hash=2186461716517578792145688D92DFE2
a=ident-info:45 url="ident-11k"; hash=218646687642f4AEFD2145688D92DFE2
a=codebook-info:98 url="codebook-lowBW"; hash=309573098520975ABEFC34768D92DFE2
a=codebook-info:23 url="codebook-speech"; hash=4567319735186778271C34768D92DFE2

Ok, thanks for putting this together, that really helps. :)

I'm not sure being able to mix-and-match ident and codebook headers is
worth the indirection complexity. How about just a single id=url;hash
mapping? I also think it would be reasonable to (optionally) include a
comment header in the set, for streams where that makes sense. Yes, of
course we want a separate metadata stream, but there's no reason not to
fall back on current tech.

I'd also suggest s/hash/md5/ just so it's possible to change the hash
later.

Post by Aaron Colwell
If the chain ID set is not known at SDP generation time then you could have
an SDP that looks like this
v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 chainIDBaseURL="http://foo.com/chainIDs/"

Ok. Similar to before I have a simpler proposal. The http urls should
just point to an Ogg stream with the header packets. That's well
specified, and easy for the server to generate. Having a separate query
to get the hashes saves bandwidth, but also increases latency quite a
bit. If you're generating custom headers for every stream in a live
environment, I don't think cache coherency for the hashes is going to be
very good anyway.

Question: if you're streaming theora+vorbis using this scheme, can you
use the same url for both headers and just put up a multiplexed Ogg
file? That would make it *really* easy.

Post by Aaron Colwell
The client can differentiate the 2 cases by what fields are present in the
fmtp header.

Is this why you have BaseURL vs ChainBaseURL? I think with a combined
header scheme this could also go away.

Still, in general this looks good, and I think it's better than trying
to send the headers themselves in the SDP. If we can work out these
details, and specify some way to point to an RTP header stream for
multicast, I think we're done. Broadcast applications can use the SDP
scheme with a fixed set of preloaded codebooks.

-r

Phil Kerr

2005-04-10 20:18:30 UTC

So you'd be ok with the 16 bits in my proposal?
Also, it occurred to me that while storing actual info in the SSRC would
be an abuse, switching on SSRC,ident pairs might be ok. I don't suggest
that this is a good idea, but it does leave lots of headroom.

The SSRC needs to be a random value. Having it as a derived value will
be against RTP specs.

Ok, I understand now. I don't have enough experience with SDP to have an
opinion on the style issue, so if you think this is reasonable, I'm ok
with it. Saves typing too.
The idea would be simple concatenation, as in your example, or some kind
of template string? There are lots of ways to map ident to url.

Ok, thanks for putting this together, that really helps. :)
I'm not sure being able to mix-and-match ident and codebook headers is
worth the indirection complexity. How about just a single id=url;hash
mapping? I also think it would be reasonable to (optionally) include a
comment header in the set, for streams where that makes sense. Yes, of
course we want a separate metadata stream, but there's no reason not to
fall back on current tech.
I'd also suggest s/hash/md5/ just so it's possible to change the hash
later.

MD5 is 128 bit, a hash can be different sizes. Key could be a better
description.

Also Aaron's little SDP example above is 770 octets, isn't the maximum
SDP message length 1000 octets? This chaining mechanism could be
limited if it passes its data in SDP like this.

Ok. Similar to before I have a simpler proposal. The http urls should
just point to an Ogg stream with the header packets. That's well
specified, and easy for the server to generate. Having a separate query
to get the hashes saves bandwidth, but also increases latency quite a
bit. If you're generating custom headers for every stream in a live
environment, I don't think cache coherency for the hashes is going to be
very good anyway.
Question: if you're streaming theora+vorbis using this scheme, can you
use the same url for both headers and just put up a multiplexed Ogg
file? That would make it *really* easy.

Post by Aaron Colwell
The client can differentiate the 2 cases by what fields are present in the
fmtp header.

Is this why you have BaseURL vs ChainBaseURL? I think with a combined
header scheme this could also go away.
Still, in general this looks good, and I think it's better than trying
to send the headers themselves in the SDP. If we can work out these
details, and specify some way to point to an RTP header stream for
multicast, I think we're done. Broadcast applications can use the SDP
scheme with a fixed set of preloaded codebooks.
-r

Ralph Giles

2005-04-11 00:50:00 UTC

Post by Ralph Giles
Also, it occurred to me that while storing actual info in the SSRC would
be an abuse, switching on SSRC,ident pairs might be ok. I don't suggest
that this is a good idea, but it does leave lots of headroom.

The SSRC needs to be a random value. Having it as a derived value will
be against RTP specs.

That's what I was saying. It can't be a derived value, but things in the
SDP could be derived from it. Or is that also contradicted?

Post by Ralph Giles
I'd also suggest s/hash/md5/ just so it's possible to change the hash
later.

MD5 is 128 bit, a hash can be different sizes. Key could be a better
description.

To be able to verify the hash you need to know the algorithm.

Post by Phil Kerr
Also Aaron's little SDP example above is 770 octets, isn't the maximum
SDP message length 1000 octets? This chaining mechanism could be
limited if it passes its data in SDP like this.

Hmm. Removing the indirection as I suggested would help, but only so
much. Using the implicit url mapping Aaron proposed would help a lot
more. Do you have another suggestion?

-r

Aaron Colwell

2005-04-11 13:45:17 UTC

So you'd be ok with the 16 bits in my proposal?
Also, it occurred to me that while storing actual info in the SSRC would
be an abuse, switching on SSRC,ident pairs might be ok. I don't suggest
that this is a good idea, but it does leave lots of headroom.

The SSRC needs to be a random value. Having it as a derived value will
be against RTP specs.

Ok, I understand now. I don't have enough experience with SDP to have an
opinion on the style issue, so if you think this is reasonable, I'm ok
with it. Saves typing too.
The idea would be simple concatenation, as in your example, or some kind
of template string? There are lots of ways to map ident to url.

Post by Aaron Colwell
Here is the case where all the chains are known at SDP generation time
[...]
If you allow the URLs to be relative, then you could also do this
v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 chain-ids=0,1,2; baseURL="http://foo.com/ident-441k/"
a=chain-info:0 ident=42; codebook=98
a=chain-info:1 ident=43; codebook=98
a=chain-info:2 ident=45; codebook=23
a=ident-info:42 url="ident-441k"; hash=987234BC8D92DFE2987234BC8D92DFE2
a=ident-info:43 url="ident-8k"; hash=2186461716517578792145688D92DFE2
a=ident-info:45 url="ident-11k"; hash=218646687642f4AEFD2145688D92DFE2
a=codebook-info:98 url="codebook-lowBW"; hash=309573098520975ABEFC34768D92DFE2
a=codebook-info:23 url="codebook-speech"; hash=4567319735186778271C34768D92DFE2

Ok, thanks for putting this together, that really helps. :)
I'm not sure being able to mix-and-match ident and codebook headers is
worth the indirection complexity. How about just a single id=url;hash
mapping? I also think it would be reasonable to (optionally) include a
comment header in the set, for streams where that makes sense. Yes, of
course we want a separate metadata stream, but there's no reason not to
fall back on current tech.
I'd also suggest s/hash/md5/ just so it's possible to change the hash
later.

MD5 is 128 bit, a hash can be different sizes. Key could be a better
description.
Also Aaron's little SDP example above is 770 octets, isn't the maximum
SDP message length 1000 octets? This chaining mechanism could be
limited if it passes its data in SDP like this.

The only place where I've seen a limit suggested on SDP is for SAP
announcements. Perhaps this is where your 1k limit is coming from. Even there
it is not a MUST, but rather a RECOMMENDED. I also don't think that chaining
will likely be used that much in a multicast session that uses SAP for
announcement. Even if it does, the larger format would still be allowed, just
not recommended. I'll try to come up with a little more compact form if that
makes people happier. Using a Base64 encoding for the hash would reduce the
number of octets needed.
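
To put a number on the saving, here is a quick check, assuming the encoding meant here is standard Base64 and taking one of the 128-bit values from the example SDP. The variable names are only illustrative.

```python
import base64

hex_digest = "987234BC8D92DFE2987234BC8D92DFE2"   # 128-bit value from the example SDP
raw = bytes.fromhex(hex_digest)                   # 16 raw octets
b64_digest = base64.b64encode(raw).decode("ascii")

# Hex spends 2 characters per octet; Base64 spends 4 characters per 3 octets.
print(len(hex_digest), len(b64_digest))           # prints: 32 24
```

That is 8 octets saved per hash, including the two padding characters Base64 adds for a 16-octet input.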

For usage in scenarios other than SAP, we shouldn't care about the 1k limit.
Cases where the SDP is fetched via RTSP or HTTP shouldn't need to worry about
this limitation. On-demand, unicast live, and multicast live can all use
SDP retrieval via HTTP or RTSP.

Aaron

Ok. Similar to before I have a simpler proposal. The http urls should
just point to an Ogg stream with the header packets. That's well
specified, and easy for the server to generate. Having a separate query
to get the hashes saves bandwidth, but also increases latency quite a
bit. If you're generating custom headers for every stream in a live
environment, I don't think cache coherency for the hashes is going to be
very good anyway.
Question: if you're streaming theora+vorbis using this scheme, can you
use the same url for both headers and just put up a multiplexed Ogg
file? That would make it *really* easy.

Post by Aaron Colwell
The client can differentiate the 2 cases by what fields are present in
the fmtp header.

Is this why you have BaseURL vs ChainBaseURL? I think with a combined
header scheme this could also go away.
Still, in general this looks good, and I think it's better than trying
to send the headers themselves in the SDP. If we can work out these
details, and specify some way to point to an RTP header stream for
multicast, I think we're done. Broadcast applications can use the SDP
scheme with a fixed set of preloaded codebooks.
-r

Luca Barbato

2005-04-10 23:41:30 UTC

Post by Ralph Giles
Ok. Similar to before I have a simpler proposal. The http urls should
just point to an Ogg stream with the header packets. That's well
specified, and easy for the server to generate. Having a separate query
to get the hashes saves bandwidth, but also increases latency quite a
bit. If you're generating custom headers for every stream in a live
environment, I don't think cache coherency for the hashes is going to be
very good anyway.

What about just mapping the codebook ident to the file to fetch from the URL?
That way it would be quite easy to have it as

myhandler://mysite/myfile/codebookident

In the SDP you can just provide the myhandler://mysite/myfile/ part and
the client can figure it out from the ident in the RTP packet or just
fetch all and cache.

Since you told me that the RFC would be JUST for RTP, I'd like to have a
simple reference to out-of-band codebook delivery methods rather than
specify them, since in different cases you may have more or less
effective ways to provide them.

Post by Ralph Giles
Question: if you're streaming theora+vorbis using this scheme, can you
use the same url for both headers and just put up a multiplexed Ogg
file? That would make it *really* easy.

That would add the requirement of adding ogg dependency on the client.
vorbis-rtp has one container, rtp, no need to add others.

lu

--
Luca Barbato

Gentoo/linux Developer Gentoo/PPC Operational Manager
http://dev.gentoo.org/~lu_zero

Ralph Giles

2005-04-11 15:13:28 UTC

Post by Luca Barbato
What about just mapping the codebook ident to the file to fetch from the URL?
That way it would be quite easy to have it as
myhandler://mysite/myfile/codebookident
in the sdp you can just provide the myhandler://mysite/myfile/ part and
the client can figure it out from the ident in the rtp packet or just
fetch all and cache.

That was the substance of Aaron's proposal for the case when you don't
know the headers you're going to use in advance.

Post by Luca Barbato
Since you told me that the RFC would be JUST for RTP, I'd like to have a
simple reference to out-of-band codebook delivery methods rather than
specify them, since in different cases you may have more or less
effective ways to provide them.

We'll need to document the various recommended header retrieval
mechanisms, yes.

Post by Luca Barbato

That would add the requirement of adding ogg dependency on the client.
vorbis-rtp has one container, rtp, no need to add others.

So you'd prefer raw retrieval of the individual headers?

Note that the code to just split out the packet data from an Ogg stream
is only a couple of pages; it's not a large dependency compared to, say,
an http client.

-r

Luca Barbato

2005-04-11 18:12:00 UTC

Post by Ralph Giles
So you'd prefer raw retrieval of the individual headers?

Probably a network-order uint32 for the size followed by the raw data
could work well enough, or alternatively a variable-length-coded (VLC)
size followed by the raw data.
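
A minimal sketch of the first option, assuming each header is simply prefixed with its size as a big-endian (network-order) unsigned 32-bit integer and that several such records may be concatenated in one response. The function names are hypothetical.

```python
import struct

def pack_header(data: bytes) -> bytes:
    """Prefix raw header bytes with a network-order uint32 size."""
    return struct.pack("!I", len(data)) + data

def unpack_headers(blob: bytes) -> list:
    """Split a concatenation of size-prefixed headers back into raw headers."""
    headers, offset = [], 0
    while offset < len(blob):
        if offset + 4 > len(blob):
            raise ValueError("truncated size field")
        (size,) = struct.unpack_from("!I", blob, offset)
        offset += 4
        if offset + size > len(blob):
            raise ValueError("truncated header")
        headers.append(blob[offset:offset + size])
        offset += size
    return headers
```

This needs no container parser on the client, at the cost of defining one more tiny format.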

Post by Ralph Giles
Note that the code to just split out the packet data from an Ogg stream
is only a couple of pages; it's not a large dependency compared to, say,
an http client.

As I said before, if the target is to have a pure vorbis-rtp spec we
should try to think of the simplest way that doesn't require external deps.
Having the extradata wrapped in a container adds complexity.

--
Luca Barbato

Gentoo/linux Developer Gentoo/PPC Operational Manager
http://dev.gentoo.org/~lu_zero

Aaron Colwell

2005-04-11 14:09:47 UTC

So you'd be ok with the 16 bits in my proposal?

I don't really care that much about the size. My only concern is that
theoretically an ogg file can have 2^32-1 chains in it. I know that no one is
likely to do this, but they could. I think a 32 bit field in every packet would
be a waste of bits as would any number of bits if chaining wasn't even used.

Ideally I'd like to rearrange the flag bits a little bit so that there could be
a bit that indicates whether a chainID is present in the packet. The chainID
field could then be variable length ala UTF-8 style or perhaps a simple
encoding like MSB being set means that there is another byte for the chainID.
The reason I like this option is that it is kind of a "pay as you go" strategy.
As you add more chains to the stream, you pay more and more for the chainID
field. Yes it is a little more complex, but it allows you to accommodate any
valid ogg file, prevents you from wasting bits when chaining isn't even used,
and provides incremental overhead when chaining is used.
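
The simple MSB-continuation variant could look roughly like this. It is a sketch under assumptions: 7 payload bits per byte, most significant group first, and the high bit set on every byte except the last. The byte order and the function names are not specified anywhere in this thread.

```python
def encode_chain_id(value: int) -> bytes:
    """Encode a chain ID in 7-bit groups; the MSB marks 'another byte follows'."""
    if value < 0:
        raise ValueError("chain ID must be non-negative")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                    # most significant group first (assumed)
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])

def decode_chain_id(data: bytes, offset: int = 0):
    """Return (chain_id, offset just past the field)."""
    value = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated chain ID")
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:             # MSB clear: this was the last byte
            return value, offset
```

With this layout the first 128 chains cost one byte each, and a full 32-bit chain index costs five.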

Ok, I understand now. I don't have enough experience with SDP to have an
opinion on the style issue, so if you think this is reasonable, I'm ok
with it. Saves typing too.
The idea would be simple concatenation, as in your example, or some kind
of template string? There are lots of ways to map ident to url.

I think absolute URLs should be used for the base URL and relative URLs
should be used for the ident-info. Then you can just use the relative URL
resolution rules to figure out the URL for each ident info. That's
basically how SETUP urls are generated for RTSP. That is where I got the idea
from.

Ok, thanks for putting this together, that really helps. :)
I'm not sure being able to mix-and-match ident and codebook headers is
worth the indirection complexity. How about just a single id=url;hash
mapping? I also think it would be reasonable to (optionally) include a
comment header in the set, for streams where that makes sense. Yes, of
course we want a separate metadata stream, but there's no reason not to
fall back on current tech.

I think there is probably a more compact representation I could use here.
The reason I wanted to have a mix and match scheme was to reduce the amount
of duplication in the SDP. That minimizes the number of hash values in the SDP
if the same ident or codebook is used in several chains.

Post by Ralph Giles
I'd also suggest s/hash/md5/ just so it's possible to change the hash
later.

I agree.

I thought about just pointing to an ogg stream, but that prevents the client
from just grabbing the ident or codebook. It is possible for the client to
already have one of these 2 pieces from an earlier chain or from an earlier
file. I just wanted to have a scheme where the client is able to only get what
it needs.

It would, but if the client already has everything but the Theora codebook then
it has to waste bits pulling down the Vorbis ident, Vorbis codebook, and Theora
ident header. Having to pull down all the pieces reduces the savings of the
client's cache because if you have a cache miss on a codebook or ident you
still have to download stuff you might already have.

Post by Aaron Colwell
The client can differentiate the 2 cases by what fields are present in the
fmtp header.

Is this why you have BaseURL vs ChainBaseURL? I think with a combined
header scheme this could also go away.

Yes that is why I have the 2 different URL headers.

I'll look at this again and see if I can come up with a more compact SDP
representation.

Aaron

Post by Ralph Giles
Still, in general this looks good, and I think it's better than trying
to send the headers themselves in the SDP. If we can work out these
details, and specify some way to point to an RTP header stream for
multicast, I think we're done. Broadcast applications can use the SDP
scheme with a fixed set of preloaded codebooks.
-r

Ralph Giles

2005-04-11 15:05:31 UTC

Post by Aaron Colwell
I don't really care that much about the size. My only concern is that
theoretically an ogg file can have 2^32-1 chains in it. I know that no one is
likely to do this, but they could. I think a 32 bit field in every packet would
be a waste of bits as would any number of bits if chaining wasn't even used.

True. I think supporting 32 bits of chain segments is outside our
requirements list though. The arguments I find reasonable for RTP
chaining support are framerate changes in theora and your suggestion
of realserver-style bandwidth adjustment. Being able to support
more general chained Ogg files is more of a nice side effect; and
therefore I'm not worried about technically covering the same domain.

Post by Aaron Colwell
Ideally I'd like to rearrange the flag bits a little bit so that there could be
a bit that indicates whether a chainID is present in the packet. The chainID
field could then be variable length ala UTF-8 style or perhaps a simple
encoding like MSB being set means that there is another byte for the chainID.
The reason I like this option is that it is kind of a "pay as you go" strategy.
As you add more chains to the stream, you pay more and more for the chainID
field. Yes it is a little more complex, but it allows you to accommodate any
valid ogg file, prevents you from wasting bits when chaining isn't even used,
and provides incremental overhead when chaining is used.

Ok. I'd be happy with this to select between 1,2,3,4 byte signatures,
but I still prefer my 16+8 proposal to maintain alignment. We can either
steal a bit from the packet count field, or (my preference) have a
minimum 1 byte ident field, with the high bit indicating continuation,
as you suggest.

Post by Aaron Colwell
I think absolute URLs should be used for the base URL and relative URLs
should be used for the ident-info. Then you can just use the relative URL
resolution rules to figure out the URL for each ident info. That's
basically how SETUP urls are generated for RTSP. That is where I got the idea
from.

Ok.

Post by Aaron Colwell
I thought about just pointing to an ogg stream, but that prevents the client
from just grabbing the ident or codebook. It is possible for the client to
already have one of these 2 pieces from an earlier chain or from an earlier
file. I just wanted to have a scheme where the client is able to only get what
it needs.

[multiplexed header urls]

In both cases, the latency of additional queries will cost as much as
the extra data transfer for broadband users. Perhaps we should do some
measurements and see what the actual usage patterns would be with
current streams?

-r

Aaron Colwell

2005-04-11 21:48:47 UTC

Ok, here is an updated SDP that is more compact. Basically the chain-IDs,
ident-IDs, and codebook-IDs are implied by the order of the information in the
SDP lines.

-- Case where all chains are known at SDP generation time --

v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 baseURL="http://foo.com/ogg-info/"
a=chain-info: 0:0; 1:0; 2:1;
a=ident-info: url="ident-441k", MD5=987234BC8D92DFE2987234BC8D92DFE2;url="ident-8k", MD5=2186461716517578792145688D92DFE2; url="ident-11k", MD5=218646687642f4AEFD2145688D92DFE2;
a=codebook-info: url="codebook-lowBW", MD5=309573098520975ABEFC34768D92DFE2;url="/codebook-speech", MD5=4567319735186778271C34768D92DFE2;

Basically this says
- Chain ID 0 uses Ident ID 0 and CodebookID 0.
- Chain ID 1 uses Ident ID 1 and CodebookID 0.
- Chain ID 2 uses Ident ID 2 and CodebookID 1.

- Ident 0 can be retrieved at http://foo.com/ogg-info/ident-441k and has an
MD5 hash of 987234BC8D92DFE2987234BC8D92DFE2
- Ident 1 can be retrieved at http://foo.com/ogg-info/ident-8k and has an
MD5 hash of 2186461716517578792145688D92DFE2
- Ident 2 can be retrieved at http://foo.com/ogg-info/ident-11k and has an
MD5 hash of 218646687642f4AEFD2145688D92DFE2

- Codebook 0 can be retrieved at http://foo.com/ogg-info/codebook-lowBW and
has a MD5 hash of 309573098520975ABEFC34768D92DFE2
- Codebook 1 can be retrieved at http://foo.com/codebook-speech and has an
MD5 hash of 4567319735186778271C34768D92DFE2

Note that the url in the ident-info and codebook-info lines is a relative URL. It
should be resolved against the base URL using the normal relative URL resolution
rules specified in the URI RFC. The url field can also hold an absolute URL if you
wish, but that just takes more SDP space.
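
A sketch of how a client might read the compact form, assuming IDs are implied by position and that relative URLs are resolved with the standard rules (here via Python's `urllib.parse.urljoin`). The parsing approach and function name are illustrative only; the attribute syntax is taken from the example above.

```python
import re
from urllib.parse import urljoin

SDP = """\
a=fmtp:101 baseURL="http://foo.com/ogg-info/"
a=chain-info: 0:0; 1:0; 2:1;
a=ident-info: url="ident-441k", MD5=987234BC8D92DFE2987234BC8D92DFE2;url="ident-8k", MD5=2186461716517578792145688D92DFE2; url="ident-11k", MD5=218646687642f4AEFD2145688D92DFE2;
a=codebook-info: url="codebook-lowBW", MD5=309573098520975ABEFC34768D92DFE2;url="/codebook-speech", MD5=4567319735186778271C34768D92DFE2;
"""

# One entry: url="...", <hash name>=<hex digest>
ENTRY = re.compile(r'url="([^"]*)"\s*,\s*(\w+)=([0-9A-Fa-f]+)')

def parse_compact_sdp(sdp):
    base, chains, idents, codebooks = "", [], [], []
    for line in sdp.splitlines():
        if line.startswith("a=fmtp:"):
            m = re.search(r'baseURL="([^"]*)"', line)
            if m:
                base = m.group(1)
        elif line.startswith("a=chain-info:"):
            body = line.split(":", 1)[1]
            # Position in the list is the chain ID; each pair is ident:codebook.
            chains = [tuple(int(n) for n in pair.split(":"))
                      for pair in body.split(";") if pair.strip()]
        elif line.startswith("a=ident-info:"):
            idents = ENTRY.findall(line)
        elif line.startswith("a=codebook-info:"):
            codebooks = ENTRY.findall(line)

    def resolve(entry):
        url, algo, digest = entry
        return {"url": urljoin(base, url), "algo": algo, "digest": digest}

    return chains, [resolve(e) for e in idents], [resolve(e) for e in codebooks]
```

Note how `"/codebook-speech"` resolves against the host root rather than under `/ogg-info/`, matching the Codebook 1 URL listed above.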

For now I'm still sticking by my original proposal for dealing with the case
where the chains are not known at SDP generation time. I'd slightly change the
format of the chain info to address Ralph's idea of using different hash
functions.

-- Case where chains are NOT known at SDP generation time --
v=0
o=- 1105605563 1105605563 IN IP4 207.188.30.165
s=<No title>
i=<No author> .2000
c=IN IP4 0.0.0.0
t=0 0
a=control:*
a=range:npt=0-202.297000
m=audio 0 RTP/AVP 101
b=AS:8
a=control:TrackID=0
a=rtpmap:101 VORBIS/44100/2
a=fmtp:101 chainIDBaseURL="http://foo.com/chainIDs/"

http://foo.com/chainIDs/0 would contain something like

+----------------------+
| Ident hash 4cc |
+----------------------+
| Ident hash length |
+----------------------+
| Codebook hash 4cc |
+----------------------+
| Codebook hash length |
+----------------------+
| Ident Hash |
+----------------------+
| Codebook Hash |
+----------------------+
| Ident URL | <- Null terminated string
+----------------------+
| Codebook URL | <- Null terminated string
+----------------------+

The 4cc's indicate which hash is being used. The lengths allow a client to
skip over hash codes it doesn't understand to get to the URLs. This allows
an older client to still be able to get the ident and codebook even if it
doesn't understand a newer hash code scheme that is being used.
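
A rough sketch of a parser for this layout, showing how the length fields let an older client skip hashes it cannot verify and still reach the URLs. The field widths (4-byte 4cc, 16-bit network-order length), the `"MD5 "` 4cc value, and the function name are all assumptions for illustration; the diagram does not fix them.

```python
import struct

# Assumed field widths: 4-byte 4cc and 16-bit network-order length,
# first for the ident hash and then for the codebook hash.
FIXED = struct.Struct("!4sH4sH")
KNOWN_HASHES = {b"MD5 "}            # hypothetical 4cc value for MD5

def parse_chain_info_v2(blob: bytes) -> dict:
    if len(blob) < FIXED.size:
        raise ValueError("chain info too short")
    ident_cc, ident_len, book_cc, book_len = FIXED.unpack_from(blob, 0)
    pos = FIXED.size
    end = pos + ident_len + book_len
    if end > len(blob):
        raise ValueError("truncated hash data")
    ident_hash = blob[pos:pos + ident_len]
    book_hash = blob[pos + ident_len:end]
    # The lengths let us reach the URLs even when a 4cc is unknown.
    urls = blob[end:].split(b"\x00")
    if len(urls) < 3:               # two terminators leave a trailing piece
        raise ValueError("missing NUL terminator")
    return {
        "ident_hash": ident_hash if ident_cc in KNOWN_HASHES else None,
        "codebook_hash": book_hash if book_cc in KNOWN_HASHES else None,
        "ident_url": urls[0].decode("ascii"),
        "codebook_url": urls[1].decode("ascii"),
    }
```

A hash reported as `None` simply means the client cannot use its cache for that header and has to fetch it from the URL.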

You can also do a hybrid of the 2 SDP's if you happen to know some of the
ident and codebooks at SDP generation time.

more comments inline

ok. The MSb method is my preference too.

Ok.

[multiplexed header urls]

True. I do think some measurements would probably be useful. The main use of
chaining I can think of right now is the Virgin radio case. In this case there
are multiple chains, but the codebooks and idents don't change. In this case
you use fewer bits by not retrieving the codebooks every time. You would just
grab the chain info, notice that the ident and codebook haven't changed. The
client knows it doesn't have to download the codebooks in this case. That is
the main case that I know is out in the wild right now.
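
The client behaviour described here can be sketched as a cache keyed by the advertised digest, so a header is downloaded only when its hash has not been seen before. `HeaderCache` and the injected `fetch` callable are hypothetical names; MD5 is used only because that is the hash in the examples.

```python
import hashlib

class HeaderCache:
    """Cache ident/codebook headers by their advertised MD5 digest."""

    def __init__(self, fetch):
        self.fetch = fetch          # callable: url -> bytes (e.g. an HTTP GET)
        self.by_digest = {}

    def get(self, url: str, md5_hex: str) -> bytes:
        key = md5_hex.lower()
        if key in self.by_digest:   # same hash as an earlier chain: no download
            return self.by_digest[key]
        data = self.fetch(url)
        if hashlib.md5(data).hexdigest() != key:
            raise ValueError("header does not match advertised MD5")
        self.by_digest[key] = data
        return data
```

In the radio case above, every chain after the first would hit the cache, so the only per-chain cost is fetching the small chain info record.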

The less trivial scenario that I can think of is ad-insertion into a stream.
In that case the ident and/or codebook could change. We'd have to create a
stream like this to figure out the latencies involved. It seems like this
is somewhat similar to a .m3u file retrieved via HTTP. You do a GET to retrieve
the .m3u and then you immediately do another GET to get the actual stream.
That usually seems to happen pretty quickly.

I'd be happy to run this experiment if I could find some time to implement all
this stuff in the Helix Server. I'm pretty busy these days, but I'll try to
carve out some time for this.

Aaron

Aaron Colwell

2005-04-11 22:47:37 UTC