[xiph-rtp] Lots of proposals
Luca Barbato
2005-08-29 19:14:45 UTC
I spent the day with the rest of the oms group (streaming.polito.it if
you don't know already) thinking and rethinking about some issues.

This is a short memo; it looks like we reached similar conclusions in
different fields.

The first part of the brainstorming was about the overhead of not
allowing chaining in RTP. The result is that the major overhead comes
from the fact that in most cases you have more (x2) ports open, since
you have to add RTP streams; in certain situations that is a non-issue,
in others it could be one if you don't have enough resources.

I'd like the RTP draft to describe ONLY Vorbis in RTP, and to use it
as the baseline for other RFCs that provide or define better extended
features (like out-of-band transmission, which requires other protocols).

Here follows the summary:


####
The first suggestion was developed by Federico Ridolfo and is about
codebook (and configuration and metadata) retransmission and packet skips.

It works this way:

- If the codebook (and the other metadata) isn't available, the client
has to skip to the next packet until it is received correctly.

- The 3 start packets will be resent every N time units with the same
timestamp, e.g.:

Time 0 Configuration Codebook and Metadata sent
Time 1 Raw sent
..
..
Time N Configuration Codebook Metadata Raw sent
Consider that a raw packet is sent every 20ms: once you reach 20N ms
you send the 3 start packets again along with the raw one.
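
To make the timeline above concrete, here is a minimal sketch of the sender
side. It assumes one tick per 20ms raw packet and that the three start
packets are repeated on every N-th tick; the packet labels and the function
name are hypothetical, not taken from any draft.

```python
from typing import Iterator, List, Tuple

# Hypothetical labels for the three start packets and the audio packets.
START_PACKETS = ["configuration", "codebook", "metadata"]
RAW = "raw"


def schedule(n: int, ticks: int, interval_ms: int = 20) -> Iterator[Tuple[int, List[str]]]:
    """Yield (time_ms, packets) for each tick of the stream.

    The start packets go out at tick 0 and again on every n-th tick;
    a raw packet goes out on every tick after the first.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    for tick in range(ticks):
        packets: List[str] = []
        if tick % n == 0:
            packets.extend(START_PACKETS)  # resend configuration, codebook, metadata
        if tick > 0:
            packets.append(RAW)  # regular audio packet
        yield tick * interval_ms, packets


if __name__ == "__main__":
    for time_ms, packets in schedule(n=3, ticks=7):
        print(time_ms, packets)
```

With n=3 the start packets reappear at 60ms and 120ms, each time together
with the raw packet for that tick.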

The way to discover whether you have the right settings or have to
skip could be the usual hash, or a codebook id mapping in SDP (in that
case the tag field can be shorter and be an incremental number).

That solution supports part of the chaining features and could be used
in every scenario without issues.
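
A rough sketch of the receiver side of this idea, assuming the tag carried
by each raw packet is a truncated hash of the codebook. The choice of SHA-1,
the 8-character truncation and all names are assumptions made for the
example only.

```python
import hashlib
from typing import Dict, Optional


class SkippingReceiver:
    """Skips raw packets until the matching codebook has been received."""

    def __init__(self) -> None:
        self.codebooks: Dict[str, bytes] = {}

    @staticmethod
    def tag_for(codebook: bytes) -> str:
        # Assumption: the tag is a truncated SHA-1 of the codebook bytes.
        return hashlib.sha1(codebook).hexdigest()[:8]

    def on_start_packet(self, codebook: bytes) -> str:
        tag = self.tag_for(codebook)
        self.codebooks[tag] = codebook
        return tag

    def on_raw_packet(self, tag: str, payload: bytes) -> Optional[bytes]:
        if tag not in self.codebooks:
            return None  # settings unknown: skip this packet
        return payload  # stand-in for actually decoding the payload
```

A receiver that joins late simply drops packets until the next periodic
resend of the start packets arrives.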


###
Another issue was about using the Marker bit to delimit syncpoints and
how to use timestamps to support some editlist features

###

The last piece was about minimizing the information to be sent inband,
the metadata packet and the configuration packet could be replicated in
the sdp (it depends mostly on the metadata content but that is another
issue). Everything that could fit in the SDP should stay there, since in
the multicast/conference scenario that data will certainly be presented
to the joining client and there won't be any need to retransmit it.
That could be one of those famous premature optimizations, but starting
to think about it may or may not be useful.

###
The other discussions were about ways to support Offband transmissions
using RTSP and mappings using SDP but those don't belong to the rtp
draft I guess.


Those features won't require other protocols and should support quite
well the following scenarios:

- conference/multicast

- playlist on demand

- webcast with or without live and pseudo-live feeds

- adaptive changes in the bitstream


Tomorrow I'll write something better after my exam.

lu
--
Luca Barbato

Gentoo/linux Developer Gentoo/PPC Operational Leader
http://dev.gentoo.org/~lu_zero
Tor-Einar Jarnbjo
2005-08-29 19:36:14 UTC
Post by Luca Barbato
Consider that each raw packet is sent every 20ms, once you reach 20Nms
you send the 3 start packet again and the raw one.
The way to discover if you have or not have the right settings and you
have to skip could be the usual hash or a codebook id mapping in sdp (in
that case the tag field can be shorter and be an incremental number)
That solution supports part of the chaining features and could be used
on every scenario w/out issues.
You make this sound much easier than it really is ...

The transmitter would have to make assumptions on the client's available
bandwidth and adjust the transmission speed accordingly. The only
assumption it can make is to expect the client to have enough bandwidth
to receive the actual audio stream and so throttle the header
transmission accordingly. For a 100kbps stream, it will take roughly 5
secs to transmit the header this way, add some packet loss and
prebuffering of audio data and you'll soon end up with an unacceptable
delay before the client is able to start playing. I also don't see how
you would support this using multicast.
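
The arithmetic behind that estimate can be sketched as follows. The header
size is not stated in the message; the sketch only shows what the quoted
figures imply if the header is throttled to exactly the stream bitrate and
"kbps" means 1000 bits per second. Both function names are made up for the
example.

```python
def header_transfer_seconds(header_bytes: int, stream_bps: int) -> float:
    """Time to push the header through at the stream's own bitrate."""
    return header_bytes * 8 / stream_bps


def implied_header_bytes(seconds: float, stream_bps: int) -> float:
    """Header size implied by a given transfer time at a given bitrate."""
    return seconds * stream_bps / 8


if __name__ == "__main__":
    # 5 seconds at 100 kbps corresponds to 62500 bytes of header data.
    print(implied_header_bytes(5, 100_000))
    print(header_transfer_seconds(62_500, 100_000))
```

Packet loss and prebuffering would come on top of this figure, which is the
point being made.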

Tor
David Barrett
2005-08-29 21:17:35 UTC
Post by Tor-Einar Jarnbjo
For a 100kbps stream, it will take roughly 5
secs to transmit the header this way, add some packet loss and
prebuffering of audio data and you'll soon end up with an unacceptable
delay before the client is able to start playing.
This is exactly what I experienced, forcing me to drop inline
transmission. Personally, I've come to the conclusion that it's just
not a good idea to make a profile of an unreliable protocol (RTP) depend
on reliable delivery (inline codebook transmission) without providing
some kind of reliability feature. Otherwise at best you're wasting
bandwidth, and at worst you're creating unacceptable delays.

With this in mind, has anyone proposed a matching RTCP profile that adds
a "codebook acknowledged" packet? If this existed, clients MUST send it
when they get a codebook, but the server only SHOULD pay attention to it.

Thus unidirectional broadcasters would need to do some kind of
exponential retransmission backoff or something (at cost in delay and
bandwidth), while bidirectional broadcasters/receivers could reliably
deliver codebooks before the server ever sends video data.

(And I believe this is entirely orthogonal to the chaining discussion.)

-david
Aaron Colwell
2005-08-30 13:57:12 UTC
Post by David Barrett
Post by Tor-Einar Jarnbjo
For a 100kbps stream, it will take roughly 5
secs to transmit the header this way, add some packet loss and
prebuffering of audio data and you'll soon end up with an unacceptable
delay before the client is able to start playing.
This is exactly what I experienced, forcing me to drop inline
transmission. Personally, I've come to the conclusion that it's just
not a good idea to make a profile of an unreliable protocol (RTP) depend
on reliable delivery (inline codebook transmission) without providing
some kind of reliability feature. Otherwise at best you're wasting
bandwidth, and at worst you're creating unacceptable delays.
Inline delivery is mainly for forward link only and multicast scenarios.
The periodic transmission of the codebooks is added to the cost of delivering
the stream and is a necessary component for cases where you have no way to
ensure reliable transmission.
Post by David Barrett
With this in mind, has anyone proposed a matching RTCP profile that adds
a "codebook acknowledged" packet? If this existed, clients MUST send it
when they get a codebook, but the server only SHOULD pay attention to it.
I go back and forth about whether it is appropriate to create our own
"codebook ACK" or whether we should use existing IETF standards and drafts that
have similar purposes. There are several options, mainly RTCP XR
(RFC 3611), AVPF (AVT Internet Draft), and perhaps the retransmission draft.
The conflict I have about using the existing stuff is that it seems to apply to
the whole session. We really only want to reliably transmit part of the
session. It's not clear to me whether this is justification enough to create
our own reliability mechanism or not.

I think this conflict is mainly the reason why I've leaned towards HTTP for
codebook retrieval. It allows you to use a separate reliable transport for the
only data that needs to be transmitted in a reliable fashion.

I suppose one other possibility could be to put the codebooks in their own RTP
session. A client could then listen to that session for as long as it took to
receive the codebooks and then close down that session. I'm not really a huge
fan of this particular idea since it has a much higher overhead than a simple
HTTP transfer.

Aaron
Ralph Giles
2005-08-30 14:36:47 UTC
Post by Aaron Colwell
I suppose one other possibility could be to put the codebooks in their own RTP
session. A client could then listen to that session for as long as it took to
receive the codebooks and then close down that session. I'm not really a huge
fan of this particular idea since it has a much higher overhead than a simple
HTTP transfer.
Weren't we planning on describing this as an option for multicast?
Unicast solutions would be welcome to use it too, although I agree
HTTP makes more sense for those cases.

-r
Aaron Colwell
2005-08-30 15:04:00 UTC
Post by Ralph Giles
Post by Aaron Colwell
I suppose one other possibility could be to put the codebooks in their own RTP
session. A client could then listen to that session for as long as it took to
receive the codebooks and then close down that session. I'm not really a huge
fan of this particular idea since it has a much higher overhead than a simple
HTTP transfer.
Weren't we planning on describing this as an option for multicast?
Unicast solutions would be welcome to use it too, although I agree
HTTP makes more sense for those cases.
Yes. In a multicast scenario it would be a good idea to have the codebooks
transmitted in their own session. That session should probably be flagged in some
way to indicate that it only contains codebooks. I suppose people could also
do this in a unicast case, but HTTP seems like a better option.

Aaron
Post by Ralph Giles
-r
David Barrett
2005-08-30 15:30:01 UTC
Post by Aaron Colwell
Inline delivery is mainly for forward link only and multicast scenarios.
The periodic transmission of the codebooks is added to the cost of delivering
the stream and is a necessary component for cases where you have no way to
ensure reliable transmission.
Yes, this makes sense.
Post by Aaron Colwell
I go back and forth about whether it is appropriate to create our own
"codebook ACK" or whether we should use existing IETF standards and drafts that
have similar purposes. There are several options mainly RTCP XR
(RFC 3611), AVPF (AVT Internet Draft), and perhaps the retransmission draft.
The conflict I have about using the existing stuff is that it seems to apply to
the whole session. We really only want to reliably transmit part of the
session. It's not clear to me whether this is justification enough to create
our own reliability mechanism or not.
I agree, we only care about reliable codebook delivery, and certainly
not every packet.

Also, I agree we *could* beat an existing round standard into our square
peg, but it wouldn't be pretty. A minimal, purpose-built RTCP profile
that solves just this problem might be appropriate. We needn't limit it
to just Xiph codecs, but it will be used by *at least* those.
Post by Aaron Colwell
I think this conflict is mainly the reason why I've leaned towards HTTP for
codebook retrieval. It allows you to use a separate reliable transport for the
only data that needs to be transmitted in a reliable fashion.
I agree in theory, but in practice I find it very problematic to use two
separate communication channels (especially using two entirely different
protocols, contacting entirely different sources) to accomplish a single
action. It confuses the code, compounds the failure rate, decelerates
performance, burdens my central resources, etc, etc.

Granted, when you only have a hammer, everything looks like a nail. But
I would vastly prefer to send a *single packet* along a RTCP stream that
is *already established* -- even if it means creating a new RTCP profile
-- to any of the proposed alternatives.


It seems to me that we could do away with HTTP codebook delivery
altogether, while gaining performance and reducing complexity.
Furthermore, we could make chaining (true chaining, with entirely new
codebooks and not just framerate/quality tweaks) as simple as:

1) If you receive a packet with a 'chainID' you don't recognize, the
codebook was probably lost, so send a 'retransmit codebook' request with
the 'chainID' you need.

2) Whenever you receive a codebook, send a 'codebook acknowledged'
message indicating its associated 'chainID'.
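
The two rules above could be sketched on the client side roughly like this.
The message names, the integer chainID and the "ask only once per chainID"
behaviour are assumptions for illustration; a real client would also need a
timer to repeat a request that gets lost.

```python
from typing import Dict, List, Set, Tuple


class ChainClient:
    """Client that acknowledges codebooks and requests missing ones."""

    def __init__(self) -> None:
        self.codebooks: Dict[int, bytes] = {}
        self.requested: Set[int] = set()
        # Stand-in for the RTCP channel: (message name, chainID) pairs.
        self.outbox: List[Tuple[str, int]] = []

    def on_codebook(self, chain_id: int, codebook: bytes) -> None:
        # Rule 2: acknowledge every codebook we receive.
        self.codebooks[chain_id] = codebook
        self.requested.discard(chain_id)
        self.outbox.append(("codebook_acknowledged", chain_id))

    def on_data(self, chain_id: int, payload: bytes) -> bool:
        # Rule 1: unknown chainID means the codebook was probably lost.
        if chain_id not in self.codebooks:
            if chain_id not in self.requested:
                self.requested.add(chain_id)
                self.outbox.append(("retransmit_codebook", chain_id))
            return False  # cannot decode yet
        return True  # payload can be handed to the decoder
```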

This makes an RTP session entirely standalone -- you needn't be aware of
its SDP before decoding. (SDP would still be used for negotiating the
high-level encoding options, but it's really just advisory; you could
get away with just broadcasting raw RTP and the client has all the
information necessary to decode.)

On the server, if it knows it can receive RTCP from the client, it
doesn't bother sending any audio/video until the codebook it's using has
been acknowledged. (Likewise, the server knows to generate a video
keyframe after codebook acknowledgment, further reducing the perceived
end-to-end delay between clicking "broadcast" and the receiver seeing an
image.)

And if the server can't receive RTCP from the client, it just uses an
exponential backoff of codebook delivery inserted into the video stream
(ie, it doesn't need to wait for advisory feedback it'll never get), or
uses a separate multicast stream, or any other mechanism that has
already been proposed.
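
A small sketch of the server-side decision described in the last two
paragraphs. It assumes the server knows whether it can receive RTCP from the
client and that something else decides when a codebook resend is due; the
function and packet names are hypothetical.

```python
from typing import List


def packets_to_send(has_feedback: bool, acked: bool, codebook_due: bool) -> List[str]:
    """Decide what goes out on this tick.

    has_feedback: the server can receive RTCP from the client.
    acked:        the client has acknowledged the current codebook.
    codebook_due: the resend timer says a codebook copy is due now.
    """
    if has_feedback:
        if not acked:
            # Hold back audio/video until the codebook is acknowledged.
            return ["codebook"] if codebook_due else []
        return ["media"]
    # No feedback channel: interleave codebook copies with the media.
    return ["codebook", "media"] if codebook_due else ["media"]
```

In the no-feedback branch the `codebook_due` flag would be driven by
whatever resend schedule the broadcaster picks.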


Basically, I hear all this talk about HTTP being the "safe fallback
option", but in my experience, and for my application, it simply isn't.
I believe inline transmission with ack/retransmit would be superior in
almost all cases, eliminate the need for HTTP delivery altogether, and
thereby vastly reduce the client complexity while simultaneously
improving performance.

-david
Luca Barbato
2005-08-30 16:35:51 UTC
Post by David Barrett
Also, I agree we *could* beat an existing round standard into our square
peg, but it wouldn't be pretty. A minimal, purpose-built RTCP profile
that solves just this problem might be appropriate. We needn't limit it
to just Xiph codecs, but it will be used by *at least* those.
I'm trying to separate the issues:

1 Have the vorbis-rtp baseline approved by the IETF asap, with the
minimal set of features that allows it to work in every scenario we are
planning to use

2 Write companion I-Ds to enhance its usage in conjunction with other
protocols, and get them accepted at a later time.
Post by David Barrett
Granted, when you only have a hammer, everything looks like a nail. But
I would vastly prefer to send a *single packet* along a RTCP stream that
is *already established* -- even if it means creating a new RTCP profile
-- to any of the proposed alternatives.
An HTTP site, RTSP messaging, codebook by email or XMPP, or whichever
out-of-band system, isn't a concern of the baseline. I'd consider them
allowed options by default, and I'd like to have them regulated in other
linked drafts.
Post by David Barrett
Basically, I hear all this talk about HTTP being the "safe fallback
option", but in my experience, and for my application, it simply isn't.
I believe inline transmission with ack/retransmit would be superior in
almost all cases, eliminate the need for HTTP delivery altogether, and
thereby vastly reduce the client complexity while simultaneously
improving performance.
To sum it up: you almost like Federico's/OMS group proposal about inband
packet delivery, but you prefer it asynchronous and not timed.

The problem with asynchronous delivery is that in the conference scenario
you may be exposed to floods. A good compromise is to set the
retransmission time to be related to the number of late-joining clients.

Currently we have those possibilities:

+ No chaining by default with optional chaining support by an in payload
chainID (hashed)

+ Chaining in band/out of band using a preset flag->codebook map.
The proposed flag is 8 bits long and read as an integer; the timestamp
and marker bit could be used to mark the validity of the mappings.

+ As the previous, but with a longer hash

+ Codebook retransmission and packet skipping, with support for
chaining; it is yet to be decided which way of marking codebook
validity is more effective. The retransmission could be periodic,
asynchronous, or a compromise between the two.
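
As an illustration of the second option, a preset flag->codebook map with an
8-bit integer flag might look like the sketch below. The class and method
names are invented for the example, and how the map gets filled (SDP, inband
or otherwise) is left open.

```python
import struct
from typing import Dict, Optional


class FlagCodebookMap:
    """Maps a one-byte flag, read as an unsigned integer, to a codebook."""

    def __init__(self) -> None:
        self._by_flag: Dict[int, bytes] = {}

    def preset(self, flag: int, codebook: bytes) -> None:
        if not 0 <= flag <= 0xFF:
            raise ValueError("flag must fit in 8 bits")
        self._by_flag[flag] = codebook

    def lookup(self, flag_byte: bytes) -> Optional[bytes]:
        # flag_byte is the single byte as it would appear in the payload.
        (flag,) = struct.unpack("B", flag_byte)
        return self._by_flag.get(flag)
```

A lookup that returns None is the case where the receiver has to skip or
fetch the codebook by some other means.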

I hope I haven't missed anything.


lu
--
Luca Barbato

Gentoo/linux Developer Gentoo/PPC Operational Leader
http://dev.gentoo.org/~lu_zero
David Barrett
2005-08-30 19:07:04 UTC
Post by Luca Barbato
1 Have the vorbis-rtp baseline approved by ietf asap with the minimal
set of feature it will allows it to work on every scenario we are
planning to use
2 Make companion i-d for enhance the usage in conjunction with other
protocols and make them accepted on a second time.
Yes, this sounds like a good plan.
Post by Luca Barbato
HTTP site, RTSP messaging, codebook by email or XMPP or whichever Out of
Band system isn't a concern of baseline, I'd consider it allowed options
by default and I'd like to have them regulated in other linked drafts.
Ok, so are you suggesting the baseline vorbis-rtp draft include basic
inline codebook transmission (which is adequate for all cases, but
non-optimal for many), but no out-of-band systems, and no
acknowledge/retransmit RTCP profile?

If so, I think that's a good idea.
Post by Luca Barbato
To sum it up: you almost like Federico's/OMS group proposal about inband
packet delivery but you prefer it asynchronous and not timed.
Correct.
Post by Luca Barbato
The problem with asynchronous delivery is that in the conference scenario
you may be exposed to floods. A good compromise is to set the
retransmission time to be related to the number of later joining clients.
Ok, that's a good point. I've been focused on much smaller groups where
flooding isn't an issue. This is another good reason to separate the
out-of-band/retransmit mechanisms from the base draft, as it's clear the
usage scenario greatly affects which you might want to use.
Post by Luca Barbato
+ No chaining by default with optional chaining support by an in payload
chainID (hashed)
+ Chaining is band/out band using a preset flag->codebook map.
The flag proposed is 8bit long and read as integer, timestamp and marker
bit could have their use to mark the validity of the mappings.
+ As the previous but with a longer hash
+ Codebook retransmission and packet skipping, with support for
chaining, yet to decide if is more effective which way to mark the
codebook validity. The retransmission could be periodic, asynchronous or
a compromise between the two.
I hope I haven't missed anything.
I'm not sure I followed those, but regardless, these are simply
possibilities for additional drafts -- ie, you are recommending that
none of these options are included in the base vorbis-rtp draft, correct?

-david
Ralph Giles
2005-08-30 19:25:45 UTC
Post by David Barrett
Ok, so are you suggesting the baseline vorbis-rtp draft include basic
inline codebook transmission (which is adequate for all cases, but
non-optimal for many), but no out-of-band systems, and no
acknowledge/retransmit RTCP profile?
So if we do this, does every packet have a codebook id field, or
none, with the plan being to use the Reserved flag to indicate
its presence in said later drafts?

-r
David Barrett
2005-08-30 20:02:02 UTC
Post by Ralph Giles
Post by David Barrett
Ok, so are you suggesting the baseline vorbis-rtp draft include basic
inline codebook transmission (which is adequate for all cases, but
non-optimal for many), but no out-of-band systems, and no
acknowledge/retransmit RTCP profile?
So if we do this, does every packet have a codebook id field, or
none, with the plan being to use the Reserved flag to indicate
its presence in said later drafts?
Good question.

The question of which codebook delivery methods to use is entirely
independent from the question of whether to support chaining. Even if
we never go beyond the baseline codebook delivery method (inline without
retransmit/acknowledgement), the baseline method is entirely sufficient
to support chaining (especially for files, which are lossless).

Furthermore, all codebook delivery options are valid, irrespective of
whether or not chaining is supported. None of the codebook delivery
options makes the problems associated with chaining better or worse.

Thus I'd say if we ever intend to support chaining, the one-byte
header should be there from the start, else it shouldn't ever be there.


So at this point, if I could have my choice, I'd say:

- Yes vorbis-rtp supports chaining, through a one-byte codebookID field
in the header.
- The only codebook mechanism supported by the baseline spec is inline
delivery, without ack/retransmit.
- Additional codebook delivery mechanisms are documented in subsequent
drafts; the SDP indicates which the server/client supports.
- A stream can support up to 255 codebooks over its lifetime.

However, I do see the problems associated with chaining, as Aaron
pointed out.
Post by Ralph Giles
There are several problems with allowing chaining to be supported in a general
sense.
- The client has no way to determine if it can actually play the stream
properly...
- The server might guess a bad value for the RTP timestamp sample rate...
- Managing coordination of codebook downloads in the middle of playback is
non-trivial...
However, the first could be handled in SDP (ie, when negotiating the
media types, you negotiate all codebooks types you intend to use, up
front). The second could be handled by stating that all chains must use
the same timestamp sample rate. (Prevents chaining of different
timestamp sample rates, but at least it doesn't prevent chaining
altogether.) And the third is the cost of doing business.

Furthermore, I'm assuming a single stream can't switch codecs (ie, from
Theora to MPEG); it can merely switch codebooks within a single codec
(ie, Theora, Speex, etc).

-david
Tor-Einar Jarnbjo
2005-08-30 20:19:30 UTC
Post by David Barrett
- The only codebook mechanism supported by the baseline spec is inline
delivery, without ack/retransmit.
How do you expect the client to handle transmission loss of parts of the
codebook header?

Tor
David Barrett
2005-08-30 20:32:42 UTC
Post by Tor-Einar Jarnbjo
Post by David Barrett
- The only codebook mechanism supported by the baseline spec is inline
delivery, without ack/retransmit.
How do you expect the client to handle transmission loss of parts of the
codebook header?
Well to start, if you're delivering RTP via a lossless channel (over
TCP, or in a file) then the baseline works just fine and I'd stick with
that.

However, generally you are using UDP which is unreliable, and so I
agree, that's the problem. Whether loss of part or the whole thing, the
baseline isn't that great. If I were writing a plain vanilla baseline
broadcaster, I'd use some kind of exponential backoff of the codebook.
For example I'd send at:

0ms
1000ms
2000ms
4000ms
8000ms and every 8s thereafter
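
The schedule above can be written as a small generator. This is only a
sketch of that one schedule; the function name and the two parameters are
made up, with defaults chosen to reproduce the listed times.

```python
from itertools import islice
from typing import Iterator


def codebook_send_times(first_ms: int = 1000, cap_ms: int = 8000) -> Iterator[int]:
    """Yield send times in ms: 0, then doubling from first_ms up to cap_ms,
    then one send every cap_ms after that."""
    yield 0
    t = first_ms
    while True:
        yield t
        # Double the time until the cap is reached, then step by the cap.
        t = t * 2 if t < cap_ms else t + cap_ms


if __name__ == "__main__":
    print(list(islice(codebook_send_times(), 7)))
    # [0, 1000, 2000, 4000, 8000, 16000, 24000]
```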

Unfortunately it's never *guaranteed* to arrive, but it's highly
probable. Frankly that's not reassuring to me, which is why I probably
wouldn't write a baseline broadcaster. Rather, I'd pick whichever
codebook delivery mechanism made the most sense for my situation:

- If I'm broadcasting to a small group using UDP, I'd use an optional
extension where the client sends a "codebook acknowledged" whenever it
receives the codebook. Then the broadcaster would just keep sending the
codebook every second until it receives a "codebook acknowledged" before
sending any audio/video.

- If I were broadcasting over multicast, I might use an optional
extension where codebooks are just sent periodically over a separate
channel. Then clients subscribe to this channel long enough to get the
codebook, and then unsubscribe after they've got it.

- If I'm broadcasting from a webserver, I might use an optional
extension where codebooks are delivered over HTTP via URLs embedded in
the SDP.

I'm sure there are other strategies that are appropriate for other
situations. The key thing is, however, the baseline RTP packet format
remains the same for all these situations, and it's only the codebook
delivery mechanism that changes.

Does this make sense?

-david
David Barrett
2005-08-30 21:14:58 UTC
Post by David Barrett
Well to start, if you're delivering RTP via a lossless channel (over
TCP, or in a file) then the baseline works just fine and I'd stick
with that.
It would, but you are not guaranteed that RTP is transmitted over a
lossless channel, in most practical implementations it is not and I
can't imagine that IETF will approve an RFC for an RTP based protocol
which requires a lossless transport mechanism.
Yes, I agree entirely; I was just mentioning this as one area where
baseline vorbis-rtp works without trouble. I agree, it's a corner case.
Depending on the codebook size and available bandwidth, it would not be
practicable with such tight intervals, as the codebooks will consume too
much bandwidth. I assume that you in this case would start to transmit
audio data shortly after transmitting the first codebook burst. If the
first receptions of the codebook are failing, the client would either
have to cache audio data or skip the beginning of the stream. If joining
e.g. a radio channel, that might not be a problem, but if you order a
song "on demand", skipping the first few seconds of the song is probably
not what you expected and e.g. a mobile player will likely not have too
much memory to waste for an audio cache.
Again, I agree 100%. I totally agree the baseline has a number of
disadvantages for streaming broadcast scenarios. I merely state that
there is no single improvement on it that works in all situations. Thus
I'm just endorsing Luca's proposal that the baseline be separated from
the optional codebook delivery mechanisms, leaving it up to the
broadcaster/receiver to negotiate the best way to deliver codebooks.
I'm not really convinced. Using FEC has already been discussed as a
solution, and if inband RTP transmission of the codebook header is to be
the generic "compatible base" with only optional alternatives, I would
rather go for an attempt to achieve a robust transmission of the
codebook header using FEC encoding. There is still a chance that the
client fails to receive the codebook header, but in that case, packet
loss will probably be too high for the client to stream at all. A nice
side effect would of course be that also the audio stream itself with
little effort could be secured against packet loss using FEC as well.
Forward error correction is another fine optional delivery mechanism,
and I'd add it to the list. But I don't believe it's practical or
necessary to mandate that *all* receivers MUST support FEC, any more
than that all MUST support multicast codebook delivery.
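
For readers who haven't met FEC before, the simplest form of the idea is a
single XOR parity packet over the header fragments, which lets a receiver
rebuild any one lost fragment. The sketch below is only that toy scheme, not
the FEC encoding under discussion; it assumes equal-length fragments, and
all names are invented.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: List[bytes]) -> bytes:
    """Parity packet: XOR of all fragments (assumed equal length)."""
    if not fragments:
        raise ValueError("need at least one fragment")
    return reduce(xor_bytes, fragments)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the parity."""
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    out = list(received)
    if missing:
        present = [frag for frag in received if frag is not None]
        out[missing[0]] = reduce(xor_bytes, present, parity)
    return out
```

Losing two fragments of the same group is not recoverable with a single
parity packet, which is where stronger codes come in.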

Standards are best when they focus on areas of agreement. The baseline
spec is identical for all situations. But codebook delivery depends
heavily on the usage model, and thus I think it's reasonable to split it
out and leave it up to the developers to pick the best one.

-david

PS: Sorry for the explosion of posts today. This is just an issue that
happens to directly impact my current use of Theora.
David Barrett
2005-08-30 21:54:57 UTC
Post by David Barrett
Standards are best when they focus on areas of agreement. The
baseline spec is identical for all situations. But codebook delivery
depends heavily on the usage model, and thus I think it's reasonable
to split it out and leave it up to the developers to pick the best one.
And you don't expect this to end up with servers and clients which are
only compatible using a delivery method which is not practically usable
for common streaming situations, whereas only the client from vendor X
works well together with the server from vendor X, as each vendor's
developers have different opinions on which alternative is "the best one"?
(I'm assuming you meant to post this to the list; I apologize in advance
if I assumed wrong.)

I see your concern here, but I don't share it. Vendor lock-in generally
happens due to proprietary, non-standard extensions -- and that's not
what we have here. (And incidentally, nothing we say prevents vendor
lock in should a vendor wish it.)

What we have are usage models that call for different codebook delivery
techniques. I would imagine that all client/servers targeting the same
usage model would tend toward the same codebook delivery technique. For
example, web radio servers would use the same technique, and thus they
would all be compatible. All VoIP products would use the same
technique, and thus they would all be compatible.

Now I'll grant it's possible that VoIP and web-radio would do codebook
delivery differently, but they do *everything* differently. It's not like
if only they used the same codebook delivery technique they'd be
compatible -- they'll never be compatible, no matter what we say,
because they're totally different products.

It's not our goal to make all vorbis-enabled products interchangeable,
and it shouldn't be. Rather, I would state our goal as ensuring that
all products that *want* to be compatible can do so in the easiest
possible way. And part of being "easy" is not forcing products to
support codebook delivery mechanisms that they don't use.

-david
Tor-Einar Jarnbjo
2005-09-01 19:20:58 UTC
Post by David Barrett
(I'm assuming you meant to post this to the list; I apologize in
advance if I assumed wrong.)
Of course, my mistake.
Post by David Barrett
I see your concern here, but I don't share it. Vendor lock-in
generally happens due to proprietary, non-standard extensions -- and
that's not what we have here. (And incidentally, nothing we say
prevents vendor lock in should a vendor wish it.)
No, but I was not thinking of incompatibilities built on purpose. If the
standard requires a method A for codebook delivery with optional methods
B, C, D, E and F, of which one of the optional methods is necessary to
make common use of the standard, as method A is probably not usable,
I have no doubt that this will be a critical problem.
Post by David Barrett
What we have are usage models that call for different codebook
delivery techniques. I would imagine that all client/servers
targeting the same usage model would tend toward the same codebook
delivery technique. For example, web radio servers would use the same
technique, and thus they would all be compatible. All VoIP techniques
would use the same technique, and thus they would all be compatible.
I don't know about you, but I tend to use the same client for all
thinkable usages for Vorbis over RTP and this client is not "targeting"
any special usage model. I doubt that Vorbis will be used for VoIP, as
the codec latency is far from ideal.
Post by David Barrett
Now I'll grant it's possible that VoIP and web-radio would do codebook
delivery different, but they do *everything* different. It's not like
if only they used the same codebook delivery technique they'd be
compatible -- they'll never be compatible, no matter what we say,
because they're totally different products.
Where do you get these ideas from? I have no problem at the moment to
use one client to listen to most of the webradios out there, no matter
if they use HTTP or RTP as transport for their content and no matter
which radio is broadcasting. The ones using Vorbis are limited to HTTP,
but isn't the point of getting a standard for Vorbis over RTP, that a.o.
web radios can be able to use the standard? Is it realistic to assume,
that all the web radios transmitting Vorbis over HTTP at the moment with
all compatible clients will switch to another protocol without being
sure that their listeners can continue using mostly the same software?
Post by David Barrett
It's not our goal to make all vorbis-enabled products interchangeable,
and it shouldn't be.
Who are "we"? Can someone from Xiph confirm that Xiph's attitude on
this subject is that there is no interest in enforcing a compatibility
between different RTP media servers and clients?

Tor
David Barrett
2005-09-01 20:48:02 UTC
Post by Tor-Einar Jarnbjo
Post by David Barrett
I see your concern here, but I don't share it. Vendor lock-in
generally happens due to proprietary, non-standard extensions -- and
that's not what we have here. (And incidentally, nothing we say
prevents vendor lock in should a vendor wish it.)
No, but I was not thinking of incompatibilites built on purpose. If the
standard requires a method A for codebook delivery with optional methods
B, C, D, E and F, of which one of the optional methods is necessary to
make common use of the standard, as method A is probably not usable,
I have no doubt, that this will be a critical problem.
Ok, yes, I see this concern. Any protocol with optional extensions
necessarily runs the risk that two similar clients/servers decide to
implement different optional extensions, and that's a problem.

However, the only way to mitigate that risk is to assert that all
clients/servers MUST support all codebook delivery methods. But there
are so many options, and they are so dependent on the particular usage
model, that I'm afraid (or rather, expect) people will only implement
the non-standard subset that is relevant to their specific usage model.

Thus as much as I would like to have a world where all clients support
all codebook delivery models, I simply don't see that as practical. So
I'd much rather take a world where people implement a standard base plus
some optional standard extensions, rather than a world where people each
implement a different non-standard subset of some enormous base standard.

In other words, I believe it's inevitable for different clients/servers
to support different codebook delivery methods, and I propose the best
way to handle this unfortunate situation is with base RTP payload
standards for Theora, Speex, and Vorbis, and then optional codebook
delivery specifications on top of that, shared by all.

That said, I wouldn't be at all opposed to recommendations like "If you
are building X type of application, you SHOULD support Y codebook
delivery mechanism". Just so long as all application types A-Z needn't
support all codebook delivery mechanisms 0-9.
Post by Tor-Einar Jarnbjo
I don't know about you, but I tend to use the same client for all
thinkable usages for Vorbis over RTP and this client is not "targeting"
any special usage model. I doubt that Vorbis will be used for VoIP, as
the codec latency is far from ideal.
I apologize; I think I've been speaking inaccurately.

It's my impression that all Xiph codecs (Vorbis, Theora, and Speex) have
the exact same codebook delivery problem (too many options, too little
time to build them all). Thus I don't mean to limit my argument to
Vorbis alone, but rather state generally that Speex, Theora, and Vorbis
would each be best served by a tight base payload standard, with
optional codebook delivery extensions.

Thus in my ideal world, there would be the following IETF standards:

1) Theora RTP payload
2) Speex RTP payload
3) Vorbis RTP payload
4...n) Optional codebook delivery extensions (shared by all Xiph codecs)
Post by Tor-Einar Jarnbjo
Post by David Barrett
It's not our goal to make all vorbis-enabled products interchangeable,
and it shouldn't be.
Who are "we"? Can someone from the Xiph confirm that Xiph's attitude on
this subject is that there is no interest in enforcing a compatibility
between different RTP media servers and clients?
That's fair, I'm only speaking my opinion as one avid user of Theora and
Speex over RTP. And I'm not sure who comprises "Xiph", but I know I
don't in any official capacity.

However, even if there is *interest* in enforcing compatibility, I
question if there's the *capability* to do so. If the base standard
mandates that adopters create technology that's wholly inappropriate for
their usage model (such as forcing my VoIP application to support
multicast codebook delivery), I think it's just begging for incomplete
implementations.

-david
Tor-Einar Jarnbjo
2005-09-01 21:21:30 UTC
Permalink
Post by David Barrett
Ok, yes, I see this concern. Any protocol with optional extensions
necessarily runs the risk that two similar clients/servers decide to
implement different optional extensions, and that's a problem.
I agree with you on this,
Post by David Barrett
However, the only way to mitigate that risk is to assert that all
clients/servers MUST support all codebook delivery methods. But there
are so many options, and they are so dependent on the particular usage
model, that I'm afraid (or rather, expect) people will only implement
the non-standard subset that is relevant to their specific usage model.
but disagree here. Are there really that many usage models for which
Vorbis over RTP is a feasible protocol? I am not sure about Theora, but
Vorbis is IMHO not usable for scenarios requiring a low latency, which
includes VoIP and conferencing applications. I might have missed some of
your opinions on this, but what are your concerns against HTTP and a
separate RTP stream as not only the two mandatory delivery methods among
several optional, but as the only two delivery methods defined? The
implementation will of course be slightly more complicated, but it is
not exactly rocket science to implement a minimalistic HTTP stack able
to handle codebook delivery and/or use an HTTP client library on the
client side.
Post by David Barrett
In other words, I believe it's inevitable for different
clients/servers to support different codebook delivery methods, and I
propose the best way to handle this unfortunate situation is with base
RTP payload standards for Theora, Speex, and Vorbis, and then optional
codebook delivery specifications on top of that, shared by all.
Is this problem really relevant for Speex? I am not 100% into the stream
definition, but I was pretty sure that Speex does not require reliable
delivery of any part of the stream.
Post by David Barrett
It's my impression that all Xiph codecs (Vorbis, Theora, and Speex)
have the exact same codebook delivery problem (too many options, too
little time to build them all).
I'm not very familiar with Theora and Speex, but to be honest, it is my
impression that Xiph built Ogg and Vorbis far too complex and without
much respect to existing standards, common practice and media codecs.
Chaining and dynamic codebooks probably being the two most adventurous
features. This makes it difficult to integrate Ogg/Vorbis into commonly
used media frameworks like Quicktime or Windows Media, or to pack a
Vorbis stream into other containers than Ogg, as which RTP may be
considered in this sense.
Post by David Barrett
That's fair, I'm only speaking my opinion as one avid user of Theora
and Speex over RTP. And I'm not sure who comprises "Xiph", but I know
I don't in any official capacity.
Neither am I, but have you ever seen any software writing Ogg/Vorbis
files, which can not be read by any other software which supports
reading Ogg/Vorbis files? I think this is a very perfect analogy to RTP
servers (producing Vorbis streams) and RTP clients (consuming Vorbis
streams).
Post by David Barrett
However, even if there is *interest* in enforcing compatibility, I
question if there's the *capability* to do so. If the base standard
mandates that adopters create technology that's wholly inappropriate
for their usage model (such as forcing my VoIP application to support
multicast codebook delivery), I think it's just begging for incomplete
implementations.
Well, the RTP draft for Vorbis has been around for a couple of years
now, so I don't see any point in rushing to get an unusable RFC through
the clearing process. If it takes more effort to complete a usable RFC
now than there is available capacity, why not wait? I already have quite
a lot of code snippets lying around waiting for an acceptable RFC to be
finished, so a reference implementation can be put together with not too
much effort.

Tor
David Barrett
2005-09-01 22:13:23 UTC
Permalink
Post by Tor-Einar Jarnbjo
Are there really that many usage models for which
Vorbis over RTP is a feasible protocol? ...
I might have
missed some of your opinions on this, but what are your concerns
against HTTP and a separate RTP stream as not only the two mandatory
delivery methods among several optional, but as the only two delivery
methods defined?
Well, I don't claim to know all the uses, but in my case I'm using Speex
and Theora (and eventually Vorbis) in a P2P scenario where peers are
inaccessible via TCP because they're behind NATs and firewalls. Thus:

1) HTTP is cumbersome because I'd need to have clients post their
codebooks to a TCP-accessible server/peer, which undermines the value of
a decentralized system and introduces greater points of failure.

2) A separate RTP stream brings no benefit as whether I send codebooks
inline on the data RTP stream, or in a separate RTP stream, the
broadcaster still has no idea if it ever arrives, and the receiver has
no way of triggering a retransmit. This leads to the wasted bandwidth
and high setup latency that you spoke about earlier.

So in my case, neither of the two methods work well for me. Yes, I can
force them to work if they're the only options, but I'd prefer more
options -- such as a "codebook ACK" packet sent via RTCP -- and I'd
prefer to do it without breaking the standard.
Post by Tor-Einar Jarnbjo
Is this problem really relevant for Speex? I am not 100% into the
stream definition, but I was pretty sure that Speex does not require
reliable delivery of any part of the stream.
Hm, I think you might be right -- Speex can make do with just what's
defined out of band, such as with SDP (which needs to be delivered
reliably, but that's a separate concern). Regardless, if the problem
only exists for Vorbis and Theora (and possibly also Flac), and if the
problem is identical for both, that seems sufficient justification for
standardizing the solution between the two.
Post by Tor-Einar Jarnbjo
Neither am I, but have you ever seen any software writing Ogg/Vorbis
files, which can not be read by any other software which supports
reading Ogg/Vorbis files? I think this is a very perfect analogy to
RTP servers (producing Vorbis streams) and RTP clients (consuming
Vorbis streams).
I agree, that's a good analogy, and I don't dispute the vision is
compelling. I just don't think it's practical given the differences
between live RTP streams and standalone files.

-david
Tor-Einar Jarnbjo
2005-09-01 22:43:40 UTC
Permalink
Post by David Barrett
Well, I don't claim to know all the uses, but in my case I'm using
Speex and Theora (and eventually Vorbis) in a P2P scenario where peers
are inaccessible via TCP because they're behind NATs and firewalls.
No offense, but this sounds like a rather unusual usecase and I am not
sure if your requirements here should be used as an argument for how the
RFC should be formulated. Without proper NAT and/or firewall
configuration, most computers are not accessible via UDP either, but it
is not in the scope of the Vorbis/RTP RFC to solve all sorts of network
configuration issues.
Post by David Barrett
1) HTTP is cumbersome because I'd need to have clients post their
codebooks to a TCP-accessible server/peer, which undermines the value
of a decentralized system and introduces greater points of failure.
Fair enough here, but your arguments against the second variant are
completely wrong.
Post by David Barrett
2) A separate RTP stream brings no benefit as whether I send codebooks
inline on the data RTP stream, or in a separate RTP stream, the
broadcaster still has no idea if it ever arrives, and the receiver has
no way of triggering a retransmit. This leads to the wasted bandwidth
and high setup latency that you spoke about earlier.
No. The RTP setup order could in this case look like this (assuming RTSP
and SDP):

- the client issues an RTSP describe request for the main URL and gets
an SDP response from the server containing another RTSP url (and
potentially a checksum/hash) for the codebook stream

- the client issues an RTSP setup request followed by a play request for
the codebook stream, the codebook data is repeated endlessly at appr.
the audio stream bitrate

- as soon as the client has a complete codebook (probably within a
couple of seconds), the codebook stream is closed with an RTSP teardown
request

- the client can now continue session initialization (RTSP setup and
play) with the main URL

In your case, including a codebook checksum or hash in the SDP would
probably make much sense, as I suppose the servers use the same codebook
most of the time and the client may decide without server knowledge to
cache the codebook and skip codebook retrieval completely. In any case,
no bandwidth is wasted during media streaming for codebook
retransmissions and the server does not have to care whether the
client has the codebook data or not, as the client is either
clearly stating "start audio streaming now" or "send codebook now".
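
The setup order above can be sketched as a small planning function. This is a minimal illustration, not a real RTSP client: `plan_session`, the example URLs, and the cache keyed by codebook hash are all hypothetical, and nothing is sent over the network.

```python
def plan_session(main_url, codebook_url, advertised_hash, cache):
    """Return the ordered (method, url) RTSP requests a client would issue.

    `cache` is any container of codebook hashes the client already holds
    (an assumption for this sketch; the thread does not define a cache format).
    """
    # The DESCRIBE response (SDP) is where the codebook URL and hash come from.
    steps = [("DESCRIBE", main_url)]
    if advertised_hash not in cache:
        # Fetch the codebook over its own RTP stream, then close that stream.
        steps.append(("SETUP", codebook_url))
        steps.append(("PLAY", codebook_url))
        steps.append(("TEARDOWN", codebook_url))
    # Only now does the client start the actual media stream.
    steps.append(("SETUP", main_url))
    steps.append(("PLAY", main_url))
    return steps


if __name__ == "__main__":
    for method, url in plan_session("rtsp://example.invalid/radio",
                                    "rtsp://example.invalid/radio/codebook",
                                    "abc123", cache=set()):
        print(method, url)
```

With the hash already in the cache, the three codebook requests drop out and the client goes straight from DESCRIBE to SETUP and PLAY on the main URL.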
Post by David Barrett
So in my case, neither of the two methods work well for me. Yes, I
can force them to work if they're the only options, but I'd prefer
more options -- such as a "codebook ACK" packet sent via RTCP -- and
I'd prefer to do it without breaking the standard.
It's not breaking any standard to refer to other resources in the SDP
using a URI. It is even mentioned as a usecase in the SDP standard
(RFC2327, section 5.4).
Post by David Barrett
Hm, I think you might be right -- Speex can make do with just what's
defined out of band, such as with SDP (which needs to be delivered
reliably, but that's a separate concern). Regardless, if the problem
only exists for Vorbis and Theora (and possibly also Flac), and if the
problem is identical for both, that seems sufficient justification for
standardizing the solution between the two.
Makes sense, but the common standard does not have to be inband RTP.

Tor
David Barrett
2005-09-02 00:12:55 UTC
Permalink
Post by Tor-Einar Jarnbjo
Post by David Barrett
Well, I don't claim to know all the uses, but in my case I'm using
Speex and Theora (and eventually Vorbis) in a P2P scenario where peers
are inaccessible via TCP because they're behind NATs and firewalls.
No offense, but this sounds like a rather unusual usecase and I am not
sure if your requirements here should be used as an argument for how the
RFC should be formulated. Without proper NAT and/or firewall
configuration, most computers are not accessible via UDP either, but it
is not in the scope of the Vorbis/RTP RFC to solve all sorts of network
configuration issues.
None taken, but I disagree it's at all unusual: it's the classic
VoIP/videoconferencing case. All of it is going P2P (think Skype) using
the same NAT/firewall penetration techniques I'm using. The primary
standards are SIP and RTP (not RTSP), and all the networking is UDP. If
Theora and Speex are to be considered for this entire industry, this
problem needs solving in one way or another, and the two options you
present are cumbersome.
Post by Tor-Einar Jarnbjo
Post by David Barrett
2) A separate RTP stream brings no benefit as whether I send codebooks
inline on the data RTP stream, or in a separate RTP stream, the
broadcaster still has no idea if it ever arrives, and the receiver has
no way of triggering a retransmit. This leads to the wasted bandwidth
and high setup latency that you spoke about earlier.
No. The RTP setup order could in this case look like this (assuming RTSP
(I'm not using RTSP, but an equivalent method would work with SIP.)
Yes, that would work. But consider the two options:

1) Establish two RTP streams, deliver codebooks over one, media over
another, kill codebook stream when received.
2) Establish one RTP stream, deliver codebooks and media on it, send
codebook ACK in RTCP profile when received.

Both "work" in the sense that you can write a program to do it in each
way. Both have comparable performance characteristics. But neither is
plainly superior. I assume #1 fits your architecture better. I can
guarantee, #2 fits mine.
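
Option 2 can be sketched from the sender's side as follows. This is a minimal sketch under stated assumptions: the class name, the `resend_every` parameter, and the `("codebook", ...)` / `("media", ...)` tuples are hypothetical, and the RTCP feedback packet itself is reduced to a call to `on_ack()`; no RTCP packet type for this exists in the thread.

```python
class InlineCodebookSender:
    """Interleave the codebook with media on one stream until it is ACKed."""

    def __init__(self, codebook, resend_every=50):
        self.codebook = codebook
        self.resend_every = resend_every  # media packets between codebook resends
        self.acked = False
        self.sent = 0                     # media packets sent so far

    def on_ack(self):
        # Called when the (hypothetical) RTCP codebook ACK arrives.
        self.acked = True

    def next_packets(self, media):
        """Return the packets to send for one media frame, in order."""
        out = []
        # Before the ACK, repeat the codebook at a fixed interval.
        if not self.acked and self.sent % self.resend_every == 0:
            out.append(("codebook", self.codebook))
        out.append(("media", media))
        self.sent += 1
        return out
```

Once `on_ack()` has been called, the sender emits media only, so no bandwidth goes to codebook repeats for the rest of the session. A receiver that never ACKs (a unidirectional broadcast, say) simply keeps getting periodic repeats.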

I'm merely advocating that we are not smart enough to pick the
end-all-be-all solution for codebook delivery. This discussion alone
(not to mention the many similar discussions that have preceded this)
is proof enough for me that there are strong opinions and reasonable
arguments in favor of competing options.

Whether we mandate supporting all, mandate supporting some, or leave
everything optional to the developer -- somebody will be dissatisfied.
And at the end of the day, I vastly prefer an RFC that errs on the side
of mandating too little rather than too much.

-david
Tor-Einar Jarnbjo
2005-09-02 08:09:14 UTC
Permalink
Post by David Barrett
None taken, but I disagree it's at all unusual: it's the classic
VoIP/videoconferencing case.
It is, but as I already pointed out: Vorbis _is not_ suitable for low
latency use cases like VoIP or video conferencing, so there is no point
in considering this when writing the RFC. The Vorbis codec itself has a
relatively high latency, it is not coping very well with packet loss -
enforcing a longer receive buffer to compensate for transmission jitter
and it is tuned for music and not for speech compression. For the same
reasons, no one is seriously using MP3, WMA or similar for telephony or
conferencing. I'm not sure about this, but I expect Theora to have many
of the same limitations and I'm not really convinced if it's suitable
for video conferencing either.
Post by David Barrett
All of it is going P2P (think Skype) using the same NAT/firewall
penetration techniques I'm using. The primary standards are SIP and
RTP (not RTSP), and all the networking is UDP. If Theora and Speex
are to be considered for this entire industry, this problem needs
solving in one way or another, and the two options you present are
cumbersome.
You already realized this yourself, but my suggestions would work
perfectly with SIP as well as with RTSP. The only requirement for the
extra RTP stream to work is that the client is able to setup an RTP
stream and the server is able to transmit one. If they weren't, they
wouldn't be able to setup the audio stream either, so I can't think of
any situation where it won't work. In a SIP situation, the client would
"dial" the main audio stream, getting a reference to the codebook stream
in the SDP. It could then dial the codebook address and get the codebook
while e.g. letting the main address ring.
Post by David Barrett
(I'm not using RTSP, but an equivalent method would work with SIP.)
1) Establish two RTP streams, deliver codebooks over one, media over
another, kill codebook stream when received.
2) Establish one RTP stream, deliver codebooks and media on it, send
codebook ACK in RTCP profile when received.
Both "work" in the sense that you can write a program to do it in each
way. Both have comparable performance characteristics. But neither
is plainly superior. I assume #1 fits your architecture better. I
can guarantee, #2 fits mine.
As I already pointed out, I'm not having any architecture to target and
I am trying to speak generally and not about a specific use case. OTOH,
if you see for which purposes music streaming is actually being used at
the moment, you are mainly limited to real time radio broadcasts and
specific songs on demand. At least I assume that multicast will become
more popular in the future, at the moment it is simply not supported by
most ISPs or in the internet backbone. The difference I see between the
two options you describe is that 1 will work for multicast and
unicast and without client feedback (and is the only really scalable
solution yet being discussed) and 2 will only work for unicast with
client feedback. Especially for multicast scenarios, it is important
that the mechanism should work without client feedback.
Post by David Barrett
I'm merely advocating that we are not smart enough to pick the
end-all-be-all solution for codebook delivery. This discussion alone
(not to mention the many similar discussions that have preceded this)
is proof enough for me that there are strong opinions and reasonable
arguments in favor of competing options.
Yes, if we could just spend time being constructive instead of wasting
time on discussing completely off the edge problems, we might have been
much further. As long as people are making definite statements based on
"random values" and end up with a conclusion containing results far off
any realistic values (Luca Barbato: 0,4 vs >8s for codebook delivery)
and you are trying to make the mandatory part of the RFC fit a use case,
for which Vorbis was never designed, we won't get anywhere.
Post by David Barrett
Whether we mandate supporting all, mandate supporting some, or leave
everything optional to the developer -- somebody will be dissatisfied.
And at the end of the day, I vastly prefer an RFC that errs on the side
of mandating too little rather than too much.
Yes I may agree on that, but we obviously disagree on exactly what.
Mandating only inband RTP delivery of the codebook makes it an IMHO
completely unusable RFC and I expect IETF to share that opinion.

Tor
David Barrett
2005-09-02 15:01:47 UTC
Permalink
Ok, I think we've covered all the bases we're going to cover. I'm going
to attempt a summary of my position -- Tor, will you please do the same?
(ie, please don't respond point-by-point to my position, just summarize
yours)

1) For the P2P architecture I am currently developing with Speex and
Theora, neither HTTP nor inline unacknowledged codebook delivery are
ideal options. Thus I propose a new option, inline with codebook ACK
(which, incidentally, works well with chaining to support seamless
adaptive encoding).

2) I intend to use Vorbis in the same architecture, and would like to
use the same solution (inline with codebook ACK).

3) I believe my use case (the common P2P case) is fundamentally
different than today's common web radio case and tomorrow's
theoretically-common multicast case. Indeed, I suspect there are more
cases, some of which might warrant yet more codebook delivery options.

4) Given all these delivery options, I don't want to build them all, and
I expect others won't want to either. Thus I propose we make *all*
codebook delivery options into optional extensions, with some
suggestions about which options are recommended for which type of
application (P2P, client/server, multicast, TCP, etc.).

5) As a compromise, I'd be open to picking one option as a "baseline"
that all decoders support. If we do this, I recommend picking the
easiest one (unacknowledged inline codebooks) so as to minimize time
spent developing unused features.

6) Finally, I believe the IETF would accept the above proposals just
fine.


Anyway, that's where I stand. And I'll note that for me, this is not
some abstract thought experiment. My recommendations are based on
solving today's problems using today's internet, intended for
integration with an architecture and application that exists today.

-david
Aaron Colwell
2005-09-02 17:10:21 UTC
Permalink
It looks like you beat me to suggesting that a summary for each proposal be
written. The "Chaining" thread seemed to be getting out of control.

Here is my position.

1. I think inline codebook transmission should be part of the base spec. I
don't think an inline codebook delivery strategy should be discussed in
detail. It should just be noted that loss can occur and periodic
transmission or selective retransmissions are ways to handle loss.

2. I wouldn't mind if inline transmission with some sort of RTCP feedback
was also included in the base spec. I have some ideas on this topic that
I'd be willing to discuss on a separate thread. After much thought I
prefer this solution now over the HTTP mechanism.

David, I'd be interested in your current solution. I think that would be
a good starting point for developing a standard for this codebook delivery
mechanism. Ideally I'd like to turn it into something more generic that
allows reliability of some packets in an RTP session.

3. I think the HTTP codebook delivery scheme should be a separate draft and
not part of the base spec. I have yet to see an RTP payload format that
requires another protocol like HTTP to allow it to function. I think it may
be a useful thing to specify, but I don't think it belongs in the base spec.

4. I think that we should create an SDP header that allows the available
codebook delivery mechanisms to be advertised. We would also need a way to
notify the server of what mechanism the client intends to use.

For RTSP at least, I believe that 3GPP's alt-group specifications provide the
necessary SDP magic to allow for this. Basically the idea is that each
delivery mechanism is associated with an alt-group. Since each alt-group
has a different a=control: line the server will be able to determine which
delivery mechanism the client wants by the URLs used in the SETUP requests.

I'm not as familiar with SIP, but I believe that there is a similar way to
pick a specific delivery profile in the offer/answer handshake.

Here is an initial proposal.

- For the case where the codebook delivery is completely handled by out of
band means. This might be selected if the client already has the
codebooks used in the stream.

a=codebook-delivery:out-of-band

- For a possible multicast case. This just indicates that the codebooks will
be delivered inline on a periodic basis.

a=codebook-delivery:inline-periodic

- For the case where you want to do selective retransmission.

a=codebook-delivery:inline-selective-ack

- For the case where you want to use the HTTP codebook retrieval mechanism.
One could argue that you could just use the out-of-band case for this, but
I feel that this helps the client to know what other headers it should
look for when determining where to get the codebooks from.

a=codebook-delivery:http

5. Codebook MD5 hashes and the ident header should be in the fmtp field no
matter what the codebook delivery mechanism is.
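
A client-side reading of the proposed attribute could look like the sketch below. It assumes the four `a=codebook-delivery:` values listed in point 4 and a flat SDP body with one attribute per line; the function names and the preference list are hypothetical, and the alt-group / `a=control:` association mentioned above is deliberately left out.

```python
# Values taken from the initial proposal in point 4.
KNOWN_MECHANISMS = ("out-of-band", "inline-periodic",
                    "inline-selective-ack", "http")
PREFIX = "a=codebook-delivery:"


def offered_mechanisms(sdp):
    """Collect the codebook delivery mechanisms advertised in an SDP body."""
    found = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            value = line[len(PREFIX):].strip()
            if value in KNOWN_MECHANISMS:  # ignore values this client doesn't know
                found.append(value)
    return found


def choose_mechanism(sdp, preference):
    """Pick the first mechanism in the client's preference order that is offered."""
    offered = offered_mechanisms(sdp)
    for mechanism in preference:
        if mechanism in offered:
            return mechanism
    return None
```

A P2P client might pass `["inline-selective-ack", "inline-periodic"]` as its preference, while a multicast listener would list only `"inline-periodic"`; `None` signals that the two sides share no mechanism.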


Aaron
David Barrett
2005-09-02 21:35:22 UTC
Permalink
Post by Aaron Colwell
Here is my position.
1. I think inline codebook transmission should be part of the base spec. I
don't think an inline codebook delivery strategy should be discussed in
detail. It should just be noted that loss can occur and periodic
transmission or selective retransmissions are ways to handle loss.
Agreed.
Post by Aaron Colwell
2. I wouldn't mind if inline transmission with some sort of RTCP feedback
was also included in the base spec. I have some ideas on this topic that
I'd be willing to discuss on a separate thread. After much thought I
prefer this solution now over the HTTP mechanism.
I agree, though I'd suggest implementors only SHOULD support the RTCP
feedback method (and not MUST, in order to allow for unidirectional
broadcasts).
Post by Aaron Colwell
3. I think the HTTP codebook delivery scheme should be a separate draft and
not part of the base spec. I have yet to see an RTP payload format that
requires another protocol like HTTP to allow it to function. I think it may
be a useful thing to specify, but I don't think it belongs in the base spec.
Agreed.
Post by Aaron Colwell
4. I think that we should create an SDP header that allows the available
codebook delivery mechanisms to be advertised. We would also need a way to
notify the server of what mechanism the client intends to use.
...
This sounds great!
Post by Aaron Colwell
5. Codebook MD5 hashes and the ident header should be in the fmtp field no
matter what the codebook delivery mechanism is.
Agreed, on one condition: that decoders be hardened against the
possibility of the encoder and decoder using mismatching codebooks that
happen to have the same MD5 hash. Given how astronomically slight this
chance is I don't think it's necessary to really correct the situation
-- I think garbled output is fine, so long as it doesn't crash.
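
A minimal sketch of the check being discussed, assuming the SDP carries the codebook's MD5 as a hex string (the exact fmtp parameter layout is not settled in this thread, and both function names are hypothetical). The second helper only illustrates the "garbled output is fine, crashing is not" policy; in a real C decoder the hardening would be bounds and sanity checks inside the decoder itself.

```python
import hashlib


def codebook_matches(codebook_bytes, advertised_md5_hex):
    """True if the received codebook hashes to the value advertised in SDP."""
    actual = hashlib.md5(codebook_bytes).hexdigest()
    return actual == advertised_md5_hex.strip().lower()


def decode_packet_safely(decode, packet):
    """Run a decoder callback; treat any failure as a dropped packet."""
    try:
        return decode(packet)
    except Exception:
        return None  # undecodable data is skipped instead of crashing
```

A matching hash does not prove the codebooks are identical, which is why the decoder still has to tolerate bad input.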

-david
Aaron Colwell
2005-09-02 21:59:53 UTC
Permalink
Post by David Barrett
Post by Aaron Colwell
2. I wouldn't mind if inline transmission with some sort of RTCP feedback
was also included in the base spec. I have some ideas on this topic that
I'd be willing to discuss on a separate thread. After much thought I
prefer this solution now over the HTTP mechanism.
I agree, though I'd suggest implementors only SHOULD support the RTCP
feedback method (and not MUST, in order to allow for unidirectional
broadcasts).
Agreed.
Post by David Barrett
Post by Aaron Colwell
5. Codebook MD5 hashes and the ident header should be in the fmtp field no
matter what the codebook delivery mechanism is.
Agreed, on one condition: that decoders be hardened against the
possibility of the encoder and decoder using mismatching codebooks that
happen to have the same MD5 hash. Given how astronomically slight this
chance is I don't think it's necessary to really correct the situation
-- I think garbled output is fine, so long as it doesn't crash.
Agreed.
Tor-Einar Jarnbjo
2005-09-04 21:35:19 UTC
Permalink
Post by David Barrett
Ok, I think we've covered all the bases we're going to cover. I'm
going to attempt a summary of my position -- Tor, will you please do
the same? (ie, please don't respond point-by-point to my position,
just summarize yours)
Ok, let me first explain why I think inline codebook delivery with or
without client acknowledge is one of the worst methods we have been
discussing so far:

- You can't make a decent implementation of it for multicast. The only
possibility for inline codebook delivery to work with multicast would be
to continuously transmit the codebook data and hence either waste an
unacceptable amount of bandwidth or introduce an unacceptable delay at
the beginning of a stream while the client waits for a complete codebook
set to be received. Even if multicast transmissions are not commonly
used today, more and more ISPs are at least starting to experiment with
multicast and it is the only feasible solution to avoid bandwidth
congestion as the internet is used more as a transport medium for
audio and video streaming. For unicast scenarios, Ogg/Vorbis over HTTP
is already used quite a lot. As long as the Vorbis codec itself has a
relatively high latency and is not designed for low latency "real time"
streaming situations, unicast Vorbis over RTP won't bring much advantage
over Ogg/Vorbis/HTTP. I would expect an RFC for Vorbis over RTP that
only allows unicast to be largely neglected and not very usable.

- Inline transmission with client acknowledge will not work in
unidirectional network environments. Although this is not very likely
for unicast situations, it will be for multicast, as there may be
situations where the client is simply joining an ongoing session without
server knowledge.

- Even in unicast situations, the delay when starting a stream may be
unacceptable. The codebook header is by the standard not limited in
size, but even if you do some calculations on codebook sizes commonly
used by current encoders, the codebook transmission will take several
seconds at least. The server would have to stream the codebook at the
same rate as the audio stream, potentially letting the client wait
unnecessarily long for the transmission to complete. To stay below the
network MTU, we can assume that a "common size" codebook would be split
into something around 5 RTP packets. In a network with 2% packet loss,
there will be a chance of 9.6% that at least one of these packets will
not arrive at the client. Hence, it should at least be mandatory for the
server to send the codebook twice _before_ starting to stream audio at
all, to minimize the chance that the entire stream is undecodable, and
this raises the delay before playback can begin accordingly.

- I would expect most use cases for Vorbis over RTP to be web radios and
music "on demand" services. Designing the RFC to fit well only
situations where multidirectional streams are required (e.g. the "client"
must also be able to transmit its codebook to the "server") is a major
mistake, as that will probably rarely be needed.
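
The loss figure in the third point above can be checked with a few lines of arithmetic. This sketch assumes packet losses are independent, which real networks only approximate; the function name and the "copies" parameter are hypothetical. With 5 packets and 2% loss it gives roughly 9.6% for a single transmission and roughly 0.2% when every packet is sent twice.

```python
def p_incomplete(packets, loss_rate, copies=1):
    """Probability that at least one packet never arrives.

    Assumes independent losses. Each packet is sent `copies` times and
    counts as missing only if every copy is lost.
    """
    p_packet_missing = loss_rate ** copies
    return 1.0 - (1.0 - p_packet_missing) ** packets


print(f"sent once:  {p_incomplete(5, 0.02):.1%}")
print(f"sent twice: {p_incomplete(5, 0.02, copies=2):.1%}")
```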

I am by no means strongly advocating any other solution and there have
been a few other reasonable delivery methods discussed:

- URI reference to the codebook in the SDP. In this case I would suggest
HTTP and whatever protocol is being used to set up the RTP stream to be
mandatory. E.g. HTTP and RTSP for an RTSP server or HTTP and SIP for a
SIP client or registrar. At the server side, I would assume that it in
most cases would be feasible to make use of an existing HTTP server to
support HTTP delivery. At the client side, it would not be much effort
to either implement enough of the HTTP protocol or make use of available
HTTP client libraries to fetch the codebook. If HTTP for some reason is
not feasible, the other protocol may be used.

- Agree on a fixed set of codebooks for RTP. Codebook optimizers have
shown to only save a few percent on the file size for streams created by
the reference encoder, so I am not really convinced that dynamic
codebooks are very useful. This may of course be because the actual
stream data created by the reference encoder are fitting the fixed
codebooks well or vice versa. A drawback on this would be that the
decoder software size increases. I've not had time to check the complete
size of all codebooks used by the reference encoder, but as a
comparison, the static codebooks used by WMA could be stored in around
25kB. As pointed out in a response to my question on this subject on the
AVT mailing list, it would be easily feasible for a transmitter to
reencode a local "unsupported" Vorbis stream using a supported
codebook.

Tor
David Barrett
2005-09-06 17:46:57 UTC
Permalink
Post by Tor-Einar Jarnbjo
Ok, let me first explain why I think inline codebook delivery with or
without client acknowledge is one of the worst methods we have been
- You can't make a decent implementation of it for multicast. ...
Totally agree. I wouldn't recommend it for multicast.
Post by Tor-Einar Jarnbjo
- Inline transmission with client acknowledge will not work in
unidirectional network environments. ...
Totally agree. I wouldn't recommend it for unidirectional.
Post by Tor-Einar Jarnbjo
- Even in unicast situations, the delay when starting a stream may be
unacceptable. ...
This is where I disagree. (And incidentally, unicast over a
bidirectional network is far and away the most common deployment now,
and for the foreseeable future.)

Remember, TCP and UDP run over the same network. There's nothing that
physically makes UDP codebook delivery slower than TCP (and the reverse
is often true). Rather, it's all in how you build your server.

You've proposed sending codebooks at the stream rate -- a safe,
reasonable, but very slow solution. There are other choices that are
still safe but much faster, especially with acknowledgement. Indeed, if
you choose not to play nicely with other TCP streams, you can even go
*faster* than TCP using UDP (and for realtime data as small as 5KB, this
might be a fine choice).
Post by Tor-Einar Jarnbjo
- I would expect most use cases for Vorbis over RTP to be web radios and
music "on demand" services.
Anyway, this is the real reason I'm responding. One assumption I hold
that's biasing all of my discussion is "whatever decisions are made for
Vorbis-rtp will likely be applied to Theora-rtp". Can anyone confirm or
deny that this is the case?

Personally, I think Vorbis and Theora (even Speex -- it doesn't send
codebooks, but it does have stream parameters that must be delivered
reliably) are so similar in this respect that we'd be silly not to use
the same approach for all. Do you have an opinion on this?
Post by Tor-Einar Jarnbjo
- Agree on a fixed set of codebooks for RTP. ...
I actually agree with you here, but it's my impression that this
widely-discussed option has been formally rejected by Xiph.

-david
Tor-Einar Jarnbjo
2005-09-06 20:38:13 UTC
Permalink
Post by David Barrett
You've proposed sending codebooks at the stream rate -- a safe,
reasonable, but very slow solution. There are other choices that are
still safe but much faster, especially with acknowledgement. Indeed,
if you choose not to play nicely with other TCP streams, you can even
go *faster* than TCP using UDP (and for realtime data as small as 5KB,
this might be a fine choice).
Not if you are concerned about implementation effort. Of course you can
reimplement parts of the already available TCP stack to achieve reliable
stream transfers over UDP, but that would have to include ack or resend
messages from the client and some sort of bandwidth usage control.
Without a two way communication, the server will never know for sure
that the codebook has been completely received by the client, so that it
can start streaming the audio content.
Post by David Barrett
Anyway, this is the real reason I'm responding. One assumption I hold
that's biasing all of my discussion is "whatever decisions are made
for Vorbis-rtp will likely be applied to Theora-rtp". Can anyone
confirm or deny that this is the case?
Personally, I think Vorbis and Theora (even Speex -- it doesn't send
codebooks, but it does have stream parameters that must be delivered
reliably) are so similar in this respect that we'd be silly not to use
the same approach for all. Do you have an opinion on this?
I do not have any detailed knowledge about Theora, but I don't see any
similarities between Vorbis, Speex and FLAC justifying the effort to try
to find a common RTP solution for them. Speex was designed for
unreliable transports to begin with and has AFAIK completely
self-contained packets, e.g. a single packet can be decoded properly
without any other knowledge about the stream. FLAC only has a very small
mandatory setup header, which can easily be transferred as SDP
parameters. On the other side, FLAC over RTP would e.g. have to cope
without seek-points as they don't make any sense in a streaming situation.
Post by David Barrett
- Agree on a fixed set of codebooks for RTP. ...
I actually agree with you here, but it's my impression that this
widely-discussed option has been formally rejected by Xiph.
I'm not sure and the mailing list archive is too difficult to search for
me to bother. I'm wondering why we're not hearing anything from them or
from Fluendo in this discussion.

Tor
David Barrett
2005-09-06 21:36:39 UTC
Permalink
Post by Tor-Einar Jarnbjo
Post by David Barrett
You've proposed sending codebooks at the stream rate -- a safe,
reasonable, but very slow solution. There are other choices that are
still safe but much faster, especially with acknowledgement. Indeed,
if you choose not to play nicely with other TCP streams, you can even
go *faster* than TCP using UDP (and for realtime data as small as 5KB,
this might be a fine choice).
Not if you are concerned about implementation effort. Of course you can
reimplement parts of the already available TCP stack to achieve reliable
stream transfers over UDP, but that would have to include ack or resend
messages from the client and some sort of bandwidth usage control.
Without a two way communication, the server will never know for sure
that the codebook has been completely received by the client, so that it
can start streaming the audio content.
Yes, we're not covering any new ground here:

- Yes: Two way communication would require ACKs, and a good way to do
that would be to model TCP. And there are other good ways.

- Yes: One way communication doesn't get the benefit of ACKs, clearly.
And without ACKs, the server is just guessing, clearly.
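
To make the two-way case concrete, here is a minimal stop-and-wait sketch: each codebook fragment is resent until it is acknowledged, and the server would start streaming audio only after the function returns. It is deliberately the simplest ACK scheme, with no windowing or bandwidth control, and it is not what any draft specifies. The send callback, the retry limit and the test channel are all hypothetical.

```python
def deliver_with_acks(fragments, send, max_attempts=5):
    """Stop-and-wait delivery: resend each fragment until it is acknowledged.

    `send(index, fragment)` is a hypothetical callback that transmits one
    fragment and returns True if an ACK came back before a timeout.
    Returns the total number of transmissions, or raises TimeoutError if
    a fragment is never acknowledged.
    """
    transmissions = 0
    for index, fragment in enumerate(fragments):
        for _ in range(max_attempts):
            transmissions += 1
            if send(index, fragment):
                break
        else:
            raise TimeoutError(f"fragment {index} was never acknowledged")
    return transmissions


def make_lossy_channel(drop_first_attempt_of):
    """Test channel that loses the first attempt for the given indices."""
    seen = set()

    def send(index, fragment):
        if index in drop_first_attempt_of and index not in seen:
            seen.add(index)
            return False  # simulated loss: no ACK arrives
        return True

    return send


fragments = [b"part-%d" % i for i in range(5)]
print(deliver_with_acks(fragments, make_lossy_channel({1, 3})))
```

With two of the five fragments lost once each, the sender needs seven transmissions in total.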

I think we're in violent agreement. I'm not asking you to implement
this. I'm not asking you to recommend this. I'm merely asking that you
don't forbid those who want to do so (ie, me) from doing so.
Post by Tor-Einar Jarnbjo
Post by David Barrett
Personally, I think Vorbis and Theora (even Speex -- it doesn't send
codebooks, but it does have stream parameters that must be delivered
reliably) are so similar in this respect that we'd be silly not to use
the same approach for all. Do you have an opinion on this?
I do not have any detailed knowledge about Theora, but I don't see any
similarities between Vorbis, Speex and FLAC justifying the effort to try
to find a common RTP solution for them. ...
Ok, so you believe Speex and FLAC needn't be covered by the same RTP
spec. What about Theora?

In my case, the Theora codebook is about 2KB. Thus I have exactly the
same problem between Vorbis and Theora, and I would like to solve it in
exactly the same way. It would seem that the reasoning that goes into
the Vorbis spec is extremely similar, if not identical to the reasoning
that will go into the Theora spec.

-david
Aaron Colwell
2005-09-06 22:05:50 UTC
Permalink
Post by David Barrett
In my case, the Theora codebook is about 2KB. Thus I have exactly the
same problem between Vorbis and Theora, and I would like to solve it in
exactly the same way. It would seem that the reasoning that goes into
the Vorbis spec is extremely similar, if not identical to the reasoning
that will go into the Theora spec.
I think the same mechanisms should be used for Theora and Vorbis. You shouldn't
need as many codebooks for Theora though since you can use the same codebook
to reduce the bitrate.

Aaron
Ralph Giles
2005-09-02 15:42:58 UTC
Permalink
Post by Tor-Einar Jarnbjo
I'm not sure about this, but I expect Theora to have many
of the same limitations and I'm not really convinced if it's suitable
for video conferencing either.
Can you elaborate on your reasoning here? I don't see why theora
wouldn't work well for conferencing. There are no B frames, and
the current encoder does a reasonable job with no look-ahead. To
get less than a frame of latency requires both encoder modifications
and specially-matched camera hardware, but that's true of most
compressed digital standards, and I didn't think people cared that much.

-r
Tor-Einar Jarnbjo
2005-09-02 15:51:22 UTC
Permalink
Post by Ralph Giles
Can you elaborate on your reasoning here?
No, I can't and I just wrote "expect" as an indication of not being sure.

Tor
David Barrett
2005-09-02 16:36:01 UTC
Permalink
Post by Ralph Giles
Post by Tor-Einar Jarnbjo
I'm not sure about this, but I expect Theora to have many
of the same limitations and I'm not really convinced if it's suitable
for video conferencing either.
Can you elaborate on your reasoning here? I don't see why theora
wouldn't work well for conferencing.
Incidentally, I'm using Theora for videoconferencing, and it's working
great. Thanks Ralph!

-david