Discussion:
[xiph-rtp] Re: [theora-dev] Theora in Matroska
Ralph Giles
2006-11-03 17:07:24 UTC
Forgive the cross posting, this affects several projects.
Currently Theora video in Matroska is not supported by
Mplayer. To enable the support Michael Niedermayer has
http://article.gmane.org/gmane.comp.video.mplayer.nut.devel/214
The proposal is that we add a recommendation to the ogg vorbis
spec to just concatenate the headers when embedding in a container
that needs to store them in a blob, and that readers skip leading
and trailing data based on known packet lengths and magic strings.

As Michael says, this works, and is just the sort of hacky spec
wrangling ogg is (in)famous for. :)

I guess my only comment is that this isn't particularly general.
While vorbis has a fixed set of header packets with "easy" to determine
lengths, it's possible to do a codec with external framing in mind where
this wouldn't work. The theora spec, for example, allows additional
application-defined header packets after the initial required three.

It also means a cross-encapsulator has to understand a codec's header
packet format to put the data in an ogg stream, which is something many
implementors have complained loudly about. Therefore I'd like to
counterpropose something with explicit packet lengths, like matroska
has, or the "packed header" format the vorbis and theora rtp drafts use.
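As an illustration of what "explicit packet lengths" could look like, here is a minimal sketch of a Xiph-lacing-style layout, which to my understanding is close to what matroska uses for its codec private data: one byte holding the packet count minus one, then the length of every packet except the last (a run of 0xFF bytes plus one terminating byte below 255), then the packets themselves. The function names `pack_headers` and `unpack_headers` are hypothetical, and the header contents are dummy bytes, not real vorbis headers.

```python
def pack_headers(packets):
    """Join header packets into one blob with explicit lengths."""
    if not 1 <= len(packets) <= 256:
        raise ValueError("need between 1 and 256 packets")
    out = bytearray([len(packets) - 1])   # packet count minus one
    for pkt in packets[:-1]:              # the last length is implied
        n = len(pkt)
        out += b"\xff" * (n // 255)       # one 0xFF byte per full 255
        out.append(n % 255)               # terminating byte, always < 255
    for pkt in packets:
        out += pkt
    return bytes(out)


def unpack_headers(blob):
    """Split a blob made by pack_headers() back into its packets."""
    count = blob[0] + 1
    pos = 1
    sizes = []
    for _ in range(count - 1):
        size = 0
        while blob[pos] == 255:           # keep adding while the byte is 0xFF
            size += 255
            pos += 1
        size += blob[pos]
        pos += 1
        sizes.append(size)
    last = len(blob) - pos - sum(sizes)   # whatever remains is the last packet
    if last < 0:
        raise ValueError("blob is shorter than its length fields claim")
    sizes.append(last)
    packets = []
    for size in sizes:
        packets.append(blob[pos:pos + size])
        pos += size
    return packets


# Dummy stand-ins for three header packets of 30, 307 and 1007 bytes.
headers = [b"\x01vorbis" + bytes(23),
           b"\x03vorbis" + bytes(300),
           b"\x05vorbis" + bytes(1000)]
blob = pack_headers(headers)
```

With a layout like this the splitter never looks inside a packet, so it needs no knowledge of the codec's header format and does not care how many header packets there are.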

If we're going to add this to the vorbis and theora specs, I'd like to
see it used as broadly as possible, but luca dislikes the metadata
header and omitted it from his packed header design for rtp. Luca, what
do you think about adding the metadata header back as an optionally
empty field?

-r
Silvia Pfeiffer
2006-11-03 19:03:25 UTC
Let me see if I understand the problem correctly.

Matroska provides only one header packet per codec to identify it.
Michael's proposal suggests to create a new header for each of our
codecs (well, his proposal is only for vorbis, but there are other
codecs who have secondary header pages)?

I am not sure if Matroska would encapsulate the clean codec stream or
an ogg framed stream. I also don't understand if there would be one
blob per codec in the case of a multitrack file (e.g. Theora + Vorbis)
or whether there would be just one large, interleaved blob. In any
case, I might put our experience into the mix to get this right.

When Skeleton was developed for Ogg, we wanted to have one generic
type of header that could help identify all the possible codecs inside
an Ogg stream and give enough information to an application to seek
without having to decode the secondary header pages. Our first approach
was to add an additional header before each codec. That was really bad
though, because it broke all the existing Ogg decoders out there and
essentially created a new format.

After lengthy discussions, a much better solution was born: create an
additional logical bitstream that had one header per codec inside it.
This spec is now Skeleton and is supported by just about all the
common media players out there.

May I therefore suggest that if Matroska needs one header per codec,
it could use a Skeleton bitstream to do so (see
http://wiki.xiph.org/index.php/Ogg_Skeleton)? Or maybe at least to use
the fishead headers as a basis for a new spec, since they have gone
through a long thought process in development. Also, it might make it
easier to implement support in media players since most already
support it for Ogg.

Cheers,
Silvia.
xiphmont at xiph.org ()
2006-11-09 23:25:06 UTC
Post by Silvia Pfeiffer
Let me see if I understand the problem correctly.
Matroska provides only one header packet per codec to identify it.
Michael's proposal suggests to create a new header for each of our
codecs (well, his proposal is only for vorbis, but there are other
codecs who have secondary header pages)?
Just to be clear, the first header does identify Vorbis, but the
others are needed for setup. Sort of like a super-keyframe.

Monty
Michael Niedermayer
2006-12-05 13:03:54 UTC
Hi
Post by Silvia Pfeiffer
Let me see if I understand the problem correctly.
Matroska provides only one header packet per codec to identify it.
my proposal was about the 2 or 3 codec initialization packets at the
start of vorbis, theora, ... streams
identifying the codecs is not a problem in any container format besides
ogg i know of, containers simply have some field which identifies the
codec, that can be a 32bit or 16bit integer, or a variable length
string, or in case of matroska several redundant systems

the problem with the initialization packets, or super keyframes or
sequence headers or whatever you want to call them is that there
are several of them but containers are generally designed to handle
just one such packet per stream
if you would simply store these 3 packets like normal packets then
a demuxer which is told by the user to seek to let's say 5min into
the stream will do so, first it will pass the single global header
(this one is empty in our example) to the codec next it would search
for a keyframe around the requested 5min and start passing packets
beginning with the keyframe to the decoder which would fail as it
never received the 3 initialization packets ...

if now the 2 or 3 packets are merged into one and stored in the appropriate
spot for the global packet for the stream then everything will work
fine, of course that requires that the decoder is able to parse
or the demuxer is able to split the merged packet (for that a few words
in the relevant specs would be helpful, whatever the exact method is
which is used to merge the packets ...)
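The seeking failure and the merged-header fix described above can be sketched with a toy model. Everything here is hypothetical: `ToyDecoder` and `play_from` are made-up names, packets are plain dicts, and a non-empty global header simply stands for "the merged initialization packets were delivered". It is not how any real demuxer or decoder API looks.

```python
class ToyDecoder:
    """Stand-in for a decoder that needs 3 init packets before any frame."""

    def __init__(self, global_header=b""):
        # A non-empty global header stands for the merged init packets.
        self.init_seen = 3 if global_header else 0

    def decode(self, packet):
        if packet["kind"] == "init":
            self.init_seen += 1
            return None
        if self.init_seen < 3:
            raise RuntimeError("no initialization packets received")
        return packet["time"]


def play_from(global_header, packets, seek_time=None):
    """Hand the global header to the decoder, seek, then decode onwards."""
    decoder = ToyDecoder(global_header)
    start = 0                                  # no seek: start at packet 0
    if seek_time is not None:
        keys = [i for i, p in enumerate(packets)
                if p["kind"] == "key" and p["time"] <= seek_time]
        start = keys[-1] if keys else 0        # last keyframe before target
    decoded = (decoder.decode(p) for p in packets[start:])
    return [t for t in decoded if t is not None]


init_packets = [{"kind": "init", "time": 0} for _ in range(3)]
frames = [{"kind": "key" if t % 100 == 0 else "delta", "time": t}
          for t in range(0, 400, 50)]

# Init packets stored like normal packets, empty global header:
# playing from the start works, seeking to t=300 skips them and fails.
in_band = init_packets + frames
try:
    play_from(b"", in_band, seek_time=300)
    seek_failed = False
except RuntimeError:
    seek_failed = True

# Init packets merged into the global header: seeking works.
after_seek = play_from(b"merged-headers", frames, seek_time=300)
```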

also note this is not about matroska alone, but rather many containers
avi, wav, nut, matroska, nuv, asf to name the ones which IIRC support
a single global header but do not really support multiple ones without
codec specific hacks ...
mpeg-ps/ts does not support any global header, they expect such headers
to be repeated before keyframes (mpeg1/2 video does exactly that with
their sequence headers)
mov allows everything but tends to need a special case per codec in the
demuxer

also APIs tend to support passing a single global packet around but tend
not to support multiple ones ...


[...]
Post by Silvia Pfeiffer
I am not sure if Matroska would encapsulate the clean codec stream or
an ogg framed stream. I also don't understand if there would be one
blob per codec in the case of a multitrack file (e.g. Theora + Vorbis)
or whether there would be just one large, interleaved blob. In any
case, I might put our experience into the mix to get this right.
putting a container into a container is the most insane thing you could
do, it also isnt allowed in many containers, avi
requires each packet to be a single packet (people do ignore this yes
i know but they generally dont put other containers in avi), nut
explicitly says that containers inside streams render the
file invalid and any player playing such a file is not nut compliant,
i dont know about matroska but it would surprise me if a vorbis+theora
in ogg stream could be put in matroska without violating some rules
also what is such a stream audio? video? something else?

also ignoring the rules, such files are a nightmare to support, and
even if supported will have a lot of random problems with AV sync

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is
Michael Niedermayer
2006-11-03 21:48:50 UTC
Hi
Post by Ralph Giles
Forgive the cross posting, this affects several projects.
and forgive me too for doing the same ... (if the discussion
is inappropriate for any of the lists just say so and i wont CC
that one anymore ...)
Post by Ralph Giles
Currently Theora video in Matroska is not supported by
Mplayer. To enable the support Michael Niedermayer has
http://article.gmane.org/gmane.comp.video.mplayer.nut.devel/214
The proposal is that we add a recommendation to the ogg vorbis
spec to just concatenate the headers when embedding in a container
that needs to store them in a blob, and that readers skip leading
and trailing data based on known packet lengths and magic strings.
As Michael says, this works, and is just the sort of hacky spec
wrangling ogg is (in)famous for. :)
I guess my only comment is that this isn't particularly general.
While vorbis has a fixed set of header packets with "easy" to determine
lengths, it's possible to do a codec with external framing in mind where
this wouldn't work. The theora spec, for example, allows additional
application-defined header packets after the initial required three.
It also means a cross-encapsulator has to understand a codec's header
packet format to put the data in an ogg stream, which is something many
implementors have complained loudly about. Therefore I'd like to
counterpropose something with explicit packet lengths, like matroska
has, or the "packed header" format the vorbis and theora rtp drafts use.
ive looked at the rtp draft and as far as i understand it it concatenates
the first 2 packets and omits all further ones
"
A Theora Packed Configuration is indicated with the payload type
field set to 1. Of the three headers, defined in the Theora I
specification [16], the identification and the setup will be packed
together, the comment header is completely suppressed. It is up to
the client to provide a minimal size comment header to the decoder if
required by the implementation.
"

this definitely has my support, not that that would make any difference... :)
comments, userdata, and other non essential data does not belong to a global
codec specific header be it in rtp or any container
normal containers have their own fields to store data like author, comment,
user specified metadata and so on
yes its a nightmare to convert from ogg to other containers or back but
putting this data in the global codec specific header does not solve
anything, the data would be as useful as random data put there, and
for rtp resources would be wasted to ensure error free delivery of possibly
large and useless data

[...]
Ralph Giles
2006-11-03 22:36:39 UTC
Post by Michael Niedermayer
ive looked at the rtp draft and as far as i understand it it concatenates
the first 2 packets and omits all further ones
First and 3rd packets, but yes. I missed that it was omitting further
ones in the draft review. It also uses a 16 bit length, which isn't
general either. Using unary encoding like matroska and ogg do isn't
optimal for large packets either, of course. For RTP we were trying to
keep it simple, and of course the RTP formats are by definition codec
specific, so we didn't try to do anything like variable length length
fields.

(I like the jpeg2k scheme, where the length header includes the bytes in
the length header, so since the minimum length is greater than zero you
have one or more bits to use as a flag to describe the width of the bits
field. But a high-bit encoding like utf-8 works too.)
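To make the trade-off concrete, here is a small sketch comparing a fixed 16 bit length field with one possible high-bit scheme. The variable-length code below is a LEB128-style encoding (7 value bits per byte, top bit set when more bytes follow), chosen only as an assumption for illustration. It is not the jpeg2k scheme and not what either rtp draft specifies, and the function names are hypothetical.

```python
import struct


def encode_u16_length(n):
    """Fixed 16 bit big-endian length; cannot represent more than 65535."""
    return struct.pack(">H", n)        # raises struct.error if n > 65535


def encode_varint(n):
    """High-bit continuation encoding, least significant group first."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F                # low 7 bits of what is left
        n >>= 7
        if n:
            out.append(byte | 0x80)    # top bit set: more bytes follow
        else:
            out.append(byte)           # top bit clear: last byte
            return bytes(out)


def decode_varint(data, pos=0):
    """Return (value, position just past the encoded length)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


try:
    encode_u16_length(70000)
    u16_overflowed = False
except struct.error:
    u16_overflowed = True
```

Small packets cost a single length byte here, and a 70000 byte packet costs three, whereas the unary-style lacing would spend roughly one byte per 255 bytes of payload.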
Post by Michael Niedermayer
this definitely has my support, not that that would make any difference... :)
comments, userdata, and other non essential data does not belong to a global
codec specific header be it in rtp or any container
The inline metadata helped solve a problem. Of course it's nice to use a
container level metadata format if one is available, and in that case it
should supersede the codec-level one. But the codec specs are very clear
that this header is required, even if it doesn't contain useful
information. I think it's more confusing to treat this as a mistake and
try to fix it.

-r