Make Matroska demuxer skip unknown elements safely
This is the result of me thinking about a broader solution for #7884 (closed) and therefore purposefully a new ticket. It's rather long, but please bear with me; I'm proposing solutions here, not assigning blame.
== The problem
The problem is that, with default settings, VLC aborts reading a file as soon as it encounters an element that was newly introduced in the Matroska specs. This stays that way until the developers have had time to implement support for the element (or at least add it to the list of known elements, without necessarily doing anything with its information). This is contrary to the Matroska spirit (if not spec): Matroska's intention has always been that new elements can be added and that existing parsers simply skip the ones they don't know about yet.
The result of this decision is that the muxers (most of the time mkvmerge, as it is traditionally the first tool to implement new elements) are blamed. The muxer authors then investigate, find out about VLC's preferences and blame VLC in turn, resulting in bug reports and work for the VLC developers. All the while the users are left in the dark with conflicting information while developers point fingers at each other. Not good for anyone involved.
== Possible intention behind this design
If I understood the tooltip for the "Dummy elements" option correctly then the intention behind not skipping those elements by default is that heavily damaged files result in !libMatroska returning a long list of EbmlDummy elements, which in turn leads to a bad user experience as the demuxer struggles to find something valid to continue from.
So here are my possible solutions that could hopefully improve the "always abort if I see something bad" problem while not degrading user experience too much.
== Solution 1
Skip a dummy element (instead of aborting) only if it fits into the parent element's size. This can be improved upon by further checking whether or not the dummy element's position is exactly where it should be: new_position == previous_element_position + previous_element_size. For the first child element in a parent the start position is of course the very beginning of the child area.
Advantages: Well-structured files will fulfill both conditions pretty much 100% of the time. Damaged files will quickly violate the first condition (my experience: the dummy elements' sizes are all over the place, quickly exceeding the area covered by the parent element) and often violate the second one (!libMatroska keeps looking when it doesn't find valid EBML IDs, therefore the dummy elements can occur with gaps to the previous valid element).
This is also rather easy to implement, I guess.
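To make the two conditions concrete, here is a minimal, language-neutral sketch in Python. It is not VLC or !libMatroska code: the `Element` type and the `dummy_is_skippable` function are hypothetical names, and it assumes that an element's size covers its ID, its size field and its payload, and that the parent has a known size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical, simplified view of a parsed EBML element."""
    start: int  # absolute offset of the first ID byte
    size: int   # assumed total size: ID + size field + payload

    @property
    def end(self) -> int:
        return self.start + self.size


def dummy_is_skippable(dummy: Element,
                       child_area_start: int,
                       child_area_end: int,
                       previous: Optional[Element]) -> bool:
    """Return True if a dummy element looks like a genuine unknown element."""
    # Condition 1: the dummy must lie entirely inside the parent's child area.
    if dummy.start < child_area_start or dummy.end > child_area_end:
        return False
    # Condition 2: it must start exactly where the previous sibling ended,
    # or at the very beginning of the child area if it is the first child.
    expected = child_area_start if previous is None else previous.end
    return dummy.start == expected
```

A demuxer following this idea would skip the dummy when the function returns True and fall back to its current behaviour (or to the resynchronisation from solution 3) otherwise.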
== Solution 2
Improve on solution 1 by only doing that when the DocTypeVersion (not the DocTypeReadVersion!) is higher than the Matroska version VLC currently officially supports.
Advantages: will only trigger in situations that VLC admits it doesn't really have knowledge about. mkvmerge does this correctly; if it uses those new elements it writes DocTypeVersion = 4.
Disadvantages: VLC will have to keep track of when a Matroska version has been finalized. v4 has not been finalized yet, meaning there can (and most likely will) be further additions.
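The version gate on top of solution 1 could look roughly like the sketch below. The function name and its parameters are hypothetical, and the supported version is passed in explicitly because which version VLC considers fully supported is exactly the thing it would have to track.

```python
def may_skip_dummy(doc_type_version: int,
                   supported_version: int,
                   structurally_plausible: bool) -> bool:
    """Decide whether a dummy element may be skipped under solution 2.

    doc_type_version       -- DocTypeVersion from the EBML header
                              (not DocTypeReadVersion)
    supported_version      -- highest finalized Matroska version the demuxer
                              fully knows (an assumption supplied by the caller)
    structurally_plausible -- result of the solution 1 checks for this dummy
    """
    # Only files that announce a newer version than we know may legitimately
    # contain elements we have never heard of.
    file_is_newer = doc_type_version > supported_version
    return file_is_newer and structurally_plausible
```

For example, `may_skip_dummy(4, 3, True)` allows skipping, while a file claiming version 3 with an unknown element is still treated as damaged.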
== Solution 3
Improve on the previous solutions by treating dummy elements differently depending on the level they occur on. Level 1 is the one containing all the clusters, track headers, seek heads etc, level 2 is the one containing e.g. the block groups beneath a cluster etc. For example, dummy elements on levels 2 and further down in the tree (levels 3, 4...) will get treated according to solution 1 while dummy elements on level 1 result in an improved search for the next level 1 element. This also means that a "bad" dummy element on level 2 or lower must not abort reading the file immediately but cause the demuxer to look for the next valid level 1 element.
mkvmerge implements something like this: when dummy elements are found that violate solution 1 then look for the next valid level 1 element. If one is found make sure that there's another valid level 1 element directly behind that first level 1 element, and do that three or four times. If mkvmerge has found n consecutive valid level 1 elements then it continues processing from the first one.
Advantages: Combines the advantage of solution 1 with more resilience.
Disadvantages: harder to implement, and with a definitely noticeable delay when a damaged part of the file is found while the demuxer looks for the next valid elements.
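The search for n consecutive valid level 1 elements can be sketched as follows. This is an illustration of the idea, not mkvmerge's actual code: the function names are hypothetical, the whole file is assumed to be available as a byte buffer, only the common 4-byte Matroska level 1 IDs are recognised (Void and CRC-32 elements are ignored), and elements with an unknown size are treated as invalid for simplicity.

```python
from typing import Optional, Tuple

# Well-known Matroska level 1 element IDs.
LEVEL1_IDS = {
    bytes.fromhex("114D9B74"),  # SeekHead
    bytes.fromhex("1549A966"),  # Info
    bytes.fromhex("1654AE6B"),  # Tracks
    bytes.fromhex("1F43B675"),  # Cluster
    bytes.fromhex("1C53BB6B"),  # Cues
    bytes.fromhex("1941A469"),  # Attachments
    bytes.fromhex("1043A770"),  # Chapters
    bytes.fromhex("1254C367"),  # Tags
}


def read_size(buf: bytes, pos: int) -> Optional[Tuple[int, int]]:
    """Decode an EBML variable-length size; return (value, length) or None."""
    if pos >= len(buf) or buf[pos] == 0:
        return None
    first = buf[pos]
    length = 9 - first.bit_length()        # leading zero bits + 1
    if pos + length > len(buf):
        return None
    value = first & (0xFF >> length)       # clear the length marker bit
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    if value == (1 << (7 * length)) - 1:   # all ones means "unknown size"
        return None
    return value, length


def level1_end(buf: bytes, pos: int) -> Optional[int]:
    """Return the offset just past a valid level 1 element at pos, else None."""
    if bytes(buf[pos:pos + 4]) not in LEVEL1_IDS:
        return None
    size = read_size(buf, pos + 4)
    if size is None:
        return None
    value, length = size
    end = pos + 4 + length + value
    return end if end <= len(buf) else None


def find_resync_point(buf: bytes, start: int, n: int = 3) -> Optional[int]:
    """Find the first offset at which n level 1 elements follow back to back."""
    for pos in range(start, len(buf) - 4):
        cur: Optional[int] = pos
        for _ in range(n):
            cur = level1_end(buf, cur)
            if cur is None:
                break
        else:
            return pos  # all n elements chained without a gap
    return None
```

Requiring a chain of elements rather than a single ID match is what guards against random bytes in a damaged region that happen to look like a level 1 ID. The byte-by-byte scan is also where the delay mentioned above comes from, and a file with fewer than n level 1 elements left after the damage would not resynchronise in this simple form.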
== Conclusion
Insert great sales pitch here. Damn cannot think of one. So: Thanks for reading and considering :)