Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#8450 closed defect (worksforme)

VLC does not correctly display special characters in subtitle streams of MOV files

Reported by: Nick Owned by: jb
Priority: normal Milestone: 2.1.0 release
Component: Subtitles Version: master git
Severity: normal Keywords:
Cc: cehoyos Difficulty: unknown
Platform(s): all Work status: Not started

Description

VLC does not correctly display special characters in subtitle streams of MOV files

VLC 2.0.6 displays the wrong characters when subtitle streams inside MOV containers contain special characters from languages such as French, German, or Spanish.
(Special characters in other containers, such as MP4 and MKV, display fine.)

The problem seems to be older, since VLC 1.1.11 shows it too, and the nightly build of VLC (vlc-2.0.7-20130416) is also affected. Furthermore, it does not depend on VLC's "Subtitle Encoding" setting.

Other media players display the same MOV video and its subtitles correctly. Media players tested for comparison:
Media Player Classic - Home Cinema 1.6.6
SMPlayer 0.8.4
XBMC 12.0

(See the attachments for a screenshot and test files.)

Operating system: Windows XP, Windows 7
Windows 32bit version of VLC 2.0.6

Attachments (10)

output_imported_subtitle.mov (483.2 KB) - added by Nick 2 years ago.
VLC_2.0.6_mov_subtitle_bug.png (59.0 KB) - added by Nick 2 years ago.
subtitle_test.srt (693 bytes) - added by Nick 2 years ago.
mov_subtitle_test.bat (1.2 KB) - added by Nick 2 years ago.
input.mp4 (481.7 KB) - added by Nick 2 years ago.
input_with_subtitle.mkv (469.7 KB) - added by Nick 2 years ago.
MP4_with_subtitle_language(afr).mp4 (483.0 KB) - added by Nick 2 years ago.
MP4_with_subtitle_language(baq).mp4 (483.0 KB) - added by Nick 2 years ago.
mp4_subtitle_language-code_problem.png (30.5 KB) - added by Nick 2 years ago.
mp4_test_file_difference.png (15.8 KB) - added by Nick 2 years ago.

Change History (25)

comment:1 Changed 2 years ago by jb

I see no attachments.

Changed 2 years ago by Nick

Changed 2 years ago by Nick

Changed 2 years ago by Nick

Changed 2 years ago by Nick

Changed 2 years ago by Nick

Changed 2 years ago by Nick

comment:2 Changed 2 years ago by Nick

To create other test video files with subtitles, you can use the Windows batch script "mov_subtitle_test.bat".

This batch file requires the following source files:
"input.mp4" (source video without subtitles)
"input_with_subtitle.mkv" (video with subtitles)
"subtitle_test.srt" (subtitle test file for import)
...and a Windows build of FFmpeg, available from: http://ffmpeg.zeranoe.com/builds/

comment:3 follow-up: Changed 2 years ago by jb

Seems like an encoding issue... MKV mandates UTF-8. What does MP4 mandate for timed text?

comment:4 follow-up: Changed 2 years ago by courmisch

About the PNG file... It shows that the subtitle is really encoded as UTF-8 but interpreted as Macintosh encoding.

It's probably because of this (mp4.c):

    if( p_mdhd->data.p_mdhd->i_language_code < 0x800 )
    {
        /* We can convert i_language_code into iso 639 code,
         * I won't */
        strcpy( language, MP4_ConvertMacCode( p_mdhd->data.p_mdhd->i_language_code ) );
        p_track->b_mac_encoding = true;
    }
    else
    {
        for( unsigned i = 0; i < 3; i++ )
            language[i] = p_mdhd->data.p_mdhd->i_language[i];
        language[3] = '\0';
    }

and then

            if( p_track->b_mac_encoding )
                p_track->fmt.subs.psz_encoding = strdup( "MAC" );
            else
                p_track->fmt.subs.psz_encoding = strdup( "UTF-8" );

Not sure whether the VLC demuxer or the FFmpeg muxer is wrong here.
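To make the consequence of this check concrete, here is a minimal stand-alone C model of the charset decision in the excerpt above. It is a simplification, not VLC code, and the function name `subtitle_charset_vlc206` is hypothetical. A packed ISO code such as 0x04D2 ("afr") falls below 0x800 and selects the "MAC" charset, while 0x0831 ("baq") selects UTF-8:

```c
#include <stdint.h>

/* Simplified model of the VLC 2.0.6 mp4.c excerpt above (hypothetical
 * name): a language field below 0x800 is treated as a Macintosh
 * language code, which sets b_mac_encoding, and the subtitle track is
 * then decoded with the "MAC" charset; any other value gives "UTF-8". */
const char *subtitle_charset_vlc206(uint16_t language_code)
{
    int b_mac_encoding = language_code < 0x800;
    return b_mac_encoding ? "MAC" : "UTF-8";
}
```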

comment:6 Changed 2 years ago by courmisch

  • Resolution set to notvlc
  • Status changed from new to closed

Based on the documentation above, the file is not encoded correctly. The FFmpeg muxer must be at fault here.

comment:7 in reply to: ↑ 3 Changed 2 years ago by Nick

Replying to jb:

Seems like an encoding issue... MKV mandates UTF-8. What does MP4 mandate for timed text?


As far as I know, subtitles in MOV should use the 3GPP Timed Text specification.
3GPP Timed Text also defines UTF-8 encoding, and the subtitle in the example file is encoded in UTF-8.

See also:
https://developer.apple.com/library/mac/#documentation/QuickTime/qtff/QTFFChap3/qtff3.html

http://en.wikipedia.org/wiki/MPEG-4_Part_17
http://mpeg.chiariglione.org/standards/mpeg-4/streaming-text-format

3GPP TS 26.245 Specification:
http://www.3gpp.org/ftp/Specs/archive/26_series/26.245/26245-600.zip
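For reference, a 3GPP Timed Text sample, as described in 3GPP TS 26.245, starts with a 16-bit big-endian byte count followed by the text bytes, which are UTF-8 unless they begin with a UTF-16 byte order mark. A minimal C sketch of extracting the text, assuming that layout and ignoring any modifier boxes after the text (the function name `tx3g_sample_text` is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch, assuming the TS 26.245 text sample layout: a 16-bit
 * big-endian length, then that many bytes of text. Modifier boxes that
 * may follow the text are ignored. Returns the text length in bytes,
 * or -1 if the sample is truncated. On success *text points into the
 * sample buffer; the text is not NUL-terminated. */
long tx3g_sample_text(const uint8_t *sample, size_t size, const uint8_t **text)
{
    if (size < 2)
        return -1;
    size_t len = ((size_t)sample[0] << 8) | sample[1];
    if (len > size - 2)
        return -1;
    *text = sample + 2;
    return (long)len;
}
```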


Questions:
How are other media players able to show this subtitle correctly?
Have you already verified this problem with other MOV files containing subtitles with special characters?

Last edited 2 years ago by Nick

comment:8 Changed 2 years ago by courmisch

Your sample is NOT using 3GPP Timed Text. See #1220.

comment:9 Changed 2 years ago by cehoyos

  • Cc cehoyos added

comment:10 in reply to: ↑ 4 Changed 2 years ago by Nick

I tested further and found exactly the same display problem in VLC with MP4 files created by FFmpeg.
Here, however, the problem is different: whether VLC displays the subtitles correctly depends only on the ISO 639 three-letter language code of the subtitle stream, not on how the subtitle stream is encoded.
(Nevertheless, this problem seems related to the one described at the top of this ticket.)

If a language code starting with "a" is used, such as "language=afr" for Afrikaans, then subtitles with UTF-8-encoded special characters inside MP4 files are not displayed correctly in VLC.

More details:
Subtitles marked with a language code in the range "aaa" to "azz" are not displayed correctly in VLC; subtitles marked with a language code in the range "baa" to "zzz" are displayed correctly.
(This applies to subtitle streams containing special characters that were muxed into MP4 containers by FFmpeg.)

Could somebody please test and verify the following case?
Take another standard-conforming MP4 file containing a subtitle stream with special characters, as used in German, French, or other languages. Change its language code from "fre" or "ger" to "afr" (or any other three-letter language code starting with "a"), then play the file in VLC with the subtitle enabled. What is the result?
(See the attachments for example subtitle files.)

If you compare my two MP4 test files byte by byte, you can see that they differ in only 2 bytes, and those 2 bytes hold only the 3-letter language code. The subtitle stream is encoded exactly the same way in both files, yet only one of them displays correctly in VLC.
Therefore I don't think the problem lies only in FFmpeg.
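A byte-by-byte comparison like this can be sketched in C with a generic helper (not taken from VLC or FFmpeg; the name `count_differing_bytes` is hypothetical, and loading the files into memory is left out). The two 16-bit language fields, 0x04D2 ("afr") and 0x0831 ("baq"), stored big-endian, differ in both their high byte (0x04 vs 0x08) and their low byte (0xD2 vs 0x31):

```c
#include <stddef.h>

/* Count how many bytes differ between two buffers of equal size n and
 * store the offset of the first difference in *first_diff (n if the
 * buffers are identical). */
size_t count_differing_bytes(const unsigned char *a, const unsigned char *b,
                             size_t n, size_t *first_diff)
{
    size_t count = 0;
    *first_diff = n;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            if (count == 0)
                *first_diff = i;
            count++;
        }
    }
    return count;
}
```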

The resulting value range of the subtitle language code corresponds directly to the code shown by "courmisch" in comment:4:

Language codes from "aaa" (0x0421) up to "azz" (0x075A) fall below 0x0800,
while codes from "baa" (0x0821) up to "zzz" (0x6B5A) do not.
Example language codes: "afr" (0x04D2) and "baq" (0x0831) (see screenshots)
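These values follow from how the 16-bit language field packs an ISO 639-2/T code: each letter is stored as its ASCII value minus 0x60 in 5 bits, with the top bit left at zero. A minimal C sketch of that packing (the name `pack_iso639` is hypothetical, not a VLC or FFmpeg function) reproduces the numbers above:

```c
#include <stdint.h>

/* Pack a three-letter lowercase ISO 639-2/T code into the 16-bit
 * language field: three 5-bit fields holding (letter - 0x60),
 * most significant letter first, bit 15 left at zero. */
uint16_t pack_iso639(const char code[3])
{
    return (uint16_t)((((unsigned)(code[0] - 0x60) & 0x1Fu) << 10) |
                      (((unsigned)(code[1] - 0x60) & 0x1Fu) << 5) |
                       ((unsigned)(code[2] - 0x60) & 0x1Fu));
}
```

Because a leading "a" contributes 1 << 10 = 0x400, every code starting with "a" lies between 0x0421 and 0x075A, below the 0x800 threshold in mp4.c, while every code starting with "b" or a later letter is at least 0x0821.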

Values of 0x0800 and above are handled correctly:

Replying to courmisch:

It's probably because of this (mp4.c):

    if( p_mdhd->data.p_mdhd->i_language_code < 0x800 )
    {
        /* We can convert i_language_code into iso 639 code,
         * I won't */
        strcpy( language, MP4_ConvertMacCode( p_mdhd->data.p_mdhd->i_language_code ) );
        p_track->b_mac_encoding = true;
    }
    else
    {
        for( unsigned i = 0; i < 3; i++ )
            language[i] = p_mdhd->data.p_mdhd->i_language[i];
        language[3] = '\0';
    }

and then

            if( p_track->b_mac_encoding )
                p_track->fmt.subs.psz_encoding = strdup( "MAC" );
            else
                p_track->fmt.subs.psz_encoding = strdup( "UTF-8" );


I created both of my MP4 example files with these command lines:

ffmpeg -i input.mp4 -sub_charenc ISO-8859-1 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=afr MP4_with_subtitle_language(afr).mp4

ffmpeg -i input.mp4 -sub_charenc ISO-8859-1 -i subtitle_test.srt -map 0:v -map 0:a -c copy -map 1 -c:s:0 mov_text -metadata:s:s:0 language=baq MP4_with_subtitle_language(baq).mp4

(You can also test with non-existent language codes such as "azz" or "zzz".)

Changed 2 years ago by Nick

Changed 2 years ago by Nick

Changed 2 years ago by Nick

Changed 2 years ago by Nick

comment:11 Changed 2 years ago by Nick

There is definitely a problem in VLC, although it is better described as:
'MOV/MP4 subtitle language code detection in VLC is not fully standard-compliant'

See this page: https://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFChap4/qtff4.html#//apple_ref/doc/uid/TP40000939-CH206-TPXREF101
...and read it carefully:
"Language Code Values

Some elements of a QuickTime file may be associated with a particular spoken language. To indicate the language associated with a particular object, the QuickTime file format uses either language codes from the Macintosh Script Manager or ISO language codes (as specified in ISO 639-2/T).

QuickTime stores language codes as unsigned 16-bit fields. All Macintosh language codes have a value that is less than 0x400 except for the single value 0x7FFF indicating an unspecified language. ISO language codes are three-character codes, and are stored inside the 16-bit language code field as packed arrays, as described in "ISO Language Codes." If treated as an unsigned 16-bit integer, an ISO language code always has a value of 0x400 or greater unless the code is equal to the value 0x7FFF indicating an Unspecified Macintosh language code."
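The rule in the quoted text can be expressed as a small classification routine. This is a sketch that follows the Apple description, not VLC's implementation; the enum and the function name `classify_language_field` are hypothetical:

```c
#include <stdint.h>

enum lang_kind { LANG_MAC, LANG_UNSPECIFIED, LANG_ISO639 };

/* Classify a 16-bit QuickTime language field as described above.
 * For ISO codes, unpack the three 5-bit fields into iso[0..2]
 * (each field plus 0x60) and NUL-terminate iso[3]. */
enum lang_kind classify_language_field(uint16_t v, char iso[4])
{
    if (v == 0x7FFF)
        return LANG_UNSPECIFIED;  /* unspecified Macintosh language */
    if (v < 0x400)
        return LANG_MAC;          /* Macintosh Script Manager code */
    iso[0] = (char)(((v >> 10) & 0x1F) + 0x60);
    iso[1] = (char)(((v >> 5) & 0x1F) + 0x60);
    iso[2] = (char)((v & 0x1F) + 0x60);
    iso[3] = '\0';
    return LANG_ISO639;
}
```

The 0x7FFF check comes first because that value would otherwise fall into the ISO branch and unpack to three non-letter characters.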

See also comment:10 and comment:4 ...

...\vlc-2.0.6\modules\demux\mp4\mp4.c

    if( p_mdhd->data.p_mdhd->i_language_code < 0x800 )
    {
        /* We can convert i_language_code into iso 639 code,
         * I won't */
        strcpy( language, MP4_ConvertMacCode( p_mdhd->data.p_mdhd->i_language_code ) );
        p_track->b_mac_encoding = true;
    }
    else
    {
        for( unsigned i = 0; i < 3; i++ )
            language[i] = p_mdhd->data.p_mdhd->i_language[i];
        language[3] = '\0';
    }

This code is a simplification and is not fully standard-compliant:

    if( p_mdhd->data.p_mdhd->i_language_code < 0x800 )

...because ISO language codes can have any value of 0x400 or greater (except 0x7FFF), so valid ISO codes between 0x400 and 0x7FF are treated as Macintosh codes.

It should be corrected to:

    if( p_mdhd->data.p_mdhd->i_language_code < 0x400 || p_mdhd->data.p_mdhd->i_language_code == 0x7FFF )
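The difference between the two conditions can be checked exhaustively over all packed codes "aaa" to "zzz". In this sketch the original test and the proposed test are written as predicates; the names `is_mac_code_old`, `is_mac_code_new`, and `count_misclassified` are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* The Mac-code test from VLC 2.0.6 and the corrected test proposed
 * above, written as stand-alone predicates. */
bool is_mac_code_old(uint16_t v) { return v < 0x800; }
bool is_mac_code_new(uint16_t v) { return v < 0x400 || v == 0x7FFF; }

/* Count the packed three-letter codes "aaa".."zzz" (each letter stored
 * as 1..26 in a 5-bit field) that a predicate wrongly treats as
 * Macintosh language codes. */
unsigned count_misclassified(bool (*is_mac)(uint16_t))
{
    unsigned bad = 0;
    for (unsigned a = 1; a <= 26; a++)
        for (unsigned b = 1; b <= 26; b++)
            for (unsigned c = 1; c <= 26; c++) {
                uint16_t v = (uint16_t)((a << 10) | (b << 5) | c);
                if (is_mac(v))
                    bad++;
            }
    return bad;
}
```

With the old test, all 26 × 26 = 676 codes starting with "a" are misclassified; with the proposed test, none are, since the smallest packed code is 0x0421 and the largest is 0x6B5A.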


comment:12 Changed 2 years ago by Nick

  • Resolution notvlc deleted
  • Status changed from closed to reopened

comment:13 Changed 2 years ago by courmisch

  • Milestone changed from 2.0.x maintenance bugs to 2.1.0 bugs
  • Resolution set to worksforme
  • Status changed from reopened to closed

comment:14 Changed 2 years ago by Nick

OK, I have now also checked the latest VLC 2.1.0 sources, and I can confirm that this bug was fixed between nightly builds "vlc-2.1.0-20130416-0028" and "vlc-2.1.0-20130417-0021",
...on the same day this ticket was opened.

But why is this bug not also fixed in version 2.0.7 (vlc-2.0.7-20130420-0121.tar.xz)?

Corrected code in v2.1.0:

    if( p_mdhd->data.p_mdhd->i_language_code < 0x400 )
    {
        strcpy( language, MP4_ConvertMacCode( p_mdhd->data.p_mdhd->i_language_code ) );
        p_track->b_mac_encoding = true;
    }
    else if( p_mdhd->data.p_mdhd->i_language_code == 0x7fff )
        p_track->b_mac_encoding = true;
    else
    {
        for( unsigned i = 0; i < 3; i++ )
            language[i] = p_mdhd->data.p_mdhd->i_language[i];
        language[3] = '\0';
    }

comment:15 Changed 2 years ago by jb

2.1.0 is the development version.
