Ticket #817 (closed task: fixed)

Opened 2 years ago

Last modified 2 years ago

Win32: use Unicode instead of ANSI/OEM APIs

Reported by: md
Assigned to: courmisch
Priority: normal
Milestone: 0.9.0 features freeze
Component: LibVLC
Version: master
Severity: normal
Keywords:
Cc: md
Platform(s): Win32
Difficulty: easy
Work status: Almost finished

Description (Last modified by courmisch)

As Windows NT and above use Unicode internally (file paths, console I/O, what else?), we should get rid of ANSI/OEM calls.

This currently means replacing all invocations of FromLocale() and ToLocale() on Win32 (other code that slipped through the LibVLC Unicode switch might also be affected, but well, it can slip through another time...). A typical replacement consists of FromWide(), WideCharToMultiByte() or MultiByteToWideChar() and an equivalent Unicode-enabled Win32 API. Hopefully, we won't need to add any further utf8_wrapper, but who knows...
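As a rough illustration of that replacement pattern, the sketch below converts a UTF-8 path to UTF-16 with MultiByteToWideChar() and hands it to a wide-character call. It uses the CRT's _wfopen() for brevity where real code might use CreateFileW() or another W-suffixed Win32 function. The helper names example_utf8_to_wide() and example_fopen_utf8() are hypothetical and are not VLC functions; the block only does anything when compiled for Win32.

```c
#ifdef _WIN32
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

/* Hypothetical helper (not a VLC function): convert a NUL-terminated UTF-8
 * string to a newly allocated UTF-16 string. Caller frees the result. */
static wchar_t *example_utf8_to_wide(const char *utf8)
{
    assert(utf8 != NULL);

    /* First call: ask for the required length in wide chars, NUL included. */
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (len == 0)
        return NULL;

    wchar_t *wide = malloc((size_t)len * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    /* Second call: perform the actual conversion. */
    if (MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, len) == 0) {
        free(wide);
        return NULL;
    }
    return wide;
}

/* Hypothetical helper: open a file whose path is given in UTF-8, without
 * going through the ANSI code page at any point. */
static FILE *example_fopen_utf8(const char *utf8_path, const wchar_t *mode)
{
    wchar_t *wpath = example_utf8_to_wide(utf8_path);
    if (wpath == NULL)
        return NULL;

    FILE *stream = _wfopen(wpath, mode);
    free(wpath);
    return stream;
}
#endif /* _WIN32 */
```

A call such as example_fopen_utf8(path, L"rb") would then replace an fopen(ToLocale(path), "rb") style invocation, assuming path is already UTF-8.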

A single grep yields these files:

modules/gui/ncurses.c
modules/gui/wxwidgets/dialogs/vlm/vlm_panel.cpp
modules/gui/wxwidgets/dialogs/open.cpp
modules/gui/wxwidgets/dialogs/wizard.cpp
modules/gui/wxwidgets/dialogs/playlist.cpp
modules/gui/wxwidgets/interface.cpp
modules/gui/wxwidgets/wxwidgets.hpp
modules/gui/wxwidgets/dialogs.cpp
modules/gui/skins2/src/theme_loader.cpp
modules/gui/skins2/src/theme_loader.hpp
modules/gui/skins2/src/theme_repository.cpp
modules/gui/skins2/src/skin_common.hpp
modules/gui/skins2/src/dialogs.cpp
modules/gui/skins2/win32/win32_dragdrop.cpp
modules/gui/skins2/parser/builder.cpp
modules/misc/gnutls.c
modules/demux/playlist/m3u.c
modules/access/gnomevfs.c
src/misc/configuration.c
src/text/unicode.c
src/libvlc.sym
src/libvlc-common.c

NCurses and GNOME VFS are not used on Win32. wxWidgets is actually fixed and anyway unmaintained. LibVLC matches are non-Win32 (or non-NT fallback) code, and the Unicode support itself.

Skins2 has its own ticket: #831. GnuTLS is a bit more tricky (no Unicode APIs), and low priority: #1108.

So this file still needs investigation:

modules/demux/playlist/m3u.c

Change History

05/11/06 10:45:21 changed by md

  • description changed.

11/11/06 17:49:41 changed by courmisch

12/11/06 19:47:58 changed by courmisch

  • status changed from new to assigned.
  • difficulty changed from unknown to easy.
  • wip changed from Not started to 80%.
  • owner set to courmisch.
  • platform changed from all to Win32.

This is fixed in most plugins and the core. Remains:

  • skins2,
  • GnuTLS (the underlying library wants ANSI file names),
  • ncurses (not used in our Windows builds).

12/11/06 19:57:20 changed by courmisch

  • status changed from assigned to closed.
  • resolution set to fixed.
  • wip changed from 80% to 60%.

Oh, and GNOME VFS, but again we don't use this on Windows.

As far as I am concerned, this is over. For Skins2, I have opened #831.

(follow-up: ↓ 6 ) 12/11/06 21:14:49 changed by courmisch

  • status changed from closed to reopened.
  • resolution deleted.

We still need to fix opendir and readdir wrappers.

(in reply to: ↑ 5 ) 20/11/06 05:51:43 changed by xxcv

Replying to courmisch: I've sent a Unicode dir-access patch, just waiting for it to be applied :) http://www.via.ecp.fr/via/ml/vlc-devel/2006-11/msg00624.html

(follow-up: ↓ 8 ) 21/11/06 16:35:58 changed by md

Also, why aren't we writing vlcrc in UTF-8? Notepad has absolutely no problem opening such files if a proper BOM is present (it can also open UTF-16LE and UTF-16BE).

The current approach discards all config options containing at least one unicode character.

(in reply to: ↑ 7 ) 21/11/06 16:54:47 changed by courmisch

Replying to md:

The main problem was backward compatibility and upgrade path. Besides, up until VLC could handle non-ANSI filenames on Win32, not using Unicode was in fact hardly a limitation.

21/11/06 16:54:59 changed by courmisch

  • status changed from reopened to new.

21/11/06 16:55:06 changed by courmisch

  • status changed from new to assigned.

(follow-up: ↓ 12 ) 21/11/06 18:47:54 changed by md

Fortunately, UTF-8 is not that disruptive. It seems that older VLC versions don't break when they find a vlcrc in UTF-8; they just ignore the option(s) containing Unicode characters.

Thus I think it's safe to start writing vlcrc in UTF-8 with a proper BOM. Of course, VLC needs to be liberal in what it accepts, the same way as in stream_ReadLine():

no BOM => ANSI CP
BOM => appropriate Unicode encoding

This will keep backwards compatibility and ensure clean upgrade path.
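The rule above could be sketched roughly as follows. This is only an illustration of BOM sniffing, not the code of stream_ReadLine(); the enum and the function name detect_bom() are made up for the example, and a UTF-32 BOM is deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; these are not VLC identifiers. */
typedef enum {
    ENC_ANSI_CP,   /* no BOM: assume the legacy ANSI code page */
    ENC_UTF8,
    ENC_UTF16_LE,
    ENC_UTF16_BE
} config_encoding;

/* Look at the first bytes of a buffer and report the encoding implied by
 * its byte-order mark. *bom_len receives the number of bytes to skip
 * before the actual content starts (0 when there is no BOM). */
static config_encoding detect_bom(const unsigned char *buf, size_t len,
                                  size_t *bom_len)
{
    assert(buf != NULL && bom_len != NULL);

    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0) {
        *bom_len = 2;
        return ENC_UTF16_LE;
    }
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0) {
        *bom_len = 2;
        return ENC_UTF16_BE;
    }

    /* No recognised BOM: fall back to the ANSI code page. */
    *bom_len = 0;
    return ENC_ANSI_CP;
}
```

A reader would call this once on the start of the file, skip bom_len bytes, and then decode the rest according to the returned value.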

(in reply to: ↑ 11 ) 21/11/06 18:57:11 changed by courmisch

Replying to md: I pretty much doubt they would ignore the options. They would accept it as ANSI input and then erroneously convert it to UTF-8.

I hardly care since my operating system is sensible enough to use UTF-8, but I don't want to be responsible (once more) for awful obscure (to the average users) encoding problems on Windows. Adding the BOM works as long as the user does not downgrade, which should be OK. I cannot see this happen at this point in 0.8.6 though.

22/11/06 11:10:01 changed by md

Thanks for your latest commits.

Now it really makes sense, since before the comments were in ANSI CP, but the variables in UTF8 (!) - see [16494]

However, the preferences still could not be saved due to another problem - see [17944].

The bad news is that the wxwidgets module is full of those mb_str() calls, and it uses wxFromLocale and wxDNDFromLocale as external replacements instead of using mb_str(wxConvUTF8).

25/11/06 11:04:24 changed by courmisch

Barring possible regressions, directory listing should now be sorted out.

Meanwhile, I found that the recursive plugin directory loader seems to be lacking wide character support too.

25/11/06 11:04:34 changed by courmisch

  • wip changed from 60% to 80%.

06/03/07 21:20:00 changed by courmisch

Note that this explicitly excludes RC interface problems, which is an impossible mess of its own.

13/03/07 18:13:07 changed by courmisch

  • description changed.
  • summary changed from Win32: Replace all ANSI-CP dependent system calls with their Unicode equivalents to Win32: use Unicode instead of ANSI/OEM APIs.

13/03/07 18:19:16 changed by courmisch

  • description changed.

13/03/07 18:30:27 changed by courmisch

  • description changed.

Md or anyone: do any M3U readers (players) support the byte-order mark? If not, the M3U demux is fine (it tries UTF-8 and falls back to the locale; it cannot be UTF-16 anyway).

13/03/07 18:57:16 changed by courmisch

  • wip changed from 80% to Almost finished.

13/03/07 21:00:08 changed by md

  • status changed from assigned to closed.
  • resolution set to fixed.

M3U demux is working correctly since 0.8.6. It accepts anything reasonable, i.e. ANSI, UTF-8 without BOM as well as UTF-8 with BOM. No fix needed here IMHO.
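For illustration, a "try UTF-8, else fall back to the locale charset" decision per line might look like the sketch below. This is not the actual m3u.c code; is_valid_utf8() and guess_line_charset() are hypothetical names, and the validator is a generic strict UTF-8 check (it rejects overlong forms, surrogates and code points above U+10FFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the NUL-terminated string is
 * well-formed UTF-8. Plain ASCII is trivially valid. */
static bool is_valid_utf8(const char *str)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const unsigned int min_cp[] = { 0, 0x80, 0x800, 0x10000 };

    assert(str != NULL);
    const unsigned char *p = (const unsigned char *)str;

    while (*p != '\0') {
        unsigned int cp;
        int extra; /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else
            return false; /* stray continuation byte or invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return false; /* also catches a premature NUL */
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min_cp[extra] || cp > 0x10FFFF
         || (cp >= 0xD800 && cp <= 0xDFFF))
            return false; /* overlong, out of range, or surrogate */
    }
    return true;
}

typedef enum { LINE_UTF8, LINE_LOCALE } line_charset;

/* Decide, for one playlist line, whether to treat it as UTF-8 or to
 * fall back to the locale (ANSI) charset. */
static line_charset guess_line_charset(const char *line)
{
    return is_valid_utf8(line) ? LINE_UTF8 : LINE_LOCALE;
}
```

Because the decision is made line by line, a file mixing both encodings can end up decoded inconsistently.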

BTW, both WMP and RealPlayer are heavily confused when they see an m3u file with a UTF-8 BOM.

13/03/07 21:18:27 changed by courmisch

Then, I too think this bug is over; thanks. The M3U demux is still not perfect, in the sense that it should parse the entire file to determine the charset rather than parse line by line; but that would be too complicated and memory hungry, so we are probably better off as is.