[BUG] A mix of 8-bit/16-bit chars sent to iconv #1451
Comments
Could you share the output of |
|
You are using version 0.89. Could you try the latest version (0.94)? |
I reverted my change and pulled the latest master. It now decodes the stream (which is better than the previous version, IIRC), but every space in the text still breaks the output, and I get some non-printable chars.
Output without any code changes -
Output after forcing write_utf16_char to always write 2 bytes per char -
I don't speak Japanese myself :) but Google Translate confirms the fixed version is better.
Current version -
|
You could send a PR. If it doesn't cause any issues with the other tests, then we can merge it |
Was this fixed? I could make a simple pull request with the specified changes. |
Probably not if it's still open :-) |
Created a PR: #1571 |
Necessary information
./ccextractor test.ts -svc all[UTF-16BE] -nofc -12
Video links
http://cdnapi.kaltura.com/p/2035982/playManifest/entryId/1_frxnu0yr/flavorId/1_tr3kiz6l/format/download/a.ts
Additional information
Hi all,
I have a TS file with 708 subtitles in Japanese & Chinese that fails to decode properly.
After some debugging, I found that if I patch the function write_utf16_char here - https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708_output.c#L113 to always output 2-byte chars (I changed the if to if (1)), and I specify an encoding of UTF-16BE, it decodes properly.
This code looks off to me, as it creates a mix of 8-bit & 16-bit chars with no clear encoding (it's not UTF-8 and it's not UTF-16...).
Maybe when iconv is used, the function should always output 2 byte chars?
Or, alternatively, if it used 2 bytes for ALL chars whenever ANY char doesn't fit in 1 byte, that would also be OK (but this sounds more complex to do...).
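To make the two options concrete, here is a minimal sketch. It is not CCExtractor's actual code: put_utf16be, encode_utf16be and needs_two_bytes are hypothetical names, and it assumes every char is a single 16-bit unit (no surrogate pairs). The first helper always writes two big-endian bytes, so the buffer handed to iconv would be consistent UTF-16BE; the last one scans a row to check whether any char needs the wide form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the real write_utf16_char): always emit two
 * bytes, high byte first, so every char is encoded as UTF-16BE. */
size_t put_utf16be(uint16_t ch, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)(ch >> 8);   /* high byte, 0x00 for Latin-1 range */
    out[1] = (unsigned char)(ch & 0xFF); /* low byte */
    return 2;
}

/* Option 1: encode n chars into out, which must hold at least 2 * n bytes.
 * Returns the number of bytes written. */
size_t encode_utf16be(const uint16_t *chars, size_t n, unsigned char *out)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++)
        written += put_utf16be(chars[i], out + written);
    return written;
}

/* Option 2 needs a pre-scan: returns 1 if any char does not fit in one
 * byte, 0 otherwise. The caller would then pick one width for the row. */
int needs_two_bytes(const uint16_t *chars, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (chars[i] > 0xFF)
            return 1;
    return 0;
}
```

With option 1 a space (0x0020) becomes the bytes 00 20 instead of a lone 20, so it can no longer shift the alignment of the 16-bit chars that follow it. Option 2 requires the extra scan plus a way to tell iconv which width was chosen, which is why it looks more complex.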
Btw, VLC decodes the Japanese & Chinese properly, after changing the 'preferred closed captions decoder' setting from 608 to 708.
Thanks!
Eran