   

C / C++ / MFC

 
Re: Character set (Mircea Neacsu, 13-Sep-24 0:06)
Re: Character set (trønderen, 13-Sep-24 9:33)
Re: Character set (Mircea Neacsu, 13-Sep-24 17:17)
Re: Character set (trønderen, 14-Sep-24 7:25)
Re: Character set (Mircea Neacsu, 14-Sep-24 15:10)
Re: Character set (trønderen, 14-Sep-24 17:27)
Re: Character set (Mircea Neacsu, 15-Sep-24 2:26)
Re: Character set (trønderen, 15-Sep-24 12:16)
Mircea Neacsu wrote:
If I understand you correctly, you suggest having UTF-8 files converted to UTF-16 on entry, processed as UTF-16 inside the application and converted back to UTF-8 on output.
A simple UTF-8/16 conversion filter (included by default) in the StreamReader/Writer, or whatever your IO classes are named, certainly does not complicate things for the developer.
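
As a rough illustration of such a conversion filter, here is a minimal C++ sketch of the two directions: UTF-8 in, UTF-16 inside, UTF-8 out. The function names utf8_to_utf16 and utf16_to_utf8 are made up for this example, and throwing on malformed input is an assumption; a real IO class might substitute U+FFFD instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units.
// Assumption: malformed input throws instead of being replaced by U+FFFD.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes expected
        char32_t min_cp = 0;     // smallest code point allowed for this length
        if (lead < 0x80)                { cp = lead;        extra = 0; min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min_cp = 0x10000; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogate values and anything above U+10FFFF.
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("overlong or out-of-range UTF-8 sequence");
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                 // split into a surrogate pair
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}

// Encode UTF-16 code units back into UTF-8 bytes.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With something like this sitting inside the reader and writer classes, application code would only ever see std::u16string; the byte-level encoding stays at the file boundary.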

For interaction with other systems, whether they use UTF-8, UTF-32 or UTF-16 as a working format, UTF-8 is the lingua franca, the Esperanto of textual information. It is The File Character Encoding. No application needs to be concerned about it.

Mircea Neacsu wrote:
That would complicate things very much if you target different OS-es.
If your alternative is to reject any OS that does not use UTF-8 as its system character encoding, and all programming languages, libraries and development environments that do not use UTF-8, you are most certainly right. In that situation, I would certainly go for UTF-8 in-memory as well.

For a great number of developers, in-memory UTF-8 isn't a viable option. UTF-16-oriented OSes, languages and tools are a fact of life. I say: when in Rome, roam with the Romans (or however the saying goes). Even though a lot of developers of drivers, interrupt handlers and low-level network protocols with only rudimentary textual output work in *nix-like environments, the great majority of those communicating textually, with users and others, roam Rome in UTF-16 environments.

If you are talking about making applications that can be ported between UTF-16 and UTF-8 oriented OSes without a single source code change in the string handling, with your code assuming UTF-8 strings even under APIs expecting UTF-16, then you are overly optimistic. You will have to do a lot of adaptations to UTF-16-oriented library and system functions. Or wrap every single one of them in a two-way-conversion wrapper.

The best way to handle it is to leave all string handling to library functions, your application knowing nothing about the encoding under the hood. (Your floating point application would probably run fine on a machine with an FP format different from IEEE-754!) Treat a string as a string, regardless of encoding. Make sure that when you handle characters individually, you use a char32 to hold them.
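
To make the point about individual characters concrete, here is a small C++ sketch that expands a UTF-16 string into whole code points, one char32_t each, before any per-character work is done. The names to_code_points and count_non_ascii are hypothetical, and replacing an unpaired surrogate with U+FFFD is an assumption of this example, not something any particular library is claimed to do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Expand UTF-16 code units into whole code points, one char32_t each.
// Assumption: an unpaired surrogate becomes U+FFFD rather than an error.
std::u32string to_code_points(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        const bool high = u >= 0xD800 && u <= 0xDBFF;
        const bool low  = u >= 0xDC00 && u <= 0xDFFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Combine the surrogate pair into a single code point.
            const char32_t cp = 0x10000
                + ((static_cast<char32_t>(u) - 0xD800) << 10)
                + (static_cast<char32_t>(s[i + 1]) - 0xDC00);
            assert(cp >= 0x10000 && cp <= 0x10FFFF);
            out.push_back(cp);
            ++i;                        // the low surrogate is consumed too
        } else if (high || low) {
            out.push_back(U'\uFFFD');   // unpaired surrogate
        } else {
            out.push_back(u);           // BMP character, one unit
        }
    }
    return out;
}

// Hypothetical per-character operation: count characters outside ASCII.
std::size_t count_non_ascii(const std::u16string& s) {
    std::size_t n = 0;
    for (char32_t cp : to_code_points(s)) {
        if (cp > 0x7F) ++n;
    }
    return n;
}
```

A character outside the Basic Multilingual Plane occupies two char16_t units, so a loop over raw code units would see it as two "characters"; the loop over char32_t values sees it as one.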

Mircea Neacsu wrote:
If you go to Linux or Mac worlds, everything is UTF-8.
Most certainly not. Well, I never worked with Mac, but in Linux there is loads of software that can't handle anything but 8-bit character sets. You can even come across software that cannot handle 8-bit, but only 7-bit characters. A few years ago, I was editing a configuration file on a Unix system; the network module reading it crashed immediately because I had added a comment which contained a non-ASCII 8859 character (in the name of one maintainer).

A number of RFC-822 based internet protocols still cannot handle ISO 8859 (it would be against RFC-822). There still is a real need for QP encoding, backslash or ampersand encoding, etc.
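
For readers who have not met QP (quoted-printable) encoding, the following C++ sketch shows the basic idea as described in RFC 2045: every byte that is not plain printable ASCII travels as "=" followed by two hex digits, so the result is safe on a 7-bit channel. The function name qp_encode is made up for this example, and it assumes the input uses CRLF for hard line breaks; it is an illustration, not a full MIME implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Quoted-printable encode a byte string (sketch of the RFC 2045 rules).
// Assumption: hard line breaks in the input are CRLF; a bare CR or LF
// is treated as data and escaped.
std::string qp_encode(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    const std::size_t max_len = 75;  // leaves room for the "=" of a soft break
    std::string out;
    std::size_t line_len = 0;

    // True if a CRLF pair starts at position p.
    auto crlf_at = [&bytes](std::size_t p) {
        return p + 1 < bytes.size() && bytes[p] == '\r' && bytes[p + 1] == '\n';
    };

    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (crlf_at(i)) {            // hard line break passes through
            out += "\r\n";
            ++i;
            line_len = 0;
            continue;
        }
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        const bool at_line_end = (i + 1 == bytes.size()) || crlf_at(i + 1);
        const bool printable = c >= 33 && c <= 126 && c != '=';
        // Space and tab may stay literal, except at the end of a line.
        const bool inner_space = (c == ' ' || c == '\t') && !at_line_end;

        std::string token;
        if (printable || inner_space) {
            token.push_back(static_cast<char>(c));
        } else {
            token.push_back('=');
            token.push_back(hex[c >> 4]);
            token.push_back(hex[c & 0x0F]);
        }
        if (line_len + token.size() > max_len) {
            out += "=\r\n";          // soft line break, removed by the decoder
            line_len = 0;
        }
        out += token;
        line_len += token.size();
        assert(line_len <= max_len);
    }
    return out;
}
```

Fed the ISO 8859-1 bytes of "trønderen" (where ø is the single byte 0xF8), this yields "tr=F8nderen", which is the kind of text a strictly 7-bit RFC-822 header can carry.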

You may rightfully say that *nix/Mac apps written to handle UTF-8 do handle UTF-8. Big surprise. You may claim that languages/tools specifying UTF-16 string representation don't use that when running under *nix/Mac - all strings are converted to UTF-8 when ported to these OSes, both in source code and library APIs. I doubt very much that that is the case.
Religious freedom is the freedom to say that two plus two make five.

Re: Character set (Richard MacCutchan, 15-Sep-24 21:08)
Re: Character set (Mircea Neacsu, 16-Sep-24 2:41)
Re: Character set (jschell, 17-Sep-24 12:29)
Re: Character set (Richard MacCutchan, 14-Sep-24 21:34)
Re: Character set (Mircea Neacsu, 15-Sep-24 2:45)
Re: Character set (Richard MacCutchan, 15-Sep-24 2:58)
Re: Character set (trønderen, 15-Sep-24 12:19)
Re: Character set (Mircea Neacsu, 17-Sep-24 13:49)
Re: Character set (jschell, 17-Sep-24 12:22)
Re: Character set (CPallini, 13-Sep-24 0:06)
Re: Character set (Calin Negru, 13-Sep-24 1:32)
Re: Character set (CPallini, 13-Sep-24 1:34)
Re: Character set (trønderen, 14-Sep-24 6:48)
Re: Character set (Calin Negru, 14-Sep-24 9:18)
Re: Character set (k5054, 14-Sep-24 10:15)
Re: Character set (markkuk, 14-Sep-24 11:15)
REMOVED (question; jana_hus, 7-Sep-24 7:12)
