Characters in C++: char, wchar_t...

How to Write UTF-8 String Literals in C++

(From http://utf8everywhere.org/)

If you internationalize your software then all non-ASCII strings will be loaded from an external translation database, so it is not a problem.

If you still want to embed a special character directly in the source, C++11 lets you write it with the u8 prefix:

u8"∃y ∀x ¬(x ≺ y)"
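
One detail worth knowing about the u8 prefix is that the element type of the literal changed in C++20: it is char up to C++17 and char8_t from C++20 on. The sketch below checks this at compile time. The alias name u8_element is made up for this illustration, and the block assumes a C++14 or later compiler that defines the standard feature-test macro __cpp_char8_t when char8_t is enabled.

```cpp
#include <type_traits>

// Hypothetical alias: the element type of a u8 string literal,
// with the reference and const qualifier stripped off.
using u8_element =
    std::remove_cv_t<std::remove_reference_t<decltype(u8"x"[0])>>;

#if defined(__cpp_char8_t)
// C++20 and later (or char8_t explicitly enabled): u8"..." is an array of char8_t.
static_assert(std::is_same<u8_element, char8_t>::value,
              "u8 literal elements are char8_t");
#else
// C++11 to C++17: u8"..." is an array of plain char.
static_assert(std::is_same<u8_element, char>::value,
              "u8 literal elements are char");
#endif
```

In practice this means that code such as const char* s = u8"..."; compiles up to C++17 but needs adjusting under C++20.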

With compilers that do not support ‘u8’ you can hard-code the UTF-8 code units as follows:

"\xE2\x88\x83y \xE2\x88\x80x \xC2\xAC(x \xE2\x89\xBA y)"

However, the most straightforward way is to write the string as-is and save the source file encoded in UTF-8 (some compilers must be told the source encoding, for example MSVC with its /utf-8 option):

"∃y ∀x ¬(x ≺ y)"

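Whichever spelling is used, the resulting char string holds UTF-8 code units, so its length in bytes differs from its length in characters. The following is a minimal sketch, assuming a C++14 or later compiler; the helper names count_bytes and count_code_points are invented for this illustration. It uses the hex-escaped form of the literal so that the result does not depend on how the source file is saved.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes (UTF-8 code units) before the terminator.
constexpr std::size_t count_bytes(const char* s)
{
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Hypothetical helper: number of Unicode code points in a UTF-8 string.
// Every byte that is not a continuation byte (bit pattern 10xxxxxx)
// starts a new code point. The input is assumed to be valid UTF-8.
constexpr std::size_t count_code_points(const char* s)
{
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// The same formula as in the text, written with hard-coded UTF-8 code units.
constexpr char formula[] =
    "\xE2\x88\x83y \xE2\x88\x80x \xC2\xAC(x \xE2\x89\xBA y)";

static_assert(count_bytes(formula) == 21, "21 UTF-8 code units");
static_assert(count_code_points(formula) == 14, "14 code points");
```

The three-byte symbols and the two-byte negation sign account for the difference between the 21 bytes and the 14 characters.
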
Wide Characters with wchar_t

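A wide char is similar to the char data type, except that it usually takes up more space and can therefore represent a far wider range of values. A char is typically 8 bits and can take one of 256 values, of which only the first 128 correspond to entries in the ASCII table. The size of wchar_t is implementation-defined: it is 16 bits (65,536 values) on Windows and 32 bits on most Linux and macOS systems. It is intended to hold Unicode characters; Unicode is an international standard that encodes characters for virtually all languages and commonly used symbols. Because Unicode defines code points up to U+10FFFF, a 16-bit wchar_t cannot hold every character in a single unit, and characters beyond U+FFFF are stored as two units (a UTF-16 surrogate pair).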

Below is a simple C++ program to show how wchar_t is used:

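// An example in C++ demonstrating wchar_t
#include <iostream>
using namespace std;

int main()
{
    wchar_t w = L'A';
    // Wide characters go to wcout; cout would print the numeric value
    // (and no longer compiles at all in C++20).
    wcout << L"Wide character value: " << w << endl;
    // The size is implementation-defined: typically 2 on Windows, 4 on Linux and macOS.
    wcout << L"Size of the wide char is: " << sizeof(w) << endl;
    return 0;
}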
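
How many wchar_t units a character needs depends on the platform. The sketch below, which assumes the two common cases of a 16-bit or a 32-bit wchar_t, measures a wide literal containing U+1F600, a character outside the 16-bit range. The names emoji and emoji_units are made up for this illustration.

```cpp
#include <cstddef>

// U+1F600 lies above U+FFFF, so it does not fit in 16 bits.
constexpr wchar_t emoji[] = L"\U0001F600";

// Number of wchar_t units in the literal, excluding the terminating L'\0'.
constexpr std::size_t emoji_units = sizeof(emoji) / sizeof(wchar_t) - 1;

// With a 16-bit wchar_t (for example on Windows) the character is stored as
// a surrogate pair, so two units; with a 32-bit wchar_t one unit is enough.
static_assert(emoji_units == (sizeof(wchar_t) == 2 ? 2 : 1),
              "unit count follows the width of wchar_t");
```

Code that treats one wchar_t as one character is therefore only correct for part of the Unicode range on platforms with a 16-bit wchar_t.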

When Should We Use wchar_t?

The site utf8everywhere.org recommends converting wide strings to char strings (on Windows, UTF-16 to UTF-8) as soon as you receive them from a library, and converting back only when you need to pass strings to it. Therefore, use char everywhere except at the boundary of an API that requires you to pass or receive wchar_t. The rationale is that UTF-8, stored in ordinary C/C++ char strings, is good enough for all but a select few use cases.
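
To make the conversion step concrete, here is a hand-written sketch of a UTF-16 to UTF-8 conversion. It is not the code that utf8everywhere.org proposes, and on Windows the usual route is the WideCharToMultiByte API. The helper names append_utf8 and utf16_to_utf8 are invented for this illustration, std::u16string stands in for a wide string so that the code behaves the same on every platform, and replacing unpaired surrogates with U+FFFD is a choice made here rather than a requirement.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends one code point to `out` as 1 to 4 UTF-8 bytes.
inline void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a UTF-16 string to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
inline std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;                           // missing low surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // stray low surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

A program following the recommendation would call a function like this once, where the wide string enters, and keep a std::string from then on.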