- Support encoding and decoding four-byte UTF-8 sequences
- E_unicode now supports surrogate pairs and has been renamed to E_utf16be for clarity
- char32_t should be used for storing a Unicode code point
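For reference, the encodings involved can be illustrated with nothing but the Python 3 standard library. This is only a sketch of what a four-byte UTF-8 sequence and a UTF-16BE surrogate pair look like for one code point outside the Basic Multilingual Plane; it does not use or reproduce the library's own encoder, and the variable names are chosen purely for illustration.

```python
# U+1F600 lies outside the Basic Multilingual Plane, so it does not fit
# in a single 16-bit unit; a 32-bit type such as char32_t can hold it.
ch = "\U0001F600"
code_point = ord(ch)

# UTF-8 needs four bytes for this code point.
utf8 = ch.encode("utf-8")

# UTF-16BE needs a surrogate pair (two 16-bit units, big-endian).
utf16be = ch.encode("utf-16-be")

# The same surrogate pair computed by hand.
offset = code_point - 0x10000
high = 0xD800 + (offset >> 10)    # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate

print(utf8.hex(), utf16be.hex(), hex(high), hex(low))
```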
In Python 3, the no-argument get_text() and set_text() now return and accept Unicode strings, while passing in an encoding makes them return and accept bytes objects instead.
In Python 2, all of these methods take regular strings, but the no-argument get_text() and set_text() also accept Unicode.
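The Python 3 behaviour described above can be sketched roughly as follows. TextHolder is a hypothetical stand-in written for this note, not the library's class, and passing the encoding as a Python codec name in the second position is an assumption made for brevity; the real interface may take a different argument type or order.

```python
class TextHolder:
    """Hypothetical stand-in that mimics the described str/bytes split."""

    def __init__(self):
        self._text = ""  # stored internally as a Unicode string

    def set_text(self, text, encoding=None):
        if encoding is None:
            # No encoding given: a Unicode string is expected.
            if not isinstance(text, str):
                raise TypeError("expected str when no encoding is given")
            self._text = text
        else:
            # Encoding given: a bytes object is expected and decoded.
            self._text = bytes(text).decode(encoding)

    def get_text(self, encoding=None):
        if encoding is None:
            return self._text               # Unicode string
        return self._text.encode(encoding)  # bytes object


holder = TextHolder()
holder.set_text("h\u00e9llo")
print(type(holder.get_text()).__name__)         # str
print(type(holder.get_text("utf-8")).__name__)  # bytes
```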
In the future, we will probably remove most of this interface for Python users, for whom it is unnecessary because it duplicates functionality already in the standard library.