Many phones use surrogates to encode higher-plane Unicode characters, and these get passed through Gammu and python-gammu into the Python unicode string. The problem is that Python does not allow surrogates when encoding, so doing anything with such a string ends up in:
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 10316: surrogates not allowed
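The failure can be reproduced without a phone. This is a minimal sketch, assuming only a Python 3 string that holds an unpaired high surrogate like the '\ud800' in the traceback; the sample text is made up:

```python
# A str may hold a lone surrogate, but strict UTF-8 encoding rejects it.
text = "abc\ud800def"  # made-up sample with an unpaired high surrogate

try:
    text.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error.reason)  # surrogates not allowed
```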
There has to be some bug in the surrogate conversion code:

    /* Convert string without zero at the end. */
    *out_len = 0;
    for (i = 0; i < len; i++) {
        value = (src[2 * i] << 8) + src[(2 * i) + 1];
        if (value >= 0xD800 && value <= 0xDBFF) {
            second = src[(i + 1) * 2] * 256 + src[(i + 1) * 2 + 1];
            if (second >= 0xDC00 && second <= 0xDFFF) {
                value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x010000;
                i++;
            } else if (second == 0) {
                /* Surrogate at the end of string */
                value = 0xFFFD; /* REPLACEMENT CHARACTER */
            }
        }
        dest[(*out_len)++] = value;
    }

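Reading the loop, two inputs appear to leave a surrogate in the output: a high surrogate followed by a unit that is neither a low surrogate nor zero, and a low surrogate with no high surrogate before it. The following is a rough Python model of a stricter loop, not the python-gammu code. `decode_utf16_units` is a hypothetical name, and it takes a list of 16-bit units instead of the big-endian byte buffer used in the C code:

```python
REPLACEMENT = 0xFFFD  # REPLACEMENT CHARACTER


def decode_utf16_units(units):
    """Turn 16-bit code units into a str, never emitting a surrogate."""
    out = []
    i = 0
    while i < len(units):
        value = units[i]
        if 0xD800 <= value <= 0xDBFF:
            # High surrogate: only valid when a low surrogate follows.
            second = units[i + 1] if i + 1 < len(units) else None
            if second is not None and 0xDC00 <= second <= 0xDFFF:
                value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x10000
                i += 1  # the low surrogate was consumed as well
            else:
                value = REPLACEMENT
        elif 0xDC00 <= value <= 0xDFFF:
            # Low surrogate with no high surrogate before it.
            value = REPLACEMENT
        out.append(value)
        i += 1
    return "".join(map(chr, out))
```

In this model every unpaired surrogate becomes U+FFFD, so the result can always be encoded as UTF-8.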
Or there is some other way this can slip through. I've seen this in Text as returned by DecodePDU.
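Until the conversion is fixed, the string can be cleaned up on the Python side. This is only a sketch, assuming the value is an ordinary Python 3 str; `strip_lone_surrogates` is a hypothetical helper and not part of python-gammu:

```python
def strip_lone_surrogates(text):
    """Join valid surrogate pairs and replace unpaired ones with U+FFFD."""
    # "surrogatepass" lets surrogates through into the UTF-16 bytes; decoding
    # with "replace" then joins valid pairs and substitutes U+FFFD for the rest.
    raw = text.encode("utf-16-le", "surrogatepass")
    return raw.decode("utf-16-le", "replace")
```

The cleaned string can then be encoded as UTF-8 without raising, at the cost of losing whatever character the broken surrogate was meant to represent.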
The C conversion code quoted above is from python-gammu/gammu/src/convertors/string.c, lines 121 to 136 in 86a497c.