Can create invalid unicode strings #37

@nijel

Many phones use surrogates to encode higher-plane Unicode characters, and these get passed through Gammu and python-gammu into the Python Unicode string. The problem is that surrogate code points cannot be encoded there, so doing almost anything with such a string ends in:

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 10316: surrogates not allowed

There has to be some bug in the surrogate conversion code:

/* Convert string without zero at the end. */
*out_len = 0;
for (i = 0; i < len; i++) {
    value = (src[2 * i] << 8) + src[(2 * i) + 1];
    if (value >= 0xD800 && value <= 0xDBFF) {
        second = src[(i + 1) * 2] * 256 + src[(i + 1) * 2 + 1];
        if (second >= 0xDC00 && second <= 0xDFFF) {
            value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x010000;
            i++;
        } else if (second == 0) {
            /* Surrogate at the end of string */
            value = 0xFFFD; /* REPLACEMENT CHARACTER */
        }
    }
    dest[(*out_len)++] = value;
}

Or there is some other way this can slip through. I've seen this in the Text value returned by DecodePDU.
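
Reading the quoted loop, two cases appear to leave a surrogate in the output: a high surrogate followed by a unit that is neither a low surrogate nor zero is stored unchanged, and a low surrogate with no preceding high surrogate is never checked at all. On the last iteration the loop also reads the unit at index `len`, presumably relying on a terminator being there. The following is only a sketch of a stricter loop, assuming the same big-endian two-byte input layout; the function name `decode_utf16be_strict`, its signature, and the `REPLACEMENT_CHAR` macro are hypothetical and not part of Gammu.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical helper, not Gammu API: decode `len` big-endian UTF-16
 * code units from `src` into code points in `dest`.
 * `dest` must have room for at least `len` entries.
 * Returns the number of code points written. */
size_t decode_utf16be_strict(const unsigned char *src, size_t len, uint32_t *dest)
{
    size_t i, out_len = 0;

    for (i = 0; i < len; i++) {
        uint32_t value = ((uint32_t)src[2 * i] << 8) | src[2 * i + 1];

        if (value >= 0xD800 && value <= 0xDBFF) {
            /* High surrogate: valid only if a low surrogate follows
             * inside the buffer. */
            uint32_t second = 0;

            if (i + 1 < len) {
                second = ((uint32_t)src[2 * (i + 1)] << 8) | src[2 * (i + 1) + 1];
            }
            if (second >= 0xDC00 && second <= 0xDFFF) {
                value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x10000;
                i++; /* consume the low surrogate as well */
            } else {
                /* Unpaired high surrogate; the following unit is
                 * left for the next iteration. */
                value = REPLACEMENT_CHAR;
            }
        } else if (value >= 0xDC00 && value <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            value = REPLACEMENT_CHAR;
        }
        dest[out_len++] = value;
    }
    return out_len;
}
```

With this variant, the input units `0x0041 0xD800 0x0042` decode to `U+0041 U+FFFD U+0042` rather than passing `0xD800` through, so no surrogate code point would reach the Python string from this path. Whether the replacement character is the right policy here (as opposed to dropping the unit or reporting an error) is a separate design choice.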
