Can create invalid unicode strings

Many phones use surrogate pairs to encode higher-plane Unicode characters, and these get passed through Gammu and python-gammu into Python Unicode strings. The problem is that surrogate code points are not allowed in valid Unicode text, so doing something with such a string (for example, encoding it to UTF-8) ends up in:

```
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 10316: surrogates not allowed
```
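For background: a Python `str` can hold a lone surrogate code point, but UTF-8 may only encode Unicode scalar values (U+0000..U+10FFFF excluding U+D800..U+DFFF), so the `'utf-8'` codec rejects it. Below is a minimal C sketch of that validity check, which could be run over a converted buffer before it is handed to Python. `is_unicode_scalar` and `find_invalid_code_point` are hypothetical helpers for illustration, not part of Gammu or python-gammu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A Unicode scalar value is any code point in U+0000..U+10FFFF except the
 * surrogate range U+D800..U+DFFF. Only scalar values can be encoded as UTF-8. */
static bool is_unicode_scalar(uint32_t cp)
{
	return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Hypothetical helper: return the index of the first code point in buf that
 * is not a scalar value, or len if every code point is valid. */
static size_t find_invalid_code_point(const uint32_t *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		if (!is_unicode_scalar(buf[i]))
			return i;
	}
	return len;
}
```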

There has to be some bug in the surrogate conversion code:

https://github.com/gammu/python-gammu/blob/86a497c623b139df3819ed22d2763ff5aec76578/gammu/src/convertors/string.c#L121-L136

Or there is some other way this can slip through. I've seen this in `Text` as returned by `DecodePDU`.

```c
/* Convert string without zero at the end. */
*out_len = 0;
for (i = 0; i < len; i++) {
	value = (src[2 * i] << 8) + src[(2 * i) + 1];
	if (value >= 0xD800 && value <= 0xDBFF) {
		second = src[(i + 1) * 2] * 256 + src[(i + 1) * 2 + 1];
		if (second >= 0xDC00 && second <= 0xDFFF) {
			value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x010000;
			i++;
		} else if (second == 0) {
			/* Surrogate at the end of string */
			value = 0xFFFD; /* REPLACEMENT CHARACTER */
		}
	}
	dest[(*out_len)++] = value;
}
```
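Reading the quoted loop, two cases appear to let surrogates through unchanged: a high surrogate (U+D800..U+DBFF) followed by anything other than a low surrogate or a zero unit keeps its original value, and a stray low surrogate (U+DC00..U+DFFF) never matches the first `if` at all. The first case would produce exactly the `'\ud800'` from the error message. The lookahead also reads `src[2 * len]` when a high surrogate is the last unit, which presumably relies on a terminating zero. The following is a hedged sketch of a stricter big-endian UTF-16 decoder that maps every unpaired surrogate to U+FFFD and bounds the lookahead by `len`; the name `utf16be_to_utf32` and its signature are assumptions for illustration, not the actual python-gammu code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Hypothetical sketch: decode len big-endian UTF-16 code units from src into
 * code points in dest (room for len entries). Every unpaired surrogate
 * becomes U+FFFD, and the lookahead never goes past len. */
static void utf16be_to_utf32(const unsigned char *src, size_t len,
			     uint32_t *dest, size_t *out_len)
{
	assert(out_len != NULL);
	*out_len = 0;
	for (size_t i = 0; i < len; i++) {
		uint32_t value = ((uint32_t)src[2 * i] << 8) | src[2 * i + 1];

		if (value >= 0xD800 && value <= 0xDBFF) {
			/* High surrogate: must be followed by a low surrogate. */
			uint32_t second = 0;
			if (i + 1 < len)
				second = ((uint32_t)src[2 * (i + 1)] << 8) | src[2 * (i + 1) + 1];
			if (second >= 0xDC00 && second <= 0xDFFF) {
				value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x10000;
				i++; /* consume the low surrogate */
			} else {
				value = REPLACEMENT_CHARACTER; /* unpaired high surrogate */
			}
		} else if (value >= 0xDC00 && value <= 0xDFFF) {
			value = REPLACEMENT_CHARACTER; /* stray low surrogate */
		}
		dest[(*out_len)++] = value;
	}
}
```

Mapping to U+FFFD mirrors what the existing code already does for a surrogate at the end of the string; raising an error instead would be the stricter alternative.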

Can create invalid unicode strings #37

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Can create invalid unicode strings #37

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions