Bug Description
Xtream-Codes series search/matching fails to find a series whose name contains
non-ASCII Latin letters (e.g. Turkish "ğ", "ı", "ş", "ç") when the provider's
catalog stores the title differently (with or without the diacritics) than
TMDB does.
Root Cause
IptvRepository.kt defines:
val NON_ALPHA_NUM_REGEX = Regex("[^a-z0-9]+")

private fun normalizeLookupText(value: String): String {
    ...
        .lowercase(Locale.US)
        .replace(NON_ALPHA_NUM_REGEX, " ")
    ...
}
This regex only recognizes ASCII a-z0-9. Non-ASCII letters like Turkish "ğ"
are not transliterated — they're treated as punctuation and replaced with a
space, splitting the word in two:
"Doğu" → normalizeLookupText() → "do u"
If the provider's catalog stores the title in plain ASCII (e.g. "Dogu", common
for many IPTV backends), it normalizes to "dogu" — a single token — which no
longer matches "do u" either exactly or via the word-overlap fuzzy scoring
(scoreNameMatch / looseSeriesTitleScore, which both filter out words
shorter than 3 characters, dropping "do" and "u" entirely).
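The mismatch can be reproduced in isolation with a minimal sketch. It assumes only the two quoted steps matter; `normalizeLookupTextSketch` and `significantWords` are hypothetical stand-ins, not the actual code in IptvRepository.kt, and the trailing `trim()` is an assumption about the elided parts.

```kotlin
import java.util.Locale

// Same pattern as quoted from IptvRepository.kt.
val NON_ALPHA_NUM_REGEX = Regex("[^a-z0-9]+")

// Hypothetical stand-in: reproduces only the lowercase + replace steps shown
// in the report. trim() is an assumption about the elided code.
fun normalizeLookupTextSketch(value: String): String =
    value.lowercase(Locale.US)
        .replace(NON_ALPHA_NUM_REGEX, " ")
        .trim()

// Hypothetical helper mirroring the described "drop words shorter than
// 3 characters" filter used by the fuzzy scorers.
fun significantWords(normalized: String): Set<String> =
    normalized.split(" ").filter { it.length >= 3 }.toSet()
```

With the TMDB-style title written as `"Do\u011Fu"` (precomposed "ğ"), `normalizeLookupTextSketch` yields `"do u"`, and `significantWords` then returns an empty set. The ASCII spelling `"Dogu"` yields the single word `"dogu"`, so the two sides share no tokens.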
Comparison with the existing Jellyfin/Emby/Plex matcher
HomeServerRepository.kt's HomeServerMatcher.normalizeTitle() already
handles this correctly via Unicode NFD decomposition before stripping:
fun normalizeTitle(title: String): String {
    val ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
        .replace(DIACRITICS_REGEX, "")
    return ascii.lowercase(Locale.US)...
}
This correctly turns "Doğu" into "dogu" by stripping only the combining
diacritic mark, not the base letter.
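To illustrate why NFD makes the stripping safe, the following sketch lists the code points after decomposition and then removes the combining marks. The definition of `DIACRITICS_REGEX` is not shown in this report, so `COMBINING_MARKS_SKETCH` below (Unicode category Mn, non-spacing marks) is an assumption, and both helper names are hypothetical.

```kotlin
import java.text.Normalizer

// Assumption: stands in for DIACRITICS_REGEX, whose real definition is not
// quoted in the report. \p{Mn} matches Unicode non-spacing (combining) marks.
val COMBINING_MARKS_SKETCH = Regex("\\p{Mn}+")

// Decomposes to NFD, then drops the combining marks, keeping base letters.
fun stripDiacritics(text: String): String =
    Normalizer.normalize(text, Normalizer.Form.NFD)
        .replace(COMBINING_MARKS_SKETCH, "")

// Lists the UTF-16 code units of the NFD form, to show that a precomposed
// letter becomes "base letter + combining mark".
fun nfdCodeUnits(text: String): List<String> =
    Normalizer.normalize(text, Normalizer.Form.NFD)
        .map { "U+%04X".format(it.code) }
```

For "ğ" (U+011F), `nfdCodeUnits` returns `U+0067` ("g") followed by `U+0306` (combining breve), so `stripDiacritics("Do\u011Fu")` gives `"Dogu"`. Dotless "ı" (U+0131) has no canonical decomposition, so `stripDiacritics` returns it unchanged.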
Suggested Fix
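Apply the same NFD-normalize + diacritics-strip step in
IptvRepository.kt::normalizeLookupText() before the NON_ALPHA_NUM_REGEX
replacement, so it behaves consistently with HomeServerMatcher.normalizeTitle().
Note that NFD only helps for letters that decompose into a base letter plus a
combining mark ("ğ", "ş", "ç", "ü"). Letters without a canonical decomposition,
such as Turkish dotless "ı" (U+0131) or German "ß", would still be replaced
with a space unless they are mapped explicitly.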
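A minimal sketch of what the adjusted function could look like, under stated assumptions: `normalizeLookupTextFixed` is a hypothetical name, `COMBINING_MARKS_REGEX` stands in for the unquoted `DIACRITICS_REGEX`, the trailing `trim()` is a guess at the elided code, and the explicit "ı" mapping is an optional extra that is not part of HomeServerMatcher as quoted.

```kotlin
import java.text.Normalizer
import java.util.Locale

// Same pattern as quoted from IptvRepository.kt.
val NON_ALPHA_NUM_REGEX = Regex("[^a-z0-9]+")

// Assumption: stand-in for DIACRITICS_REGEX (real definition not shown).
val COMBINING_MARKS_REGEX = Regex("\\p{Mn}+")

// Hypothetical fixed variant: decompose and strip marks first, then apply
// the existing lowercase + non-alphanumeric replacement.
fun normalizeLookupTextFixed(value: String): String =
    Normalizer.normalize(value, Normalizer.Form.NFD)
        .replace(COMBINING_MARKS_REGEX, "")
        // Optional assumption: dotless i has no NFD decomposition, so it is
        // mapped by hand; drop this line to mirror HomeServerMatcher exactly.
        .replace('\u0131', 'i')
        .lowercase(Locale.US)
        .replace(NON_ALPHA_NUM_REGEX, " ")
        .trim()
```

With this ordering, `"Do\u011Fu"` and `"Dogu"` both normalize to `"dogu"`, so the exact comparison and the word-overlap scoring see the same token on both sides.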
Impact
Any Xtream-Codes series/VOD title containing non-ASCII Latin characters
(Turkish, German umlauts, etc.) is at risk of silently failing to match
against the user's own portal content, even though the content exists and is
correctly listed via get_series/get_series_info.