Issue with Malayalam and Kannada normalization; extra space is inserted before punctuation #2

I tried normalizing Malayalam and Kannada text, but the phone number in the output was not normalized.
Input:
{"text":"എനിക്ക് 29 വയസുണ്ട്, എന്റെ ഫോൺ നമ്പർ 9123456789 ആണ്."}
Output:
{
    "normalized_text": "എനിക്ക് ഇരുപത്തൊമ്പത് വയസുണ്ട് , എന്റെ ഫോൺ നമ്പർ 9123456789 ആണ് .",
    "detected_lang": "ml",
    "lang_name": "Malayalam",
    "processing_time": 0.05313992500305176
}

I am also collecting all the issues observed across languages:

Observed Behavior

1. Phone number normalization (critical)
Phone numbers are handled differently across languages:
Hindi (hi): Digit-wise normalization, but an extra leading digit is introduced
Tamil (ta): Digit-wise normalization, but non-Tamil zero lexeme (பூஜ்யம்) used
Telugu (te): Phone number treated as a cardinal quantity, expanded into crores/lakhs
Kannada (kn): Phone number not normalized at all
Malayalam (ml): Phone number not normalized at all
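
For reference, here is a minimal sketch of the behavior I would expect to be consistent across languages: read a phone number digit by digit instead of as a cardinal. It is not taken from the project's code. The function name `expand_phone_numbers`, the rule "a standalone run of exactly 10 ASCII digits is a phone number", and the `ML_DIGITS` table are all my assumptions, and the Malayalam digit words should be checked by a native speaker.

```python
import re
from typing import Sequence

# Assumption: a standalone run of exactly 10 ASCII digits is a phone number.
# Shorter numbers (such as the age "29") are left for cardinal normalization.
_PHONE = re.compile(r"(?<![0-9])[0-9]{10}(?![0-9])")


def expand_phone_numbers(text: str, digit_words: Sequence[str]) -> str:
    """Replace each 10-digit run with its digits spelled out one by one.

    digit_words[i] must be the word for digit i in the target language.
    """
    if len(digit_words) != 10:
        raise ValueError("digit_words must contain exactly 10 entries (0-9)")

    def _spell(match: "re.Match[str]") -> str:
        # Map every digit to its word; nothing is added or dropped.
        return " ".join(digit_words[int(d)] for d in match.group(0))

    return _PHONE.sub(_spell, text)


# Illustrative Malayalam digit words for 0-9 (assumption, please verify).
ML_DIGITS = [
    "പൂജ്യം", "ഒന്ന്", "രണ്ട്", "മൂന്ന്", "നാല്",
    "അഞ്ച്", "ആറ്", "ഏഴ്", "എട്ട്", "ഒമ്പത്",
]

print(expand_phone_numbers("എന്റെ ഫോൺ നമ്പർ 9123456789 ആണ്.", ML_DIGITS))
```

Because each digit maps to exactly one word, this approach cannot introduce an extra leading digit (the Hindi case) or expand into crores/lakhs (the Telugu case).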


2. Punctuation normalization (global bug)
Across all tested languages, an extra space is inserted before punctuation:
Examples:
है ।
ஆகும் .
ಇದೆ .
ആണ് .
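
Until this is fixed in the normalizer, a post-processing step along these lines could work around it. This is only a sketch: the helper name `strip_space_before_punct` and the set of punctuation marks are my assumptions, not part of the library.

```python
import re

# Punctuation marks to re-attach to the preceding word. The set is an
# assumption: ASCII marks plus the Devanagari danda and double danda.
_PUNCT = ".,!?;:\u0964\u0965"

# One or more spaces/tabs directly before one of the marks above.
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([" + re.escape(_PUNCT) + r"])")


def strip_space_before_punct(text: str) -> str:
    """Remove whitespace that was inserted before a punctuation mark."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


print(strip_space_before_punct("ಇದೆ ."))
print(strip_space_before_punct("है ।"))
```

This only hides the symptom. Note that it would also remove a space before the danda in input that had one on purpose, so fixing the tokenizer/detokenizer step in the normalizer itself would be the better solution.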
