[EN Currency] ISO codes not recognised as currency prefixes; crore/lakh multipliers missing from currency context #3211

@nikitabuxy

Describe the bug
When an ISO currency code is directly concatenated to a number (no space between
them), the currency model returns a wrong numeric value with high confidence and
no review flag. This is silent data corruption — the caller has no indication
the returned value is incorrect.

To Reproduce

Install: pip install recognizers-number-with-unit

from recognizers_text import Culture
from recognizers_number_with_unit import NumberWithUnitRecognizer

model = NumberWithUnitRecognizer(Culture.English).get_currency_model()

print(model.parse("USD34.6 million"))
# Returns: [{'value': '6000000', 'unit': None}]

print(model.parse("VND4,927 billion"))
# Returns: [{'value': '927000000000', 'unit': None}]

print(model.parse("USD0.92 Million"))
# Returns: [{'value': '92000000', 'unit': None}]

**Expected behavior**

model.parse("USD34.6 million")
# Expected: [{'value': '34600000', 'unit': 'United States dollar'}]

model.parse("VND4,927 billion")
# Expected: [{'value': '4927000000000', 'unit': 'Vietnamese dong'}]

model.parse("USD0.92 Million")
# Expected: [{'value': '920000', 'unit': 'United States dollar'}]
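The expected values above can be turned into a small regression check. The following is a minimal sketch, not part of the library: `CASES` and `find_mismatches` are hypothetical names, and `parse_value` stands for any caller-supplied function that wraps the currency model and returns the resolved value as a string (how it does so is left open here).

```python
# Expected values copied from this issue.
CASES = {
    "USD34.6 million": "34600000",
    "VND4,927 billion": "4927000000000",
    "USD0.92 Million": "920000",
}

def find_mismatches(parse_value, cases=CASES):
    """Return {text: (actual, expected)} for every case whose value differs.

    parse_value is any callable mapping input text to a value string (or None).
    It is an assumption of this sketch, not an API of the library.
    """
    mismatches = {}
    for text, expected in cases.items():
        actual = parse_value(text)
        if actual != expected:
            mismatches[text] = (actual, expected)
    return mismatches
```

An empty result means every case resolved to the expected value. With the behaviour reported in this issue, all three inputs would be listed.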


**Sample input/output**

┌──────────────────┬──────────────┬────────────────┐
│ Input            │ Actual value │ Expected value │
├──────────────────┼──────────────┼────────────────┤
│ USD34.6 million  │ 6000000      │ 34600000       │
├──────────────────┼──────────────┼────────────────┤
│ USD4.1 Million   │ 1000000      │ 4100000        │
├──────────────────┼──────────────┼────────────────┤
│ USD0.92 Million  │ 92000000     │ 920000         │
├──────────────────┼──────────────┼────────────────┤
│ AUD1.2 million   │ 2000000      │ 1200000        │
├──────────────────┼──────────────┼────────────────┤
│ VND4,927 billion │ 927000000000 │ 4927000000000  │
└──────────────────┴──────────────┴────────────────┘
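
The expected column follows from plain arithmetic: strip the three-letter prefix, drop the thousands separators, and multiply by the scale word. Below is a minimal reference sketch that is independent of the library and could serve as an oracle in a test. `expected_value`, `SCALE`, and `PATTERN` are hypothetical names, and the pattern only covers the input shapes listed in this issue.

```python
import re
from decimal import Decimal

# Scale words seen in this issue; this is not the library's multiplier table.
SCALE = {"million": Decimal(10) ** 6, "billion": Decimal(10) ** 9}

# Three-letter code, a number, then a scale word, e.g. "USD34.6 million".
PATTERN = re.compile(
    r"^([A-Za-z]{3})\s*([\d,]*\.?\d+)\s+(million|billion)$", re.IGNORECASE
)

def expected_value(text):
    """Return (code, value string) for inputs shaped like the table rows."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported input shape: {text!r}")
    code, number, scale = match.groups()
    amount = Decimal(number.replace(",", "")) * SCALE[scale.lower()]
    # Every row in the table is integral, so round to a whole number
    # and render it without a decimal point.
    return code.upper(), str(amount.quantize(Decimal(1)))
```

Using `Decimal` rather than `float` keeps values such as 34.6 million exact, so the string comparison against the table is reliable.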

Root cause: QueryProcessor.preprocess() lowercases the query before
extraction. The internal EnglishNumberExtractor (Unit mode) then
misreads the lowercased form. In usd34.6 million, the "34." is
treated as a sentence-ending period, so only 6 million is extracted.
Similarly, in vnd4,927 billion, the "4," is absorbed into the
non-numeric prefix, leaving only 927 billion. Both cases return a
high-confidence result with no flag.
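
Until the extractor handles this shape, a caller can at least detect the risky inputs before trusting the result. The sketch below is a hypothetical caller-side guard, not part of the library: `has_glued_iso_prefix` flags text in which three uppercase letters are immediately followed by a digit, so the parsed value can be routed to review. It assumes any three uppercase letters may be a currency code and does not validate against ISO 4217.

```python
import re

# Three uppercase letters immediately followed by a digit, e.g. "USD34.6".
# Assumption: no ISO 4217 lookup, so "ABC12" is flagged as well.
GLUED_CODE = re.compile(r"(?<![A-Za-z])[A-Z]{3}(?=\d)")

def has_glued_iso_prefix(text):
    """Return True when the text contains a code-plus-number token with no space."""
    return GLUED_CODE.search(text) is not None
```

The check has to run on the original text, because the uppercase code is no longer distinguishable once the query has been lowercased.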


**Platform (please complete the following information):**
- Platform: Python
- Environment: pip package (recognizers-number-with-unit)
- Version: 1.0.1
- Culture: English


**Additional context**
The spaced variant (USD 34.6 million) also currently returns no result
because ISO codes are absent from CurrencyPrefixList. That is tracked
as a separate enhancement request.
