**Describe the bug**
When an ISO currency code is directly concatenated to a number (no space between
them), the currency model returns a wrong numeric value with high confidence and
no review flag. This is silent data corruption — the caller has no indication
the returned value is incorrect.
**To Reproduce**
Install: pip install recognizers-number-with-unit
from recognizers_text import Culture
from recognizers_number_with_unit import NumberWithUnitRecognizer
model = NumberWithUnitRecognizer(Culture.English).get_currency_model()
print(model.parse("USD34.6 million"))
# Returns: [{'value': '6000000', 'unit': None}]
print(model.parse("VND4,927 billion"))
# Returns: [{'value': '927000000000', 'unit': None}]
print(model.parse("USD0.92 Million"))
# Returns: [{'value': '92000000', 'unit': None}]
**Expected behavior**
model.parse("USD34.6 million")
# Expected: [{'value': '34600000', 'unit': 'United States dollar'}]
model.parse("VND4,927 billion")
# Expected: [{'value': '4927000000000', 'unit': 'Vietnamese dong'}]
model.parse("USD0.92 Million")
# Expected: [{'value': '920000', 'unit': 'United States dollar'}]
**Sample input/output**
┌──────────────────┬──────────────┬────────────────┐
│ Input │ Actual value │ Expected value │
├──────────────────┼──────────────┼────────────────┤
│ USD34.6 million │ 6000000 │ 34600000 │
├──────────────────┼──────────────┼────────────────┤
│ USD4.1 Million │ 1000000 │ 4100000 │
├──────────────────┼──────────────┼────────────────┤
│ USD0.92 Million │ 92000000 │ 920000 │
├──────────────────┼──────────────┼────────────────┤
│ AUD1.2 million │ 2000000 │ 1200000 │
├──────────────────┼──────────────┼────────────────┤
│ VND4,927 billion │ 927000000000 │ 4927000000000 │
└──────────────────┴──────────────┴────────────────┘
Root cause: QueryProcessor.preprocess() lowercases the query before
extraction. The internal EnglishNumberExtractor (Unit mode) then
misreads the lowercased form. In usd34.6 million, the decimal point
after 34 is treated as a sentence-ending period, so only 6 million is
extracted. Similarly, in vnd4,927 billion, the leading 4, is absorbed
into the non-numeric prefix, leaving only 927 billion. Both cases
return a high-confidence result with no flag.
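The truncation described above can be modelled in a few lines of plain Python. This is only a sketch of the hypothesis, not the library's code: `simulate_truncation`, `PATTERN`, and `SCALE` are hypothetical names, and the regex only covers the input shapes listed in this report. It keeps the digits after the first `.` or `,` and applies the scale word, which is enough to reproduce every value in the "Actual value" column.

```python
import re
from decimal import Decimal

# Scale words that appear in the examples of this report.
SCALE = {"million": Decimal(10) ** 6, "billion": Decimal(10) ** 9}

# Hypothetical pattern: 3-letter code glued to a number, then a scale word.
PATTERN = re.compile(r"^[a-z]{3}(\d[\d.,]*)\s+(million|billion)$")


def simulate_truncation(text: str) -> str:
    """Model the reported behaviour: only the digits after the first
    '.' or ',' survive, then the scale word is applied."""
    match = PATTERN.match(text.lower())  # lowercasing mirrors the preprocess step
    if match is None:
        raise ValueError(f"unsupported input: {text!r}")
    number, scale = match.groups()
    # Drop everything up to and including the first separator.
    tail = re.split(r"[.,]", number, maxsplit=1)[-1]
    return str(int(Decimal(tail) * SCALE[scale]))


if __name__ == "__main__":
    for sample in ("USD34.6 million", "USD0.92 Million", "VND4,927 billion"):
        print(sample, "->", simulate_truncation(sample))
```

If the model's output ever stops matching this simulation, the root-cause description probably needs revisiting.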
**Platform (please complete the following information):**
- Platform: Python
- Environment: pip package (recognizers-number-with-unit)
- Version: 1.0.1
- Culture: English
**Additional context**
The spaced variant (USD 34.6 million) currently returns no result at all
because ISO codes are absent from CurrencyPrefixList. That is tracked
as a separate enhancement request.
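Until this is fixed, a caller could cross-check the model's value against an independent parse of the same text. The sketch below is a possible caller-side guard, not part of the library: `independent_value` and `needs_review` are hypothetical helpers, the regex covers only the concatenated shapes shown in this report, and a comma is assumed to be a thousands separator (English culture).

```python
import re
from decimal import Decimal
from typing import Optional

SCALE = {"million": Decimal(10) ** 6, "billion": Decimal(10) ** 9}

# Hypothetical pattern: ISO-like 3-letter code glued to a number, then a scale word.
GLUED = re.compile(
    r"^[A-Za-z]{3}(\d[\d,]*(?:\.\d+)?)\s+(million|billion)$", re.IGNORECASE
)


def independent_value(text: str) -> Optional[Decimal]:
    """Parse the amount without the recognizer; None if the shape is not covered."""
    match = GLUED.match(text.strip())
    if match is None:
        return None
    number, scale = match.groups()
    # Assumption: ',' is a thousands separator, '.' is the decimal point.
    return Decimal(number.replace(",", "")) * SCALE[scale.lower()]


def needs_review(text: str, model_value: Optional[str]) -> bool:
    """True when the model's value disagrees with the independent parse."""
    expected = independent_value(text)
    if expected is None:
        return False  # shape not covered by this guard; nothing to compare
    if model_value is None:
        return True
    return Decimal(model_value) != expected
```

For example, `needs_review("USD34.6 million", "6000000")` flags the value shown under To Reproduce, while the expected `"34600000"` passes.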