Skip to content

Move numbering kinds to Codex (take 2)#145

Merged
laurmaedje merged 6 commits intotypst:mainfrom
MDLC01:numeral-systems
Mar 18, 2026
Merged

Move numbering kinds to Codex (take 2)#145
laurmaedje merged 6 commits intotypst:mainfrom
MDLC01:numeral-systems

Conversation

@MDLC01
Copy link
Collaborator

@MDLC01 MDLC01 commented Feb 18, 2026

This is a rewrite of #116.

In order to move forward with this, I suggest we only focus on the architecture for now; we can discuss the names and shorthands and add missing numeral systems in a later PR. I also think we should wait to decide precisely which numeral systems to include before adding more extensive tests. That way, we can simply test the individual numeral systems, which is what we care about. I added all the numeral systems from Typst with no modification other than removing the zero case when it felt appropriate. Adding new numeral systems can be left for future PR.

I would like some resource to link for Hiragana, Katakana, Korean, and Bengali letters (specifically their order).

The algorithms are the same as the ones we ended up with in #116. The main change is that I wanted to have a way to represent the fact that some numeral systems are simply unable to represent some values. In this case, we now return an error instead of falling back to Arabic numerals. Then, the client (i.e., Typst) can decide what to do (raise an error, or fallback to Arabic numerals).

Given that this plans to move some things out of typst/typst, I'll cc. @laurmaedje.

@MDLC01 MDLC01 added meta Discussion about the structure of this repo waiting on reviews Breaking and non-breaking changes need respectively 3 and 2 reviews labels Feb 18, 2026
@MDLC01 MDLC01 force-pushed the numeral-systems branch 6 times, most recently from e8d75d1 to 4b66286 Compare March 2, 2026 18:17
@MDLC01 MDLC01 force-pushed the numeral-systems branch from 4b66286 to 29ec220 Compare March 3, 2026 09:41
Copy link
Member

@laurmaedje laurmaedje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice and clean!

As an automated sanity check, it would be nice to validate once that the new algorithms give the exact same output as Typst's current ones for all numbers from 0..100_000 or something like that.

It would also be nice to have better test coverage. While the underlying systems are all tested in principle, we don't have tests for very large numbers and also not for all the named systems. So, it would be easy to introduce a regression when editing that code.

Since it's quite annoying to test those all manually, I could imagine to just have a hash-based safety net. Basically, after testing against Typst's currently output, we at least have reasonable confidence that it's not worse than before. We could then, for each system, compute a hash of how it formats the numbers from 0 to some big N and commit that hash into a test. (I'm kind of on a hash-based testing hype train... ^^)

PS: As you already noticed, I extracted and complemented the unrelated changes into #150.

@laurmaedje laurmaedje removed the waiting on reviews Breaking and non-breaking changes need respectively 3 and 2 reviews label Mar 13, 2026
@MDLC01
Copy link
Collaborator Author

MDLC01 commented Mar 16, 2026

I compared the results between Typst main and this PR for all integers between 0 and 100,000 for all numeral systems that have a shorthand (i.e., excluding two Chinese ones, which are not accessible yet and we don't implement ourselves anyway). The hashes match as expected. The only differences are zero for numeral systems that don't actually support it, and numbers that are too large to be represented by a fixed system (circled Arabic numerals), which now don't fallback to regular Arabic numerals anymore.

I'm not sure how to implement the hash-based testing infrastructure though. Should I use the siphasher crate (like Typst does for the hash128 function) and simply store the results in a reference file, or are you thinking of something more complex?

@laurmaedje
Copy link
Member

laurmaedje commented Mar 16, 2026

I was thinking of the most low tech solution possible. Basically just computing a single composite hash for all numbers from 0 to 10000 per named numbering system and hard coding the result into assert_eq. Doesn't even need to be in a separate file for me. The purpose is only to detect regressions after all.

And yeah, using siphasher would be good.

@MDLC01
Copy link
Collaborator Author

MDLC01 commented Mar 17, 2026

I added tests for numbers from 0 to 99,999. Going higher makes the test too slow to run because of NamedNumeralSystem::Symbols which allocates very long strings.

I also used a match expression to make sure we keep exhaustive tests in the future.

@laurmaedje laurmaedje added the waiting on reviews Breaking and non-breaking changes need respectively 3 and 2 reviews label Mar 17, 2026
@laurmaedje laurmaedje merged commit 0426b6a into typst:main Mar 18, 2026
2 checks passed
@laurmaedje laurmaedje removed the waiting on reviews Breaking and non-breaking changes need respectively 3 and 2 reviews label Mar 18, 2026
@laurmaedje
Copy link
Member

Thank you, great work!

@MDLC01 MDLC01 deleted the numeral-systems branch March 18, 2026 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

meta Discussion about the structure of this repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants