This repository was archived by the owner on May 19, 2026. It is now read-only.
SubwordTokenizer with WordPieceVocabulary #116
In cuDF 24.06, SubwordTokenizer will be deprecated in favor of WordPieceVocabulary. We should update https://github.com/rapidsai/crossfit/blob/main/crossfit/op/tokenize.py accordingly.
Relevant PR: rapidsai/cudf#18334