This repository was archived by the owner on May 19, 2026. It is now read-only.

Replace SubwordTokenizer with WordPieceVocabulary #116

@sarahyurick

Description

In cuDF 25.06, SubwordTokenizer will be deprecated in favor of WordPieceVocabulary. We should update https://github.com/rapidsai/crossfit/blob/main/crossfit/op/tokenize.py to use WordPieceVocabulary instead.

Relevant PR: rapidsai/cudf#18334
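
For anyone picking this up, here is a rough pure-Python sketch of the greedy longest-match-first scheme that BERT-style WordPiece tokenizers use, which may help when checking that token IDs stay the same after the switch. This is only an illustration and not cuDF's API: the function names (`wordpiece_tokenize`, `encode`), the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions made for the example.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into WordPiece tokens using greedy longest-match-first."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry the continuation prefix.
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece matched: the whole word maps to the unknown token.
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


def encode(text, vocab, unk_token="[UNK]"):
    """Lowercase, split on whitespace, and map each WordPiece token to its ID."""
    ids = []
    for word in text.lower().split():
        for token in wordpiece_tokenize(word, vocab, unk_token):
            ids.append(vocab[token])
    return ids


# Hypothetical toy vocabulary: token -> integer ID.
VOCAB = {
    token: index
    for index, token in enumerate(
        ["[UNK]", "token", "##ize", "##r", "sub", "##word"]
    )
}
```

A CPU reference like this could be compared against the GPU output on a handful of strings before and after the change in tokenize.py, assuming the same vocabulary file is used for both.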
