
Replace SubwordTokenizer with WordPieceVocabulary #116

@sarahyurick

Description

In cuDF 24.06, SubwordTokenizer will be deprecated in favor of WordPieceVocabulary. We should update crossfit/op/tokenize.py (https://github.com/rapidsai/crossfit/blob/main/crossfit/op/tokenize.py) to use WordPieceVocabulary instead of SubwordTokenizer.

Relevant PR: rapidsai/cudf#18334
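For context, WordPiece tokenization (the scheme behind BERT-style vocabularies) splits each word by greedy longest-match-first lookup against a fixed vocabulary, marking word-internal pieces with a "##" prefix. The pure-Python sketch below illustrates only that algorithm. It does not use cuDF, it is not the WordPieceVocabulary API or its GPU implementation, and it skips the punctuation splitting, accent stripping, and length limits a full BERT tokenizer applies. The function names and the toy vocabulary are hypothetical.

```python
# Hypothetical illustration of greedy longest-match-first WordPiece.
# Not the cuDF WordPieceVocabulary API; names and vocabulary are made up.

VOCAB_LIST = ["[UNK]", "the", "un", "##aff", "##able", "fox"]
VOCAB = set(VOCAB_LIST)


def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into WordPiece tokens by greedy longest-match-first."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # word-internal pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is in the vocabulary: the whole word is unknown.
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then WordPiece-tokenize each word."""
    out = []
    for word in text.lower().split():
        out.extend(wordpiece_tokenize(word, vocab))
    return out


def to_ids(tokens, vocab_list):
    """Map tokens to their integer positions in the vocabulary list."""
    index = {tok: i for i, tok in enumerate(vocab_list)}
    return [index[t] for t in tokens]


if __name__ == "__main__":
    toks = tokenize("The unaffable fox", VOCAB)
    print(toks)                      # ['the', 'un', '##aff', '##able', 'fox']
    print(to_ids(toks, VOCAB_LIST))  # [1, 2, 3, 4, 5]
```

Mapping an entire word to [UNK] when any remainder fails to match mirrors the usual BERT behavior, rather than emitting partial pieces for it.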
