May I use my own vocabulary? #89

I want to train T5 from scratch and use my own vocabulary.

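I can create the model like this:
from transformers import T5Config, T5ForConditionalGeneration
config = T5Config.from_json_file(config_file)  # config_file: path to the T5 config JSON
model = T5ForConditionalGeneration(config)  # randomly initialized, no pretrained weights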

My vocabulary looks like the excerpt below. The tokenizer does not seem to be able to load it. How should I load it into a proper tokenizer?
{
"": 0,
"": 1,
"": 2,
"": 3,
"": 4,
"，": 5,
"的": 6,
"？": 7,
"了": 8,
.....
.....
.....
"<s_181>": 33786,
"<s_182>": 33787,
"<s_183>": 33788,
"<s_184>": 33789,
"<s_185>": 33790,
"<s_186>": 33791,
"<s_187>": 33792,
"<s_188>": 33793,
"<s_189>": 33794
}
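
To make the question concrete, here is a rough sketch of what I mean by "loading" a vocabulary like this. It is plain Python, not the transformers tokenizer API, and it makes several assumptions: the file is a flat token-to-id JSON map, greedy longest-match segmentation is acceptable (the algorithm that actually produced the vocabulary is not stated here), and the names `load_vocab`, `GreedyTokenizer`, `<pad>` and `<unk>` are hypothetical placeholders rather than entries from my real file.

```python
import json


def load_vocab(path):
    """Load a token -> id map from a JSON file and reject duplicate ids."""
    with open(path, encoding="utf-8") as f:
        vocab = json.load(f)
    if len(set(vocab.values())) != len(vocab):
        raise ValueError("two tokens share the same id")
    return vocab


class GreedyTokenizer:
    """Stand-in tokenizer: greedy longest-match lookup against the vocab."""

    def __init__(self, vocab, unk_token="<unk>"):
        if unk_token not in vocab:
            raise KeyError(f"unknown-token {unk_token!r} is not in the vocab")
        self.vocab = vocab
        self.unk_id = vocab[unk_token]
        self.id_to_token = {i: t for t, i in vocab.items()}
        self.max_len = max(len(t) for t in vocab)

    def encode(self, text):
        ids, pos = [], 0
        while pos < len(text):
            # Try the longest candidate substring first, then shrink it.
            for end in range(min(len(text), pos + self.max_len), pos, -1):
                piece = text[pos:end]
                if piece in self.vocab:
                    ids.append(self.vocab[piece])
                    pos = end
                    break
            else:
                # No vocab entry starts here: emit <unk> for one character.
                ids.append(self.unk_id)
                pos += 1
        return ids

    def decode(self, ids):
        return "".join(self.id_to_token[i] for i in ids)


# Tiny demo vocab. "<pad>" and "<unk>" are hypothetical placeholders;
# the other entries reuse ids visible in the excerpt above.
demo_vocab = {"<pad>": 0, "<unk>": 1, "，": 5, "的": 6, "？": 7, "了": 8,
              "<s_181>": 33786}
tok = GreedyTokenizer(demo_vocab)
ids = tok.encode("了的？<s_181>")

# The embedding table must cover the highest id, so the config's
# vocab_size should be at least this value.
min_vocab_size = max(demo_vocab.values()) + 1
```

If something like this is the right idea, the remaining question is which real tokenizer class can wrap the same map. One route that might work is building a `tokenizers` model from the dict and passing it to `PreTrainedTokenizerFast(tokenizer_object=...)`, but I have not confirmed that it fits this vocabulary.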




