Q
Can we use speech synthesis or voice conversion to generate additional training data?
A
Yes, it's allowed. However, the speech synthesis or voice conversion model itself can only be trained on the allowed training dataset.
Q
Can we use a pre-trained model (such as wav2vec) to extract features, or distill it into our model, for model training?
A
Yes, it's allowed. However, the pre-trained model itself must be trained on the allowed training dataset; models pre-trained on other datasets are not allowed.
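As a rough illustration, the sketch below extracts frame-level features with torchaudio's bundled wav2vec 2.0 pipeline. The bundled checkpoint here is only a placeholder: under the rule above, any pre-trained model you actually use must itself be trained only on the allowed training dataset.

```python
# Hedged sketch: extracting wav2vec 2.0 features with torchaudio.
# The WAV2VEC2_BASE bundle is used purely for illustration; replace it with a
# model pre-trained only on the allowed training dataset, per the rule above.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform, sample_rate = torchaudio.load("example.wav")  # hypothetical audio file
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    features, _ = model.extract_features(waveform)  # one tensor per transformer layer

print(len(features), features[-1].shape)  # e.g. 12 layers of (batch, frames, dim)
```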
Q
What is the format of the submission file?
A
The submission file uses the same format as the text file of the Eval set: the first column is the utt-id and the second column is the recognition result.
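For illustration, a minimal sketch of writing a file in this format is shown below. The utt-ids and transcriptions are hypothetical placeholders, and columns are assumed to be whitespace-separated, matching the Eval set text file.

```python
# Minimal sketch of writing a submission file: one line per utterance,
# utt-id in the first column, recognition result in the second.
# The utt-ids and hypotheses below are hypothetical placeholders.
results = {
    "UTT0001": "recognized text for the first utterance",
    "UTT0002": "recognized text for the second utterance",
}

with open("submission.txt", "w", encoding="utf-8") as f:
    for utt_id, hyp in results.items():
        # Assumption: columns are whitespace-separated, as in the Eval set text file.
        f.write(f"{utt_id} {hyp}\n")
```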
Q
Can we use the Eval dataset to fine-tune the model?
A
It's allowed.
Q
What is the model size limit in track I?
A
In track I, the number of model parameters (not the size of the checkpoint file) cannot exceed 15M. For example, the track I baseline system has 13.9M parameters, which satisfies the limit.
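A minimal sketch of checking this budget in PyTorch is shown below: count parameters with numel() rather than looking at the checkpoint size. The small Sequential model here is a hypothetical stand-in for your own system.

```python
# Minimal sketch of checking the track I parameter budget: count the number
# of model parameters (not the checkpoint file size) and compare it to 15M.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of parameters, which is what the 15M limit counts."""
    return sum(p.numel() for p in model.parameters())

# Hypothetical stand-in model; replace with your actual track I system.
model = nn.Sequential(nn.Linear(80, 512), nn.Linear(512, 512), nn.Linear(512, 4000))

num_params = count_parameters(model)
print(f"{num_params / 1e6:.2f}M parameters")
assert num_params <= 15_000_000, "exceeds the 15M parameter limit of track I"
```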
Q
Does the test set contain English?
A
No, the test set does not contain any English.
Q
Can we use extra data to train language models?
A
No, it's not allowed. You can only use the allowed dataset to train the language model.
Q
Is there speaker overlap between the Eval set and the Test set?
A
No overlap.
Q
Can we use an FST-based language model?
A
Yes, it's allowed. However, the language model file must satisfy the FST file size limit (it cannot exceed 25M), and the graph must be built on the allowed dataset.
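A minimal sketch of checking the size limit is shown below; it assumes the 25M limit refers to the on-disk size of the FST file, and the file name is a hypothetical placeholder.

```python
# Minimal sketch: check that the language-model FST file stays under the limit.
# Assumption: the 25M limit is interpreted as the on-disk file size, and
# "TLG.fst" is a hypothetical name for your decoding graph / LM FST.
import os

fst_path = "TLG.fst"  # hypothetical path to the FST language model
size_bytes = os.path.getsize(fst_path)
print(f"{size_bytes / 1e6:.1f} MB")
assert size_bytes <= 25_000_000, "FST file exceeds the 25M limit"
```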