Q
Can we use speech synthesis or voice conversion to generate additional training data?
A
Yes, it's allowed. However, the speech synthesis or voice conversion model itself can only be trained on the allowed training dataset.
Q
Can we use a pre-trained model (such as wav2vec) to extract features, or distill it into our model, for model training?
A
Yes, it's allowed. However, the pre-trained model itself must be trained on the allowed training dataset; models pre-trained on other datasets are not allowed.
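As a rough illustration, the sketch below extracts frame-level features with torchaudio's bundled wav2vec 2.0 pipeline. The bundled checkpoint here is only a placeholder: under the rule above, any pre-trained model you actually use must itself be trained only on the allowed training dataset.

```python
# Hedged sketch: extracting wav2vec 2.0 features with torchaudio.
# The WAV2VEC2_BASE bundle is used purely for illustration; replace it with a
# model pre-trained only on the allowed training dataset, per the rule above.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform, sample_rate = torchaudio.load("example.wav")  # hypothetical audio file
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    features, _ = model.extract_features(waveform)  # one tensor per transformer layer

print(len(features), features[-1].shape)  # e.g. 12 layers of (batch, frames, dim)
```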
Q
What is the format of the submission file?
A
The submission file uses the same format as the text file of the Eval set: the first column is the utt-id and the second column is the recognition result.
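For illustration, a minimal sketch of writing a file in this format is shown below. The utt-ids and transcriptions are hypothetical placeholders, and columns are assumed to be whitespace-separated, matching the Eval set text file.

```python
# Minimal sketch of writing a submission file: one line per utterance,
# utt-id in the first column, recognition result in the second.
# The utt-ids and hypotheses below are hypothetical placeholders.
results = {
    "UTT0001": "recognized text for the first utterance",
    "UTT0002": "recognized text for the second utterance",
}

with open("submission.txt", "w", encoding="utf-8") as f:
    for utt_id, hyp in results.items():
        # Assumption: columns are whitespace-separated, as in the Eval set text file.
        f.write(f"{utt_id} {hyp}\n")
```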
Q
Can we use the Eval dataset to fine-tune the model?
A
It's allowed.
Q
What is the model size limit in track I?
A
In track I, the number of model parameters (not the size of the checkpoint file) cannot exceed 15M. For example, the track I baseline system has 13.9M parameters, which satisfies the limit.
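A minimal sketch of checking this budget in PyTorch is shown below: count parameters with numel() rather than looking at the checkpoint size. The small Sequential model here is a hypothetical stand-in for your own system.

```python
# Minimal sketch of checking the track I parameter budget: count the number
# of model parameters (not the checkpoint file size) and compare it to 15M.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of parameters, which is what the 15M limit counts."""
    return sum(p.numel() for p in model.parameters())

# Hypothetical stand-in model; replace with your actual track I system.
model = nn.Sequential(nn.Linear(80, 512), nn.Linear(512, 512), nn.Linear(512, 4000))

num_params = count_parameters(model)
print(f"{num_params / 1e6:.2f}M parameters")
assert num_params <= 15_000_000, "exceeds the 15M parameter limit of track I"
```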
Q
Does the test set contain English?
A
No, the test set does not contain any English.
Q
Can we use extra data to train language models?
A
No, it's not allowed. You can only use the allowed dataset to train the language model.
Q
Is there speaker overlap between the Eval set and the Test set?
A
No overlap.
Q
Can we use an FST-based language model?
A
Yes, it's allowed. However, the language model file must satisfy the FST file size limit (it cannot exceed 25M), and the graph must be built on the allowed dataset.
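A minimal sketch of checking the size limit is shown below; it assumes the 25M limit refers to the on-disk size of the FST file, and the file name is a hypothetical placeholder.

```python
# Minimal sketch: check that the language-model FST file stays under the limit.
# Assumption: the 25M limit is interpreted as the on-disk file size, and
# "TLG.fst" is a hypothetical name for your decoding graph / LM FST.
import os

fst_path = "TLG.fst"  # hypothetical path to the FST language model
size_bytes = os.path.getsize(fst_path)
print(f"{size_bytes / 1e6:.1f} MB")
assert size_bytes <= 25_000_000, "FST file exceeds the 25M limit"
```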