Overview of Vosk API Training
Vosk API Training provides tools for training custom speech recognition models with the Kaldi toolkit. It covers the stages from acoustic model training through language model creation to decoding.
Directory Structure and Key Files
The training repository contains cmd.sh for command configuration, path.sh for setting Kaldi paths, and run.sh as the main entry point for training. The conf/ directory holds feature extraction settings, and local/ holds data preparation scripts. The RESULTS file records Word Error Rate results.
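As a quick sanity check before training, the layout described above can be verified with a few lines of Python. This is a minimal sketch: the function name missing_entries is hypothetical, and the list of expected entries is taken from the description in this section, so adjust it if your checkout differs.

```python
from pathlib import Path

# Entries named in this section. The split into files and directories
# follows the trailing slash used in the description (assumption).
EXPECTED_FILES = ["cmd.sh", "path.sh", "run.sh", "RESULTS"]
EXPECTED_DIRS = ["conf", "local"]


def missing_entries(root):
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling missing_entries(".") from the training directory returns an empty list when every entry is present.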
Installation Steps
To begin training, clone the repository, install Kaldi, and configure the paths to point at the Kaldi installation. Make sure that ffmpeg, sox, and sctk are available for data preparation and scoring. Environment variables are set up through cmd.sh and path.sh.
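Whether the required tools are reachable can be checked up front. The sketch below uses Python's standard shutil.which to look each tool up on the PATH. The function name find_missing_tools is hypothetical. Because sctk is a toolkit rather than a single executable, the sketch looks for its sclite scoring binary; that choice, and the assumption that the tools are on the PATH rather than inside the Kaldi tools directory, may not match your setup.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "sox", "sclite")):
    """Return the tool names that cannot be found on the current PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```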
Training Process Breakdown
The training process runs through six stages. Data Preparation downloads and prepares the dataset. Dictionary Preparation creates the pronunciation dictionaries. MFCC Feature Extraction extracts MFCC features and applies cepstral mean and variance normalization (CMVN). Acoustic Model Training trains monophone, LDA+MLLT, and SAT models. TDNN Chain Model Training uses i-vectors for speaker adaptation. Decoding concludes the process by evaluating the model on test data.
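To make the CMVN step concrete, here is a minimal pure-Python sketch of cepstral mean and variance normalization over one utterance. It is an illustration of the technique only, not the code the training scripts run: the function name cmvn is hypothetical, and the repository's scripts may compute the statistics differently (for example per speaker rather than per utterance, or normalizing the mean only).

```python
import math


def cmvn(frames, eps=1e-10):
    """Normalize each coefficient to zero mean and unit variance across frames.

    frames is a list of equal-length feature vectors, one per time frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    dim = len(frames[0])
    # Per-coefficient mean over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-coefficient (population) standard deviation over all frames.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant coefficients.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dim)]
        for f in frames
    ]
```

After normalization, each coefficient has the same scale regardless of recording level or channel, which is why the step is applied before acoustic model training.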
Results and Evaluation
The model's performance is measured by Word Error Rate (WER), which is recorded in RESULTS. The example in RESULTS.txt shows the WER percentage along with insertion, deletion, and substitution errors for different decoding scenarios.
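For readers unfamiliar with the metric, WER is the number of substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, divided by the number of reference words. The sketch below computes it with a standard edit-distance alignment. The function name wer_counts is hypothetical, and the scoring tools used by the training scripts may break ties between equally good alignments differently, so the per-type counts can differ slightly even when the WER agrees.

```python
def wer_counts(reference, hypothesis):
    """Return WER (percent) plus substitution, deletion, and insertion counts."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # table[i][j] = (edits, subs, dels, ins) for ref[:i] against hyp[:j].
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)          # match, no cost
            else:
                diag = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = table[i - 1][j]
            deletion = (e + 1, s, d + 1, n)
            e, s, d, n = table[i][j - 1]
            insertion = (e + 1, s, d, n + 1)
            # Keep the cheapest path by total edit count.
            table[i][j] = min(diag, deletion, insertion, key=lambda t: t[0])
    edits, subs, dels, ins = table[-1][-1]
    return {"wer": 100.0 * edits / len(ref), "sub": subs, "del": dels, "ins": ins}
```

For example, wer_counts("the cat sat on the mat", "the cat sit on mat") counts one substitution and one deletion against six reference words.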