Correct Pronunciation Detection of the Arabic Alphabet Using Deep Learning, Applied Sciences



Applied Sciences (IF 2.5), Pub Date: 2021-03-11, DOI: 10.3390/app11062508
Nishmia Ziafat, Hafiz Farooq Ahmad, Iram Fatima, Muhammad Zia, Abdulaziz Alhumam, Kashif Rajpoot

Automatic speech recognition for Arabic poses unique challenges, and progress in this domain has been relatively slow. Classical Arabic in particular has received even less research attention. The correct pronunciation of the letters of the Arabic alphabet has significant implications for the meaning of words. In this work, we design learning models that classify letters of the Arabic alphabet according to their correct pronunciation. Classifying the correct pronunciation of the Arabic alphabet is a challenging task for the research community. We divide the problem into two steps. First, we train a model to recognize a letter, namely Arabic alphabet classification. Second, we train a model to determine the quality of its pronunciation, namely Arabic alphabet pronunciation classification. Because little audio data of this kind is available, we collected audio data from experts and novices to train our models. To train these models, we extract pronunciation features from the audio data of the Arabic alphabet using mel-spectrograms. We employ a deep convolutional neural network (DCNN), AlexNet with transfer learning, and bidirectional long short-term memory (BLSTM), a type of recurrent neural network (RNN), to classify the audio data. For alphabet classification, DCNN, AlexNet, and BLSTM achieve accuracies of 95.95%, 98.41%, and 88.32%, respectively. For Arabic alphabet pronunciation classification, DCNN, AlexNet, and BLSTM achieve accuracies of 97.88%, 99.14%, and 77.71%, respectively.
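The abstract names the mel-spectrogram as the feature representation but gives no extraction parameters. As a rough illustration of what such a feature looks like, the following is a minimal standard-library sketch, not the authors' pipeline. The function names, the frame length, hop size, and number of mel bands, the Hann window, and the HTK-style mel formula are all assumptions chosen for readability. A real system would more likely use an FFT-based audio library.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel scale; one of several common conventions (assumed here).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per band."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame (O(n^2), for clarity)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spec.append(abs(acc) ** 2)
    return spec


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small constant avoids log(0) on silent bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel])
    return frames


def tone(freq_hz, sample_rate, n_samples):
    """Synthetic sine wave standing in for a recorded utterance."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


if __name__ == "__main__":
    features = mel_spectrogram(tone(1000.0, 8000, 256), 8000)
    print(len(features), "frames x", len(features[0]), "mel bands")
```

The resulting frames-by-bands matrix can be treated as an image for a convolutional classifier or as a sequence of vectors for a recurrent one, which is consistent with the abstract applying both DCNN/AlexNet and BLSTM models to the same kind of feature.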


Updated: 2021-03-11
