In this paper, a hierarchical Arabic phoneme recognition system is proposed in which Mel-Frequency Cepstral Coefficient (MFCC) features are used to train a hierarchical neural network architecture. Separate neural networks (subnetworks) are recursively trained to recognize subsets of phonemes, and the overall recognition is obtained by combining the outputs of these subnetworks. Experiments comparing the performance of the proposed hierarchical system against non-hierarchical (flat) baseline systems are also presented.
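Below is a minimal sketch of the two ingredients named above: the MFCC front end and a two-level hierarchical decision. It assumes the librosa library for feature extraction; the file name, frame parameters, and the group_net/subnets objects (standing in for trained classifiers with a scikit-learn-style predict method) are illustrative assumptions, not details taken from the paper.

    # MFCC front end: one feature vector per short-time frame.
    # Assumes librosa; file name and frame sizes are illustrative.
    import librosa

    def extract_mfcc(wav_path, n_mfcc=13):
        y, sr = librosa.load(wav_path, sr=None)   # keep native rate
        mfcc = librosa.feature.mfcc(
            y=y, sr=sr, n_mfcc=n_mfcc,
            n_fft=int(0.025 * sr),                # 25 ms window
            hop_length=int(0.010 * sr))           # 10 ms hop
        return mfcc.T                             # (n_frames, n_mfcc)

    def hierarchical_predict(x, group_net, subnets):
        # One plausible combination rule: a root network first picks
        # a phoneme subset, then the matching subnetwork picks a
        # phoneme within it. The paper's exact rule may differ.
        group = group_net.predict([x])[0]
        return subnets[group].predict([x])[0]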
With recent developments in technology and advances in artificial intelligence and machine learning, it has become possible for robots to understand and respond to voice as part of Human-Robot Interaction (HRI). A voice-based interface enables a robot to recognize speech from humans so that it can interact more naturally with its human counterparts in different environments. In this work, a review of voice-based interfaces for HRI systems is presented. The review examines voice-based perception in HRI systems from three facets: feature extraction, dimensionality reduction, and semantic understanding. For feature extraction, numerous types of features are reviewed across several domains, such as the time, frequency, cepstral (i.e., applying the inverse Fourier transform to the logarithm of the signal spectrum), and deep domains. For dimensionality reduction, subspace learning can eliminate the redundancies of high-dimensional features by further processing the extracted features so that they better reflect the underlying semantic information. For semantic understanding, the aim is to infer objects or human behaviors from the extracted features. Numerous types of semantic understanding are reviewed, such as speech recognition, speaker recognition, speaker gender detection, speaker gender and age estimation, and speaker localization. Finally, some open issues in existing voice-based interfaces and recommendations for future work are outlined.
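The cepstral-domain definition quoted above (the inverse Fourier transform of the logarithm of the signal spectrum) can be illustrated in a few lines of NumPy; the test tone below is a synthetic placeholder, not data from any reviewed system.

    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    import numpy as np

    def real_cepstrum(frame):
        spectrum = np.fft.fft(frame)
        log_mag = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
        return np.fft.ifft(log_mag).real

    # Example: a 512-sample frame of a 100 Hz tone sampled at 8 kHz.
    t = np.arange(512) / 8000.0
    print(real_cepstrum(np.sin(2 * np.pi * 100 * t))[:5])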
Many assistive devices have been developed for visually impaired (VI) persons in recent years to address the problems they face in daily mobility. Most studies attempt to solve the obstacle-avoidance or navigation problem, while others focus on helping the VI person recognize the objects in his/her surrounding environment. However, few of them integrate both navigation and recognition capabilities in one system. To meet these needs, an assistive device is presented in this paper that provides both capabilities, aiding the VI person to (1) navigate safely from his/her current location (pose) to a desired destination in an unknown environment, and (2) recognize his/her surrounding objects. The proposed system consists of low-cost sensors, namely a Neato XV-11 LiDAR, an ultrasonic sensor, and a Raspberry Pi camera (CameraPi), mounted on a white cane. Hector SLAM based on the 2D LiDAR is used to construct a 2D map of the unfamiliar environment, and the A* path-planning algorithm then generates an optimal path on the resulting 2D Hector map. Moreover, temporary obstacles in front of the VI person are detected by the ultrasonic sensor. A recognition system based on a Convolutional Neural Network (CNN) is implemented in this work to predict object classes and to enhance the navigation system. Interaction between the VI person and the assistive system is handled by an audio module (speech recognition and speech synthesis). The performance of the proposed system has been evaluated in various real-time experiments conducted in indoor scenarios, demonstrating its efficiency.
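As a rough illustration of the path-planning step, the sketch below runs A* over a small 2D occupancy grid of the kind a Hector SLAM map can be discretized into. The grid contents, 4-connected neighbor model, and Manhattan-distance heuristic are assumptions for illustration only, not parameters from the paper.

    # A* on a 2D occupancy grid: grid[r][c] == 1 marks a blocked cell.
    import heapq

    def astar(grid, start, goal):
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        frontier = [(h(start), 0, start, None)]   # (f, g, cell, parent)
        came_from, g_cost = {}, {start: 0}
        while frontier:
            _, g, cur, parent = heapq.heappop(frontier)
            if cur in came_from:                  # already expanded
                continue
            came_from[cur] = parent
            if cur == goal:                       # walk parents back to start
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = came_from[cur]
                return path[::-1]
            r, c = cur
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                        and grid[nxt[0]][nxt[1]] == 0
                        and g + 1 < g_cost.get(nxt, float("inf"))):
                    g_cost[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, cur))
        return None                               # no path exists

    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(astar(grid, (0, 0), (2, 0)))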