Details
ISBN:
(Print) 9783031686382; 9783031686399
The present study showcases a novel deep learning-based vision application aimed at narrowing the communication gap between sign language users and non-users. Speech and hearing impairments restrict an individual's ability to communicate with others; modern automation tools can help bridge this gap and allow people to communicate ubiquitously and in a variety of situations. The method described in the paper loads a video file, extracts each frame, and detects the hand landmarks in each frame using the MediaPipe library. Each frame is then cropped, and the region of interest is pre-processed and stored in a new data directory for training. Pre-processing applies Gaussian blur, edge detection, morphological transformations, and signal-processing functions. Data augmentation is then performed, and the resulting images are saved to a new directory. These images are used to train a custom CNN containing four convolutional layers and two fully connected layers. The model is compiled with the categorical cross-entropy loss function, optimised with the RMSprop optimiser, and evaluated on accuracy. The predicted sign language alphabet is displayed on screen and converted to speech using the Google Text-to-Speech library. The model achieves an overall accuracy of 93.96%. The findings indicate that the proposed approach can serve as a road map for developing a real-time sign language recognition system and can direct future investigations in this domain.
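To make the frame-extraction and hand-detection step concrete, the following is a minimal sketch using the MediaPipe Hands solution and OpenCV named in the abstract. The function name, crop margin, and output size are illustrative assumptions, not the authors' exact code.

```python
# Hedged sketch: extract frames from a video, detect hand landmarks with
# MediaPipe, and crop the hand region of interest. Assumes the
# `mediapipe` and `opencv-python` packages are installed.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_hand_crops(video_path, out_size=(128, 128), margin=20):
    """Read a video, detect hand landmarks per frame, and return cropped ROIs."""
    crops = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if not results.multi_hand_landmarks:
                continue
            h, w = frame.shape[:2]
            # Bounding box around the 21 detected landmarks, padded by `margin`.
            landmarks = results.multi_hand_landmarks[0].landmark
            xs = [int(p.x * w) for p in landmarks]
            ys = [int(p.y * h) for p in landmarks]
            x0, x1 = max(min(xs) - margin, 0), min(max(xs) + margin, w)
            y0, y1 = max(min(ys) - margin, 0), min(max(ys) + margin, h)
            roi = frame[y0:y1, x0:x1]
            if roi.size:
                crops.append(cv2.resize(roi, out_size))
    cap.release()
    return crops
```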
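The pre-processing stage (Gaussian blur, edge detection, morphological transformations) might look roughly like the OpenCV sketch below; the kernel sizes and Canny thresholds are assumed values, as the abstract does not report the exact parameters.

```python
import cv2
import numpy as np

def preprocess_roi(roi):
    """Grayscale -> Gaussian blur -> Canny edge detection -> morphological closing."""
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # smooth high-frequency noise
    edges = cv2.Canny(blurred, 50, 150)           # edge detection
    kernel = np.ones((3, 3), np.uint8)
    # Morphological closing fills small gaps in the detected edge contours.
    return cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
```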
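The abstract specifies four convolutional layers, two fully connected layers, categorical cross-entropy loss, and the RMSprop optimiser; a plausible Keras reconstruction is sketched below. The filter counts, kernel sizes, input shape, and the 26-class alphabet output are assumptions rather than the paper's reported architecture.

```python
# Hedged sketch of the custom CNN described in the abstract, using tf.keras.
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 1), num_classes=26):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),    # conv layer 1
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),    # conv layer 2
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),   # conv layer 3
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),   # conv layer 4
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),               # fully connected layer 1
        layers.Dense(num_classes, activation="softmax"),    # fully connected layer 2
    ])
    # Loss, optimiser, and metric match the abstract's description.
    model.compile(loss="categorical_crossentropy",
                  optimizer="rmsprop",
                  metrics=["accuracy"])
    return model
```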
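Finally, converting the predicted letter to speech with the gTTS (Google Text-to-Speech) package named in the abstract could be as simple as the snippet below; the function name and output filename are placeholders.

```python
from gtts import gTTS

def speak_prediction(letter, out_file="prediction.mp3"):
    """Synthesise the predicted sign language letter as speech and save it to disk."""
    gTTS(text=letter, lang="en").save(out_file)
```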