ISBN: 9781457705397 (print)
State-of-the-art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high-dimensional speech sound waves into low-dimensional encodings. While these have been successfully applied in speech recognition systems, such low-dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher-dimensional encodings could both improve performance in recognition tasks and be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable, and we report initial results demonstrating phoneme recognition performance better than the current state of the art for methods based on Mel cepstrum coefficients.
With naturalistic dialogue management, a spoken dialogue system behaves as a human would under similar conditions. This paper reports on an experiment to develop naturalistic clarification strategies for noisy speech recognition in the context of spoken dialogue systems. We collected a wizard-of-Oz corpus in which human wizards with access to a rich set of clarification actions made clarification decisions online, based on human-readable versions of system data. The experiment compares an evaluation of calls to a baseline system in a library domain with calls to an enhanced version of the system. The new system has a clarification module based on the wizard data that is a decision tree constructed from three machine-learned models. It replicates the wizards' ability to ground partial understandings of noisy input and to build upon them. The enhanced system has a significantly higher rate of task completion, greater task success and improved efficiency.
ISBN: 9798350302615 (print)
Artificial intelligence and machine learning toolkits such as Scikit-learn, PyTorch and TensorFlow today provide a solid starting point for the rapid prototyping of R&D solutions. However, they can hardly be ported to heterogeneous decentralised hardware and real-world production environments. A common practice involves outsourcing deployment solutions to scalable cloud infrastructures such as Amazon SageMaker or Microsoft Azure. In this paper, we propose an open-source microservices-based architecture for decentralised machine intelligence which aims at bringing R&D and deployment functionalities closer together following a low-code approach. Such an approach would guarantee flexible integration of cutting-edge functionalities while preserving complete control over the deployed solutions at negligible cost and maintenance effort.
ISBN: 9781538621592 (print)
As electronic appliances achieve higher performance, the lengthening of the design period is becoming a major issue. If this problem can be solved, the time saved can be used for circuit performance improvement and development of new circuits. Therefore, efficient computer-assisted circuit design is required to further improve productivity. Some automatic circuit design methods have been proposed. However, these methods are unsuitable for designing many circuits because each new circuit takes a long time to design. In this paper, an automatic design method for op-amp sizing is proposed that uses machine learning inference to predict the element values of the circuit. In simulation, we succeeded in predicting element values of a circuit that satisfies the desired characteristics with about 90% accuracy, and in shortening the design time.
ISBN: 9781728133775 (print)
Sign language is a form of communication that connects deaf-mute people to the world. It involves the use of hand gestures and body movement to express ideas. Nevertheless, the general public is mostly not educated to comprehend sign language. For this reason, a translator is needed to facilitate communication. This paper presents a Convolutional Neural Network (CNN) model for predicting American Sign Language. A total of 4800 images were captured to train and validate the proposed model. A recognition accuracy of 95% was attained in the experiment, which shows robust performance in recognizing 24 static American Sign Language patterns. The successful development of this model can serve as the basis for developing a more complex sign language translator.
Deep neural networks have been successfully used in the task of black-box modeling of analog audio effects such as distortion. Improving the processing speed and memory requirements of the inference step is desirable to allow such models to be used on a wide range of hardware and concurrently with other software. In this paper, we propose a new application of recent advancements in neural network pruning methods to recurrent black-box models of distortion effects using a Long Short-Term Memory architecture. We compare the efficacy of the method on four different datasets: one distortion pedal and three vacuum tube amplifiers. Iterative magnitude pruning allows us to remove over 99% of parameters from some models without a loss of accuracy. We evaluate the real-time performance of the pruned models and find that a 3x-4x speedup can be achieved compared to an unpruned baseline. We show that training a larger model and then pruning it outperforms an unpruned model of equivalent hidden size. A listening test confirms that pruning does not degrade the perceived sound quality and may even slightly improve it. The proposed techniques can be used to design computationally efficient deep neural networks for processing the sound of the electric guitar in real time.
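The iterative magnitude pruning mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat list of weights, and the optional `retrain` hook are assumptions made for illustration, and the retraining that normally happens between pruning rounds is left as a placeholder.

```python
def prune_step(weights, mask, fraction):
    """Deactivate the smallest-magnitude `fraction` of the still-active weights."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(active) * fraction)
    # Sort active indices by absolute weight, smallest first.
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = False
    return new_mask


def iterative_magnitude_pruning(weights, rounds, fraction, retrain=None):
    """Alternate (optional) retraining and pruning for a number of rounds.

    `retrain` is a hypothetical hook: a callable taking (weights, mask) and
    returning updated weights. It is skipped when None.
    """
    mask = [True] * len(weights)
    for _ in range(rounds):
        if retrain is not None:
            weights = retrain(weights, mask)
        mask = prune_step(weights, mask, fraction)
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask


# Toy example: two rounds, each removing half of the remaining weights.
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
pruned, mask = iterative_magnitude_pruning(weights, rounds=2, fraction=0.5)
print(pruned)  # only the two largest-magnitude weights survive
```

Pruning in several small rounds rather than one large step gives the remaining weights a chance to adapt during retraining, which is the usual motivation for the iterative variant.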
ISBN: 9781479970612 (print)
Although the accuracy obtained on some lab-controlled facial expression datasets has been very high, the recognition of facial expressions in wild environments is still a challenging problem. Local Binary Patterns (LBP) is a widely used operator in facial expression recognition. However, there are few variations of LBP operators specifically designed for facial expression recognition. In this paper, we propose a novel representation approach called the Double Complete d-LBP (Double Cd-LBP), designed according to the characteristics of facial expressions. Two d-LBP operators are employed to represent the details and the contour of faces separately, and complete LBP is used to take sign and magnitude components into account. Moreover, multi-scale LBP is exploited to obtain local texture and global information. We then use the extreme learning machine auto-encoder (ELM-AE) as the feature selection approach to learn discriminative features. Cascade forest is employed as the final decision classifier. Experiments conducted on six facial expression databases, including both lab-controlled and wild-environment databases, show that our method outperforms or is on par with state-of-the-art methods.
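For readers unfamiliar with the operator this abstract builds on, the following is a minimal sketch of the basic 3x3 LBP code and its histogram. It shows only the standard operator, not the d-LBP, complete LBP, or multi-scale variants proposed in the paper; the function names and the clockwise bit ordering are assumptions chosen for illustration.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code of pixel (r, c) in a 2D list of intensities."""
    center = image[r][c]
    # Neighbours visited clockwise starting at the top-left (an assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
code = lbp_code(patch, 1, 1)
print(code)  # neighbours 9, 7 and 8 exceed the center 6, setting bits 1, 3 and 4
```

The histogram of such codes over a face region is what typically serves as the texture descriptor passed on to feature selection and classification.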
Modern human behavioral signal processing and machine-learning methods have introduced novel ways for representing and estimating internal states of people in goal-based conversational interactions, such as psychother...
Automatic detection of solar array faults reduces maintenance costs and increases efficiency. In this paper, we address the problem of fault detection, localization, and classification in utility-scale photovoltaic (PV) arrays using machine learning methods. More specifically, we develop a series of customized neural networks for detection and classification of solar array faults. We evaluate fault detection and classification using metrics such as accuracy, confusion matrices, and the Risk Priority Number (RPN). We examine and assess the use of customized neural networks with dropout regularizers. We develop and evaluate neural network pruning strategies and illustrate the trade-off between fault classification model accuracy and algorithm complexity. Our approach promises to elevate the performance and robustness of PV arrays and compares favorably against existing methods.
ISBN: 9781538674741 (print)
Nearest neighbor query processing is a fundamental problem that arises in many fields such as spatial databases and machine learning. ASPE, which uses invertible matrices to encrypt data, is a widely adopted Secure Nearest Neighbor (SNN) query scheme. Encrypting data with matrices amounts to forming linear combinations of the data's dimensions, which is exactly the relationship between source signals and observed signals in signal processing. By viewing the dimensions of the data and of the encrypted data as source signals and observed signals, respectively, we formally prove and experimentally demonstrate, using signal processing theory, that ASPE is insecure even against ciphertext-only attacks. Prior work proved that it is impossible to construct an SNN scheme even in much relaxed standard security models; we invalidate this hardness understanding by pointing out the incorrectness of the hardness proof.
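The observation that matrix encryption is a linear mixing of the data dimensions can be made concrete with a small sketch. It assumes the commonly described core of ASPE, in which a data point is encrypted with the transpose of a secret invertible matrix and a query with its inverse, so that scalar products are preserved; the real scheme adds further steps (such as dimension extension and randomisation) that are omitted here. The 2x2 key and all function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical secret key: an invertible integer matrix with determinant 1,
# so its inverse is also an integer matrix.
M = [[2, 1],
     [5, 3]]
M_INV = [[3, -1],
         [-5, 2]]


def encrypt_point(p):
    """Each ciphertext coordinate is a linear combination of the plaintext ones."""
    return mat_vec(transpose(M), p)


def encrypt_query(q):
    return mat_vec(M_INV, q)


p, q = [3, 4], [5, -2]
enc_p, enc_q = encrypt_point(p), encrypt_query(q)

# (M^T p) . (M^-1 q) = p^T M M^-1 q = p . q, so the server can compare
# scalar products on ciphertexts without seeing p or q.
print(dot(enc_p, enc_q), dot(p, q))
```

Because every encrypted coordinate is a fixed linear mixture of the original coordinates, a collection of ciphertexts looks like observed signals generated from unknown source signals, which is the viewpoint the abstract uses to motivate a ciphertext-only attack.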