Piano transcription is a fundamental problem in music information retrieval that aims to infer the note sequence from piano multimedia. This paper proposes a piano transcription model named CRNN-GCN, which fuses the audio transcription model Onsets and Frames (CRNN) with the visual transcription model Graph Convolutional Network (GCN). CRNN extracts features from the audio, and GCN extracts features from hand skeletons rather than video frames, which effectively reduces video memory use and computational complexity. All the features are then integrated to obtain better transcription results. On our self-built dataset OMAPS2, the F1-scores of the single-modal CRNN and GCN are 89.98% and 61.63%, while the F1-score of the multi-modal CRNN-GCN reaches 92.06%, which is the best result at present.
With the popularity of cameras, smartphones and other devices, visual content data such as images and videos are increasing day by day, and visual content data processing technology has become a research direction in ...
Handwriting analysis, commonly referred to as Graphology, can reflect a person's personality because writing movements are controlled by the brain, which contains memories about various life experiences and stored...
Currently, primary image processing is used to improve recognition results. In this paper, improvements are made using normalization and phasing operations to increase image contrast. A normalized histogram...
ISBN (print): 9781728131290
Existing references for martial arts action structure provide poor support for manual correction. To address this problem, this paper proposes a new method for martial arts action structure analysis and reconstruction based on computer numerical simulation. First, three-dimensional visual image information of martial arts is acquired, and adaptive threshold decomposition and wavelet analysis are used for noise-reduction preprocessing. An original domain feature point library is formed by separating the domain feature points of the martial arts action structure image. Then, an edge contour feature extraction method is used to extract the contour features of the martial arts in the image; these are input into the orthodontics expert system for visual analysis and correction, realizing martial arts action structure analysis and feature reconstruction identification. Computer simulation results show that this method can reconstruct the martial arts structure and improve the quantitative analysis ability of martial arts. The signal-to-noise ratio of the output image is high, and the recognition performance for image processing and motion is improved.
As an effective solution for alleviating the shortage of labeled data for indoor positioning, deep semi-supervised learning (DSSL) can lessen the dependency on labeled data by exploiting potential patterns in unlabeled samples. Inspired by the inherent similarity between image processing and indoor positioning, and by the efficiency of the consistency regularization method for image classification, we propose an adapted mean teacher (AMT) model under the DSSL paradigm for indoor positioning using channel impulse response. To enhance the generalization of the trained model, we design an efficient implicit augmentation scheme for the training process in the AMT model. In addition, we design a special residual network to efficiently extract location characteristics in the AMT framework. We conduct extensive simulation experiments for indoor scenarios with a heavy non-line-of-sight condition to demonstrate the effectiveness of the proposed AMT model. Numerical results show that the AMT model outperforms a number of consistency regularization methods and the pseudo-label method in terms of accuracy and convergence.
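The mean teacher idea that the abstract builds on can be sketched in a few lines. This is a minimal illustration of the generic mean teacher scheme, not the authors' AMT model: the teacher's weights track an exponential moving average of the student's weights, and a consistency loss penalizes disagreement between the two networks' predictions on the same unlabeled sample. The function names, the flat weight lists, and the value of `alpha` are assumptions made for illustration.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move teacher weights toward the student with an exponential moving average."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher predictions."""
    n = len(student_out)
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / n


# Hypothetical flat weight vectors for a tiny model.
teacher_w = [0.0, 0.0]
student_w = [1.0, -1.0]

# After each training step on the student, refresh the teacher.
teacher_w = ema_update(teacher_w, student_w, alpha=0.9)

# Hypothetical predictions of both networks on one unlabeled sample.
loss = consistency_loss([1.0, 2.0], [1.0, 4.0])
```

A larger `alpha` makes the teacher change more slowly, which smooths out noise in individual student updates.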
ISBN (digital): 9798350396034
ISBN (print): 9798350396041
Infrared small target detection (IRSTD) is an important subject in many fields such as real-time monitoring and drone applications. Drones typically require real-time processing and transmission of infrared image data during flight. Lightweight models can reduce computational and communication burdens. While existing methods perform target detection effectively, their resource-intensive demands hinder remote real-time detection. To provide high-performance target detection and analysis in resource-constrained environments, we propose a cross-sensing compression network (CSC-Net) for IRSTD in this paper. Specifically, the compression mixer module (CMM) enhances the hybrid scheme’s adaptability to different locations by exploring both semantic and spatial dimensions, thus improving image association flexibility. The filter perception module (FPM) dynamically adjusts input contributions for enhanced cross-sensing, emphasizing the benefits of multilayered semantics. Experiments on SIRST and IRSTD-1K show that the proposed CSC-Net can achieve accurate target location and segmentation. In particular, its visual effects and resource consumption meet the requirements of remote real-time monitoring scenarios.
Backpropagation-based supervised learning has achieved great success in computer vision tasks. However, its biological plausibility remains controversial. Recently, the bio-inspired Hebbian learning rule (HLR) has received extensive attention. The Self-Organizing Map (SOM) uses the competitive HLR to establish connections between neurons, obtaining visual features in an unsupervised way. Although the representation of SOM neurons shows some brain-like characteristics, it is still quite different from the neuron representation in the human visual cortex. This paper proposes an improved SOM with multi-winner, multi-code, and local receptive field mechanisms, named mlSOM. We observe that the neuron representation of mlSOM is similar to that of the human visual cortex. Furthermore, mlSOM shows a sparse distributed representation of objects, which has also been found in the human inferior temporal area. In addition, experiments show that mlSOM achieves better classification accuracy than the original SOM and other state-of-the-art HLR-based methods. The code is accessible at https://***/JiaHongZ/mlSOM.
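For readers unfamiliar with the baseline, the competitive update of a classic single-winner SOM can be sketched as follows. This is the textbook SOM on a one-dimensional grid, not mlSOM: the multi-winner, multi-code, and local receptive field extensions are not shown, and the learning rate `lr`, neighborhood width `sigma`, and function names are illustrative assumptions.

```python
import math


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(neuron, x)) for neuron in weights]
    return dists.index(min(dists))


def som_step(weights, x, lr=0.5, sigma=1.0):
    """One update of a 1-D SOM: pull each neuron toward x, scaled by its grid distance to the winner."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for i, neuron in enumerate(weights):
        # Gaussian neighborhood: the winner gets h = 1, distant neurons get less.
        h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
        updated.append([w + lr * h * (xi - w) for w, xi in zip(neuron, x)])
    return updated, bmu


# Three neurons on a line, each with a 2-D weight vector (hypothetical values).
weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_weights, winner = som_step(weights, [1.0, 1.2])
```

The winner moves the most and its grid neighbors move less, which is what produces the topology-preserving map.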
ISBN (digital): 9798350350920
ISBN (print): 9798350350937
Predicting the critical frequency of the ionospheric F2 layer (foF2) can provide guidance for satellite navigation and for frequency selection in high-frequency communications. Models based on deep learning have been proven to forecast ionospheric variations effectively. In this letter, we propose a Seq2Seq model with long short-term memory and an attention mechanism (Seq2Seq-LSTM-Attention), aiming to predict the foF2 parameter more accurately. Training and testing on foF2 measurements from Wuhan, China (30.6°N, 114.3°E) show that the proposed Seq2Seq-LSTM-Attention model can effectively capture the correlation in the foF2 sequences and outperforms several existing cutting-edge deep-learning-based models in prediction accuracy. The results confirm that the proposed model is able to mine the potential relationships in the foF2 sequences more deeply.
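The attention step in a Seq2Seq model can be illustrated with a small sketch. The abstract does not say which attention variant the model uses, so the example below assumes plain dot-product attention: the decoder state is scored against each encoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of encoder states. All names and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention: return (weights, context vector)."""
    scores = [sum(d * h for d, h in zip(decoder_state, hs)) for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * hs[k] for w, hs in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


# Hypothetical encoder hidden states for three time steps of an input sequence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

Time steps whose hidden state aligns with the decoder state receive larger weights, so the context vector emphasizes the parts of the input sequence most relevant to the current prediction.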
This research paper presents the implementation and evaluation of a Total Variation (TV) layer within a deep learning framework for image denoising tasks. The TV layer is based on Chambolle’s projection method and aims to enhance the performance of a baseline denoising network. The study investigates the effectiveness of incorporating a single TV denoising layer, which adds only six parameters to an arbitrary baseline autoencoder. The resulting models are evaluated across three datasets with different levels of Gaussian noise. Comparing an arbitrary network architecture with an otherwise identical model that incorporates the TV denoising layer shows significant improvements in performance. The magnitude of these gains diminishes as the noise variance increases. Notably, at lower Gaussian noise levels, such as a standard deviation of 25, the gain in mean peak signal-to-noise ratio (PSNR) is estimated at approximately +0.89 dB, an improvement that is visually perceptible. Although the resulting model shows no significant increase in size, it does show a significant increase in complexity.
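Since the reported gain is expressed in PSNR, a short sketch of how that metric is computed may help. It uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), and assumes 8-bit pixel values (MAX = 255) stored in flat lists; the paper's own evaluation code is not shown in the abstract, so the function name and data layout are illustrative.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel image and a noisy copy with an error of 10 per pixel.
clean = [100.0, 100.0, 100.0, 100.0]
noisy = [110.0, 90.0, 110.0, 90.0]
value = psnr(clean, noisy)
```

Because the scale is logarithmic, a gain of about +0.89 dB corresponds to the mean squared error shrinking by a factor of roughly 10^(0.089), or about 1.23.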