ISBN: (print) 9783031048814; 9783031048807
State-of-the-art end-to-end Optical Music Recognition (OMR) systems use Recurrent Neural Networks to produce music transcriptions, as these models retrieve a sequence of symbols from an input staff image. However, recent advances in Deep Learning have led other research fields that process sequential data to adopt a new neural architecture, the Transformer, whose popularity has steadily increased. In this paper, we study the application of the Transformer model to end-to-end OMR systems. We built several models based on all the existing approaches in this field and tested them on various corpora with different output encodings. The results allow us to make an in-depth analysis of the advantages and disadvantages of applying this architecture to these systems. This discussion leads us to conclude that Transformers, as originally conceived, do not seem appropriate for end-to-end OMR; the paper therefore raises interesting lines of future research to realize the full potential of this architecture in this field.
ISBN: (print) 9781728192017
Simultaneous Localization and Mapping (SLAM) addresses the problem of autonomous localization and navigation of mobile robots in unknown environments. Loop closure detection is a key part of SLAM and largely determines its accuracy and stability. In recent years, experiments have shown that neural-network-based loop closure detection outperforms traditional loop closure detection in both accuracy and real-time performance. In this paper, we propose an adaptive real-time loop closure detection (AR-Loop) method based on monocular vision. A pre-trained convolutional neural network (CNN) is used to extract image features, and features from different layers are then concatenated to form image descriptors. In addition, we propose an adaptive candidate matching range algorithm and an image-to-sequence calibration algorithm to further improve performance. Extensive experiments on several open datasets validate the performance of AR-Loop: at 100% precision, its recall rate is more than 18% higher than that of other state-of-the-art algorithms.
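The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the AR-Loop method itself. It assumes per-layer CNN activations have already been extracted as flat vectors (no real network is loaded), L2-normalizes each layer before concatenation, and matches descriptors with cosine similarity. A fixed window of excluded recent frames stands in for the paper's adaptive candidate matching range algorithm, and the image-to-sequence calibration step is not reproduced. All function names are hypothetical. The last function shows one common way to compute the reported metric, recall at 100% precision, assuming each query yields one scored candidate labeled as a true or false loop closure.

```python
import math
from typing import List, Optional, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def make_descriptor(layer_features: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-layer features into one unit-length image descriptor.

    Each layer is normalized first so that layers with large activations
    do not dominate the concatenated descriptor (an assumed design choice).
    """
    desc: List[float] = []
    for feat in layer_features:
        desc.extend(l2_normalize(feat))
    return l2_normalize(desc)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; equals the dot product because descriptors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def best_loop_candidate(
    descriptors: Sequence[Sequence[float]], query_idx: int, exclude_recent: int
) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the most similar earlier frame.

    The last `exclude_recent` frames before the query are skipped, since
    temporally adjacent frames are trivially similar. This fixed window is a
    hypothetical stand-in for an adaptive candidate matching range.
    """
    limit = query_idx - exclude_recent
    if limit <= 0:
        return None
    q = descriptors[query_idx]
    best = max(range(limit), key=lambda i: cosine(q, descriptors[i]))
    return best, cosine(q, descriptors[best])


def recall_at_full_precision(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Recall at the lowest threshold that still produces no false positives.

    scores: similarity score of each query's loop candidate.
    labels: True if that candidate is a correct loop closure.
    """
    positives = sum(1 for y in labels if y)
    if positives == 0:
        return 0.0
    fp_scores = [s for s, y in zip(scores, labels) if not y]
    max_fp = max(fp_scores) if fp_scores else -math.inf
    # Accept only candidates scoring strictly above every false positive.
    tp = sum(1 for s, y in zip(scores, labels) if y and s > max_fp)
    return tp / positives
```

For example, with scores `[0.9, 0.8, 0.7, 0.6]` and labels `[True, False, True, True]`, the false positive at 0.8 forces the threshold above it, so only one of the three true loops is accepted and recall at 100% precision is 1/3. This is why the metric rewards methods that keep false matches well separated from true ones.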