ISBN: (print) 9783031048814; 9783031048807
The structural richness of music notation has led to the development of specific approaches to Optical Music Recognition (OMR). Among them, it is becoming common to formulate the output of the system as a graph structure, where the primitives of music notation are the vertices and their syntactic relationships are modeled as edges. As an intermediate step, many works focus on locating and categorizing the symbol primitives found in the music score image using object detection approaches. However, training these models requires precise annotations of where the symbols are located. This makes the approaches difficult to apply to new collections, since manual annotation is very costly. In this work, we study how to extract the primitives as an image-to-multiset problem, which does not require fine-grained location information. To do this, we implement a model based on image captioning that retrieves a sequence of the music primitives found in a given image. Our experiments with the MUSCIMA++ dataset demonstrate the feasibility of this approach, obtaining good results with several models, even in situations with limited annotated data.
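To make the image-to-multiset formulation concrete, the following minimal sketch shows how a predicted sequence of primitive labels could be compared against a reference annotation when only the counts of each primitive matter, not their order or position. The function name `multiset_prf`, the label strings, and the choice of precision/recall/F1 are illustrative assumptions; the abstract does not state which metric the authors use.

```python
from collections import Counter


def multiset_prf(predicted, reference):
    """Compare two label sequences as multisets (order is ignored).

    Hypothetical helper for illustration; not the metric from the paper.
    Returns (precision, recall, f1).
    """
    pred = Counter(predicted)
    ref = Counter(reference)

    # Multiset intersection: per-label minimum of the two counts.
    overlap = sum((pred & ref).values())

    n_pred = sum(pred.values())
    n_ref = sum(ref.values())

    precision = overlap / n_pred if n_pred else 0.0
    recall = overlap / n_ref if n_ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only; a real label set would come from the dataset.
reference = ["notehead-full", "notehead-full", "stem", "stem", "beam"]
predicted = ["stem", "notehead-full", "beam", "stem", "stem"]

precision, recall, f1 = multiset_prf(predicted, reference)
print(precision, recall, f1)
```

Because both sequences are reduced to counts, shuffling the predicted sequence leaves the scores unchanged, which is the property that lets such a model be trained without symbol locations.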