Dataset for Optical Music Recognition
Posted: Sun Nov 12, 2017 1:27 am
Let me first introduce myself.
My name is Kwon-Young Choi and I'm a PhD student on the subject of Optical Music Recognition (OMR).
In my work, I rely heavily on trainable models called neural networks, which need a lot of annotated data to work.
However, there is currently no OMR-oriented dataset of printed scores available to researchers.
The scores I am looking for are complex, dense, noisy orchestral or piano scores.
Essentially, what I'm asking is whether you, IMSLP librarians (or others), have already come across such very complex scores and, if so, whether you could post a link to the score on IMSLP.
The resulting dataset of about 100 scores will be public and usable by anybody, whether OMR researchers or musicians.
Thanks for your help!
To be more precise:
* complex: the scores should not be trivial to read. Examples: polyphonic multi-voice scores, voices that jump from one staff to another, rare or unusual symbols, ...
* dense: a high number of symbols in a small area. These situations produce many segmentation problems that have bothered OMR researchers for a long time.
* noisy: the scores have been degraded by age and poor scanning quality. In other words, modern scores produced by recent music editing software with perfect graphic quality are not the target of this dataset.