Hi,
I want to propose a way to produce higher-quality output and use less bandwidth by converting scanned pages to vector output. Bitmap images take a lot of space and contain much redundancy. By converting the outlines to a vector image and grouping similar elements together, output quality can be improved and bandwidth reduced by a large factor. I am not suggesting OCR, because OCR does extra semantic analysis and is fragile. Semantic analysis would only be an optional extra; by default, the system would work without actually understanding the content. I am willing to implement such a system, and I wonder whether funding would be available for something like this.
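To illustrate what "grouping similar elements together" could mean, here is a minimal sketch, not the proposed system itself. It finds the connected black shapes in a 1-bit bitmap (1 = black) and stores each distinct shape once, together with every position where it occurs. This is the idea behind symbol-dictionary coding such as JBIG2's pattern matching. Assumptions: shapes are matched exactly, while a real scan would need tolerant matching, and the vector-outline step is not shown; `components` and `group_shapes` are hypothetical names.

```python
from collections import defaultdict


def components(bitmap):
    """Return the 4-connected black components as lists of (row, col) pixels."""
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]
    result = []
    for r in range(h):
        for c in range(w):
            if bitmap[r][c] and not seen[r][c]:
                seen[r][c] = True
                stack, pixels = [(r, c)], []
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and bitmap[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                result.append(pixels)
    return result


def group_shapes(bitmap):
    """Map each distinct shape (pixels relative to its top-left corner)
    to the list of (top, left) positions where it appears."""
    groups = defaultdict(list)
    for pixels in components(bitmap):
        top = min(y for y, _ in pixels)
        left = min(x for _, x in pixels)
        shape = frozenset((y - top, x - left) for y, x in pixels)
        groups[shape].append((top, left))
    return groups


# Hypothetical page: two identical 2x2 "note heads" and one 1x3 "dash".
page = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
]
for shape, positions in group_shapes(page).items():
    print(len(shape), "px shape stored once, used at", positions)
```

On a real page the same note heads, rests and letters repeat hundreds of times, so storing each shape once plus a list of positions is where the redundancy saving would come from.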
Kristof Bastiaensen
higher quality and less bandwidth by conversion to vector formats
Moderator: kcleung
Re: higher quality and less bandwidth by conversion to vector formats
This sounds interesting. I'll let our leader, Feldmahler, know.
Re: higher quality and less bandwidth by conversion to vector formats
Hi Kristof,
This sounds quite interesting and yes, funding is available, but first I'll need to know a bit more about your background and how the conversion works on a more technical level. My e-mail is eguo@imslp.org, and it may also be helpful to have a Skype call at some point. Let me know.
Thanks,
Edward
-
- active poster
- Posts: 1558
- Joined: Thu Jul 12, 2007 8:53 am
- Location: Nice, France
- Contact:
Re: higher quality and less bandwidth by conversion to vector formats
Even without moving to a vector format, information theory tells us that a bitmap image compresses much better when it represents clean figures than when it represents a dirty image. A typical typeset PDF page is about 15 kB, while a decent scanned page is about 100 kB. When I convert a typeset PDF to TIFF, change the page layout and convert the images back to PDF, the result is not much bigger than the original PDF. Likewise, a scan of a Henle edition is smaller than a scan of a c. 1800 edition.
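A quick way to see the clean-versus-dirty effect is to compress two synthetic bitmaps that differ only in noise. This is an illustrative sketch under stated assumptions: it uses Python's `zlib` (DEFLATE) as a stand-in for the CCITT or JBIG2 compression that scanned PDFs typically use, 8-bit grayscale pixels instead of 1-bit, and an arbitrary page size and noise level. The exact byte counts are not meaningful, only the gap between them.

```python
import random
import zlib

WIDTH, HEIGHT = 800, 200  # hypothetical page strip, 1 byte per pixel
STAFF_ROWS = (40, 60, 80, 100, 120)  # five horizontal "staff lines"


def clean_page():
    """Black staff lines (0) on a white background (255)."""
    rows = []
    for y in range(HEIGHT):
        value = 0 if y in STAFF_ROWS else 255
        rows.append(bytes([value]) * WIDTH)
    return b"".join(rows)


def dirty_page(noise=0.02, seed=0):
    """Same page with a fraction of pixels inverted, mimicking specks and stains."""
    rng = random.Random(seed)
    data = bytearray(clean_page())
    for i in range(len(data)):
        if rng.random() < noise:
            data[i] = 255 - data[i]
    return bytes(data)


clean_size = len(zlib.compress(clean_page(), 9))
dirty_size = len(zlib.compress(dirty_page(), 9))
print("clean:", clean_size, "bytes; dirty:", dirty_size, "bytes")
```

The random specks carry information the compressor must keep, so the dirty page stays much larger even though it shows the same music.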
Then all PDFs made from scans would be much smaller if we had an application that recognizes a staff line and replaces it with a clean staff line. The same goes for note stems, beams, etc. Other symbols and text indications could come later (OCR).
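One simple way to recognize staff lines is a horizontal projection: count the black pixels in each row and treat rows above a threshold as staff lines. The sketch below does that and redraws each detected row as a solid line between its first and last black pixel. This fills gaps left by a worn print. It is a hypothetical illustration, not an existing tool. It assumes a deskewed 1-bit page with perfectly horizontal, one-pixel-thick lines, and the 0.6 threshold is an arbitrary choice. Real scans would need skew correction and handling of thicker or curved lines.

```python
def staff_rows(bitmap, threshold=0.6):
    """Indices of rows whose black-pixel ratio suggests a (possibly broken) staff line."""
    width = len(bitmap[0])
    return [y for y, row in enumerate(bitmap) if sum(row) / width >= threshold]


def clean_staff_lines(bitmap, threshold=0.6):
    """Return a copy in which every detected staff row is redrawn as a solid
    line between its first and last black pixel; other rows are untouched."""
    out = [row[:] for row in bitmap]
    for y in staff_rows(bitmap, threshold):
        row = out[y]
        first = row.index(1)
        last = len(row) - 1 - row[::-1].index(1)
        for x in range(first, last + 1):
            row[x] = 1
    return out


# Hypothetical scan fragment: two broken staff lines and a bit of a stem.
scan = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 1, 1, 0],
]
for row in clean_staff_lines(scan):
    print(row)
```

The same projection idea could be applied vertically to find stems. A clean line could also be written as a single vector stroke instead of pixels.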
I have already done 0.001% of the job with an application that deletes stains smaller than the dot of a lowercase i in an indication like vivace. So far, though, don't expect a size reduction of more than about 5%. See an example here: http://imslp.org/wiki/Piano_Sonata_in_F ... rel_Anton)
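The application above is not described in detail, so this is only a sketch of one common way to delete small stains, known as despeckling. It finds 8-connected groups of black pixels and whitens any group smaller than a size threshold. It assumes a 1-bit bitmap (1 = black). `min_pixels` is a hypothetical parameter that would be set just below the pixel area of an i-dot at the scan resolution.

```python
def remove_specks(bitmap, min_pixels):
    """Return a copy in which every 8-connected black component with fewer
    than min_pixels pixels is turned white."""
    h, w = len(bitmap), len(bitmap[0])
    out = [row[:] for row in bitmap]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not bitmap[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            stack, pixels = [(r, c)], []
            while stack:  # flood fill over the 8 neighbours
                y, x = stack.pop()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and bitmap[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(pixels) < min_pixels:  # too small to be music: erase it
                for y, x in pixels:
                    out[y][x] = 0
    return out


# Hypothetical fragment: a 1-pixel stain and a 4-pixel "dot" worth keeping.
page = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
for row in remove_specks(page, min_pixels=2):
    print(row)
```

Because only tiny isolated specks are removed, most of the page's entropy (the music itself) remains. That is consistent with the modest size reduction reported above.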
Re: higher quality and less bandwidth by conversion to vector formats
Also see SmartScore (https://en.wikipedia.org/wiki/SmartScore), which converts scanned images to MIDI and MusicXML. I think the best bandwidth advice is: retypeset it!
Re: higher quality and less bandwidth by conversion to vector formats
It's unclear to me what the OP is proposing here: an implementation of an existing process, methodology, and format; a new one; or both?
Re: higher quality and less bandwidth by conversion to vector formats
I hope this idea is progressing behind the scenes. In addition to better quality and reduced bandwidth (as well as reduced storage space), conversion to a vector format could serve as a pre-processing layer for optical music recognition programs, making it easier to turn scanned scores into files compatible with music-editing software. Quite interesting, IMO.
imslp wrote:
Hi Kristof,
This sounds quite interesting and yes, funding is available, but first I'll need to know a bit more about your background and how the conversion works on a more technical level. My e-mail is eguo@imslp.org, and it may also be helpful to have a Skype call at some point. Let me know.
Thanks,
Edward
Max
Re: higher quality and less bandwidth by conversion to vector formats
Yep, this is progressing, will announce when the time comes.