Posted: Sun Nov 02, 2008 9:11 pm
by kcleung
Carolus wrote:I wonder if we should consider setting up an FTP server where unlocked score files with logos present could be stored for processing. That way, with several people working together, a considerable number of titles could be processed using some of the methods outlined above and ultimately added to the collection.
That's a great idea!!!! We should definitely set up FTP servers as soon as possible so that multiple people can work on these collections! This would also *lower* the threshold for potential contributors.

They would *not* need access to a scanner, a printer, or a score collection. All they need is internet access (preferably broadband) and a computer that is under four years old.


Also in this way, the money originally planned to be spent on buying the OM series ourselves can be better used in setting up the server and other required infrastructure.

Posted: Mon Nov 03, 2008 3:02 am
by Yagan Kiely
The PDFs on the CDs are password-protected. Is that a problem for anyone else?

What guide would you suggest for a Mac user?

Posted: Mon Nov 03, 2008 3:55 am
by ras1
Open with Preview, then do File->Print. Select PDF->Save as PDF. This unlocked it for me, on Mac OS 10.4.

Posted: Mon Nov 03, 2008 4:02 am
by Yagan Kiely
Yes, I did that too... very slow however...

Posted: Mon Nov 03, 2008 5:27 am
by Carolus
The PDF locking can be easily "picked." That much is an entirely 'automated' process with the correct software (I use PDF Key Pro on my Mac). I've already asked Feldmahler about setting up an FTP site for this purpose. The files I will be uploading will already be unlocked anyway.

The processing as described by Daphnis would take care of the embedded meta-tags, etc. in addition to the more obvious stripping of logos and trademarks. It would really be great if a similar process could be developed for the Google scans - of which there are quite a few available. The Google items have been done with a very bizarre process. Most of the pages are a nice 600dpi monochrome, but every so often a single system or sometimes a half-page appears in 150dpi grayscale as a separate graphic on the page.

Google was embedding their logo as a watermark on some of the scores (but not the later ones, it appears). Microsoft is even worse on the scans they've done for the New York Public Library. As it stands right now, the logos, extensive meta-tags, etc. embedded in their scans render them unsuitable for IMSLP.

Posted: Mon Nov 03, 2008 12:03 pm
by Lyle Neff
Can you remove meta-tags in Acrobat 6.0? If so, how?

Posted: Mon Nov 03, 2008 6:51 pm
by kcleung
Just use the method I mentioned a bit earlier in this thread (you also need to install Ghostscript for Windows) and it will take only the necessary bits and leave all the metadata behind.

The password only prevents you from changing the data (and reading metadata) but *not* printing :)

I tried one of the files on the CD and it works. It took me only 8 minutes to strip a 50-page document.

Posted: Mon Nov 03, 2008 7:50 pm
by ras1
Let me know if you're planning on setting up a server to remove logos - I have the Ravel/Elgar/etc. Violin one and no time to do it myself.

Posted: Mon Nov 03, 2008 8:27 pm
by kcleung
We should urgently talk to Feldmahler (or others in the central admin) about setting up an FTP server for all PDF files "infected" with logos. Since this server will target CDSM, which releases items that are PD in the USA, perhaps the server should also be in the USA. Contributors could then upload infected files or check out entries to strip the logos.

Posted: Mon Nov 03, 2008 8:48 pm
by Lyle Neff
kcleung wrote:Just use the method I mentioned a bit earlier in this thread (you also need to install Ghostscript for Windows) and it will take only the necessary bits and leave all the metadata behind.

The password only prevents you from changing the data (and reading metadata) but *not* printing :)

I tried one of the files on the CD and it works. It took me only 8 minutes to strip a 50-page document.
Could you list briefly the steps all in one place? (Ghostscript is installed on my computer, but I haven't used it myself in years.)

Stripping logos off PDF files in Linux, Mac and possibly win

Posted: Mon Nov 03, 2008 9:37 pm
by kcleung
Requirements:

ghostscript
an image editing tool (e.g. gimp)
tiffcp: http://www.stillhq.com/pngtools/
tiff2pdf: http://www.libtiff.org/tools.html

Under Linux, all the software mentioned above is pre-packaged, and I believe it should be readily available on the Mac. But you may have to compile it for Windows.

To perform batch jobs like this, the most cost effective way is to run Linux!!!!!

Steps:

1. Put each parts file in its own subdirectory and change to the subdirectory of one of the parts

2. run as one line:

gs -sDEVICE=tiffg4 -dNOPAUSE -r300 -dBATCH -sPAPERSIZE=a4 -sOutputFile=output_%04d.tiff foo.pdf

(foo stands for the file name.) This opens the PDF file and converts each page to a 300 dpi, 1-bit black-and-white, A4-sized TIFF image, one file per page.
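If you have a whole pile of PDFs, steps 1 and 2 can be wrapped in a small shell function. This is only a rough sketch, assuming a POSIX shell with gs on the PATH. The names rasterize_all, run and DRY_RUN are made up for this example and are not part of Ghostscript. It makes one subdirectory per PDF and renders the pages into it with the same options as above.

```shell
# Sketch only: rasterize_all, run and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# For every PDF in the given directory: create a subdirectory named after
# the file (step 1), then render one G4 TIFF per page into it (step 2).
# The body runs in a subshell, so the caller's working directory is kept.
rasterize_all() (
    cd "$1" || exit 1
    for pdf in *.pdf; do
        [ -e "$pdf" ] || continue   # no PDFs: the glob stays unexpanded
        name=${pdf%.pdf}
        run mkdir -p "$name"
        run gs -sDEVICE=tiffg4 -dNOPAUSE -dBATCH -r300 -sPAPERSIZE=a4 \
            "-sOutputFile=$name/output_%04d.tiff" "$pdf"
    done
)
```

Call it as rasterize_all . from the directory holding the PDFs. Setting DRY_RUN=1 first only prints the commands, which is a cheap way to check them before gs spends minutes on each file.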

3. Run GIMP. In the "open" window, go to the subdirectory, highlight 5-10 TIFF files at a time and click "open". This reduces the number of mouse actions required (the limiting factor for processing speed).

4. Erase the logo with the eraser and close the file. Make sure you click "save" when it asks whether you want to save.

You only have to set up the eraser once; from then on, it takes *three clicks* (including the eraser action) to process each page, cutting the processing time of each file to 5 seconds! :)

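5. At the command prompt, change to the subdirectory and concatenate all processed images by running (matching only output_*.tiff keeps an output.tiff left over from an earlier run out of the input list):
tiffcp -c g4 output_*.tiff output.tiff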

6. Finally, convert output.tiff back to PDF and write the PDF file to the parent directory with:
tiff2pdf output.tiff > ../foo.pdf
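Steps 5 and 6 can be batched the same way once the erasing is done. Again a rough sketch, assuming a POSIX shell with tiffcp and tiff2pdf (both from libtiff) on the PATH. The names reassemble_all, DRY_RUN, joined.tiff and the .clean.pdf suffix are made up for this example. It writes to a new file name so that an existing PDF in the parent directory is not overwritten.

```shell
# Sketch only: reassemble_all, DRY_RUN, joined.tiff and *.clean.pdf are
# hypothetical names chosen for this example.

# For every subdirectory that holds rendered pages: join them into one
# multi-page TIFF (step 5), then convert it to <subdir>.clean.pdf in the
# parent directory (step 6). Runs in a subshell to keep the caller's cwd.
reassemble_all() (
    cd "$1" || exit 1
    for dir in */; do
        dir=${dir%/}
        # Zero-padded names sort in page order; skip dirs without pages.
        set -- "$dir"/output_*.tiff
        [ -e "$1" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf '%s\n' "tiffcp -c g4 $* $dir/joined.tiff"
            printf '%s\n' "tiff2pdf -o $dir.clean.pdf $dir/joined.tiff"
        else
            tiffcp -c g4 "$@" "$dir/joined.tiff" &&
                tiff2pdf -o "$dir.clean.pdf" "$dir/joined.tiff"
        fi
    done
)
```

Call it as reassemble_all . from the parent directory. With DRY_RUN=1 it only prints what it would run.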

Posted: Tue Nov 04, 2008 12:29 am
by Yagan Kiely
kcleung wrote:
4. Erase the logo with the eraser and close the file. Make sure you click "save" when it asks whether you want to save.

You only have to set up the eraser once; from then on, it takes *three clicks* (including the eraser action) to process each page, cutting the processing time of each file to 5 seconds! :)
I only got GIMP 5 days ago. How would I do this?

Posted: Tue Nov 04, 2008 10:03 am
by kcleung
Yagan Kiely wrote:
4. Erase the logo with the eraser and close the file. Make sure you click "save" when it asks whether you want to save.

You only have to set up the eraser once; from then on, it takes *three clicks* (including the eraser action) to process each page, cutting the processing time of each file to 5 seconds! :)
I only got GIMP 5 days ago. How would I do this?
Assuming you use GIMP 2.6, first open the files as described previously. Once the files are open, go to the toolbox pane; the eraser is in the left column, fifth from the top. Click the eraser, set the brush to the right size, and you can start erasing stuff on the image! If you make a mistake, go to edit->undo in the menu of the image window and you are out of trouble!

After you are happy with the image, you can close the image. It will ask you whether to save, click "yes".

Posted: Tue Nov 04, 2008 12:45 pm
by Yagan Kiely
Ooh I thought there was a way to save the key/mouse strokes/clicks so as to only do something once and to apply it to each image therein.

I know how to do that! I'm not that retarded! Honestly!...

Posted: Tue Nov 04, 2008 6:46 pm
by kcleung
Yagan Kiely wrote:Ooh I thought there was a way to save the key/mouse strokes/clicks so as to only do something once and to apply it to each image therein.

I know how to do that! I'm not that retarded! Honestly!...
The trouble is that logos can be in different positions on different pages, so you really can't automate the eraser action. But once you select the eraser as the tool and set the brush size, the system remembers these settings, and for each page you just do

erase -> close -> save

three clicks :)
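The erasing itself stays manual, but a script can at least tell you which pages you have not saved yet. A rough sketch, assuming a POSIX shell with the standard cksum, grep and cut tools. The names snapshot_pages, pending_pages and .pages.cksum are made up for this example. It records a checksum per page right after the gs step and later lists the pages whose checksum has not changed.

```shell
# Sketch only: snapshot_pages, pending_pages and .pages.cksum are
# hypothetical names chosen for this example.

# Record "checksum size name" for every rendered page, right after gs.
snapshot_pages() (
    cd "$1" || exit 1
    cksum output_*.tiff > .pages.cksum
)

# List pages whose cksum line is still identical to the snapshot,
# i.e. pages that have not been saved from GIMP since rendering.
pending_pages() (
    cd "$1" || exit 1
    cksum output_*.tiff | grep -xFf .pages.cksum | cut -d' ' -f3-
)
```

Run snapshot_pages on the subdirectory once after rendering, then pending_pages on it whenever you want to see what is left. Pages that never had a logo, and so were never saved, will keep showing up as pending.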