BNF higher resolution downloads ?
Moderator: kcleung
Hi!
I'm using the BNF database for uploading scans to IMSLP, but Carolus told me that the resolution isn't good enough. Is there a way to download the files in higher quality? Perhaps it is because I don't use PDFArchitect properly?
Thanks! (Sorry for any grammar mistakes.)
-
- active poster
- Posts: 569
- Joined: Fri Aug 27, 2010 1:10 am
- notabot: 42
- notabot2: Human
- Location: the piney woods of Florida
Re: BNF higher resolution downloads ?
Hi Grisou,
Kalliwoda covered a method for doing this in this thread: Acquiring scans from Gallica/BNF.
To summarize, the PDF files from BNF give you images that are at best 90 dpi. Take the example of Gutmann's Conte de soir, Op.50, which you uploaded: the images within that PDF are approximately 1025 by 1247 pixels. The score's original height is given as 35 cm (around 13 3/4 inches), so 1247 / 13.75 ≈ 90 dpi.
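All the dpi figures in this post come from the same arithmetic: pixel height divided by page height in inches. A minimal Python sketch, assuming the 35 cm height quoted for the Gutmann score and the exact 2.54 cm-per-inch factor (so the results land slightly above the rounded figures in the text); `effective_dpi` is a hypothetical helper name:

```python
CM_PER_INCH = 2.54


def effective_dpi(pixels: int, length_cm: float) -> float:
    """Resolution of a scan whose side of `length_cm` spans `pixels` pixels."""
    return pixels / (length_cm / CM_PER_INCH)


# Image heights quoted in this thread for Gutmann's Op.50 (35 cm tall)
for label, height_px in [("BNF PDF", 1247),
                         ("zoom level 5", 2164),
                         ("zoom level 6, stitched", 4322)]:
    # prints about 90, 157 and 314 dpi respectively
    print(f"{label}: {effective_dpi(height_px, 35):.0f} dpi")
```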
With BNF's zoom viewer you can get the URL for each page of the score and modify it so that it looks like this:
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
btv1b52000301j = identifies this as Gutmann's Op.50
f1 = identifies this as page one, so changing it to f2 gives you page 2, etc.
l=5 = the current zoom level; 6 is the highest
r=0,0,2236,2236 = tells it what part of the image to display. 0,0 is the upper left corner, and 2236,2236 tells it to display 2236 pixels to the right and 2236 pixels down from that corner. 2236 is the largest number of pixels that BNF will allow to be displayed.
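The URL pattern above can be generated for every page instead of edited by hand. A minimal sketch, assuming the full form of the request is the one that appears later in this thread (`http://gallica.bnf.fr/proxy?method=R&ark=<id>.f<page>&l=<zoom>&r=<four numbers>`); `gallica_tile_url` is a hypothetical name, the four `r=` numbers are passed through unchanged, and the proxy may not answer such requests any more:

```python
def gallica_tile_url(ark_id: str, page: int, zoom: int, region: tuple) -> str:
    """Build a Gallica zoom-proxy request for one region of one page.

    ark_id -- document identifier, e.g. "btv1b52000301j"
    page   -- 1-based page number (becomes the ".f<page>" suffix)
    zoom   -- zoom level (6 is the highest)
    region -- the four numbers of the r= parameter, in URL order
    """
    r = ",".join(str(n) for n in region)
    return ("http://gallica.bnf.fr/proxy?method=R"
            f"&ark={ark_id}.f{page}&l={zoom}&r={r}")


# One zoom-level-5 request per page of the 6-page Op.50
for page in range(1, 7):
    print(gallica_tile_url("btv1b52000301j", page, 5, (0, 0, 2236, 2236)))
```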
At zoom level 5, for this particular score, you get the entire image, which is 1713 by 2164 pixels. This is around 150 dpi. Some scores will get cropped at level 5.
At zoom level 6, this http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236 will give you a cropped image, but you can download 4 images (2 columns, 2 rows) that together cover the entire page, like this:
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
http://gallica.bnf.fr/proxy?method=R&ar ... ,2236,2236
Then you can create a blank image in GIMP or Photoshop and copy/paste these images into it, butting them up against each other. The result is an image of the page that is 3422 by 4322 pixels, which is around 300 dpi. Some of the scores I have been working with would require 6 images (2 columns, 3 rows) per page. Actually, the method I use to get BNF's level 6 images requires downloading 63 images per page, and while downloading and stitching the images together is more automated that way, it isn't without its pitfalls and also requires a great deal of preparation work. For short works like Gutmann's 6-page Op.50, the manual 4-images-per-page download and stitching method isn't too burdensome. And of course the 150 dpi images are easier to get than the 300 dpi ones, and even those are a significant improvement over the 90 dpi images.
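Working out which regions to request, and where each one goes in the blank image, is simple arithmetic. A minimal sketch, assuming the full zoom-6 page size is known (3422 by 4322 for Op.50), that 2236 pixels is the request limit, and that the first two `r=` numbers are the horizontal and vertical offsets (the thread does not settle their order); `tile_regions` is a hypothetical helper that clips edge tiles to the page:

```python
def tile_regions(width: int, height: int, max_tile: int = 2236):
    """Split a width x height page into row-major tiles no larger than max_tile.

    Each tuple is (left, top, tile_width, tile_height). (left, top) is also
    where that tile must be pasted into the blank page image.
    """
    regions = []
    for top in range(0, height, max_tile):
        for left in range(0, width, max_tile):
            regions.append((left, top,
                            min(max_tile, width - left),
                            min(max_tile, height - top)))
    return regions


tiles = tile_regions(3422, 4322)
print(len(tiles), "tiles")  # 2 columns x 2 rows
for t in tiles:
    print(t)
```

A page between 4473 and 6708 pixels tall (and at most 4472 wide) would come out as 2 columns by 3 rows, i.e. 6 requests.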
Hope this proves helpful,
Cypressdome
-
- active poster
- Posts: 1558
- Joined: Thu Jul 12, 2007 8:53 am
- notabot: 42
- notabot2: Human
- Location: Nice, France
- Contact:
Re: BNF higher resolution downloads ?
The French site http://www.actualitte.com/ has many articles about BNF and digitization. To summarize them: don't expect any new high-resolution BNF scans in the next 10 years!
Re: BNF higher resolution downloads ?
Coulonnus,
As someone whose ability to read French rests on Google Translate and two years of study at a third-rate high school over twenty years ago, could you perhaps elaborate on what is happening at BNF to cause this?
Thanks,
Cypressdome
Re: BNF higher resolution downloads ?
The main article http://www.actualitte.com/bibliotheques ... -40048.htm says that some firms will continue the digitization work. You will be able to see these scans for free if you visit the BNF (I don't know what the printing conditions will be). Otherwise access will not be free, even for other major libraries around the world.
This article also contains a few comparisons, in English, with other libraries around the world.
http://www.actualitte.com/tribunes/bnf- ... n-1916.htm says that 95% of those scans won't be online for 10 years. The remaining 5% will join Gallica, the online section of BNF.
Bruno Racine, director of BNF, wrote an article in "Le Monde": http://www.lemonde.fr/idees/article/201 ... _3232.html His points: there are so many documents to scan that we need private help; we are not making public-domain material private; these scans will be something new; the profits will be used to digitize more documents.
I can't summarize all the other articles. A "BNF" search on this site gives many results!
Re: BNF higher resolution downloads ?
Hey,
I've created a script (a very crude one, indeed) to download high-quality scans from BNF by assembling them out of many tiles. It's a Python script. I used Python since it's comprehensible to me (I'm a C programmer), and if you're a LilyPond user on a Windows box (I suppose many of you are LilyPond users), Python is already installed with your LilyPond installation. If you're running a Linux box, you'll probably have it installed anyway, along with the other necessary tools. In any case, please check.
What the script does is download a bunch of JPG tiles (how each page is divided into tiles is user-defined) and then assemble them into individual high-quality pages. The tile size is set to whatever results from dividing each page into 4x4 tiles, which is the absolute (and default) minimum; you can increase the page division if you wish.
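The tiling described above can be derived from the bottom-right tile URL alone: its offsets plus its size give the full page size at that zoom, which is then cut into an N x N grid. A minimal sketch of that idea, not Nachus's actual getbnf script; `parse_tile_url` and `page_grid` are hypothetical names, and because the thread reads the first two `r=` numbers differently (Nachus calls them Y and X), the sketch keeps them in URL order as axes "a" and "b":

```python
from urllib.parse import urlparse, parse_qs


def parse_tile_url(url: str):
    """Return (ark_id, zoom, (a, b, w, h)) from a Gallica proxy tile URL."""
    query = parse_qs(urlparse(url).query)
    ark_id = query["ark"][0].rsplit(".f", 1)[0]  # drop the ".f<page>" suffix
    zoom = int(query["l"][0])
    a, b, w, h = (int(n) for n in query["r"][0].split(","))
    return ark_id, zoom, (a, b, w, h)


def page_grid(url: str, divisions: int = 4):
    """Cut the page implied by the bottom-right tile into divisions x divisions regions."""
    _, _, (a, b, w, h) = parse_tile_url(url)
    full_a, full_b = a + w, b + h        # page extent along each axis
    step_a = -(-full_a // divisions)     # ceiling division
    step_b = -(-full_b // divisions)
    return [(i, j, min(step_a, full_a - i), min(step_b, full_b - j))
            for i in range(0, full_a, step_a)
            for j in range(0, full_b, step_b)]


url = ("http://gallica.bnf.fr/proxy?method=R"
       "&ark=btv1b9009896r.f1&l=6&r=6144,4608,256,256")
grid = page_grid(url)
# prints: 16 regions per page; for 26 pages: 416
print(len(grid), "regions per page; for 26 pages:", 26 * len(grid))
```

For the 26-page example used in this thread that is 416 requests, in line with the "more than 400 tiles" reported for it.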
What you need to run it: Python, cURL, and ImageMagick, each installed with its path correctly set.
check:
http://www.python.org/
http://curl.haxx.se/
http://www.imagemagick.org
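Before running a script like this, it helps to confirm that the tools can actually be found on the PATH, and to see what the download and assembly steps boil down to. A minimal sketch, assuming ImageMagick's `montage` command is on the PATH and that all tiles have the same size (as in an even N x N division); the helper names and the `PAGE...` file names are hypothetical, and the command lists are only built here, not executed:

```python
import shutil


def missing_tools(tools=("curl", "montage")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def curl_command(url: str, out_file: str):
    # Passing the URL as a single argument keeps its "&" characters intact
    return ["curl", "-s", "-o", out_file, url]


def montage_command(tile_files, columns: int, rows: int, out_file: str):
    # -mode concatenate places equal-size tiles with no border or spacing,
    # left to right, then top to bottom, in the order the files are given
    return ["montage", "-mode", "concatenate", "-tile", f"{columns}x{rows}",
            *tile_files, out_file]


print("missing:", missing_tools() or "none")
tiles = [f"PAGE01_{n:02d}.jpg" for n in range(4)]
print(" ".join(montage_command(tiles, 2, 2, "page01.jpg")))
# Once the tools are found, each list can be run with
# subprocess.run(cmd, check=True).
```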
To make it work, you first need to create a directory for your piece. Common sense makes this mandatory, since there will be heavy file activity and the script deletes all the downloaded tiles once it has assembled the pages (it issues the command del PAGE*, so be warned).
Then open the console and cd to your piece's directory. In a web browser, go to the Gallica site and open the document of your choice. Select the maximum possible zoom and point your mouse at the far bottom-right area of the image (you may need to drag the scan to bring that part into view). Once there, right-click that image tile and choose "image properties" (or similar, depending on your browser). In Firefox a popup will appear with a link like this:
http://gallica.bnf.fr/proxy?method=R&ar ... 08,256,256
In short, this request says "Hey Gallica, put yourself in zoom 6 and give me a 256x256-pixel tile from the Y=6144 and X=4608 coordinates."
Since at some point it stopped being possible to request a 6144x4608 tile starting from coordinates 0,0 (as Cypressdome says, requests were limited to a smaller area), we need a script that automates this task for us and gives access to the high-definition zoom.
Now copy this address, note the number of pages in the document (26 in this case), and type the following at the command line (the double quotes are mandatory here, because the URL contains & characters that the console would otherwise interpret):
Code: Select all
getbnf -a "http://gallica.bnf.fr/proxy?method=R&ark=btv1b9009896r.f1&l=6&r=6144,4608,256,256" -p26 -o "cambiniduos2va2bk_"
That's it! For this document in particular, it took about 30 minutes to download and assemble everything (more than 400 tiles), and I ended up with a directory of 26 JPGs of about 4 MB each. My internet connection is very slow, so with a faster connection the download should take much less time.
Here is the script. Test it with a few pages first, and if you need more options type getbnf -h (or ask me here!)
I hope it's of some use to the community.
Cheers
Nachus
- Attachments
-
- getbnf.rar
- BNF high res document downloader
- (2.59 KiB) Downloaded 1099 times
Re: BNF higher resolution downloads ?
Hi nachus001!
If I can get this to work I will be unbelievably happy! I can only guess that I have some kind of path/environment variable issue, because when I run the script I get this error message: "python: can't open file 'getbnf.py': [Errno 2] No such file or directory". I'm running Windows 7 and have run Python scripts in the past to grab images from HathiTrust and images displayed using Zoomify, but in both cases they were run from Python's own directory. I've got Python, ImageMagick, and curl in my %PATH% environment variable, have PYTHONPATH set, and have the appropriate registry entries under the subkey HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\3.2\PythonPath. Any ideas?
Thanks,
Cypressdome
Re: BNF higher resolution downloads ?
Hi Cypressdome,
I have a Windows XP box, and what I did was create a directory "getbnf" and add it to the path: %PATH%;D:\getbnf
Once I did this, I can just type "getbnf" anywhere and the script runs. Maybe it's better to copy the script to the Python executables directory, or to any other directory you have in the path. On the other hand, as far as I know, Windows 7 manages the path quite differently from Windows XP.
regards
Nachus
Re: BNF higher resolution downloads ?
cypressdome:
I found this for Win7:
http://geekswithblogs.net/renso/archive ... ows-7.aspx
They don't do anything with the registry.
regards
Nachus
Re: BNF higher resolution downloads ?
Thanks Nachus!
That is where I've got the path variable set. I've now downloaded and installed Python 3.3 (I had been using 3.2) and have c:\python33 listed in my path. Here's the message I get when running the script now:
Code: Select all
G:\bnf>python getbnf.py -a "http://gallica.bnf.fr/proxy?method=R&ark=btv1b525007218.f1&l=6&r=5632,4352,256,256" -p2 -o "septmel"
File "getbnf.py", line 70
print 'write getbnf -h for further information!'
^
SyntaxError: invalid syntax
At least it seems to have progressed beyond not being able to find the script. I also tried it on an old XP machine that still had Python 3.2 on it, and it gave me the same "no such file or directory" message. That's when I went back to Win7 and installed Python 3.3.
Thanks,
Cypressdome
Re: BNF higher resolution downloads ?
Cypressdome:
You don't need to call python with the script as an argument.
The script will run on its own, as its first line is #!/usr/bin/python (on Windows, this relies on .py files being associated with Python).
Write this at the console prompt:
Code: Select all
getbnf_p3 -a "http://gallica.bnf.fr/proxy?method=R&ark=btv1b525007218.f1&l=6&r=5632,4352,256,256" -p2 -o "septmel"
As for the error: I have Python 2.4.5 (the version installed with LilyPond), and the script works for me. Python 3.x uses the print() function instead of the print ' ' statement, so occurrences of the print ' ' statement in the code no longer work. I have therefore modified the script for Python 3.x and up to use the print function. It also works for me (Python 2.4.5): I downloaded two pages of 4 MB each from the address in your prompt.
Tell me if there is any problem.
Cheers
Nachus
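The SyntaxError above is the Python 2 print statement, which Python 3 rejects. A minimal illustration, using the message from line 70 of getbnf.py (the other lines and the `parses_as_python3` helper are hypothetical): with exactly one argument, the parenthesized form runs under both Python 2 and Python 3, which fits the modified script still working on Python 2.4.5.

```python
# Python 2 only; a SyntaxError under Python 3:
#     print 'write getbnf -h for further information!'
# Works under Python 3, and under Python 2 as long as there is exactly one
# argument (Python 2 just sees a parenthesized expression):
print('write getbnf -h for further information!')

# With several arguments Python 2 would print a tuple, so format a single
# string first if the script must run on both versions:
pages = 26
print('downloading %d pages' % pages)


def parses_as_python3(source: str) -> bool:
    """True if `source` compiles under the running (Python 3) interpreter."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False
```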
- Attachments
-
- getbnf_p3.rar
- BNF high res document downloader. Modified for Python 3.x boxes
- (2.6 KiB) Downloaded 978 times
Re: BNF higher resolution downloads ?
Nachus,
You are my new hero! That updated script worked like a charm. The only change I had to make was to add the ".py" extension to "getbnf_p3" on the command line (probably an issue with my Windows and/or various Python installations). Over a 12 Mbps connection about a third of the way around the world from France, it took about 14 minutes to download the images and stitch together 29 pages.
Many, many thanks!
Cypressdome
Re: BNF higher resolution downloads ?
Nachus,
Would there be some way to modify the script so that the page numbers in the output file names include leading zeros? I've started converting a 121-page score to black and white, and when the system sorts the file names alphabetically the result is Filename_1, 10, 100, 101, ..., 109, 11, 110, etc. Even with shorter scores you have to deal with 1, 10, 11, 12, ..., 19, 2, 20, 21, etc. If not, it certainly isn't a terrible burden to live with on my end.
Thanks again,
Cypressdome
Re: BNF higher resolution downloads ?
I was bothered too by the odd ordering that appears in IrfanView (because of the missing leading zeros). I have now changed the output page numbering to five digits with leading zeros, in order to cover any document size. Here it is:
- Attachments
-
- getbnf_p3.rar
- BNF high res document downloader. Modified for Python 3.x boxes. Output leading zeroes corrected
- (2.62 KiB) Downloaded 1090 times
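Zero-padding works because a plain character-by-character sort then matches numeric order. A minimal sketch of the five-digit scheme described above; `page_filename`, the "septmel" prefix and the .jpg extension are assumptions for illustration, not taken from the attached script:

```python
def page_filename(prefix: str, page: int, width: int = 5) -> str:
    """Name for an output page, zero-padded so that names sort numerically."""
    return "%s%0*d.jpg" % (prefix, width, page)


pages = (1, 2, 10, 11, 100)
unpadded = sorted("septmel%d.jpg" % n for n in pages)
padded = sorted(page_filename("septmel", n) for n in pages)
# ['septmel1.jpg', 'septmel10.jpg', 'septmel100.jpg', 'septmel11.jpg', 'septmel2.jpg']
print(unpadded)
# septmel00001.jpg, septmel00002.jpg, ... septmel00100.jpg (numeric order)
print(padded)
```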
Re: BNF higher resolution downloads ?
Thanks, Nachus, that worked perfectly! Now to start transferring the many Chopin first editions they have!
Cypressdome