High resolution jpgs
Moderator: kcleung
-
goldberg988
- active poster
- Posts: 105
- Joined: Sun Dec 31, 2006 5:51 am
- notabot: YES
- notabot2: Bot
- Location: Tulsa, OK, USA
High resolution jpgs
I am looking at some Andre Caplet manuscripts on BnF/Gallica. I can download jpgs using the multi scan downloader; they are about 7MB per page. I can convert them to PDF using GIMP (probably not the best method), and then it's 10+MB per page. The jpgs are greyscale/monochrome and high-resolution. What is the best procedure here? Just live with huge files, since it's a manuscript? Convert to monochrome (I feel a lot of detail gets lost, since they are manuscripts)? Or is there a better way to convert greyscale to PDF that reduces file size without losing too much detail? Thanks
-
coulonnus
- active poster
- Posts: 1630
- Joined: Thu Jul 12, 2007 8:53 am
- notabot: 42
- notabot2: Human
- Location: Nice, France
- Contact:
Re: High resolution jpgs
Please name a specific work by Caplet and I will see what I can do.
-
goldberg988
- active poster
- Posts: 105
- Joined: Sun Dec 31, 2006 5:51 am
- notabot: YES
- notabot2: Bot
- Location: Tulsa, OK, USA
Re: High resolution jpgs
Here's one. I'd be happy if you can help, but I would still like to know how to do the process myself. Thanks
https://gallica.bnf.fr/ark:/12148/btv1b100718419/#
-
coulonnus
- active poster
- Posts: 1630
- Joined: Thu Jul 12, 2007 8:53 am
- notabot: 42
- notabot2: Human
- Location: Nice, France
- Contact:
Re: High resolution jpgs
Look at https://imslp.org/wiki/Paroles_%C3%A0_l ... ndr%C3%A9) when it becomes available. I'll provide technical details if you like it.
-
goldberg988
- active poster
- Posts: 105
- Joined: Sun Dec 31, 2006 5:51 am
- notabot: YES
- notabot2: Bot
- Location: Tulsa, OK, USA
Re: High resolution jpgs
Yes, I would love the technical details! Thanks
-
coulonnus
- active poster
- Posts: 1630
- Joined: Thu Jul 12, 2007 8:53 am
- notabot: 42
- notabot2: Human
- Location: Nice, France
- Contact:
Re: High resolution jpgs
Well, several posts will be necessary! Be ready to work with the command prompt language, which I may no longer call DOS. From https://gallica.bnf.fr/ark:/12148/btv1b100718419/ click "En savoir plus".
Store the line "Identifiant : ark:/12148/btv1b100718419" somewhere.
If there are 5 images or fewer, create a batch file called curlgallica.bat containing
curl https://gallica.bnf.fr//iiif/ark:/12148/bpt6k10291515/f[1-5]/full/full/0/native.jpg -o q#1.jpg
replacing the bpt6k10291515 identifier with the one from your Gallica page (here btv1b100718419).
Type curlgallica.bat in the command prompt window. This will create the first 5 images of this score.
But if there are more than 5 images, the Gallica server is exasperated by your requests and corrupts the images above No. 5.
A workaround is to set a 10-second delay between images. Create this batch file:
set lienbnf=<your bpt number>
@FOR /L %%A IN (1,1,%1) DO (curl https://gallica.bnf.fr//iiif/ark:/12148/%lienbnf%/f[%%A-%%A]/full/full/0/native.jpg -o "p#1.jpg"
timeout 10)
replacing <your bpt number> with the identifier you stored and %1 with the number of images (or pass that number as the first argument when you run the batch file).
This creates a set of .jpg images named p1.jpg, p2.jpg, etc., each about 3 MB in size.
I'll continue when you succeed at this stage.
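For readers who prefer Python to batch files, the delayed download described above might be sketched roughly as follows. This is only an illustration, not coulonnus's procedure: the function names (page_urls, fetch, download_pages) are hypothetical, the URL follows the same pattern as the curl line but with a single slash before "iiif", and the 10-second pause is the value suggested in the post rather than a documented Gallica limit.

```python
import time
import urllib.request
from typing import Callable, List

# Same URL pattern as the curl command in the post (assumption: a single
# slash before "iiif" behaves the same as the double slash used there).
IIIF_TEMPLATE = (
    "https://gallica.bnf.fr/iiif/ark:/12148/{ark_id}/f{page}/full/full/0/native.jpg"
)


def page_urls(ark_id: str, n_pages: int) -> List[str]:
    """Build one full-resolution image URL per page, numbered from 1."""
    return [
        IIIF_TEMPLATE.format(ark_id=ark_id, page=page)
        for page in range(1, n_pages + 1)
    ]


def fetch(url: str) -> bytes:
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_pages(
    ark_id: str,
    n_pages: int,
    delay_s: float = 10.0,
    fetcher: Callable[[str], bytes] = fetch,
) -> List[str]:
    """Save pages as p1.jpg, p2.jpg, ... and pause between requests."""
    saved = []
    for page, url in enumerate(page_urls(ark_id, n_pages), start=1):
        name = f"p{page}.jpg"
        with open(name, "wb") as out:
            out.write(fetcher(url))
        saved.append(name)
        if page < n_pages:
            # Wait before the next request, like "timeout 10" in the batch
            # file (the batch file also waits after the last image).
            time.sleep(delay_s)
    return saved
```

Calling download_pages("btv1b100718419", 12) would, under these assumptions, write p1.jpg through p12.jpg into the current directory; the page count 12 is just a placeholder.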
-
coulonnus
- active poster
- Posts: 1630
- Joined: Thu Jul 12, 2007 8:53 am
- notabot: 42
- notabot2: Human
- Location: Nice, France
- Contact:
Re: High resolution jpgs
Each jpg should be about 4000x5000 pixels.
-
coulonnus
- active poster
- Posts: 1630
- Joined: Thu Jul 12, 2007 8:53 am
- notabot: 42
- notabot2: Human
- Location: Nice, France
- Contact:
Re: High resolution jpgs
Then an important step is finding the right threshold for converting the jpg images to monochrome. Select an image that has many 16th notes and many natural and sharp signs.
Run this batch: @FOR /L %%A IN (10,5,95) DO magick p%1.jpg -compress group4 -threshold %%A%% p%%A%1.tif
dir/on p??%1.tif
passing the image number as the argument.
This will produce a set of monochrome tif files with conversion thresholds of 10%, 15%, etc. The first 2 digits after the "p" in each file name are the threshold. Examine the tif with a 50% threshold. Are the tiny vertical lines of a natural sign visible? Are the gaps between the beams filled in too much? Find the tif with the best result and remember its threshold. Often it is the one where the tif size begins to increase in the list.
Then read https://imslp.org/wiki/IMSLP_talk:Scanning_music_scores. Choose a procedure and read what you need to install. My favorite procedure is No. 5.
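As a rough Python equivalent of the threshold sweep above, the sketch below builds the same magick command lines (thresholds 10% to 95% in steps of 5) and adds one possible reading of the "size begins to increase" hint. The function names are hypothetical, and first_size_increase is only an interpretation of that rule of thumb; it does not replace looking at the natural signs and beams yourself.

```python
from typing import List, Optional, Sequence, Tuple


def sweep_commands(
    page: int, start: int = 10, stop: int = 95, step: int = 5
) -> List[List[str]]:
    """Build one ImageMagick command per threshold, mirroring the batch loop.

    Output files are named p<threshold><page>.tif, as in the batch file.
    """
    commands = []
    for threshold in range(start, stop + 1, step):
        commands.append([
            "magick", f"p{page}.jpg",
            "-compress", "group4",
            "-threshold", f"{threshold}%",
            f"p{threshold}{page}.tif",
        ])
    return commands


def first_size_increase(sizes: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Return the first threshold whose file is larger than the previous one.

    `sizes` holds (threshold, file size in bytes) pairs in ascending
    threshold order. Returns None if the size never increases.
    """
    for (_, prev_size), (threshold, size) in zip(sizes, sizes[1:]):
        if size > prev_size:
            return threshold
    return None


# To actually run the sweep (requires ImageMagick on the PATH):
#   import subprocess
#   for cmd in sweep_commands(3):
#       subprocess.run(cmd, check=True)
```

The file sizes could be collected with os.path.getsize on each generated tif; the threshold this returns is only a starting candidate for visual inspection.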
Re: High resolution jpgs
Missing from that page (https://imslp.org/wiki/IMSLP_talk:Scanning_music_scores) is any mention of ScanTailor Advanced (https://github.com/ScanTailor-Advanced/ ... r-advanced), which is the primary program I use for all of my contributions.
There are a lot of steps involved in using ScanTailor Advanced, but once you understand the nuances and quirks, then it can do most of the processing interactively (rotating, splitting, deskewing, cropping, margin configuration, many options for thresholding, etc.). After creating the processed output images (either black/white or color TIFFs), I feed those images to the tool "img2pdf" (https://pypi.org/project/img2pdf/) to create the final PDF.
Specifically for a Gallica manuscript scan like this, I personally like to keep the JPGs. First, I download the highest-resolution JPGs (for Gallica, usually 300-400DPI), process them in ScanTailor Advanced (deskew, setting page boxes with identical sizes, no margins, finalize as color TIFFs possibly at the lower DPI). Then to avoid img2pdf converting the color TIFFs into PNGs, I convert them back to JPGs first (with a reasonable quality level between 75-90 or so). Finally I feed the JPGs to img2pdf.
I realize this explanation doesn't provide a step-by-step workflow (from image download, to conversion, etc.) like coulonnus is offering, but I hope this gives you an idea about alternate steps, since it's different than any of the 6 procedures listed on that page.
Edit: Added link to img2pdf project page, since there's an unrelated website with the same name
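The final img2pdf step described above can also be driven from Python, since img2pdf is a Python package. The sketch below is an assumption-laden illustration rather than the workflow actually used in this post: it assumes the JPGs are named p1.jpg, p2.jpg, etc. (the naming from the earlier download step, not necessarily what ScanTailor Advanced writes), and the helper names are hypothetical. The explicit sort matters because plain alphabetical order would put p10.jpg before p2.jpg.

```python
import re
from typing import Iterable, List


def page_number(name: str) -> int:
    """Extract the number just before the file extension, e.g. 12 from p12.jpg."""
    match = re.search(r"(\d+)(?=\.[^.]+$)", name)
    if match is None:
        raise ValueError(f"no page number found in {name!r}")
    return int(match.group(1))


def in_page_order(names: Iterable[str]) -> List[str]:
    """Sort file names numerically by page, so p2.jpg comes before p10.jpg."""
    return sorted(names, key=page_number)


def build_pdf(jpg_names: Iterable[str], pdf_name: str) -> None:
    """Write all pages into one PDF with img2pdf."""
    import img2pdf  # third-party: pip install img2pdf

    with open(pdf_name, "wb") as out:
        out.write(img2pdf.convert(in_page_order(jpg_names)))
```

For example, build_pdf(["p1.jpg", "p2.jpg", "p10.jpg"], "caplet.pdf") would produce a three-page PDF in page order; the output name is a placeholder.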