High resolution jpgs

Moderator: kcleung

goldberg988
active poster
Posts: 105
Joined: Sun Dec 31, 2006 5:51 am
Location: Tulsa, OK, USA

High resolution jpgs

Post by goldberg988 »

I am looking at some André Caplet manuscripts on BnF/Gallica. I can download jpgs using the multi scan downloader; they are about 7MB per page. I can convert them to PDF using GIMP (probably not the best method), and then it's 10+MB per page. The jpgs are greyscale/monochrome and high-resolution. What is the best procedure here? Just live with huge files, since it's a manuscript? Convert to monochrome (I feel a lot of detail gets lost, since they are manuscripts)? Or is there a better way to convert greyscale to PDF that reduces file size without losing too much detail? Thanks
coulonnus
active poster
Posts: 1630
Joined: Thu Jul 12, 2007 8:53 am
Location: Nice, France

Re: High resolution jpgs

Post by coulonnus »

Please name a specific work by Caplet and I will see what I can do.

Re: High resolution jpgs

Post by goldberg988 »

Here's one. Happy if you can help, but I would also still like to know how to do the process myself. Thanks

https://gallica.bnf.fr/ark:/12148/btv1b100718419/#

Re: High resolution jpgs

Post by coulonnus »

Look at https://imslp.org/wiki/Paroles_%C3%A0_l ... ndr%C3%A9) when it becomes available. I'll provide technical details if you like it.

Re: High resolution jpgs

Post by goldberg988 »

Yes, I would love the technical details! Thanks

Re: High resolution jpgs

Post by coulonnus »

Well, several posts will be necessary! Be ready to work with the command prompt language, which I may no longer call DOS. From https://gallica.bnf.fr/ark:/12148/btv1b100718419/ click "En savoir plus".
Store the line "Identifiant : ark:/12148/btv1b100718419" somewhere.

If there are 5 images or fewer, create a batch file called curlgallica.bat containing
curl https://gallica.bnf.fr//iiif/ark:/12148/bpt6k10291515/f[1-5]/full/full/0/native.jpg -o q#1.jpg
replacing the identifier bpt6k10291515 with that of the present Gallica page (here btv1b100718419).

Type curlgallica.bat into the command prompt window. This will download the first 5 images of the score.

But if there are more than 5 images, the Gallica server gets exasperated by your requests and corrupts the images after No. 5.

A workaround is to set a 10-second delay between the images. Create this batch file:

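rem Put the identifier from the Gallica page here, e.g. btv1b100718419
set lienbnf=<your identifier>
rem Download pages 1 to %1 one at a time, waiting 10 seconds after each
@FOR /L %%A IN (1,1,%1) DO (curl https://gallica.bnf.fr//iiif/ark:/12148/%lienbnf%/f[%%A-%%A]/full/full/0/native.jpg -o "p#1.jpg"
timeout 10)

replacing %1 with the number of images, or passing that number as the first argument of the batch file.

This creates a set of .jpg images named p1.jpg, p2.jpg, etc., each one about 3 MB in size.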
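
For anyone who prefers Python to batch files, here is a rough sketch of the same idea (one request per page, with a pause in between). The URL pattern is copied verbatim from the batch file above; the function names, the `fetch`/`sleep` parameters and the 10-second default are just illustrative assumptions, and I have not checked how the Gallica server reacts to Python's default client.

```python
import time
import urllib.request

# URL pattern copied verbatim from the batch file in this thread.
IIIF_TEMPLATE = (
    "https://gallica.bnf.fr//iiif/ark:/12148/"
    "{ark_id}/f{page}/full/full/0/native.jpg"
)


def page_url(ark_id, page):
    """Build the full-size image URL for one page (hypothetical helper)."""
    return IIIF_TEMPLATE.format(ark_id=ark_id, page=page)


def download_pages(ark_id, page_count, delay_seconds=10.0,
                   fetch=urllib.request.urlretrieve, sleep=time.sleep):
    """Save pages 1..page_count as p1.jpg, p2.jpg, ... in the current folder.

    A pause of delay_seconds is inserted between requests, mirroring the
    'timeout 10' in the batch file. fetch and sleep can be swapped out,
    for example to try the function without touching the network.
    """
    saved = []
    for page in range(1, page_count + 1):
        target = f"p{page}.jpg"
        fetch(page_url(ark_id, page), target)
        saved.append(target)
        if page < page_count:
            sleep(delay_seconds)  # no need to wait after the last page
    return saved


# Example call (commented out so nothing is downloaded by accident):
# download_pages("btv1b100718419", 12)
```

The file names match those produced by the batch file (p1.jpg, p2.jpg, ...), so the later steps should work the same way with either method.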

I'll continue when you succeed at this stage.

Re: High resolution jpgs

Post by coulonnus »

Each jpg should measure about 4000x5000 pixels.

Re: High resolution jpgs

Post by coulonnus »

Then an important step is finding the right threshold to convert the jpg images to monochrome. Select an image which has many 16th notes and natural and sharp signs.

Run this batch, passing the image number as its argument:

@FOR /L %%A IN (10,5,95) DO magick p%1.jpg -compress group4 -threshold %%A%% p%%A%1.tif
dir /on p??%1.tif

This will produce a set of monochrome tif files with conversion thresholds of 10%, 15%, etc. The first 2 digits of each file name give the threshold. Examine the tif with the 50% threshold. Are the tiny vertical lines of a natural sign visible? Are the gaps between the beams filled in too much? Find the tif with the best result and remember its threshold. Often it is the one where the tif size begins to increase in the list.
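
The same threshold sweep can be sketched in Python, in case the batch quoting with %% is a nuisance. It uses exactly the magick options from the batch line above and the same output names (threshold first, then image number); the function names are made up for illustration, and it assumes `magick` is on your PATH.

```python
import os
import subprocess


def threshold_commands(page, start=10, stop=95, step=5):
    """Return one magick command per threshold, as argument lists.

    Output files are named p<threshold><page>.tif, like in the batch file.
    """
    commands = []
    for pct in range(start, stop + 1, step):
        commands.append([
            "magick", f"p{page}.jpg",
            "-compress", "group4",
            "-threshold", f"{pct}%",
            f"p{pct}{page}.tif",
        ])
    return commands


def run_sweep(page):
    """Run the sweep, then print each tif with its size in bytes."""
    for cmd in threshold_commands(page):
        subprocess.run(cmd, check=True)  # stops at the first failing command
        output = cmd[-1]
        print(output, os.path.getsize(output))


# Example call for image No. 3 (requires ImageMagick and p3.jpg):
# run_sweep(3)
```

The printed sizes play the role of the `dir /on` listing; judging which threshold looks best still has to be done by eye.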

Then read https://imslp.org/wiki/IMSLP_talk:Scanning_music_scores. Choose a procedure and check what you need to install. My favorite procedure is No. 5.
Pnorcks
regular poster
Posts: 22
Joined: Tue Jun 27, 2023 4:29 am

Re: High resolution jpgs

Post by Pnorcks »

Missing from that page (https://imslp.org/wiki/IMSLP_talk:Scanning_music_scores) is any mention of ScanTailor Advanced (https://github.com/ScanTailor-Advanced/ ... r-advanced), which is the primary program I use for all of my contributions.

There are a lot of steps involved in using ScanTailor Advanced, but once you understand the nuances and quirks, then it can do most of the processing interactively (rotating, splitting, deskewing, cropping, margin configuration, many options for thresholding, etc.). After creating the processed output images (either black/white or color TIFFs), I feed those images to the tool "img2pdf" (https://pypi.org/project/img2pdf/) to create the final PDF.

Specifically for a Gallica manuscript scan like this, I personally like to keep the JPGs. First, I download the highest-resolution JPGs (for Gallica, usually 300-400DPI), process them in ScanTailor Advanced (deskew, setting page boxes with identical sizes, no margins, finalize as color TIFFs possibly at the lower DPI). Then to avoid img2pdf converting the color TIFFs into PNGs, I convert them back to JPGs first (with a reasonable quality level between 75-90 or so). Finally I feed the JPGs to img2pdf.
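
As a rough illustration of that last step, img2pdf can also be called from Python instead of the command line. This is only a sketch: the folder layout and function names are assumptions for the example, and the numeric sort is there because a plain alphabetical sort would put p10.jpg before p2.jpg.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that orders embedded numbers numerically (p2 before p10)."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def build_pdf(image_dir, output_pdf):
    """Combine all .jpg files in image_dir into one PDF, in page order.

    img2pdf embeds JPEG data as-is, so the pages are not recompressed.
    """
    import img2pdf  # third-party package: pip install img2pdf

    pages = sorted(Path(image_dir).glob("*.jpg"),
                   key=lambda p: natural_key(p.name))
    with open(output_pdf, "wb") as handle:
        handle.write(img2pdf.convert([str(p) for p in pages]))
    return pages


# Example call (hypothetical folder and file names):
# build_pdf("out", "caplet.pdf")
```

Because the JPGs go into the PDF unchanged, the final file size is essentially the sum of the JPG sizes, which is why the quality level chosen when converting back to JPG matters.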

I realize this explanation doesn't provide a step-by-step workflow (from image download, to conversion, etc.) like coulonnus is offering, but I hope this gives you an idea about alternate steps, since it's different than any of the 6 procedures listed on that page.

Edit: Added link to img2pdf project page, since there's an unrelated website with the same name