[BUG] Faulty IMSLP/WIMA redirections

Moderators: kcleung, Wiki Admins

reccmo
active poster
Posts: 269
Joined: Mon Aug 08, 2011 8:54 am
Location: Aarhus, Denmark

Post by reccmo »

On my talk page, WIMA project participant Jeko89 reports errors with some IMSLP/WIMA redirections:
I've completed, after many uploads, the transfer of all files by Simone Stella and Fritz Brodersen.
I've tried to check whether the automatic redirects from WIMA to IMSLP work for all files... but they don't!!
Some examples?
For Brodersen:
It looks like random behaviour to me. It surely isn't, but... what is the cause? What have I done wrong?
The answer to Jeko89's final question is that he did nothing wrong. I've noticed several such errors myself, and Jeko89's message prompted me to raise the problem in this forum. I guess there are one or more subtle bugs in the web script that generates the WIMA upload log file.
imslp
Site Admin
Posts: 1642
Joined: Thu Jan 01, 1970 12:00 am

Re: [BUG] Faulty IMSLP/WIMA redirections

Post by imslp »

Actually all of those links look fine to me in Firefox (they redirect to the right file). I'm guessing that this might be a browser issue. Can you send me some screenshots?

Also, could you copy and paste the actual links that are not working, so that I can be sure we are talking about the same thing.
jeko89
Posts: 5
Joined: Tue Oct 04, 2011 12:23 pm

Post by jeko89 »

Hello, I raised this problem yesterday.
The problem isn't solved: try the Missa da pacem (uploaded by Brodersen) at http://icking-music-archive.org/ByComposer/Despres.php :
- the Credo file is correctly linked to IMSLP
- the Credo: Cantus part file isn't linked to IMSLP
It seems to me that the problem isn't related to the URLs, because they're quite similar.
It's a challenging bug!
P.S.: This has happened to many of the files I uploaded (~5% of the total).

Cheers, Giacomo.

Post by reccmo »

I started out by preparing a Perl script that searches the WIMA/IMSLP transfer log file for non-working redirection URLs. The script extracts the IMSLP URLs from the log file and tests their validity by calling one of the *nix utilities 'wget' or 'curl'.

This script has turned out to be no good at all: the IP address of my home PC got banned for 'ripping'. For example, if I attempt to access one of the redirection URLs, 'http://imslp.org/wiki/Special:ReverseLookup/141168', I get this error message:
You have reached this message because the site ripping ban script has been triggered. Site ripping is forbidden; repeated offenders will be banned indefinitely ...
Which utility can I use to check whether URLs work when I loop through the transfer log file?

Post by imslp »

Before I answer anything else, I first want to make sure I understand exactly what the problem is. Are the uploads not registering in wimaredirects.txt? Is Special:ReverseLookup not giving the correct destination? Or is there some other problem? I cannot give a response unless I first know what the problem is (and I cannot seem to replicate it from the details given in this thread).
Choralia
Site Admin
Posts: 766
Joined: Fri Aug 28, 2009 9:08 pm

Post by Choralia »

imslp wrote:Are the uploads not registering in wimaredirects.txt?
I think this is the case. I looked for the "Cantus" file of the "Credo" of "Missa da pacem", as suggested by Reccmo. Specifically, I searched inside wimaredirects.txt for the WIMA filename "3_2_Credo_Cantus.pdf", with no success. So it seems that this upload was not registered in wimaredirects.txt for some reason. I also searched for it inside the .htaccess file on the WIMA website, and naturally it was not present there either.

Max
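
The lookup Max describes can be done offline against a saved copy of the log, without sending any requests to IMSLP. Below is a minimal Python sketch, assuming each log line has the form "Redirect <WIMA path> <IMSLP URL>" like the entries quoted later in this thread. The function name find_redirects and the sample lines are made up for illustration and are not part of either site.

```python
def find_redirects(log_lines, wima_filename):
    """Return (wima_path, imslp_url) pairs whose WIMA path ends with wima_filename."""
    matches = []
    for line in log_lines:
        fields = line.split()
        # Assumed layout: "Redirect <WIMA path> <IMSLP URL>"; skip anything else.
        if len(fields) != 3 or fields[0] != "Redirect":
            continue
        _, wima_path, imslp_url = fields
        # Compare only the last path component, i.e. the file name.
        if wima_path.rsplit("/", 1)[-1] == wima_filename:
            matches.append((wima_path, imslp_url))
    return matches


# Hypothetical sample lines; a real run would read a saved copy of wimaredirects.txt.
sample_log = [
    "Redirect /scores/example/3_1_Credo.pdf http://imslp.org/wiki/Special:ReverseLookup/1",
    "Redirect /scores/example/other.pdf http://imslp.org/wiki/Special:ReverseLookup/2",
]

print(find_redirects(sample_log, "3_1_Credo.pdf"))
print(find_redirects(sample_log, "3_2_Credo_Cantus.pdf"))  # no match in the sample
```

An empty result for a file that was uploaded successfully would point to the "not registered" case rather than to a broken IMSLP URL.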

Post by reccmo »

Choralia wrote:
imslp wrote:Are the uploads not registering in wimaredirects.txt?
I think this is the case. I looked for the "Cantus" file of the "Credo" of "Missa da pacem", as suggested by Reccmo. Specifically, I searched inside wimaredirects.txt for the WIMA filename "3_2_Credo_Cantus.pdf", with no success. So it seems that this upload was not registered in wimaredirects.txt for some reason. I also searched for it inside the .htaccess file on the WIMA website, and naturally it was not present there either.
Max
Max outlines one type of error case. Another type is IMSLP URLs that are recorded in wimaredirects.txt but do not work. When a user accesses a WIMA file that is redirected to one of those non-working IMSLP URLs, she'll get one of those bl... 404 errors.

I've tried once again, this time from the WIMA server, to test systematically for invalid redirection URLs. However, with my Perl script in test mode (it stops at the 10th line of wimaredirects.txt) I still get the 'anti ripper' error message:
You have reached this message because the site ripping ban script has been triggered. Site ripping is forbidden; repeated offenders will be banned indefinitely.
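The logic of the Perl script is really simple:

Code:

use strict;
use warnings;

my $infile  = "wimaredirects.txt";
my $outfile = "wimaredirect_errs.txt";

open(my $in,  '<', $infile)  or die "Can't open $infile: $!\n";
open(my $out, '>', $outfile) or die "Can't open $outfile: $!\n";

my $i = 0;
while (my $line = <$in>) {
  chomp $line;
  # Each log line reads: Redirect <WIMA path> <IMSLP URL>
  my ($skip, $wima, $imslp) = split ' ', $line;
  next unless defined $imslp;
  # --fail makes curl exit non-zero on HTTP errors such as 404
  my @args = ("curl", "--fail", "--silent", "--output", "/dev/null", $imslp);
  # On failure, record the WIMA path, the IMSLP URL and curl's exit code
  system(@args) == 0
    or print {$out} "$wima\n$imslp\n", $? >> 8, "\n\n";
  last if ++$i >= 10;  # test mode: stop after the 10th line
}
close($in);
close($out);
So I'm afraid I have no chance to investigate the problem systematically :-(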

Post by imslp »

Regarding entries missing from wimaredirects.txt, I have found a bug where, if the submitter makes a mistake during submission and the submission page shows an error at the top, wimaredirects.txt will not be updated even if a subsequent submission is successful. This bug is fixable but not easily, so I'm thinking of writing an alternate redirect function that can take entire WIMA URLs and redirect them to the right place (if submitted; if not, it bounces back to WIMA).

Regarding 404 errors, please provide me with a sample (or a few) of such URLs so that I can figure out what is wrong.

Regarding the script, I would prefer to resolve the problem using other methods. The site ripping ban script is there precisely to prevent such bot access to IMSLP so that the server does not get overloaded for everyone else.

Post by reccmo »

imslp wrote:Regarding entries missing from wimaredirects.txt, I have found a bug where, if the submitter makes a mistake during submission and the submission page shows an error at the top, wimaredirects.txt will not be updated even if a subsequent submission is successful. This bug is fixable but not easily, so I'm thinking of writing an alternate redirect function that can take entire WIMA URLs and redirect them to the right place (if submitted; if not, it bounces back to WIMA).

Regarding 404 errors, please provide me a sample (or few) of such URLs so that I can figure out what is wrong.
'Redirect /scores/haendel/H312/satz_1-02_Oboe-1.pdf http://imslp.org/wiki/Special:ReverseLookup/136565'
'Redirect /scores/c.raehs/Facsimiles/XM55/13.pdf http://imslp.org/wiki/Special:ReverseLookup/141574'
imslp wrote:Regarding the script, I would prefer to resolve the problem using other methods. The site ripping ban script is there precisely to prevent such bot access to IMSLP so that the server does not get overloaded for everyone else.

Post by imslp »

reccmo wrote:'Redirect /scores/haendel/H312/satz_1-02_Oboe-1.pdf http://imslp.org/wiki/Special:ReverseLookup/136565'
This batch of files was removed from the page by the uploader himself because of duplication. (See http://imslp.org/index.php?title=Concer ... did=860041 ) The normal procedure is to delete the file itself afterwards, but in this case this was not done since the uploader himself removed the files. In any case, there is really nothing I can do except bounce this file back to WIMA (which is what will happen in the new redirection system).
'Redirect /scores/c.raehs/Facsimiles/XM55/13.pdf http://imslp.org/wiki/Special:ReverseLookup/141574'
This batch of files was also removed because of duplication. (See http://imslp.org/index.php?title=Concer ... did=867907 ) Here the normal procedure was followed and the file was afterwards deleted entirely from the server (hence the different error message). In the new redirection system the file will of course be bounced back to WIMA.

Note that the only thing wimaredirects.txt does is add an entry when a file is submitted; it does nothing else after that. (And hence the new redirection system.)

Post by imslp »

You can now use http://imslp.org/index.php?title=Specia ... <urlstring>

<urlstring> is the same as the WIMA url in wimaredirects.txt (i.e. without http://icking-music-archive.org)

Post by reccmo »

imslp wrote:You can now use http://imslp.org/index.php?title=Specia ... <urlstring>

<urlstring> is the same as the WIMA url in wimaredirects.txt (i.e. without http://icking-music-archive.org)
Do you plan to apply that format in 'http://imslp.org/wimaredirects.txt'?

Post by imslp »

No, because that would defeat the purpose of the new page. There are flaws in wimaredirects.txt that are not fixable, so the new URL is designed to be a generic redirect URL handling all WIMA PDF files. Whatever IMSLP has is redirected to IMSLP; whatever IMSLP does not have is redirected back to WIMA. You can keep using wimaredirects.txt until the WIMA collection is fully transferred, but after that wimaredirects.txt should probably be deprecated in favor of this new URL scheme.
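
The rule described above (redirect to IMSLP when the file has been transferred, otherwise bounce back to WIMA) could be sketched roughly as follows. This is not IMSLP's actual code: the function resolve, the transferred mapping and the example paths are assumptions made purely for illustration.

```python
WIMA_HOST = "http://icking-music-archive.org"


def resolve(urlstring, transferred):
    """Pick the redirect target for a WIMA path (the WIMA URL without the host).

    transferred is a hypothetical mapping from WIMA path to the IMSLP URL of
    the transferred file. Paths it does not contain are bounced back to WIMA.
    """
    if urlstring in transferred:
        return transferred[urlstring]
    return WIMA_HOST + urlstring


# Hypothetical data for illustration only.
transferred = {
    "/scores/example/a.pdf": "http://imslp.org/wiki/Special:ReverseLookup/1",
}

print(resolve("/scores/example/a.pdf", transferred))  # goes to IMSLP
print(resolve("/scores/example/b.pdf", transferred))  # bounced back to WIMA
```

Because the decision is made at request time, a file deleted from IMSLP after transfer would simply drop out of the mapping and fall back to WIMA, which a static log file cannot do.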

Post by reccmo »

Max has helped me substantially by providing PHP code for handling the new generic IMSLP redirect on the WIMA server.

The main challenge was to prevent infinite Apache loops for WIMA URLs bounced back to the WIMA server. I believe it's working now, and I have replaced the WIMA redirects based on IMSLP's upload log with the new generic IMSLP redirect. On the WIMA server, a simple .htaccess file in the score root folder and a PHP redirect handler in the document root folder are involved.

I encourage forum readers to test the redirects of WIMA files, transferred as well as not transferred. Please report any errors encountered.

Post by reccmo »

Unfortunately it turns out that this redirect method conflicts with IMSLP's WIMA upload logic. When you access a WIMA file redirected in this way, the WIMA upload ends with an error message like:
File #1 Error: Error: Failure #47 on CURL for 'http://icking-music-archive.org/scores/ ... torale.pdf'
I've been discussing this problem with Max, who says:
Error 47 for CURL is "too many redirects":

CURLE_TOO_MANY_REDIRECTS (47)

Too many redirects. When following redirects, libcurl hit the maximum amount. Set your limit with CURLOPT_MAXREDIRS.

I'm not sure that the problem is due to the downloading being managed by php "per se", as it is essentially transparent to the browser. I think that there might be two possible causes:

1) a recursion issue similar to the one that we solved yesterday, but seen from the IMSLP side rather than from the WIMA side;

2) the PHP download being fragmented into small chunks of 2048 bytes. If CURL counts a redirection for each chunk, it is easy to hit the CURL limit.

Cause 1) would probably have to be fixed by Feldmahler, so let's investigate cause 2) first. I would suggest changing this instruction in Redirect.php:

$buffer = fread($fd, 2048);

to use a much larger value than 2048, so that the file is not fragmented, or is fragmented into very few chunks. In the limit, one can use:

$buffer = fread($fd, $fsize);

That should always result in a single file chunk.

If the above modification solves the issue, we may conclude that cause 2) applies. If it doesn't, I guess we will have to investigate cause 1) with Feldmahler.
The PHP logic on the WIMA server includes, among other things, this fragment:

Code:

      // $content_type, $path_parts, $fsize and $fd are assumed to be set earlier in the script
      header("Content-type: " . $content_type);
      header("Content-Disposition: inline; filename=\"" . $path_parts["basename"] . "\"");
      header("Content-length: $fsize");
      header("Cache-control: private");
      // Send the file to the client in chunks of 2048 bytes
      while (!feof($fd)) {
        $buffer = fread($fd, 2048);
        echo $buffer;
      }
      fclose($fd);


I've performed another upload with the fread buffer increased to the full file size ($fsize) - and still end up with error #47. So it looks like we need to look further into a recursion issue on the IMSLP server side.

Can the IMSLP generic redirect be set up to circumvent this problem? If not, a solution might be to let the generic redirect return a modified WIMA file path, like:

'http://icking-music-archive.org/scores/ ... es-SAB.pdf' -> 'http://icking-music-archive.org/scores1 ... es-SAB.pdf'

On the WIMA server I've created a new directory, 'http://icking-music-archive.org/scores1/' including symlinks to all level 1 directories in 'http://icking-music-archive.org/scores/'.

For now I've put the 'old' redirect method back into production.
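
To illustrate why a bounce back to the same path can end in "too many redirects", and why an alias directory such as scores1 might avoid it, here is a toy Python model. It only simulates redirect tables in memory. It is not the Apache/PHP setup on either server, and whether IMSLP's upload fetch really loops this way (cause 1 above) has not been confirmed. The names follow, TooManyRedirects and GENERIC, and all example paths, are hypothetical.

```python
class TooManyRedirects(Exception):
    """Raised when the hop limit is exceeded (compare libcurl error 47)."""


def follow(url, redirects, max_redirects=5):
    """Follow entries in the redirects table until a URL has no redirect."""
    hops = 0
    while url in redirects:
        if hops >= max_redirects:
            raise TooManyRedirects(url)
        url = redirects[url]
        hops += 1
    return url


WIMA = "http://icking-music-archive.org"
GENERIC = "imslp-generic-redirect:"  # placeholder; the real URL form is not shown here

start = WIMA + "/scores/x/file.pdf"

# The file is not on IMSLP and is bounced back to the path that redirects again.
looping = {
    start: GENERIC + "/scores/x/file.pdf",
    GENERIC + "/scores/x/file.pdf": start,
}

# Same situation, but the bounce goes to the alias path, which is served directly.
with_alias = {
    start: GENERIC + "/scores/x/file.pdf",
    GENERIC + "/scores/x/file.pdf": WIMA + "/scores1/x/file.pdf",
}

try:
    follow(start, looping)
except TooManyRedirects:
    print("loop detected")

print(follow(start, with_alias))
```

In this model the alias works only because nothing redirects the scores1 path again, so the real .htaccess rules would have to leave that directory alone.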