A request for help on wiki backup
Posted: Fri Nov 16, 2007 11:08 pm
by imslp
Just to bring attention to this matter, I'm re-posting this as an announcement.
Currently I'm working on making available a copy of the IMSLP wiki database, with the text. However, I am also working on several other IMSLP-related feature implementations, and I would be extremely grateful if someone had time to research ways to back up the wiki database such that the backup:
- Is not too system-intensive to produce (I'd prefer SQL backups over XML);
- Can be used to recreate a fully functional wiki (i.e. the wiki after re-creation should be pretty much fully functional, not just dead text);
- Contains no private or sensitive information.
I note here that IMSLP uses MediaWiki 1.11.0. Also, this backup system will continue to be used after IMSLP is back online.
Edit: Actual code, whether shell code or script code, to accomplish the above is extremely appreciated.
Posted: Sat Nov 17, 2007 9:40 am
by Yagan Kiely
Re: A request for help on wiki backup
Posted: Sat Nov 17, 2007 10:01 am
by Leonard Vertighel
imslp wrote:
- Can be used to recreate a fully functional wiki (i.e. the wiki after re-creation should be pretty much fully functional, not just dead text)
- Contains no private or sensitive information
The problem is that the database dump should not contain the user password hashes. This means that users could not log in to a wiki set up from the dump. An admin would thus have to re-enable the accounts as users claim them, trusting people to claim only accounts that were really theirs in the first place. Alternatively, all existing accounts would have to remain locked, and users would be forced to set up new accounts.
For static mirrors (no user editing) this is obviously not a problem.
I'm afraid that this is a fundamental problem that we have to live with.
Posted: Sat Nov 17, 2007 2:09 pm
by imslp
Actually, I don't mind the old accounts being permanently frozen; it would be necessary to protect privacy (and passwords, even though they are hashed with salts). As long as new accounts can be created that is fine.
The issue here is that several other tables actually use the user table (like page revisions), so I cannot just throw the user table away...
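For what it's worth, the dependency runs through user_id: each revision row records its editor's user_id, so the revision table only makes sense alongside at least a stripped-down user table. A sketch of the relationship (column names as in the MediaWiki 1.x default schema):

```sql
-- revision.rev_user stores the user_id of the editor (0 for anonymous edits),
-- and rev_user_text stores the login name or IP, so page histories still
-- render sensibly even if the user rows are stripped of private columns.
SELECT r.rev_id, r.rev_user, r.rev_user_text, u.user_name
FROM revision r
LEFT JOIN user u ON u.user_id = r.rev_user;
```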
Posted: Sat Nov 17, 2007 4:24 pm
by emeraldimp
Well, one way would be to exclude the user table in a mysqldump and handle it separately... Not sure exactly how you'd handle it separately, though; I didn't see an option in mysqldump to exclude a single field.
Posted: Sat Nov 17, 2007 4:27 pm
by imslp
That is my concern too.
Any code, whether shell code or script code, that can accomplish the goals I put forth in the first post would be most welcome.
Posted: Sat Nov 17, 2007 5:46 pm
by emeraldimp
Well... This should handle the exclusion part. Might take a while to run, though.
Code: Select all
mysqldump -u mysqluser -ppassword wikidb --ignore-table=wikidb.wiki_user > wikidb.sql
Now for the users...
Code: Select all
mysql -u mysqluser -ppassword wikidb -X < userselect.sql | fgrep -v "<field name=\"user_password\">" | fgrep -v "<field name=\"user_email\">" | fgrep -v "<field name=\"user_real_name\">" > users.xml
Where userselect.sql is
Code: Select all
select user_id, user_name, user_real_name, "" as user_password, user_newpassword, user_email, user_options, user_touched, user_token from wiki_user;
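As a quick sanity check of the fgrep filtering, you can try it on made-up sample data (the field layout mimics mysql's -X output; the names and values here are invented):

```shell
# Fabricated sample of `mysql -X` output, just to verify that the fgrep
# filter from the previous step really drops the sensitive fields.
cat > users-sample.xml <<'EOF'
<resultset>
  <row>
    <field name="user_id">1</field>
    <field name="user_name">ExampleUser</field>
    <field name="user_password">deadbeefhash</field>
    <field name="user_email">user@example.org</field>
    <field name="user_real_name">Jane Doe</field>
  </row>
</resultset>
EOF
# Same filter chain as in the recipe: drop the three sensitive fields.
fgrep -v '<field name="user_password">' users-sample.xml \
  | fgrep -v '<field name="user_email">' \
  | fgrep -v '<field name="user_real_name">' > users-filtered.xml
cat users-filtered.xml
```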
Now we need to transform it...
Code: Select all
xsltproc usertransform.xsl users.xml > users.sql
Where usertransform.xsl is
Code: Select all
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="resultset">
<xsl:for-each select="row">
<xsl:text>INSERT INTO wiki_user (</xsl:text>
<xsl:for-each select="field"><xsl:value-of select="@name" /><xsl:if test="position()!=last()">,</xsl:if></xsl:for-each>
<xsl:text>) VALUES (</xsl:text>
<xsl:for-each select="field">"<xsl:value-of select="." />"<xsl:if test="position()!=last()">,</xsl:if></xsl:for-each>);
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Concatenate...
Code: Select all
cat wikidb.sql users.sql > backupwikidb.sql
Next we should probably compress it...
Code: Select all
tar cjf backupwikidb.sql.tar.bz2 backupwikidb.sql
And that should do it! (I think...)
You might have to change a couple things (obviously username and password, but also possibly the select statement; I have a rather old version of mediawiki installed).
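If it helps, here is the whole sequence collected into one script (just a sketch: the database name, MySQL credentials, wiki_user table name and the helper files userselect.sql / usertransform.xsl are the same placeholders as above, and I have not run this against a live MediaWiki install):

```shell
# Sketch of the full backup pipeline from this thread; written to a file and
# syntax-checked only, since actually running it needs a live MySQL server
# and the two helper files.
cat > wiki-backup.sh <<'EOF'
#!/bin/sh
set -e
DB=wikidb
DBUSER=mysqluser

# 1. Dump everything except the user table (-p prompts for the password).
mysqldump -u "$DBUSER" -p "$DB" --ignore-table="$DB".wiki_user > wikidb.sql

# 2. Dump the user table as XML, stripping the sensitive fields.
mysql -u "$DBUSER" -p "$DB" -X < userselect.sql \
  | fgrep -v '<field name="user_password">' \
  | fgrep -v '<field name="user_email">' \
  | fgrep -v '<field name="user_real_name">' > users.xml

# 3. Turn the XML back into INSERT statements.
xsltproc usertransform.xsl users.xml > users.sql

# 4. Concatenate and compress.
cat wikidb.sql users.sql > backupwikidb.sql
bzip2 -f backupwikidb.sql
EOF
sh -n wiki-backup.sh   # syntax check only; touches no database
```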
Posted: Sat Nov 17, 2007 6:03 pm
by imslp
If anyone could test this on MediaWiki 1.11.0, it would be much appreciated.
Also, I'm not sure if that is the extent of the user information or not; the official Wikipedia backup (http://download.wikimedia.org/enwiki/20071018/) shows several things as private...
On another note, would someone familiar with the XML backup method comment on the speed and the restoration process? I would like to avoid it for speed reasons, but if it is the only way to produce a usable backup that does not contain private data, it may unfortunately be the only option...
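On the XML route: MediaWiki ships maintenance scripts for exactly this (dumpBackup.php to export, importDump.php to import), though I can't speak to their speed on a database this size. A sketch of the cycle, assuming a standard MediaWiki install at a placeholder path:

```shell
# Sketch of MediaWiki's own XML dump/restore cycle; written to a file and
# syntax-checked only, since running it needs an actual MediaWiki install.
cat > xml-dump-steps.sh <<'EOF'
#!/bin/sh
set -e
cd /path/to/mediawiki

# Export: --full keeps all revisions; --current would keep only the latest
# revision of each page, which is faster and produces a smaller file.
php maintenance/dumpBackup.php --full > pages-full.xml

# Restore into a fresh wiki (run from the new wiki's directory),
# then rebuild the derived recentchanges data.
php maintenance/importDump.php pages-full.xml
php maintenance/rebuildrecentchanges.php
EOF
sh -n xml-dump-steps.sh   # syntax check only
```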