User talk:Mia


Fixing infobox/Lua errors

I'm putting my notes here so the status is clear while I'm working on fixing things.

GavinRobinson had reported that:

Before the upgrade, it seemed to be the number of infoboxes on a manually created page that triggered errors. The first few infoboxes displayed fine but any more didn't display, and the number of error messages was the same as the number of missing infoboxes.

And that:

I've also been getting script errors in manual edits, especially if there are lots of infoboxes on a page. See http://collaborativecollections.org/WorldWarOne/index.php?title=Category:Pages_with_script_errors

However, I'm not sure the number of errors is directly related to the number of infoboxes: http://collaborativecollections.org/WorldWarOne/British_Infantry_Arm has 10 errors and 7 infoboxes; http://collaborativecollections.org/WorldWarOne/Guards_Division,_UK has matching error and infobox counts; and http://collaborativecollections.org/WorldWarOne/Coldstream_Guards,_UK has one error and 7 infoboxes. I suspect it might be because some of the infobox templates got out of sync with their usage, but I haven't had time to investigate.

Also NB: $wgShowExceptionDetails may be switched on at times, in which case more detail will be shown for each error.
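For reference, that's a single toggle in LocalSettings.php; the line below is just a sketch of the switch, not a record of what's currently set:
$wgShowExceptionDetails = true;  // show full exception messages and backtraces instead of the generic error page
(Worth setting back to false once debugging is done, as the backtraces can expose server paths.)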


I'm still getting signal 24 errors. British Infantry Arm should have 17 infoboxes (see page source). 7 are displaying correctly, 10 are missing and there are 10 error messages.--GavinRobinson (talk) 10:16, 15 January 2015 (PST)
Is it always the same infobox missing? --Mia (talk) 10:43, 15 January 2015 (PST)
I think so. They display in the same order they're entered in the source code. It seems to be always after 7 infoboxes that it breaks down. Category:Pages_with_script_errors also includes some of the infobox templates themselves, so maybe try to fix those first and see if any pages still have problems. A maximum of 7 infoboxes won't be enough as some British regiments have up to 50 battalions. The signal 24 error on pages with more than 7 infoboxes may be a different thing from what was breaking the imports before Christmas. I've manually created all the battalion pages that failed to import, and none of them have script errors.--GavinRobinson (talk) 10:59, 15 January 2015 (PST)
I'm going to quickly try to replicate it with a different infobox before posting on the Scribunto pages. I should probably also try to reduce the number of conditionals, as that'd generally reduce load, but that's probably not related. This is probably where learning just enough to create infoboxes while in the middle of so many other things will come back to bite me. --Mia (talk) 11:07, 15 January 2015 (PST)
The response I got was 'You set $wgScribuntoEngineConf['luastandalone']['cpuLimit'] too low. Either raise it or unset it.' - it wasn't set so I tried it at 60 seconds, which has sorted out those errors but is slow to load. Caching will help with existing pages but it's still going to need tweaking. Anyway, let me know how you get on. --Mia (talk) 15:30, 15 January 2015 (PST)
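For reference, the relevant LocalSettings.php line would look something like this, with 60 being the value tried above (it may well need tuning either way):
$wgScribuntoEngineConf['luastandalone']['cpuLimit'] = 60;  // CPU time limit, in seconds, for the standalone Lua interpreter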
Yes, the 24 error is fixed. Pages that had it are now displaying properly, although it looks like they don't disappear from Category:Pages_with_script_errors until they've been edited. Pages with infoboxes are still very slow to save: 57 seconds for Coldstream Guards, UK and 67 seconds after the captcha for 4th Guards Brigade, UK. Talk pages are only taking about 5 seconds to save.--GavinRobinson (talk) 01:14, 16 January 2015 (PST)
I'll remove some of the conditional processing from the infoboxes to see if that helps (but probably not for a few days) --Mia (talk) 03:29, 16 January 2015 (PST)
http://collaborativecollections.org/WorldWarOne/index.php?title=Category:Pages_with_too_many_expensive_parser_function_calls&action=edit&redlink=1 for pages that use 'too many expensive parser functions (like #ifexist)' doesn't exist, which might suggest that it's not the #ifexists, but I'll tweak Template:Infobox military unit, Template:Infobox command structure and Template:Infobox theatre of war --Mia (talk) 16:53, 26 January 2015 (PST)
After removing unnecessary #ifs, the parser profiling data for Template:Infobox command structure (generated with the Show Preview button) showed 'Expensive parser function count: 5/100', though there are only two #ifs on the page. Template:Infobox_military_unit shows the same count before and after removing one #if from the infobox. However, Template:Infobox theatre of war went from 5/100 to 4/100 when I checked before and after editing out an unnecessary #if.
Pages like 4th_Guards_Brigade,_UK are showing 'Expensive parser function count: 0/100' when you view the source. --Mia (talk) 12:33, 2 February 2015 (PST)
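(The '/100' in those counts is MediaWiki's default per-page cap on expensive parser functions; if a page ever did hit it, the cap could be raised in LocalSettings.php, though the counts above suggest we're nowhere near it:)
$wgExpensiveParserFunctionLimit = 200;  // illustrative value only; the default is 100 and nothing above comes close to it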


Import errors

I've successfully imported a batch of 10 pages of Indian battalions. Next I tried a batch of 20. 18 pages imported properly but then it gave a 500 Internal Server Error. This is different from the error message I got before Christmas, which was:

Fatal error: Maximum execution time of 30 seconds exceeded in /home/miaftpuser/collaborativecollections.org/WorldWarOne/extensions/Scribunto/engines/LuaStandalone/LuaStandaloneEngine.php on line 388

That came up after only 6 pages of 32 had imported, so there is an improvement.--GavinRobinson (talk) 00:44, 9 February 2015 (PST)

Thanks Gavin! I've tweaked one PHP timeout setting, but if that doesn't improve things I'll try another. It's only a little shared hosting account, so it's tricky to beef it up too much. --Mia (talk) 03:59, 9 February 2015 (PST)
I'm still getting the same problem. This time I got 19 pages out of 20 before it threw a 500. Both times, the error came 2 minutes 3 seconds after starting the upload. At this rate, I can get all the Indian battalions imported without too much effort as long as the errors don't get any worse, but I'll need more pages per batch to get through the British.--GavinRobinson (talk) 05:53, 9 February 2015 (PST)


Just tried importing a batch of 14 British record offices. 10 imported then it threw a 500, again at 2 minutes 3 seconds. Record office pages have more infoboxes than Indian battalions but no more than 10 per page. Imported the other 4 in another batch with no problems. I'll see what the limits are for other kinds of unit.--GavinRobinson (talk) 10:58, 18 February 2015 (PST)

Only got 2 British divisions out of a batch of 10 before the 500 error, at 2 minutes 4 seconds. These pages have more infoboxes than most and also 2 revisions per page.--GavinRobinson (talk) 11:05, 18 February 2015 (PST)


The last import error was my fault: I fed it a bad file. So nothing needs investigating there.--GavinRobinson (talk) 10:20, 21 February 2015 (PST)

Ok! Are you still getting the 500 error on other imports? --Mia (talk) 16:45, 22 February 2015 (PST)
Yes, but only if the import process goes over 2 minutes. The number of pages I can get imported in under 2 minutes varies. It seems roughly related to the number of infoboxes, but it's not a direct correlation with the total. I get a vague impression that theatre adds more time than command structure, but too many of either will break it.--GavinRobinson (talk) 00:17, 23 February 2015 (PST)
It might be related to the number of expensive parser functions or the 'Highest expansion depth' rather than time. Template:Infobox theatre of war, Template:Infobox military unit and Template:Infobox command structure each show 'Expensive parser function count: 5/100', and two of them show 'Highest expansion depth: 10/40' (Template:Infobox military unit is 11/40). That said, Lua time usage is still showing as 60 seconds, so I'll see what I can tweak. --Mia (talk) 11:36, 23 February 2015 (PST)


The 500 also happens with manual edits if they take 2 minutes 3 seconds to save. Just tried to manually create a page for the biggest British infantry regiment (50+ infoboxes) and it didn't work.--GavinRobinson (talk) 09:00, 27 February 2015 (PST)

Actually I've just found in my notes that some regiment pages took longer than 2:03 to save manual edits and didn't get an error. Lincolnshire Regiment, UK was 2:09; Argyll and Sutherland Highlanders, UK was 2:42. So if any settings have changed between 21st Feb and now they could have made things worse for manual edits.--GavinRobinson (talk) 11:26, 27 February 2015 (PST)
I've been madly working on thesis chapters so I haven't made any changes since Feb 9th. I've checked, and max_execution_time is set to 0, as is max_input_time. I've just upped the memory_limit in case that makes a difference. Let me know if any of the times or any detail of the error message changes! --Mia (talk) 14:35, 27 February 2015 (PST)
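For reference, those limits can be checked (and, where the host allows it, adjusted) from PHP; the memory value below is illustrative rather than the actual figure used:
var_dump( ini_get( 'max_execution_time' ) );  // currently 0, i.e. unlimited
var_dump( ini_get( 'max_input_time' ) );      // also 0
ini_set( 'memory_limit', '256M' );            // illustrative value; on shared hosting this may have to go in php.ini or .htaccess instead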

Performance improvements

I've edited in the cache changes suggested in http://wiki.dreamhost.com/MediaWiki_Troubleshooting

Front page before the edits: CPU time usage: 0.044 seconds, Real time usage: 0.043 seconds

Australian units page before the edits: CPU time usage: 0.248 seconds, Real time usage: 0.250 seconds

About the same immediately after the edits, but hopefully it'll improve once the cache has been populated. --Mia (talk) 16:05, 6 February 2015 (PST)

Pages with infoboxes seem to be loading quite fast now, but saving edits is still very slow. Adding 1 infobox to British Infantry Arm just took 1 minute 43 seconds to save. I might try importing a small batch on Monday and see how far it gets.--GavinRobinson (talk) 09:25, 7 February 2015 (PST)
Yikes, that's crazy! Let me know if it gets any quicker the second time. I've also just increased the time on $wgParserCacheExpireTime to two weeks, but IIRC logged in users won't get the benefits of all cache changes (https://www.mediawiki.org/wiki/Manual%3a%24wgParserCacheExpireTime) --Mia (talk) 10:06, 7 February 2015 (PST)
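(For reference, that's one line in LocalSettings.php; the value is in seconds, so two weeks comes out as:)
$wgParserCacheExpireTime = 14 * 24 * 60 * 60;  // two weeks, expressed in seconds (the default is one day)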
Another edit to the same page, 1 minute 43 again. Interesting that it's so consistent.--GavinRobinson (talk) 10:56, 7 February 2015 (PST)
Also, a note to myself to check whether there are enough database connections available, and/or to see if there's another database-related bottleneck. --Mia (talk) 10:06, 7 February 2015 (PST) Also https://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html --Mia (talk) 14:44, 27 February 2015 (PST)
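If the MySQL slow query log isn't reachable on the shared host, a possible MediaWiki-side alternative (sketch only, with a placeholder log path) would be to dump the queries MediaWiki runs:
$wgDebugLogFile = '/tmp/mw-debug.log';  // placeholder path; pick somewhere writable but not web-accessible
$wgDebugDumpSql = true;                 // writes every SQL query to the debug log, so only leave it on briefly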
And others:


I've been doing some performance tests to try to get a better idea of why things are going slow and how they might be improved. The test is to keep increasing the number of instances of a template on a page, then time how long it takes to load the preview and, after the preview loads, look at the parser profiling data. In the tables below, "actual time" is the time taken to load the preview, timed with a stopwatch; the other times are taken from the parser profiling data. Times are all in seconds.

This is a control using {{documentation}}, which is a simple template that takes no parameters but invokes a Lua module:

Number of templates   Lua time   CPU time   Real time   Actual time
0                     0          0.024      0.230       4
1                     0.2        0.376      1.708       5
2                     0.2        0.384      0.603       5.51
3                     0.2        0.408      0.711       4.57
5                     0.2        0.444      0.774       3.9
10                    0.2        0.532      0.808       4

This is for {{infobox command structure}}:

Number of templates   Lua time   CPU time   Real time   Actual time
1                     1.030      1.280      2.416       5.3
2                     1.740      2.228      5.831       9.94
3                     2.690      3.332      6.119       9.74
4                     3.540      4.356      8.827       14.29
5                     4.540      5.688      10.236      13.99
10                    8.510      10.257     24.837      28
20                    17.460     19.749     39.408      43.21

I assume real time is always higher because the CPU is shared with other sites. If that's the case, it suggests that better hosting could halve the time it takes to save and preview edits of pages that have infoboxes, but we might still find that pages with lots of infoboxes are slow. Northumberland Fusiliers, UK has about 50 battalions. The Machine Gun Corps will need about 300 child units and I don't see a logical way to split them up into smaller groups.--GavinRobinson (talk) 03:05, 19 July 2015 (PDT)

Thanks Gavin! I'll try to sort out the hosting this week (though it's another busy one), bearing in mind the impact of those infoboxes.
I think we should consider the pros and cons of both Wikibase and Semantic MediaWiki. Being able to run a query to find children might remove some of this problem, and make the wiki easier to maintain, but there might still be performance issues and other limitations and complications.--GavinRobinson (talk) 11:59, 19 July 2015 (PDT)
Looks like Wikibase is going to be very good but isn't ready yet, as they haven't fully implemented queries or editing data from client wikis. I'm still investigating SMW: it's mature enough to use straight away and has lots of features that are better than what we've got now, but it's not as sophisticated as Wikibase, and there may or may not be obscure problems that make it unfit for our purposes. Were there any particular problems that made you decide against it last year?--GavinRobinson (talk) 12:12, 24 July 2015 (PDT)
I've run the same test on Wikipedia with their current Infobox command structure and it went really fast: only 0.238s Lua time and 0.526s CPU time even with 40 instances of the template. So maybe it is just the server.--GavinRobinson (talk) 10:30, 20 July 2015 (PDT)
Ok, good to know. Do you know anyone running an infobox-heavy wiki on other affordable hosts? I'd been hoping to upgrade the site before I go abroad tomorrow (also in case that fixed the queue, though that might also be a MySQL bug) but FTPing from Dreamhost is *slow*. I haven't had time to ask on any mailing lists between various trips.
I don't know of any wikis as infobox-heavy as this, partly because I don't know that many wikis, and partly because what we're doing is quite unusual. Might be worth asking Marine Lives where their wiki is hosted as it runs very well, but I suspect it costs a lot. The job queue has gone up because I was editing templates, but it has been going down normally. We'll see what happens when it gets down to 82.--GavinRobinson (talk) 01:36, 25 July 2015 (PDT)

Australian units

I've created 1st Battalion, Australian Imperial Force (AIF) and 1st Infantry Brigade, Australian Imperial Force (AIF) to demonstrate some infobox values, categories and sort orders for Australian units. If these are all OK I can import the rest of the missing AIF infantry battalions and brigades as I've found it's very easy to generate the pages from a spreadsheet of autofilled numbers. I'm not sure about the best default sort for Australian units. It might not be very helpful if everything ends up under A. Maybe in Category:Australian Army they could have sort order "Battalion 01" etc.--GavinRobinson (talk) 03:32, 14 May 2015 (PDT)

They look good! And once the pages are set up properly it'd make adding the Digitised items from AWM found by User talk:B3rn a lot quicker. Where does the sort order information come from? --Mia (talk) 03:58, 14 May 2015 (PDT)
For now I've made arbitrary decisions about sort order strings based on the unit name. For example, I've set the default sort with the code:
{{DEFAULTSORT:Australian Imperial Force Battalion 01}}
The sort string there is the name rearranged into hierarchical order: country/service/branch, then unit type, then unit number (zerofilled to make it sort correctly as a string). I now think this will probably be OK for the default as it's useful to have all Australian units under "Australian" in source-related categories such as "with/without personal narratives". But it probably needs a different sort order setting for Category:Australian Army because everything in that is likely to start with "Australian". I've done a category only for AIF infantry battalions with its own sort order specified like this:
[[Category:Australian Imperial Force infantry battalions|01]]
There the sort order is just the battalion number zerofilled, because everything else goes without saying. When this category is full it'll have 0-6 as headings because they're the first characters of all the sort strings specified for member pages. This is similar to what I've done with British and Indian battalions, which have different categories sorted by regiment name or number.
So I'm leaning towards giving Category:Australian Army and Category:Australian Imperial Force a sort string in the form "Battalion 01" or something like that. Or would it be better or worse if it was only the number and not the unit type? Any feedback would be useful.--GavinRobinson (talk) 05:43, 14 May 2015 (PDT)
A 'sort string in the form "Battalion 01" or something like that' sounds reasonable to me. --Mia (talk) 06:04, 14 May 2015 (PDT)


Jobs queue

We might have a slight problem: the job queue is stuck on 82. I triggered lots of jobs by moving templates last week. The queue had been going down at a reasonable rate but it's stopped at 82 and hasn't moved for a few days. It's supposed to run one job every time someone views a page, but now viewing a page doesn't change it. (Since they removed the job queue from Special:Statistics you can only see it through this API result.) It might be worth trying to empty it manually. If that doesn't work, I don't know what to suggest.--GavinRobinson (talk) 00:23, 8 July 2015 (PDT)
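(For what it's worth, the one-job-per-page-view behaviour is the default $wgJobRunRate; if poking the queue manually doesn't help, raising the rate in LocalSettings.php is one possible stopgap, e.g.:)
$wgJobRunRate = 5;  // illustrative value; the default is 1 job per page view, and 0 disables page-view job running entirely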

Thanks for letting me know! I'm travelling so might not be able to access everything this week but I'll have a go at triggering it in the meantime anyway. --Mia (talk) 06:25, 8 July 2015 (PDT)
I've run runJobs.php, but the API call is still showing 82 jobs; OTOH showJobs.php returns 0. Still investigating. --Mia (talk) 02:27, 15 July 2015 (PDT)
It seems to be only "jobs" that is stuck in the API result: the numbers for "edits" and "articles" are updating normally.--GavinRobinson (talk) 10:34, 15 July 2015 (PDT)