Talk:British units in World War I
Nothing is new under the sun, and there is an excellent page of Infantry Battalion War Diary Transcript Links (WW1) containing 'links to transcripts of First World War British Army infantry battalion war diaries from WO 95 which have already been published on other sites'. The edit history shows a number of people contributed to the page on the archived British National Archives Your Archives wiki (try saying that five times quickly!) so I'm wondering about the best way to acknowledge the work of those contributors if the content is copied over to battalion pages here. Thoughts? --Mia (talk) 10:36, 4 November 2014 (PST)
Units to be placed
Is the 1/4th Battalion (Alexandra, Princess of Wales's Own) Yorkshire Regiment part of the Yorkshire Regiment listed under Line Regiments?
- Yes. The regiment name appears in several different forms, and they're also nicknamed the Green Howards. We probably need to have a big discussion on naming conventions. I'll try to find some lists of regiment names to point out variations. Whatever the canonical name is, there can also be redirects for alternative names.--GavinRobinson (talk) 01:30, 15 November 2014 (PST)
According to E.A. James, British Regiments 1914-18, there were 1,761 battalions in British infantry regiments in the First World War. This doesn't include Machine Gun Corps battalions, which might be another hundred or so, and nations from the rest of the British Empire, which could be several hundred more. A search for "battalion" in WO 95 gives 3,501 items, although many of these will be duplicates because a battalion's diary is split between more than one item.
I think it would be possible to grab some data from WO 95 and use it to automatically create pages for British Empire battalions that have war diaries (many won't, and will have to be done some other way). Discovery allows up to 1,000 search results to be exported as CSV or XML (all catalogue data is under OGL, so no copyright problems). I've worked out searches and filters that should split the results into groups of fewer than 1,000. The data would then need to be extracted from the results files and manipulated into the right form (probably by a mixture of automatic and manual methods). Ultimately, it can be converted to wiki XML that can be imported through Special:Import. This would provide some basic seed data for battalions, including:
- parent regiment
- at least one theatre of war it served in
- at least one parent brigade and grandparent division
- catalogue references, links and dates covered for all official war diaries held at TNA
Nationality can't be extracted automatically, but there would be economies of scale in assigning it in batches using the intermediate data instead of manually editing every page.
The intermediate data will also be a useful, but not definitive, source for regiment names.
There's no point going too far with this method until naming conventions, preload templates and infoboxes have been finalised, but I think it has potential.--GavinRobinson (talk) 04:31, 15 November 2014 (PST)
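A minimal sketch of the extraction step described above, assuming the Discovery CSV export has one row per catalogue item. The column names ("Reference", "Title", "Covering dates"), the sample rows and the REF-… identifiers are all made up for illustration and are not taken from a real export. It groups items by unit title so that a battalion whose diary is split across several items appears once.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a Discovery CSV export.
# Column names and references are assumptions, not real catalogue data.
SAMPLE = """Reference,Title,Covering dates
REF-0001,1 Battalion Example Regiment,1914 Aug. - 1915 Dec.
REF-0002,1 Battalion Example Regiment,1916 Jan. - 1918 Nov.
REF-0003,2 Battalion Example Regiment,1915 Jan. - 1919 Mar.
"""


def group_diaries(csv_text):
    """Group catalogue items by unit title so each unit appears once."""
    units = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        units[row["Title"].strip()].append(
            {
                "ref": row["Reference"].strip(),
                "dates": row["Covering dates"].strip(),
            }
        )
    return dict(units)


units = group_diaries(SAMPLE)
for title, items in units.items():
    print(title, [item["ref"] for item in items])
```

Real titles are inconsistent, so a grouping like this would only be a first pass before manual clean-up.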
Gavin, this is brilliant! What's the best way to finalise the preload templates and naming conventions? My main concern with naming conventions is that the name of each page conveys enough information to disambiguate it from similarly named units in the same and other armies. I have a slightly odd week of travel ahead of me so I'm not quite sure when I'll be online, but perhaps we could find time to work on the same document together? I suspect I took a shortcut in importing the infoboxes that I need to go back and untangle to get them to work. --Mia (talk) 09:45, 15 November 2014 (PST)
- I've put up a spreadsheet of British infantry regiment names. I'm starting to type up some notes and questions on naming of regiments and units.--GavinRobinson (talk) 02:58, 19 November 2014 (PST)
- Looks good! When you say 'parent regiment', do you mean administrative or tactical? Is your data from a particular point in time or generalised across the years? Getting at least a marker in for parent/grandparent units at a particular point in time would be a good start. --Mia (talk) 15:06, 19 November 2014 (PST)
- I've now put up some notes and queries about naming specific units, regiments and formations.
- I've also downloaded search results CSV files from Discovery and am starting to smooth the data out (it's inconsistent but usable). The data I can get will apply to a range of dates covered by the official war diaries. It's derived from the theatres and divisions that the diaries are filed under, which doesn't necessarily reflect every theatre, brigade or division a battalion was in but covers at least some. Parent regiment is administrative, and will be derived from the unit name in the title of the catalogue record.--GavinRobinson (talk) 05:55, 20 November 2014 (PST)
- I've updated the command structure infobox so you can record parent and grandparent units for both administrative and tactical relationships. It has start and end dates and allows references (as footnotes).
- War diaries should go into the appropriate section of the page (and the category tag should be updated to show the battalion has a linked diary).
- I haven't done anything about theatres of war yet. See Template talk:Infobox command structure for discussion of names. What's your preference for naming conventions for British battalions and regiments? Finally, is there anything else you need resolved to go ahead with importing the WO 95 data? --Mia (talk) 17:27, 23 November 2014 (PST)
(Continued from above.)
My preference for constructing the page names of British battalions is:
- least ambiguous ordinal number + any words absolutely necessary for disambiguation + short form of regiment name + suffix ", UK"
- 1st Battalion Royal West Surrey Regiment, UK
- 1/5th Battalion Yorkshire Regiment, UK
- 1st Garrison Battalion Lincolnshire Regiment, UK
For regiment page names:
- short form of regiment name + suffix ", UK"
- Yorkshire Regiment, UK
- Lincolnshire Regiment, UK
- Cameronians (Scottish Rifles), UK
- Irish Guards, UK
Division page names:
- ordinal number, if any + any words absolutely necessary for disambiguation + "Division" + suffix ", UK"
- 1st Division, UK
- Guards Division, UK
- 46th Division, UK
- 1st Cavalry Division, UK
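The page-name rules above could be applied mechanically when generating pages. The following is only a sketch: the function and parameter names are hypothetical, and choosing the "least ambiguous" ordinal and the disambiguating words is still left to whoever prepares the data.

```python
def _join(*parts):
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)


def battalion_page_name(ordinal, regiment_short, qualifier="", nation="UK"):
    """Ordinal + disambiguating words + 'Battalion' + short regiment name + suffix."""
    return f"{_join(ordinal, qualifier, 'Battalion', regiment_short)}, {nation}"


def regiment_page_name(regiment_short, nation="UK"):
    """Short regiment name + suffix."""
    return f"{regiment_short}, {nation}"


def division_page_name(ordinal="", qualifier="", nation="UK"):
    """Ordinal (if any) + disambiguating words + 'Division' + suffix."""
    return f"{_join(ordinal, qualifier, 'Division')}, {nation}"


print(battalion_page_name("1/5th", "Yorkshire Regiment"))
print(battalion_page_name("1st", "Lincolnshire Regiment", qualifier="Garrison"))
print(regiment_page_name("Cameronians (Scottish Rifles)"))
print(division_page_name(qualifier="Guards"))
print(division_page_name("1st", qualifier="Cavalry"))
```

Keeping the nation suffix as a separate parameter means the separator can be changed in one place if the convention changes.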
As for the body of the page, I think the British will definitely need repeatable infoboxes (but Australians won't because their battalion names don't change and are likely to be the same as the page name). The infobox will probably need to include:
- full name: the longest form of the name, including full regiment name and any optional words
- short name: the full name shortened according to the same rules as the page title
- any other alternative names
- start date
- end date
For example, for the page 1/5th Battalion Yorkshire Regiment, UK:
First infobox:
- start date: 1/4/1908
- end date: 1/1/1915
- full name: 5th Battalion Alexandra, Princess of Wales's Own (Yorkshire Regiment)
- short name: 5th Battalion Yorkshire Regiment
- other name: 5th Battalion Green Howards
Second infobox:
- start date: 1/1/1915
- end date: 6/11/1918
- full name: 1/5th Battalion Alexandra, Princess of Wales's Own (Yorkshire Regiment)
- short name: 1/5th Battalion Yorkshire Regiment
- other name: 1/5th Battalion Green Howards
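One way to picture the repeatable infobox is as a list of dated name records, each rendered as its own template call. This is a sketch only: the template name "Infobox unit name" and its parameter names are hypothetical and would need to match whatever infobox is finally agreed.

```python
from dataclasses import dataclass


@dataclass
class NamePeriod:
    """One dated set of names for a unit (one infobox per period)."""
    start: str
    end: str
    full_name: str
    short_name: str
    other_name: str = ""


def render_infobox(period, template="Infobox unit name"):
    """Render one period as a wikitext template call (hypothetical template)."""
    lines = [
        "{{" + template,
        f"| start date = {period.start}",
        f"| end date = {period.end}",
        f"| full name = {period.full_name}",
        f"| short name = {period.short_name}",
    ]
    if period.other_name:
        lines.append(f"| other name = {period.other_name}")
    lines.append("}}")
    return "\n".join(lines)


periods = [
    NamePeriod(
        "1/4/1908", "1/1/1915",
        "5th Battalion Alexandra, Princess of Wales's Own (Yorkshire Regiment)",
        "5th Battalion Yorkshire Regiment",
        "5th Battalion Green Howards",
    ),
    NamePeriod(
        "1/1/1915", "6/11/1918",
        "1/5th Battalion Alexandra, Princess of Wales's Own (Yorkshire Regiment)",
        "1/5th Battalion Yorkshire Regiment",
        "1/5th Battalion Green Howards",
    ),
]

page_body = "\n".join(render_infobox(p) for p in periods)
print(page_body)
```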
(Or should the nation suffix be separated by a semicolon, because some regiment names contain commas?)
Apart from things already being discussed on other talk pages, I'll need the infobox for theatre/location to be ready. There's no big hurry as I won't have the command structures sorted out until next week.--GavinRobinson (talk) 08:01, 25 November 2014 (PST)
Progress: I've nearly finished turning the CSV into a relational database, which was the hard part. It should be done tomorrow, then I can start writing Python functions to pull data out of the database and format it as wiki pages, which should be easy. There will be about 1,100 British battalions, 200 Indian battalions, 300 brigades and divisions, 100 British infantry regiments, plus redirects for alternative names or abbreviations of some regiments. A battalion's tactical parents will be included only if it was in a brigade or in divisional troops. Brigades will have parent divisions. Units and formations will have at least one theatre with at least approximate dates. Now to catch up on all the other talk pages.--GavinRobinson (talk) 07:49, 9 December 2014 (PST)
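To illustrate the "pull data out of the database and format it as wiki pages" step, here is a small sketch using an in-memory SQLite database. The schema, table names, sample rows, REF-… identifiers and the "War diaries" section heading are all assumptions made for the example, not the actual database described above.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE regiment (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE battalion (
        id INTEGER PRIMARY KEY,
        name TEXT,
        regiment_id INTEGER REFERENCES regiment(id)
    );
    CREATE TABLE diary (
        battalion_id INTEGER REFERENCES battalion(id),
        ref TEXT,
        dates TEXT
    );
    INSERT INTO regiment VALUES (1, 'Example Regiment, UK');
    INSERT INTO battalion VALUES (1, '1st Battalion Example Regiment, UK', 1);
    INSERT INTO diary VALUES (1, 'REF-0001', '1914-1915');
    INSERT INTO diary VALUES (1, 'REF-0002', '1916-1918');
    """
)


def battalion_wikitext(conn, battalion_id):
    """Return (page title, page text) for one battalion."""
    name, regiment = conn.execute(
        "SELECT b.name, r.name FROM battalion b "
        "JOIN regiment r ON r.id = b.regiment_id WHERE b.id = ?",
        (battalion_id,),
    ).fetchone()
    diaries = conn.execute(
        "SELECT ref, dates FROM diary WHERE battalion_id = ? ORDER BY ref",
        (battalion_id,),
    ).fetchall()
    lines = [f"Parent regiment: [[{regiment}]]", "", "== War diaries =="]
    lines += [f"* {ref} ({dates})" for ref, dates in diaries]
    return name, "\n".join(lines)


title, text = battalion_wikitext(conn, 1)
print(title)
print(text)
```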
Very nearly there now. I should be able to send you the first batch of pages tomorrow. Most issues are resolved or not urgent. I just need to know whether the source headings in Template:Battalion should stay as they are or change to what I last suggested. And are there any changes you want to make to the HTML comments in Template:Battalion? That's all.
I've got the XML export/import process working and just need to finish the functions to fill in the page content, which is mostly a case of select queries and string concatenation, so not much can go wrong.
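For anyone following along, the import file is essentially a list of page elements, each with a title and a revision containing the page text. Below is a minimal sketch of building one with the Python standard library. It assumes the bare mediawiki/page/title/revision/text layout; a real import may also need the namespace and version attributes found in the wiki's own Special:Export output, and the page shown is a made-up example.

```python
import xml.etree.ElementTree as ET


def build_import_xml(pages):
    """Build a minimal import document from (title, wikitext) pairs."""
    root = ET.Element("mediawiki")
    for title, text in pages:
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes &, < and > in the wikitext when serialising.
        ET.SubElement(revision, "text").text = text
    return ET.tostring(root, encoding="unicode")


# Made-up page content for illustration.
pages = [("1st Battalion Example Regiment, UK", "Raised <1914> & served abroad")]
xml_out = build_import_xml(pages)
print(xml_out)
```

Using an XML library rather than plain string concatenation for the wrapper means characters such as & and < in page text are escaped automatically.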