So, it's no Wordle, but it's my first text cloud. This visualises the 250 most common words in the titles of each series in the initial dataset. It's also the first time I've mined the titles for data, and another step in the process of feeling out the attributes of this dataset. I've excluded a few "stop" words ("and","with","the","for") and anything with less than three characters, but otherwise this is a raw representation of the titles.
It shows first of all that the most frequently occuring terms in series titles are either generic descriptors ("files", "correspondence") or metadata, referring to the organisation or structure of the series, rather than its content ("alphabetical", "prefix", "single"). But then after the top twenty or so words, there's a large number of more descriptive terms. The difference in scale between these layers is significant; for example "series" and "files" occur in around 10,000 series titles (about a third of all series), whereas "drawings" occurs in around 800 series, and "HMAS", "Papua" and "Lighthouse" all occur in around 200 series. Some odd features show up as well, for example "Yokohama" and "specie"; turns out there are a large number of series consisting of records from the Yokohama Specie Bank, a Japanese bank involved in trade with China and Australia around the mid-C20th - it gets a mention in this 1940 telegram from Menzies to the High Commissioner in London. I wonder how the records ended up in the Archives?
Next, to try integrating text clouds as interfaces / overlays for the previous visualisations.