Temporal mapping, crowdsourcing, and standardization

By Lindsay O’Connor on September 6, 2011

For my first DH project review, I picked the Linguistic Atlas Projects because I’m interested in regional dialect and linguistic change and because I liked the idea of a linguistic map that the name “linguistic atlas” evokes. But when I looked closer at the site, I wanted to make it into more of a globe instead of just a collection of maps. Comparison across regions could be easier and less time-consuming for the user if the analyses had more overlap or formal standardization. Right now, each study is presented differently on the website. Some provide analyses, and some provide only data and a basic description. The LAMSAS Density Estimations Maps show density across the region while the LAPNW maps show isolated community locations, and these two analyses map none of the same terms. The extensive data files show many similar questions and terms, so producing comparable analyses would not require additional research.

I know we’re only supposed to suggest a single change, but I want to suggest a second one that would also serve the goal of more extensive and accessible comparisons, this time within regions instead of just between them. Much of the data is already dated, so newer surveys would bring the Atlas up to date, and they would also give these linguistic maps an additional dimension by allowing for comparisons over time. Users could see how dialects are changing within and across regions and demographic groups. I realize this isn’t so much a shortcoming of the project as a next step it could take, and with Scholars’ Lab developing Neatline, Linguistic Atlas might be a good candidate for an update.

I also want to reflect briefly on crowdsourcing via What’s on the Menu. Annie helpfully pointed out some inconsistencies in the instructions on what to include and what to ignore when transcribing menus, and the “About” page mentions “cleanup” before the project is complete. I’ve seen so many restaurant menus with typos and misspellings that I’m sure there will be plenty of variation in the language used to name and describe menu items, and those variations and “errors” are part of the beauty of these objects. It would be a shame for those things to get “cleaned up”; a project like this that documents and preserves ephemera should maintain all the quirks in its archive. I’m ambivalent about crowdsourcing here. It’s of course great for getting things accomplished quickly and cheaply, and it will be much better than OCR for all the different menu formats, but like OCR it might still lead to inconsistent representation or interpretation of items in the archive. This possible problem could also become a strength, however, if the archive of menu items and prices could also serve as an archive of the transcription process. If information about each transcriber were preserved and associated with the items they transcribe, What’s on the Menu (or maybe any crowdsourced project?) could serve the secondary function of demonstrating how different people interpret different objects. This issue will probably come up again as we work on Prism, so I look forward to reviewing other crowdsourced projects along the way.

Now I’m asking for more standardization in one project and resisting it in the other. The difference lies in which level of analysis or representation is being standardized, and in a way, the Linguistic Atlas Projects might serve as a model for the data in What’s on the Menu. The LAPs document linguistic variation, and I hope What’s on the Menu will document variations in names, descriptions, spelling, and punctuation the way the LAPs document variations in word choice and pronunciation. The standardization I think the LAPs lack is at a higher level of analysis, not at the level of data collection. It seems that many different projects share the challenge of representing variation in the data or objects in their archives in formats that are consistent and user-friendly.