#1 Digital Humanities Winter Institute Report-back

I attended the Digital Humanities Winter Institute at UMD’s MITH [http://mith.umd.edu] this past week, which meant I learned how to be a better researcher and overall nerd.

Firstly, hooray for CUNY GC for co-sponsoring, including making graduate student scholarships available, which meant I could go.

I enrolled in the Data Curation for the Digital Humanities track, co-taught by Trevor Munoz and Dorothea Salo, where I learned about issues in data ordering; data sociology; general wrangling, selecting and retrieving; and linked data. I looked at all this with an eye to the CollectiveAccess [http://www.collectiveaccess.org/] build project I’m working on with the Interference Archive [http://interferencearchive.org/] at the moment.

The Institute was like a faucet of information, and like a sponge I absorbed as much as possible, learning that I am especially apt at and interested in: large-scale data/text analysis, working with open data sets, and image algorithm analysis.

First: Data Curation

According to Munoz, data curation addresses the challenges of maintaining digital information in a manner that preserves its meaning and usefulness as a potential input for further research and scholarship.
Why concern yourself with data if you’re not in the sciences? “Data is alleged evidence,” said Dorothea Salo; i.e., data is what you are going to show people to prove you didn’t pull your critical and analytic conclusions out of your a$$.

Data Ordering:

Data curation is larger than archiving. The individualism of humanists is a problem: you can’t curate alone! If you want a project to continue after you die, retire, get a new job, or start a new project, you need to think about how you digitize, organize, and preserve your data. The basic instruction was DOCUMENT. BUT REALLY, DOCUMENT. That is an instruction I could hear, and tell my colleagues, over and over. It’s just that important.

Data Sociology:

Data is most useful to people who care about it, which is to say: make your data available to your communities of interest, be they Whitman scholars or critical race theorists. There is a social element to data sharing; data lives on in social circles.

Data Selection

As in sociology [and life], other people matter. The audience is the REASON we ultimately select what to keep and share — we are *not* hoarders, we are scholars.

Data Retrieval

So you have various kinds of data, perhaps these types:
•    Image collections
•    Page-scanned books
•    Basecamp
•    Marked-up books
•    Theses and dissertations
•    Website preservation
•    Audio & video
•    Complex multimedia
•    Tweets

Before your brain explodes, remember that there are many software options, and that choosing one comes last. First, consider your audience, your order, and your content!

General Data Wrangling & Collections Software & Tools:

Digital Library Software – Designed for image exhibitions
Omeka
Greenstone
ContentDM [will do books]

Hybrid Solutions
Preservation: Fedora Commons, microservices
Deposit/mgmt: Hydra, Islandora [VREs, virtual research environments]
End-user UI: Hydra, Islandora, Omeka, glue [puts Omeka onto Fedora Commons]

Archives platforms
Archivematica, ArchivesSpace [beta soon], Duke Data, CollectiveAccess, BitCurator

Data Management Platforms
Dataverse Network, thedata.org [db mgmt platform in a box]
See list on DCC wiki.

Linked Open Data

With Linked Open Data, the main idea is that researchers and, well, anyone, can derive different kinds of value from existing technology. There is what is called a semantic web, a way of using terms that can be replicated across the web to describe things (one of my classmates likened it to Esperanto). Every Thing, actually. And if “we” web-information-sharers agree to use shared semantics, then the information about Things can be linked up using a friendly little identifier called a URI, which looks like a URL but names one particular thing.
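
To make the linking idea concrete for myself, here is a tiny sketch in plain Python (no linked data library, just tuples). Big assumptions: the two example.org URIs are placeholders I made up, and both "archives" are imaginary. The creator/title terms come from Dublin Core and the name term from FOAF, two vocabularies that are commonly shared. Each statement is a subject, predicate, object triple, and because both datasets use the same URI for the same person, their statements can be joined without any coordination between the two projects.

```python
# Each statement is a (subject, predicate, object) triple.
# The example.org URIs below are made-up placeholders, not real identifiers.
WHITMAN = "http://example.org/person/walt-whitman"    # hypothetical URI
LEAVES = "http://example.org/book/leaves-of-grass"    # hypothetical URI

# Shared vocabulary terms (Dublin Core and FOAF).
CREATOR = "http://purl.org/dc/terms/creator"
TITLE = "http://purl.org/dc/terms/title"
NAME = "http://xmlns.com/foaf/0.1/name"

# Statements published by one imaginary archive.
archive_a = [
    (LEAVES, TITLE, "Leaves of Grass"),
    (LEAVES, CREATOR, WHITMAN),
]

# Statements published independently by another imaginary project.
archive_b = [
    (WHITMAN, NAME, "Walt Whitman"),
]


def describe(uri, *datasets):
    """Collect every (predicate, object) pair stated about `uri`."""
    return [(p, o) for data in datasets for (s, p, o) in data if s == uri]


def works_by(person_uri, *datasets):
    """Find every subject that names `person_uri` as its creator."""
    return [
        s
        for data in datasets
        for (s, p, o) in data
        if p == CREATOR and o == person_uri
    ]


if __name__ == "__main__":
    # The shared URI is what lets the two datasets link up.
    print(describe(WHITMAN, archive_a, archive_b))
    print(works_by(WHITMAN, archive_a, archive_b))
```

A real project would more likely use an RDF library and a proper serialization; the point here is only that agreeing on the identifier is what makes the join possible.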

Well, how the HECK would I know what URI to pick? Is that tree a TREE or a DECIDUOUS TREE? Before your brain melts, I’m gonna tell you that’s the easy part, folks: thanks to the naming power of language, most things have already been assigned a taxonomy [creepy but comforting?]. If you’re talking about an author, try VIAF [http://viaf.org/]; if you’re talking about a book, you need the Library of Congress [LOC]; and if you’re real unclear still, try CALAIS [http://www.opencalais.com/] to help build your Linked Open Data URIs.