How to be the Superhero: Saving Unloved Data | Love Data Week

[Image: Lego man holding a laptop and coffee]

This week we have reviewed how to protect, organize, document, and respect your data. But why is all of this important? Yes, we here at the William H. Hannon Library are concerned about you and your data in that unpredictable moment when your hard drive fails or you accidentally knock your cup of coffee across your laptop. We want to make sure your data is safe and accessible after accidents or technical malfunctions. However, we are also invested in your ability to access valuable data from other agencies.

Last year, scientists feared that decades of data providing evidence of global warming would become harder to find after the transition to the Trump administration, which nominated climate change deniers for top cabinet positions. Much of this data was housed only by the government. Data rescue events gathered academics, librarians, coders, and citizens to extract government data and ensure the public does not lose access to it. Organizations like Data Refuge copied and preserved decades of government-sponsored climate research on servers outside of the United States.

Legacy, heritage, and at-risk data share one common theme: barriers to access. Anything written by hand, such as field notes, lab notebooks, handwritten transcripts, measurements, or ledgers, is considered at-risk data. Data created on outdated technology or stored in proprietary formats is also at risk. The shift from physical documents to digital distribution creates further risk that data may be lost.

Securing legacy data takes time, resources, and expertise, but it is well worth the effort: old data can enable new research, and the loss of data could impede future research. So how do you approach reviving legacy or at-risk data?

How do you eat an elephant? One bite at a time.

1. Recover and inventory the data

  • Format, type
  • Accompanying material–codebooks, notes, marginalia

2. Organize the data

  • Depending on discipline/subject: date, variable, content/subject

3. Assess the data

  • Are there any gaps or missing information?
  • Triage–consider nature of data along with ease of recovery

4. Describe the data

  • Assign metadata at the collection/file level

5. Digitize/normalize the data

  • Digitization is not preservation. Choose a file format that will retain its functionality (and accessibility!) over time. See “Which file formats should I use?”

6. Review

  • Confirm there are no gaps or indicate where gaps exist

7. Deposit and disseminate

  • Make the data open and available for re-use
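For data that is already digital, the inventory in step 1 can be partly automated. What follows is only a minimal sketch in Python, not a prescribed workflow: the function names and the columns recorded (path, extension, size, modified date, checksum) are our own assumptions for illustration. It walks a folder, notes the format of each file, and records a SHA-256 checksum so that later copies can be compared against the original.

```python
import csv
import hashlib
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def inventory(root):
    """Return one record per file found under root (hypothetical helper)."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        records.append({
            # Path relative to the folder being inventoried
            "path": path.relative_to(root).as_posix(),
            # The extension is only a rough hint at the file format
            "extension": path.suffix.lower() or "(none)",
            "bytes": stat.st_size,
            "modified": modified.date().isoformat(),
            # Checksum of the whole file; reads the file into memory
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return records


def format_summary(records):
    """Count files per extension to show which formats dominate."""
    return Counter(record["extension"] for record in records)


def write_inventory(records, out_csv):
    """Save the inventory as a CSV file that can travel with the data."""
    fields = ["path", "extension", "bytes", "modified", "sha256"]
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

Calling `inventory("legacy_data")` on a folder (the folder name here is hypothetical) and passing the result to `format_summary` gives a quick view of which formats may need normalizing in step 5. Save the CSV outside the folder being inventoried so it does not show up in later runs. Accompanying material such as codebooks and marginalia still has to be recorded by hand.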

Today’s post was written by Marie Kennedy, Serials and Electronic Resources Librarian, and Jessea Young, Digital Initiatives Librarian. Content inspired by Love Data Week. Image credit: julochka on Flickr (CC BY-NC 2.0).