We all know somebody with a data horror story. We’ve seen the notices of the desperate researcher pasted all over campus, begging the person who stole their laptop to just return the hard disk, because it holds the data they need to finish their PhD thesis.
I, too, have my own data horror story. During my studies in Brussels (long, long ago, around 2005, when laptops were still uncommon), I had two desktop computers: one at my place in Brussels, and one at my parents’ house. External hard drives were still uncommon at that time, so I sometimes burned data on a CD, but mostly the two desktops functioned as backups of each other. Until both crashed within 10 days of each other. I lost all the digital university materials from the first years of my studies. Worst of all: this mishap occurred right in the middle of my exams, and I clearly remember that I needed a functioning computer to prepare for a Matlab-based exam.
Since then, I’ve been backing up my data. First by burning everything on DVDs and storing folders on flash drives. Then, in 2008, I got my first external HD of 500 GB, and I have been faithfully backing up my work ever since.
Currently I use a cloud service to store my data and sync everything across all my devices, and I also have a 3 TB HD that I use as an extra, physical backup.
So, whether you are starting as a PhD candidate or you are a seasoned researcher, it is always good to think about your data management. Here are a few aspects to consider:
1. Storage
When we talk about data management, the first thing we think about is data storage: where will we store our data, on physical media, in the cloud, or both? The first step is to figure out your storage with the worst-case scenario in mind: if a fire destroys your building or the zombie apocalypse happens in your country, will your data still be safe?
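A backup only helps if the copy is actually complete and readable. As a minimal sketch (not a replacement for proper backup software), the Python snippet below compares every file in a working folder with its counterpart in a backup folder using SHA-256 checksums. The function names `file_sha256` and `find_backup_problems` are hypothetical, chosen for this illustration, and the folder paths are assumptions you would replace with your own.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(source: Path, backup: Path) -> dict[str, list[str]]:
    """List files under `source` that are missing or different under `backup`."""
    problems: dict[str, list[str]] = {"missing": [], "different": []}
    for original in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = original.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems["missing"].append(relative.as_posix())
        elif file_sha256(original) != file_sha256(copy):
            problems["different"].append(relative.as_posix())
    return problems
```

Calling `find_backup_problems(Path("thesis"), Path("/mnt/backup/thesis"))` (both paths are placeholders) returns two lists of relative file names. Two empty lists mean every file in the working folder has an identical copy in the backup; the check does not look for extra files that exist only in the backup.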
2. Format
The second aspect of data management is the format of the data. Will you be saving your measurements as simple .csv files, and if so, will another researcher be able to figure out what each column means? How will you structure the folders and subfolders of your work? If you have a constant stream of data (for example, from monitoring), how much of it will you save in raw format and how much in processed format?
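One way to make sure another researcher can figure out what each column means is to store a small data dictionary next to every .csv file. The Python sketch below is one possible approach, not a standard: the function name `write_measurements`, the `.dictionary.json` naming, and the example columns `time_s` and `load_kN` are all assumptions made up for this illustration.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path


def write_measurements(csv_path: Path, rows: list[dict], columns: dict[str, dict]) -> Path:
    """Write rows to a CSV file and a JSON data dictionary next to it."""
    # The column order in the CSV follows the order of the data dictionary.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)  # raises ValueError for columns not in the dictionary
    dictionary_path = csv_path.with_name(csv_path.stem + ".dictionary.json")
    dictionary_path.write_text(json.dumps(columns, indent=2), encoding="utf-8")
    return dictionary_path


# Hypothetical example columns: each entry records the unit and the meaning.
EXAMPLE_COLUMNS = {
    "time_s": {"unit": "s", "description": "Time since start of test"},
    "load_kN": {"unit": "kN", "description": "Applied load"},
}
```

For example, `write_measurements(Path("beam_test.csv"), [{"time_s": 0.0, "load_kN": 12.5}], EXAMPLE_COLUMNS)` writes `beam_test.csv` and `beam_test.dictionary.json` side by side. Because the writer rejects any column that is not described in the dictionary, the documentation cannot silently fall behind the data.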
3. Publishing of data
When we talk about data management, we also need to talk about making our data FAIR (findable, accessible, interoperable and reusable). More and more, researchers are encouraged to share their datasets. But when you publish your data, you should do so in a way that serves others (hence the FAIR principles). It is good to think through how you will publish your data before you even start to collect and save it.
4. DMP
At Delft University of Technology, data management is important, and all PhD candidates are expected to present their Data Management Plan (DMP), which they discuss with the Data Steward, during their go/no-go meeting. The DMP details how data will be collected, managed, stored and made available. Even if your university does not require a DMP, it’s still good practice to write down how you will manage your data at the start of your research, at the start of a new project, and/or at the start of a new collaboration.
5. Best practices
We’ve all heard the data horror stories. Let’s now focus on data success stories. I encourage you to talk with senior PhD candidates or researchers about how they deal with their data, and to follow the example of those with a sound data management strategy. Participate in data management training. Read through the best practices for data management that your university provides.
6. Your library
Data management is typically covered by the library services of your university. If you are in doubt, reach out to your Friendly Campus Librarian. Trust me on this one: librarians, and by extension the many people who work in research support through your library, are your best allies. If you don’t know where to start with your data management, go to the library and talk to your librarians. They will be able to point you to the right person (the Data Steward) and get you on your way with data management.