Backup
Protect against accidental or malicious data loss
A backup is a separate, secure copy of your data that can be used to restore your files if the original is lost, corrupted, or deleted. Backups are typically stored on a different device and/or location, and are not affected by changes to the original file.
Making backups is an essential element of research data management: it ensures that original data files can be restored from backup copies should they be damaged or go missing.
Regular backups help protect against accidental or malicious data loss due to:
- human error
- hardware failure
- software or media faults
- virus infection or malicious hacking, and
- power failure
The backup procedure a project needs will depend on local circumstances, the perceived value of the data and the level of risk of data loss you are prepared to accept. Carrying out an informal risk analysis can give a good indication of backup needs.
Many cloud storage solutions offer synchronisation of local files (i.e. files stored on your computer) to the cloud. This is not a backup – if you delete or overwrite a file on your device, that change happens everywhere, immediately. If a file is accidentally deleted or damaged, your cloud storage may not offer any method of recovery. You should determine if any version history or recovery features are available and enabled.
To protect your research data, you need dedicated backup copies, ideally made regularly, stored securely, and kept separate from your day-to-day working files.
Backup strategy overview
- Keep at least three copies of your data.
- Use at least two different types of storage media.
- Keep at least one copy off-site or in institution-managed cloud storage.
- Automate backups where possible.
- Regularly test your ability to restore data from backups.
- Encrypt backups that contain personal or sensitive data.
- Define who is responsible for managing and checking backups.
- Plan for backup in your Data Management Plan and inform participants during consent.
Key considerations when planning a backup strategy for your research data
Is there any backup provision already in place?
Find out if your institution has an operational backup policy. Most universities have one for files held on university network space and institution-managed cloud storage. In most cases these policies do not cover your local drive, so you must back it up manually if you use it for data storage. If you are not satisfied with the robustness of the institutional solution, carry out an independent backup of critical files.
Which systems to back up?
You need a strategy for all systems where data are held, including portable computers and devices, non-network computers and home-based computers.
Identify which information on these systems should be backed up: all of it, some of it, or only the parts that have changed. If your institution does not provide any system backup, you may need to take full responsibility for all your own backups.
What file formats should I use?
Backups of master copies should ideally be in file formats that are suitable for long-term digital preservation, i.e. open or standard formats as opposed to proprietary ones.
How often should I back up my files?
Consider how often you make changes to your data, and how much changed data you are prepared to lose between backups. Consider backing up after each change to a data file or at regular intervals, such as daily or weekly. Using automated tools to schedule backups is advisable.
How many copies should I make?
Most backup policies recommend keeping at least three copies of the data, with at least one stored off-site. This is known as the 3-2-1 backup strategy: three copies of the data in total, stored on two different types of storage media, with one copy held off-site.
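The three conditions of the 3-2-1 strategy can be written down as a simple checklist. The sketch below is illustrative only: the `DataCopy` record, the `check_3_2_1` function and the media labels are hypothetical names introduced for this example, not part of any backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of the data (hypothetical record for illustration)."""
    name: str
    media: str     # free-text label, e.g. "laptop-ssd" or "network-drive"
    offsite: bool  # True if the copy is held away from the main site


def check_3_2_1(copies):
    """Return a list of unmet 3-2-1 conditions (empty if all are met)."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({copy.media for copy in copies}) < 2:
        problems.append("fewer than two types of storage media")
    if not any(copy.offsite for copy in copies):
        problems.append("no off-site copy")
    return problems
```

An empty list means the plan meets all three conditions. Where the data are personal, the result should be weighed against the need to keep copies to a minimum.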
Where should I store my backups?
The backup storage method should balance convenience, security, and risk. For day-to-day access, you can back up your files to a networked drive or a cloud storage service that supports versioning.
For sensitive or irreplaceable data, consider maintaining offline backups using external hard drives or even institutional archival systems. These add an extra layer of protection against ransomware or accidental deletion.
Avoid using USB flash drives or pen drives as your main backup media, as they are prone to failure and offer limited security. Physical media should be stored safely; most manufacturers provide recommendations on the best storage conditions for their media.
Note that backups that contain personal data require encryption and should always be stored securely. Consider the geographic location of where your backups are stored, and what legislation may apply to the data residency. For example, the UK GDPR contains rules about transfers of personal data to receivers located outside the UK – these receivers include backup services.
How about backing up personal data?
Where data contain personal or sensitive information, additional care must be taken: both the working files and their backups should be encrypted and stored securely. Data protection legislation emphasises that only the minimum necessary personal data should be retained. Therefore, although the 3-2-1 strategy suggests multiple copies, it is crucial to assess whether every copy of personal data is necessary.
How should I organise my backups?
If you are making your own backups on removable media, make sure they are well-labelled, indicating the content and date/time. Without some management, achieving the ultimate aim of restoring lost data may prove difficult.
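One way to keep backups consistently labelled is to build the content description and the date/time into the file name. The following is a minimal sketch using only the Python standard library; the function name `make_labelled_backup`, the label text and the naming pattern are assumptions chosen for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_labelled_backup(source_dir, backup_root, label):
    """Archive source_dir into backup_root as <label>_<YYYYMMDD-HHMMSS>.zip."""
    source = Path(source_dir).resolve()
    destination = Path(backup_root).resolve()
    destination.mkdir(parents=True, exist_ok=True)

    # A year-first timestamp makes the archives sort chronologically by name.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = destination / f"{label}_{stamp}"

    # make_archive appends ".zip" and returns the path of the new archive.
    archive_path = shutil.make_archive(str(base_name), "zip", root_dir=str(source))
    return Path(archive_path)
```

Calling `make_labelled_backup("data/raw", "/mnt/backup", "survey-raw")` (hypothetical paths) would produce a file named along the lines of `survey-raw_<date>-<time>.zip`. The backup folder should sit outside the folder being archived.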
Are there any tools I can use to help me?
It is good practice to use an automated process to back up frequently used and critical data files. Windows and macOS both have built-in backup tools, File History and Time Machine respectively, which can be set up in a few clicks.
How can I verify and validate backup files?
It is important that you verify and validate backup files regularly by fully restoring them to another location and comparing them with the original. Backup copies can be checked for completeness and integrity, for example by checking the file size, date and MD5 checksum value. It is also worth considering how long the backed-up data should be retained and if any data retention policies apply to it.
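A restore test of this kind can be partly automated. The sketch below, which assumes the backup has already been restored to a separate folder, walks the original folder and reports files that are missing from the restored copy or whose contents differ. The function name `compare_restore` is hypothetical.

```python
import filecmp
from pathlib import Path


def compare_restore(original_dir, restored_dir):
    """Return (missing, different) lists of relative paths.

    missing: files in original_dir that are absent from restored_dir.
    different: files present in both whose contents differ.
    """
    original = Path(original_dir)
    restored = Path(restored_dir)
    missing, different = [], []

    for path in sorted(original.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(original)
        candidate = restored / relative
        # shallow=False forces a byte-by-byte comparison rather than
        # trusting file size and modification time alone.
        if not candidate.is_file():
            missing.append(relative.as_posix())
        elif not filecmp.cmp(path, candidate, shallow=False):
            different.append(relative.as_posix())

    return missing, different
```

Two empty lists suggest that every original file was restored intact. The check does not report extra files that exist only in the restored folder.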
What are checksums and why should I use them?
Checksums provide a simple way to check the integrity of data files before and after file transfer or in backups. A checksum is like a unique fingerprint of a file that can be used to verify whether two files are identical.
Each time you run a checksum tool, it produces a string of characters for each file. If even one byte of data has been altered or corrupted, that string will change. If the checksums before and after copying or backing up a data file match, you can be confident that the data were not altered during the process.
MD5summer is a free Windows tool that computes checksums using the MD5 algorithm. See our MD5summer video tutorial. MD5 is sufficient for basic integrity checking, that is, for detecting accidental corruption. For more sensitive data, where deliberate tampering is a concern, you may prefer a stronger algorithm such as SHA-256 or SHA-512, which offers better protection against collisions.
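Checksums can also be computed without a dedicated tool. As a minimal sketch, the function below uses Python's standard `hashlib` module to compute a file's checksum with a chosen algorithm. The name `file_checksum` and the default chunk size are assumptions made for this example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of a file.

    algorithm can be any name hashlib supports, e.g. "md5" or "sha512".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

To check a backup, compute the checksum of the original and of the copy with the same algorithm and compare the two strings; for example, `file_checksum("data.csv") == file_checksum("backup/data.csv")` (hypothetical paths) should be `True` for an intact copy.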
What should I do if data loss happens?
Always prepare a disaster recovery plan. The plan should set out: the steps to take if files are lost or corrupted; who to contact for support (for example, your university IT team); where backup copies are located and how to restore them; and when and how to communicate data loss to stakeholders or funders.
It is advisable to practise a full restore at least once during your project, so that you know what to do in an emergency.