The Ohio State University | University Libraries | Resource Guides

Introductory Research Data Management

Versioning

Keeping versions, or copies, of your files allows you to go back in time and see what changes were made. How many versions to keep is a personal choice, but there should be enough versions to understand how the data has been manipulated. 

Tips and tricks:

  • Always keep multiple, well-documented versions of the original data, and the normalized/standardized versions of the data.
  • Use version numbers in the file name: v001, v002, v003...
  • Include information in the file name about significant changes: such as ‘cropped’, ‘normalized’, etc.
  • At the end of the project, delete the versions that you no longer need.
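
The version-number tip above can be automated. The following is a minimal Python sketch, not an official tool: the function name next_version_name and the example file names are hypothetical. It assumes the version sits at the end of the file name as _v001, optionally followed by a single change label such as _normalized.

```python
import re
from pathlib import Path

# Matches a trailing "_v001" or "_v001_label" at the end of a file stem.
VERSION_PATTERN = re.compile(r"_v(\d{3})(?:_[A-Za-z0-9]+)?$")


def next_version_name(filename: str, change: str = "") -> str:
    """Return the file name for the next version of `filename`.

    A file with no version number yet is given v001. An optional
    change label (for example "normalized") is appended after the
    version number. Any old change label is dropped.
    """
    path = Path(filename)
    stem, suffix = path.stem, path.suffix
    match = VERSION_PATTERN.search(stem)
    if match:
        base = stem[: match.start()]
        number = int(match.group(1)) + 1
    else:
        base, number = stem, 1
    label = f"_{change}" if change else ""
    return f"{base}_v{number:03d}{label}{suffix}"


print(next_version_name("survey_v001.csv"))
print(next_version_name("survey_v002.csv", "normalized"))
```

Under these assumptions, the two calls print survey_v002.csv and survey_v003_normalized.csv. The sketch only builds a name; saving the new copy and documenting the change remain up to you.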

NOTE: BuckeyeBox does offer a version history feature, but it is not acceptable for restricted data. Read more about the types of data that are appropriate for sharing on BuckeyeBox.

File Formats

Your file format largely determines whether or not you can open a file at a later date. Proprietary file formats require the proper version of the proprietary software. Non-proprietary, or open, formats are preferred because they are independent and more durable formats. Saving your data in open, unencrypted and uncompressed formats will make your data usable for years to come. If you can’t save your data in an open format, consider including the software name, version, and parent company in the accompanying readme.txt file for future users. Preferred file formats include:

  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Spreadsheets or tables: CSV
  • Video: MOV, MPEG, AVI, MXF
  • Audio: WAVE, AIFF, MP3, MXF
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Web archives: WARC

For more in-depth discussion, see the Library of Congress’ Sustainability of Digital Formats web site.
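
As one illustration of the CSV recommendation for spreadsheets and tables, here is a minimal Python sketch that writes a small table to an uncompressed, UTF-8 encoded CSV file using only the standard library. The function name write_csv, the file name, and the sample values are made up for the example.

```python
import csv


def write_csv(path, header, rows):
    """Write a header row and data rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical sample data, for illustration only.
write_csv("site_ph_v001.csv", ["site", "ph"], [["A", 6.5], ["B", 7.1]])
```

A file written this way can be opened in any text editor or spreadsheet program, which is the main reason open formats are preferred.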

Backing up your data

Backing up is something we all know we should do, but often don't. For those high-value files, such as your original research data, use the Rule of Three:

  • Three copies
  • Two different media types (hard drive and memory stick, or CD and cloud service, etc.)
  • One off-site back-up (Unless you are handling restricted data. Then you must ensure that your back-ups comply with your security plan. Unsure what restricted data is? Check out OSU's classification assignments.) 
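
Part of the Rule of Three can be scripted. The sketch below is one possible approach in Python, assuming the two back-up locations are ordinary folders (for instance an external drive and a locally synced cloud folder). The function names and all paths are hypothetical. It copies a file to each location and compares SHA-256 checksums to confirm that each copy matches the original. It does not check that the locations are on different media or off-site, and it does not replace your security plan for restricted data.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

With the original plus two verified copies, for example back_up(Path("raw_v001.csv"), [Path("/Volumes/ExternalDrive/project"), Path("/Users/me/CloudSync/project")]), you have three copies in total. Both destination paths here are placeholders.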

File Naming Conventions

Making your file names descriptive and standardized will make them easy to understand later and more useful to others. Choose a file naming convention, detail it in a readme.txt file, and stick with it. Consider including:

  • Project name or acronym
  • Location or spatial coordinates
  • Researcher name or initials
  • Date or date range
  • Type of data (what did you measure?)
  • Version number
  • Make sure your file names have the three-letter extension on them for application-specific files

Tips and tricks:

  • For dates, use YYYYMMDD
  • Try to keep file names short, since long file names do not work well with all software
  • Don’t use special characters, like ~ ! @ # $ % ^ & * ( ) ` ; ? , [ ] { } ‘ “
  • Use leading zeros so that your files sort in sequential order: 001, 002, 003, … 010, 011, 012, … 100, 101, 102
  • Don’t use spaces; instead use underscores, dashes, no separation at all, or CamelCase (no spaces, but every new word is capitalized)
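
The conventions above can be combined into a small helper. The following Python sketch shows one possible convention, not a required one: the function name build_file_name, the order of the parts, and the underscore separator are all assumptions made for illustration. It formats the date as YYYYMMDD, pads the version number with leading zeros, and rejects spaces and special characters.

```python
import re
from datetime import date

# Only letters, digits, underscores and dashes are allowed in each part.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def build_file_name(project: str, data_type: str, researcher: str,
                    day: date, version: int, extension: str) -> str:
    """Build a file name such as PROJECT_type_initials_YYYYMMDD_v001.ext."""
    parts = [
        project,
        data_type,
        researcher,
        day.strftime("%Y%m%d"),  # YYYYMMDD sorts chronologically
        f"v{version:03d}",       # leading zeros keep versions in order
    ]
    for part in parts:
        if not ALLOWED.fullmatch(part):
            raise ValueError(f"Invalid characters in file name part: {part!r}")
    return "_".join(parts) + "." + extension.lstrip(".")


print(build_file_name("SOILPH", "moisture", "JD", date(2024, 3, 5), 2, "csv"))
```

With the made-up project values shown, this prints SOILPH_moisture_JD_20240305_v002.csv. Whatever convention you choose, record it in your readme.txt file so that others can interpret the names.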