Additional Resources

C: Additional Resources

View selected tools that can open a variety of data types.


U: Additional Resources

Types of Documentation:

[Image of a README file, Codebook, Commented Code]

Working With Various File Formats:

When curating data for a repository that accepts all types of data, you may receive many different file types, and you will not always have the software needed to open and view them. When this happens, there are a few ways to read the files using common, readily available software.

Common proprietary formats you might encounter include (but are not limited to) MATLAB (.mat, .m), Stata (.dta, .dct, .do), SAS (.sas, .sas7bdat), SPSS (.sav, .sps), and ESRI/ArcGIS (.shp, .dbf, .gdb).
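As a rough illustration of how the list above could be used to triage an incoming deposit, the following Python sketch groups file names by the software their extension suggests. The lookup table contains only the extensions named above; the function name group_by_software and the "unrecognized" label are hypothetical choices for this example, not part of any curation tool.

```python
from collections import defaultdict
from pathlib import PurePath

# Extension-to-software lookup built only from the formats listed above.
# Extend it locally as other formats turn up in deposits.
PROPRIETARY_FORMATS = {
    ".mat": "MATLAB", ".m": "MATLAB",
    ".dta": "Stata", ".dct": "Stata", ".do": "Stata",
    ".sas": "SAS", ".sas7bdat": "SAS",
    ".sav": "SPSS", ".sps": "SPSS",
    ".shp": "ESRI/ArcGIS", ".dbf": "ESRI/ArcGIS", ".gdb": "ESRI/ArcGIS",
}


def group_by_software(filenames):
    """Group file names by the software family their extension suggests.

    Names whose extension is missing or not in the table are collected
    under the (hypothetical) label "unrecognized".
    """
    groups = defaultdict(list)
    for name in filenames:
        suffix = PurePath(name).suffix.lower()  # case-insensitive match
        groups[PROPRIETARY_FORMATS.get(suffix, "unrecognized")].append(name)
    return dict(groups)
```

An extension is only a hint: some of these suffixes are shared with other software, so the grouping is a starting point for deciding which tool to try, not a definitive identification.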

For some proprietary formats, there are free, open-source software packages that can work with them. For example, QGIS can be used to work with files created in ESRI’s ArcGIS platform. For others, you may have to convert the files. A useful conversion tool is Stat/Transfer. It is not free, but it can be worth the investment because it also handles older legacy file formats.

Notepad++ is a free source code and text editor. It is an exceptionally helpful tool when working with text files that appear unstructured when opened with regular Notepad or WordPad. It can also often be used to open code files such as .m, .r, .do, and .sas. Notepad++ is also worth trying when a file appears to have no extension.
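Before opening a file with no extension in a text editor, it can help to check whether it is likely to be plain text at all. The Python sketch below is one possible heuristic, not a feature of Notepad++: it reads the first few kilobytes and treats the file as text if they contain no NUL bytes and decode as UTF-8. The function name looks_like_text and the 4096-byte sample size are assumptions made for this example.

```python
import codecs


def looks_like_text(path, sample_size=4096):
    """Heuristically decide whether a file is plain text.

    Reads up to sample_size bytes and returns True when the sample has
    no NUL bytes and decodes as UTF-8. This is a guess, not a format
    identification.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:  # NUL bytes are typical of binary formats
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the
        # end of the sample.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A False result does not mean the file cannot be read in a text editor: older files saved in a legacy encoding may fail the UTF-8 check and still display well enough to inspect.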

Curating human participant data can be challenging. The Data Curation Network has a Primer on Human Participants Data Essentials that can help inform that process.


T: Additional Resources

Excel Archival Tool from GitHub: http://z.umn.edu/exceltool - The Excel Archival Tool programmatically converts Excel files to open formats (specifically, CSV and PNG).

McGrory, John. (2015). Poster for "Excel Archival Tool: Automating the Spreadsheet Conversion Process". Retrieved from the University of Minnesota Digital Conservancy, http://hdl.handle.net/11299/171966.

Module 3 Understand: more information about proprietary file formats, software version documentation, and other important actions for understanding the data.

Janée, Greg; Sawchuk, Sandra; Yoo, Ho Jung. (2019). Microsoft Excel Data Curation Primer. Data Curation Network GitHub Repository.

Smithsonian Institution Archives. Smithsonian Recommended Preservation Formats for Electronic Records. https://siarchives.si.edu/what-we-do/digital-curation/recommended-preservation-formats-electronic-records.

Cornell University Library. File formats for digital content: Probability for full long-term preservation, in Recommended File Formats. https://guides.library.cornell.edu/ecommons/formats.


E: Additional Resources

There are numerous other tools and metrics being created by the community to evaluate FAIRness:


D: Additional Resources

Resources referenced in this guide and related to dataset documentation: