C: Additional Resources
View a selection of tools that can open a variety of data types.
U: Additional Resources
Types of Documentation:
[Image of a README file, Codebook, Commented Code]
Working With Various File Formats:
When curating data for a repository that accepts all types of data, you can receive many different types of files. As a result, you might not always have the software needed to open and view them. When this occurs, there are still a few ways to read the files using common, readily available software.
Common proprietary formats you might encounter include (but are not limited to) MATLAB (.mat, .m), Stata (.dta, .dct, .do), SAS (.sas, .sas7bdat), SPSS (.sav, .sps), ESRI/ArcGIS (.shp, .dbf, .gdb).
For some proprietary formats, there are open source, freely available software packages that can work with them. For example, QGIS can be used to work with files created in ESRI’s ArcGIS platform. For others, you may have to convert the files. A useful tool for conversion is Stat/Transfer. It is not freely available, but can be worth the investment given that it also helps with older legacy file formats.
Notepad++ is a free source code and text editor. It is an exceptionally helpful tool when working with text files that appear unstructured when opened with regular Notepad or WordPad. It can also often be used to open code files such as .m, .r, .do, and .sas. Notepad++ is also worth trying when a file appears to have no extension.
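When a file arrives without an extension, a quick look at its first bytes can suggest whether a text editor is likely to help. The following is a minimal Python sketch of that idea, not an established curation tool: the function name `sniff`, the `SIGNATURES` table, and the labels it returns are all assumptions made for illustration, and the table covers only a handful of well-known leading byte signatures.

```python
import codecs

# Illustrative table of leading byte signatures; far from exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container (also used by .xlsx and .docx)"),
    (b"MATLAB ", "MATLAB MAT-file with a text header"),
]


def sniff(path, sample_size=4096):
    """Return a rough guess at the kind of file stored at `path`.

    Hypothetical helper: it only inspects the first `sample_size` bytes.
    """
    with open(path, "rb") as handle:
        head = handle.read(sample_size)

    if not head:
        return "empty file"

    # Known signatures are checked first.
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label

    # NUL bytes are rare in ordinary text files, so treat them as a sign of binary data.
    if b"\x00" in head:
        return "unrecognized binary"

    # An incremental decoder tolerates a multi-byte character cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "possibly text in a non-UTF-8 encoding"
    return "plain text (UTF-8)"
```

Calling `sniff("mystery_file")` (a hypothetical file name) returns one of the labels above. A "plain text" result suggests the file is worth opening in a text editor such as Notepad++, while a binary result points toward format-specific software or a conversion tool.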
Curating human participant data can be challenging. The Data Curation Network has a Primer on Human Participants Data Essentials that can help inform that process.
T: Additional Resources
Excel Archival Tool from GitHub: http://z.umn.edu/exceltool - The Excel Archival Tool programmatically converts Excel files to open formats (specifically, CSV and PNG).
McGrory, John. (2015). Poster for "Excel Archival Tool: Automating the Spreadsheet Conversion Process". Retrieved from the University of Minnesota Digital Conservancy, http://hdl.handle.net/11299/171966.
Module 3 Understand: more information about proprietary file formats, software version documentation, and other important actions for understanding the data.
Janée, Greg; Sawchuk, Sandra; Yoo, Ho Jung. (2019). Microsoft Excel Data Curation Primer. Data Curation Network GitHub Repository.
Smithsonian Institution Archives. Smithsonian Recommended Preservation Formats for Electronic Records. https://siarchives.si.edu/what-we-do/digital-curation/recommended-preservation-formats-electronic-records.
Cornell University Library. File formats for digital content: Probability for full long-term preservation, in Recommended File Formats. https://guides.library.cornell.edu/ecommons/formats
E: Additional Resources
There are numerous other tools and metrics being created by the community to evaluate FAIRness:
- Original principles published by FORCE11: https://www.force11.org/fairprinciples and original article: https://doi.org/10.1038/sdata.2016.18
- The FAIR Data Principles feature 15 facets corresponding to the four letters of FAIR - Findable, Accessible, Interoperable, Reusable.
- FAIR metrics developed by the original research group: https://github.com/FAIRMetrics/Metrics
- FAIR scoring rubric used in the activity
- FAIRsFAIR Data Object Assessment Metrics (v0.4): contains 17 core metrics to assess the FAIRness of a dataset.
- This work builds on an RDA working group called FAIR Data Maturity Model, which published the FAIR Data Maturity Model Specification and Guidelines (2020): https://doi.org/10.15497/rda00050
- This group also released a beta tool in 2021 called the F-UJI Automated FAIR Data Assessment Tool, which aims to computationally assess the FAIRness of a dataset when given its DOI or URL.
- Australian Research Data Commons provides a self-guided question-based FAIR data assessment tool
D: Additional Resources
Resources referenced in this guide and related to dataset documentation:
- Archivematica: open-source digital preservation system [WWW Document], n.d. URL https://www.archivematica.org/en/ (accessed 1.13.22).
- Atlassian, n.d. Jira | Issue & Project Tracking Software [WWW Document]. Atlassian. URL https://www.atlassian.com/software/jira (accessed 1.13.22).
- Deep Blue Repositories, Univ. of Michigan Library [WWW Document], n.d. URL https://www.lib.umich.edu/collections/deep-blue-repositories (accessed 1.13.22).
- Digital Content Transfer Tools - Digital Preservation (Library of Congress) [WWW Document], n.d. URL https://www.digitalpreservation.gov/series/challenge/data-transfer-tools.html (accessed 1.13.22).
- Guide to writing “README” style metadata | Cornell Research Data Management Service Group [WWW Document], n.d. URL https://data.research.cornell.edu/content/README (accessed 1.13.22).
- Kunze, J.A., 2021. Bagitspec. URL https://github.com/jkunze/bagitspec
- Jones, S., Pryor, G. & Whyte, A. (2013). ‘How to Develop Research Data Management Services - a guide for HEIs’. DCC How-to Guides. Edinburgh: Digital Curation Centre. https://www.dcc.ac.uk/guidance/how-guides/how-develop-rdm-services (see section on data catalogs and repositories.)
- The Dataverse Project - Dataverse.org [WWW Document], n.d. URL https://dataverse.org/home (accessed 1.13.22).
- Packaging Tool | Data Conservancy, n.d. URL https://dataconservancy.org/software/ (accessed 1.13.22).