C Step: Check the Data Deposit

Learning Outcomes

Curators will be able to:
  1. Perform curation actions such as conducting a file inventory and opening the files.
  2. Check the submission for completeness against predefined criteria.
  3. Develop preliminary recommendations to be used for the “Understand” step.

Terms to know

Submission Information Package (SIP): Items that have been submitted by the depositor.

Archival Information Package (AIP): A package that contains data that will be stored within a digital archive.

Dissemination Information Package (DIP): A package created from the Archival Information Package (AIP) to distribute digital content to users.

File inventory: The list of files in the submission information package (SIP).

File organization: The act of structuring files in a hierarchical way to ensure findability.

README file: A file, usually plain text (.txt), rich text format (.rtf), or Markdown (.md), that gives information about the creators of the data, where the data were created, the methods used to produce the data, sharing privileges, and so on.

Metadata: Data about data. Metadata can include the author, file size, the date the document was created, and keywords describing the document.

Summary of the Check Step

The Check step is the first step of the CURATED process. In this step, we take an inventory of the contents submitted by the depositor, known in the Open Archival Information System (OAIS) model as the submission information package (SIP). Through the process of curation, the SIP becomes an archival information package (AIP), and when a user retrieves the content, a dissemination information package (DIP) is created from the AIP. Examples of SIP contents include data files, code files, supporting documents, and metadata. At this step, we inventory what has been submitted and note our initial thoughts; we examine the content more closely in the “Understand” step. To prepare for that step, we can start opening the files, downloading any software needed to examine the submission's components, and gathering any other resources we'll need.
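For larger deposits, the inventory itself can be drafted with a short script. The sketch below is a minimal illustration, assuming the SIP has been downloaded or unpacked into a local folder; `inventory` and `count_by_extension` are hypothetical helper names, not part of any curation tool. It lists every file with its relative path, extension, and size, then tallies extensions so you can see at a glance which software you may need.

```python
from collections import Counter
from pathlib import Path


def inventory(sip_root):
    """List every file under sip_root with its relative path, extension, and size.

    Hypothetical helper; assumes the SIP is available as a local folder.
    """
    root = Path(sip_root)
    records = []
    for path in sorted(root.rglob("*")):  # walk all subfolders in a stable order
        if path.is_file():  # skip the folders themselves
            records.append({
                "path": path.relative_to(root).as_posix(),
                "extension": path.suffix.lower() or "(none)",
                "size_bytes": path.stat().st_size,
            })
    return records


def count_by_extension(records):
    """Tally files per extension, a quick view of which software may be needed."""
    return Counter(record["extension"] for record in records)
```

For example, calling `inventory("my_deposit")` on a hypothetical folder returns one record per file, which you can compare against the depositor's description to spot missing files.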

Common things to look for during this step are the record-level metadata, the file inventory, file organization, the README file, and whether the files open:

Check for (with questions to ask):

  • Is the submission complete based on your repository's predefined criteria? An example of predefined criteria is the Dryad repository guidelines.
Record-level metadata
  • Does the description have sufficient detail?
  • Are all required fields filled out (e.g., title, author/creator, licensing)?
File inventory
  • What files are included?
  • Are any files missing?
File organization
  • Do file names contain unwanted spaces or special characters?
  • Is there a file naming convention?
  • If there are many files, are their order and descriptions clear?
  • Does the file hierarchy make relationships clear? The names of top-level folders and of the files and folders beneath them should show how they relate (e.g., a top-level folder school_data containing a subfolder school_data_ny for school data from New York).
Brief file diagnostic
  • (software) Does this open?
  • (code) Does this run?
  • What version of the software or code is needed?
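The question about unwanted spaces and special characters in file names can be partly automated. This is a minimal sketch, assuming the deposit is in a local folder and that "safe" names use only letters, digits, periods, hyphens, and underscores; that pattern is an assumed convention, not a universal rule, so adjust `SAFE_NAME` to your repository's guidelines. `flag_file_names` is a hypothetical helper that only flags names for review; whether a flagged name is actually a problem remains the curator's call.

```python
import re
from pathlib import Path

# Assumed convention: letters, digits, periods, hyphens, and underscores only.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def flag_file_names(sip_root):
    """Return relative paths of files or folders whose own name breaks SAFE_NAME."""
    root = Path(sip_root)
    flagged = []
    for path in sorted(root.rglob("*")):
        # Check only the last path component, so a flagged folder is reported
        # once rather than once for every file inside it.
        if not SAFE_NAME.fullmatch(path.name):
            flagged.append(path.relative_to(root).as_posix())
    return flagged
```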

C Step Actions

   1. Check data files.
   2. Verify all metadata provided by the author and review the available documentation.
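For plain-text formats, part of the "Check data files" action (the "Does this open?" diagnostic) can be scripted. The sketch below is illustrative only and assumes UTF-8 text and comma- or tab-delimited tables; `try_open` is a hypothetical helper. Proprietary or binary formats (e.g., statistical package files or images) are only checked for readable bytes here and still need the appropriate software to confirm they open correctly.

```python
import csv
from pathlib import Path


def try_open(path):
    """Attempt a quick read of one file and return (opened, note).

    Hypothetical helper. Assumes UTF-8 text; .csv/.tsv files are parsed as
    tables and .txt/.md as plain text. Other formats are only checked for
    readable bytes, since they need format-specific software.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    try:
        if suffix in {".csv", ".tsv"}:
            delimiter = "\t" if suffix == ".tsv" else ","
            with path.open(newline="", encoding="utf-8") as f:
                rows = list(csv.reader(f, delimiter=delimiter))
            note = f"{len(rows)} rows"
            widths = {len(row) for row in rows if row}  # ignore blank lines
            if len(widths) > 1:
                # The file opens, but ragged rows deserve a closer look.
                note += f"; rows have differing column counts: {sorted(widths)}"
            return True, note
        if suffix in {".txt", ".md"}:
            text = path.read_text(encoding="utf-8")
            return True, f"{len(text.splitlines())} lines"
        with path.open("rb") as f:
            f.read(1)  # confirms only that the file is readable, not that it is valid
        return True, "readable bytes (format not checked)"
    except (OSError, UnicodeDecodeError, csv.Error) as exc:
        return False, f"{type(exc).__name__}: {exc}"
```

A failed decode does not necessarily mean the file is broken; it may simply use a different text encoding, which is worth noting for the “Understand” step.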

C Step Checklist

Key Ethical Considerations

  • Review participant agreements and data use agreements; examine the potential impacts of sharing these data. Consider:
    • Individuals and communities represented
    • Representativeness of diverse human populations
    • Protection or endangerment status of species
    • Geographic locations (e.g., contested boundaries, historical and current political situations)
    • Animal research ethics and approval
  • Could the data deposit affect a specific group?
  • Does this data deposit comply with applicable regulations and institutional policy?


Materials Needed
 1. Data deposit.

In this activity, you will use the checklist below to perform the Check step on the data deposit. Once you have completed this activity, feel free to run the C step on another dataset of your choosing.

C Step Number | C Step | Yes/No/NA
C1 | Do the files open as expected? |
C2 | Does the code run as expected? |
C3 | Does the metadata have all required fields filled out, such as the title, author, and licensing information? |
C4 | Is there any documentation? |
C5 | Are human participant data present? |
C6 | Do the file names contain unwanted spaces or special characters? |
C7 | Is there a file hierarchy in place? |
C8 | Is there a file naming convention? |
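A few checklist items can be given draft answers automatically before the curator reviews the deposit. This is a sketch under stated assumptions, not a replacement for the checklist: `prefill_checklist` is a hypothetical helper, a README file is treated as a rough proxy for documentation (C4), any subfolder as evidence of a hierarchy (C7), and the same assumed "letters, digits, periods, hyphens, underscores" rule decides unwanted characters (C6). Items that need human judgment (C1 to C3, C5, C8) are left as None.

```python
import re
from pathlib import Path

# Assumed heuristics (not official criteria) for three checklist items.
README_NAME = re.compile(r"readme", re.IGNORECASE)
UNWANTED_CHAR = re.compile(r"[^A-Za-z0-9._-]")  # spaces and other special characters


def prefill_checklist(sip_root):
    """Return draft answers for C1-C8; None means the curator must decide.

    Hypothetical helper. C1-C3, C5, and C8 need human judgment, so they
    always stay None.
    """
    root = Path(sip_root)
    paths = list(root.rglob("*"))
    answers = {f"C{i}": None for i in range(1, 9)}
    # C4: a README counts as documentation; if none is found, other
    # documentation may still exist, so the answer is left to the curator.
    if any(p.is_file() and README_NAME.search(p.name) for p in paths):
        answers["C4"] = "Yes"
    # C6: any file or folder name containing an unwanted character.
    answers["C6"] = "Yes" if any(UNWANTED_CHAR.search(p.name) for p in paths) else "No"
    # C7: treat at least one subfolder as evidence of a file hierarchy.
    answers["C7"] = "Yes" if any(p.is_dir() for p in paths) else "No"
    return answers
```

Treat the output as a starting point: a flat folder of three files may reasonably be marked NA for C7 rather than No.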

Additional Resources

View the Data Curation Tools List to see tools that can open a variety of data types.