E Step: Evaluate the Overall Data Package


Learning Outcomes


Curators will be able to:
  1. Evaluate the results of the curation process.
  2. Assess the impact/value of data curation by considering the relationships between the depositor/repository/curator.
  3. Assess a dataset using measures of FAIRness.



Terms to know




The FAIR Principles were developed by a diverse set of stakeholders to outline how scientific data should be shared. FAIR stands for Findable, Accessible, Interoperable, and Reusable.



Summary of the Evaluate Step

In this step you will evaluate the overall data package to determine whether data curation by the repository adds value to the data sharing process and whether the resulting data package is findable, accessible, interoperable, and reusable (FAIR*).

*Read more about FAIR: https://www.force11.org/fairprinciples


What do we mean by Evaluate?

Curation is a partnership between:

  • the curator and the researcher
  • the researcher and the repository system
  • the curator and the repository system


The diagram below shows the relationships and key considerations among the curator, researcher, and repository platform, and how these partners work together to make data more FAIR.


1: Researcher / Curator Relationship

  • Was communication with the researcher successful? Did they make/accept the recommended modifications to the dataset?
  • Did the expertise of the curator allow you to effectively work with the researcher’s data?
  • Did the researcher value the curation process?

2: Researcher / Repository Relationship

  • As a depositor/user, do I trust this repository?
  • Will this repository ensure my data are FAIR? How?
  • Is there transparency about what actions will be taken with my data?

3: Curator / Repository Relationship

  • Do the features of the system/platform facilitate making the data FAIR (e.g., minting PIDs, assigning licenses, structured metadata)?
  • Is the technology well supported and maintained?
  • What standards and best practices does the repository follow? (e.g., digital preservation)

FAIR Data

  • An important end goal of data sharing and data curation is data that are:
    • Findable: To be findable (F) or discoverable, data and metadata should be richly described to enable attribute-based search
      • (meta)data are assigned a globally unique and eternally persistent identifier
      • data are described with rich metadata
      • (meta)data are registered or indexed in a searchable resource
      • metadata specify the data identifier
    • Accessible: To be broadly accessible (A), data and metadata should be retrievable in a variety of formats that are sensible to humans and machines using persistent identifiers
      • (meta)data are retrievable by their identifier using a standardized communications protocol
      • the protocol is open, free, and universally implementable
      • the protocol allows for an authentication and authorization procedure, where necessary
      • metadata are accessible, even when the data are no longer available
    • Interoperable: To be interoperable (I), the description of metadata elements should follow community guidelines that use an open, well-defined vocabulary.
      • (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
      • (meta)data use vocabularies that follow FAIR principles
      • (meta)data include qualified references to other (meta)data
    • Reusable: To be reusable (R), the description of essential, recommended, and optional metadata elements should be machine-processable and verifiable; use should be easy; and data should be citable, to sustain data sharing and recognize the value of data.
      • (meta)data have a plurality of accurate and relevant attributes
      • (meta)data are released with a clear and accessible data usage license
      • (meta)data are associated with their provenance
      • (meta)data meet domain-relevant community standards

    Source: https://www.force11.org/fairprinciples
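
    To make the Findable and Accessible criteria above concrete, the short sketch below (an illustration, not part of the CURATE(D) workflow itself) uses Python to retrieve machine-readable metadata for a dataset by resolving its persistent identifier over a standardized protocol (HTTPS content negotiation against the DOI resolver). It assumes the DOI is registered with DataCite and that the metadata follow the DataCite schema, so field names may differ for other identifiers; the example DOI is the survey dataset cited later in this module.

        import requests

        doi = "10.13020/DZQP-KS53"  # example: the DCN End User Survey 2021 dataset

        # Ask the DOI resolver for machine-readable metadata instead of the human landing page.
        response = requests.get(
            f"https://doi.org/{doi}",
            headers={"Accept": "application/vnd.datacite.datacite+json"},
            timeout=30,
        )
        response.raise_for_status()
        record = response.json()

        print(record.get("titles"))      # rich descriptive metadata (Findable)
        print(record.get("doi"))         # the identifier appears in the metadata itself (Findable)
        print(record.get("rightsList"))  # usage license, if one was supplied (Reusable)

    Because the request goes through the open DOI/HTTPS protocol rather than a repository-specific interface, the same approach should work for any DataCite-registered dataset.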

     E Step Actions

    As we consider each stakeholder connection, ask yourself:

    • Did curation result in stronger relationships with the depositor? (Researcher / Curator)
      • One way to assess this is to survey authors who deposit data to your repository and ask them about their experience. Read the case study below.

      Case Study: How satisfied are depositors with our curation services? We asked them!

      Members of the Data Curation Network, representing academic and non-profit data repositories, wanted to better understand how satisfied depositors were with the data curation services their data received from curation staff during the data sharing process (deposit, ingest, appraisal, curation, and publication).

      In spring 2021, we surveyed 568 researchers who had recently deposited data into one of 6 data repositories and asked them to consider their most recent data curation experience. Our 11-question survey received a 42% response rate with 239 valid responses. Of these:

      • 87% strongly agreed that they were satisfied with their curation experience
      • 75% reported that, as a result of the curation process, changes were made to their data; of the remainder who said no changes were made, almost all said it was because none were needed
      • 81% strongly agreed that data curation by their repository added value to the data sharing process
      • 98% said they would recommend this repository to a colleague
      • The most frequently cited value-added action was simply having a curator take the time to review the dataset, as one respondent summed up: “Feedback from someone who comes to the data/documents with fresh eyes is simply invaluable…”

      Download a copy of the survey instrument to use for your repository! Citation: Wright, Sara; Johnston, Lisa; Marsolek, Wanda; Luong, Hoa; Braxton, Susan; Lafferty-Hess, Sophia; Herndon, Joel; Carlson, Jake. (2021). Data Curation Network End User Survey 2021. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/DZQP-KS53.
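
      As a quick sanity check on the figures above, the reported response rate follows directly from the counts in the case study; the per-item counts below are approximations back-calculated from the rounded percentages.

          invited = 568          # researchers who were asked to take the survey
          valid_responses = 239  # valid responses received

          print(f"Response rate: {valid_responses / invited:.0%}")  # ~42%, matching the reported rate
          print(round(0.87 * valid_responses), "respondents satisfied with their curation experience (87%)")
          print(round(0.81 * valid_responses), "agreed that curation added value (81%)")
          print(round(0.98 * valid_responses), "would recommend the repository to a colleague (98%)")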

    • Is our repository trustworthy for end users? (Curator / Repository & Researcher / Repository): Communicating the trustworthiness of your repository can be multi-faceted. As data stewards, we are caring for valuable resources, yet some repository processes may not be readily apparent to end users. A number of resources, principles, and processes can help us understand how to build trust, including:
      • TRUST Principles: The TRUST Principles are a collaboratively developed set of principles for demonstrating the trustworthiness of digital repositories; they stand for Transparency, Responsibility, User focus, Sustainability, and Technology.
      • CoreTrustSeal Certification: The CoreTrustSeal is a peer-reviewed self-assessment that demonstrates a repository’s ability to meet 16 core criteria in areas of sustainability, organizational structure, preservation, and security.
      • CARE Principles for Indigenous Data Governance: The CARE Principles are people- and purpose-oriented and complement the FAIR Principles by providing a framework for ethically working with Indigenous Peoples’ data; they stand for Collective Benefit, Authority to Control, Responsibility, and Ethics.
      • Force11 and COPE Research Data Publishing Ethics: These publication workflows were developed to help repositories more consistently apply high ethical standards when issues or areas of concern arise related to authorship, rigor, legal and regulatory restrictions, and risk to human subjects, communities, or society.
    • Did this curation result in reuse of the data? (FAIR data): When evaluating the success of our curation, we may also consider quantitative or qualitative metrics for reuse (e.g., citations, download statistics). Doing this in practice can be tricky; a short sketch after this list shows one way such counts can be pulled programmatically. Some interesting research in this space that you might want to explore includes:
      • Hemphill, L., Pienta, A., Lafia, S., Akmon, D., & Bleckley, D. (2021). How do properties of data, their curation, and their funding relate to reuse? https://doi.org/10.1002/asi.24646
      • Faniel, I. M., Frank, R. D., & Yakel, E. (2019). Context from the data reuser’s point of view. Journal of Documentation, 75(6), 1274–1297. https://doi.org/10.1108/JD-08-2018-0133
      • Federer, L. (2020). Measuring and Mapping Data Reuse: Findings From an Interactive Workshop on Data Citation and Metrics for Data Reuse. Harvard Data Science Review, 2(2). https://doi.org/10.1162/99608f92.ccd17b00
      • Make Data Count: Make Data Count is a global, community-led initiative focused on the development of open research data assessment metrics, with social and technical infrastructure rooted in principles of transparency and accessibility.
    • Did curation result in FAIR data? (FAIR data)
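
    The sketch below illustrates the reuse-metrics idea mentioned above: it asks DataCite's public REST API for the citation, view, and download counts it holds for a DOI. This is only a starting point and assumes the dataset has a DataCite DOI; coverage depends on whether the repository reports usage events, and attribute names may change over time, so treat zero or missing values with caution.

        import requests

        def reuse_metrics(doi: str) -> dict:
            """Return the citation/view/download counts DataCite reports for a DOI."""
            resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
            resp.raise_for_status()
            attrs = resp.json()["data"]["attributes"]
            return {
                "citations": attrs.get("citationCount"),
                "views": attrs.get("viewCount"),
                "downloads": attrs.get("downloadCount"),
            }

        # Example: the survey dataset from the case study above.
        print(reuse_metrics("10.13020/DZQP-KS53"))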

    Activity: Evaluate for FAIRness

    Materials Needed
    For this activity you will assess a dataset for FAIRness and then recommend ways to increase its FAIRness.

    Directions

      1. Please identify a dataset to use for this activity. Options:
            a. Our example dataset (final version in the repository)
            b. A dataset in your data repository
            c. One from another repository (e.g., Figshare, ICPSR)
      2. Use the E Step Checklist below to assess the dataset for key FAIR features.
      3. Determine suggestions for potentially improving the FAIRness of the selected dataset.


    E Step Checklist


    Findable -

      • (Meta)data are assigned a globally unique and persistent identifier (e.g., a DOI)
      • Data are described with rich metadata
      • (Meta)data are registered or indexed in a searchable resource

    Accessible -

      • (Meta)data are retrievable by their identifier using a standardized, open protocol
      • Metadata remain accessible even if the data are no longer available

    Interoperable -

      • (Meta)data use formal, shared, and broadly applicable vocabularies for knowledge representation
      • (Meta)data include qualified references to other (meta)data

    Reusable -

      • (Meta)data have a plurality of accurate and relevant attributes
      • (Meta)data are released with a clear and accessible data usage license
      • (Meta)data are associated with their provenance
      • (Meta)data meet domain-relevant community standards
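
    For datasets that already have a DataCite DOI, a few of the checklist items above can be spot-checked programmatically. The sketch below is a rough illustration under that assumption: it inspects machine-visible signals (identifier, description, related resources, license) and cannot replace human review of documentation quality or community standards.

        import requests

        def fair_spot_check(doi: str) -> dict:
            """Check a handful of machine-visible FAIR signals in a DataCite metadata record."""
            resp = requests.get(
                f"https://doi.org/{doi}",
                headers={"Accept": "application/vnd.datacite.datacite+json"},
                timeout=30,
            )
            resp.raise_for_status()
            meta = resp.json()
            return {
                "F: persistent identifier present": bool(meta.get("doi")),
                "F: rich description (title + abstract)": bool(meta.get("titles")) and bool(meta.get("descriptions")),
                "A: metadata retrievable via open protocol": True,  # the request above just succeeded
                "I: qualified references to other (meta)data": bool(meta.get("relatedIdentifiers")),
                "R: clear and accessible usage license": bool(meta.get("rightsList")),
            }

        for item, ok in fair_spot_check("10.13020/DZQP-KS53").items():
            print("PASS" if ok else "REVIEW MANUALLY", "-", item)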

    Key Ethical Considerations

    • Final review: remember that it is not too late to surface any ethical concerns.
    • Verify that the words/language used are not racist or otherwise harmful.
    • Remind the submitter of their responsibility if they choose to ignore requests for de-identification or similar concerns.

    Additional Resources

    There are numerous other tools and metrics being created by the community to evaluate FAIRness: