As much as we agree that “data is the new oil”, if we think about it closely, it is the only oil that keeps the world running! So, in a way, data itself is not as “new” as we portray it to be; rather, our approach towards data is new, and with every bit of data being generated comes a new regulation. And why not? After all, over 400 million terabytes of data are created each day [1], and global data creation is expected to grow to over 390 zettabytes by 2028 [2].
Data is the new oil – Clive Humby
As research, businesses, and technologies increasingly rely on data-driven insights, ensuring that data is well-organized and reusable is a fundamental challenge. One of the most notable steps toward meeting this challenge is the introduction of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. The FAIR principles provide a structured framework to improve data management, making it easier for humans and machines to locate, access, and use data effectively.
There are numerous and diverse stakeholders who visibly benefit from overcoming these obstacles: researchers wanting to share, get credit, and reuse each other’s data and interpretations; professional data publishers offering their services; software and tool-builders providing data analysis and processing services such as reusable workflows; funding agencies (private and public) increasingly concerned with long-term data stewardship; and a data science community mining, integrating and analysing new and existing data to advance discovery.
In this article, we will dive into the basics of the FAIR principles.
Key objectives of the FAIR Principles
The FAIR principles were formally introduced in 2016 through a publication in Scientific Data by Wilkinson et al. [3]. These guidelines emerged from a 2014 workshop in Leiden, Netherlands, where experts convened to address challenges in data sharing and reuse [4].
The overarching goal of the FAIR principles is to maximize the utility of data by promoting practices that support long-term data preservation and reuse. By adhering to these principles, researchers can enhance the transparency, efficiency, and impact of their work.
Let’s look at each of the four pillars that form the foundation of the FAIR principles [5].
- Findability: Data and metadata should be easy to locate for both humans and machines. This involves assigning globally unique and persistent identifiers, providing rich metadata, and registering or indexing data in searchable resources.
- Accessibility: Once found, data should be retrievable using standardized protocols. This includes ensuring that metadata remains accessible even if the data itself is no longer available.
- Interoperability: Data should be compatible with other datasets and tools. This requires using standardized vocabularies and formats, facilitating integration with other data sources.
- Reusability: Data should be well-described and documented to allow for replication and further use. This entails clear licensing, detailed provenance information, and adherence to community standards.
A detailed description of all these pillars of the FAIR data principles is provided by the GO FAIR Initiative.
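To make the four pillars concrete, here is a minimal sketch of what a FAIR-oriented metadata record might look like. The field names loosely follow common conventions such as DataCite and Dublin Core, and the DOI, URL, and dataset details are hypothetical placeholders, not a real published dataset.

```python
import json

# Illustrative sketch of a metadata record touching each FAIR pillar.
# All identifiers, URLs, and dataset details below are hypothetical.
metadata = {
    # Findable: a globally unique, persistent identifier plus rich metadata
    "identifier": "10.1234/example.dataset.v1",   # hypothetical DOI
    "title": "Example Ocean Temperature Dataset",
    "keywords": ["oceanography", "temperature", "time series"],
    # Accessible: retrievable via a standardized protocol (HTTPS)
    "access_url": "https://repository.example.org/datasets/ocean-temp-v1",
    # Interoperable: standard formats and shared vocabularies
    "format": "text/csv",
    "vocabulary": "https://vocab.example.org/ocean/temperature",
    # Reusable: explicit license and provenance
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": "Derived from buoy sensor readings, 2020-2023",
}

print(json.dumps(metadata, indent=2))
```

Even this small example shows how each pillar maps to concrete fields: a repository that indexes such records makes the data findable, while the license and provenance fields let future users judge whether and how they may reuse it.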
Best practices for making your data FAIR
Organizations and policymakers play a pivotal role in the widespread adoption of FAIR principles. By establishing clear guidelines, providing necessary infrastructure, and promoting education and training, they can support researchers in implementing FAIR practices. Collaborative efforts among funding agencies, research institutions, and governmental bodies are essential to create an ecosystem where FAIR data management is the norm, thereby enhancing the quality and impact of scientific research.
The following are key best practices commonly emphasized by these stakeholders to support the FAIRification of data [6][7][8].
- Begin early: Incorporating FAIR considerations at the start of a project ensures that data is managed thoughtfully from the outset, making it easier to organize, document, and share later. Early planning allows researchers to choose appropriate metadata standards, file formats, and repositories, reducing the risk of data loss or inaccessibility. It also promotes transparency, reproducibility, and long-term usability, aligning the project with best practices in open science and maximizing the impact of the research.
- Utilize standardized metadata: Using domain-specific metadata standards ensures that data is described in a consistent, structured way that others in the same field can easily understand and use. These standards improve discoverability by enabling search engines and repositories to accurately index datasets, making them easier to find. They also support interoperability by allowing different systems, tools, and researchers to seamlessly integrate and interpret data without miscommunication or loss of meaning.
- Apply clear licensing: Assigning explicit usage licenses (e.g., Creative Commons) to datasets facilitates legal reuse and sharing, which is crucial to the Reusability aspect of the FAIR principles. A clear license tells others exactly how they can use, modify, or share your data, removing legal ambiguity and encouraging responsible reuse. Without a license, even well-documented and accessible data may go unused due to uncertainty about permissions. By defining usage rights up front, you increase the likelihood that your data will be cited, built upon, and integrated into future research, maximizing its value and impact.
- Engage data stewards: Collaborating with data stewards is essential as they possess specialized knowledge in data governance, quality, and lifecycle management, which ensures that data is accurate, consistent, and usable. Their expertise is particularly valuable in implementing the FAIR principles as they understand the technical and organizational requirements needed to make data compliant with these standards. By working with data stewards, teams can avoid common pitfalls, enhance data transparency, and ensure long-term value and reusability of their datasets.
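The best practices above can be partly automated. Below is a hypothetical helper that runs a lightweight pre-publication check on a metadata record, verifying that the basics emphasized here (a persistent identifier, core descriptive fields, and an explicit license) are present. It is only an illustrative sketch, not a substitute for a full FAIR assessment tool, and the field names are assumptions carried over from the earlier example.

```python
# Hypothetical pre-publication checklist; field names are illustrative.
REQUIRED_FIELDS = ["identifier", "title", "format", "license"]

def fair_checklist(metadata: dict) -> list:
    """Return a list of problems found; an empty list means the basics pass."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append("missing or empty field: " + field)
    # Clear licensing: flag vague placeholders that leave reuse rights ambiguous
    if metadata.get("license", "").strip().lower() in {"", "todo", "unknown"}:
        problems.append("license is not explicit")
    return problems

record = {
    "identifier": "10.1234/example.dataset.v1",  # hypothetical DOI
    "title": "Example Ocean Temperature Dataset",
    "format": "text/csv",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(fair_checklist(record))  # prints [] because all basic checks pass
```

A check like this is most useful when wired into a data-submission workflow, so that gaps in documentation or licensing are caught before a dataset is deposited rather than after.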
FAIR principles have significantly influenced research data management since their introduction. Looking ahead, the focus is on enhancing machine-actionability, semantic interoperability, and integrating FAIR practices across various disciplines. Initiatives like FAIR 2.0 aim to address semantic interoperability challenges, ensuring that data and metadata are not only accessible but also meaningful across different systems and contexts [9]. Additionally, the development of FAIR Digital Objects (FDOs) seeks to standardize data representation, facilitating seamless data exchange and reuse globally [10].
Embracing the FAIR principles is a collective journey that requires commitment from individual researchers, institutions, and policymakers alike. By prioritizing data that is findable, accessible, interoperable, and reusable, the scientific community can foster greater collaboration, transparency, and innovation. As we continue to refine and implement FAIR practices, we move towards a more open and efficient research environment that benefits all stakeholders.
Further reading:
• https://www.go-fair.org/fair-principles/
• https://www.go-fair.org/fair-principles/fairification-process/
• https://www.surf.nl/files/2019-04/surf-fair-data-advanced-use-cases-final.pdf