Data and Resource Sharing Plan
See PDF here: Longevity Consortium Data and Resource Sharing Plan (Award U19AG023122)
1. Resource Sharing. The LC will adopt the NIA’s policies on biospecimen sharing. Each project narrative includes information on the biospecimens to be used, as well as the constructs to be generated from them, which will ultimately be made available to the community under NIA policies. The LC will follow the legacy LC biospecimen sharing plan (BSP), with oversight as described in section 8 below. However, because of the uniformity of the BSP, much of the material below focuses on data sharing.
2. Data Sharing Overview. The LC will partner with the NIA-subsidized Data Management and Coordination Center (DMCC), headed by Sage Bionetworks, to enable data sharing. LC-generated data will be shared publicly on the Exceptional Longevity Translational Resources (ELITE) Portal overseen by the DMCC. Each proposed project has more detailed information about the specific data types that the project will generate or exploit (see Data Management and Sharing sections), along with a description of any unique metadata or other project-specific information associated with those data types (in section ‘VI,’ which is common to all project and core narratives, and in the independent 2-page data sharing document associated with each project). The LC research will have three data sources, each following different guidelines for deposits and access: (1) internally generated data, which will strictly follow NIA and DMCC guidelines; (2) extant data sets from repositories with established procedures and protocols for data sharing (e.g., the UK Biobank (UKB), AD Knowledge Portal, dbGAP), which will strictly follow the rules, policies, and procedures associated with those repositories; and (3) data from foreign institutions, which will strictly follow the terms and conditions associated with those institutions and/or set out in the subcontracts to collaborating researchers. There is a close working relationship among the LC, the DMCC (e.g., NJS, PS, NR, SM, JL, and DE are funded by the DMCC grant and/or serve on various DMCC committees), and Dr. Beth Wilmot, who is the liaison between the NIA, the DMCC, and other consortia funded by the NIA.
3. Data Types
3.a. Types and amount of scientific data expected to be generated in the project: Data types to be generated during the LC renewal funding period will be highly diverse (e.g., cell-line based functional assay results, various omic characterizations of mice and humans, and harmonized human cohorts). These proposed project-specific data types are discussed in more detail in the relevant individual project and core narratives. The investigators will also make use of various extant data sets available through public channels (e.g., dbGAP) and through highly secure, restricted-access channels, as described in the narratives associated with the various projects. The investigators will abide by all the rules, regulations, and expectations about the use of those data sets as set forth by the institutions overseeing the repositories harboring them. All permission-related documents and certifications for accessing these extant data sets will be available for review from each relevant investigator and institution. Finally, LC researchers will also exploit foreign data sets, including the UKB, and will abide by all policies in place for accessing those data sets. For the Young Finns Study (YFS) and Danish Health Registry (DHR), the LC will support data analysts employed by the institutions overseeing those data sets to carry out relevant analyses. The legacy LC data sets generated during the previous funding cycle have either been deposited into the ELITE Portal or are in the process of being deposited.
3.b. Scientific data that will be generated directly by the LC to be preserved and shared. The primary vehicle for sharing data will be the ELITE component of the AD Knowledge Portal (ADKP), a designated domain-specific repository that shares human and model system data from more than 50 NIA-funded research grants focused on aging and dementia research. The DMCC interacts directly with the ADKP and, in conjunction with the NIA via Dr. Beth Wilmot, will be the conduit between the LC investigators generating data and the ADKP.
3.c. Metadata, other relevant data, and associated documentation: The ELITE Portal collects a predefined set of unstructured and structured metadata from all data contributors. The LC will provide methodological details on how data were generated and processed, a minimum set of phenotypic and clinical variables, and metadata annotations describing the contents of all data files and tables.
4. Related Tools, Software and/or Code. The LC will produce a website, software, protocols, quality control (QC) workflows, data processing scripts and wrappers, presentations, and scientific publications. The LC will maintain links to this information and a chart reflecting progress on the proposed projects, including biospecimens, vetted assay protocols, software, presentations, and publications. Links to the DMCC and related information will be provided on Confluence. All other products will also be deposited in, or linked from, the ELITE Portal.
5. Standards. When depositing data into the ELITE Portal, the LC will utilize the data dictionaries and standards that are being developed for the portal. These data dictionaries will be adapted from those in use with the broader NIA-supported AD Knowledge Portal. The standards are derived from commonly used biomedical data ontologies (e.g., the NCI Thesaurus) and include definitions and source URIs to allow interoperability and term mapping with other data repositories. LC internal standards, which will be described in the metadata accompanying data deposits in the ELITE Portal and are discussed in detail in the different project and core narratives, include the use of replicates, spike-in instrument calibration, and reagent standards.
6. Data Preservation, Access, and Associated Timelines
6.a. Repository where scientific data and metadata will be archived. All data and metadata will be deposited in the ELITE Portal. The ELITE Portal shares human and model system data from all NIA-funded Exceptional Longevity projects, including the LC, the LLFS, and the ILO project. The ELITE Portal is built on the Synapse data sharing platform, an NIH-Designated Generalist Repository that adheres to FAIR-TLC principles (Findable, Accessible, Interoperable, and Reusable; Traceable, Licensed, and Connected). The ELITE Portal enables data sharing under a tiered system, with controlled access commensurate with the sensitivity of the data.
6.b. How scientific data will be findable and identifiable. The ELITE Portal is publicly accessible, and all study descriptions, protocols, and file annotations can be viewed by anyone on the web. Annotations can be used to programmatically query the data and provide data summaries, which permits discovery of project-specific data in conjunction with data from other projects, studies, and cohorts. The ELITE Portal and the underlying data store, Synapse, have the capacity to assign Digital Object Identifiers (DOIs) registered with DataCite to data and to maintain a complete version history of all digital assets. We will reference versioned DOIs in any publications to enable external researchers to easily obtain the exact set of files and programs that were used to generate the results in those manuscripts. There will be no limitations on the access or reuse of our data. All human samples will be anonymized and identified only by an arbitrary ID number.
6.c. When and how long the scientific data will be made available. LC internally generated data will be made available through the ELITE Portal as soon as possible, but no later than 3 months after data validation and quality control (QC) are completed. A timeline for the process will be determined and overseen by Jennifer Dougherty and Drs. Schork and Grike. There will be no publication embargo once data are deposited. The ELITE Portal will also serve as the long-term archive for the data sets for as long as they are useful.
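As an illustration of the deposit deadline in 6.c, the following is a minimal Python sketch of how a project's designated data contact might track it. It is not part of the DMCC or ELITE Portal tooling. The function names are hypothetical, and the sketch assumes that "3 months" means three calendar months counted from the QC completion date, with the day clamped to the end of a shorter month.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (e.g., 30 November + 3 months -> 28 February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def deposit_deadline(qc_completed: date) -> date:
    """Latest deposit date: 3 calendar months after QC completion (assumed reading)."""
    return add_months(qc_completed, 3)


def is_overdue(qc_completed: date, today: date) -> bool:
    """True if the deposit deadline has passed as of `today`."""
    return today > deposit_deadline(qc_completed)
```

For example, a data set whose QC was completed on 15 January 2024 would, under these assumptions, be due in the ELITE Portal by 15 April 2024.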
7. Access, Distribution, or Reuse Considerations
7.a. Factors affecting subsequent access, distribution, or reuse of scientific data. Data uploaded to the ELITE Portal are classified into one of four data access tiers depending on the protections needed to address factors such as informed consent requirements; applicable laws, regulations, and policies, including tribal laws and regulations; existing contracts or agreements; and/or other privacy requirements. Aggregate human data will be available to any registered Synapse user, while individual-level human data in any form (raw, processed, derived) may be accessed under controlled access (see 7.b). Individual-level data will not be redistributed outside the ELITE Portal, as part of the controlled-access restrictions.
7.b. Whether access to scientific data will be controlled. Download of data from the ELITE Portal requires creation of a free user account, and the terms of use prohibit any redistribution of the data or attempts to re-identify research participants. Investigators wishing to access controlled-access data will need to submit a Data Use Request (DUR), which must be renewed each year. In general, DURs include the execution of a Data Use Certificate and an Intended Data Use (IDU) statement describing the proposed research use. Access requests are reviewed by a dedicated Access and Compliance Team at Sage Bionetworks to verify that all requirements are met prior to granting access.
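The access rules in 7.a and 7.b can be summarized in a short Python sketch. It is illustrative only and does not reflect how Synapse or the ELITE Portal implement access control. The plan does not name the four access tiers, so the sketch collapses them into the two cases the text describes (aggregate versus individual-level human data). The class and function names are hypothetical, and the 365-day validity window is an assumed reading of "renewed each year".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Requester:
    """Hypothetical record of a person asking to download data."""
    has_account: bool                       # free user account (7.b)
    dur_approved_on: Optional[date] = None  # date the Data Use Request was approved


def dur_is_current(approved_on: Optional[date], today: date) -> bool:
    """A DUR counts as current for 365 days after approval (assumed window)."""
    if approved_on is None:
        return False
    return approved_on <= today < approved_on + timedelta(days=365)


def may_download(requester: Requester, individual_level: bool, today: date) -> bool:
    """Apply the two rules described in 7.a and 7.b.

    - Any download requires a user account.
    - Individual-level human data additionally require a current DUR.
    """
    if not requester.has_account:
        return False
    if not individual_level:
        return True
    return dur_is_current(requester.dur_approved_on, today)
```

Under these assumptions, a registered user without an approved DUR could download aggregate data but not individual-level data, and an approval granted on 1 June 2024 would need to be renewed before 1 June 2025.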
7.c. Protections for privacy, rights, and confidentiality of human research participants. Neither the data uploaded to the ELITE Portal nor the metadata will disclose the identity of research participants. All data will be coded/de-identified in such a way that the identity of the research participant/data subject cannot be readily ascertained (for example, via the Safe Harbor or Expert Determination de-identification methods), or in a way that constitutes a Limited Data Set, pursuant to the provisions defined by the Health Insurance Portability and Accountability Act of 1996 (as amended). In addition, human data will be protected by the NIH Certificate of Confidentiality.
8. Oversight of Data Management and Sharing: Timely deposit of data by Project and Core Leads will be ensured through consistent and routine communications between LC leadership, the DMCC, the LC OSMB, and the NIA. The oversight of the BSP will follow the same 3 protocols, as described below. The LC, the DMCC, and the NIA (via Beth Wilmot) are in constant communication and are cross-funded, allowing ample opportunity to identify delayed or problematic data deposits. These communications will lead to plans for situation-specific preventive strategies and remedies that can be escalated if necessary. Each Project and Core Lead has designated individuals who are responsible for working with the DMCC to ensure timely and reliable deposits. Each project will also designate personnel to handle all biospecimen sharing requests. Any and all questions or concerns about deposit dates, timelines, and general expectations will be taken up by the LC leadership, the OSMB, the DMCC, and the NIA program officials.