Dear AnVIL users,


The AnVIL team is excited to announce the release of the studies listed below on the AnVIL platform. If you are interested in these datasets, please submit data access requests through dbGaP. The datasets are now findable on the AnVIL Data Explorer for cohort building and on the AnVIL Data Library for dataset-level search.


For additional resources on how to find datasets on AnVIL, please refer to the AnVIL Data Explorer Guide or the DUOS Data Library user guide.


Study Name: Common Fund (CF) Genotype-Tissue Expression Project (GTEx)
phsID: phs000424.v10.p2
DULs: GRU
Release Notes: Sample and subject annotation files were added.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Data Workspace

Study Name: NHGRI GREGoR Consortium: Genomics Research to Elucidate the Genetics of Rare Disease
phsID: phs003047.v3.p2
DULs: GRU, HMB
Release Notes: The new data release consists of 8,840 participants and more than 3,000 families. Included in the release are short-read whole exomes and genomes, long-read whole genomes, and RNA-seq files.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: Impact of Genomic Variation on Function (IGVF) Consortium
phsID: phs003472.v1.p1
DULs: HMB-MDS
Release Notes: This first release contains data from seven participants and includes assay data such as single-cell ATAC-seq, single-cell RNA sequencing, and SHARE-seq data.
Submitter blog post: N/A
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: Genomic Answers for Kids
phsID: phs002206.v5.p1
DULs: DS-PEDD-IRB
Release Notes: The new data release includes over 2,000 long-read genome sequences and 12,000 short-read genome and exome analyses; nearly 400 snapshots of patient transcriptomes and epigenomes in individual cells using single-cell RNA (scRNA) and single-cell open chromatin (scATAC) assays; over 3,000 bulk whole-genome bisulphite sequences for methylome interpretation; and over 200 functional assessments in available patient tissues using full-length cDNA sequences by IsoSeq (PacBio) methodology. This release also consolidates data from releases 4 and 5 into a single dataset for exporting.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: Center for Common Disease Genomics [CCDG] - Neuropsychiatric: Epilepsy: Epi25 Consortium
phsID: phs001489.v4.p2
DULs: 32 consent codes. For a full list, please see the dbGaP study page.
Release Notes: This new data release includes whole-genome genotype data on over 30,000 Epi25 participants, generated using Illumina’s Infinium GSA-MD v1 platform. Additionally, detailed clinical phenotypes related to epilepsy diagnosis are now available for both the GSA data and the whole exome sequencing (WES) data previously released in v3.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: CARD Consortium: North American Brain Expression Consortium
phsID: phs001300.v5.p1 (parent), phs003181.v2.p1 (child)
DULs: GRU
Release Notes: This new release includes 206 samples with haplotype-resolved assemblies, structural and small variant calls, as well as methylation calls for neurologically ‘normal’ prefrontal cortex ( and cortex ) brain tissue samples.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: CARD Consortium: Gene Expression in Postmortem DLPFC and Hippocampus from Schizophrenia and Mood Disorders
phsID: phs000979.v4.p2
DULs: GRU
Release Notes: This new release includes 155 samples with haplotype-resolved assemblies, structural and small variant calls, as well as methylation calls for neurologically ‘normal’ prefrontal cortex ( and cortex ) brain tissue samples.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: PAGE: The Charles Bronfman Institute for Personalized Medicine (IPM) BioMe Biobank
phsID: phs000925.v1.p1
DULs: GRU
Release Notes: Please see dbGaP for more information.
Submitter blog post: N/A
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: PAGE: Multi-Ethnic Cohort Study
phsID: phs000220.v2.p2
DULs: GRU
Release Notes: Please see dbGaP for more information.
Submitter blog post: N/A
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Study Name: PAGE: Global Reference Panel
phsID: phs001033.v1.p1
DULs: GRU
Release Notes: Please see dbGaP for more information.
Submitter blog post: N/A
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library


In addition to the data releases above, the following development enhancements have been made to AnVIL data:

  • Inconsistencies in snapshot naming conventions that were causing issues with indexing for the AnVIL Data Explorer have been resolved. 

  • MD5s in file metadata are now consistently Base64-encoded, matching what is provided in the GCS metadata.

  • Values that were causing issues importing data from DUOS or the AnVIL Data Explorer into Workspaces have been corrected.

  • Double pipes that were causing issues with indexing certain datasets for the AnVIL Data Explorer have been resolved.
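
For users who verify file integrity against the MD5 values mentioned above, here is a minimal Python sketch of how a locally computed MD5 can be put into Base64 form for comparison. It assumes the metadata value is the Base64 encoding of the raw 16-byte MD5 digest, which is how GCS reports an object's MD5. The function names are illustrative only and are not part of any AnVIL tool.

```python
import base64
import binascii
import hashlib


def md5_hex_to_base64(hex_digest: str) -> str:
    """Convert a hex MD5 string (e.g. from md5sum) to its Base64 form."""
    raw = binascii.unhexlify(hex_digest.strip())
    if len(raw) != 16:
        raise ValueError("an MD5 digest must be exactly 16 bytes")
    return base64.b64encode(raw).decode("ascii")


def md5_base64(data: bytes) -> str:
    """Compute the MD5 of in-memory bytes and return it Base64-encoded."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def matches_metadata(local_hex_md5: str, metadata_md5_b64: str) -> bool:
    """Compare a local hex MD5 with a Base64 MD5 taken from file metadata."""
    return md5_hex_to_base64(local_hex_md5) == metadata_md5_b64.strip()
```

For example, a checksum produced locally with md5sum can be passed to matches_metadata together with the Base64 value shown in the file metadata; a result of True means the two digests agree.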


Thank you,
The AnVIL Team
--
M. Kate Balaconis, PhD
Data Sciences Platform
Program Manager
Broad Institute of MIT and Harvard
105 Broadway, Cambridge, MA 02142