Dear AnVIL users,


The AnVIL team is excited to announce the release of the following studies on the AnVIL platform, as well as modifications to existing datasets on AnVIL.


If you are interested in these datasets, please submit data access requests through dbGaP. The datasets are now findable on the AnVIL Data Explorer for cohort building and on the AnVIL Data Library for dataset-level search.


For additional resources, please refer to Finding and Using AnVIL Data and https://anvilproject.org/releases.

New data releases

NHGRI GREGoR Consortium: Genomics Research to Elucidate the Genetics of Rare Disease
phsID: phs003047.v4.p3
DULs: GRU, HMB
Release notes: This fourth data release contains data from 10,683 participants in 4,366 families. Family, pedigree, and phenotype information, as well as genomic data such as short-read DNA and RNA sequencing data, are available for the majority of GREGoR participants. In this release, a subset of the short-read whole genome sequencing data was uniformly processed by the GREGoR Data Coordinating Center and used to generate a Consortium joint callset for small variants (SNVs and indels).
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Impact of Genomic Variation on Function (IGVF) Consortium
phsID: phs003472.v1.p1
DULs: HMB, GRU-PUB, Mouse
Release notes: This IGVF Consortium data release encompasses both human and mouse single-cell genome and transcriptome studies. The human component comprises 95 single-cell multiome or single-cell RNA sequencing studies featuring 89 biological samples derived from 50 human donors across 4 distinct tissue/organ types. The release includes Illumina single-cell ATAC-seq and RNA-seq data, along with downstream analyses characterizing chromatin states, the transcriptome, transcription factor activity, and mitochondrial DNA across diverse biological contexts. The mouse component consists of 309 datasets generated using Illumina and Nanopore single-cell RNA sequencing technologies, representing 16 different tissue and organ types collected from 183 mice spanning 16 distinct strains. The release also provides comprehensive cell type annotations across strains and tissues/organs based on the single-cell RNA sequencing data.
Submitter blog post: N/A
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

OurHealth - Cardiovascular Disease in South Asians
phsID: phs003821.v1.p1
DULs: GRU
Release notes: This first release includes self-reported basic demographics, anthropometric traits, cardiometabolic outcomes, and Blended Genome-Exome (BGE) sequencing data for 621 study participants.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Assessment of Complex Chromosomal Changes in De-Identified Cell Lines
phsID: phs004000.v1.p1
DULs: GRU
Release notes: The dataset includes 16 individuals, with samples derived from lymphoblastoid cell lines (LCLs) or fibroblasts. All samples have Illumina WGS data, and 8 samples also have PacBio WGS data.
Submitter blog post: Link
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

Center for Common Disease Genomics [CCDG] - Cardiovascular: Genetic and Phenotypic Determinants of Blood Pressure and Other Cardiovascular Risk Factors
phsID: phs002236.v1.p1
DULs: DS-DCCA-NPU-MDS
Release notes: This release contains 2,150 WES samples and 2,081 array samples.
Submitter blog post: N/A
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library

METSIM (METabolic Syndrome In Men) Study
phsID: phs000743.v4.p1 (parent); phs001579.v1.p1 (child)
DULs: HMB-IRB
Release notes: Please refer to the dbGaP study page for more information.
Submitter blog post: N/A
Where to apply for access: dbGaP
Link to dataset on AnVIL: Explorer, Data Library


Data modifications to existing releases

NHGRI GREGoR Consortium: Genomics Research to Elucidate the Genetics of Rare Disease
phsID: phs003047.v3.p2
DULs: GRU, HMB
Release notes: Errata: Two issues have been identified in this release. Reach out to anvil-data@broadinstitute.org for more information. AnVIL will delete the erroneous files by the second week of December.

Impact of Genomic Variation on Function (IGVF) Consortium
phsID: phs003472.v1.p1
DULs: HMB-MDS
Release notes: Errata: Two files in the current dataset were found to contain errors and have been replaced in this data release. Reach out to anvil-data@broadinstitute.org for more information.

Center for Common Disease Genomics [CCDG] - Neuropsychiatric: Epilepsy: Epi25 Consortium
phsID: phs001489.v4.p2
DULs: DS-EPSBACID-MDS-RD
Release notes: Additional data that was not included in the original release was delivered to AnVIL_CCDG_Broad_NP_Epilepsy_AUSAUS_EP_BA_CN_ID_MDS_GSA-MD.

Genomic Answers for Kids
phsID: phs002206.v4.p1
DULs: DS-PEDD-IRB
Release notes: The version 4 workspaces and snapshots listed below were deleted because this data is now consolidated in version 5 of the dataset, which already contains it (links to version 5: Explorer/Data Library).
AnVIL_CMH_GAFK_IsoSeq
AnVIL_CMH_GAFK_GS_long_read
AnVIL_CMH_GAFK_PacBio_methyl_tagged
AnVIL_CMH_GAFK_WGS
AnVIL_CMH_GAFK_ES
AnVIL_CMH_GAFK_IlluminaGSA
AnVIL_CMH_GAFK_GS-linked-read
AnVIL_CMH_GAFK_MGI
AnVIL_CMH_GAFK_10X-Genomics
AnVIL_CMH_GAFK_WGBS
AnVIL_CMH_GAFK_scRNA
AnVIL_CMH_GAFK_SCATAC


Developmental enhancements and bug fixes

Enhancement/fix: Some data submitter-provided file inventory tables did not include a file size column, so files from these datasets appeared in the Data Explorer with a null file size. This caused indexing issues and degraded the user experience in the Data Explorer. These datasets have been updated with the correct file sizes.
Datasets affected:
AnVIL_ENCORE_293T; AnVIL_ENCORE_RS293 (phs003018.v1.p1 NRES)

Enhancement/fix: A small number of datasets included files with a file size of 0 bytes. These are primarily low-value files, such as empty stderr files from workflows. These files were removed from indexing and from the Data Explorer but were kept in the datasets' original tables.
Datasets affected:
AnVIL_HPRC (NRES)
AnVIL_T2T (NRES)
AnVIL_CCDG_Broad_CVD_EOCAD_PartnersBiobank_HMB_WES (phs002018 HMB-MDS)
AnVIL_CCDG_Broad_NP_Epilepsy_USAMON_GRU_WES (phs001489.v4.p2 GRU)
AnVIL_GREGoR_R01_GRU (phs003047.v1.p1 GRU)
AnVIL_GTEx_public_data (phs000424 NRES)
AnVIL_MAS_ISO_seq (phs003200.v1.p1 DS-MSC-MDS)
AnVIL_NIMH_Broad_ConvergentNeuro_McCarroll_Eggan_CIRM_GRU_VillageData (phs002032 GRU)

Enhancement/fix: True duplicate records were identified in the file_inventory table (an AnVIL-generated table that supports indexing) for the GTEx v10 snapshot. These duplicate records have been removed.
Datasets affected:
AnVIL_GTEx_v10_hg38 (phs000424 GRU)

Enhancement/fix: Recently updated snapshots were not properly using durable DRS URIs. New snapshots that use durable DRS URIs have been issued. Note that this fix was released as a post-release patch to anvil11 (Q3 2025 release).
Datasets affected:
AnVIL_ENCORE_293T; AnVIL_ENCORE_RS293 (phs003018.v1.p1 NRES)
AnVIL_MAGE (NRES)



--
M. Kate Balaconis, PhD
Data Sciences Platform
Program Manager
Broad Institute of MIT and Harvard
105 Broadway, Cambridge, MA 02142