Key Points
Genome-wide association studies of somatic GATA1s mutations did not identify strong genetic risk factors for transient leukemia in DS.
South Asian genetic ancestry was positively associated with risk of developing GATA1s mutations in newborns with DS.
Abstract
Myeloid leukemia of Down syndrome (DS) is preceded by a transient neonatal preleukemia driven by somatic mutations in the chromosome X gene GATA1, resulting in a shorter protein isoform (GATA1s). GATA1s mutations occur at high frequency in DS, but beyond trisomy 21, risk factors for this preleukemia are unknown. We investigated whether germline genetic variation influences development of GATA1s mutations in DS. Whole-genome sequencing was performed on 434 children with DS from the Oxford DS Cohort Study previously screened for GATA1s mutations. After quality control, association tests were conducted separately for disomic autosomes, trisomic chromosome 21, and chromosome X. Regression tests were performed for mutation variant allele frequency or the binary trait (103 GATA1s-positive cases, 326 controls), adjusting for sex and ancestry-related principal components. Genetic ancestry of each participant was inferred and tested for association with GATA1s mutations. We identified 3 genome-wide significant (P < 5 × 10−8) loci associated with GATA1s mutations. However, these may be false positives because few linked variants showed evidence of association at each locus. No significant associations were detected on chromosome 21 or the GATA1 region on chromosome X. Increasing proportions of South Asian genetic ancestry were associated with an increased risk of GATA1s mutations, with each 10% increase in ancestry associated with a 1.11-fold higher risk of developing GATA1s mutations (P = .031). Our genetic epidemiology study of somatic GATA1s mutations in DS did not identify strong germ line genetic effects. The association with genetic ancestry may relate to unmeasured genetic or nongenetic effects, such as fetal exposures, and warrants further investigation.
Introduction
Down syndrome (DS), one of the most common chromosomal disorders, occurs in ∼1 in 700 live births in the United States,1 and is caused by constitutive trisomy of chromosome 21. Dysregulation of early hematopoiesis is a hallmark of DS, characterized by increased hematopoietic stem cell (HSC) frequency and abnormal erythro-megakaryopoiesis during fetal development.2 Furthermore, children with DS have a remarkably high risk of developing myeloid leukemia, in particular acute megakaryoblastic leukemia, for which there is a reported 500- to 1500-fold increased risk compared with children without DS.3,4
Myeloid leukemia in children with DS (ML-DS) is preceded by transient abnormal myelopoiesis (TAM), a transient neonatal leukemic condition characterized by increased peripheral blast cells and acquired N-terminal truncating mutations in the erythro-megakaryocytic transcription factor gene GATA1 that result in a shorter protein isoform, GATA1s.5-10 TAM presents clinically in ∼10% of newborns with DS, with features including leukocytosis, splenomegaly, hepatomegaly, and effusions, and can be fatal in severe cases.5-9 An additional 10% to 15% of neonates with DS harbor subclonal GATA1s mutations with low circulating blasts and clinically silent disease (“silent TAM”).11 Of newborns with DS with either clonal or subclonal GATA1s mutations, an estimated 16% to 23% will acquire additional mutations in childhood and develop ML-DS.9,11-13
Trisomy of chromosome 21 has genome-wide effects on DNA methylation and the regulation of gene expression,2,14-19 and is essential for the development of preleukemia in DS.20 However, most newborns with DS do not have detectable GATA1s mutations and will never develop ML-DS, suggesting that additional genetic or environmental factors may influence the acquisition and expansion of GATA1s-mutant clones. To address the question of why only a subset of DS newborns develop transient leukemia, we performed a comprehensive genetic epidemiology study of GATA1s mutations via whole-genome sequencing (WGS) plus targeted GATA1 sequencing of 434 children with DS in the Oxford Down Syndrome Cohort Study (ODSCS). We investigated the association of genome-wide genetic variation, including at the GATA1 gene on chromosome X and on the trisomic chromosome 21, as well as inferred genetic ancestry with the presence and clonal fraction of somatic GATA1s mutations.
Methods
Study participants
The study comprised 434 newborns with DS from the ODSCS, a prospective cohort that enrolled children with DS between 2006 and 2017 from 18 hospitals in the United Kingdom. Each participant was followed up from birth to 4 years of age. Clinical and hematological data from a related pilot study, the Oxford Imperial DS Cohort Study, including 172 of the ODSCS newborns, were previously published.11 Clinical and hematological data were collected by questionnaire. Race/ethnicity of participants was based on self-reports. Parents gave written informed consent in accordance with the Declaration of Helsinki, and the study was approved by the Thames Valley Research Ethics Committee (06MRE12-10; National Institute for Health and Care Research portfolio no. 6362) and by the institutional review board of the University of Southern California.
Targeted deep sequencing of the GATA1 gene
To identify somatic GATA1s mutations, we performed targeted sequencing in the 434 newborns with DS, as previously described.11,21 In brief, we conducted targeted amplification of GATA1 exons 2 and 3 along with the addition of sample barcodes and sequencing adapters. Sequencing was performed on an Illumina MiSeq as 150-base paired-end reads with a target depth of >2000× for each base within the regions of interest. We used an in-house pipeline to generate VarScan22 somatic data for mapping and variant analysis, as previously described.11,21 Mutations were classified as generating the short GATA1 protein if they introduced a premature stop codon, abolished the first methionine, or were located in the regions that regulate splicing at the boundaries between exon 2 and intron 2 or between intron 2 and exon 3. The variant allele frequency (VAF) of GATA1s mutations was manually verified and compared with in-run controls using the Integrative Genomics Viewer.23 Our minimum limit of detection of mutant GATA1 sequence was a VAF of ∼0.3% (0.003).
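As a minimal illustration of the VAF arithmetic and the reported detection limit, the following sketch computes a VAF from read counts and compares it with the ∼0.3% threshold. The function names and the read counts are hypothetical; the study itself relied on VarScan output with manual review, which this sketch does not reproduce.

```python
# Illustrative only: helper names and read counts are hypothetical.
# The 0.003 limit is the minimum detectable VAF stated in the text.
DETECTION_LIMIT = 0.003


def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def is_detectable(vaf: float, limit: float = DETECTION_LIMIT) -> bool:
    """True when the VAF is at or above the assay's detection limit."""
    return vaf >= limit


# At the target depth of 2000 reads, 0.3% corresponds to 6 mutant reads.
vaf = variant_allele_frequency(alt_reads=30, total_reads=2000)
print(vaf, is_detectable(vaf))
```

At the stated depth, a mutation supported by fewer than 6 of 2000 reads would fall under the limit, which is why very small clones may be missed.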
Germ line WGS and QC
WGS was performed at the Broad Institute as part of a funded National Institutes of Health INCLUDE (INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) and Gabriella Miller Kids First project. DNA was isolated from peripheral blood samples of 434 newborns with DS and sequenced on an Illumina NovaSeq platform at an average coverage of 38×. Genotype calling was conducted following the Genome Analysis Toolkit Best Practices workflow for germline short variant discovery (single-nucleotide variants [SNVs] and indels).24,25 Genotype calling for chromosome 21 was performed separately using the Genome Analysis Toolkit HaplotypeCaller in triploid mode.26
After obtaining the sequencing data, we performed single-nucleotide polymorphism (SNP)-level and sample-level quality control (QC) steps using VCFtools version 0.1.14 and PLINK versions 1.9 and 2.0. At the sample level, we removed samples with a kinship coefficient of >0.177, which captures duplicates and first-degree relatives, and samples whose reported sex disagreed with the sex inferred from the X chromosome inbreeding coefficient F (F < 0.25 indicating female and F > 0.8 indicating male). A total of 429 samples remained after QC filtering. At the SNP level, we excluded nonautosomal variants and variants on chromosome 21, multiallelic variants, variants with significant differential missingness between cases and controls at P < .01, variants with significant departure from Hardy-Weinberg equilibrium in controls at P < .0001, and variants with a minor allele frequency (MAF) of <0.05. A total of 6 876 568 variants remained for downstream genome-wide association analysis.
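The sample-level filters can be sketched as follows. This is an illustration under stated assumptions, not the PLINK commands used in the study: the function names are hypothetical, and `max_kinship` is assumed to be a sample's highest kinship coefficient with any other retained sample.

```python
from typing import Optional

# Thresholds taken from the text; everything else is a hypothetical sketch.
KINSHIP_MAX = 0.177   # above this: duplicate or first-degree relative
F_FEMALE_MAX = 0.25   # X chromosome F below this indicates female
F_MALE_MIN = 0.8      # X chromosome F above this indicates male


def infer_sex_from_f(f_stat: float) -> Optional[str]:
    """Infer sex from the X chromosome inbreeding coefficient F."""
    if f_stat < F_FEMALE_MAX:
        return "female"
    if f_stat > F_MALE_MIN:
        return "male"
    return None  # ambiguous: F falls between the two thresholds


def passes_sample_qc(reported_sex: str, f_stat: float, max_kinship: float) -> bool:
    """Keep a sample only if its sex is concordant and it is not closely related."""
    sex_ok = infer_sex_from_f(f_stat) == reported_sex
    unrelated = max_kinship <= KINSHIP_MAX
    return sex_ok and unrelated


print(passes_sample_qc("female", f_stat=0.02, max_kinship=0.01))
```

In practice only one member of a related pair would typically be dropped; the sketch flags every sample above the threshold and leaves that choice to the caller.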
For chromosome 21 analysis, we included 207 313 variants in our association tests. QC of chromosome X variants was conducted separately, as described hereafter.
Statistical analyses
Genome-wide association tests
We performed genome-wide association studies (GWAS) treating the GATA1s mutation phenotype as either a quantitative trait using mutation VAF, or a binary trait with 103 GATA1s-positive cases and 326 GATA1s wild-type controls. Both the linear regression model of logit-transformed VAF and the logistic regression model of mutation status were adjusted for biological sex and 10 principal components (PCs). An additive allelic effect was assumed for the 3 potential genotypes (ie, 0/0, 0/1, 1/1) on the disomic autosomal chromosomes (excluding chromosome 21). In our main analysis, we included all individuals regardless of race/ethnicity, and subsequently performed a separate analysis limited to self-reported White individuals because this was by far the largest racial/ethnic group (n = 270). We estimated statistical power for the multiancestry GWAS using the Genetic Association Study Power Calculator.27
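The quantitative-trait model described above can be sketched for a single variant as an ordinary least-squares fit of logit-transformed VAF on additive genotype dosage, sex, and PCs. The sketch makes several assumptions that are not stated in the text: VAF values are clipped by a small `eps` so that wild-type samples (VAF = 0) have a finite logit, only 1 PC is used instead of 10, and the toy data are noise free. The actual analysis was run in PLINK.

```python
import math


def logit(p, eps=1e-4):
    """Logit transform; the eps clipping is an assumption to keep VAF = 0 finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def design_row(dosage, sex, pcs):
    """Intercept, additive genotype dosage (0, 1, 2), sex, ancestry PCs."""
    return [1.0, float(dosage), float(sex)] + [float(pc) for pc in pcs]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data: (dosage, sex, PC1) per sample, with VAF generated from known effects.
samples = [(0, 0, -0.3), (1, 1, 0.1), (2, 0, 0.4), (0, 1, 0.2), (1, 0, -0.1), (2, 1, -0.2)]
true_beta = [-3.0, 1.5, 0.2, 0.5]
X = [design_row(d, s, [pc]) for d, s, pc in samples]
vaf = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(true_beta, row)))) for row in X]
beta = ols(X, [logit(v) for v in vaf])
print(round(beta[1], 6))  # per-allele effect on logit(VAF)
```

The coefficient on dosage plays the role of the β reported per SNP; a real analysis would also report its standard error and P value.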
Chromosome 21 association tests
Variants on chromosome 21 were processed separately, as described earlier. Variants with MAF of <0.01 and multiallelic variants were removed. The trisomic genotype calls 0/0/0, 0/0/1, 0/1/1, and 1/1/1 were coded as 0, 1, 2, and 3, respectively. The association test was then conducted in a similar manner, assuming an additive allelic effect and adjusting for biological sex and 10 PCs in linear/logistic models for the continuous/binary GATA1s trait in both the multiancestry and White-only groups. A chromosome 21–wide P value threshold of 2.41 × 10−7 (0.05/207 313) was used. We focused our analysis on 4 candidate genes, RUNX1, ERG, DYRK1A, and ETS2, which are known to be involved in hematopoiesis and upregulation of GATA1 in DS.28-31
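The trisomic dosage coding and the Bonferroni-style threshold can be written out as a short sketch. The function names are hypothetical, and the genotype strings are assumed to use the VCF-style `0/0/1` notation shown in the text.

```python
def trisomic_dosage(genotype: str) -> int:
    """Count alternate alleles in a triploid call such as '0/0/1' (0 to 3)."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 3 or any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"not a biallelic triploid genotype: {genotype!r}")
    return sum(int(a) for a in alleles)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after dividing alpha by the test count."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


for call in ("0/0/0", "0/0/1", "0/1/1", "1/1/1"):
    print(call, trisomic_dosage(call))

# 207 313 chromosome 21 variants were tested.
print(f"{bonferroni_threshold(0.05, 207313):.2e}")
```

The same division gives the threshold used for the GATA1 region on chromosome X when the 646 independent SNPs in that region are supplied as the test count.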
X chromosome association tests
We assessed the association between variants on chromosome X and somatic GATA1s mutations using the chromosome X–wide association study approach, as previously described.32 Briefly, we removed samples that were inferred to be related based on an identity-by-descent threshold of >0.125, which corresponds to first cousins, and samples with a missing rate of >0.1. We removed variants in the pseudoautosomal regions on chromosome X, variants with significant deviation from Hardy-Weinberg equilibrium in females, variants with missingness significantly correlated with binary GATA1s mutation/wild-type status, variants with MAF of <0.05, variants with a missing rate of >0.1, and variants with significant MAF and missingness differences between male and female controls. After these QC steps, 333 751 variants remained in our association tests. We focused, in particular, on variation at the GATA1 gene region, at chrX:48.6-49.1Mb, which encompasses 3 previously identified proximal enhancers, 3 putative distal enhancers, and the GATA1 gene itself.33,34
Genetic ancestry analysis
We inferred the genetic ancestry of each participant using RFMix35 and examined potential associations with the GATA1s mutation phenotypes. We used a reference population panel consisting of individuals from the 1000 Genomes Project,36 including individuals of European ancestry (subpopulations: Utah residents with European ancestry; British in England and Scotland; Toscani in Italia; and Iberian in Spain), individuals of African ancestry (subpopulations: Luhya in Kenya; Mende in Sierra Leone; Gambian in Western Division, The Gambia; Yoruba in Ibadan, Nigeria; and Esan in Nigeria), individuals of South Asian ancestry (subpopulations: Sri Lankan Tamil in the United Kingdom; Indian Telugu in the United Kingdom; Bengali in Bangladesh; Gujarati Indian in Houston, TX; and Punjabi in Lahore, Pakistan), and individuals of East Asian ancestry (subpopulations: Kinh in Ho Chi Minh City, Vietnam; Chinese Dai in Xishuangbanna, China; Southern Han Chinese, China; Han Chinese in Beijing, China; and Japanese in Tokyo, Japan). We removed related individuals from the reference samples, and randomly discarded samples until all reference ancestries included the same number of individuals (n = 291, corresponding to the number of European ancestry individuals after removing relatives). We summed the local ancestry estimates across autosomes (excluding chromosome 21) to obtain the global estimate of the proportion of European, African, East Asian, and South Asian ancestries. We tested for association between the estimated ancestry proportions and GATA1s mutations using linear (for VAF) or logistic (for binary) models, adjusting for biological sex, gestational age, and birth weight.
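The step from local to global ancestry can be illustrated with a small sketch. It assumes, hypothetically, that local ancestry is available as a list of `(chromosome, tract_length, ancestry_label)` tuples for one participant and that tracts are weighted by their length; the text does not specify the weighting, and this is not RFMix's own output format.

```python
from collections import defaultdict


def global_ancestry(segments, excluded_chroms=("21",)):
    """Length-weighted ancestry proportions from local ancestry tracts.

    segments: iterable of (chromosome, tract_length, ancestry_label).
    Tracts on excluded chromosomes (chromosome 21 here) are ignored.
    """
    totals = defaultdict(float)
    for chrom, length, ancestry in segments:
        if chrom in excluded_chroms:
            continue
        totals[ancestry] += length
    genome_length = sum(totals.values())
    if genome_length == 0:
        raise ValueError("no tracts left after exclusions")
    return {ancestry: total / genome_length for ancestry, total in totals.items()}


# Toy tracts for one participant (arbitrary length units).
toy_segments = [
    ("1", 60.0, "EUR"),
    ("1", 40.0, "SAS"),
    ("2", 100.0, "EUR"),
    ("21", 50.0, "AFR"),  # excluded: trisomic chromosome
]
proportions = global_ancestry(toy_segments)
print(proportions)
```

The resulting proportions sum to 1 for each participant and would serve as the predictor in the ancestry association models.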
Gene-environment interaction analysis
We conducted an exploratory genome-wide gene-environment interaction analysis to assess whether birth-related variables, specifically gestational age and birth weight, modify the genetic influence on somatic GATA1s mutations. The logit-transformed GATA1s mutation VAF was used as the outcome, and linear regression models were constructed to include both main effects and the interaction terms between genotype and birth-related variables, adjusting for biological sex and 10 PCs. Analyses were performed in all participants regardless of race/ethnicity to maximize statistical power.
Results
Among the 429 newborns with DS in the ODSCS with WGS data that passed QC, there were 103 GATA1s mutation–positive cases and 326 GATA1 wild-type controls. The VAF of somatic GATA1s mutations ranged from 0% to 89%, and 50 of 103 newborns with GATA1s mutations were diagnosed with clinical TAM (supplemental Tables 1 and 2). Of the 429 newborns, 213 were designated male at birth, and most were self-reported to be White (n = 270), followed by 59 South Asian, 48 Black, 11 Arabic, 4 East Asian, 1 Latino, and 36 mixed or unknown race/ethnicity (Figure 1; supplemental Table 2).
Figure 1. Principal component analysis (PCA) plot of multiancestry participants in the ODSCS. PCA plots were generated for study participants along with individuals from the 1000 Genomes Project reference populations (gray). Color labels correspond to self-reported race/ethnicity of study participants. PCA plots were generated separately for newborns with GATA1s mutations (left, n = 103) and GATA1 wild-type controls (right, n = 326). PCs were calculated using PLINK, and plots were generated using R.
GWAS of somatic GATA1s mutations in newborns with DS
We first conducted a multiancestry GWAS in the full set of 429 newborns to identify common autosomal variants associated with GATA1s mutation VAF, excluding the trisomic genotypes on chromosome 21. We identified 1 SNP, rs115118904, located at chromosome 2p16.3, that achieved genome-wide significance (P < 5 × 10−8) in the overall study (Figure 2A; Table 1; supplemental Figure 1). The variant rs115118904 (β = 3.26; P = 3.22 × 10−8) is located within the gene LHCGR, which encodes the luteinizing hormone/choriogonadotropin receptor.37 We also identified a region with suggestive evidence of association with GATA1s mutation VAF at chromosome 15q21.3 (Figure 2A; supplemental Figure 1). The lead SNP of this region is rs2584234 (β = 1.64; P = 2.32 × 10−7), which is in strong linkage disequilibrium with reported GWAS loci associated with lung function38-40 but is of unclear relevance to TAM/ML-DS.
Figure 2. GWAS results for GATA1s mutation VAF in the ODSCS. Manhattan plot (A) and quantile-quantile plot (B) of GWAS results for GATA1s mutation VAF in the full multiancestry cohort (N = 429; top). Manhattan plot (C) and quantile-quantile plot (D) of GWAS results for GATA1s mutation VAF in self-reported White participants only (n = 270; bottom). Red horizontal line indicates genome-wide significance threshold (P = 5 × 10−8). Blue horizontal line indicates suggestive significance threshold (P = 1 × 10−5).
Table 1. Genome-wide significant loci associated with GATA1s mutation VAF in newborns with DS
SNP | Chr: position | Gene∗ | Ref | Alt | MAF† | Multiancestry (N = 429), VAF: β‡ (95% CI) | Multiancestry (N = 429), VAF: P value | Multiancestry (N = 429), mutation status: OR (95% CI) | Multiancestry (N = 429), mutation status: P value | White-only (n = 270), VAF: β‡ (95% CI) | White-only (n = 270), VAF: P value | White-only (n = 270), mutation status: OR (95% CI) | White-only (n = 270), mutation status: P value
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Significant hit (P < 5 × 10−8) from multiancestry analysis | | | | | | | | | | | | |
rs115118904 | 2: 48 707 683 | LHCGR | T | C | 0.056/0.057 | 3.26 (2.13-4.40) | 3.22 × 10−8 | 4.69 (2.50-8.76) | 1.26 × 10−6 | 2.96 (1.59-4.32) | 3.14 × 10−5 | 3.93 (1.83-8.43) | 4.40 × 10−4
Significant hits (P < 5 × 10−8) from White-only analysis | | | | | | | | | | | | |
rs66519478 | 1: 218 903 556 | LYPLAL1 | G | A | 0.17/0.13 | 1.82 (1.06-2.57) | 3.39 × 10−6 | 2.68 (1.76-4.09) | 4.77 × 10−6 | 2.85 (1.90-3.80) | 1.25 × 10−8 | 4.57 (2.54-8.24) | 4.19 × 10−7
rs7528768 | 1: 218 905 645 | LYPLAL1 | C | T | 0.17/0.13 | 1.74 (0.99-2.49) | 7.89 × 10−6 | 2.58 (1.69-3.93) | 1.01 × 10−5 | 2.85 (1.90-3.80) | 1.25 × 10−8 | 4.57 (2.53-8.24) | 4.19 × 10−7
rs1972029 | 1: 218 909 006 | LYPLAL1 | G | A | 0.17/0.13 | 1.68 (0.91-2.44) | 2.11 × 10−5 | 2.48 (1.62-3.79) | 2.76 × 10−5 | 2.77 (1.80-3.72) | 3.84 × 10−8 | 4.26 (2.38-7.63) | 1.09 × 10−6
rs12647008 | 4: 56 189 001 | CRACD | A | T | 0.10/0.041 | 1.21 (0.14-2.26) | .025 | 2.03 (1.14-3.64) | .017 | 5.83 (3.96-7.69) | 3.28 × 10−9 | 24.00 (5.90-97.6) | 8.94 × 10−6
Genome-wide significant loci (P < 5 × 10−8) highlighted in bold.
95% CI, 95% confidence interval; Alt, alternative allele; Chr, chromosome; MAF, minor allele frequency in ODSCS; OR, odds ratio; Ref, reference allele.
∗Corresponds to the nearest gene.
†Multiancestry/White-only.
‡β estimate corresponds to each 1-unit increase in the logit-transformed VAF.
Next, we restricted our GWAS analysis to the 270 self-reported White participants and identified 4 additional genome-wide significant SNP associations at 1q41 and 4q12 (Figure 2C; Table 1; supplemental Figure 1). Variants rs66519478 (β = 2.85; P = 1.25 × 10−8), rs7528768 (β = 2.85; P = 1.25 × 10−8), and rs1972029 (β = 2.77; P = 3.84 × 10−8) are located adjacent to LYPLAL1, which may be involved in protein depalmitoylation,41 whereas variant rs12647008 (β = 5.83; P = 3.28 × 10−9) is located within CRACD, which is involved in F-actin polymerization.42 The variants near LYPLAL1 showed suggestive evidence of association in the multiancestry GWAS, with P values < 5.0 × 10−5, whereas the CRACD variant association appears specific to White individuals (Table 1).
In addition to analyzing GATA1s mutation VAF, we conducted GWAS considering the GATA1s mutation phenotype as a binary trait. Although we found no genome-wide significant SNP associations (supplemental Figure 2), the 5 significant variants for GATA1s VAF showed evidence of association with presence/absence of GATA1s mutation, albeit not reaching genome-wide significance (Table 1). In both the continuous and binary analyses, quantile-quantile (QQ) plots and genomic inflation factors indicated agreement between the expected and observed P value distributions (Figure 2B,D), indicating that our analyses were robust to potential confounding by population stratification.
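For readers unfamiliar with the genomic inflation factor, the sketch below shows one common way to compute it: convert each P value to a 1-degree-of-freedom chi-square statistic and divide the median by the expected median under the null (about 0.4549). This is a generic illustration with a hypothetical function name, not the authors' code, and the P values are synthetic.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from a list of P values in (0, 1]."""
    normal = NormalDist()
    # For a two-sided test, chi-square (1 df) = z**2 with z = Phi^-1(p / 2).
    chi2 = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic P values spread evenly over (0, 1), as expected under the null.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))
```

A value close to 1 indicates that the bulk of the test statistics follows the null distribution; values well above 1 would suggest residual stratification or other systematic inflation.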
Association tests on the trisomic chromosome 21
We conducted separate association tests for variants on chromosome 21 because of the trisomic genotypes. No individual variant achieved genome-wide or chromosome 21–wide significance (supplemental Figure 3). The variant with the smallest P value, rs73907214 (βWhite-only = 2.00; PWhite-only = 1.22 × 10−6), is located in the region 21q21.2, adjacent to the genes SLX9, ADARB1, and ITGB2. We did not identify any statistically significant associations in the leukemia-associated candidate genes RUNX1, ERG, ETS2, or DYRK1A.
Chromosome X–wide association study
Separate analyses were also conducted for variants on chromosome X, in particular focusing on the GATA1 gene itself and nearby enhancer regions located at Xp11.23. We did not identify statistically significant associations in the GATA1 gene region at a reduced P value threshold of .05/646 = 7.74 × 10−5, which adjusts for the number of independent SNPs in this region (n = 646), nor did any variant outside the GATA1 region achieve genome-wide significance (supplemental Figure 4).
Genetic ancestry associations with GATA1s mutations
We estimated genetic ancestry proportions for each individual in our multiancestry cohort and tested the association with somatic GATA1s mutations (supplemental Table 3; supplemental Figures 5 and 6). We found that the estimated proportion of global South Asian ancestry was significantly positively associated with GATA1s mutation status and VAF, with each 10% increase in South Asian ancestry associated with a 1.11-fold increased risk of developing a GATA1s mutation and an ∼1.21-fold increase in mutation VAF (Table 2). Interestingly, within the South Asian ancestry group, genetic ancestry related to the subpopulations of Indian Telugu in the United Kingdom; Bengali in Bangladesh; and Punjabi in Lahore, Pakistan, was significantly positively associated with GATA1s mutation status and VAF (Table 2).
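Because the ancestry estimates are reported per 10% increment for superpopulations and per 1% increment for subpopulations, it can help to see how an odds ratio is rescaled between increments. The sketch assumes the log-linear relationship implied by a logistic model; the function name is hypothetical, and the only value taken from the text is the South Asian odds ratio of 1.107 per 10%.

```python
import math


def rescale_odds_ratio(odds_ratio: float, from_step: float, to_step: float) -> float:
    """Convert an OR per `from_step` change in a predictor to an OR per `to_step`."""
    if odds_ratio <= 0 or from_step == 0:
        raise ValueError("odds_ratio must be positive and from_step nonzero")
    beta_per_unit = math.log(odds_ratio) / from_step  # log-odds per 1 unit
    return math.exp(beta_per_unit * to_step)


or_per_10pct = 1.107  # South Asian ancestry, GATA1s mutation status
or_per_1pct = rescale_odds_ratio(or_per_10pct, from_step=10, to_step=1)
or_per_50pct = rescale_odds_ratio(or_per_10pct, from_step=10, to_step=50)
print(round(or_per_1pct, 4), round(or_per_50pct, 3))
```

Extrapolating to large increments, as in the 50% example, assumes the per-unit effect stays constant across the whole ancestry range, which the study did not test.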
Table 2. Association between inferred genetic ancestry and GATA1s mutations in newborns with DS
Ancestry superpopulations | GATA1s mutation VAF: OR (95% CI)∗ | GATA1s mutation VAF: P value | GATA1s mutation status (yes/no): OR (95% CI)∗ | GATA1s mutation status (yes/no): P value
---|---|---|---|---
AFR | 0.906 (0.770-1.065) | .23 | 0.928 (0.834-1.032) | .17
EAS | 0.817 (0.409-1.629) | .56 | 0.863 (0.521-1.430) | .58
EUR | 0.968 (0.854-1.098) | .61 | 0.985 (0.917-1.059) | .69
SAS | 1.211 (1.018-1.441) | .032 | 1.107 (1.010-1.214) | .031
South Asian ancestry subpopulations | OR (95% CI)† | P value | OR (95% CI)† | P value
STU | 1.291 (0.986-1.690) | .064 | 1.147 (0.995-1.323) | .059
ITU | 2.098 (1.166-3.774) | .014 | 1.517 (1.106-2.081) | .0097
BEB | 1.377 (1.037-1.827) | .027 | 1.185 (1.021-1.377) | .026
GIH | 1.215 (0.999-1.477) | .052 | 1.114 (1.004-1.235) | .041
PJL | 1.380 (1.077-1.768) | .011 | 1.193 (1.046-1.361) | .0087
Significant associations (P < .05) are highlighted in bold.
Ancestry superpopulations: AFR, African; EAS, East Asian; EUR, European; SAS, South Asian.
South Asian subpopulations: BEB, Bengali in Bangladesh; GIH, Gujarati Indian in Houston, TX; ITU, Indian Telugu in the United Kingdom; PJL, Punjabi in Lahore, Pakistan; STU, Sri Lankan Tamil in the United Kingdom.
OR, odds ratio.
P values for GATA1s mutation VAF or mutation status calculated by linear or logistic regression tests, adjusting for sex, gestational age, and birth weight.
∗OR and 95% CIs calculated for each 10% increase in ancestry proportions.
†OR and 95% CIs calculated for each 1% increase in ancestry proportions.
Gene-environment interaction analysis
We conducted a genome-wide gene-environment interaction analysis to explore whether gestational age or birth weight may modify the genetic influence on somatic GATA1s mutations. We identified 1 region on chromosome 18q12.3 that demonstrated statistically significant interaction with gestational age (supplemental Figure 7). The lead SNP at this region, rs11082436, exhibited an interaction coefficient of 2.15 (P = 7.07 × 10−9), indicating that for each 1-week increase in gestational age, the influence of the SNP on logit-transformed GATA1s mutation VAF increased by 2.15 units. We did not identify any significant interactions with birth weight (supplemental Figure 7).
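The interpretation of the interaction coefficient can be made concrete with a small sketch of the model's design row and the implied per-allele effect at a given gestational age. The interaction coefficient of 2.15 comes from the text; the SNP main effect is not reported, so the value used here is purely hypothetical, as are the function names and the choice not to center gestational age.

```python
def interaction_design_row(dosage, gest_age_weeks, sex, pcs):
    """Intercept, SNP, gestational age, SNP x gestational age, sex, PCs."""
    return [1.0, float(dosage), float(gest_age_weeks),
            float(dosage) * float(gest_age_weeks), float(sex),
            *[float(pc) for pc in pcs]]


def snp_effect_at(gest_age_weeks, beta_snp, beta_interaction):
    """Per-allele effect on logit(VAF) at a given gestational age."""
    return beta_snp + beta_interaction * gest_age_weeks


BETA_INTERACTION = 2.15      # reported for rs11082436
BETA_SNP_HYPOTHETICAL = -80  # main effect is NOT reported; placeholder only

effect_38 = snp_effect_at(38, BETA_SNP_HYPOTHETICAL, BETA_INTERACTION)
effect_39 = snp_effect_at(39, BETA_SNP_HYPOTHETICAL, BETA_INTERACTION)
print(round(effect_39 - effect_38, 2))  # change in SNP effect per extra week
```

Whatever the main effect is, the difference between the per-allele effects at two gestational ages 1 week apart equals the interaction coefficient, which is the quantity the text interprets.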
Discussion
The risk of developing leukemia is markedly higher in children with DS compared with children without DS.3 Although trisomy 21 increases the risk, still only a small proportion of children with DS will develop leukemia, suggesting that additional modifying risk factors may exist. In this study, we aimed to understand whether germ line genetic variation contributes to the risk of developing GATA1s mutations, the first step toward ML-DS. We conducted association studies of genome-wide genetic variants, including separate analyses on the trisomic chromosome 21 and on chromosome X, for somatic GATA1s mutations in newborns with DS in the ODSCS. Altogether, we did not find evidence of a strong contribution of heritable genetic variation to the development of TAM but did identify an intriguing association with South Asian ancestry in our multiancestry cohort.
Although our GWAS identified 3 genome-wide significant loci across the multiancestry and White-only analyses, the lack of convincing Manhattan peaks and variants in LD displaying significance suggests that these may be spurious associations. Although our relatively small sample size may have prohibited the discovery of genetic variants with small to moderate effects on GATA1s mutation development, our study was well powered (>80%) to discover variants with odds ratios in the range of 2.7 to 2.8 for SNPs with risk allele frequencies of 20% to 30%. These are relatively large effect sizes but are consistent with those reported for variants that have previously been associated with acute lymphoblastic leukemia in individuals with DS and in the euploid population.43,44 The lack of detection of strong genetic effects, either for variants associated with the presence (binary) or with the clonal frequency of GATA1s mutations, suggests that the variation in GATA1s mutations seen in newborns with DS may largely be driven by a combination of trisomy 21 and stochastic processes. The effects of trisomy 21 on hematopoiesis, including a bias of HSCs toward the erythroid and megakaryocytic lineages,2,18,19 have been associated with GATA1 overexpression as well as greater gene body and promoter region chromatin accessibility,19 which may increase the likelihood of GATA1s mutations developing.45 We acknowledge that our targeted sequencing approach, with a sensitivity of ∼0.3% VAF, may potentially have missed variants with very low clonal frequency. Although this could have led to some misclassification of case-control status in our binary trait analysis, it is unlikely to have affected results for our GWAS of GATA1s mutation VAF. Targeted sequencing at an even greater depth of coverage is required to better understand the prevalence of acquired GATA1s mutations among newborns with DS.
Our observation of a significant association between higher proportions of inferred South Asian genetic ancestry and GATA1s mutations may point to the presence of genetic risk factors that were not detectable in this study because of small sample size, which also limited our ability to perform population-stratified analyses. Genetic ancestry, however, may also serve as a proxy for unmeasured environmental effects,46,47 and it is possible that South Asian ancestry may correlate with in utero exposures that could influence the development of GATA1s mutations. Potential exposures of relevance include gestational diabetes and maternal obesity, which are known to be more prevalent in South Asian women48 and both of which have been associated with chronic fetal hypoxia.49 We note that observations in individuals with DS and in the euploid population converge on a potential role of hypoxia and oxidative stress in the etiology of GATA1s mutations. Biomarkers of oxidative stress and hypoxia have been found to be higher across the lifespan in individuals with DS,50,51 and, recently, oxidative stress was shown to be increased in trisomy 21 fetal liver HSCs.19 Furthermore, the spectrum of GATA1s mutations has been linked to signatures of oxidative stress.52 It is also intriguing to note that overexpression of GATA1 has been reported in euploid individuals who experience hypoxia at high altitudes,53 and in cells undergoing hypoxic conditions.54 Taken together, exogenous factors that influence hypoxic conditions and oxidative stress during fetal development may affect the development of GATA1s mutations in DS. Our exploratory gene-environment interaction analysis suggests that certain early-life or birth-related variables, such as gestational age, may interact with genetic variation to influence risk of GATA1s mutations. Further investigation into the role of both genetic and nongenetic factors is warranted.
GATA1s mutations in newborns with DS have been studied in various regions globally,55-58 yet data on their prevalence across populations and in relation to environmental exposures remain scarce. Comprehensive, cross-population epidemiological studies are warranted to elucidate the prevalence of GATA1s mutations in newborns with DS with varying genetic ancestries and to identify the in utero exposures that may underlie this variation.
Acknowledgments
The authors thank the families who participated in this study. The authors acknowledge the University of Southern California’s Center for Advanced Research Computing for providing computing resources (https://www.carc.usc.edu/).
Data used in this study were generated with support from a National Institutes of Health, INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE grant X01HD107380, and a Blood Cancer UK program grant (Bloodwise 13001 [P.V. and I.R.]). N.E. was supported by Cancer Research UK grant DRCPGM\100058. A.J.d.S. is a scholar of the Leukemia & Lymphoma Society.
The content and conclusions from this work do not reflect the official views of the sponsors.
Authorship
Contribution: A.J.d.S., I.R., and P.V. conceived and designed this study; Y.L. and N.E. analyzed the data; N.E. and P.L. performed experiments; Y.L. and A.J.d.S. prepared the manuscript; and all authors edited and approved the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Adam J. de Smith, University of Southern California, USC Norris Research Tower, NRT-1509H, Biggy St, Los Angeles, CA 90033; email: desmith@usc.edu.
References
Author notes
The whole-genome sequencing data are available at the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/; accession number phs002982.v1.p1) and at the National Institutes of Health INCLUDE (INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) data coordinating center (DCC).
The full-text version of this article contains a data supplement.