• Genome-wide association studies of somatic GATA1s mutations did not identify strong genetic risk factors for transient leukemia in DS.

• South Asian genetic ancestry was positively associated with risk of developing GATA1s mutations in newborns with DS.

Abstract

Myeloid leukemia of Down syndrome (DS) is preceded by a transient neonatal preleukemia driven by somatic mutations in the chromosome X gene GATA1, resulting in a shorter protein isoform (GATA1s). GATA1s mutations occur at high frequency in DS, but beyond trisomy 21, risk factors for this preleukemia are unknown. We investigated whether germline genetic variation influences development of GATA1s mutations in DS. Whole-genome sequencing was performed on 434 children with DS from the Oxford DS Cohort Study previously screened for GATA1s mutations. After quality control, association tests were conducted separately for disomic autosomes, trisomic chromosome 21, and chromosome X. Regression tests were performed for mutation variant allele frequency or the binary trait (103 GATA1s-positive cases, 326 controls), adjusting for sex and ancestry-related principal components. Genetic ancestry of each participant was inferred and tested for association with GATA1s mutations. We identified 3 genome-wide significant (P < 5 × 10−8) loci associated with GATA1s mutations. However, these may be false positives because few linked variants showed evidence of association at each locus. No significant associations were detected on chromosome 21 or the GATA1 region on chromosome X. Increasing proportions of South Asian genetic ancestry were associated with an increased risk of GATA1s mutations, with each 10% increase in ancestry associated with a 1.11-fold higher risk of developing GATA1s mutations (P = .031). Our genetic epidemiology study of somatic GATA1s mutations in DS did not identify strong germ line genetic effects. The association with genetic ancestry may relate to unmeasured genetic or nongenetic effects, such as fetal exposures, and warrants further investigation.

Down syndrome (DS), one of the most common chromosomal disorders, occurs in ∼1 in 700 live births in the United States,1 and is caused by constitutive trisomy of chromosome 21. Dysregulation of early hematopoiesis is a hallmark of DS, characterized by increased hematopoietic stem cell (HSC) frequency and abnormal erythro-megakaryopoiesis during fetal development.2 Furthermore, children with DS have a remarkably high risk of developing myeloid leukemia, in particular acute megakaryoblastic leukemia, for which a 500- to 1500-fold increased risk has been reported compared with children without DS.3,4

Myeloid leukemia in children with DS (ML-DS) is preceded by transient abnormal myelopoiesis (TAM), a transient neonatal leukemic condition characterized by increased peripheral blast cells and acquired N-terminal truncating mutations in the erythro-megakaryocytic transcription factor gene GATA1 that result in a shorter protein isoform, GATA1s.5-10 TAM presents clinically in ∼10% of newborns with DS, with features including leukocytosis, splenomegaly, hepatomegaly, and effusions, and can be fatal in severe cases.5-9 An additional 10% to 15% of neonates with DS harbor subclonal GATA1s mutations with low circulating blasts and clinically silent disease (“silent TAM”).11 Of newborns with DS with either clonal or subclonal GATA1s mutations, an estimated 16% to 23% will acquire additional mutations in childhood and develop ML-DS.9,11-13 

Trisomy of chromosome 21 has genome-wide effects on DNA methylation and the regulation of gene expression,2,14-19 and is essential for the development of preleukemia in DS.20 However, most newborns with DS do not have detectable GATA1s mutations and will never develop ML-DS, suggesting that additional genetic or environmental factors may influence the acquisition and expansion of GATA1s-mutant clones. To address why only a subset of newborns with DS develop transient leukemia, we performed a comprehensive genetic epidemiology study of GATA1s mutations via whole-genome sequencing (WGS) plus targeted GATA1 sequencing of 434 children with DS in the Oxford Down Syndrome Cohort Study (ODSCS). We investigated the association of genome-wide genetic variation (including at the GATA1 gene on chromosome X and on the trisomic chromosome 21), as well as of inferred genetic ancestry, with the presence and clonal fraction of somatic GATA1s mutations.

Study participants

The study comprised 434 newborns with DS from the ODSCS, a prospective cohort that enrolled children with DS between 2006 and 2017 from 18 hospitals in the United Kingdom. Each participant was followed up from birth to 4 years of age. Clinical and hematological data from a related pilot study, the Oxford Imperial DS Cohort Study, which included 172 of the ODSCS newborns, were previously published.11 Clinical and hematological data were collected by questionnaire. Race/ethnicity of participants was based on self-reports. Parents gave written informed consent in accordance with the Declaration of Helsinki, and the study was approved by the Thames Valley Research Ethics Committee (06MRE12-10; National Institute for Health and Care Research portfolio no. 6362) and by the institutional review board of the University of Southern California.

Targeted deep sequencing of the GATA1 gene

To identify somatic GATA1s mutations, we performed targeted sequencing in the 434 newborns with DS, as previously described.11,21 In brief, we amplified GATA1 exons 2 and 3 and added sample barcodes and sequencing adapters. Sequencing was performed on an Illumina MiSeq as 150-base paired-end reads, with a target depth of >2000× for each base within the regions of interest. We used an in-house pipeline for read mapping and somatic variant analysis with VarScan,22 as previously described.11,21 Mutations were classified as generating the short GATA1 protein isoform if they introduced a premature stop codon, caused loss of the first methionine, or were located in the regions that regulate splicing at the exon 2/intron 2 or intron 2/exon 3 boundaries. The variant allele frequency (VAF) of GATA1s mutations was manually verified and compared with in-run controls using the Integrative Genomics Viewer.23 Our limit of detection for mutant GATA1 sequence was a VAF of ∼0.3% (0.003).
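The classification and detection rules above can be sketched in Python as follows. This is a minimal illustration, not the study pipeline (which used VarScan with manual review in the Integrative Genomics Viewer): the Sequence Ontology-style consequence labels, the `Gata1Variant` class, and the `is_gata1s_call` function are hypothetical, and treating frameshifts as premature-stop variants is an assumption. VAF is computed as alternate reads divided by total reads at the site and compared against the ∼0.3% limit of detection.

```python
from dataclasses import dataclass

# Hypothetical consequence labels standing in for the three rules in the text.
GATA1S_CONSEQUENCES = {
    "stop_gained",              # premature stop codon
    "frameshift_variant",       # assumed: frameshift creating a premature stop
    "start_lost",               # loss of the first methionine
    "splice_donor_variant",     # exon 2 / intron 2 boundary
    "splice_acceptor_variant",  # intron 2 / exon 3 boundary
}

# Approximate limit of detection reported for the targeted assay (~0.3% VAF).
LIMIT_OF_DETECTION = 0.003


@dataclass
class Gata1Variant:
    consequence: str
    alt_reads: int
    depth: int

    @property
    def vaf(self) -> float:
        """Variant allele frequency = alternate reads / total reads."""
        return self.alt_reads / self.depth if self.depth > 0 else 0.0


def is_gata1s_call(v: Gata1Variant) -> bool:
    """True if the variant matches a GATA1s rule and is at or above the LoD."""
    return v.consequence in GATA1S_CONSEQUENCES and v.vaf >= LIMIT_OF_DETECTION


calls = [
    Gata1Variant("stop_gained", alt_reads=30, depth=3000),       # VAF 1.0%
    Gata1Variant("start_lost", alt_reads=5, depth=3000),         # VAF ~0.17%, below LoD
    Gata1Variant("missense_variant", alt_reads=300, depth=3000),  # not a GATA1s rule
]
for v in calls:
    print(v.consequence, round(v.vaf, 4), is_gata1s_call(v))
```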

Germ line WGS and QC

WGS was performed at the Broad Institute as part of a National Institutes of Health-funded INCLUDE (INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) and Gabriella Miller Kids First project. DNA was isolated from peripheral blood samples of 434 newborns with DS and sequenced on an Illumina NovaSeq platform at an average coverage of 38×. Genotype calling followed the Genome Analysis Toolkit Best Practices workflow for germline short variant discovery (single nucleotide variants [SNVs] and indels).24,25 Genotype calling for chromosome 21 was performed separately with the Genome Analysis Toolkit HaplotypeCaller in triploid mode.26

After obtaining the sequencing data, we performed single-nucleotide polymorphism (SNP)-level and sample-level quality control (QC) using VCFtools version 0.1.14 and PLINK versions 1.9 and 2.0. At the sample level, we removed samples with a kinship coefficient of >0.177 (duplicates or first-degree relatives) and samples with sex discrepancies, identified using the X chromosome inbreeding coefficient F (<0.25 indicating female and >0.8 indicating male). A total of 429 samples remained after QC filtering. At the SNP level, we excluded nonautosomal variants, variants on chromosome 21, multiallelic variants, variants with significant differential missingness between cases and controls (P < .01), variants with significant departure from Hardy-Weinberg equilibrium in controls (P < .0001), and variants with a minor allele frequency (MAF) of <0.05. A total of 6 876 568 variants remained for downstream genome-wide association analysis.
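The QC thresholds listed above can be expressed as simple filters. The sketch below is hypothetical (function names and input fields are ours, and chromosome names are assumed to lack a `chr` prefix); in the study these steps were run with VCFtools and PLINK. The kinship cutoff of 0.177 is approximately 2^−2.5, the conventional boundary between first- and second-degree relatives for KING-style kinship coefficients.

```python
# Thresholds from the text; function names and inputs are hypothetical.
KINSHIP_MAX = 0.177     # ~2**-2.5: first-/second-degree kinship boundary
F_FEMALE_MAX = 0.25     # X-chromosome F below this -> inferred female
F_MALE_MIN = 0.8        # X-chromosome F above this -> inferred male
MAF_MIN = 0.05
HWE_P_MIN = 1e-4        # Hardy-Weinberg test in controls
DIFF_MISS_P_MIN = 0.01  # case-control differential missingness test


def inferred_sex(x_f):
    """Infer sex from the X-chromosome inbreeding coefficient F."""
    if x_f < F_FEMALE_MAX:
        return "female"
    if x_f > F_MALE_MIN:
        return "male"
    return None  # ambiguous F: reported sex cannot be confirmed


def sample_qc_flags(reported_sex, x_f, max_kinship):
    """Return QC flags; choosing which member of a related pair to drop is not shown."""
    flags = []
    if inferred_sex(x_f) != reported_sex:
        flags.append("sex_discrepancy")
    if max_kinship > KINSHIP_MAX:
        flags.append("related")
    return flags


def snp_passes_qc(chrom, n_alleles, maf, hwe_p_controls, diff_miss_p):
    """Apply the SNP-level filters described in the text."""
    autosomal_not_21 = chrom.isdigit() and chrom != "21"  # chr21 handled separately
    return (autosomal_not_21
            and n_alleles == 2                  # biallelic variants only
            and maf >= MAF_MIN
            and hwe_p_controls >= HWE_P_MIN
            and diff_miss_p >= DIFF_MISS_P_MIN)


print(sample_qc_flags("female", 0.92, 0.30))    # ['sex_discrepancy', 'related']
print(snp_passes_qc("2", 2, 0.12, 0.40, 0.60))  # True
```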

For chromosome 21 analysis, we included 207 313 variants in our association tests. QC of chromosome X variants was conducted separately, as described hereafter.

Statistical analyses

Genome-wide association tests

We performed genome-wide association studies (GWAS) treating the GATA1s mutation phenotype either as a quantitative trait, using mutation VAF, or as a binary trait, with 103 GATA1s-positive cases and 326 GATA1s wild-type controls. Both the linear regression model of logit-transformed VAF and the logistic regression model were adjusted for biological sex and 10 principal components (PCs). An additive allelic effect was assumed for the 3 possible genotypes (ie, 0/0, 0/1, 1/1) on the disomic autosomes (excluding chromosome 21). In our main analysis, we included all individuals regardless of race/ethnicity; we subsequently performed a separate analysis limited to self-reported White individuals, because this was by far the largest racial/ethnic group (n = 270). We estimated statistical power for the multiancestry GWAS using the Genetic Association Study Power Calculator.27
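Written out in our own notation (not taken from the original analysis code), the two per-variant models described above can be expressed as follows, where g_i ∈ {0, 1, 2} is the additive count of alternate alleles for participant i, sex_i is biological sex, PC_ik is the kth principal component, and logit(x) = ln[x/(1 − x)]:

```latex
\operatorname{logit}(\mathrm{VAF}_i) = \beta_0 + \beta_G\, g_i + \beta_S\, \mathrm{sex}_i
  + \sum_{k=1}^{10} \gamma_k\, \mathrm{PC}_{ik} + \varepsilon_i

\operatorname{logit} \Pr(\mathrm{case}_i = 1) = \alpha_0 + \alpha_G\, g_i + \alpha_S\, \mathrm{sex}_i
  + \sum_{k=1}^{10} \delta_k\, \mathrm{PC}_{ik}
```

Under this parameterization, the odds ratio reported for the binary trait corresponds to exp(α_G) per additional alternate allele, and β_G is the change in logit-transformed VAF per additional alternate allele.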

Chromosome 21 association tests

Variants on chromosome 21 were processed separately, as described earlier. Multiallelic variants and variants with MAF of <0.01 were removed. The trisomic genotype calls 0/0/0, 0/0/1, 0/1/1, and 1/1/1 were coded as 0, 1, 2, and 3, respectively. Association tests were then conducted in a similar manner, assuming an additive allelic effect and adjusting for biological sex and 10 PCs, using linear or logistic models for the continuous or binary GATA1s trait in both the multiancestry and White-only groups. A chromosome 21–wide P value threshold of 2.41 × 10−7 (0.05/207 313) was used. We focused our analysis on 4 candidate genes, RUNX1, ERG, DYRK1A, and ETS2, which are known to be involved in hematopoiesis and in upregulation of GATA1 in DS.28-31
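A minimal Python sketch of the trisomic additive coding and the Bonferroni-style threshold described above. The `additive_dosage` and `bonferroni_threshold` helpers are hypothetical; genotype strings are assumed to use VCF-style `/` or `|` separators, and missing or multiallelic calls are rejected. The same threshold helper also reproduces the GATA1-region threshold used for chromosome X (0.05/646).

```python
def additive_dosage(gt: str) -> int:
    """Count alternate alleles: trisomic calls give 0-3, disomic calls 0-2."""
    alleles = gt.replace("|", "/").split("/")
    if any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"missing or multiallelic genotype: {gt}")
    return sum(int(a) for a in alleles)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


for gt in ("0/0/0", "0/0/1", "0/1/1", "1/1/1"):
    print(gt, additive_dosage(gt))  # 0, 1, 2, 3

print(f"{bonferroni_threshold(0.05, 207_313):.3g}")  # chromosome 21-wide: 2.41e-07
print(f"{bonferroni_threshold(0.05, 646):.3g}")      # GATA1 region on chrX: 7.74e-05
```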

X chromosome association tests

We assessed the association between variants on chromosome X and somatic GATA1s mutations using the chromosome X–wide association study approach, as previously described.32 Briefly, we removed samples inferred to be related based on an identity-by-descent threshold of >0.125, which corresponds to first cousins, and samples with a missing rate of >0.1. We removed variants in the pseudoautosomal regions of chromosome X, variants with significant deviation from Hardy-Weinberg equilibrium in females, variants whose missingness was significantly correlated with binary GATA1s mutation/wild-type status, variants with MAF of <0.05 or a missing rate of >0.1, and variants with significant differences in MAF or missingness between male and female controls. After these QC steps, 333 751 variants remained for association testing. We focused, in particular, on variation in the GATA1 gene region (chrX:48.6-49.1 Mb), which encompasses 3 previously identified proximal enhancers, 3 putative distal enhancers, and the GATA1 gene itself.33,34

Genetic ancestry analysis

We inferred the genetic ancestry of each participant using RFMix35 and examined potential associations with the GATA1s mutation phenotypes. We used a reference panel of individuals from the 1000 Genomes Project,36 including individuals of European ancestry (subpopulations: Utah residents with European ancestry; British in England and Scotland; Toscani in Italia; and Iberian in Spain), African ancestry (subpopulations: Luhya in Kenya; Mende in Sierra Leone; Gambian in Western Division, The Gambia; Yoruba in Ibadan, Nigeria; and Esan in Nigeria), South Asian ancestry (subpopulations: Sri Lankan Tamil in the United Kingdom; Indian Telugu in the United Kingdom; Bengali in Bangladesh; Gujarati Indian in Houston, TX; and Punjabi in Lahore, Pakistan), and East Asian ancestry (subpopulations: Kinh in Ho Chi Minh City, Vietnam; Chinese Dai in Xishuangbanna, China; Southern Han Chinese, China; Han Chinese in Beijing, China; and Japanese in Tokyo, Japan). We removed related individuals from the reference samples and randomly discarded samples until all reference ancestries included the same number of individuals (n = 291, corresponding to the number of European ancestry individuals after removing relatives). We summed the local ancestry estimates across autosomes (excluding chromosome 21) to obtain global estimates of the proportions of European, African, East Asian, and South Asian ancestries. We tested for association between the estimated ancestry proportions and GATA1s mutations using linear (for VAF) or logistic (for binary status) models, adjusting for biological sex, gestational age, and birth weight.
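The step of summing local ancestry estimates into global proportions can be sketched as a length-weighted average over both haplotypes. This is a simplified illustration under stated assumptions: the `(chrom, start_bp, end_bp, hap1_ancestry, hap2_ancestry)` segment tuples and the `global_ancestry` function are hypothetical and do not reproduce RFMix's native output format, and weighting by base pairs (rather than by genetic map length) is our choice.

```python
from collections import defaultdict

# Superpopulation labels used in this sketch.
ANCESTRIES = ("EUR", "AFR", "EAS", "SAS")


def global_ancestry(segments):
    """Length-weighted global ancestry proportions from local-ancestry segments.

    Each segment is (chrom, start_bp, end_bp, hap1_ancestry, hap2_ancestry).
    Chromosome 21 is skipped, as in the text.
    """
    totals = defaultdict(float)
    for chrom, start, end, hap1, hap2 in segments:
        if chrom == "21":
            continue
        length = end - start
        totals[hap1] += length  # each haplotype contributes the segment length
        totals[hap2] += length
    grand_total = sum(totals.values())
    return {anc: totals[anc] / grand_total for anc in ANCESTRIES}


segments = [
    ("1", 0, 100_000, "EUR", "EUR"),
    ("2", 0, 100_000, "EUR", "SAS"),
    ("21", 0, 50_000, "AFR", "AFR"),  # excluded from the global estimate
]
print(global_ancestry(segments))
# {'EUR': 0.75, 'AFR': 0.0, 'EAS': 0.0, 'SAS': 0.25}
```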

Gene-environment interaction analysis

We conducted an exploratory genome-wide gene-environment interaction analysis to assess whether birth-related variables, specifically gestational age and birth weight, modify the genetic influence on somatic GATA1s mutations. The logit-transformed GATA1s mutation VAF was used as the outcome, and linear regression models were constructed to include both main effects and the interaction terms between genotype and birth-related variables, adjusting for biological sex and 10 PCs. Analyses were performed in all participants regardless of race/ethnicity to maximize statistical power.
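One way to write the interaction model described above, in our own notation (g_i is the additive genotype dosage 0 to 2, E_i is gestational age in weeks or birth weight, and whether E was centered is not specified here):

```latex
\operatorname{logit}(\mathrm{VAF}_i) = \beta_0 + \beta_G\, g_i + \beta_E\, E_i
  + \beta_{G \times E}\, g_i E_i + \beta_S\, \mathrm{sex}_i
  + \sum_{k=1}^{10} \gamma_k\, \mathrm{PC}_{ik} + \varepsilon_i

\frac{\partial \operatorname{logit}(\mathrm{VAF}_i)}{\partial g_i} = \beta_G + \beta_{G \times E}\, E_i
```

Under this parameterization, β_{G×E} is the change in the per-allele genotype effect on logit-transformed VAF for each 1-unit increase in E.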

Among the 429 newborns with DS in the ODSCS who had WGS data that passed QC, there were 103 GATA1s mutation–positive cases and 326 GATA1 wild-type controls. The VAF of somatic GATA1s mutations ranged from 0% to 89%, and 50 of 103 newborns with GATA1s mutations were diagnosed with clinical TAM (supplemental Tables 1 and 2). Of the 429 newborns, 213 were designated male at birth, and most were self-reported to be White (n = 270), followed by 59 South Asian, 48 Black, 11 Arabic, 4 East Asian, 1 Latino, and 36 mixed or unknown race/ethnicity (Figure 1; supplemental Table 2).

Figure 1.

Principal component analysis (PCA) plot of multiancestry participants in the ODSCS. PCA plots were generated for study participants along with individuals from the 1000 Genomes Project reference populations (gray). Color labels correspond to self-reported race/ethnicity of study participants. PCA plots were generated separately for newborns with GATA1s mutations (left, n = 103) and GATA1 wild-type controls (right, n = 326). PCs were calculated using PLINK, and plots were generated using R.


GWAS of somatic GATA1s mutations in newborns with DS

We first conducted a multiancestry GWAS in the full set of 429 newborns to identify common autosomal variants associated with GATA1s mutation VAF, excluding the trisomic genotypes on chromosome 21. One SNP, rs115118904, located at chromosome 2p16.3, achieved genome-wide significance (P < 5 × 10−8) in the multiancestry analysis (Figure 2A; Table 1; supplemental Figure 1). This variant (β = 3.26; P = 3.22 × 10−8) is located within LHCGR, which encodes the luteinizing hormone/choriogonadotropin receptor.37 We also identified a region with suggestive evidence of association with GATA1s mutation VAF at chromosome 15q21.3 (Figure 2A; supplemental Figure 1). The lead SNP in this region, rs2584234 (β = 1.64; P = 2.32 × 10−7), is in strong linkage disequilibrium with reported GWAS loci for lung function38-40 but is of unclear relevance to TAM/ML-DS.

Figure 2.

GWAS results for GATA1s mutation VAF in the ODSCS. Manhattan plot (A) and quantile-quantile plot (B) of GWAS results for GATA1s mutation VAF in the full multiancestry cohort (N = 429; top). Manhattan plot (C) and quantile-quantile plot (D) of GWAS results for GATA1s mutation VAF in self-reported White participants only (n = 270; bottom). Red horizontal line indicates genome-wide significance threshold (P = 5 × 10−8). Blue horizontal line indicates suggestive significance threshold (P = 1 × 10−5).

Table 1.

Genome-wide significant loci associated with GATA1s mutation VAF in newborns with DS

| SNP | Chr: position | Gene | Ref/Alt | MAF | Multiancestry (N = 429), GATA1s mutation VAF: β (95% CI) | P value | Multiancestry (N = 429), GATA1s mutation status (yes/no): OR (95% CI) | P value | White-only (n = 270), GATA1s mutation VAF: β (95% CI) | P value | White-only (n = 270), GATA1s mutation status (yes/no): OR (95% CI) | P value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Significant hit (P < 5 × 10−8) from multiancestry analysis | | | | | | | | | | | | |
| rs115118904 | 2: 48 707 683 | LHCGR | | 0.056/0.057 | 3.26 (2.13-4.40) | **3.22 × 10−8** | 4.69 (2.50-8.76) | 1.26 × 10−6 | 2.96 (1.59-4.32) | 3.14 × 10−5 | 3.93 (1.83-8.43) | 4.40 × 10−4 |
| Significant hits (P < 5 × 10−8) from White-only analysis | | | | | | | | | | | | |
| rs66519478 | 1: 218 903 556 | LYPLAL1 | | 0.17/0.13 | 1.82 (1.06-2.57) | 3.39 × 10−6 | 2.68 (1.76-4.09) | 4.77 × 10−6 | 2.85 (1.90-3.80) | **1.25 × 10−8** | 4.57 (2.54-8.24) | 4.19 × 10−7 |
| rs7528768 | 1: 218 905 645 | LYPLAL1 | | 0.17/0.13 | 1.74 (0.99-2.49) | 7.89 × 10−6 | 2.58 (1.69-3.93) | 1.01 × 10−5 | 2.85 (1.90-3.80) | **1.25 × 10−8** | 4.57 (2.53-8.24) | 4.19 × 10−7 |
| rs1972029 | 1: 218 909 006 | LYPLAL1 | | 0.17/0.13 | 1.68 (0.91-2.44) | 2.11 × 10−5 | 2.48 (1.62-3.79) | 2.76 × 10−5 | 2.77 (1.80-3.72) | **3.84 × 10−8** | 4.26 (2.38-7.63) | 1.09 × 10−6 |
| rs12647008 | 4: 56 189 001 | CRACD | | 0.10/0.041 | 1.21 (0.14-2.26) | .025 | 2.03 (1.14-3.64) | .017 | 5.83 (3.96-7.69) | **3.28 × 10−9** | 24.00 (5.90-97.6) | 8.94 × 10−6 |

Genome-wide significant associations (P < 5 × 10−8) are highlighted in bold.

95% CI, 95% confidence interval; Alt, alternative allele; Chr, chromosome; MAF, minor allele frequency in ODSCS; OR, odds ratio; Ref, reference allele.

Gene corresponds to the nearest gene.

MAF is given as multiancestry/White-only.

β estimates correspond to each 1-unit increase in the logit-transformed VAF.

Next, we restricted our GWAS analysis to the 270 self-reported White participants and identified 4 additional genome-wide significant SNP associations at 1q41 and 4q12 (Figure 2C; Table 1; supplemental Figure 1). Variants rs66519478 (β = 2.85; P = 1.25 × 10−8), rs7528768 (β = 2.85; P = 1.25 × 10−8), and rs1972029 (β = 2.77; P = 3.84 × 10−8) are located adjacent to LYPLAL1, which may be involved in protein depalmitoylation,41 whereas rs12647008 (β = 5.83; P = 3.28 × 10−9) is located within CRACD, which is involved in F-actin polymerization.42 The variants near LYPLAL1 showed suggestive evidence of association in the multiancestry GWAS (P < 5.0 × 10−5), whereas the CRACD association appears specific to White individuals (Table 1).

In addition to analyzing GATA1s mutation VAF, we conducted a GWAS treating the GATA1s mutation phenotype as a binary trait. Although no SNP reached genome-wide significance (supplemental Figure 2), the 5 variants significant for GATA1s VAF also showed evidence of association with the presence or absence of GATA1s mutations (Table 1). In both the continuous and binary analyses, quantile-quantile (QQ) plots and genomic inflation factors showed agreement between the expected and observed P value distributions (Figure 2B,D), suggesting that our analyses were robust to confounding by population stratification.
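The genomic inflation factor (λGC) referred to above is conventionally the median observed 1-degree-of-freedom χ² association statistic divided by the median of the 1-df χ² distribution (∼0.455). A stdlib-only Python sketch follows; the `genomic_inflation` function is hypothetical, and converting two-sided P values to χ² via the normal quantile function assumes each test statistic is approximately normal.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (~0.4549), via the normal quantile.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """lambda_GC: median observed 1-df chi-square statistic / expected median."""
    chi2 = [NormalDist().inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Under the null, P values are uniform, so lambda_GC should be close to 1.
n = 1001
null_p = [(i + 0.5) / n for i in range(n)]
print(round(genomic_inflation(null_p), 3))  # 1.0
```

Values near 1 indicate little systematic inflation of test statistics; values well above 1 can reflect population stratification or other confounding.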

Association tests on the trisomic chromosome 21

We conducted separate association tests for variants on chromosome 21 because of its trisomic genotypes. No individual variant achieved genome-wide or chromosome 21–wide significance (supplemental Figure 3). The variant with the smallest P value, rs73907214 (β = 2.00; P = 1.22 × 10−6 in the White-only analysis), was located at 21q22.3, adjacent to the genes SLX9, ADARB1, and ITGB2. We did not identify any statistically significant associations in leukemia-associated candidate genes, including RUNX1, ERG, ETS2, and DYRK1A.

Chromosome X–wide association study

Separate analyses were also conducted for variants on chromosome X, focusing in particular on the GATA1 gene and nearby enhancer regions at Xp11.23. We did not identify statistically significant associations in the GATA1 gene region at a region-specific Bonferroni-corrected threshold of P < .05/646 = 7.74 × 10−5, accounting for the number of independent SNPs in this region (n = 646), nor any variants outside of the GATA1 region that achieved genome-wide significance (supplemental Figure 4).

Genetic ancestry associations with GATA1s mutations

We estimated genetic ancestry proportions for each individual in our multiancestry cohort and tested their association with somatic GATA1s mutations (supplemental Table 3; supplemental Figures 5 and 6). The estimated proportion of global South Asian ancestry was significantly positively associated with GATA1s mutation status and VAF, with each 10% increase in South Asian ancestry associated with a 1.11-fold increased risk of developing a GATA1s mutation and an ∼1.21-fold increase in mutation VAF (Table 2). Interestingly, within the South Asian ancestry group, genetic ancestry related to the Indian Telugu in the United Kingdom; Bengali in Bangladesh; and Punjabi in Lahore, Pakistan, subpopulations was significantly positively associated with both GATA1s mutation status and VAF (Table 2).

Table 2.

Association between inferred genetic ancestry and GATA1s mutations in newborns with DS

| Ancestry | GATA1s mutation VAF: OR (95% CI) | P value | GATA1s mutation status (yes/no): OR (95% CI) | P value |
|---|---|---|---|---|
| Ancestry superpopulations | | | | |
| AFR | 0.906 (0.770-1.065) | .23 | 0.928 (0.834-1.032) | .17 |
| EAS | 0.817 (0.409-1.629) | .56 | 0.863 (0.521-1.430) | .58 |
| EUR | 0.968 (0.854-1.098) | .61 | 0.985 (0.917-1.059) | .69 |
| SAS | 1.211 (1.018-1.441) | **.032** | 1.107 (1.010-1.214) | **.031** |
| South Asian ancestry subpopulations | | | | |
| STU | 1.291 (0.986-1.690) | .064 | 1.147 (0.995-1.323) | .059 |
| ITU | 2.098 (1.166-3.774) | **.014** | 1.517 (1.106-2.081) | **.0097** |
| BEB | 1.377 (1.037-1.827) | **.027** | 1.185 (1.021-1.377) | **.026** |
| GIH | 1.215 (0.999-1.477) | .052 | 1.114 (1.004-1.235) | **.041** |
| PJL | 1.380 (1.077-1.768) | **.011** | 1.193 (1.046-1.361) | **.0087** |

Significant associations (P < .05) are highlighted in bold.

Ancestry superpopulations: AFR, African; EAS, East Asian; EUR, European; SAS, South Asian.

South Asian subpopulations: BEB, Bengali in Bangladesh; GIH, Gujarati Indian in Houston, TX; ITU, Indian Telugu in the United Kingdom; PJL, Punjabi in Lahore, Pakistan; STU, Sri Lankan Tamil in the United Kingdom.

OR, odds ratio.

P values for GATA1s mutation VAF or mutation status were calculated by linear or logistic regression, respectively, adjusting for sex, gestational age, and birth weight.

OR and 95% CIs calculated for each 10% increase in ancestry proportions.

OR and 95% CIs calculated for each 1% increase in ancestry proportions.
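Because Table 2 reports odds ratios per 10% or per 1% increase in ancestry proportions, it can help to convert between increments. The sketch below assumes the models are linear in the ancestry proportion on the log-odds scale, so that log(OR) scales proportionally with the increment; the `rescale_or` function is hypothetical, and the same transformation applies to the confidence limits.

```python
import math


def rescale_or(or_value: float, from_units: float, to_units: float) -> float:
    """Convert an OR per `from_units` increase to an OR per `to_units` increase.

    Assumes log(OR) is linear in the ancestry proportion, so it is
    proportional to the size of the increment.
    """
    log_or_per_unit = math.log(or_value) / from_units
    return math.exp(log_or_per_unit * to_units)


# SAS superpopulation, GATA1s mutation status: OR 1.107 per 10% increase.
or_per_1pct = rescale_or(1.107, from_units=10, to_units=1)
print(round(or_per_1pct, 4))  # 1.0102
```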

Gene-environment interaction analysis

We conducted a genome-wide gene-environment interaction analysis to explore whether gestational age or birth weight may modify the genetic influence on somatic GATA1s mutations. We identified 1 region on chromosome 18q12.3 that showed a statistically significant interaction with gestational age (supplemental Figure 7). The lead SNP in this region, rs11082436, had an interaction coefficient of 2.15 (P = 7.07 × 10−9), indicating that for each 1-week increase in gestational age, the per-allele effect of the SNP on logit-transformed GATA1s mutation VAF increased by 2.15 units. We did not identify any significant interactions with birth weight (supplemental Figure 7).

The risk of developing leukemia is markedly higher in children with DS than in children without DS.3 Although trisomy 21 increases the risk, only a small proportion of children with DS develop leukemia, suggesting that additional modifying risk factors may exist. In this study, we aimed to understand whether germ line genetic variation contributes to the risk of developing GATA1s mutations, the first step toward ML-DS. We conducted association studies of genome-wide genetic variants, including separate analyses of the trisomic chromosome 21 and of chromosome X, for somatic GATA1s mutations in newborns with DS in the ODSCS. Altogether, we did not find evidence of a strong contribution of heritable genetic variation to the development of TAM but did identify an intriguing association with South Asian ancestry in our multiancestry cohort.

Although our GWAS identified 3 genome-wide significant loci across the multiancestry and White-only analyses, the absence of well-defined peaks in the Manhattan plots, and of supporting associations among variants in linkage disequilibrium (LD) with the lead SNPs, suggests that these may be spurious associations. Our relatively small sample size may have precluded the discovery of genetic variants with small to moderate effects on GATA1s mutation development; nevertheless, our study was well powered (>80%) to discover variants with odds ratios of 2.7 to 2.8 for SNPs with risk allele frequencies of 20% to 30%. These are relatively large effect sizes but are consistent with those reported for variants previously associated with acute lymphoblastic leukemia in individuals with DS and in the euploid population.43,44 The lack of strong genetic effects, whether for the presence (binary trait) or the clonal frequency (VAF) of GATA1s mutations, suggests that the variation in GATA1s mutations seen in newborns with DS may largely be driven by a combination of trisomy 21 and stochastic processes. The effects of trisomy 21 on hematopoiesis, including a bias of HSCs toward the erythroid and megakaryocytic lineages,2,18,19 have been associated with GATA1 overexpression and with greater chromatin accessibility at its gene body and promoter region,19 which may increase the likelihood of GATA1s mutations developing.45 We acknowledge that our targeted sequencing approach, with a sensitivity of ∼0.3% VAF, may have missed variants with very low clonal frequency. Although this could have led to some misclassification of case-control status in our binary trait analysis, it is unlikely to have affected the results of our GWAS of GATA1s mutation VAF. Targeted sequencing at even greater depth of coverage is required to better understand the prevalence of acquired GATA1s mutations among newborns with DS.

Our observation of a significant association between higher proportions of inferred South Asian genetic ancestry and GATA1s mutations may point to the presence of genetic risk factors that were not detectable in this study because of small sample size, which also limited our ability to perform population-stratified analyses. Genetic ancestry, however, may also serve as a proxy for unmeasured environmental effects,46,47 and it is possible that South Asian ancestry may correlate with in utero exposures that could influence the development of GATA1s mutations. Potential exposures of relevance include gestational diabetes and maternal obesity, which are known to be more prevalent in South Asian women48 and both of which have been associated with chronic fetal hypoxia.49 We note that observations in individuals with DS and in the euploid population converge on a potential role of hypoxia and oxidative stress in the etiology of GATA1s mutations. Biomarkers of oxidative stress and hypoxia have been found to be higher across the lifespan in individuals with DS,50,51 and, recently, oxidative stress was shown to be increased in trisomy 21 fetal liver HSCs.19 Furthermore, the spectrum of GATA1s mutations has been linked to signatures of oxidative stress.52 It is also intriguing to note that overexpression of GATA1 has been reported in euploid individuals who experience hypoxia at high altitudes,53 and in cells undergoing hypoxic conditions.54 Taken together, exogenous factors that influence hypoxic conditions and oxidative stress during fetal development may affect the development of GATA1s mutations in DS. Our exploratory gene-environment interaction analysis suggests that certain early-life or birth-related variables, such as gestational age, may interact with genetic variation to influence risk of GATA1s mutations. Further investigation into the role of both genetic and nongenetic factors is warranted.

GATA1s mutations in newborns with DS have been studied in various regions globally,55-58 yet data on their prevalence across populations and in relation to environmental exposures remain scarce. Comprehensive, cross-population epidemiological studies are warranted to elucidate the prevalence of GATA1s mutations in newborns with DS with varying genetic ancestries and to identify the in utero exposures that may underlie this variation.

The authors thank the families who participated in this study. The authors acknowledge the University of Southern California’s Center for Advanced Research Computing for providing computing resources (https://www.carc.usc.edu/).

Data used in this study were generated with support from a National Institutes of Health INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE grant (X01HD107380) and a Blood Cancer UK program grant (Bloodwise 13001 [P.V. and I.R.]). N.E. was supported by Cancer Research UK grant DRCPGM\100058. A.J.d.S. is a scholar of the Leukemia & Lymphoma Society.

The content and conclusions from this work do not reflect the official views of the sponsors.

Contribution: A.J.d.S., I.R., and P.V. conceived and designed this study; Y.L. and N.E. analyzed the data; N.E. and P.L. performed experiments; Y.L. and A.J.d.S. prepared the manuscript; and all authors edited and approved the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Adam J. de Smith, University of Southern California, USC Norris Research Tower, NRT-1509H, Biggy St, Los Angeles, CA 90033; email: desmith@usc.edu.

1. Parker SE, Mai CT, Canfield MA, et al; National Birth Defects Prevention Network. Updated national birth prevalence estimates for selected birth defects in the United States, 2004-2006. Birth Defects Res A Clin Mol Teratol. 2010;88(12):1008-1016.
2. Roy A, Cowan G, Mead AJ, et al. Perturbation of fetal liver hematopoietic stem and progenitor cell development by trisomy 21. Proc Natl Acad Sci U S A. 2012;109(43):17579-17584.
3. Hasle H, Clemmensen IH, Mikkelsen M. Risks of leukaemia and solid tumours in individuals with Down's syndrome. Lancet. 2000;355(9199):165-169.
4. Marlow EC, Ducore J, Kwan ML, et al. Leukemia risk in a cohort of 3.9 million children with and without Down syndrome. J Pediatr. 2021;234:172-180.e3.
5. Taub JW, Mundschau G, Ge Y, et al. Prenatal origin of GATA1 mutations may be an initiating step in the development of megakaryocytic leukemia in Down syndrome. Blood. 2004;104(5):1588-1589.
6. Wechsler J, Greene M, McDevitt MA, et al. Acquired mutations in GATA1 in the megakaryoblastic leukemia of Down syndrome. Nat Genet. 2002;32(1):148-152.
7. Hitzler JK, Cheung J, Li Y, Scherer SW, Zipursky A. GATA1 mutations in transient leukemia and acute megakaryoblastic leukemia of Down syndrome. Blood. 2003;101(11):4301-4304.
8. Ahmed M, Sternberg A, Hall G, et al. Natural history of GATA1 mutations in Down syndrome. Blood. 2004;103(7):2480-2489.
9. Gamis AS, Alonzo TA, Gerbing RB, et al. Natural history of transient myeloproliferative disorder clinically diagnosed in Down syndrome neonates: a report from the Children's Oncology Group Study A2971. Blood. 2011;118(26):6752-6759; quiz 6996.
10. Cruz Hernandez D, Metzner M, de Groot AP, et al. Sensitive, rapid diagnostic test for transient abnormal myelopoiesis and myeloid leukemia of Down syndrome. Blood. 2020;136(12):1460-1465.
11. Roberts I, Alford K, Hall G, et al; Oxford-Imperial Down Syndrome Cohort Study Group. GATA1-mutant clones are frequent and often unsuspected in babies with Down syndrome: identification of a population at risk of leukemia. Blood. 2013;122(24):3908-3917.
12. Klusmann JH, Creutzig U, Zimmermann M, et al. Treatment and prognostic impact of transient leukemia in neonates with Down syndrome. Blood. 2008;111(6):2991-2998.
13. Yamato G, Deguchi T, Terui K, et al. Predictive factors for the development of leukemia in patients with transient abnormal myelopoiesis and Down syndrome. Leukemia. 2021;35(5):1480-1484.
14. Muskens IS, Li S, Jackson T, et al. The genome-wide impact of trisomy 21 on DNA methylation and its implications for hematopoiesis. Nat Commun. 2021;12(1):821.
15. Letourneau A, Santoni FA, Bonilla X, et al. Domains of genome-wide gene expression dysregulation in Down's syndrome [published correction appears in Nature. 2016;531(7594):400]. Nature. 2014;508(7496):345-350.
16. Liu B, Filippi S, Roy A, Roberts I. Stem and progenitor cell dysfunction in human trisomies. EMBO Rep. 2015;16(1):44-62.
17. Prandini P, Deutsch S, Lyle R, et al. Natural gene-expression variation in Down syndrome modulates the outcome of gene-dosage imbalance. Am J Hum Genet. 2007;81(2):252-263.
18. Jardine L, Webb S, Goh I, et al. Blood and immune development in human fetal bone marrow and Down syndrome. Nature. 2021;598(7880):327-331.
19. Marderstein AR, De Zuani M, Moeller R, et al. Single-cell multi-omics map of human fetal blood in Down syndrome. Nature. 2024;634(8032):104-112.
20. Wagenblast E, Araújo J, Gan OI, et al. Mapping the cellular origin and early evolution of leukemia in Down syndrome. Science. 2021;373(6551):eabf6202.
21. Labuhn M, Perkins K, Matzk S, et al. Mechanisms of progression of myeloid preleukemia to transformed myeloid leukemia in children with Down syndrome. Cancer Cell. 2019;36(2):123-138.
22. Koboldt DC, Chen K, Wylie T, et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009;25(17):2283-2285.
23. Robinson JT, Thorvaldsdóttir H, Winckler W, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24-26.
24. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491-498.
25. McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297-1303.
26. Poplin R, Ruano-Rubio V, DePristo MA, et al. Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv. Published online 24 July 2018.
27. Johnson JL, Abecasis GR. GAS Power Calculator: web-based power calculator for genetic association studies. bioRxiv. Published online 17 July 2017.
28. Stankiewicz MJ, Crispino JD. AKT collaborates with ERG and Gata1s to dysregulate megakaryopoiesis and promote AMKL. Leukemia. 2013;27(6):1339-1347.
29. Birger Y, Goldberg L, Chlon TM, et al. Perturbation of fetal hematopoiesis in a mouse model of Down syndrome's transient myeloproliferative disorder. Blood. 2013;122(6):988-998.
30. Banno K, Omori S, Hirata K, et al. Systematic cellular disease models reveal synergistic interaction of trisomy 21 and GATA1 mutations in hematopoietic abnormalities. Cell Rep. 2016;15(6):1228-1241.
31. Gialesaki S, Bräuer-Hartmann D, Issa H, et al. RUNX1 isoform disequilibrium promotes the development of trisomy 21-associated myeloid leukemia. Blood. 2023;141(10):1105-1118.
32. Gao F, Chang D, Biddanda A, et al. XWAS: a software toolset for genetic data analysis and association studies of the X chromosome. J Hered. 2015;106(5):666-671.
33. Fulco CP, Nasser J, Jones TR, et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet. 2019;51(12):1664-1669.
34. Fulco CP, Munschauer M, Anyoha R, et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science. 2016;354(6313):769-773.
35. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 2013;93(2):278-288.
36. Auton A, Brooks LD, Durbin RM, et al; 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68-74.
37. Ascoli M, Fanelli F, Segaloff DL. The lutropin/choriogonadotropin receptor, a 2002 perspective. Endocr Rev. 2002;23(2):141-174.
38. Kichaev G, Bhatia G, Loh PR, et al. Leveraging polygenic functional enrichment to improve GWAS power. Am J Hum Genet. 2019;104(1):65-75.
39. Barton AR, Sherman MA, Mukamel RE, Loh PR. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat Genet. 2021;53(8):1260-1269.
40. Shrine N, Guyatt AL, Erzurumluoglu AM, et al; Understanding Society Scientific Group. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat Genet. 2019;51(3):481-493.
41. Bürger M, Zimmermann TJ, Kondoh Y, et al. Crystal structure of the predicted phospholipase LYPLAL1 reveals unexpected functional plasticity despite close relationship to acyl protein thioesterases. J Lipid Res. 2012;53(1):43-50.
42. Jung YS, Wang W, Jun S, et al. Deregulation of CRAD-controlled cytoskeleton initiates mucinous colorectal cancer via beta-catenin. Nat Cell Biol. 2018;20(11):1303-1314.
43. Brown AL, de Smith AJ, Gant VU, et al. Inherited genetic susceptibility to acute lymphoblastic leukemia in Down syndrome. Blood. 2019;134(15):1227-1237.
44. Perez-Andreu V, Roberts KG, Harvey RC, et al. Inherited GATA3 variants are associated with Ph-like childhood acute lymphoblastic leukemia and risk of relapse. Nat Genet. 2013;45(12):1494-1498.
45. Makova KD, Hardison RC. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet. 2015;16(4):213-223.
46. González Burchard E, Borrell LN, Choudhry S, et al. Latino populations: a unique opportunity for the study of race, genetics, and social environment in epidemiological research. Am J Public Health. 2005;95(12):2161-2168.
47. Florez JC, Price AL, Campbell D, et al. Strong association of socioeconomic status with genetic ancestry in Latinos: implications for admixture studies of type 2 diabetes. Diabetologia. 2009;52(8):1528-1536.
48. Heslehurst N, Sattar N, Rajasingam D, Wilkinson J, Summerbell CD, Rankin J. Existing maternal obesity guidelines may increase inequalities between ethnic groups: a national epidemiological study of 502,474 births in England. BMC Pregnancy Childbirth. 2012;12:156.
49. Desoye G, Carter AM. Fetoplacental oxygen homeostasis in pregnancies with maternal diabetes mellitus and obesity. Nat Rev Endocrinol. 2022;18(10):593-607.
50. Perrone S, Longini M, Bellieni CV, et al. Early oxidative stress in amniotic fluid of pregnancies with Down syndrome. Clin Biochem. 2007;40(3-4):177-180.
51. Donovan MG, Rachubinski AL, Smith KP, et al. Multimodal analysis of dysregulated heme metabolism, hypoxic signaling, and stress erythropoiesis in Down syndrome. Cell Rep. 2024;43(8):114599.
52. Cabelof DC, Patel HV, Chen Q, et al. Mutational spectrum at GATA1 provides insights into mutagenesis and leukemogenesis in Down syndrome. Blood. 2009;114(13):2753-2763.
53. Azad P, Zhao HW, Cabrales PJ, et al. Senp1 drives hypoxia-induced polycythemia via GATA1 and Bcl-xL in subjects with Monge's disease. J Exp Med. 2016;213(12):2729-2744.
54. Zhang FL, Shen GM, Liu XL, Wang F, Zhao YZ, Zhang JW. Hypoxia-inducible factor 1-mediated human GATA1 induction promotes erythroid differentiation under hypoxic conditions. J Cell Mol Med. 2012;16(8):1889-1899.
55. Goemans BF, Noort S, Blink M, et al. Sensitive GATA1 mutation screening reliably identifies neonates with Down syndrome at risk for myeloid leukemia. Leukemia. 2021;35(8):2403-2406.
56. Terui K, Toki T, Taga T, et al. Highly sensitive detection of GATA1 mutations in patients with myeloid leukemia associated with Down syndrome by combining Sanger and targeted next generation sequencing. Genes Chromosomes Cancer. 2020;59(3):160-167.
57. Rainis L, Bercovich D, Strehl S, et al. Mutations in exon 2 of GATA1 are early events in megakaryocytic malignancies associated with trisomy 21. Blood. 2003;102(3):981-986.
58. Queiroz LB, Lima BD, Mazzeu JF, et al. Analysis of GATA1 mutations and leukemogenesis in newborns with Down syndrome. Genet Mol Res. 2013;12(4):4630-4638.

Author notes

The whole-genome sequencing data are available at the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/; accession number phs002982.v1.p1) and at the National Institutes of Health INCLUDE (INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) data coordinating center (DCC).

The full-text version of this article contains a data supplement.

Supplemental data