AbstractGiven the established association between numerous GBA1 variants and specific neurological diseases, we extended the exploration by a phenome-wide association study to assess the impact of GBA1 variants on a wider spectrum of health-related traits. We identified 41 phenotypes associated with GBA1 variants, 39 of which were unreported, including 21 non-neurological and 20 neurological phenotypes. Based on variant-level association tests, we found beyond the neurological phenotypes particularly decreased gray-white matter contrast measures across 13 distinct brain regions, the non-coding variant rs9628662 was associated with six non-neurological traits such as hypermetropia. Another non-coding variant rs3115534 showed associations with eight biomarkers of multiple categories, and an increased risk of benign digestive neoplasms. Notably, compared to protein-coding variant p.T408M, the rs3115534 had opposing effects on three hematological biomarkers. Additionally, gene-level association analyses revealed significant associations with three neurological diseases including Parkinson’s disease. The findings demonstrated that GBA1 variants significantly impact various health-related traits.
IntroductionThe GBA1 gene is located on chromosome 1 (1q21) and encodes the lysosomal enzyme β-glucocerebrosidase (GCase), which hydrolyzes glucose from glucosylceramide (GlcCer) and glucosylsphingosine (GlcSph)1,2. Pathogenic biallelic GBA1 variants induce structural modifications in GCase, reducing its activity and stability, which drives intracellular GlcCer accumulation and leads to Gaucher Disease (GD) in recessive inheritance, a lysosomal storage disorder3,4. GD can be classified into three subtypes based on clinical progression and the presence/absence of neurological impairment. Type 1 is the most common non-neuronopathic form, while types 2 and 3 involve progressive neurological damage and greater severity5,6,7,8. Approximately 300 GBA1 variants have been associated with GD9, and many are also observed in Parkinson’s disease (PD), including familial cases with autosomal dominant inheritance and sporadic cases10. The connection with PD was initially noticed in large GD clinics11, with p.N409S and p.L483P being the most common PD-associated variants worldwide12. Moreover, GBA1 variants have also been detected subsequently in PD patients affected with REM sleep behavior disorder (RBD) and dementia with Lewy bodies (DLB)13,14,15. In total, previous reports have suggested several GBA1 variants may result in the emergence of neurological phenotypes.In the past decades, the ubiquitous presence of pleiotropy has been further substantiated, emphasizing that a variant or gene has an effect on multiple categories of traits16,17. While previous studies on GBA1 have mostly focused on one neurological disease or similar neurological phenotypes18,19,20,21, recent research reveals that GBA1-associated traits extend beyond neurological conditions, including blood urea nitrogen (BUN), uric acid, hemoglobin (Hb), and hematocrit (HCT)22. Nevertheless, investigations into the non-neurological effects of GBA1 remain limited, leaving substantial gaps in our understanding of the broader spectrum of GBA1-associated phenotypes. This highlights the need for systematic evaluations of gene-phenotype associations across a diverse phenotypic landscape. Elucidating the associations is essential for advancing genomic medicine, as facilitates the development of novel diagnostic and therapeutic approaches23,24.A phenome-wide association study (PheWAS) begins with a specific genetic variant and then analyzes across a curated collection of human phenotypes25, including both neurological and non-neurological phenotypes. This approach is particularly effective in exploring pleiotropy, revealing how a single gene can influence multiple traits. Here, we leveraged deep phenotyping data and whole-exome sequencing (WES) from the UKB26, using PheWAS to comprehensively characterize the phenotype spectrum of the GBA1 variants (Fig. 1). Initially, we performed variant-level analyses to assess the contribution of common variants to complex health-related continuous traits and diseases beyond neurological phenotypes. Subsequently, gene-based burden tests were used to evaluate the collective impact of rare variants on a broad range of phenotypes.Fig. 1: Flow diagram outlining the process for analysis of GBA1 variants with multiple phenotypes.UKB UK Biobank, WES whole-exome sequencing, MAF minor allele frequency, LoF loss of function variants, Mis missense variants, LoF+Mis a combination of loss of function and missense variants, Syn synonymous variants.Full size imageResultsOverview of the frequency of GBA1 variants and phenotypic dataWe processed phenotype data entirely from 502,357 samples in the UK Biobank. After rigorous quality control, we retained only 324,542 unrelated Europeans. A total of 366 GBA1 variants were involved during the classification (Supplementary Table 1). We found 181 individuals (0.06%) carrying at least one of 26 rare (MAF < 0.1%) LoF variants, 2081 individuals (0.64%) carrying at least one of 226 rare missense variants, 2262 individuals (0.70%) carrying at least one of 252 rare LoF+missense variants, and 1558 individuals (0.48%) carrying at least one of 101 rare synonymous variants. In addition, 263,717 individuals (81.26%) carried at least one of 13 common (MAF > 0.1%) variants in the GBA1 gene. Among the common variants, three were protein-coding variants (Supplementary Table 4), namely p.N409S(c.1226A > G, MAF = 1.55 × 10−3), p.T408M(c.1223C > T, MAF = 7.15 × 10−3), and p.E365K(c.1093G > A, MAF = 1.42 × 10−2), all of which were previously reported. Moreover, the remaining 10 non-coding variants, encompassing eight in intronic regions and two in UTR regions (Supplementary Fig. 2), were situated within the targeted areas of whole-exome sequencing and adhered to stringent quality control protocols.To investigate the pleiotropy of GBA1 variants on multiple systems of the human body and maximize diagnostic utility, 2150 binary traits (Supplementary Table 2), and 3314 continuous traits were classified into 21 ICD-10 chapters (Supplementary Table 3). The “Nervous system” category exhibited the highest percentage (70%) among continuous phenotypes, whereas the category of “Neoplasms” exhibited the highest percentage (16%) among binary phenotypes (Fig. 2a, b). Out of all binary traits, 732 were obtained through classification by PEACOK, while an additional 1420 PheCodes were generated as disease phenotypes through ICD-10 mapping.Fig. 2: Phenotypic diversity of the sequenced UK Biobank cohort.a The percentage and number of binary traits analyzed in the UKB cohort per ICD-10 disease chapter. Each marked number represents the number of phenotypes in the category. b The percentage and number of continuous traits analyzed in the UKB cohort per chapter. Each marked number represents the number of phenotypes in the category.Full size imageVariant-level associations with continuous traitsWe identified common variants associated with 33 continuous traits (P < 1.16 × 10−6), spanning the genitourinary, hematological, health status factors, neurological, psychiatric, endocrine and metabolic, ophthalmic, and gastrointestinal categories (Fig. 3b). A substantial number (72%) of the significant correlations were mainly attributed to two non-coding variants: rs9628662 (AAF = 0.31), and rs3115534 (AAF = 0.87) (Fig. 3d, Supplementary Table 5). Among the identified continuous phenotypes, 32 had not previously been associated with the specific variants analyzed in this study, based on cross-referencing with public databases Open Targets and the GWAS Catalogue. According to records from the Open Targets database, the evidence suggested that all the observed associations were independent of known linkage disequilibrium (LD) effects, reinforcing the novelty of these findings. Notably, 17 were brain MRI-based phenotypes, followed by eight hematological biomarkers.Fig. 3: Summary of variant-level PheWAS results.a Associations between single variants and binary traits. For all associations that appear in the analysis, we mark the significant associations and suggestive associations at the sub-threshold. The solid line represents the significant P value threshold (1.79× 10−6). The dashed line represents the suggestive P value threshold (9.15× 10−6). The y-axis is capped at −log10(P) = 16 and the associations with P < 10−16 were plotted on the y = 16 line. (n = 1). b Associations between single variants and common traits. For all associations that appear in the analysis, we mark only the significant association with the smallest P value per category. The solid line represents the significant P value threshold (1.16× 10−6). The dashed line represents the suggestive P value threshold (9.15× 10−6). The y-axis is capped at −log10(P) = 16 and the associations with P < 10−16 were plotted on the y = 16 line. (n = 4). c Illustration of all significant associations with binary traits at a variant level. d Effect sizes for significant associations with continuous traits at a variant level. PD Parkinson’s disease, HbA1C Glycated hemoglobin, GGT Gamma glutamyltransferase, GWC Gray-white contrast, LH left hemisphere, RH right hemisphere.Full size imageThe variant rs9628662 demonstrated significant associations with 15 brain MRI-based phenotypes. The scanner transverse (Y) brain position showed the highest statistical significance (β = 0.09, P = 2.23 ×10−16). Furthermore, significant correlations were observed between this variant and decreased gray-white matter contrast (GWC) measures across 13 brain regions, such as gray-white contrast in the caudal middle frontal (right hemisphere) (β = −0.06, P = 1.81 ×10−8).In addition, it was suggested that GBA1 variants exhibit noteworthy associations with various biomarkers. The rs3115534 variant demonstrated significant associations with eight distinct biomarkers, including four of the “hematological and immune” category (such as red blood cell (erythrocyte) count, β = 0.01, P = 9.79 ×10−15), two of the genitourinary category (such as urea, β = −0.03, P = 6.59 ×10−61\()\), one of the “endocrine and metabolic” category (calcium, β = −0.01, P = 2.59 ×10−10) and one of gastrointestinal category (gamma-glutamyltransferase, β = −0.01, P = 5.92 ×10−8). The association with gamma-glutamyltransferase was previously reported (Supplementary Table 9)27. The rs2075569 variant similarly showed associations with seven different biomarkers, including four of the “hematological and immune” category (such as reticulocyte count, β = 0.01, P = 5.60 ×10−7), two of the “endocrine and metabolic” category (such as HbA1c, β = −0.01. P = 1.16 ×10−10) and one of the genitourinary category (urea, β = −0.01, P = 1.53 ×10−7). The p.T408M variant showed associations with three different biomarkers of the “hematological and immune” category (such as hematocrit percentage, β = −0.07, P = 2.66 ×10−9), highlighting the potential pleiotropic effects (Supplementary Table 10). Furthermore, the p.E365K variant showed suggestive associations with hematocrit percentage at sub-threshold (β = 0.04, P = 4.96 ×10−6).Among the significant associations, we highlighted both rs2075569 and rs3115534 were associated with lower calcium and urea levels. However, the rs140335079 variant was associated with higher urea levels. The non-coding variant rs3115534 and the coding variant p.T408M had opposing directions of effect on hematocrit percentage, hemoglobin concentration, and red blood cell (erythrocyte) count. In summary, despite the relatively modest effect sizes and the different directions of effects on a phenotype, common variants within the GBA1 gene demonstrate the remarkable potential to affect multiple phenotypes, especially traits originating from brain MRI and blood biochemistry.Variant-level associations with binary traitsWe identified unreported associations between specific GBA1 variants and five binary traits (P < 1.78 ×10−6) through comparison with Open Targets and the GWAS Catalogue, which spanned categories including neoplasms and ophthalmology (Fig. 3a). We found the rs9628662 variant was simultaneously associated with four distinct ophthalmic phenotypes (Fig. 3c, Supplementary Table 6), all explaining reasons for glasses/contact lenses, such as myopia (OR = 0.91, P = 3.89 ×10−15) and hypermetropia (OR = 0.91, P = 1.87 ×10−8). Furthermore, we highlighted the association between the rs3115534 variant and a higher risk of benign neoplasm of other parts of the digestive system (OR = 1.03, P = 1.24 ×10−13). This finding warranted further investigation to elucidate the potential role of rs3115534 in the early detection of benign tumors, which may lead to significant morbidity by compressing or obstructing digestive structures or undergoing malignant transformation.We specifically focused on the diseases previously associated with GBA1 variants, including PD, GD, DLB, and RBD. Due to the inherent limitations in the scope of diseases covered by PheCodes, GD, DLB, and RBD were not included in our analysis of the diseases. Instead, we expanded our examination to include other sleep disorders, such as parasomnias, and additional forms of dementia, such as dementia with cerebral degeneration. The strongest association with dementia with cerebral degenerations (p.N409S, P = 0.02), failed to reach statistical significance. Similarly, no significant associations were found between sleep disorders and common variants (e.g., rs9628662, OR = 1.14, P = 0.02). Although we did not detect any study-wide significant associations with PD, suggestive associations (P < 9.15 ×10−6) showed that p.E365K leads to adverse effects in people with PD compared to controls (OR = 1.57, P = 6.08 ×10−6). It was consistent with previous studies showing that p.E365K was associated with the risk of PD in total populations28,29. Additionally, another common variant p.T408M had weak evidence for association with PD (OR = 1.69, P = 1.21 × 10−4), suggesting the potential PD risk among carriers. These findings partially replicated known GBA1-related phenotypes and confirmed the reliability of our approach.Gene burden analyses of rare variantsCarriers with qualifying rare variants generally exhibited a low prevalence of phenotypes across all associations in this study. Given the limited statistical power for detecting individual rare variant associations owing to their infrequency, we performed gene-level analysis to investigate the impact of aggregated rare variants on complex phenotypes. We utilized four categories of rare variants to compute a cumulative burden (all rare LoF variants, all rare missense variants, all rare LoF+missense variants, and all rare synonymous variants) and conducted the analysis.PheWAS of GBA1 variants showed significant associations (Table 1, Supplementary Fig. 3a, Supplementary Table 7) with three neurological phenotypes (P < 5.81 ×10−6), including dementia with cerebral degenerations (burden LoF+Missense variants: OR = 6.16, P = 1.32 ×10−7), PD (burden LoF+Missense variants: OR = 2.38, P = 4.29 ×10−7) and “Siblings have suffered from PD” (burden LoF+Missense variants: OR = 3.05, P = 9.99 ×10−7), which means participants’ brothers or sisters have suffered from PD. LoF+Missense variants were strongly associated with all of the three phenotypes, whereas LoF variants were only associated with “Siblings have suffered from PD” (OR = 10.66, P = 8.57 ×10−7). We successfully replicated the known association with PD. In addition, these findings also suggested that the effects of LoF variants tend to be more severe than those of combined LoF+Missense variants for the same phenotype.In contrast to the combination of two predictably deleterious GBA1 variants above, missense variants showed no significant associations. However, relationships between missense variants and two phenotypes reached suggestive significance at sub-threshold (P < 9.15 ×10−6), including Weighted-mean FA in tract medial lemniscus (right) (β = −0.34, P = 4.04 ×10−6) (Supplementary Fig. 3b, Supplementary Table 8), as well as PD (OR = 2.25, P = 8.92 ×10−6). As expected, synonymous variants showed no impact on disease risk, even at the sub-threshold, indicating that the confounding effects of population sub-structure were successfully minimized.Table 1 Significant associations of rare variant burden with binary traitsFull size tableDiscussionThis study represents the largest Phenome-wide analysis aimed at comprehensively assessing the contribution of the GBA1 gene to complex health-related continuous traits and human diseases, within a prospective European cohort to date. We leveraged exome sequences from 324,542 participants of the UKB and analyzed records of 5464 phenotypes, expanding beyond previous research which was limited to neurological diseases. Our analysis identified 41 phenotypes associated with GBA1 variants at the variant level and the gene level. While several identified phenotypes were previously reported, we found a substantial number of previously unreported neurological and non-neurological phenotypes.This study revealed novel associations with MRI-derived neurological phenotypes, particularly GWC measures across a wide range of brain regions. GWC quantifies the degree of blurring observed at the interface between gray matter and white matter compartments in the brain and serves as an indicator of localized tissue integrity variations, myelin loss, augmented water presence in the white matter, or accumulation of iron30. Previous studies showed that lower GWC scores were associated with diminished cognitive abilities and an increased progression from mild cognitive impairment to dementia30,31. This observation provided a potential indicator for early detection and differentiation of neurodegenerative diseases and new avenues for research into the mechanisms by which changes in gray and white matter integrity affect cognitive function.This study revealed new non-neurological associations, including biomarkers spanning multiple categories, that remained unrecognized in studies primarily focused on neurological diseases. We found GBA1 variants strongly associated with lower urate levels, which were consistent with previous findings that showed lower levels of uric acid in the serum could serve as a biomarker for the progression of GBA1-PD32. Previous research reported that the rs11264345 variant in the GBA1 gene exhibited a significant association with blood urea nitrogen (BUN), uric acid, hemoglobin (Hb), and hematocrit (HCT)22. In addition, we identified an association between the GBA1 variants and elevated HbA1c levels, which had been implicated in heightened cognitive impairment and neuroaxonal damage in PD patients33,34. Previous studies suggested that GBA1 may drive these changes through interactions with CDC12335, a protein associated with shared genetic susceptibility to PD and Type 2 diabetes (T2D)36, potentially affecting pathways associated with HbA1c levels, a clinical marker for chronic hyperglycemia and glycemic control in T2D. Overall, this cross-systemic exploration of biomarker correlations offered a perspective that could significantly improve clinical management and therapeutic outcomes for patients by intervening in the progression of GBA1-related diseases through atypical biomarkers.This study revealed that non-coding variants may variably influence the direction of phenotypic effects, emphasizing the crucial role of non-coding variants in shaping individual health outcomes and enhancing the comprehension of disease mechanisms. This aligned with prior findings that disease-related non-coding variants impact gene expression in diverse ways37. We demonstrated the relationships between two variants (rs2075569 and rs3115534) and lower calcium levels. In contrast, it was previously reported that GBA1-PD neurons exhibited increased calcium levels at normal conditions and amplified calcium release from the endoplasmic reticulum stores38. Therefore, the negative correlation between non-coding variants and calcium levels showed that non-coding variants may reflect the complexity of calcium regulatory mechanisms, indicating a previously unrecognized regulatory mechanism. In addition, we found the two non-coding variants associated with higher hematological parameters. Previous studies demonstrated that GCase deficiency leads to GlcCer accumulation in macrophages, particularly in the bone marrow, where Gaucher cell infiltration contributes to cytopenia and other clinical features39. The identification of the rs2075569 variant as a cis-expression Quantitative Trait Locus (cis-eQTL) for Thrombospondin (THBS3)40,41, an adhesive glycoprotein involved in cell-cell and cell-matrix interactions42, underscored the potential role of non-coding variants in regulating key hematological traits. Furthermore, the rs3115534 acting as a cis-eQTL influenced expression of the GBA1 gene in small intestine terminal ileum tissue from GTEx eQTL data43, potentially influencing local metabolism and lysosomal function, which may contribute to the development of benign digestive tumors. The association was further supported by evidence from GD patients, where GBA1 dysfunction was associated with an increased risk of gastrointestinal cancers, including colon cancer44. UKBEC data suggested that the rs9628662 variant is a cis-eQTL of nearby genes (including YY1AP1) in the medulla, which contains both white matter tracts and clusters of gray matter45,46,47. Altered gene expression in this tissue may disrupt the structure and function of gray and white matter, potentially contributing to the GWC changes. In summary, future studies that include additional types of non-coding variants can help expand the understanding of mechanisms of biochemistry and pathophysiology underlying GBA1-related diseases.The pioneering use of PheWAS to analyze GBA1 variants was a major strength48. This PheWAS efficiently provided a valuable resource of robust associations, revealing novel disease mechanisms, and expanding the phenotypic spectrum of the known gene. Regarding the rarity of the variants when analyzing thousands of traits in a biobank-scale population, we applied Firth’s logistic regression to control false positive associations49. However, our analysis had several limitations. Despite our research being constrained by the number of European ancestry participants carrying rare damaging variants, which resulted in an underpowered analysis for detecting significant effects in missense variants, we could still identify significant relationships involving both LoF and LoF+Missense variants. Our findings primarily applied to individuals of European ancestry, and the generalizability to other populations required further investigation. Although a single diagnosis (ICD-10 codes) per individual may not fully reflect diagnostic variability, the accuracy of neurodegenerative disease diagnoses (such as all-cause dementia and PD) based on inpatient and death registry data in the UKB has been validated50. Future studies with more hospital diagnoses and follow-up encounters could improve diagnostic accuracy. Moreover, experimental validation could be conducted to confirm the discovered associations.In conclusion, we finally identified 41 neurological and non-neurological phenotypes associated with GBA1 variants, 39 of which have not been previously reported. These findings extend the impact of GBA1 beyond neurological diseases, reveal the significance of non-coding variants, and propose therapeutic targets through biomarker discovery. Large cohorts of diverse populations will help refine the clinical spectrum of GBA1-related phenotypes and advance precision medicine for disease prevention and management.MethodsUKB resourceThe UK Biobank is a population-based cohort study conducted in the United Kingdom involving approximately 500,000 individuals aged 40-70 years at recruitment, who are followed up continuously. Ethical approval for the UK Biobank was granted by the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB), with renewal in 2021 (reference 21/NW/0157). All studied participants provided informed consent. A curated subset of genetically unrelated White British participants (UKB Field ID 22006) was directly selected, comprising more than 80% of the UKB cohort26. After excluding samples with inconsistencies in genetically determined and self-reported sex, with abnormal sex chromosome aneuploidy, and those lacking whole-exome sequencing data at the time of this study, a total of 324,542 unrelated samples were finally included (Supplementary Fig. 1).Variant selectionThe Regeneron Genetic Center conducted exome sequencing on DNA samples collected from the UK Biobank. Multiplexed samples were sequenced with paired-end 75-bp reads on the Illumina NovaSeq 6000 platform, subsequently all raw sequencing data were aligned to the full GRCh38 reference using the OQFE protocols. For further information on sequencing and variant calling methods utilized in detail, please visit https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?ID=170.We included variants that had variant missingness < 10%, quality score (QUAL) ≥ 30, genotype quality score (GQ) ≥ 20, and total depth (DP) ≥ 10. These criteria were applied using the VCFtools51 and custom-designed software, “FilterVar”, each tailored to specific aspects of the analysis as required. For our analyses, variants were annotated using “ExtremeVar”, with the genomic construct hg38 and GBA1 transcript NM_000157 (chr1: 155234452-155244670). After quality control and annotation, GBA1 variants were categorized into common and rare variants based on a minor allele frequency (MAF) threshold of 0.1%. 13 common variants (MAF > 0.1%) annotated as the exome, intron, UTR3, or UTR5 were included. Furthermore, rare variants (MAF < 0.1%) were classified into four categories: (1) Loss-of-function variants (LoF): Variants predicted to result in a stop-gain, frameshift deletion, or splicing site alteration, which were directly classified as deleterious by “ExtremeVar”. (2) Missense variants. (3) LoF+Missense: A combination of loss-of-function and missense variants. (4) Synonymous variants.PhenotypesThe main phenotypic categories included in this study were binary traits and continuous traits. These phenotypic data were sourced from the April 2023 data release as part of the UKB application 99093. All phenotypic data were extracted from phenotypes available through the UK Biobank Data Showcase, and then except for ICD-10 data, processed using the Quanli Wang lab modified version of PHESANT package52, which can be viewed at https://github.com/astrazeneca-cgr-publications/PEACOK, to parse the phenotypes. The package excluded continuous phenotypes with less than 500 participants by default. The remaining phenotypes were adjusted to follow a normal distribution through the process of inverse normal rank transformation. Moreover, a minimum threshold of 500 participants was enforced for each binary trait by default. Based on the Field ID paths, we excluded certain binary traits (1) sociodemographics, such as education and employment; (2) dietary information, such as 24-h recall and food preferences; (3) hospital inpatient administrative data, such as admission and discharge dates, and other related aspects; (4) blood count processing details, such as reticulocyte count acquisition methods; (5) ICD-9 diagnosis codes; (6) OPCS traits and other binary traits related to treatment methods. Moreover, in addition to the 732 binary traits generated by the PEACOK R package, we extracted ICD-10 diagnosis codes (Field ID 41270) from various health-related records within the UKB. All ICD-10 codes were systematically mapped to PheCodes53. Participants were considered eligible cases for specific PheCodes if they had at least one recorded diagnosis of the ICD-10 code, while controls consisted of participants who never had a positive diagnosis for any PheCodes within the corresponding ICD-10 root chapter. We restricted the analysis to PheCodes with at least 20 cases, resulting in a final consideration of 1420 PheCodes. In summary, a total of 2150 binary phenotypes and 3314 continuous phenotypes were included in the study.Phenome‑wide association study (PheWAS)We performed variant-level PheWAS analyses between all 5464 phenotypes and 13 GBA1 common variants. For binary traits, each variant-level analysis was carried out independently by Firth’s logistic regression, correcting for age, age squared2, sex, and the first 10 principal components. For continuous traits, we used linear regression correcting for the same covariates. In this approach, we took two Bonferroni-adjusted thresholds to individually define significant variant-level correlations within continuous traits (P = 0.05/13/3314 ≈ 1.16 ×10−6, Bonferroni correction for 13 variants in 3314 continuous traits) and binary traits (P = 0.05/13/2150 ≈ 1.78 ×10−6, Bonferroni correction for 13 variants in 2150 binary traits). In addition, we took a sub-threshold to define suggestive correlations (Bonferroni-adjusted significance threshold of P = 0.05/5464 ≈ 9.15 ×10−6).Rare variants were mainly tested on aggregate. We performed gene-level PheWAS analyses between 5,464 phenotypes and burden scores in the GBA1 gene. For the analysis, we considered four categories of qualifying variants, encompassing all rare variants (MAF < 0.1%). Firstly, burden scores were calculated for each participant based on the count of qualifying variants to represent the cumulative effects and enhance the statistical power. For binary traits, Firth’s logistic regression was employed, while linear regression was utilized for continuous traits. Covariates included age, age squared2, sex, and the first 10 principal components. Throughout all analyses conducted using the PheWAS R package54, additive genotype models were employed. In this approach, we took two Bonferroni-adjusted thresholds to individually define significant gene-based correlations within continuous traits (P = 0.05/4/3314 ≈ 3.77 ×10−6, Bonferroni correction for four categories of rare variants in 3314 continuous traits) and binary traits (P = 0.05/4/2150 ≈ 5.81 ×10−6, Bonferroni correction for four categories of rare variants in 2,150 continuous traits). In addition, we took a sub-threshold to define suggestive correlations (P = 0.05/5464 ≈ 9.15 ×10−6).
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request. The UK Biobank data are available through the UK Biobank Access Management System https://www.ukbiobank.ac.uk/. Open Targets: https://platform.opentargets.org/target/. GWAS Catalogue: https://www.ebi.ac.uk/gwas/. eQTLs data are available through the following websites: BrainSeq: http://eqtl.brainseq.org/, UKBEC: http://www.braineac.org/, eQTLGen: https://www.eqtlgen.org/, BIOSQTL: http://genenetwork.nl/biosqtlbrowser, FHS_eQTL: ftp://ftp.ncbi.nlm.nih.gov/eqtl/original_submissions/FHS_eQTL/, and GTEx: http://gtexportal.org.
Code availability
The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.
AbbreviationsGCase:
β-glucocerebrosidase
GlcCer:
Glucosylceramide
GlcSph:
Glucosylsphingosine
PD:
Parkinson’s disease
GD:
Gaucher disease
DLB:
Dementia with Lewy bodies
RBD:
Rapid eye movements (REM) sleep behavior disorders
BUN:
Blood urea nitrogen
Hb:
Hemoglobin
HCT:
Hematocrit
T2D:
Type 2 diabetes
HbA1c:
Glycated hemoglobin
PheWAS:
Phenome-wide association study
WES:
Whole-exome sequencing
QUAL:
Quality score
GQ:
Genotype quality score
DP:
Total depth
MAF:
Minor Allele Frequency
AAF:
Alternate Allele Frequency
LoF:
Loss of function variants
GWC:
Gray-white matter contrast
ReferencesBarkhuizen, M., Anderson, D. G. & Grobler, A. F. Advances in GBA-associated Parkinson’s disease–Pathology, presentation and therapies. Neurochem. Int. 93, 6–25 (2016).CAS
PubMed
Google Scholar
Do, J., McKinney, C., Sharma, P. & Sidransky, E. Glucocerebrosidase and its relevance to Parkinson disease. Mol. Neurodegener. 14, 36 (2019).PubMed
PubMed Central
Google Scholar
Zhang, X., Wu, H., Tang, B. & Guo, J. Clinical, mechanistic, biomarker, and therapeutic advances in GBA1-associated Parkinson’s disease. Transl. Neurodegener. 13, 48 (2024).CAS
PubMed
PubMed Central
Google Scholar
Beavan, M. S. & Schapira, A. H. Glucocerebrosidase mutations and the pathogenesis of Parkinson disease. Ann. Med. 45, 511–521 (2013).CAS
PubMed
Google Scholar
Grabowski, G. A. Phenotype, diagnosis, and treatment of Gaucher’s disease. Lancet 372, 1263–1271 (2008).CAS
PubMed
Google Scholar
Parlar, S. C., Grenn, F. P., Kim, J. J., Baluwendraat, C. & Gan-Or, Z. Classification of GBA1 Variants in Parkinson’s Disease: The GBA1-PD Browser. Mov. Disord. 38, 489–495 (2023).CAS
PubMed
PubMed Central
Google Scholar
Blandini, F. et al. Glucocerebrosidase mutations and synucleinopathies: Toward a model of precision medicine. Mov. Disord. 34, 9–21 (2019).PubMed
Google Scholar
Hruska, K. S., LaMarca, M. E., Scott, C. R. & Sidransky, E. Gaucher disease: mutation and polymorphism spectrum in the glucocerebrosidase gene (GBA). Hum. Mutat. 29, 567–583 (2008).CAS
PubMed
Google Scholar
Smith, L., Mullin, S. & Schapira, A. H. V. Insights into the structural biology of Gaucher disease. Exp. Neurol. 298, 180–190 (2017).CAS
PubMed
Google Scholar
Koros, C. et al. A Global Perspective of GBA1-Related Parkinson’s Disease: A Narrative Review. Genes 15, 1605 (2024).CAS
PubMed
PubMed Central
Google Scholar
Payne, K., Walls, B. & Wojcieszek, J. Approach to Assessment of Parkinson Disease with Emphasis on Genetic Testing. Med. Clin. North Am. 103, 1055–1075 (2019).PubMed
Google Scholar
Riboldi, G. M. & Di Fonzo, A. B. GBA, Gaucher Disease, and Parkinson’s Disease: From Genetic to Clinic to New Therapeutic Approaches. Cells 8, https://doi.org/10.3390/cells8040364 (2019).Nalls, M. A. et al. A multicenter study of glucocerebrosidase mutations in dementia with Lewy bodies. JAMA Neurol. 70, 727–735 (2013).PubMed
Google Scholar
Hertz, E., Chen, Y. & Sidransky, E. Gaucher disease provides a unique window into Parkinson disease pathogenesis. Nat. Rev. Neurol. 20, 526–540 (2024).PubMed
Google Scholar
Thaler, A. et al. A “dose” effect of mutations in the GBA gene on Parkinson’s disease phenotype. Parkinsonism Relat. Disord. 36, 47–51 (2017).PubMed
Google Scholar
Novo, I., López-Cortegano, E. & Caballero, A. Highly pleiotropic variants of human traits are enriched in genomic regions with strong background selection. Hum. Genet. 140, 1343–1351 (2021).CAS
PubMed
PubMed Central
Google Scholar
Visscher, P. M. & Yang, J. A plethora of pleiotropy across complex traits. Nat. Genet. 48, 707–708 (2016).CAS
PubMed
Google Scholar
Pankratz, N. et al. Meta-analysis of Parkinson’s disease: identification of a novel locus, RIT2. Ann. Neurol. 71, 370–384 (2012).CAS
PubMed
PubMed Central
Google Scholar
Krohn, L. et al. Genome-wide association study of REM sleep behavior disorder identifies polygenic risk and brain expression effects. Nat. Commun. 13, 7496 (2022).CAS
PubMed
PubMed Central
Google Scholar
Guerreiro, R. et al. Investigating the genetic architecture of dementia with Lewy bodies: a two-stage genome-wide association study. Lancet Neurol. 17, 64–74 (2018).PubMed
Google Scholar
Zhou, Y. et al. Mutational spectrum and clinical features of GBA1 variants in a Chinese cohort with Parkinson’s disease. NPJ Parkinsons Dis. 9, 129 (2023).CAS
PubMed
PubMed Central
Google Scholar
Wang, C. H. et al. GBA1 as a risk gene for osteoporosis in the specific populations and its role in the development of Gaucher disease. Orphanet J. Rare Dis. 19, 144 (2024).PubMed
PubMed Central
Google Scholar
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).CAS
PubMed
PubMed Central
Google Scholar
Delude, C. M. Deep phenotyping: The details of disease. Nature 527, S14–S15 (2015).CAS
PubMed
Google Scholar
Bastarache, L., Denny, J. C. & Roden, D. M. Phenome-Wide Association Studies. JAMA 327, 75–76 (2022).PubMed
PubMed Central
Google Scholar
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).CAS
PubMed
PubMed Central
Google Scholar
Sinnott-Armstrong, N. et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat. Genet 53, 185–194 (2021).CAS
PubMed
Google Scholar
Huang, Y., Deng, L., Zhong, Y. & Yi, M. The Association between E326K of GBA and the Risk of Parkinson’s Disease. Parkinsons Dis. 2018, 1048084 (2018).PubMed
PubMed Central
Google Scholar
Greuel, A. et al. GBA Variants in Parkinson’s Disease: Clinical, Metabolomic, and Multimodal Neuroimaging Phenotypes. Mov. Disord. 35, 2201–2210 (2020).CAS
PubMed
Google Scholar
Uribe, C. et al. Gray/White Matter Contrast in Parkinson’s Disease. Front. Aging Neurosci. 10, 89 (2018).PubMed
PubMed Central
Google Scholar
Jefferson, A. L. et al. Gray & white matter tissue contrast differentiates Mild Cognitive Impairment converters from non-converters. Brain Imaging Behav. 9, 141–148 (2015).PubMed
PubMed Central
Google Scholar
Koros, C. et al. Serum uric acid level as a putative biomarker in Parkinson’s disease patients carrying GBA1 mutations: 2-Year data from the PPMI study. Parkinsonism Relat. Disord. 84, 1–4 (2021).CAS
PubMed
Google Scholar
Ogaki, K. et al. Impact of diabetes and glycated hemoglobin level on the clinical manifestations of Parkinson’s disease. J. Neurol. Sci. 454, 120851 (2023).CAS
PubMed
Google Scholar
Uyar, M. et al. Diabetes, Glycated Hemoglobin (HbA1c), and Neuroaxonal Damage in Parkinson’s Disease (MARK-PD Study). Mov. Disord. 37, 1299–1304 (2022).CAS
PubMed
Google Scholar
Song, N. et al. USP9X deubiquitinates and stabilizes CDC123 to promote breast carcinogenesis through regulating cell cycle. Mol. Carcinog. 62, 1487–1503 (2023).CAS
PubMed
Google Scholar
Caputo, V., Termine, A., Strafella, C., Giardina, E. & Cascella, R. Shared (epi)genomic background connecting neurodegenerative diseases and type 2 diabetes. World J. Diab. 11, 155–164 (2020).
Google Scholar
Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).PubMed
PubMed Central
Google Scholar
Schöndorf, D. C. et al. iPSC-derived neurons from GBA1-associated Parkinson’s disease patients show autophagic defects and impaired calcium homeostasis. Nat. Commun. 5, 4028 (2014).PubMed
Google Scholar
Stirnemann, J. et al. A Review of Gaucher Disease Pathophysiology, Clinical Presentation and Treatments. Int. J. Mol. Sci. 18, 441 (2017).PubMed
PubMed Central
Google Scholar
Joehanes, R. et al. Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies. Genome Biol. 18, 16 (2017).PubMed
PubMed Central
Google Scholar
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).PubMed
PubMed Central
Google Scholar
Vos, H. L., Devarayalu, S., de Vries, Y. & Bornstein, P. Thrombospondin 3 (Thbs3), a new member of the thrombospondin gene family. J. Biol. Chem. 267, 12192–12196 (1992).CAS
PubMed
Google Scholar
Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Google Scholar
Revel-Vilk, S. et al. Cancer Risk in Patients with Gaucher Disease Using Real-World Data. J. Clin. Med. 12, https://doi.org/10.3390/jcm12247707 (2023).Jaffe, A. E. et al. Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis. Nat. Neurosci. 21, 1117–1125 (2018).CAS
PubMed
PubMed Central
Google Scholar
Collado-Torres, L. et al. Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia. Neuron 103, 203–216.e208 (2019).CAS
PubMed
PubMed Central
Google Scholar
Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).CAS
PubMed
PubMed Central
Google Scholar
Park, J. et al. Exome-wide evaluation of rare coding variants using electronic health records identifies new gene-phenotype associations. Nat. Med. 27, 66–72 (2021).CAS
PubMed
PubMed Central
Google Scholar
Doerken, S., Avalos, M., Lagarde, E. & Schumacher, M. Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLoS One 14, e0217057 (2019).CAS
PubMed
PubMed Central
Google Scholar
UK Biobank Algorithmically-Defined Outcomes, biobank.ctsu.ox.ac.uk/crystal/ukb/docs/alg_outcome_main.pdf (2024).Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).CAS
PubMed
PubMed Central
Google Scholar
Millard, L. A. C., Davies, N. M., Gaunt, T. R., Davey Smith, G. & Tilling, K. Software Application Profile: PHESANT: a tool for performing automated phenome scans in UK Biobank. Int. J. Epidemiol. 47, 29–35 (2018).PubMed
Google Scholar
Wu, P. et al. Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation. JMIR Med. Inf. 7, e14325 (2019).
Google Scholar
Zhang, X. et al. Phenome-wide association study (PheWAS) of colorectal cancer risk SNP effects on health outcomes in UK Biobank. Br. J. Cancer 126, 822–830 (2022).PubMed
Google Scholar
Download referencesAcknowledgementsWe thank all study participants and their families and the investigators and members of the UK Biobank. We thank the members of the National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University. We are grateful for technical support from the High Performance Computing Center of Central South University. This work was supported by the National Key R&D Program of China (Grant No. 2021YFC2502100), the Hunan Innovative Province Construction Project (Grant No. 2021SK1010), the Central South University Research Programme of Advanced Interdisciplinary Study (Grant No. 2023QYJC010), the Scientific Research Program of FuRong laboratory (Grant No. 2023SK2093-1) to J.L.Author informationAuthor notesThese authors contributed equally: Jiaqi Yang, Yuanfeng Huang.Authors and AffiliationsNational Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, ChinaJiaqi Yang, Yuanfeng Huang, Zheng Wang, Shiyu Zhang, Dai Wu, Jiayi Xiong, Yijing Wang, Qiao Zhou, Yixiao Zhu, Guihu Zhao, Bin Li & Jinchen LiBioinformatics Center, Xiangya Hospital & Furong Laboratory, Central South University, Changsha, 410008, Hunan, ChinaJiaqi Yang, Yuanfeng Huang, Zheng Wang, Shiyu Zhang, Jiayi Xiong, Yijing Wang, Qiao Zhou, Yixiao Zhu, Guihu Zhao, Bin Li & Jinchen LiKey Laboratory of Hunan Province in Neurodegenerative Disorders, Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan, ChinaZheng Wang, Jifeng Guo, Beisha Tang & Jinchen LiDepartment of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, ChinaHeng WuClinical Research Center for Immune-Related Encephalopathy of Hunan Province, Hengyang, Hunan, ChinaHeng WuMOE Key Laboratory of Pediatric Rare Diseases, University of South China, Hengyang, ChinaKun XiaAuthorsJiaqi YangView author publicationsYou can also search for this author in
PubMed Google ScholarYuanfeng HuangView author publicationsYou can also search for this author in
PubMed Google ScholarZheng WangView author publicationsYou can also search for this author in
PubMed Google ScholarShiyu ZhangView author publicationsYou can also search for this author in
PubMed Google ScholarDai WuView author publicationsYou can also search for this author in
PubMed Google ScholarJiayi XiongView author publicationsYou can also search for this author in
PubMed Google ScholarHeng WuView author publicationsYou can also search for this author in
PubMed Google ScholarYijing WangView author publicationsYou can also search for this author in
PubMed Google ScholarQiao ZhouView author publicationsYou can also search for this author in
PubMed Google ScholarYixiao ZhuView author publicationsYou can also search for this author in
PubMed Google ScholarGuihu ZhaoView author publicationsYou can also search for this author in
PubMed Google ScholarBin LiView author publicationsYou can also search for this author in
PubMed Google ScholarJifeng GuoView author publicationsYou can also search for this author in
PubMed Google ScholarKun XiaView author publicationsYou can also search for this author in
PubMed Google ScholarBeisha TangView author publicationsYou can also search for this author in
PubMed Google ScholarJinchen LiView author publicationsYou can also search for this author in
PubMed Google ScholarContributionsAuthor Contributions: J.Y. and Y.H. contributed equally to this work. J.Y., Y.H., and J.L. were responsible for the study design and primary analysis. Z.W., Y.W., S.Z., D.W., H.W., J.X., Y.Z., Q.Z., B.T., and G.Z. were responsible for assisting with study conception. J.Y., Y.H., B.L., J.G., K.X., and J.L. were responsible for the original manuscript draft. All authors read and approved the final version of the manuscript. J.L. was responsible for the funding.Corresponding authorCorrespondence to
Jinchen Li.Ethics declarations
Competing interests
The authors declare no competing interests.
Additional informationPublisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Supplementary informationSupplementary DataSupplementary InformationRights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Reprints and permissionsAbout this articleCite this articleYang, J., Huang, Y., Wang, Z. et al. A PheWAS approach to identify associations of GBA1 variants with comprehensive phenotypes beyond neurological diseases.
npj Parkinsons Dis. 11, 48 (2025). https://doi.org/10.1038/s41531-025-00901-8Download citationReceived: 01 November 2024Accepted: 06 March 2025Published: 17 March 2025DOI: https://doi.org/10.1038/s41531-025-00901-8Share this articleAnyone you share the following link with will be able to read this content:Get shareable linkSorry, a shareable link is not currently available for this article.Copy to clipboard
Provided by the Springer Nature SharedIt content-sharing initiative