Genetics of growth rate in induced pluripotent stem cells

Brian N. Lee1, Henry J. Taylor123,  Filippo Cipriani4,  Narisu Narisu1,  Catherine C. Robertson15,  Amy J. Swift1,  Neelam Sinha1, Tingfen Yan1, Lori L. Bonnycastle1, Nathan Dale4, Annie Butt4, Hemant Parsaud4, Stefan Semrau4, NYSCF Global Stem Cell Array Team, GENESiPS Consortium, iPSCORE Consortium, Joshua W. Knowles6789, Agnieszka D’Antonio-Chronowska12, Kelly A. Frazer1314, Leslie G. Biesecker1, Scott Noggle4, Michael R. Erdos1, Daniel Paull4, Francis S. Collins1 and and D. Leland Taylor 

  1. 1Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
  2. 2British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
  3. 3Heart and Lung Research Institute, University of Cambridge, Cambridge CB1 8RN, UK
  4. 4New York Stem Cell Foundation Research Institute, New York, NY 10019, USA
  5. 5Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor 48109, USA
  6. 6Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94304, USA
  7. 7Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94304, USA
  8. 8Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA 94304, USA
  9. 9Stanford Prevention Research Center, Stanford University School of Medicine, Stanford, CA 94304, USA
  10. 10IKERBASQUE, Basque Foundation for Science, Bilbao 48009, Spain
  11. 11Department of Endocrinology, Metabolism, Nutrition, and Kidney Disease, Biobizkaia Health Research Institute, Cruces 48903, Spain
  12. 12Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
  13. 13Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
  14. 14Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA

*Correspondence: leland.taylor{at}nih.gov

bioRxiv preprint DOI: https://doi.org/10.1101/2025.07.02.662844

Posted: July 03, 2025, Version 1

Copyright: # BNL, HJT, and FC are joint first authors.

Abstract

Human induced pluripotent stem cells (iPSCs) have transformed biomedical research by enabling the generation of diverse cell types from accessible somatic tissues. However, certain fundamental biological properties, such as the genetic and epigenetic determinants of iPSC proliferation, remain poorly characterized. We measured the growth of iPSC lines derived from 602 unique donors using high-throughput time-lapse imaging, quantified proliferation through a growth Area-Under-the-Curve (gAUC) phenotype, and correlated gAUC with the gene expression and genotype of the cell lines. We identified 3,091 genes associated with gAUC, many of which are well established regulators of cell proliferation. We also found that rare deleterious variants in WDR54 were associated with reduced iPSC growth and that WDR54 was differentially expressed with respect to gAUC. Although no common variants showed a genome-wide association with gAUC, iPSC lines from monozygotic twins were highly correlated, and common genetic variation explained approximately 71-75% of the variance in iPSC growth rates. These results indicate a complex genetic architecture of iPSC growth rates, where rare, large-effect variants in important growth regulators, including WDR54, are layered onto a highly polygenic background. These findings have important implications for the design of pooled iPSC-based studies and disease models, which may be confounded by intrinsic growth differences.

Introduction

The development of human induced pluripotent stem cells (iPSCs) has transformed many aspects of biomedical research. The first iPSCs were described 18 years ago, derived from human fibroblasts reprogrammed into a pluripotent state.13 Since then, numerous protocols have been established to differentiate iPSCs into a wide variety of cell types, including those difficult to obtain as primary human tissues.48 Given this differentiation potential and the facility of in vitro culture and manipulation, iPSCs offer a promising avenue for a wide array of basic science investigations, as well as translational programs including testing of drug and cellular therapies9,10 and potential autotransplantation.11

Many basic biological questions that are difficult to address in vivo can be assessed with iPSCs. For example, disease-associated loci discovered through genome-wide association studies (GWASs) often fall in non-coding regions of the genome, masking the underlying effector gene(s) and the specific cellular mechanisms underlying disease risk.12,13 Studies examining the genetics of gene expression across various tissues have not resolved all of these associations, suggesting that some disease signals may originate from cell types or cell states that are difficult to capture in vivo (e.g. pancreatic progenitor cells or beta cells after glucose stimulation).14 Large-scale genetic studies using iPSCs offer one way to investigate the effects of natural human variation and disease-associated variants in otherwise inaccessible cell types and contexts.1519 As repositories of iPSCs derived from many donors continue to expand, such studies will likely become commonplace, making a detailed understanding of iPSC properties necessary to optimize study designs and account for confounding factors.

In this study (Figure 1), we considered one such variable—growth rates of undifferentiated iPSCs. Growth rates can be a critical factor in pooled study designs where different proliferation rates may cause certain lines to dominate a pool.15,20 Across 602 lines, we measured rates of cell growth from daily images, paired with RNA sequencing on a subset of 333 lines. Using these data, we performed an association analysis of iPSC growth rate with gene expression, common genetic variation, and rare genetic variation. Our results support a polygenic model of cell proliferation, whereby common variants contribute small effects and rare variants may exert larger influences on cellular growth dynamics.

Fig. 1.Graphical overview of this study.

Human iPSCs from multiple cohorts were cultured in 96-well plates until confluence. Daily cell density measurements were obtained via imaging cytometry. Upon reaching confluence, cells were harvested for bulk RNA sequencing, genotyping, and genome sequencing, enabling differential expression analysis and both common- and rare-variant association studies.

Results

Cohort characterization and overview of genetic data

We obtained iPSC lines from 602 donors (ESM Table 1).2126 These lines included both male (278 lines; 46.2%) and female (324 lines; 53.8%) donors. Lines were grown in two types of media, either mTeSR Plus (n = 347) or Freedom (n = 255), which we controlled for in subsequent analyses as a covariate (Methods).

We genotyped all 602 donors and imputed variants to GRCh38 (Methods). Focusing on common single nucleotide polymorphisms (SNPs; minor allele frequency [MAF] > 5%), we retained 5,985,990 variants after quality control procedures. To explore the genetic diversity represented in the full cohort, we grouped participants according to their genetic similarity to 1000 Genomes genetic ancestry groups, including participants from European (EUR), African (AFR), American (AMR), East Asian (EAS), and South Asian (SAS) ancestry groups (Methods).27 For downstream analyses, we described the genetic similarity groups as 1KG-EUR-like, 1KG-AFR-like, 1KG-AMR-like, 1KG-EAS-like, and 1KG-SAS-like, respectively. Most of our participants were 1KG-EUR-like (n = 446), but we also identified participants that were 1KG-AFR-like (n = 39), 1KG-AMR-like (n = 44), 1KG-EAS-like (n = 51), and 1KG-SAS-like (n = 22) ancestry groups (S. Fig 1).

While genotype imputation is accurate for common SNPs, rare genetic variants are often missed. To identify rare genetic variants, we used genome sequencing (GS) or exome sequencing (ES) data to call exonic variants in 514 donors (Methods). For the majority of the cell lines, DNA originated from the iPSC lines; for a subset (n = 195), however, we used DNA obtained directly from donor cells, either blood or fibroblasts. On average, we generated approximately 656.0 million reads per cell line for a depth of 31X. After quality control procedures, we retained 178,745 exonic single nucleotide or short insertion-deletions variants for downstream analyses.

Derivation of growth rate from cell images

To measure cell growth rate, we plated cells from the iPSC lines of all 602 donors (ESM Table 1) on the New York Stem Cell Foundation (NYSCF) robotic culture system, performed at least one passage, and seeded cells at a standard density on fresh plates. We used a Celigo imaging instrument to image each well once a day from which we measured well coverage (expressed as a percent). In total, we had 139,246 timestamped well coverage measurements across 26,066 wells spanning all 602 lines (average 43.3 wells per cell line). For each well, we used the series of well coverage measurements to derive a growth Area-Under-the-Curve (gAUC) to represent growth rate for downstream analyses (Methods). After quality control procedures (Methods), we retained on average 27.0 wells with gAUC estimates per line (25.7 standard deviation [SD]) and observed high reproducibility across the replicate wells (intraclass correlation coefficient [ICC] = 0.96, P = 1.11×10−16, F-test). We also compared gAUC in 16 twin participants (eight pairs) and found them to be correlated (ICC = 0.65; P = 0.02, F-test). These observations suggest a genetic component of gAUC, though cell-specific effects or technical artifacts may also contribute (e.g., epigenomic events that take place during the development of pluripotency from a differentiated cell). To explore the effect of known technical or biological factors on gAUC, we tested for gAUC association with a variety of iPSC attributes including genetically-imputed donor sex, pluripotency reprogramming method, starting cell type for reprogramming, and cell culture media used (Methods). We found three terms to be associated with gAUC: cell culture media (P = 0.008), starting cell type (P = 1.68×10−15), and pluripotency reprogramming method (P = 1.80×10−7) (Fig 2A).

Fig. 2.Covariate associations with growth Area-Under-the-Curve (gAUC) and differential gene expression analyses.

(A) Effect size (y-axis) with 95% confidence intervals (CIs; lines) of iPSC line characteristics (x-axis). Color denotes FDR < 5%. (B) Volcano plots of log2(fold change) (x-axis) and →log10(P) (y-axis) from differential gene expression analysis. Color denotes associated genes (FDR < 5%).

Growth-associated differential gene expression analysis

We sought to characterize the transcriptional profile of iPSC growth rates and generated RNA-sequencing data for 333 lines after the lines were standardized to a uniform mTeSR Plus media (Methods). We generated 235 million reads on average. After quality control and gene expression quantification, we retained expression data for 15,603 genes (Methods).

We tested for associations of gene expression with gAUC and identified 3,091 differentially expressed genes (DEGs; false discovery rate [FDR] < 5%; Fig. 2B; Methods). Many of the genes that we found are well established cell growth and proliferation genes, confirming the gAUC phenotype: KDM4C,28 FGF8,29 SLITRK1,30 NRCAM,31 L1CAM,32 COBL,33 SYT4,34 KRT17,35 NCAPG2,36 PIK3CA,37 AXIN2,38 and RNF43.39

Contribution of common genetic variants to growth rate

We performed a genome-wide common variant association study (CVAS) to test for common variant associations with gAUC using all 602 iPSC lines, controlling for culture media type, starting cell type, reprogramming method, and the first ten genetic principal components (PCs) as covariates (Methods). We did not identify any genome-wide associated signals (P < 5×10−8). However, we found one locus with suggestive evidence of association (P < 1×10−6Fig 3A), tagged by rs4799161. The rs4799161 variant is located within the RP11-383C5.4 lncRNA and is approximately 451 kb upstream of SALL3, the nearest protein-coding gene. SALL3 has been found to influence methylation of Wnt signaling-related genes in iPSCs and, therefore, may play an important role in both cell proliferation and stem cell differentiation pathways.40

Fig. 3.Common-variant association study

(A) Manhattan plot of common variant associations (points) with gAUC. Nominal association threshold (P < 1 ↑ 10→6) is indicated by the blue line. Genome-wide association threshold (P < 5 ↑ 10→8) is indicated by the red line. (B) Percent of growth rate variance explained by the genetic relatedness matrix (y-axis) across different models (x-axis). The horizontal axis and colors indicate the model—either all samples (green), samples with RNA-seq (orange), or RNA-seq samples with pluripotency covariates (purple). The vertical error bars indicate 95% confidence intervals (CIs). (C) Power estimates (y-axis) by sample size (x-axis). Color indicates absolute effect size modelled.

Though we identified limited common variant associations, a substantial genetic component to gAUC may exist, consisting of highly polygenic, small effect sizes. To better understand the genetics of gAUC, we estimated the SNP-based, narrow-sense heritability modelling the genetic relatedness matrix (h 2) along with pluripotency of lines when applicable (Methods). We analyzed the heritability in three ways: (i) across all samples, including media, starting cell type, and reprogramming method as covariates (nAll = 602), (ii) across lines with RNA-seq, where we used expression of POU5F1PODXLNANOG and SOX2 to control for the pluripotency (nRNA = 333; only starting cell type and reprogramming method as covariates since all of the RNA-seq lines were grown in mTeSR media), and (iii) across lines with RNA-seq without controlling for pluripotency. Across all models, we found that the genetic component explained approximately 71-75% of the variance in gAUC, indicating a strong heritable component to iPSC growth rates (Fig. 3B). These findings are consistent with the high intraclass correlation coefficient identified within twin cell line pairs.

Finally, given our heritability findings, we hypothesized that the limited CVAS results were a consequence of an insufficient sample size. We performed calculations to determine an optimal sample size for investigating the effects of common variants on iPSC growth rates (Methods). We estimated that identifying common variants associated with growth phenotypes at 80% power would require a minimum sample size of 1,720 individuals to detect effect sizes of 0.30 and up to 3,222 individuals to detect smaller effects of 0.05 (Fig. 3C).

Contribution of rare genetic variants in protein coding regions to growth rate

For a highly polygenic phenotype, a CVAS often identifies loci with small effect sizes.41 However, rare deleterious or de novo mutations in coding regions may have large effects on cellular growth, as is evident in many cancers.42 To explore the role of moderately rare and rare variants (minor allele frequency [MAF] < 1%) in gAUC, we performed a rare variant association study (RVAS) across 514 donors with GS or ES by aggregating protein-coding variants across genes and testing gene-based burden scores with gAUC (Methods). We considered only variants with a predicted functional consequence of loss of function (pLoF) or putatively damaging missense with a CADD score ≥ 20. In total, after additional quality control filters (Methods), we tested for associations of 8,480 genes with gAUC.

We identified one association with decreased gAUC at WDR54 (PBonferroni < 0.05; Fig. 4A), driven by five potentially damaging missense variants in WDR54 (ESM Table 2). We did not detect any pLoF variants in WDR54. Because these missense variants occurred in DNA derived from both donor fibroblasts (n = 6) and iPSC lines themselves (n = 11), we cannot determine if they are somatic or germline. Given the difficulty in predicting the pathogenicity of missense variants, we conducted a sensitivity analysis by repeating the RVAS using a stricter CADD score threshold (≥ 25), resulting in 3,441 tests. Although the number of missense variants considered in WDR54 decreased to two, the association with decreased gAUC remained (PBonferroni < 0.05), supporting the robustness of this finding. We did not identify any additional associations in this restricted analysis.

Fig. 4.Rare-variant association study

(A) Manhattan plot of rare variant associations (points) with gAUC. The horizontal red line indicates Bonferroni P-value threshold (P < 5.90 ↑ 10→6). (B) The mean confluence trajectories stratified by rare-variant burden scores (color) for WDR54 with confluence percentage (y-axis) over time (x-axis). (C) A scatterplot displaying the relationship between WDR54 gene expression (x-axis) and the average gAUC (y-axis) per line (points). The shaded ribbon indicates the 95% CI.

Finally, to further explore the WDR54 rare variant association, we reconsidered the gAUC differential gene expression results and found an association of WDR54 expression with gAUC (FDR < 5%; P = 4.58×10−4Fig. 2B4C). Combined with the rare variant findings, these findings are consistent with previous reports of increased expression of WDR54 and cell/tumor proliferation.4345

Discussion

In this study, we investigated iPSC growth rates across 602 unique donors using a unique robotic culture system and an imaging instrument that allowed highly reproducible daily capture of cell growth. From the RNA-seq analysis, we identified thousands of differentially expressed genes associated with growth rate. Our rare variant association analysis identified WDR54 as having putatively damaging missense variants associated with reduced growth rate. This finding was also corroborated by our RNA-seq results, where increased expression of WDR54 was associated with increased growth rate. Our common variant association analysis did not identify variants associated with iPSC growth (P < 5×10−8), which our power analysis shows will require larger sample sizes.

Since we studied iPSC lines derived by different laboratories using different methods, it was reasonable to consider that other epigenetic factors might influence the growth rates of individual lines. We pursued three approaches to address this question: (i) We considered 16 cell lines from eight pairs of monozygotic twins and found their growth rates to be highly correlated (ICC = 0.65), though not as high as replicates of the same cell line studied at different times (ICC = 0.96). (ii) We decomposed the variance explained by a genetic relatedness matrix and observed high heritability, approximately 71-75%. (iii) We searched for and found differences in growth rates of iPSCs depending on the starting cell type (blood or skin), the pluripotency reprogramming method (mRNA-based or Sendai virus), and the media used for cell culture (mTeSR Plus or Freedom). Our study accounts for these differences by adding these three variables as covariates in our statistical analyses. These findings demonstrate that, after accounting for methodological sources of variation, a substantial proportion of the remaining variance in iPSC growth is attributable to genetic factors.

The differential gene expression analysis identified numerous genes with expression differences associated with growth rates. Many gAUC-associated genes are known oncogenes including FGF8,29 L1CAM,32 KRT17,35 NCAPG2,36 PIK3CA,37 and RNF43.39 The differential expression of genes associated with chromatin remodeling (i.e. KDM4C),28 cytoskeletal reorganization (COBL),33 and DNA damage repair (KDM4C)28 further implicates cellular features that modulate cell growth.

Finding the WDR54 association with iPSC growth rates comports with prior studies of the function of this gene. WDR54 is a prominent oncogene within colorectal cancer, bladder cancer, and T-cell acute lymphoblastic leukemia, and has also been proposed as a therapeutic target for head and neck squamous cell carcinoma.4446 Knockdown of WDR54 causes reduced cell growth and increased apoptosis.43,44,46 Mechanistic studies show that WDR54 enhances extracellular signal-regulated kinase activation—a process that ultimately regulates cell growth, proliferation, and differentiation—by sustaining epidermal growth factor receptor signaling at the plasma membrane.47 These established roles of WDR54 in cellular proliferation reinforce the biological context for the gene expression and rare-variant association finding.

Strengths of the study include the large number of lines considered, an automated system to precisely measure cell growth over time, and a broad characterization of iPSC growth contributors through SNP genotyping, DNA sequencing by ES and GS, and bulk RNA sequencing. One limitation is the inability to distinguish germline from somatic mutations in these iPSC lines; future studies may benefit by identifying somatic mutations through comparison of iPSC sequence with blood or skin DNA sequence from the donors.48 A second limitation is the inability to detect common variants associated with growth rate, suggesting future studies may need to be several times larger to have the necessary power to detect these polygenic effects. A third limitation is our dependence on computational methods to identify pathogenic variants in the coding region of genes, which may improve in the future with a combination of experimental work and more advanced artificial intelligence models of protein structure and variant function.

Awareness of the genetic drivers of iPSC growth can provide important insights for future research designs. For instance, in cell village association studies, lines with aggressive growth and enhanced differentiation characteristics can overwhelm a heterogeneous pool of cells, potentially confounding experimental outcomes.15,16,20 As large-scale genetic studies of iPSCs become more common, study designs should consider differences in starting cell type, type of media, and pluripotency reprogramming method. These studies may also benefit from genetic screening of individual lines for both somatic and germline rare variants in relevant genes, including WDR54.

Funding

This research was supported by the New York Stem Cell Foundation (NYSCF), a NYSCF Druckenmiller Fellowship (to DLT); the US National Institutes of Health grants ZIAHG000024 (to FSC and LGB), K99DK13917501 (to DLT), U01HL107388 (to the GENESiPS Consortium), R01DK106236 (to JWK), P30DK116074 (to the GENESiPS Consortium), R01DK116750 (to JWK), R01DK120565 (to JWK), R01DK137889 (to JWK), DP3DK112155 (to KAF), and U01HL107442 (to KAF); the Ikerbasque Research Fellowship sponsored by European Union (EU) grant H2020-MSCA-COFUND-2020-101034228-WOLFRAM2 (to ICO); the EU grant PID2023-148986OB-I00 (to ICO); the Gates Cambridge Trust (to HJT); and the NIH Oxford-Cambridge scholars’ program (to HJT).

Declaration of Interest

LGB is a member of the Illumina Medical Ethics advisory board, receives research support from Merck, and royalties from Wolters-Kluwer.

Extended authorship

NYSCF Global Stem Cell Array® team

Lauren Bauer1, Katie Brenner1, Geoff Buckley-Herd1, Matthew Butawan1, John Cerrone1, Anthony Chan1, Sean DesMarteau1, Patrick Fenton1, Peter Ferrarotto1, Brodie Fischbacher1, Camille Fulmore1, Jordan Goldberg1, Ankush Goyal1, Matt Green1, Anna K. Hahn1, Jenna Hall1, Erin Hinch1, Grayson Horn1, Christopher J. Hunter1, Dillion Hutson1, Deanna Ingrassia1, Selwyn Jacob1, Premlatha Jagadeesan1, Travis Kroeker1, Gregory Lallos1, Mike Leibold1, Hector Martinez1, Barry McCarthy1, Paul McCoy1, Connor McKnight1, Dorota N. Moroziewicz1, Enya O’Connor1, Reid Otto1, Temi Oyelola1, Katie Reggio1, Michael Santos1, Ana Sevilla1, Dong Woo Shin1, Vadim Solomonik1, Bruce Sun1, Nicole Tamarov1, Rebecca Tibbetts1, Farah Vejzagic1, Daniel White1, Christopher M. Woodard1, Hongyan Zhou1, Matthew Zimmer1

1. New York Stem Cell Foundation Research Institute, New York, NY 10019, USA

GENESiPS consortium

Ivan Carcamo-Orive1,2, Gabriel E. Hoffman3, Paige Cundiff4, Noam D. Beckmann3, Sunita L. D’Souza5, Joshua W. Knowles6,7,8,9, Achchhe Patel4, Dimitri Papatsenko4,10, Fahim Abbasi11, Gerald M. Reaven11, Sean Whalen12, Philip Lee11, Mohammad Shahbazi11, Marc Y. R. Henrion3, Kuixi Zhu3, Sven Wang3, Panos Roussos3,13,14, Eric E. Schadt3, Gaurav Pandey3, Rui Chang3, Thomas Quertermous11, Ihor Lemischka5

1. IKERBASQUE, Basque Foundation for Science, Bilbao 48009, Spain

2. Department of Endocrinology, Metabolism, Nutrition, and Kidney Disease, Biobizkaia Health Research Institute, Cruces 48903, Spain

3. Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

4. Department of Developmental and Regenerative Biology, Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

5. Department of Developmental and Regenerative Biology, Experimental Therapeutics Institute, Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

6. Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94304, USA

7. Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94304, USA

8. Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA 94304, USA

9. Stanford Prevention Research Center, Stanford University School of Medicine, Stanford, CA 94304, USA

10. Skolkovo Institute of Science and Technology, Nobel Street, Building 3, Moscow 143026, Russia

11. Department of Medicine and Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA

12. Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94148, USA

13. Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

14. Mental Illness Research, Education, and Clinical Center (VISN 3), James J. Peters VA Medical Center, Bronx, NY 10468, USA

iPSCORE consortium

Lana R. Aguiar1, Angelo D. Arias2, Timothy D. Arthur3,4, Paola Benaglio2, W. Travis Berggren5, Juan C. I. Belmonte6, Victor Borja1, Megan Cook1, Matteo D’Antonio4, Agnieszka D’Antonio-Chronowska7, Christopher DeBoever8, Kenneth E. Diffenderfer5, Margaret K. R. Donovan4,8, KathyJean Farnam1, Kelly A. Frazer1,9, Kyohei Fujita9, Melvin Garcia1, Olivier Harismendy4, Benjamin A. Henson1,9, David Jakubosky3,4, Kristen Jepsen1, Isaac Joshua1, He Li9, Hiroko Matsui1, Angelina McCarron1, Naoki Nariai9, Jennifer P. Nguyen4,8, Daniel T. O’Connor10, Jonathan Okubo1, Athanasia D. Panopoulous11,12, Fengwen Rao10, Joaquin Reyna1, Bianca M. Salgado1, Nayara Silva9, Erin N. Smith9, Josh Sohmer1, Shawn Yost8, William W. Y. Greenwald8

1. Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92092, USA

2. Department of Pediatrics, University of California, La Jolla, CA 92092, USA

3. Biomedical Sciences Program, University of California, San Diego, La Jolla, CA 92092, USA

4. Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92092, USA

5. Stem Cell Core, Salk Institute for Biological Studies, La Jolla, CA 92092, USA

6. Altos Labs, Inc., San Diego, CA 92121, USA

7. Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92092, USA

8. Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92092, USA

9. Department of Pediatrics, University of California, San Diego, La Jolla, CA 92092, USA

10. Department of Medicine, University of California, San Diego, La Jolla, CA 92092, USA

11. Board of Governors Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA

12. Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA

Methods

EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS

Summary of induced pluripotent stem cells (iPSCs)

We used existing lines and newly generated lines. We obtained existing lines from four primary studies: the Human Induced Pluripotent Stem Cells Initiative (HipSci; n = 33),22 GENEticS of Insulin Sensitivity (GENESiPS; n = 119),26 iPSC Collection for Omic REsearch (iPSCORE; n = 183),23,25 and NYSCF Global Stem Cell Array (GSA; n = 255).24 In addition, we generated 12 iPSC lines from human fibroblasts of consented donors from the Finland-United States Investigation of NIDDM (FUSION) study21 using the NYSCF automated, high-throughput iPSC characterization and differentiation platform as described previously.24 In total, this study considered 602 lines, which included sixteen twin participants (eight twin pairs).

iPSC thawing and processing

As iPSCs were delivered to NYSCF in various non-standard, non-uniform cryovials, they were not immediately available for thawing. As such, we used manual processes to thaw iPSCs not stored in Matrix tubes. We removed cryovials containing frozen iPSCs from liquid nitrogen storage and transferred them on dry ice before thawing. We placed vials into a 37°C water bath for ∼90 seconds before resuspending them in prewarmed mTeSR1 medium (STEMCELL Technologies, Cat: #85850, Vancouver, BC) containing 10% CloneR (STEMCELL Technologies, Cat: #05888). We transferred cells into a 15 mL conical tube and added additional mTeSR1 with 10% CloneR dropwise to the conical tube to ensure complete neutralization of the cryopreservative medium for a total volume of 15 mL per conical tube. We took a small aliquot (10 µL) of each cell suspension for cell counting using a dead/total count application on an Opera Phenix High-Content Screening System (herein referred to as Opera Phenix; PerkinElmer, Waltham, MA). We centrifuged cell suspensions at 800 revolutions per minute for four minutes and aspirated the supernatant without disturbing the cell pellet. We resuspended cells in mTeSR1 with 10% CloneR media solution and plated cells on a 12-well plate, 24-well plate, or 96-well Geltrex coated plate (Thermo Fisher, Cat: #A1413301), depending on their respective live cell counts as follows. If we counted fewer than 100,000 live cells for a given vial, we resuspended cells in 200 µL of mTeSR + 10% CloneR media and seeded cells into one well of a 96-well plate; if we counted fewer than 250,000 but more than 100,000 live cells, we resuspended cells in 500 µL of media solution and seeded cells into one well of a 24-well plate; and if we counted over 250,000 live cells, we resuspended cells in 1.0 mL of media solution seeded into one well of a 12-well plate.

iPSC culture, expansion, and passaging

After thawing and plating, we cultured cells for 48 hours at 37°C in 5% CO2 and mTeSR1 with 10% CloneR without changing media and without penicillin-streptomycin. We fed cells daily with mTeSR1. We collected supernatant from each well and tested for mycoplasma contamination with the MycoAlert® Mycoplasma PLUS Detection Kit (Lonza, Cat: #LT07-710, Rockville, MD) and the accompanying MycoAlert® Assay Control Set (Lonza, Cat: #LT02-518) as per the manufacturer’s instructions. After we tested for mycoplasma contamination, we transferred mycoplasma-free cells into the automated workcells and fed them mTeSR1 supplemented with penicillin-streptomycin (pen-strep; Thermo Fisher, Cat: #15140122). Upon reaching 60-80% confluence, we passaged cells on the NYSCF robotic culture system into a Geltrex coated 24-well plate (Thermo Fisher) at a density of 50,000 cells per well. We performed all liquid handling steps using an OT-2 liquid handler (OpenTrons, Long Island City, NY) and took measurements on a Synergy HT BioTek plate reader (BioTek, Winooski, VT).

We aspirated culture medium without disturbing the cell layer and washed wells with Accutase (Sigma-Aldrich, Cat: #A6964, St. Louis, MO). We dissociated cells in this same reagent before performing additional washes with 250 µL of Accutase. We incubated cells for 20 minutes at 37°C in 5% CO2. Once cells were fully dissociated, we neutralized Accutase by adding prewarmed mTeSR1 containing 10 µM Y27632 (Accutase-neutralization media ratio: 1:1). We added 100 µL of mTeSR1 for a 96-well plate; 250 µL of mTeSR1 for a 24-well plate, and 500 µL of mTeSR1 for a 12-well plate. We transferred cell suspensions into an intermediate 96-well round-bottom plate and centrifuged at 120 g for five minutes. After centrifugation, we aspirated the supernatant and resuspended cells in 1.0 mL of prewarmed mTeSR1 containing 10 µM of Y27632. We took 10 µL of each cell suspension for a dead/total cell count on the Opera Phenix. We transferred cells from the intermediate plate to the destination plate(s) at the appropriate cell density (for 24-well plates, 50,000 live cells per well; for 96-well plates, 10,000-20,000 live cells per well). We then returned destination plates to the incubator for a minimum of 24 hours.

After 24 hours, we performed a half-feed followed by daily media changes. Upon reaching 60-80% confluence, we passaged cells into Geltrex coated 96-well plates (Thermo Fisher) with 10,000-20,000 cells per well in mTeSR1 with pen-strep and 10 µM Y27632 dihydrochloride (AbMole, Cat: #M1817, Houston, TX). After 24 hours, we performed a half-feed before subsequent daily feeds with mTeSR. Upon reaching ∼90% confluency, we dissociated and counted cells before we transferred the cell suspension into Matrix tubes and centrifuged at 600×g for 5 minutes. We aspirated the supernatant without disturbing the cell pellets. We resuspended cells intended for bulk RNA sequencing in 100 µL of TRIzol™ (Thermo Fisher; Cat: #15596026) per Matrix tube and stored at −80°C. We left cells intended for SNP genotyping and genome/exome sequencing as dry pellets.

Ethics statement

All research in this study was conducted under category 1A as defined by the International Society for Stem Cell Research guidelines. All tissue samples were sourced under IRB #: OH95-HG-N030. Study subjects provided tissues under full consent and agreement to generate and use iPSC lines for broad research. All samples are anonymized such that personally identifiable information is not accessible.

METHOD DETAILS

RNA extraction and sequencing

We performed bulk RNA sequencing for all cell lines in-house at the NIH Intramural Sequencing Core (NISC). We extracted total RNA by transferring the 100 µL TRIzol™ cell suspension to a 1.5 mL microfuge tube and adding 900 µL of additional TRIzol™ following the manufacturer’s instructions. We eluted the RNA in 20 µL RNAse-free water. We assessed total RNA using the RNA 6000 Nano Kit (Agilent, Cat. #5067-1511, Santa Clara, CA) and quantitated using the QubitTM RNA Broad Range Assay kit (Thermo Fisher, Cat. #Q10211). We used 1.0 µg of total RNA with a RNA Integrity Number (RIN) ≥ 7 for library preparation with the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs, Cat. #E7490, Ipswich, MA). We sequenced libraries on the Illumina NovaSeq 6000 platform using a S4 flow cell with 151 bp paired-end reads.

DNA extraction, genotyping, and sequencing

AKESOgen (Tempus, Chicago, IL) performed SNP genotyping of all NYSCF GSA iPSC lines. We extracted DNA using the Promega Maxwell 96 gDNA Miniprep HT System following the manufacturer’s recommendations (Promega, Cat. #A2670, Madison, WI). We transferred DNA to AKESOgen for sequencing using the Illumina Global Screening Array and processed raw data using Illumina Genome Studio with reference genome GRCh37. We used three Illumina BeadChip arrays: InfiniumOmni2-5Exome-8v1-3 v1.3 (Illumina, Cat. #20031813, San Diego, CA), Infinium Global Screening Array-24 Kit (Illumina, Cat. #20030770), and Infinium HumanCore-24 v1.2 (Illumina, Cat. #20024566).

We performed SNP genotyping of all other iPSC lines (i.e., from the FUSION, HipSci, iPSCORE, and GENESiPS cohorts) at the NHGRI Genomics Core. We resuspended DNA pellets in 100 µL 1X PBS (Thermo Fisher, Cat. #10010031) and transferred to a 1.5 mL microfuge tube (Fisher Scientific, Cat. #14-222-169). We added 100 µL of additional 1X PBS to wash the Matrix tubes and combined the volumes. We extracted DNA using the QiaAMP Mini kit (Qiagen, Cat. #51304, Hilden, Germany) following manufacturer instructions. We resuspended DNA in 50 µL Buffer AE and quantitated using QubitTM 1X dsDNA Broad Range Assay Kit (Thermo Fisher, Cat. #Q33266). We genotyped 250 ng of DNA using the Ilumina Omni 2.5 Exome-8 v1.5 beadchip array. We analyzed data in GenomeStudio with the v1.5_A1_GRCh37 manifest and the v1.5_A1 cluster file (Illumina).

Genosity (Invitae Corporation, South San Francisco, CA) performed exome sequencing (ES) of NYSCF GSA iPSC lines. We extracted DNA using the Promega Maxwell 96 gDNA Miniprep HT System following manufacturer’s instructions (Promega). We transferred DNA to Genosity for sequencing using Twist Human Core Exome EF Multiplex Complete Kit (Twist Bioscience, Cat. #PN 100803, South San Francisco, CA). Genosity sequenced at a minimum coverage of 25X of 90% of bases for 183 samples and returned the trimmed FASTQ files for downstream analyses.

We performed genome sequencing (GS) of iPSC lines from the FUSION, HipSci, and GENESiPS cohorts at NISC. We resuspended DNA pellets in 100 µL 1X PBS (Thermo Fisher) and transferred the contents to a 1.5 mL microfuge tube (Fisher Scientific). We added 100 µL of additional 1X PBS to wash the matrix tubes and combined the volumes. We extracted DNA using the QiaAMP Mini kit (Qiagen). We resuspended DNA in 50 µL Buffer AE and quantitated using the QubitTM 1X dsDNA Broad Range Assay Kit (Thermo Fisher, Cat. #Q33266). We submitted 1.0 µg of DNA to NISC for library preparation with the Illumina TruSeq PCR-free Sample Preparation kit (Illumina, Cat. #20015962). We sequenced libraries on the Illumina NovaSeqX platform using 25B flow cells.

We downloaded genome sequencing data sourced from donor blood or fibroblasts from iPSCORE (dbGaP accession number phs001325.v6.p1).23,25

Cellular imaging of individual wells

To derive a growth rate phenotype, we used daily measurements of well coverage (expressed as a percent) for each well. We imaged all plates stored within the automated system nightly using Celigo imagers during culture (Revvity Inc., Cat. #200-BFFL-5C, Waltham, MA). We used the default confluence analysis parameters from the Celigo software to measure well coverage.

QUANTIFICATION AND STATISTICAL ANALYSIS

Growth rate derivation from cellular images and replicate concordance

We only considered wells that contained at least three well coverage measurements between 30-70%. We removed outlier wells using magnitude-shape plot outlier detection in scikit-FDA v0.10.1.49 Because most donors had multiple replicate wells, we identified and removed wells with low concordance by calculating the pairwise Pearson correlation coefficients among all replicate wells for each donor. Specifically, we rounded all well coverage measurement time points to the nearest day and aligned the resulting time series of well coverage measurements for each well pair based on shared time points. We calculated Pearson correlation coefficients using all complete, aligned observations. We excluded any well that exhibited a Pearson correlation coefficient (r) < 0.75 with any other replicate well from the same donor. From these filtered data, we min-max normalized the individual per-well time series measurements and fitted each to a standard logistic curve (time measured in days versus normalized well coverage) using Growthcurver v0.3.1.50 We further excluded replicate wells that either failed to fit a growth curve or showed a residual sum of squares from the logistic curve fit exceeding three standard deviations above the mean error. Finally, for each remaining well, we derived the growth Area-Under-the-Curve (gAUC) metric from the fitted curves.

To assess concordance of gAUC across replicate wells corresponding to each donor and within the eight twin pairs, we calculated the intraclass correlation coefficient (ICC) using a one-way random-effects model for single measurements (ICC(1,1)) as implemented in the Python statsmodels package v0.14.4.51

SNP genotype quality control procedures and imputation

After SNP chip genotyping, we aligned array probe sequences to the GRCh37 reference genome using the Burrows-Wheeler Aligner (BWA) v0.17.7.52 Pre-imputation quality control procedures included removing multi-mapping probes, probes containing known 3’ variants, and non-SNP variants. We further filtered out variants with per-variant missingness > 5% and Hardy-Weinberg equilibrium (HWE) P < 10⁻⁶, and removed samples exceeding 5% missingness. Using the 1000 Genomes Project Phase 3 data as a reference,27 we verified and corrected strand orientation, allele assignments, genomic positions, and reference/alternate designations. We excluded biallelic variants with minor allele frequencies (MAF) > 0.4, variants absent from the reference panel, and variants with discordant allele assignments. We merged genotyping data from individual arrays prior to lifting over and imputation.

We conducted lift over from GRCh37 to GRCh38, pre-imputation variant-level quality control procedures, and genotype imputation using the TOPMed Imputation Server.53 We used Eagle v2.454 to perform SNV pre-phasing and minimac453 to impute SNV genotype dosages with the TOPMed imputation panel v3. After imputation, we retained biallelic SNPs meeting the following criteria: MAF > 0.05, imputation quality r2 > 0.3, and HWE P > 10⁻10. We merged the quality-controlled imputed data across studies and used PLINK v2.00a555 with the “–king-cutoff” flag to remove genetically identical samples (KING relatedness coefficient > 0.354).56 Our final study set contained 602 unique individuals. To infer donor sex for all 602 individuals based on genotype calls, we used bcftools v1.1757 with the “guess-ploidy” plugin considering genotypes in the non-pseudoautosomal region of chromosome X from quality-controlled common variant data.

We also generated linkage disequilibrium (LD)-pruned genotypes for downstream analyses, including genetic principal component (PC) calculation and power analyses. We retained only biallelic SNPs with MAF > 0.05, HWE P > 10−10, and imputation quality r2 > 0.8. We excluded high-LD regions defined by Anderson et al.58 We applied LD pruning using a sliding window of 1 kb, a step size of 0.8, and an LD threshold of r2 < 0.1, retaining 66,917 pruned variants for downstream analysis.

Variant calling and quality control procedures in genomic sequencing

After GS and ES, we processed reads according to the GATK Best Practices pipeline for variant discovery.59 Following adapter trimming, we aligned paired-end reads to the GRCh38 reference genome using BWA MEM v0.17.752 with default parameters. We subsequently filtered per-lane BAM files to retain high-quality, primary alignments, merged, and sorted using samtools v1.22.60 To ensure accurate variant calling, we recalibrated base quality scores using the Genome Analysis Toolkit (GATK) v4.0.5.1 BaseRecalibrator,61 incorporating dbSNP v15162 as well as the Mills and 1000 Genomes gold standard indels59 to correct systematic biases in base quality scores. We performed variant calling per sample in GVCF mode using the GATK HaplotypeCaller.61 We calibrated variants (indels and SNPs) via Variant Quality Score Recalibration (VQSR),61 which was trained using the Mills and 1000 Genomes gold standard indels,59 HapMap v3.3,63 sites found to be polymorphic on the Illumina Omni 2.5M SNP array,59 and dbSNP v15162 to finalize high-quality, recalibrated SNP and indel calls.59,62,63

We used VerifyBamID v1.1.164 to compare the sequencing data to quality-controlled genotypes and identified no sample swaps. For downstream analyses, we retained biallelic variants (SNPs and indels with length ≤ 5 bp) meeting the following criteria: MAF < 0.1 and HWE P > 10−10. We kept genetically distinct samples (KING relatedness coefficient < 0.354) as determined from SNP genotype data (n = 514).

RNA-seq data processing and quality control procedures

After RNA sequencing, we performed transcript quantification using Salmon v1.10.365 in quasi-mapping mode with default parameters, aligning reads to the GRCh38 human reference transcriptome (Ensembl v11166). We used tximport v1.32.067 to convert transcript-level abundance estimates to gene-level counts for differential gene expression analysis via the lengthScaledTPM function.

As an additional quality control procedure, we used STAR v2.7.11a68 to align reads to the GRCh38 reference genome in a paired-end fashion. We identified sample swaps and contaminated samples by comparing the allelic RNA-seq read count distribution to the finalized genotypes of the 602 donors using VerifyBamID v1.1.1.64 We identified and resolved 38 pairs of sample swaps. We removed 25 contaminated samples from the analyses (FREEMIX or CHIPMIX > 0.055). After these procedures, 333 iPSC lines remained for final analyses. For downstream analyses, we considered only genes with transcripts per million (TPM) > 0.5 in at least 50% of samples, resulting in 15,603 features.

Determination of covariates for downstream analyses

To determine covariates for subsequent models, we fit a linear mixed model implemented in lmerTest v3.1.35869 to test if genetically-imputed donor sex, pluripotency reprogramming method, starting cell type, and cell culture media are associated (FDR < 5%, Benjamini-Hochberg procedure70) with gAUC, controlling for donor as a random effect. We found pluripotency reprogramming method, starting cell type, and cell culture media were associated with gAUC and therefore included these three terms in subsequent association analyses. We did not include cell culture media type as a covariate in our differential expression analyses because all cell lines included in those analyses were cultured in mTeSR media.

Differential expression analyses

We normalized gAUC using inverse rank normalization and calculated the gAUC mean value for each RNA-seq sample. We performed differential expression analysis using a three-stage process as previously described in Taylor et al.71 First, we tested for differentially expressed genes across gAUC using DESeq2 v1.36.072 with default parameters and adjusting for starting cell type and pluripotency reprogramming method as fixed effect covariates. Next, we used RUVSeq v1.30.073 with default parameters to derive a factor representing unwanted, latent variation using control gene expression, defined as genes with P > 0.25 from the initial differential expression analysis. Finally, we re-ran differential expression analysis and incorporated the calculated RUVSeq factor as an additional fixed-effect covariate. We performed multiple hypothesis correction using the Benjamini-Hochberg procedure.74

Genetic relatedness matrix calculation

We used GCTA v1.94.175 to compute a genetic relatedness matrix (GRM) on the basis of common variant genotype data (MAF > 0.1). We used the GRM in the variance decomposition, common variant, and rare variant analyses.

Variance decomposition analyses

We performed variance decomposition analyses using GCTA v1.94.175 at the cell line (i.e., donor) level with the mean gAUC value across all replicates per donor. We used three subsets of the available data: (i) all unique cell lines (n = 602) including cell culture media, starting cell type, and pluripotency reprogramming method as discrete covariates, (ii) all unique cell lines with bulk RNA-seq data incorporating starting cell type and pluripotency reprogramming method as discrete covariates (n = 333), and (iii) all unique cell lines with bulk RNA-seq data incorporating pluripotency markers as quantitative covariates and starting cell type and pluripotency reprogramming method as discrete covariates (n = 333). We modelled pluripotency with gene expression of four established pluripotency markers measured in TPM: POU5F1 (ENSG00000204531), SOX2 (ENSG00000181449), NANOG (ENSG00000111704), and PODXL (ENSG00000128567).5,76 For subsets (ii) and (iii), we did not include media as a discrete covariate since all cell lines with available RNA-seq data were cultured with mTeSR media.

We fit the following model:Embedded Image

where Y represents the vector of sample-level mean phenotype values, X is the matrix of fixed-effect covariates (either media or gene expression of pluripotency markers), Embedded Image is the random genetic effect modeled using the GRM Kg calculated as described above, and Embedded Image represents the residual error term. Here, Embedded Image and Embedded Image denote the genetic and residual variance components, respectively.

To quantify the proportion of phenotypic variance explained by genetic factors, we calculated the narrow-sense SNP heritability as:Embedded Image

Genetic principal component and ancestry assignment

We calculated genetic PCs and assigned participants to genetic ancestry groups using Fast and Robust Ancestry Prediction by Online singular value decomposition and Shrinkage Adjustment (FRAPOSA)77. Briefly, FRAPOSA works by performing PC analysis in a diverse reference cohort with known genetic ancestry labels, systematically calculating PCs for each participant in a target cohort by comparing them to the reference, and grouping participants from the target cohort to the most similar genetic ancestry group from the reference. Using FRAPOSA with default parameters, LD-pruned genotypes as input, and 1000 Genomes Project Phase 3 data (1KG)27 as the reference cohort, we calculated the first 30 PCs and grouped participants into 1KG-AFR-like (African), 1KG-AMR-like (American), 1KG-EUR-like (European), 1KG-EAS-like (East Asian), or 1KG-SAS-like (South Asian) genetic ancestries.

Common variant association testing

We normalized gAUC growth phenotypes through rank inverse normalization. We conducted association tests using the Generalized Linear Mixed Model Association Tests (GMMAT) v1.4.2 framework with default parameters.78 The analyses proceeded in two steps: (i) fitting a null model using the GRM, and (ii) performing score tests to identify genetic associations. We included 10 genetic PCs, cell culture media type, starting cell type, and pluripotency reprogramming method as fixed-effect covariates. We considered variants with P < 5×10−8 as genome-wide associated and those with P < 10−6 as nominally associated.

Power analyses for common variant associations

We estimated the sample size needed to detect common variant associations with gAUC using an unbalanced one-way ANOVA framework, implemented in the powerEQTL package v0.3.6.79 Briefly, for a range of absolute effect size thresholds (absolute effect size < 0.1, 0.2, 0.3, 0.4, 0.5), we calculated statistical power for up to 6,000 samples. Calculations assumed a genome-wide threshold of 5×10−8, a maximum MAF of 0.5, and a residual standard deviation (σ = 0.4584) derived from variance decomposition across all 602 samples from the common variant association study. We estimated the effect size parameter “deltaVec” by averaging the observed differences in mean gAUC comparing heterozygotes to each homozygous genotype group for LD-pruned variants within each effect size bin.

Rare variant annotation, aggregation, and association testing

We annotated variants using Variant Effect Predictor (VEP,80 Ensembl v11166) selecting the most severe consequence across all protein-coding transcripts (Ensembl v111 annotations66) for each variant. We classified predicted loss of function (pLoF) variants as those with consequences including transcript ablation, splice acceptor variant, splice donor variant, stop gained, frameshift variant, stop lost, start lost, transcript amplification, feature elongation, or feature truncation. We aslo evaluated VEP-annotated missense variants for deleteriousness using CADD v1.681 and classified variants with scores ≥ 20 as damaging.

For each gene, we aggregated coding variants with MAF < 0.01 that were annotated as either pLoF or damaging missense. To be considered in downstream analyses, we required a gene to contain at least two qualifying variants and carry a non-zero burden score in at least 1% of the cohort, resulting in 8,480 genes.

We normalized gAUC growth phenotypes through rank inverse normalization. We tested for associations using GMMAT v1.4.282 with default parameters. The analyses proceeded in two steps: (i) fitting a null model using the GRM, (ii) performing hybrid set-based tests to identify gene-level rare variant associations using the provided variant-gene annotations. For the hybrid test, we used the SMMAT function implemented in GMMAT, which efficiently combines a burden test and an adjusted SKAT test. As covariates, we included 10 genetic PCs, cell culture media type, starting cell type, and pluripotency reprogramming method as fixed-effect covariates. We applied a Bonferroni-corrected threshold to account for the number of unique genes tested (P < 5.90×10−6, 8,480 tests). For the sensitivity analysis, we fit a regression in the same manner but using CADD ≥ 25 to define damaging missense variants, resulting in an adjusted Bonferroni-corrected threshold of P < 1.45×10−5 (3,441 tests) after gene aggregation.

Supplemental Tables

ESM Table S1.Summary statistics of 602 iPSC line attributes.

ESM Table S2.WDR54 variants from the rare variant association study. Genomic coordinates from GRCh38. Source of genotype DNA reported in columns: “Blood DNA”, “Fibroblast DNA”, and “iPSC DNA.”

Supplemental Figures

ESM Fig. S1.Genetic diversity represented in cohort.

First four genetic PCs and predicted genetic ancestry group from FRAPOSA analysis (Methods). Color indicates genetic ancestry group (1KG-AFR-like, 1KG-AMR-like, 1KG-EAS-like, 1KG-EUR-like, or 1KG-SAS-like). Left-hand side plot displays principal components 1 versus 2; right-hand side plot displays principal components 3 versus 4. Point shape indicates study samples (squares) and reference panel samples (circles).

Acknowledgments

Funder Information Declared

New York Stem Cell Foundation, https://ror.org/03n2a3p06, NYSCF Druckenmiller Fellowship

National Institutes of Health, https://ror.org/01cwqze88, ZIAHG000024, K99DK13917501, U01HL107388, R01DK106236, P30DK116074, R01DK116750

National Institutes of Health, https://ror.org/01cwqze88, R01DK120565, R01DK137889, DP3DK112155, U01HL107442, Oxford-Cambridge scholars’ program

European Commission, https://ror.org/00k4n6c32, H2020-MSCA-COFUND-2020-101034228-WOLFRAM2, PID2023-148986OB-I00

Gates Cambridge Trust, Gates Cambridge Scholarship

Footnotes

  • ^ DP, FSC, and DLT are joint senior authors.

This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license

Bibliography

  1. Park, I.H., Zhao, R., West, J.A., Yabuuchi, A., Huo, H., Ince, T.A., Lerou, P.H., Lensch, M.W., and Daley, G.Q. (2008). Reprogramming of human somatic cells to pluripotency with defined factors. Nature 451, 141–146. doi:10.1038/nature06534.
  2. Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., and Yamanaka, S. (2007). Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872. doi:10.1016/j.cell.2007.11.019.
  3. Yu, J., Vodyanik, M.A., Smuga-Otto, K., Antosiewicz-Bourget, J., Frane, J.L., Tian, S., Nie, J., Jonsdottir, G.A., Ruotti, V., Stewart, R., et al. (2007). Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920. doi:10.1126/science.1151526.
  4. Veres, A., Faust, A.L., Bushnell, H.L., Engquist, E.N., Kenty, J.H.-R., Harb, G., Poh, Y.-C., Sintov, E., Gürtler, M., Pagliuca, F.W., et al. (2019). Charting cellular identity during human in vitro β-cell differentiation. Nature 569, 368–373. doi:10.1038/s41586-019-1168-5.
  5. Pagliuca, F.W., Millman, J.R., Gürtler, M., Segel, M., Van Dervort, A., Ryu, J.H., Peterson, Q.P., Greiner, D., and Melton, D.A. (2014). Generation of functional human pancreatic β cells in vitro. Cell 159, 428–439. doi:10.1016/j.cell.2014.09.040.
  6. Busskamp, V., Lewis, N.E., Guye, P., Ng, A.H.M., Shipman, S.L., Byrne, S.M., Sanjana, N.E., Murn, J., Li, Y., Li, S., et al. (2014). Rapid neurogenesis through transcriptional activation in human stem cells. Mol. Syst. Biol. 10, 760. doi:10.15252/msb.20145508.
  7. Pang, Z.P., Yang, N., Vierbuchen, T., Ostermeier, A., Fuentes, D.R., Yang, T.Q., Citri, A., Sebastiano, V., Marro, S., Südhof, T.C., et al. (2011). Induction of human neuronal cells by defined transcription factors. Nature 476, 220–223. doi:10.1038/nature10202.
  8. Walczak, M.P., Drozd, A.M., Stoczynska-Fidelus, E., Rieske, P., and Grzela, D.P. (2016). Directed differentiation of human iPSC into insulin producing cells is improved by induced expression of PDX1 and NKX6.1 factors in IPC progenitors. J. Transl. Med. 14, 341. doi:10.1186/s12967-016-1097-0.
  9. Chehelgerdi, M., Behdarvand Dehkordi, F., Chehelgerdi, M., Kabiri, H., Salehian-Dehkordi, H., Abdolvand, M., Salmanizadeh, S., Rashidi, M., Niazmand, A., Ahmadi, S., et al. (2023). Exploring the promising potential of induced pluripotent stem cells in cancer research and therapy. Mol. Cancer 22, 189. doi:10.1186/s12943-023-01873-0.
  10. Cerneckis, J., Cai, H., and Shi, Y. (2024). Induced pluripotent stem cells (iPSCs): molecular mechanisms of induction and applications. Signal Transduct. Target. Ther. 9, 112. doi:10.1038/s41392-024-01809-0.
  11. Melton, D. (2021). The promise of stem cell-derived islet replacement therapy. Diabetologia 64, 1030–1036. doi:10.1007/s00125-020-05367-2.
  12. Ward, L.D., and Kellis, M. (2012). Interpreting noncoding genetic variation in complex traits and human disease. Nat. Biotechnol. 30, 1095–1106. doi:10.1038/nbt.2422.
  13. Tak, Y.G., and Farnham, P.J. (2015). Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenetics Chromatin 8, 57. doi:10.1186/s13072-015-0050-4.
  14. Umans, B.D., Battle, A., and Gilad, Y. (2021). Where Are the Disease-Associated eQTLs? Trends Genet. 37, 109–124. doi:10.1016/j.tig.2020.08.009.
  15. Wells, M.F., Nemesh, J., Ghosh, S., Mitchell, J.M., Salick, M.R., Mello, C.J., Meyer, D., Pietilainen, O., Piccioni, F., Guss, E.J., et al. (2023). Natural variation in gene expression and viral susceptibility revealed by neural progenitor cell villages. Cell Stem Cell 30, 312–332.e13. doi:10.1016/j.stem.2023.01.010.
  16. Neavin, D.R., Steinmann, A.M., Farbehi, N., Chiu, H.S., Daniszewski, M.S., Arora, H., Bermudez, Y., Moutinho, C., Chan, C.-L., Bax, M., et al. (2023). A village in a dish model system for population-scale hiPSC studies. Nat. Commun. 14, 3240. doi:10.1038/s41467-023-38704-1.
  17. 17.Cuomo, A.S.E., Seaton, D.D., McCarthy, D.J., Martinez, I., Bonder, M.J., Garcia-Bernardo, J., Amatya, S., Madrigal, P., Isaacson, A., Buettner, F., et al. (2020). Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nat. Commun. 11, 810. doi:10.1038/s41467-020-14457-z.
  18. Schwartzentruber, J., Foskolou, S., Kilpinen, H., Rodrigues, J., Alasoo, K., Knights, A.J., Patel, M., Goncalves, A., Ferreira, R., Benn, C.L., et al. (2018). Molecular and functional variation in iPSC-derived sensory neurons. Nat. Genet. 50, 54–61. doi:10.1038/s41588-017-0005-8.
  19. Nguyen, J.P., Arthur, T.D., Fujita, K., Salgado, B.M., Donovan, M.K.R., iPSCORE Consortium, Matsui, H., Kim, J.H., D’Antonio-Chronowska, A., D’Antonio, M., et al. (2023). eQTL mapping in fetal-like pancreatic progenitor cells reveals early developmental insights into diabetes risk. Nat. Commun. 14, 6928. doi:10.1038/s41467-023-42560-4.
  20. Mitchell, J.M., Nemesh, J., Ghosh, S., Handsaker, R.E., Mello, C., Meyer, D., Raghunathan, K., de Rivera, H., Tegtmeyer, M., Hawes, D., et al. (2020). Mapping genetic effects on cellular phenotypes with “cell villages.” BioRxiv. doi:10.1101/2020.06.29.174383.
  21. Scott, L.J., Erdos, M.R., Huyghe, J.R., Welch, R.P., Beck, A.T., Wolford, B.N., Chines, P.S., Didion, J.P., Narisu, N., Stringham, H.M., et al. (2016). The genetic regulatory signature of type 2 diabetes in human skeletal muscle. Nat. Commun. 7, 11764. doi:10.1038/ncomms11764.
  22. Streeter, I., Harrison, P.W., Faulconbridge, A., The HipSci Consortium, Flicek, P., Parkinson, H., and Clarke, L. (2017). The human-induced pluripotent stem cell initiative-data resources for cellular genetics. Nucleic Acids Res. 45, D691–D697. doi:10.1093/nar/gkw928.
  23. Panopoulos, A.D., D’Antonio, M., Benaglio, P., Williams, R., Hashem, S.I., Schuldt, B.M., DeBoever, C., Arias, A.D., Garcia, M., Nelson, B.C., et al. (2017). iPSCORE: A Resource of 222 iPSC Lines Enabling Functional Characterization of Genetic Variation across a Variety of Cell Types. Stem Cell Reports 8, 1086–1100. doi:10.1016/j.stemcr.2017.03.012.
  24. Paull, D., Sevilla, A., Zhou, H., Hahn, A.K., Kim, H., Napolitano, C., Tsankov, A., Shang, L., Krumholz, K., Jagadeesan, P., et al. (2015). Automated, high-throughput derivation, characterization and differentiation of induced pluripotent stem cells. Nat. Methods 12, 885–892. doi:10.1038/nmeth.3507.
  25. Arthur, T.D., Nguyen, J.P., Henson, B.A., D’Antonio-Chronowska, A., Jaureguy, J., Silva, N., iPSCORE Consortium, Panopoulos, A.D., Izpisua Belmonte, J.C., D’Antonio, M., et al. (2025). Multiomic QTL mapping reveals phenotypic complexity of GWAS loci and prioritizes putative causal variants. Cell Genomics 5, 100775. doi:10.1016/j.xgen.2025.100775.
  26. CarcamoOrive, I., Hoffman, G.E., Cundiff, P., Beckmann, N.D., D’Souza, S.L., Knowles, J.W., Patel, A., Papatsenko, D., Abbasi, F., Reaven, G.M., et al. (2017). Analysis of Transcriptional Variability in a Large Human iPSC Library Reveals Genetic and Non-genetic Determinants of Heterogeneity. Cell Stem Cell 20, 518–532.e9. doi:10.1016/j.stem.2016.11.005.
  27. 1000 Genomes Project Consortium, Auton, A., Brooks, L.D., Durbin, R.M., Garrison, E.P., Kang, H.M., Korbel, J.O., Marchini, J.L., McCarthy, S., McVean, G.A., et al. (2015). A global reference for human genetic variation. Nature 526, 68–74. doi:10.1038/nature15393.
  28. .Zhu, K., Zhang, H., Luan, Y., Hu, B., Shen, T., Ma, B., Zhang, Z., and Zheng, X. (2024). KDM4C promotes mouse hippocampal neural stem cell proliferation through modulating ApoE expression. FASEB J. 38, e23511. doi:10.1096/fj.202302439R.
  29. Liu, R., Huang, S., Lei, Y., Zhang, T., Wang, K., Liu, B., Nice, E.C., Xiang, R., Xie, K., Li, J., et al. (2015). FGF8 promotes colorectal cancer growth and metastasis by activating YAP1. Oncotarget 6, 935–952. doi:10.18632/oncotarget.2822.
  30. Aruga, J., and Mikoshiba, K. (2003). Identification and characterization of Slitrk, a novel neuronal transmembrane protein family controlling neurite outgrowth. Mol. Cell. Neurosci. 24, 117–129. doi:10.1016/s1044-7431(03)00129-5.
  31. Zelina, P., Avci, H.X., Thelen, K., and Pollerberg, G.E. (2005). The cell adhesion molecule NrCAM is crucial for growth cone behaviour and pathfinding of retinal ganglion cell axons. Development 132, 3609–3618. doi:10.1242/dev.01934.
  32. Kiefel, H., Bondong, S., Hazin, J., Ridinger, J., Schirmer, U., Riedle, S., and Altevogt, P. (2012). L1CAM: a major driver for tumor cell invasion and motility. Cell Adh. Migr. 6, 374–384. doi:10.4161/cam.20832.
  33. Haag, N., Schwintzer, L., Ahuja, R., Koch, N., Grimm, J., Heuer, H., Qualmann, B., and Kessels, M.M. (2012). The actin nucleator Cobl is crucial for Purkinje cell development and works in close conjunction with the F-actin binding protein Abp1. J. Neurosci. 32, 17842–17856. doi:10.1523/JNEUROSCI.0843-12.2012.
  34. Huang, C., Walker, E.M., Dadi, P.K., Hu, R., Xu, Y., Zhang, W., Sanavia, T., Mun, J., Liu, J., Nair, G.G., et al. (2018). Synaptotagmin 4 regulates pancreatic β cell maturation by modulating the ca2+ sensitivity of insulin secretion vesicles. Dev. Cell 45, 347–361.e5. doi:10.1016/j.devcel.2018.03.013.
  35. Mikami, Y., Fujii, S., Nagata, K., Wada, H., Hasegawa, K., Abe, M., Yoshimoto, R.U., Kawano, S., Nakamura, S., and Kiyoshima, T. (2017). GLI-mediated Keratin 17 expression promotes tumor cell growth through the anti-apoptotic function in oral squamous cell carcinomas. J. Cancer Res. Clin. Oncol. 143, 1381–1393. doi:10.1007/s00432-017-2398-2.
  36. Zhan, P., Xi, G.-M., Zhang, B., Wu, Y., Liu, H.-B., Liu, Y.-F., Xu, W.-J., Zhu, Q., Cai, F., Zhou, Z.-J., et al. (2017). NCAPG2 promotes tumour proliferation by regulating G2/M phase and associates with poor prognosis in lung adenocarcinoma. J. Cell. Mol. Med. 21, 665–676. doi:10.1111/jcmm.13010.
  37. Samuels, Y., Diaz, L.A., Schmidt-Kittler, O., Cummins, J.M., Delong, L., Cheong, I., Rago, C., Huso, D.L., Lengauer, C., Kinzler, K.W., et al. (2005). Mutant PIK3CA promotes cell growth and invasion of human cancer cells. Cancer Cell 7, 561–573. doi:10.1016/j.ccr.2005.05.014.
  38. Sun, X.-L., Chen, Z.-H., Guo, X., Wang, J., Ge, M., Wong, S.Z.H., Wang, T., Li, S., Yao, M., Johnston, L.A., et al. (2023). Stem cell competition driven by the Axin2-p53 axis controls brain size during murine development. Dev. Cell 58, 744–759.e11. doi:10.1016/j.devcel.2023.03.016.
  39. Koo, B.-K., Spit, M., Jordens, I., Low, T.Y., Stange, D.E., van de Wetering, M., van Es, J.H., Mohammed, S., Heck, A.J.R., Maurice, M.M., et al. (2012). Tumour suppressor RNF43 is a stem-cell E3 ligase that induces endocytosis of Wnt receptors. Nature 488, 665–669. doi:10.1038/nature11308.
  40. Kuroda, T., Yasuda, S., Tachi, S., Matsuyama, S., Kusakawa, S., Tano, K., Miura, T., Matsuyama, A., and Sato, Y. (2019). SALL3 expression balance underlies lineage biases in human induced pluripotent stem cell differentiation. Nat. Commun. 10, 2175. doi:10.1038/s41467-019-09511-4.
  41. O’Connor, L.J., Schoech, A.P., Hormozdiari, F., Gazal, S., Patterson, N., and Price, A.L. (2019). Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476. doi:10.1016/j.ajhg.2019.07.003.
  42. Kandoth, C., McLellan, M.D., Vandin, F., Ye, K., Niu, B., Lu, C., Xie, M., Zhang, Q., McMichael, J.F., Wyczalkowski, M.A., et al. (2013). Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339. doi:10.1038/nature12634.
  43. Li, H., Zhang, D., Fu, Q., Wang, S., Zhang, X., Lin, Z., Wang, Z., Song, J., Su, Z., Xue, V., et al. (2023). WDR54 exerts oncogenic roles in T-cell acute lymphoblastic leukemia. Cancer Sci. 114, 3318–3329. doi:10.1111/cas.15872.
  44. 44.Wei, X., Wang, B., Wu, Z., Yang, X., Guo, Y., Yang, Y., Fang, Z., Yi, C., Zhang, L., Fan, X., et al. (2023). WD repeat protein 54-mediator of ErbB2-driven cell motility 1 axis promotes bladder cancer tumorigenesis and metastasis and impairs chemosensitivity. Cancer Lett. 556, 216058. doi:10.1016/j.canlet.2023.216058.
  45. 45.Jeong, E.-J., Kim, E., and Kim, Y.S. (2024). Identification of novel therapeutic targets for head and neck squamous cell carcinoma through bioinformatics analysis. Sci. Rep. 14, 32102. doi:10.1038/s41598-024-83680-1.
  46. 46.Yuan, Y., Qi, G., Shen, H., Guo, A., Cao, F., Zhu, Y., Xiao, C., Chang, W., and Zheng, S. (2019). Clinical significance and biological function of WD repeat domain 54 as an oncogene in colorectal cancer. Int. J. Cancer 144, 1584–1595. doi:10.1002/ijc.31736.
  47. 47.Maeda, A., Nishino, T., Matsunaga, R., Yokoyama, A., Suga, H., Yagi, T., and Konishi, H. (2019). Transglutaminase-mediated cross-linking of WDR54 regulates EGF receptor-signaling. Biochim. Biophys. Acta Mol. Cell Res. 1866, 285–295. doi:10.1016/j.bbamcr.2018.11.009.
  48. 48.Ji, J., Ng, S.H., Sharma, V., Neculai, D., Hussein, S., Sam, M., Trinh, Q., Church, G.M., McPherson, J.D., Nagy, A., et al. (2012). Elevated coding mutation rate during the reprogramming of human somatic cells into induced pluripotent stem cells. Stem Cells 30, 435–440. doi:10.1002/stem.1011.
  49. 49.Ramos-Carreño, C., Torrecilla, J.L., Hong, Y., and Suárez, A. (2022). scikit-fda: Computational Tools for Machine Learning with Functional Data. In 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI) (IEEE), pp. 213–218. doi:10.1109/ICTAI56018.2022.00038.
  50. 50.Sprouffske, K., and Wagner, A. (2016). Growthcurver: an R package for obtaining interpretable metrics from microbial growth curves. BMC Bioinformatics 17, 172. doi:10.1186/s12859-016-1016-7.
  51. 51.Seabold, S., and Perktold, J. (2010). Statsmodels: Econometric and Statistical Modeling with Python. In Proceedings of the 9th Python in Science Conference Proceedings of the python in science conference. (SciPy), pp. 92–96. doi:10.25080/Majora-92bf1922-011.
  52. 52.Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. doi:10.1093/bioinformatics/btp324.
  53. 53.Das, S., Forer, L., Schönherr, S., Sidore, C., Locke, A.E., Kwong, A., Vrieze, S.I., Chew, E.Y., Levy, S., McGue, M., et al. (2016). Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287. doi:10.1038/ng.3656.
  54. .Loh, P.-R., Danecek, P., Palamara, P.F., Fuchsberger, C., A Reshef, Y., K Finucane, H., Schoenherr, S., Forer, L., McCarthy, S., Abecasis, G.R., et al. (2016). Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448. doi:10.1038/ng.3679.
  55. Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M., and Lee, J.J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, s13742–015–0047–8. doi:10.1186/s13742-015-0047-8.
  56. Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., and Chen, W.-M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873. doi:10.1093/bioinformatics/btq559.
  57. Danecek, P., Bonfield, J.K., Liddle, J., Marshall, J., Ohan, V., Pollard, M.O., Whitwham, A., Keane, T., McCarthy, S.A., Davies, R.M., et al. (2021). Twelve years of SAMtools and BCFtools. Gigascience 10. doi:10.1093/gigascience/giab008.
  58. Anderson, C.A., Pettersson, F.H., Clarke, G.M., Cardon, L.R., Morris, A.P., and Zondervan, K.T. (2010). Data quality control in genetic case-control association studies. Nat. Protoc. 5, 1564–1573. doi:10.1038/nprot.2010.116.
  59. Van der Auwera, G.A., Carneiro, M.O., Hartl, C., Poplin, R., Del Angel, G., Levy-Moonshine, A., Jordan, T., Shakir, K., Roazen, D., Thibault, J., et al. (2013). From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 11, 11.10.1–11.10.33. doi:10.1002/0471250953.bi1110s43.
  60. Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. doi:10.48550/arxiv.1303.3997.
  61. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. doi:10.1101/gr.107524.110.
  62. Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., and Sirotkin, K. (2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311. doi:10.1093/nar/29.1.308.
  63. .Pemberton, T.J., Wang, C., Li, J.Z., and Rosenberg, N.A. (2010). Inference of unexpected genetic relatedness among individuals in HapMap Phase III. Am. J. Hum. Genet. 87, 457–464. doi:10.1016/j.ajhg.2010.08.014.
  64. Jun, G., Wing, M.K., Abecasis, G.R., and Kang, H.M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res. 25, 918–925. doi:10.1101/gr.176552.114.
  65. Patro, R., Duggal, G., Love, M.I., Irizarry, R.A., and Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419. doi:10.1038/nmeth.4197.
  66. Dyer, S.C., Austine-Orimoloye, O., Azov, A.G., Barba, M., Barnes, I., Barrera-Enriquez, V.P., Becker, A., Bennett, R., Beracochea, M., Berry, A., et al. (2025). Ensembl 2025. Nucleic Acids Res. 53, D948–D957. doi:10.1093/nar/gkae1071.
  67. Soneson, C., Love, M.I., and Robinson, M.D. (2015). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. [version 2; peer review: 2 approved]. F1000Res. 4, 1521. doi:10.12688/f1000research.7563.2.
  68. .Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. doi:10.1093/bioinformatics/bts635.
  69. .Kuznetsova, A., Brockhoff, P.B., and Christensen, R.H.B. (2017). lmertest package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26. doi:10.18637/jss.v082.i13.
  70. Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x.
  71. Taylor, H.J., Hung, Y.-H., Narisu, N., Erdos, M.R., Kanke, M., Yan, T., Grenko, C.M., Swift, A.J., Bonnycastle, L.L., Sethupathy, P., et al. (2023). Human pancreatic islet microRNAs implicated in diabetes and related traits by large-scale genetic analysis. Proc Natl Acad Sci USA 120, e2206797120. doi:10.1073/pnas.2206797120.
  72. Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. doi:10.1186/s13059-014-0550-8.
  73. Risso, D., Ngai, J., Speed, T.P., and Dudoit, S. (2014). Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902. doi:10.1038/nbt.2931.
  74. Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289–300.
  75. Yang, J., Lee, S.H., Goddard, M.E., and Visscher, P.M. (2011). GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82. doi:10.1016/j.ajhg.2010.11.011.
  76. Lowry, W.E., Richter, L., Yachechko, R., Pyle, A.D., Tchieu, J., Sridharan, R., Clark, A.T., and Plath, K. (2008). Generation of human induced pluripotent stem cells from dermal fibroblasts. Proc Natl Acad Sci USA 105, 2883–2888. doi:10.1073/pnas.0711983105.
  77. Zhang, D., Dey, R., and Lee, S. (2020). Fast and robust ancestry prediction using principal component analysis. Bioinformatics 36, 3439–3446. doi:10.1093/bioinformatics/btaa152.
  78. Chen, H., Wang, C., Conomos, M.P., Stilp, A.M., Li, Z., Sofer, T., Szpiro, A.A., Chen, W., Brehm, J.M., Celedón, J.C., et al. (2016). Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet. 98, 653–666. doi:10.1016/j.ajhg.2016.02.012.
  79. Dong, X., Li, X., Chang, T.-W., Scherzer, C.R., Weiss, S.T., and Qiu, W. (2021). powerEQTL: an R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics 37, 4269–4271. doi:10.1093/bioinformatics/btab385.
  80. McLaren, W., Gil, L., Hunt, S.E., Riat, H.S., Ritchie, G.R.S., Thormann, A., Flicek, P., and Cunningham, F. (2016). The ensembl variant effect predictor. Genome Biol. 17, 122. doi:10.1186/s13059-016-0974-4.
  81. Rentzsch, P., Witten, D., Cooper, G.M., Shendure, J., and Kircher, M. (2019). CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894. doi:10.1093/nar/gky1016.
  82. Chen, H., Huffman, J.E., Brody, J.A., Wang, C., Lee, S., Li, Z., Gogarten, S.M., Sofer, T., Bielak, L.F., Bis, J.C., et al. (2019). Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. Am. J. Hum. Genet. 104, 260–274. doi:10.1016/j.ajhg.2018.12.012.

https://i.liadm.com/sync-container?duid=28e3293678dc–01k1hv9vygvqagd526m1kfdfv6&ds=did-0049&euns=1&s=CgA&version=v3.13.0&cd=.biorxiv.org&pv=46ed12a3-e833-48c4-99e8-e633e7cf954c