American Journal of Kidney Diseases

Genetic Investigations of Kidney Disease: Core Curriculum 2013

Published: March 04, 2013
DOI: https://doi.org/10.1053/j.ajkd.2012.11.052
      It has long been known that inherited kidney disorders such as polycystic kidney disease (PKD) can be caused by mutations in a single gene. These disorders are termed monogenic or Mendelian diseases because they typically follow inheritance patterns consistent with Mendel's laws (Table S1, available as online supplementary material). Monogenic disorders often exhibit specific observable characteristics (the phenotype) and typically are rare. Conversely, some diseases “run in the family,” but are not caused by single-gene mutations and do not follow Mendelian inheritance patterns. The genetic contribution to these conditions can be influenced by genetic risk variants in many genes and be modified by their interactions with the environment. These diseases, like chronic kidney disease (CKD), are called complex diseases (Table 1).
      Table 1. Characteristics of Monogenic Versus Complex Diseases
      Characteristic | Monogenic Diseases | Complex Diseases
      Example | Nephronophthisis | Chronic kidney disease
      Cause | Mutations in a specific gene (eg, any of the genes encoding nephrocystin 1-15) | Multifactorial: multiple genetic susceptibility variants interact with nongenetic risk factors
      Inheritance pattern | Mendelian inheritance | Complex
      Prevalence | Individually rare | High
      Risk conferred by individual gene variant | High | Low
      Population relevance | Individually low | High
      Methods for gene identification | Linkage analysis and positional cloning, panel candidate gene sequencing, homozygosity mapping coupled with sequencing, untargeted next-generation sequencing (whole exome or whole genome) | Candidate and genome-wide association studies including admixture mapping
      Note: Several recent reviews provide a comprehensive overview for both monogenic kidney diseases (Hildebrandt [Genetic kidney diseases. Lancet. 2010;375:1287-1295] and McKnight et al [Unravelling the genetic basis of renal diseases; from single gene to multifactorial disorders. J Pathol. 2010;220:198-216]) and complex kidney diseases (McKnight et al; Köttgen [Genome-wide association studies in nephrology research. Am J Kidney Dis. 2010;56:743-758]; O'Seaghdha and Fox [Genome-wide association studies of chronic kidney disease: what have we learned? Nat Rev Nephrol. 2011;8:89-99]; see also additional readings).
      Investigations into genetic causes of disease are important for individuals with monogenic kidney diseases, as well as for those with complex kidney diseases. Genetic investigations of monogenic diseases can identify the cause of disease. Further, they constitute the basis for research into the underlying molecular mechanisms and facilitate patient counseling, risk prediction, and potentially, monitoring and treatment of disease. Understanding genetic susceptibility to complex diseases such as CKD also is important because complex diseases often are very prevalent: CKD affects ∼10% of the US adult population, with similar estimates in Europe and China. Consequently, even moderate increases in disease risk attributable to variants in one of many risk genes can be meaningful.
      Both monogenic and complex kidney diseases can progress to kidney failure. Data from the US Renal Data System show that in 2011, a total of 7.6% of the US Medicare population was affected by CKD, but their care accounted for 22.3% of the total expenditures. Because many of these costs are incurred by rather nonspecific treatments such as dialysis, better understanding of the underlying mechanisms of CKD represents a first step toward more directed therapies. The involved genes may point toward important pathways involved in a substantial proportion of kidney diseases and may lead to new approaches to prevent, diagnose, and treat these diseases.
      This Core Curriculum provides an overview of well-established and novel methods used in the investigation of genetic causes of kidney diseases. It contrasts differences between monogenic and complex diseases and the methods used to identify the underlying risk genes. It should help clinicians evaluate the literature in the field of kidney disease genetics and assist them in the planning and conduct of their own studies.

      Additional Readings

      • Chronic Kidney Disease Prognosis Consortium, Matsushita K, van der Velde M, Astor BC, et al. Association of estimated glomerular filtration rate and albuminuria with all-cause and cardiovascular mortality in general population cohorts: a collaborative meta-analysis. Lancet. 2010;375(9731):2073-2081.
      • Coresh J, Selvin E, Stevens LA, et al. Prevalence of chronic kidney disease in the United States. JAMA. 2007;298(17):2038-2047.
      • Lifton RP, Somlo S, Giebisch GH, Seldin DW. Genetic Diseases of the Kidney. San Diego, CA: Elsevier; 2009.
      • Thomas DC. Overview of genetic epidemiology. In: Thomas DC. Statistical Methods in Genetic Epidemiology. New York, NY: Oxford University Press; 2004:3-22.
      • Zhang QL, Rothenbacher D. Prevalence of chronic kidney disease in population-based studies: systematic review. BMC Public Health. 2008;8:117.

      Evidence for a Genetic Contribution to Kidney Diseases: Familial Aggregation and Heritability Studies

      The genetic contribution to disease and number of genes involved differ substantially across diseases. Whereas mutations causing monogenic kidney disorders often exhibit high penetrance, which means that mutation carriers usually develop disease, susceptibility variants to polygenic complex kidney diseases only moderately modify disease risk (Table 1). It therefore is informative to attempt to quantify the genetic contribution to complex traits or diseases before embarking on the identification of susceptibility genes. Genetic epidemiology studies of familial aggregation and heritability in affected families or in the general population can address this question.
      For dichotomous traits (disease status affected or unaffected, eg, CKD present or absent), the recurrence risk ratio is a measure to assess familial aggregation. It is defined as the increase in disease risk for a certain type of relative of an affected individual, most commonly siblings, compared to the general population risk. Increased recurrence risk ratios for different types of relatives provide complementary lines of evidence for the importance of genetic factors.
      For example, in a population-based case-control study, the number of first-degree relatives who were affected with any type of kidney disease was compared between patients with end-stage renal disease (ESRD) and controls without ESRD. Having one first-degree relative with kidney disease increased the risk of ESRD 1.3-fold, and having 2 or more affected first-degree relatives increased the risk of ESRD 10.4-fold. These results are consistent with familial aggregation of kidney disease greater than that predicted by clustering of diabetes and hypertension in related individuals.
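      The recurrence risk ratio defined above can be sketched in a few lines. This is a minimal illustration rather than a published method: the function name and the sibling counts are hypothetical, and the ratio is computed simply as the observed risk in relatives of affected individuals divided by the general population risk.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, population_risk):
    """Risk among relatives of affected individuals divided by population risk."""
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < population_risk <= 1:
        raise ValueError("population_risk must be in (0, 1]")
    risk_in_relatives = affected_relatives / total_relatives
    return risk_in_relatives / population_risk


# Hypothetical numbers: 30 of 200 siblings of affected individuals are
# themselves affected, and the population risk is 5%.
sibling_ratio = recurrence_risk_ratio(30, 200, 0.05)
print(f"Sibling recurrence risk ratio: {sibling_ratio:.1f}")
```

      A ratio clearly above 1 is consistent with familial aggregation, although shared environment can contribute as well as shared genes.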
      Heritability is a measure to estimate the genetic contribution to a trait or disease and can be applied to quantitative traits, for example, estimated glomerular filtration rate (eGFR) as a measure of kidney function. Heritability commonly is estimated as the fraction of phenotypic variability that can be attributed to genetic variation (narrow-sense heritability h² = additive genetic variance/total variance). For example, if the heritability of eGFR is estimated as 0.4, then 40% of the variability in eGFR in the examined study population can be explained by additive genetic effects.
      Many studies have evaluated the genetic contribution to quantitative kidney function parameters such as eGFR or albuminuria (measured as urinary albumin-creatinine ratio [UACR]). For example, the heritability of eGFR in families with type 2 diabetes mellitus was reported as 0.75. Depending on different GFR measures, the heritability of GFR in hypertensive families of African descent was estimated to range from 0.41-0.82. Because diabetes and hypertension themselves have strong genetic components, population-based studies have estimated heritability for eGFR after taking into account covariates such as age, sex, blood pressure, and diabetes as ranging from 0.33-0.40. A number of studies have estimated the heritability of UACR in diabetic families, and all of them detected significant heritability estimates ranging from 0.23-0.46. In a study of hypertensive families, a similar heritability of 0.49 was reported. These heritability estimates of various quantitative kidney traits support the contribution of genetic factors to the observed interindividual differences.
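      As a worked illustration of the ratio just described, the sketch below computes narrow-sense heritability from two variance components. The function name and the variance values are hypothetical; in practice the additive genetic variance is not observed directly but is estimated from the resemblance between relatives.

```python
def narrow_sense_heritability(additive_genetic_variance, total_variance):
    """Fraction of total phenotypic variance due to additive genetic effects."""
    if total_variance <= 0:
        raise ValueError("total_variance must be positive")
    if not 0 <= additive_genetic_variance <= total_variance:
        raise ValueError("additive variance must lie between 0 and total variance")
    return additive_genetic_variance / total_variance


# Hypothetical variance components for eGFR, in (mL/min/1.73 m^2)^2.
h2 = narrow_sense_heritability(additive_genetic_variance=90.0, total_variance=225.0)
print(f"h2 = {h2:.2f}")
```

      With these made-up inputs, 40% of the variability in eGFR would be attributed to additive genetic effects, matching the interpretation given in the text for a heritability of 0.4.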
      Additional study designs to assess a potential genetic contribution to diseases or traits include twin studies, adoption studies, and migration studies; they are not covered in this Core Curriculum.

      Additional Readings

      • Bochud M, Elston RC, Maillard M, et al. Heritability of renal function in hypertensive families of African descent in the Seychelles (Indian Ocean). Kidney Int. 2005;67(1):61-69.
      • Fava C, Montagnana M, Burri P, et al. Determinants of kidney function in Swedish families: role of heritable factors. J Hypertens. 2008;26(9):1773-1779.
      • Fox CS, Yang Q, Cupples LA, et al. Genomewide linkage analysis to serum creatinine, GFR, and creatinine clearance in a community-based population: the Framingham Heart Study. J Am Soc Nephrol. 2004;15(9):2457-2461.
      • Langefeld CD, Beck SR, Bowden DW, Rich SS, Wagenknecht LE, Freedman BI. Heritability of GFR and albuminuria in Caucasians with type 2 diabetes mellitus. Am J Kidney Dis. 2004;43(5):796-800.
      • Lei HH, Perneger TV, Klag MJ, Whelton PK, Coresh J. Familial aggregation of renal disease in a population-based case-control study. J Am Soc Nephrol. 1998;9(7):1270-1276.

      Glossary

      Allele: Alternative DNA sequences at the same physical position in the genome. SNPs typically have 2 alleles (biallelic) that correspond to the 2 DNA bases found in the population at this position.
      Complex disease: Multifactorial diseases thought to arise from a combination of different genetic effects and environmental influences. Often contrasted to monogenic or Mendelian disease.
      Hardy-Weinberg principle: This law states that allele and genotype frequencies in a population remain constant over generations if no influences disturbing this equilibrium are in effect, such as mutation, selection, nonrandom mating, migration, or random genetic drift.
      Heritability: The proportion of phenotypic variance in a population that can be attributed to genetic variation among individuals.
      Homozygosity mapping: A method to narrow down the location of genes causing recessive disease in consanguineous families. It searches for genomic regions in which segments of both chromosomes in an affected individual are inherited from the same ancestor.
      Imputation methods: Techniques to infer missing genotype data in a study population based on the genotype data available and the known correlation patterns between variants in a reference population (eg, provided by the HapMap or 1000 Genomes Project).
      Linkage disequilibrium: Alleles at different genomic loci occurring together more often than would be expected by chance alone and therefore providing some degree of information about each other.
      Mendelian disease: Disease caused by a mutation in a single gene that commonly follows a Mendelian inheritance pattern. Also commonly termed “monogenic” disease.
      Minor allele frequency (MAF): For a SNP, the frequency of the less common allele in a population.
      Penetrance: The proportion of individuals with a specific genetic variant who express the phenotype. The penetrance of a disease-causing mutation is the proportion of mutation carriers who show clinical signs of the disease.
      Phenotype: Any observable characteristic or trait of an organism.
      Polymorphism: Genetic variation in which each allele occurs in ≥1% of the population.
      Single-nucleotide polymorphism (SNP): DNA sequence variation resulting from a change of a single nucleotide base.
      Trait: A feature or quantity in an individual that can be measured and differs between individuals. Disease status is a dichotomous trait (eg, presence of ESRD); many clinical measurements are continuous traits (eg, GFR).
      Winner's curse: In genome-wide association studies, it refers to the overestimation of the effect size of a newly identified allele on disease risk.

      Approaches to Investigate Monogenic Diseases

      Linkage Analysis and Positional Cloning

      For many years, identifying the causal mutation in monogenic diseases was accomplished through linkage studies. Families with multiple affected individuals were recruited and DNA samples were collected for genotyping of polymorphic genetic markers. These DNA markers, traditionally microsatellite markers and more recently single-nucleotide polymorphisms (SNPs), are spaced regularly across the whole genome with certain densities. Linkage analysis assesses whether polymorphic markers cosegregate with the disease according to Mendelian patterns of inheritance in families. If a marker can be found that consistently appears with the disease under study in a pedigree, this is good evidence that the disease gene locus is located close to the marker. The LOD (logarithm of odds) score is a measure of linkage between the disease gene locus and the marker; it compares the likelihood of observing the data if the disease and marker loci are linked to the likelihood of observing these data by chance alone. A large LOD score (>3-3.6, depending on the study design) is considered evidence that the marker is in genetic linkage to the disease locus; at a LOD score less than −2, linkage is excluded. After identification of a candidate genomic region, positional cloning, a laborious process, was used to narrow down the region until the gene and its mutations had been identified.
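      The LOD score comparison described above can be made concrete with a small sketch. It assumes the simplest textbook setting, which the text does not spell out: a two-point analysis of phase-known meioses in which each meiosis can be scored as recombinant or nonrecombinant between marker and disease locus. The recombination fraction theta and the function name are introduced here for illustration only; real linkage software handles unknown phase, incomplete penetrance, and multipoint data.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of likelihood(linked at theta) / likelihood(unlinked, theta = 0.5)."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant is impossible under complete linkage.
        return float("-inf")
    nonrecombinants = meioses - recombinants
    log_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical pedigree data: 10 informative meioses, 1 recombinant.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(f"theta={theta:.2f}  LOD={lod_score(1, 10, theta):.2f}")
```

      Under these assumptions, 10 informative meioses without any recombinant give a LOD score of about 3.0 at theta = 0, which shows why families with many affected members are needed to reach the conventional threshold.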
      With the completion of the human reference genome and access to new sequencing technologies, the positional cloning step is no longer a necessity to identify disease-causing mutations in a candidate region. Figure 1 shows a graphical representation of results from a genome-wide linkage analysis. In this example, results from 3 families with familial hemiplegic migraine were combined to obtain a LOD score peak of 5.9. The identified region of interest contained 75 genes, but the investigators were able to identify the causal mutation by concentrating on sequencing only a few candidate genes mapping into the interval.
      Figure 1. Typical graphical presentation of findings from genome-wide linkage studies. A linked genomic interval on chromosome 2q24.3 was identified in a study of 3 families with familial hemiplegic migraine (FHM). The y axis shows the LOD (logarithm of odds) score, the x axis shows the genomic position and the microsatellite markers with the peak LOD score mapping between markers D2S2330 and D2S399. Among a large number of genes in the identified 8.8-Mb interval, the 6 candidate ion channel genes are marked in the figure and the causal mutation was finally identified in SCN1A.
      Reproduced from Dichgans et al (Lancet. 2005;366:371-377), with permission of Elsevier.
      Linkage analysis has led to the successful identification of causal mutations for a large number of monogenic disorders, including the most common monogenic kidney disease, autosomal dominant PKD (ADPKD). ADPKD affects approximately 1 in 1,000 individuals worldwide. In 1985, a linkage study mapped a genetic region on the short arm of chromosome 16 (16p13.3) with a LOD score of 25.85. Subsequently, additional studies confirmed this linkage and refined the position by using additional polymorphic markers. The European PKD Consortium isolated the causal PKD1 gene in 1994. However, in some ADPKD families, the disease is not linked to the 16p locus, which led to mapping of the PKD2 locus. Using highly polymorphic microsatellite DNA markers, the PKD2 locus was mapped between 2 markers on the long arm of chromosome 4 (4q22.1) with a LOD score of 22.43 in 1993, and the PKD2 gene was isolated and characterized in 1996. In patients with ADPKD who are screened for mutations in PKD1 and PKD2, 85% of patients carry mutations in PKD1, and 15% in PKD2. Various kinds of mutations are spread across the entire gene sequences; most are nonsense or frameshift mutations that are predicted to cause a truncated protein product. Clear inter- and intrafamilial variability with only modest genotype-phenotype correlation is observed. Potential implications for genetic testing and risk prediction are discussed in a later section.

      Additional Readings

      • Almasy L, Blangero J. Contemporary model-free methods for linkage analysis. In: Rao DC, Gu CC, eds. Genetic Dissection of Complex Traits. 2nd ed. London, UK: Academic Press; 2008:175-193.
      • Dichgans M, Freilinger T, Eckstein G, et al. Mutation in the neuronal voltage-gated sodium channel SCN1A in familial hemiplegic migraine. Lancet. 2005;366(9483):371-377.
      • European Polycystic Kidney Disease Consortium. The polycystic kidney disease 1 gene encodes a 14 kb transcript and lies within a duplicated region on chromosome 16. Cell. 1994;77(6):881-894.
      • Hughes J, Ward CJ, Peral B, et al. The polycystic kidney disease 1 (PKD1) gene encodes a novel protein with multiple cell recognition domains. Nat Genet. 1995;10(2):151-160.
      • Mochizuki T, Wu G, Hayashi T, et al. PKD2, a gene for polycystic kidney disease that encodes an integral membrane protein. Science. 1996;272(5266):1339-1342.
      • Rice JP, Saccone NL, Corbett J. Model-based methods for linkage analysis, and contemporary model-free methods for linkage analysis. In: Rao DC, Gu CC, eds. Genetic Dissection of Complex Traits. 2nd ed. London, UK: Academic Press; 2008:155-174.

      Homozygosity Mapping and Sequencing

      Although traditional positional cloning has enabled the successful discovery of a large number of Mendelian disease genes, the genetic cause of more than half of all known or suspected Mendelian diseases currently is unknown. The application of next-generation sequencing technologies in recent years has led to considerable progress in the field, in particular, the use of exome sequencing in Mendelian disease gene discovery. In this approach, only the protein-coding part of the genome (<2% of the whole genome sequence) is targeted and sequenced. Exome sequencing is likely to be a fruitful strategy because most of the known mutations causing Mendelian diseases are located in protein-coding regions.
      The exome sequencing workflow includes a step in which the DNA fragments are hybridized to oligonucleotide probes designed on the basis of all known and predicted protein-coding exons. The hybridized fragments then are recovered and sequenced. The produced short sequences are mapped to the human reference genome, and variants including SNPs and small insertions and deletions are identified. Currently available data suggest that around 20,000-30,000 high-confidence single-nucleotide variants are found in the exome of each individual. The final step is filtering the large number of variants depending on the mode of inheritance and other information, such as presence and frequency in a control data set and predicted function, to obtain a reasonably small number of candidate variants/genes.
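      The final filtering step of this workflow can be sketched as follows. Everything in this example is hypothetical: the record fields, the consequence labels, the gene names, and the 1% control frequency cutoff are chosen for illustration and are not taken from any particular pipeline. The recessive mode is simplified to homozygous variants only and ignores compound heterozygosity.

```python
from typing import NamedTuple

# Hypothetical consequence labels treated as potentially damaging in this sketch.
DAMAGING_CONSEQUENCES = {"nonsense", "frameshift", "splice_site", "missense"}


class Variant(NamedTuple):
    gene: str
    genotype: str             # "het" or "hom" in the affected individual
    control_frequency: float  # allele frequency in a control data set
    consequence: str          # predicted functional consequence


def filter_candidates(variants, inheritance, max_control_frequency=0.01):
    """Keep rare, predicted-damaging variants that fit the inheritance mode."""
    if inheritance not in ("recessive", "dominant"):
        raise ValueError("inheritance must be 'recessive' or 'dominant'")
    wanted_genotype = "hom" if inheritance == "recessive" else "het"
    return [
        v for v in variants
        if v.control_frequency <= max_control_frequency
        and v.consequence in DAMAGING_CONSEQUENCES
        and v.genotype == wanted_genotype
    ]


example_variants = [
    Variant("GENE_A", "hom", 0.0, "frameshift"),    # rare, damaging, homozygous
    Variant("GENE_B", "hom", 0.20, "missense"),     # too common in controls
    Variant("GENE_C", "het", 0.001, "nonsense"),    # heterozygous only
    Variant("GENE_D", "hom", 0.002, "synonymous"),  # not predicted damaging
]
recessive_candidates = filter_candidates(example_variants, "recessive")
print([v.gene for v in recessive_candidates])
```

      Each filter reflects one of the criteria named in the text: mode of inheritance, frequency in a control data set, and predicted function.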
      However, even after filtering, the number of candidate variants may still be too large to investigate all of them further. Exome sequencing therefore has been particularly successful when coupled with a mapping strategy. This means that only candidate variants, identified through sequencing, that fall into a specific genomic region implicated by the mapping approach are investigated further. One such mapping strategy is homozygosity mapping, a method to narrow down the location of genes causing recessive traits in consanguineous families. It searches for genomic regions in which segments of both chromosomes in an affected individual are inherited from the same ancestor (identical by descent). The search for disease-causing variants then is restricted to these regions. An illustration of the homozygosity mapping method is provided in Fig S1.
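      A strongly simplified sketch of this mapping idea is shown below. It scans genotyped markers on one chromosome for uninterrupted runs of homozygous genotypes and then keeps only the sequenced variants that fall inside such runs. The function names, the marker data, and the minimum run length are hypothetical. A run of homozygous markers is only identical by state; treating long runs as identical by descent is the assumption made here, and dedicated tools additionally allow for genotyping errors and use physical or genetic length thresholds.

```python
def homozygous_runs(markers, min_markers=3):
    """Return (start, end) positions of runs of consecutive homozygous markers.

    markers: list of (position, genotype) tuples sorted by position on one
    chromosome, with genotype given as a two-letter string such as "AA" or "AG".
    """
    runs = []
    start = None
    last = None
    count = 0
    for position, genotype in markers:
        if genotype[0] == genotype[1]:
            if start is None:
                start = position
                count = 0
            count += 1
            last = position
        else:
            if start is not None and count >= min_markers:
                runs.append((start, last))
            start = None
    if start is not None and count >= min_markers:
        runs.append((start, last))
    return runs


def variants_in_runs(variant_positions, runs):
    """Keep only candidate variant positions that lie inside a homozygous run."""
    return [p for p in variant_positions
            if any(start <= p <= end for start, end in runs)]


# Hypothetical marker genotypes from one affected child of consanguineous parents.
markers = [(100, "AA"), (200, "GG"), (300, "TT"), (400, "AG"), (500, "CC"), (600, "CC")]
runs = homozygous_runs(markers, min_markers=3)
print(runs, variants_in_runs([150, 450, 550], runs))
```

      Intersecting runs shared by several affected relatives would narrow the candidate region further.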
      A successful example of this approach is the search for genetic causes of nephronophthisis, an autosomal recessive kidney disease characterized by chronic tubulointerstitial nephritis that usually leads to ESRD in the first 3 decades of life. By 2008, mutations in 9 genes (encoding nephrocystin 1-9) were known to cause the disease, but together, mutations in these 9 genes accounted for only 30% of cases. In 2010, using homozygosity mapping coupled with candidate exome sequencing in families with nephronophthisis, SDCCAG8 was identified as the 10th nephronophthisis gene. In another study, mutations in the CUBN (cubilin) gene were identified as the cause of nephrotic syndrome by whole-exome sequencing in 2 siblings of consanguineous parents. In both studies, the candidate region obtained from homozygosity mapping significantly restricted the number of candidate variants and therefore greatly eased identification of the causal genes.
      Exome sequencing has limitations, such as the inability to find mutations in regulatory regions of genes or in yet unknown genes. Still, it is safe to expect that many more genes for Mendelian diseases will be discovered by exome sequencing in the coming years. With the decreasing cost of next-generation sequencing, whole-genome sequencing in addition to or instead of whole-exome sequencing will be applied more often in disease gene discovery. This likely will lead to the identification of additional mutations outside the coding sequence, such as intronic or regulatory variants. Recent data from the Encyclopedia of DNA Elements (ENCODE) project suggest that >80% of the genome can be assigned at least one potential function, and common disease-associated variants are concentrated in regulatory DNA regions such as transcription factor–binding sites. It will be interesting to see to what degree these findings extend to Mendelian disease genes, which will have to be answered by sequencing of specific genomic regions or entire genomes.
      Using both the traditional and modern approaches, many genes underlying monogenic forms of kidney disease have been identified during the last 2 decades. A comprehensive and recent review is provided by Hildebrandt.

      Additional Readings

      • Al-Romaih KI, Genovese G, Al-Mojalli H, et al. Genetic diagnosis in consanguineous families with kidney disease by homozygosity mapping coupled with whole-exome sequencing. Am J Kidney Dis. 2011;58(2):186-195.
      • Bamshad MJ, Ng SB, Bigham AW, et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011;12(11):745-755.
      • ENCODE Project Consortium, Dunham I, Kundaje A, Aldred SF, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57-74.
      • Hildebrandt F. Genetic kidney diseases. Lancet. 2010;375(9722):1287-1295.
      • Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 1987;236(4808):1567-1570.
      • Metzker ML. Sequencing technologies—the next generation. Nat Rev Genet. 2010;11(1):31-46.
      • Otto EA, Hurd TW, Airik R, et al. Candidate exome capture identifies mutation of SDCCAG8 as the cause of a retinal-renal ciliopathy. Nat Genet. 2010;42(10):840-850.
      • Ovunc B, Otto EA, Vega-Warner V, et al. Exome sequencing reveals cubilin mutation as a single-gene cause of proteinuria. J Am Soc Nephrol. 2011;22(10):1815-1820.

      Genetic Association Studies to Investigate Complex Diseases

      Prior to the advent of genome-wide association studies (GWAS), candidate gene association studies were the main study design to evaluate genes underlying complex diseases. The candidate genes and variants were chosen based on known or suspected biology. This approach has had limited success, and many reported associations have been difficult to replicate. Two examples from candidate gene association studies that could be replicated successfully are the associations of eGFR and CKD with genetic variants in the APOE and TCF7L2 genes.
      GWAS use dense maps of common SNPs that cover the whole genome to systematically search for allele frequency differences between cases and controls or for associations between genotype and a continuous parameter such as eGFR. As opposed to candidate gene association studies, GWAS are unbiased with respect to prior biological knowledge. Since 2005, the availability of catalogs of SNPs spanning the entire genome, the possibility to carry out high-throughput genotyping, and advances in the statistical analysis of such data have enabled the conduct of GWAS. As a result, the genetics community has seen a crop of results from GWAS of complex diseases, which are consistently collected and updated in a database maintained by the US National Human Genome Research Institute (www.genome.gov/GWAstudies).
      Figure 2 illustrates the principle behind GWAS. The combination of alleles for any biallelic SNP results in the presence of 3 possible genotype combinations in a given population. Statistical methods such as simple linear or logistic regression then are applied to test for differences in the phenotype, for example, differences in mean eGFR, across the 3 genotype groups, yielding 1 P value for each test/SNP. This procedure is repeated for millions of SNPs. As more and more genetic variants become known, it is possible to infer genotypes at millions of additional genomic locations using imputation methods. An important principle behind this method is linkage disequilibrium, which refers to the occurrence of certain allele combinations across SNPs in proportions different from those expected from a random combination based on the allele frequencies. One of the reasons for linkage disequilibrium is that SNPs in close physical proximity on a chromosome often are inherited together and therefore are correlated because it is unlikely that nearby variants are separated by a recombination event during gamete formation. For example, as illustrated in Fig 2, by genotyping SNP1 and having information about a high degree of correlation between SNP1 and SNP2 from a reference population, one can infer that individuals carrying the T allele at SNP1 also carry the A allele at SNP2, which has not been genotyped. By genotyping a subset of common SNPs across the genome, it therefore is possible to capture much of the information at millions of other common SNPs that have not been genotyped. With millions of tests conducted, a multiple testing correction becomes of paramount importance in order to reduce the number of false-positive findings. Among populations of European ancestry, a statistical significance threshold of 5×10⁻⁸ is commonly used to indicate genome-wide significance, which corresponds to a correction for the approximately 1 million estimated independent common SNPs in the genome (0.05/1 million). Results of GWAS then are summarized in a number of typical plots, 2 of which are the so-called Manhattan plot and the regional association plot, illustrated in Fig 3.
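      The single-SNP test and the multiple testing correction described in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the analysis used in any specific GWAS: genotypes are coded as the number of copies of one allele (0, 1, or 2), the phenotype is regressed on that count by ordinary least squares without covariates, and the P value uses a normal approximation that is reasonable only for large samples. The function name and the toy data are hypothetical.

```python
import math

# Correction for roughly 1 million independent common SNPs, as in the text.
GENOME_WIDE_THRESHOLD = 0.05 / 1_000_000


def additive_association(allele_counts, phenotypes):
    """Regress a quantitative trait on allele count; return (beta, se, p)."""
    n = len(allele_counts)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_x = sum(allele_counts) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(allele_counts, phenotypes))
    beta = sxy / sxx                      # trait change per allele copy
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(allele_counts, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, se, 0.0
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return beta, se, p_value


# Toy data: eGFR by number of copies of a hypothetical risk allele.
beta, se, p_value = additive_association(
    [0, 0, 1, 1, 2, 2], [90.0, 92.0, 85.0, 87.0, 80.0, 82.0])
print(f"beta={beta:.1f}, se={se:.3f}, genome-wide significant: "
      f"{p_value < GENOME_WIDE_THRESHOLD}")
```

      In a real study this test would be run once per genotyped or imputed SNP, and only associations with P values below the genome-wide threshold would be carried forward to replication.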
      Figure 2. Principle of genotype-phenotype association in genome-wide association studies. Abbreviations: GFR, glomerular filtration rate; LD, linkage disequilibrium; SNP, single-nucleotide polymorphism.
      Figure 3. Typical graphical presentation of findings from genome-wide association studies (GWAS). For both panels, the y axis shows the −log10(P values), and the x axis shows the chromosomal location. The closest gene in each region is indicated. (Upper panel) Manhattan plot from a GWAS of glomerular filtration rate estimated from serum creatinine level. The horizontal dotted line indicates the genome-wide significance threshold at 5×10⁻⁸. (Lower panel) The regional association plot provides a zoomed-in view of a region on chromosome 13 that contains significantly associated single-nucleotide polymorphisms (SNPs). The pairwise correlation between the SNP with the lowest P value (rs626277) and all other SNPs is color coded based on r², a measure of linkage disequilibrium. Abbreviations: cM, centi-Morgan; hg, human genome; kb, kilobases; Mb, megabases.
      Reproduced from Köttgen et al (Nat Genet. 2010;42(5):376-384) with permission of Nature Publishing Group.
      In the field of complex kidney traits and diseases, GWAS have been conducted in population-based studies and case-control studies of specific kidney diseases. Because effect sizes (the risks conferred) of individual genetic susceptibility variants for complex diseases typically are small (see Table 1), these effects usually cannot be discovered in any one single study. Hence, the work typically is carried out in large consortia that combine genotype-phenotype association results from many individual studies by meta-analyses. For example, the upper panel of Fig 3 shows results of a screen for genetic variants associated with eGFR in the CKDGen Consortium, to which 20 population-based studies with about 67,000 individuals with genome-wide SNP data and serum creatinine measurements contributed. Multiple genetic regions contained SNPs significantly associated with eGFR. Most, but not all, of these associations could be replicated in independent study populations, emphasizing the need for independent replication studies even with strict statistical significance thresholds. The bottom panel of Fig 3 represents a regional association plot, which is a close-up view of the Manhattan plot for a specific genomic region. In this case, the SNP with the lowest P value is located within a gene. However, it is unusual that the gene underlying the observed association can be assigned unambiguously, for example, because an associated intergenic SNP does not necessarily influence the function of the closest gene. In addition, SNPs are only naturally occurring genetic markers. The observed association with a SNP marker may reflect the effect of a true unknown causal variant that is not genotyped or imputed, but is correlated with the observed SNP. 
      GWAS among population-based studies have been applied successfully to identify genetic variants associated with serum creatinine, eGFR, CKD defined as eGFR <60 mL/min/1.73 m², UACR, microalbuminuria, and serum urea nitrogen in individuals of European, African American, and East Asian ancestry.
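      How consortia combine per-study results for one SNP can be illustrated with an inverse-variance weighted fixed-effect meta-analysis. The text does not say which meta-analysis method was used, so this choice is an assumption made for illustration, and the function name and effect estimates are hypothetical.

```python
import math


def fixed_effect_meta(study_estimates):
    """Combine (beta, standard_error) pairs by inverse-variance weighting."""
    if not study_estimates:
        raise ValueError("at least one study is required")
    if any(se <= 0 for _, se in study_estimates):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for _, se in study_estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, study_estimates))
    pooled_beta /= total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-allele effects on a kidney trait from two studies.
pooled_beta, pooled_se = fixed_effect_meta([(-0.10, 0.05), (-0.20, 0.05)])
print(f"pooled beta={pooled_beta:.3f}, pooled se={pooled_se:.4f}")
```

      The pooled standard error is smaller than that of either study, which is why combining many studies can reveal the small effect sizes typical of complex disease variants.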
      In addition to population-based studies, GWAS also have successfully identified susceptibility genes for specific kidney diseases in case-control studies, for example, of immunoglobulin A (IgA) nephropathy, idiopathic membranous nephropathy, or kidney cell carcinoma. One specific genome-wide approach used admixture mapping, a method in populations of mixed ancestry to identify genetic loci that contribute to differences in disease prevalence observed between the different ancestral populations. This approach led to the identification of the MYH9/APOL1 risk locus for nondiabetic ESRD in African Americans (see later section of this article).
      It is important to reiterate that the SNPs identified in association studies are merely markers, and an observed association between a SNP and a disease does not imply a causal role for the SNP in this disease. Table 2 highlights this issue; it contrasts the research techniques used to identify causes of monogenic and those of complex kidney diseases, as well as the inferences that can be drawn from these approaches. Several recent reviews listed next provide a detailed overview of findings in the field of complex kidney disease genetics, along with a comprehensive list of original articles.
      Table 2. Genetic Research Techniques and Interpretation of Their Results for Monogenic Versus Complex Diseases
      Characteristic | Monogenic Diseases | Complex Diseases
      Example | ADPKD | CKD
      Recruitment of participants | Families or siblings; recently extended to single affected individuals in sequencing projects | Large numbers of related or unrelated individuals
      Technique for mapping the genomic region | Investigation of affected families, eg, by linkage analysis or homozygosity mapping and sequencing | Association studies
      Identified genomic region | Large (usually many Mb); depending on the number of markers, intervals can contain dozens to several hundreds of positional candidate genes | More circumscribed (usually 50-500 kb); typically contain 1-12 candidate genes
      Causal gene identification | Possible, attempted by screening all positional candidate genes for mutations | More difficult, sometimes attempted indirectly by linking with additional information, such as gene expression
      Successful identification of causal mutation | Often | Rarely
      Pathophysiology understood | In ADPKD, some processes and cellular mechanisms are understood, although the exact mechanisms of cyst formation are still not understood | Understanding of molecular mechanisms in its infancy; the first studies measuring levels of the gene products to assess the ability to predict disease risk are slowly emerging
      Used for risk counseling | Yes | No
      Specific treatment available | No | No
      Abbreviations: ADPKD, autosomal dominant polycystic kidney disease; CKD, chronic kidney disease.

      Additional Readings

      • Böger CA, Chen MH, Tin A, et al. CUBN is a gene locus for albuminuria. J Am Soc Nephrol. 2011;22(3):555-570.
      • Chambers JC, Zhang W, Lord GM, et al. Genetic loci influencing kidney function and chronic kidney disease. Nat Genet. 2010;42(5):373-375.
      • Genovese G, Friedman DJ, Ross MD, et al. Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science. 2010;329(5993):841-845.
      • Gharavi AG, Kiryluk K, Choi M, et al. Genome-wide association study identifies susceptibility loci for IgA nephropathy. Nat Genet. 2011;43(4):321-327.
      • Hsu CC, Kao WH, Coresh J, et al. Apolipoprotein E and progression of chronic kidney disease. JAMA. 2005;293(23):2892-2899.
      • Kao WH, Klag MJ, Meoni LA, et al. MYH9 is associated with nondiabetic end-stage renal disease in African Americans. Nat Genet. 2008;40(10):1185-1192.
      • Kopp JB, Smith MW, Nelson GW, et al. MYH9 is a major-effect risk gene for focal segmental glomerulosclerosis. Nat Genet. 2008;40(10):1175-1184.
      • Köttgen A. Genome-wide association studies in nephrology research. Am J Kidney Dis. 2010;56(4):743-758.
      • Köttgen A, Pattaro C, Böger CA, et al. New loci associated with kidney function and chronic kidney disease. Nat Genet. 2010;42(5):376-384.
      • McKnight AJ, Currie D, Maxwell AP. Unravelling the genetic basis of renal diseases; from single gene to multifactorial disorders. J Pathol. 2010;220(2):198-216.
      • Okada Y, Sim X, Go MJ, et al. Meta-analysis identifies multiple loci associated with kidney function-related traits in East Asian populations. Nat Genet. 2012;44(8):904-909.
      • O'Seaghdha CM, Fox CS. Genome-wide association studies of chronic kidney disease: what have we learned? Nat Rev Nephrol. 2011;8(2):89-99.
      • Stanescu HC, Arcos-Burgos M, Medlar A, et al. Risk HLA-DQA1 and PLA(2)R1 alleles in idiopathic membranous nephropathy. N Engl J Med. 2011;364(7):616-626.

      Assessing Evidence From Genetic Association Studies

      To keep up with the multitude of recent genetic findings in the field of kidney disease, a systematic way to evaluate the literature is helpful. Table 3 provides an overview of important questions that readers should ask when evaluating results from genetic association studies or reporting results from their own studies. These questions can be grouped according to the different parts of a study, as discussed next.
      Table 3. Important Questions to Assess When Evaluating Evidence From Genetic Association Studies
      Question | Comment
      Study Design and Sample Selection
      Is the study sufficiently powered to detect a genetic effect? | Authors should provide power calculations; sufficiently powered studies have >80% power to detect an association for a variant of given effect size and allele frequency at an observed disease prevalence
      Are cases and controls of the same ancestry? | If allele frequencies and disease risk differ among subgroups of different ancestries, the association can be confounded; appropriate methods need to be used to address this
      Genotyping
      Is the genotyping call rate reported and of sufficient quality? | The call rate typically should be >90% and often is >95% or even >98%
      Do authors report on concordance (genotype reproducibility)? | Concordance of genotypes obtained from blind duplicates should be high (>99% for high-throughput centers)
      Do authors examine conformity to HWE? Is HWE examined among controls and among groups of the same ancestry? | If tested, HWE departure should be assessed among controls, or in population-based samples, among individuals of the same ancestry; SNPs that do not conform to HWE expectations should be examined closely for potential genotyping errors and other reasons for departure from HWE expectations
      Are genotypes or haplotypes inferred and is a quality measure provided? | Imputed (inferred) genotypes should be of sufficient quality; a typical quality metric used in genome-wide association studies is the r² between true allele counts and estimated allele counts; an r² measure >0.3 indicates acceptable quality, but some believe it should be higher
      Statistical Association
      Have the authors controlled for population stratification? | In the setting of genome-wide association studies, this typically is done using methods such as genomic control or principal components analyses
      Have the authors appropriately accounted for relatedness? | Simple regression models often used in association studies assume independent observations; multiple approaches exist to account for the nonindependence among related individuals
      Do the authors provide the counts of individuals in each genotype category? | For variants with low minor allele frequencies, only a few individuals are homozygous for the rare allele even in studies of moderate size; this is especially relevant to binary outcomes, for which the number of cases homozygous for the rare genotype likely will be very small
      Did the authors account for multiple testing? | Genome-wide association studies typically use a Bonferroni correction for 1 million independent tests (P < 5 × 10⁻⁸), or less stringent methods such as other procedures that control the family-wise error rate or the false discovery rate
      Replication
      Has the association been replicated in an independent sufficiently powered sample with the same phenotype? | The replication sample should be independent, similar to the discovery sample, and have information for the same phenotype; in a sufficiently powered replication sample, the observed effect size should be in the same direction and of magnitude similar to the initially observed association
      Abbreviations: HWE, Hardy-Weinberg equilibrium; SNP, single-nucleotide polymorphism.

      Study Design and Sample Selection

      Reports of the study design should address the issue of adequate statistical power by providing power calculations. Insufficiently powered studies can lead to failure to identify susceptibility variants of moderate or small effects or failure to replicate true associations (false negatives). For this reason, GWAS have been particularly successful when conducted in large consortia with adequate statistical power.
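      To make the dependence of power on sample size concrete, the following sketch approximates the power of a single-SNP test for a quantitative trait such as eGFR. It assumes a two-sided 1-degree-of-freedom test with a normal approximation; the function name, the assumed fraction of trait variance explained by the SNP (0.05%), and the threshold of 5 × 10⁻⁸ are illustrative assumptions rather than values taken from a specific study.

```python
import math
from statistics import NormalDist


def gwas_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df test for a quantitative trait.

    n: number of individuals
    variance_explained: fraction of trait variance due to the SNP (0 < q < 1)
    alpha: significance threshold (default: a common genome-wide threshold)
    """
    if not 0.0 < variance_explained < 1.0:
        raise ValueError("variance_explained must be between 0 and 1")
    std_normal = NormalDist()
    q = variance_explained
    ncp = n * q / (1.0 - q)                    # noncentrality parameter
    shift = math.sqrt(ncp)                     # expected |z| under the alternative
    z_crit = -std_normal.inv_cdf(alpha / 2.0)  # two-sided critical value
    # Probability that the test statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical SNP explaining 0.05% of trait variance, at several sample sizes.
for n in (5_000, 20_000, 67_000, 150_000):
    print(n, round(gwas_power(n, 0.0005), 3))
```

      Under these assumptions, a study of a few thousand individuals has almost no chance of detecting such a variant at genome-wide significance, whereas consortium-sized samples reach useful power.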
      Differences in risk-allele frequency and disease risk across groups of different ancestry can lead to biased associations when evaluating the combined groups, a phenomenon termed population stratification. Population stratification is an important concern, and ancestry differences between cases and controls or within a population studied for a continuous parameter such as eGFR need to be accounted for. Across study populations of different ancestry, recent evidence shows that the same genetic variants discovered in individuals of European ancestry, or other variants located nearby, also associate with eGFR and UACR in individuals of non-European ancestry, such as those who are African American or East Asian. This supports the notion that certain kidney disease risk genes are important across populations, whereas individual variants within these genes may vary in their frequency and association with the kidney disease.

      Additional Reading

      • Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005;6(2):95-108.

      Genotyping

      Several issues should be considered when evaluating the quality of genotyping, as detailed in Table 3. The call rate for a given variant is defined as the proportion of successfully genotyped individuals among all in whom genotyping was attempted. Call rates in published genetic association studies of kidney diseases typically are >90% and often are >98%. Genotyping error rates typically are measured by evaluating genotype concordance among blind duplicate samples. Such samples should be included in any genotyping experiment, and concordance rates should be high (eg, 99%).
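      The two quality measures defined above reduce to simple proportions. The sketch below is illustrative only: the function names, the coding of a failed call as None, and the convention of skipping duplicate pairs with a failed call are assumptions, and the genotypes are made up.

```python
def call_rate(genotypes):
    """Fraction of attempted samples with a successful genotype call.

    genotypes: one entry per attempted sample; None marks a failed call.
    """
    if not genotypes:
        raise ValueError("no genotyping attempts supplied")
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def concordance(first_run, second_run):
    """Fraction of blind duplicate pairs with identical calls.

    Pairs in which either call failed are skipped (an assumed convention).
    """
    pairs = [(a, b) for a, b in zip(first_run, second_run)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no duplicate pair with two successful calls")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical calls for one SNP; None = genotyping failed.
snp_calls = ["AA", "AG", None, "GG", "AG", "AA", "AA", None, "AG", "AA"]
print(f"call rate: {call_rate(snp_calls):.0%}")

# Hypothetical blind duplicates genotyped twice.
dup_a = ["AA", "AG", "GG", "AG", None]
dup_b = ["AA", "AG", "GG", "AA", "AG"]
print(f"concordance: {concordance(dup_a, dup_b):.0%}")
```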
      Another indicator of poor genotyping quality can be departures of genotype frequencies from their expected distribution based on the Hardy-Weinberg principle. This principle proposes that genotype distributions should conform to Hardy-Weinberg proportions (p² + 2pq + q² = 1, where p and q are the major and minor allele frequencies, respectively) in a steady state and remain constant through generations. Because there are several reasons other than genotyping error for deviations from Hardy-Weinberg equilibrium, SNPs that violate Hardy-Weinberg equilibrium assumptions should be examined more closely rather than discarded automatically.
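      A check of Hardy-Weinberg proportions can be sketched as follows. The example uses a 1-degree-of-freedom chi-square goodness-of-fit test, which is one common choice; exact tests often are preferred when genotype counts are small. The function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions for a biallelic SNP.

    n_aa, n_ab, n_bb: observed genotype counts.
    Returns (chi_square, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)    # frequency of allele A
    q = 1.0 - p                        # frequency of allele B
    # Expected counts under Hardy-Weinberg proportions p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("monomorphic SNP: test undefined")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical controls: counts that match and counts that depart from HWE.
print(hwe_chi_square(360, 480, 160))
print(hwe_chi_square(400, 400, 200))
```

      In the second hypothetical example there are too few heterozygotes for the observed allele frequencies, which would prompt a closer look at the genotyping of that SNP.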
      Finally, many SNP markers used in GWAS are inferred rather than directly genotyped. This means that they are imputed based on the study participants' genotyped SNPs and a more densely genotyped reference population such as those in the HapMap or the 1000 Genomes projects. The accuracy of imputation varies for each SNP, and commonly used measures of imputation quality should be provided for SNPs that show significant associations with kidney diseases.
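      The r² imputation quality metric mentioned in Table 3 can be illustrated as the squared correlation between true and imputed allele counts. This sketch assumes that true genotypes are available for comparison, for example, because genotyped SNPs were masked and then imputed; in routine use, the true genotypes are unknown and imputation software reports an estimate of this quantity instead. The function name and all values are hypothetical.

```python
def imputation_r2(true_counts, dosages):
    """Squared Pearson correlation between true and imputed allele counts.

    true_counts: true allele counts per individual (0, 1, or 2)
    dosages:     imputed expected allele counts (between 0 and 2)
    """
    n = len(true_counts)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    mean_t = sum(true_counts) / n
    mean_d = sum(dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_counts, dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_counts)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        raise ValueError("no variation in allele counts")
    return cov * cov / (var_t * var_d)


# Hypothetical masked genotypes and the dosages imputed for them.
true_counts = [0, 1, 2, 1, 0, 2, 1, 0]
dosages = [0.1, 0.9, 1.8, 1.2, 0.0, 1.9, 0.7, 0.3]
print(f"imputation r2 = {imputation_r2(true_counts, dosages):.3f}")
```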

      Additional Reading

      • Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010;11(7):499-511.

      Statistical Analysis

      When evaluating statistical associations between genotype and phenotype, it is important to assess the potential impact of population stratification on the results (Table 3). In the setting of GWAS, the most common methods to control for the influence of different subgroups within the study sample are genomic control and the inclusion of additional covariates derived from principal components analyses. In addition, appropriate methods should be used to account for the presence of related rather than only independent individuals in a study population. For significantly associated genetic risk variants, the numbers of individuals with each genotype should be provided for cases and controls. This is particularly important for rare genetic variants because it allows for assessing how many individuals were actually homozygous for the rare allele. Finally, as we explained in a previous section, when evaluating associations between different and independent genetic variants and disease, it is important to correct for multiple hypothesis testing. This is particularly relevant when millions of tests are conducted, such as in GWAS.
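      Two of the corrections named above can be sketched in a few lines. The example assumes 1-degree-of-freedom chi-square test statistics, computes the genomic inflation factor as the ratio of the observed median statistic to its expected value under the null, and derives a Bonferroni threshold for 1 million tests. Function names and test statistics are hypothetical; a real analysis would use the statistics of all SNPs genome-wide.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median chi-square / expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each 1-df chi-square statistic by lambda (never by less than 1)."""
    lam = max(1.0, genomic_inflation(chi2_stats))
    return [x / lam for x in chi2_stats]


def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Hypothetical test statistics from a small set of SNPs (not real data).
stats = [0.02, 0.15, 0.35, 0.50, 0.62, 0.90, 1.40, 2.70, 4.10]
print(f"lambda = {genomic_inflation(stats):.2f}")
print([round(x, 2) for x in genomic_control(stats)])
print(f"Bonferroni threshold = {bonferroni_threshold():.1e}")
```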

      Additional Readings

      • Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7(10):781-791.
      • Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55(4):997-1004.
      • Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904-909.

      Replication

      One of the most crucial aspects to enhance the credibility of study findings is successful replication of the identified associations (Table 3). Here, readers should assess whether the replication sample was independent of the discovery sample and adequately powered to replicate the observed association, whether the same or a highly correlated genetic variant was studied, and whether the same phenotype was investigated. If this is the case, the effect of the genetic variant on the trait should be in the same direction as in the initial study and of similar or somewhat smaller magnitude (due to the “winner's curse” phenomenon).
      A detailed overview of these questions, as well as additional questions that pertain to any observational study, is provided in an article by the NCI-NHGRI Working Group on Replication in Association Studies and in the STREGA (Strengthening the Reporting of Genetic Association Studies) guidelines.

      Additional Readings

      • Little J, Higgins JP, Ioannidis JP, et al. STrengthening the REporting of Genetic Association Studies (STREGA): an extension of the STROBE statement. PLoS Med. 2009;6(2):e22.
      • NCI-NHGRI Working Group on Replication in Association Studies. Replicating genotype-phenotype associations. Nature. 2007;447(7145):655-660.

      Clinical Observations and Implications

      The accumulating evidence in the field of kidney disease genetics has provided the community with some clinically important insights on a variety of subjects.

      Molecular Mechanisms

      Understanding the molecular mechanisms of how genetic mutations cause disease is a lengthy and laborious process. For example, since the identification of the ADPKD genes PKD1 and PKD2 more than 15 years ago, it has been understood that the encoded proteins polycystin 1 and polycystin 2 interact and operate as a membrane receptor–ion channel complex. Although the function of the 2 proteins has been studied extensively, it remains elusive exactly how defects in polycystin proteins that result from genetic mutations lead to the development of kidney cysts. Gene identification therefore represents only the beginning of a better understanding of molecular mechanisms of disease. Elucidation of the mechanisms behind the genes identified by GWAS is only beginning. Early approaches include attempts to identify the important gene in an associated region by evaluating an effect on transcript expression or by screening in model organisms, for example, through knock-down experiments in zebrafish using synthetic antisense oligonucleotides (“Morpholinos”). Measurement of gene products in biospecimens such as urine is another early attempt to identify novel biomarkers of kidney disease.

      Additional Readings

      • Gallagher AR, Germino GG, Somlo S. Molecular advances in autosomal dominant polycystic kidney disease. Adv Chronic Kidney Dis. 2010;17(2):118-130.
      • Köttgen A, Hwang SJ, Larson MG, et al. Uromodulin levels associate with a common UMOD variant and risk for incident CKD. J Am Soc Nephrol. 2010;21(2):337-344.

      Phenotype Definition

      The definition of the phenotype affects the ability to identify underlying susceptibility genes. Because CKD is defined by clinical measures of kidney function, it includes individuals with a wide variety of disease causes, such as diabetic kidney disease, primary glomerular disease, and hypertensive kidney disease. Some of the subentities are well defined, and specific tests such as biopsy-proven diagnoses exist. A precisely defined and specific phenotype definition can lead to a substantial decrease in sample-size requirements to identify complex kidney disease risk genes. For example, the CKDGen Consortium identified 16 CKD susceptibility genes among 67,000 population-based individuals (including ∼6,000 CKD cases defined as eGFR <60 mL/min/1.73 m²). In contrast, a recent meta-analysis of GWAS of idiopathic membranous nephropathy identified 2 novel genome-wide significant susceptibility regions with only 556 patients. The associations were detected in 3 individual studies with as few as 75 cases; the 2 identified genes were not identified in the screen of the CKDGen Consortium. These findings could indicate that using a broad clinical definition of CKD only allows for the identification of genetic risk factors that influence common pathophysiologic mechanisms underlying kidney disease regardless of the underlying cause. This interpretation is supported by the observation that risk genes identified in association with CKD to date have shown little evidence for a differential effect depending on the presence of hypertension or diabetes.
      Another interesting observation is that genetic associations may help redefine phenotypes that have been assigned previously based on clinical observations. For example, recent findings indicate that a substantial proportion of the hypertensive glomerulosclerosis observed in African Americans may be attributable to genetic variation in the MYH9/APOL1 gene region, leading to segmental or global glomerulosclerosis. It therefore will be interesting to observe whether the incorporation of genetic evidence will lead to reclassification of underlying disease cause.

      Additional Readings

      • Freedman BI, Hicks PJ, Bostrom MA, et al. Polymorphisms in the non-muscle myosin heavy chain 9 gene (MYH9) are strongly associated with end-stage renal disease historically attributed to hypertension in African Americans. Kidney Int. 2009;75(7):736-745.
      • Pattaro C, Köttgen A, Teumer A, et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet. 2012;8(3):e1002584.

      Insights Into Disease Prevalence

      Results from recent GWAS also have provided valuable insights into the genetic contribution to observed differences in kidney disease prevalence. Most impressively, genetic variation at the aforementioned MYH9/APOL1 locus explains a large part of the excess risk of nondiabetic ESRD observed in African Americans compared with European Americans. One potential explanation for the high frequency of the risk allele in African American individuals could be the ability of the genetic ESRD risk variant to lyse trypanosomes, which may represent a selective advantage in Africa. In the case of IgA nephropathy, differences in risk allele frequencies at susceptibility loci identified through GWAS correlate with differences in disease prevalence among world populations. These differences help explain the observation that IgA nephropathy is relatively common in Asia, less prevalent in Europe, and rare in Africa.

      Additional Readings

      • Genovese G, Friedman DJ, Ross MD, et al. Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science. 2010;329(5993):841-845.
      • Kiryluk K, Li Y, Sanna-Cherchi S, et al. Geographic differences in genetic susceptibility to IgA nephropathy: GWAS replication study and geospatial risk analysis. PLoS Genet. 2012;8(6):e1002765.

      Genetic Testing and Risk Prediction

      Genetic testing currently is part of clinical practice for monogenic diseases that are highly penetrant. Knowing the presence of disease-causing mutations can help in clinical decision making, such as patient counseling, monitoring, and selection and initiation of treatment. For example, genetic testing conducted prior to kidney transplantation can exclude ADPKD mutations in a donor who is related to the organ recipient or facilitate targeted screening for PKD-related complications such as urinary tract infections that otherwise may go unnoticed. The benefits of genetic testing need to be weighed carefully against the limited availability of preventive treatments to slow kidney function decline, as well as against potential issues related to employment or health insurance status for known mutation carriers. The decision to conduct and interpret the results of genetic testing should be considered further in light of the variable genotype-phenotype correlations observed in ADPKD.
      For CKD, neither genetic testing nor individual risk prediction has yet been proved useful. In an article by O'Seaghdha et al, a genetic risk score based on 16 common SNPs identified by GWAS of eGFR did not predict incident CKD in a population-based study any better than did common clinical risk factors. Large studies with information about the incidence of additional forms of kidney disease are needed to evaluate whether the predictive ability of genetic risk variants is higher for other complex and possibly more specific kidney diseases. As investigators start to follow up on findings from GWAS, risk prediction also may be improved by the identification of risk markers more proximal to the disease than an associated genetic variant. For instance, although associations between a genetic variant and disease may be modest, the association of the concentration of the protein encoded by the gene with disease may be stronger and more direct. At present, the value of genetic discoveries using GWAS seems to be in the identification of mechanisms and potential drug targets rather than predicting disease based on common risk variants with small effects. Finally, even if accurate risk prediction were possible, a potential gap between the prediction and the availability of specific therapies is an important issue that needs to be considered for both monogenic and complex kidney diseases.
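      The general idea of a genetic risk score can be sketched as a sum of risk-allele counts across SNPs, optionally weighted by per-SNP effect estimates. Whether the score in the cited study was weighted is not stated here, so both variants are shown as assumptions; the function name, genotypes, and weights are hypothetical.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum of risk-allele counts (0, 1, or 2 per SNP), optionally weighted.

    risk_allele_counts: one count per SNP for a single individual
    weights: optional per-SNP weights (eg, effect estimates); default is 1 each
    """
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("need one weight per SNP")
    if any(c not in (0, 1, 2) for c in risk_allele_counts):
        raise ValueError("risk-allele counts must be 0, 1, or 2")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


# Hypothetical individual genotyped at 16 SNPs (risk-allele count per SNP).
person = [1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 1, 2, 0, 1, 1, 0]
print("unweighted score:", genetic_risk_score(person))

# Hypothetical per-SNP weights; real weights would come from published effects.
weights = [0.5] * 8 + [1.5] * 8
print("weighted score:", genetic_risk_score(person, weights))
```

      Because each common variant contributes only a small effect, such a score separates individuals far less sharply than a highly penetrant mutation does, which is consistent with the limited predictive value reported above.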

      Additional Readings

      • Huang E, Samaniego-Picota M, McCune T, et al. DNA testing for live kidney donors at risk for autosomal dominant polycystic kidney disease. Transplantation. 2009;87(1):133-137.
      • O'Seaghdha CM, Yang Q, Wu H, Hwang SJ, Fox CS. Performance of a genetic risk score for CKD stage 3 in the general population. Am J Kidney Dis. 2012;59(1):19-24.

      Challenges and Future Directions

      A current challenge in the field includes elucidation of the genetic architecture of kidney diseases, taking into account the full spectrum of genetic variation. To date, research on monogenic diseases has focused on rare disease-causing mutations, and research on complex diseases, on the effects of common susceptibility variants, partly due to technical reasons (see Table 1). However, the kidney phenotype in any individual patient likely is a consequence of both rare and common genetic risk variants, including structural and possibly epigenetic modifications, as well as their interaction with each other and with environmental factors.
      The discrepancy between heritability estimates and the amount of phenotypic variation that can be explained by the risk variants identified to date has been termed “missing heritability.” For example, the estimated heritability for eGFR is ∼40%, yet common risk variants identified by GWAS together explain only ∼1.4% of the variance in eGFR. Potential explanations for this gap could be the presence of additional risk variants not captured by common GWAS SNPs, unmodeled gene-gene and gene-environment interactions, parent-of-origin effects, or inflated previous heritability estimates. Evaluation of the contribution of rare variants with presumably larger effect sizes on kidney disease susceptibility therefore is an important future direction for complex kidney diseases. For monogenic kidney diseases, whole-exome and whole-genome sequencing will allow not only for the identification of possibly disease-causing mutations, but also for the evaluation of an additional impact from common risk variants and modifier genes.
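      The size of the gap for eGFR follows directly from the two figures quoted above. The short calculation below uses only those approximate values (∼40% heritability, ∼1.4% of variance explained); the function name is hypothetical.

```python
def fraction_of_heritability_explained(variance_explained, heritability):
    """Share of the heritable trait variance accounted for by known variants."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return variance_explained / heritability


# Approximate values quoted in the text for eGFR.
explained = fraction_of_heritability_explained(0.014, 0.40)
print(f"explained: {explained:.1%} of heritability; missing: {1 - explained:.1%}")
```

      Under these approximate figures, the identified common variants account for only a few percent of the estimated heritable variation in eGFR.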
      Another challenge that follows from the identification of individual genetic susceptibility variants is the identification of individualized therapeutic strategies based on genetic risk profile. Although it is a realistic option to initiate general treatment and monitoring early in high-risk individuals, treatment of kidney diseases based on a specific genetic profile currently is in its infancy. This type of truly personalized medicine has already become reality in some other fields, with a prominent example being the evolving genotype-guided dosing of patients starting therapy with the anticoagulant warfarin. In a recent randomized controlled trial, genotype-based dosing of warfarin improved prediction of the therapeutic dose over that based on clinical parameters only. More generally, ethical challenges arise from the availability of complete genomic information. These include the handling of incidental findings, the presence of multiple rare mutations of unclear functional significance, and limited treatment options for the predicted disease.
      A promising future direction is the elucidation of physiologic mechanisms underlying kidney function by linking genetic variation with variation in biological pathways. A snapshot of the latter can be obtained by comprehensive measurement using high-throughput approaches such as metabolomics. For example, by linking genetic variation to concentrations of serum metabolites, the authors of a recent study found that genetic variants at the previously described CKD risk locus NAT8 associated with concentrations of N-acetylornithine. Concentrations of N-acetylornithine then were found to associate with reduced eGFR, motivating further research into a potential role of ornithine acetylation in kidney disease.

      Additional Readings

      • Burmester JK, Berg RL, Yale SH, et al. A randomized controlled trial of genotype-based Coumadin initiation. Genet Med. 2011;13(6):509-518.
      • Suhre K, Shin SY, Petersen AK, et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature. 2011;477(7362):54-60.
      • Zaitlen N, Kraft P. Heritability in the genome-wide association era. Hum Genet. 2012;131(10):1655-1664.

      Conclusion

      In recent years, genetic investigations of kidney disease have provided many novel insights into not only monogenic but also, and especially, complex forms of kidney disease. Although truly personalized medicine based on a patient's individual genomic information is still in its infancy, findings from Mendelian kidney diseases and recently also from complex kidney diseases have already proved highly useful in the identification of pathways that contribute to kidney disease pathophysiology. A better understanding of these mechanisms can serve as a basis for the identification of novel therapeutic targets. Whole-genome sequencing will lead to additional important insights into both monogenic and complex kidney diseases, and we anticipate the continuation of the rapid progress made in the field of kidney disease genetics.

      Acknowledgements

      Support: Drs Li and Köttgen were funded by the Emmy Noether Programme of the German Research Foundation (KO 3598/2-1 to Dr Köttgen).
      Financial Disclosure: The authors declare that they have no relevant financial interests.

      Supplementary Materials