Abstract
Indian demographic history includes distinctive features such as founder effects, interpopulation segregation, a complex social structure with a caste system, and an elevated frequency of consanguineous marriages. India also shows a higher frequency of some rare Mendelian disorders and, over the last two decades, an increased prevalence of some complex disorders. Although India represents about one-sixth of the human population, in-depth genetic studies of the region have been scarce. In this study, we analyzed high-density genotyping and whole-exome sequencing data from a North and a South Indian population. Indian populations show higher differentiation levels than those reported between populations of other continents. We analyzed the consequences of this differentiation by specifically assessing the transferability of genetic markers from or to Indian populations. We show that there is limited genetic marker portability from available genetic resources such as HapMap or the 1000 Genomes Project to Indian populations, which also present an excess of private rare variants. Conversely, tagSNPs show a high level of portability between the two Indian populations, in contrast to the common belief that North and South Indian populations are genetically very different. By estimating kinship between mates and consanguinity from our trio data, we also describe different patterns of assortative mating and inbreeding in the two populations, in agreement with distinct mating preferences and social structures. In addition, this analysis allowed us to describe genomic regions under recent adaptive selection, indicating differential adaptive histories for North and South Indian populations. Our findings highlight the importance of considering demography in the design and analysis of genetic studies, as well as the need to extend human genetic variation catalogs to new populations, particularly those with distinctive demographic histories.
Acknowledgments
We thank Lara Nonell and Eulàlia Puigdecanet from the Servei d’Anàlisi de Microarrays (IMIM) for their invaluable help. We would like to acknowledge David Sondervan and Ingrid Bakker from the section Medical Genomics of the VUMC for sequencing of the samples. We thank Dr. A. R. Rao and Dr. Namita Sidhu from IASRI, New Delhi, India for statistical assistance in the early part of the study. We deeply thank Txema Heredia, Ángel Carreño and Jordi Rambla for computational support, Marc Pybus for his help in the selection analysis, and David Comas for critical reading of the manuscript. An international fellowship funded by the Center for Neurogenomics and Cognitive Research (CNCR), VU, Amsterdam, The Netherlands to GJ; a research grant from the J C Bose fellowship to BKT; grant # BT/01/COE/07/UDSC to BKT and salary support to GJ are gratefully acknowledged. FC was supported by a Beatriu de Pinós (2010-BP-B-00128) fellowship and MM by a PhD grant, both from AGAUR (Generalitat de Catalunya). FC was funded by grant SAF2012-35025 from the Ministerio de Economía y Competitividad (Spain), and JB by grants BFU2010-19443 from the Ministerio de Ciencia y Tecnología (Spain), PRI-PIBIN-2011-0942 from the Ministerio de Economía y Competitividad (Spain), and from the Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101).
Additional information
G. Juyal and M. Mondal have contributed equally to this work.
B. K. Thelma and F. Casals are co-senior authors.
Cite this article
Juyal, G., Mondal, M., Luisi, P. et al. Population and genomic lessons from genetic analysis of two Indian populations. Hum Genet 133, 1273–1287 (2014). https://doi.org/10.1007/s00439-014-1462-0
DOI: https://doi.org/10.1007/s00439-014-1462-0