Background Copy number variation (CNV) is a valuable source of genetic diversity in the human genome and a well-recognised cause of various genetic diseases. However, CNVs have been considerably under-represented in population-based studies, particularly in the Han Chinese, the largest ethnic group in the world.
Objectives To build a representative CNV map for the Han Chinese population.
Methods We conducted a genome-wide CNV study involving 451 male Han Chinese samples from 11 geographical regions encompassing 28 dialect groups, representing a less biased panel than the currently available data. We detected CNVs using a 4.2M NimbleGen comparative genomic hybridisation array, and used whole-genome deep sequencing of 51 samples to optimise the filtering conditions in CNV discovery.
Results A comprehensive Han Chinese CNV map was built based on a set of high-quality variants (positive predictive value >0.8, with sizes ranging from 369 bp to 4.16 Mb and a median of 5907 bp). The map consists of 4012 CNV regions (CNVRs), and more than half are novel to the 30 East Asian CNV Project and the 1000 Genomes Project Phase 3. We further identified 81 CNVRs specific to regional groups, indicative of subpopulation structure within the Han Chinese population.
Conclusions Our data are complementary to public data sources, and the CNV map may facilitate the identification of pathogenic CNVs and further biomedical research involving the Han Chinese population.
- Copy number variation
- Dialect groups
- Next generation sequencing
Contributors SX and LJ conceived the study. SX and FZ designed and supervised the project. SY and YY contributed to sample collection. JL, CL, BF, FP, JW, QW and YL managed laboratory work and contributed to data analysis. SX, LJ and FZ contributed reagents and materials. DL, CZ and XW developed the pipeline for processing NGS data and performed variant calling analysis. HL, RF, ZW and XZ developed the pipeline for structural variation analysis and prepared the supplementary information. SX, HL and FZ wrote the main paper. All authors read the manuscript and discussed the results.
Funding This work was supported by the National Basic Research Program of China (2012CB944600), the Strategic Priority Research Program (XDB13040100) and Key Research Program of Frontier Sciences (QYZDJ-SSW-SYS009) of the Chinese Academy of Sciences (CAS), National Natural Science Foundation of China (NSFC) grants (31625015, 91331204, 31525014, 31521003, 31601046, 31711530221, 3150101122 and 31571297), the Program of Shanghai Academic Research Leader (16XD1404700), the National Key Research and Development Program (2016YFC0906403) and the Science and Technology Commission of Shanghai Municipality (STCSM) (14YF1406800 and 16YF1413900). SX is a Max-Planck Independent Research Group Leader and a member of the CAS Youth Innovation Promotion Association. SX and FZ also gratefully acknowledge the support of the National Program for Top-notch Young Innovative Talents of The 'Wanren Jihua' Project. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Competing interests None declared.
Patient consent Obtained.
Provenance and peer review Not commissioned; externally peer reviewed.