Main

Genome-wide association studies (GWAS) seek to establish associations between genomic variants and diseases or quantitative traits.1, 2 The methodology involves the genotyping of a large number of genomic variants (presently just over 1 million variants per individual) for a large number of individuals with and without the disease of interest, or for those who show variation in the trait(s) in question. Because the genetic contribution of any single variant is typically small, such studies usually involve at least a thousand individuals, and often many more. Advances in the quality and efficiency of sequencing methods, such as next-generation sequencing techniques, will assist GWAS and generate vast quantities of rich data on many thousands of individuals.3, 4 With the cost of whole genome sequencing technologies falling rapidly, it is anticipated that whole genome sequences will no longer be the preserve of large scientific endeavours such as the Human Genome Project, but will be routinely used in GWAS and other studies.

The use of GWAS in medical research and the increased ability to share data give a new twist to the perennial issues of consent, feedback of results, privacy, and the governance of research, as many commentators have discussed.5, 6, 7, 8 GWAS create particular challenges because they produce fine, detailed, genotype information at high resolution, and the results of more focused studies can potentially be used to determine genetic variation for a wide range of conditions and traits. Although samples and data will have personal identifiers removed, individuals may still be re-identifiable because of the richness of the data derived from the analysis. The information from a GWA scan is derived from DNA, which is a powerful personal identifier, and can provide information not just on the individual, but also on the individual's relatives, related groups, and populations. Furthermore, a GWA scan creates large amounts of individual-specific digital information that is easy to share across international borders. The data produced are often shared informally, but more formal mechanisms have been put in place by funders to ensure the rapid sharing of GWAS data, such as requirements to deposit data sets in open access archives. Examples are the European Genotype Archive and dbGaP (NIH-USA).

Many of the ethical challenges of GWAS arise from the quantity and significance of the data generated, and these issues will be heightened by the new sequencing techniques under development. Hence, many of the questions that lie ahead of us will have been foreshadowed by GWAS and the debates around ethical and policy issues that these have created. It is appropriate, therefore, to revisit some of the ethical issues in contemporary genomic studies and examine how they have arisen, and how they might best be addressed. This paper focuses on four key areas of particular significance for the use of whole genome techniques in medical research: consent; feedback of incidental findings; privacy; and the governance of research.

The context of genomic research

There are three important aspects of the context of whole genome research methods that may influence the ethical challenges that they raise. First, these methods are being used alongside the increasing accumulation of samples and information, which are held by projects, large international consortia, or within biobanks. Such resources can then be used for a number of different research projects as the information is compared, used, and exchanged between researchers who come together to address specific research questions. These data and sample storage infrastructures meet the scientific need for very large sample sizes to understand the complexity of genotypic, phenotypic, and environmental interactions. Funders in collaboration with the scientific community have been instrumental in facilitating and supporting such trends, with the creation of genomic reference libraries through initiatives such as the Human Genome Project, the HapMap project, and, more recently, the 1000 Genomes project, with the aim of accelerating scientific advances.

Second, data sharing is becoming a crucial element of scientific policy with the development of more open access guidelines. Funders are encouraging researchers to deposit data created in newly-funded projects to be shared with others, and increasingly the presumption is that the data should be shared.9, 10 Funders have also been active in financing and supporting the establishment of data-generating projects, especially for the creation of sequence data using GWAS methods. Examples of such projects are the Wellcome Trust Case Control Consortium (WTCCC) and MalariaGEN in the United Kingdom and the Genetic Association Information Network (GAIN) in the United States. The data created by these projects can be a valuable resource for potentially many different research projects and purposes. This raises a number of ethical issues as many of the principles and procedures in medical research are not designed for wide-scale data sharing.

Third, these genomic methods are used in a social context where privacy concerns about the loss and unauthorized use of digital information have come to take a prominent place, and where the democratization of biological technologies means that individuals have greater access to their own genomes through direct-to-consumer genetic testing companies as well as genealogical registries and ancestry tracing companies. These events outside of medical research cannot be ignored, as there is the potential for these to have an effect on the medical research context, which in the case of an unfavourable event could undermine the public trust and support that is necessary for medical research to continue and to thrive. The ethical challenges raised by the use of whole genome sequencing methods and data sharing, such as consent, need to be considered in the light of these events occurring in the broader social context.

Consent

The notion that informed consent is needed from participants in research has been a sine qua non of research ethics since the mid-twentieth century. This conception of consent, along with the concomitant power to withdraw from research without prejudice, arose originally in the context of biomedical research. It had the aim of protecting participants from abuse and from potential physical harms, and focused on clinical interventions and on the collection of samples rather than on data collection per se. In its inception, informed consent was strongly concerned with the protection of individuals. Genomics research, however, moves away from these origins on several counts.

In genomics research, potential medical and physical harms are less relevant than potential harms of infringements of privacy and the misuse of information. It is questionable how informed consent, traditionally conceived, can accommodate these concerns, which are of course heightened as research techniques produce ever more detailed information, and as more and more genomic data are shared. The complexity of genomics research, together with the difficulty of providing precise specifications of future use of data, has also prompted serious concerns about whether any consent to such research can be adequately ‘informed’. Furthermore, although the traditional notion of informed consent focuses on the individual, in genomics research there are pressing issues also for the family, community, and population groups.

Recent attempts to suggest solutions to these problems reveal substantial disagreements over both fundamental conceptual issues and practical details. There is disagreement over the place of informed consent as a central norm in genomics research ethics and whether a broad consent is more appropriate for the collection of samples and information that will involve many future research projects. It has also been suggested that informed consent should carry less ethical weight and that alternative values should take a more central role. For instance, some have argued that veracity could become a key value in truthfully explaining the limitations of withdrawal and of confidentiality to participants.8 A more common approach has been to retain the values that underpin informed consent, such as autonomy and self-determination, but to use other mechanisms to give voice to them. Suggestions include supplementing informed consent with governance structures, often with participant representation,7 and enabling individuals to have a greater say about the use of their information through information technology. Such mechanisms would also provide an audit trail to ensure the proper use of data derived from a GWAS.

Existing samples and genomic data are seen as valuable resources for GWAS, especially given the need for large sample sizes, but there is no consensus about their reuse or the issue of re-consent. This again raises the question of how central a role the informed consent of individuals plays in whole genome research. There are those who argue that risks to individuals are limited and of a different nature to the risks of clinical research, and that the furtherance of medical science is a valuable goal that takes precedence. Likewise, the ethical basis for any continued guarantee of the possibility of withdrawal of consent is in dispute. It could be grounded on protection from harms, or on autonomous control over personal data. Different practical implications arise from these different grounds. There is disagreement over the extent to which there can be complete withdrawal, and how this might be done, as this can potentially be difficult and costly when data sets are shared.

Finally, the implications of participation in genomics research extend beyond an understanding of consent grounded in individual rights. Many have suggested encouraging discussions on research participation with family members. Resolving this point, however, necessitates grappling with the difficult confluence of more individualistic and more communitarian approaches to ethics. There is the danger that suggestions to involve the family will be nothing more than gestures at this major problem; on the other hand, a perhaps graver danger is that these vague admonitions will overlook the delicate and often charged complexity that makes up families, and that harms may result. There is a pressing need to learn from insights gained elsewhere, such as in genetic counselling and in family studies. Likewise, calls to involve the community in consent pose large ethical issues about individual and group rights, which may be different for communities across the globe.

Feedback of findings

Feedback of findings may be considered to be an important part of building and maintaining public trust in research. Providing participants with information about the general findings of research, such as publications based on the research, is an uncontroversial and welcome practice. In contrast, the feedback of individual results remains controversial in many areas of research11 and particularly in the area of whole genome sequencing. Areas of research where relative agreement has been reached include research involving MRI and CT scans, as well as communicable diseases.12 In these settings, the nature of the technology and the science involved means that researchers are sometimes faced with findings of clear, verifiable clinical significance for research participants. In the context of genetic studies, there appears to be some consensus that, where there is a serious, treatable condition, researchers or research teams have a moral obligation to feed this information back to research participants.12, 13 In cases where findings are of a less serious nature, untreatable, or of uncertain significance, the potential benefits for participants of being informed need to be balanced against the participant's right not to know. The thoughtful handling of such issues is of clear relevance to the maintenance of public trust in the research process.14

Feeding back research results from genomics studies is complicated by a number of factors.11, 15, 16 The most significant is that GWAS are a research tool and are not designed for clinical diagnosis. Therefore, only in exceptional cases are the results from GWAS of clear clinical utility for research participants, and the techniques used in a research setting are generally not of the standard that would be required for clinical validation.1 Many of the discoveries of whole genome methods identify genetic variants that are responsible for very small increases in disease risk. It is quite common for putative associations not to be replicated in subsequent research. Even for valid associations, the significance for individuals can be very hard to interpret, especially for complex disorders where very many genetic variants, epigenetic effects, and environmental factors all have a part in disease genesis. In time, this currently uncertain information may also be shown to be of clinical relevance, raising the possible question of future contact of participants for feedback. However, even now, the fine detail results obtained from GWAS can provide information on a number of conditions, which can lead to the possibility of incidental findings of known variants that are clinically valid.15

In such situations, there is a need for management pathways14 to determine whether or what information is fed back; who is responsible for feeding back results; and whether this also extends to other family members. Genomics is a highly complex field, bringing together experts from different disciplines, and extremely high levels of expertise would be needed to interpret findings at the cutting edge of genomic science. Feeding back raw data without any clinical interpretation may be not only of limited use but also greatly misleading. However, good research practice would suggest that policies and procedures should be put in place to decide whether and how to feed back clinically valid results or incidental findings should these eventualities occur. Currently, these management pathways do not exist for all studies. This raises a number of questions. The first is the scope of the obligation to feed back and whether it includes the individual research participants as well as their families.11, 15 The second question is whether the obligation applies just to the researchers who conducted the original study or extends to secondary researchers who obtain the data through data sharing.11, 15 The third issue is how this should be done and what mechanisms need to be put into place to ensure that this happens in a way that is responsible and proportionate.

Finally, as with other ethical issues we have discussed, the changing landscape in which GWAS research takes place is likely to affect the question of feedback of findings. In particular, the introduction of consumer genetics services may change the expectations of participants in research. For example, 23andWe (23andMe's research arm) conducts research using the genetic and phenotypic information of individuals who have paid to use its service. Will participants in GWAS thus come to expect feedback of similar types of information to those provided by consumer genetics firms? The implications for the research relationship will be profound if there is a shift from participants as ‘health information altruists’17 to participants who have similar expectations to ‘customers’.

Privacy of research participants

Privacy and confidentiality have traditionally been the cornerstones of ethical medical research practice.18 Informed consent has been one way of protecting privacy by allowing individuals to choose whether they undertake the privacy risks associated with particular types of research. Although the clinical application of GWAS is still largely unknown, some of the potential privacy risks identified in genetics are the diagnosis of a disease or disclosure of paternity information that then becomes known to others, such as other family members, insurers, or employers. The protection of confidentiality in medical research has been maintained by safeguarding the identity of research participants through de-identification of data, and by keeping their records and personal information secure. Standard privacy measures have been the removal of identifiers and coding to de-identify data; placing firewalls between those who hold the coding keys and researchers; as well as restrictions on access to research data and the requirement of research ethics approval.

Research on existing samples or data that are anonymized or de-identified is seen as posing little risk to individual privacy, and in many cases could proceed without additional informed consent.19 However, the fine, detailed, uniquely identifiable genomic information that is produced by GWAS technologies, as well as the increase in data sharing, presents significant challenges to these traditional mechanisms of protecting privacy and confidentiality.

Individual sequence data can be used for many different research purposes, as the same variants on the genome can be implicated in the expression of many different phenotypes. Once data have been generated by GWAS, they are of interest to many researchers and can be used for research on many conditions. At the time a sample is collected, it may be possible to inform individuals of the initial research use, but it is very difficult to provide individuals with all the necessary information about the secondary research uses of the data and all the researchers who will have access to the summary or raw data that are generated by GWAS, so as to enable them to make an informed decision about whether they are willing to accept the possible privacy risks of genomics research. Informed consent for every secondary research use is very difficult to achieve unless there is ongoing contact with research participants. A practical solution has been the use of broad consent, which is consent to a broad range of research uses, coupled with approval by a research ethics committee. This is controversial as it undermines the fundamental principle of medical research and the right to privacy, whereby individuals should have the knowledge in advance and be able to choose how their personal information is used.20

Efforts to protect privacy by de-identifying the data produced by GWAS are also problematic, as the traditional dichotomy between identifiable and non-identifiable data is hard to maintain.21 What is non-identifiable at present may not remain so with increased knowledge of the genome and more sophisticated analytical and statistical techniques. Genomics is moving very fast. Even 2 years ago, the state of the art was such that identifying individuals from genomic data was difficult, expensive, and time-consuming.5 However, in 2008, Homer et al.22 showed a method of identifying individuals within a pooled sample from reference samples. Sharing of research data and technological advances will increase the possibility of re-identification of individuals. In addition, biological, mathematical, and statistical techniques are likely to continue to be developed to make identification easier in the future from smaller samples of DNA sequence.

Identification is also possible not just within a single database, but by linking together multiple sources of information, whether genomic, medical or social, so that it is possible to infer (or reveal) an individual's identity. Within the genomics community, new resources are being created through the linkage of existing data sets or through new biomedical resources and biobanks. To carry out research, comparisons are routinely being made between web-based genomic reference libraries and the GWAS results held by researchers. Increasingly, the research data are being shared through informal means or deposition in archives or resources that are accessible by many researchers. Directly identifying individuals, as well as inferring an individual, family, or group identity, becomes easier as the data become richer and more detailed through the linkage of different data sets. Recent research has shown that it is possible to triangulate data sources freely available to the research community to identify individuals believed to be anonymous.23

At the same time that detailed data sets are accumulating within the research community, cheaper sequencing techniques have enabled commercial companies to make genomic information available to the public over the internet. Consumers can now have access to commercial genotyping of their genome from direct-to-consumer testing companies. Some companies encourage individual subscribers to share their information with other family members and friends. Linking these sources to other information that is readily available on the internet, such as birth, death, and marriage records held by government bodies, provides the means to make information that was previously thought to be non-identifying potentially identifiable. These companies make whole genome information available to the general public, and it will no longer be held exclusively in the hands of scientists. Researchers are bound by research governance requirements, professional codes of conduct, and obligations of confidentiality, but many who gain access to genomic information as it becomes more freely available will not necessarily be under similar obligations.

At the moment, the identification of an individual in a collection of samples may not disclose very much clinically relevant information about that individual. However, it is important to consider the implication of the disclosure not only on the basis of current knowledge, but also for the future. With a better understanding of the genome, sequence information may reveal more about an individual's risk of disease.24 Technical methods can be used to protect the privacy of individuals, such as limiting the proportion of the genome released, statistically degrading the data, and reversibly de-identifying through codes.5 However, as was shown by the paper of Homer et al., what is reasonably believed to be anonymous at one point in time may not remain so in the future. Many commentators argue that absolute promises of privacy and confidentiality are simply not possible in the context of genomics research.5, 8, 22 Protecting the privacy of research participants has long been regarded as essential to maintain public trust in medical research. Making promises that cannot be honoured has the potential to undermine much of the goodwill that the public has in relation to medical research. A loss of public trust will have a detrimental effect on research recruitment and could have many long-term effects.

Governance of research

Medical research is largely governed at the national level through a number of checks and balances that are based on professional norms embedded in practice. These are in turn supported and strengthened by a range of national guidelines and institutional requirements that are enacted by key gatekeepers such as research ethics committees and to some extent, funders. Some countries have governance frameworks for research that are complex, contradictory, and confusing, with a number of different bodies asserting specific requirements and guidelines. These systems are severely tested in the case of global genomic research that involves data sharing and whole genomes. It is difficult for research ethics committees to exercise their mandate to oversee research and to protect the interests of research participants when their authority is nationally based and the assessment of the risks of such research involves specialist expertise. For many GWAS data-generating projects, special governance systems have been established to supplement current oversight systems.

Research ethics committees were established within institutions for individual research projects, rather than to assess modern multicentre projects, such as GWAS, that span international borders. These projects create similar problems for national research ethics frameworks around the world. Many research ethics committees do not have the appropriate expertise and knowledge to deal with the complex legal and ethical issues that GWAS activities raise, and neither do they have the appropriate authority. Research ethics committees have traditionally held the principal investigators accountable for the execution of ethical research. In the case of data sharing of GWAS summary data and fine detailed phenotypic data, it is virtually impossible to hold the original collector responsible for the research or activities of other researchers, whom the collector may not know and who may be located outside the jurisdiction and beyond the control of the research ethics committee. It is therefore difficult for committees to continue adequate monitoring of research and data whose generation they originally approved.9 The fact that governance structures and legal frameworks are nationally based also fails to address the reality of increasing global research activity in the field of genomics and the way in which the science is developing.

Research ethics committees have focused mainly on protecting research participants’ interests and so in assessing genomics research tend to focus on the consent process25 and on approving the reuse of samples.26 Such committees are already facing increasing challenges in reviewing complex research proposals, particularly in relation to proportionality assessment (risks, benefits, safeguards), informed consent, and privacy protection.27 The use of GWAS technology further raises particular privacy and feedback issues that are often beyond the scope of the expertise of many research ethics committees. It is therefore not clear how already over-burdened committees could take on the role of monitoring and approving data access, a task that requires significant insight into the techniques used to produce and analyse data in genomics. Neither are they equipped to carry out privacy risk assessments of global data sharing.

As the mechanisms for establishing precedents in research ethics committee decision-making are informal, there can also be considerable variation in the decisions of committees. It is in the area of emerging technologies and innovative, global, collaborative research proposals that there can be the most discrepancies in decision-making between ethics committees. Decisions and procedures may vary within regions in countries, but the differences are most acute between countries, where differing ethical and legal frameworks exist. This creates additional burdens for collaborative, international consortia that have to obtain research ethics approval in each country for different parts of the research project. Further, if samples and information are being collected in different jurisdictions of the consortium to address a particular research question, they may be subject to different research ethics decisions.

To address some of these ethical concerns, special Data Access Committees (DACs) have been established in the data-generating projects, such as the WTCCC and GAIN, to supervise access to these GWAS data sets. DACs determine who should have access to data and on what grounds. This primarily involves establishing whether a scientist is a ‘bona fide researcher’. The criteria used for this assessment are still in the process of being developed and are neither uniform nor publicly known. Research approval must still be obtained from a research ethics committee, as DAC approval is simply for access to the data set. However, there is no one body that is looking at disclosure when all of the genomic data sets are analysed together and there is a possibility of identification by using material available on the web.9

The difficulty with the governance of GWAS through current oversight structures is that it is largely carried out at the national level, on a case by case basis, by bodies that have inadequate enforcement powers and whose focus does not address the increasingly complex implications of this kind of genomics research. The issue for new methodologies such as whole genome sequencing methods is that it takes some time before a critical amount of expertise is built up within the research ethics community to make approval for research projects an efficient and straightforward process. In the case of emerging technologies, this can result in frustration for researchers and a slowing down of the research process. Data sharing between large international projects, or researchers based in different jurisdictions, can also lead to multiple applications for the same research to different nationally-based bodies with different requirements.9

Conclusion

We are at an important time in terms of policy making and thinking about the future of genomics research. The technologies in relation to individual whole genome sequencing are developing quickly and at the same time there is increased data sharing between researchers, which is encouraged by funders and made possible because of the internet and e-science grid technology. This poses challenges for the traditional focus of research ethics on individuals, as concerns are widened out from the participant to family and population groups, and as responsibilities are widened from individual researchers in close contact with research recruits to large, often global, research networks that may store data and samples for indefinite periods into the future.

The use of whole genome methods gives a new twist to perennial ethical issues, such as consent, feedback, the protection of privacy, and the governance of research. The fact that it is very difficult to obtain informed consent for all new research uses well into the future and that it is impossible to ensure the complete privacy of participants challenges some of the fundamental ethical principles of medical research. The potential threats to privacy, confidentiality, and associated informational harms created by global data sharing present somewhat different ethical questions to the physical risks that are centrally at issue within the sphere of clinical research ethics. We need to rethink the current reliance on anonymization and consent as sufficient safeguards to protect participant interests in research, particularly in the case of infrastructures that aim to link existing research archives and repositories of information. It is impossible to obtain the traditional standard of informed consent for every secondary research use unless there is ongoing contact with research participants. The mechanisms that must be developed to protect individual autonomy must be carefully thought through. This may require a reevaluation of the fundamental tenets of the participant, research, and society relationship and the basis on which this should rest for genomic research.

Importantly, too, the significance of these ethical challenges is shaped by the changing landscape outside of genomics research, a landscape that includes widespread concerns about the use of personal information, as well as the activities of private and commercial entities and not just academic and research institutions. The commercialization of genome sequences and the way that genomic information is rapidly being made available to the public have an impact on the way that people think about genomic research conducted in academic and research institutions. In light of the technological advances in sequencing methods and greater data sharing, the following key areas need further consideration:

  • Alternative mechanisms to informed consent to allow individuals to exercise their autonomy;

  • Feedback and the development of feedback management pathways;

  • The development of governance mechanisms for research suitable for global data sharing.

These are difficult issues that need further thought and analysis to ensure continued public trust and the ongoing participation of individuals in research.