
Beyond the Moon

Over the past few months, the scientific community has responded eagerly to the creation of the National Cancer Institute’s Genomic Data Commons (GDC) – a first-of-its-kind, open-access cancer database that will ultimately help advance Vice President Joe Biden’s Cancer Moonshot Initiative.

The GDC is a step in the right direction and has the potential to help the scientific community advance its understanding of complex diseases, such as cancer. Public data sets, including The Cancer Genome Atlas (TCGA) and the 1000 Genomes Project, have already contributed to our evolving understanding of, and approach to, disease research. For example, according to the National Cancer Institute and National Human Genome Research Institute, the publicly available TCGA dataset includes 2.5 petabytes of data from over 11,000 patients, and has already contributed to more than a thousand cancer studies for 33 types of cancer. And though most agree that greater data sharing will benefit cancer researchers, the details of how best to support such a monumental database are less clear and present a number of interesting challenges. Through my experience working with customers and their multitude of research partners, I know that developing the infrastructure needed to integrate data from varying sources, and of different types, will be the cornerstone of success for this unique database.

As we’ve seen in other examples of translational research and pharmaceutical R&D, increasingly large datasets from diverse high-content methodologies, such as genomics, are typically stored in silos, which makes access and searching more difficult (or impossible). Here are some of the most common challenges we’ve seen researchers and scientists encounter when trying to integrate disparate data sources and varieties:

  1. Availability of data. The willingness and ability of researchers to share their data varies; some organizations may not want to share proprietary information about their genomic trials.
  2. Consent and legal issues. Publication of data may not be a part of patient consent procedures.
  3. Scope. Although genomics is an important piece of translational medicine, there are many other profiling technologies not supported by the GDC. A good example comes from PerkinElmer’s Quantitative Pathology team. While PD-L1 expression (genomic in nature) is an important biomarker for cancer immunotherapies, studies have shown that the spatial distribution of immune system cells around the tumor can also be a predictor of response to treatment. The digital pathology data required for this kind of analysis are not currently in scope for the GDC.
  4. Access control. The GDC is designed to be an open platform and has little focus on restrictions. Though an open-access strategy makes sense for sharing public data, access controls are an important and complex part of a commercial solution dealing with clinical data.

Researchers need – and want – to be able to easily aggregate internal and external data, while maintaining their focus on the science. Complementary systems offered by experienced and specialized companies can help mitigate the challenges and advance collaborative efforts. Ultimately, the data that need to be integrated fall into three categories: public data, in-house data that could be public, and in-house data that cannot be public. The GDC gives companies the tools to make the second kind of data public. However, it’s the integration tools that will allow companies to merge their proprietary data (e.g., data from ongoing clinical trials, or data from patients who have not consented to public release) with public data. These integration solutions can contribute to greater insights and faster conclusions about potential treatments. With self-service access to a wide variety of data, researchers can more efficiently identify and manage biomarkers, which could help to streamline the development of drugs tailored to unique health needs.
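
To make the three categories concrete, here is a minimal sketch of how an integration layer might tag records by sharing status, combine them for internal analysis, and filter out anything that cannot leave the organization. The names (`Sharing`, `Record`, `integrate`, `publishable`), the sample IDs, and the values are all hypothetical and purely illustrative; none of them come from the GDC or from any PerkinElmer product.

```python
from dataclasses import dataclass
from enum import Enum


class Sharing(Enum):
    PUBLIC = "public"                    # already published in a public data set
    SHAREABLE = "in-house, shareable"    # in-house data that could be made public
    RESTRICTED = "in-house, restricted"  # in-house data that cannot be made public


@dataclass(frozen=True)
class Record:
    sample_id: str
    biomarker: str
    value: float
    sharing: Sharing


def integrate(*sources):
    """Merge records from several sources into one list for internal analysis."""
    return [record for source in sources for record in source]


def publishable(records):
    """Keep only the records that may leave the organization."""
    return [r for r in records if r.sharing is not Sharing.RESTRICTED]


# Made-up example records (assumption: values are placeholders, not real data).
public = [Record("P-001", "PD-L1", 0.42, Sharing.PUBLIC)]
in_house = [
    Record("H-001", "PD-L1", 0.67, Sharing.SHAREABLE),
    Record("H-002", "PD-L1", 0.15, Sharing.RESTRICTED),
]

combined = integrate(public, in_house)
shareable_ids = [r.sample_id for r in publishable(combined)]
```

In this sketch, researchers inside the organization work with all of `combined`, while only the records returned by `publishable` would be candidates for submission to a public resource. A real system would also need the access controls and consent tracking discussed above.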

Data sharing in itself is not enough to accelerate cures – researchers also need the appropriate tools to interpret, visualize, and analyze data. Ultimately, ensuring the success of the GDC may not only lead us closer to a cure for cancer, but also transform the way we approach translational research for a wide range of additional diseases.

About the Author
Jens Hoefkens

Jens Hoefkens is the Director of Research in Strategic Marketing at PerkinElmer.
