The Diagnostic Power of DNA Methylation
How epigenomic analysis of peripheral blood, coupled with machine learning, could give labs access to clinically relevant information
At a Glance
- DNA methylation can be an important diagnostic indicator of genetic disease
- When combined with machine learning, methylation analysis becomes a powerful tool that grows more accurate and effective over time
- Methylation analysis can help determine whether variants of unknown significance are pathogenic
- Creating open databases will help facilitate the continued growth of epigenetics
When my career started, constitutional genetics and epigenetics were limited to very specific purposes, such as imprinting disorders or specific methylation assays. About 10 to 15 years ago, this targeted approach began to give way to genome-wide methods using microarrays and similar technologies. Baylor College of Medicine, where I did my clinical fellowship, was one of the hubs at the forefront of introducing these technologies, and it has now carried out over 100,000 pediatric microarrays. In the process, its researchers have discovered clinical associations for dozens of new microdeletion and microduplication syndromes. The work set a precedent – and it got me thinking about methylation technologies as something that I could exploit in a similar way.
My current lab at the London Health Sciences Centre, Canada, has been working on detecting DNA methylation in peripheral blood. We’re now at the point where we’ve tested thousands of patients, with the results held in large internal databases, and we’ve adopted a machine learning approach to study new patient cohorts and ask the question: do patients with intellectual disabilities have identifiable and specific changes in their epigenomes – and can we find the answer in peripheral blood?
We discovered that, although the changes were mainly caused by gene mutations, there were significant genome-wide DNA methylation alterations as well – and that we could, in fact, detect them in peripheral blood. Carrying out these DNA methylation tests allowed us to determine, with absolute certainty, whether a variant of unknown significance (VUS) was pathogenic. A VUS that is pathogenic produces an “epi-phenotype” DNA methylation signature, which we used as a secondary assessment in our investigations.
We looked into neurodevelopmental Mendelian disorders in children caused by epigenetic mutations involved in DNA methylation. Our goal was to demonstrate that we could combine machine learning with DNA methylation epi-signatures to yield a diagnostic indicator of the disorders. We recently published a number of papers describing clinical genetic variation in neurodevelopmental syndromes, including one that demonstrates the broad applicability of machine learning algorithms to the systematic identification of specific (epi)genetic disorders in peripheral blood samples (1).
Machine learning
I’ve found machine learning approaches to be useful in epigenetic analysis. We have developed an application of machine learning that allows us to assess DNA methylation samples from patient cohorts and identify unique epigenetic signatures. Our recent publication describes a methylation classification score that estimates the probability that a methylation profile matches any of our tested conditions that have an epi-signature. The score runs from zero to one (a higher score means a higher chance that the sample carries the methylation profile of that condition), and it is entirely generated through machine learning. It allows users to concurrently generate scores for multiple conditions across various disease groups; in our study, for instance, we focused on Kabuki syndrome, ATRX syndrome, Sotos syndrome, CHARGE syndrome, Floating-Harbor syndrome, ADCA-DN, and intellectual disabilities caused by KDM5C.
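To make the idea of a zero-to-one score per condition concrete, here is a minimal sketch. It assumes a simple logistic model for each condition; this is an illustration only, not the classifier used in the published study. The condition names, probe IDs, weights, and intercepts are all hypothetical placeholders.

```python
import math

# Hypothetical epi-signatures: every probe ID, weight and intercept below is
# made up for illustration and does not come from any real study.
SIGNATURES = {
    "condition_A": {"weights": {"probe_1": 6.0, "probe_2": -6.0}, "intercept": 0.0},
    "condition_B": {"weights": {"probe_3": 8.0}, "intercept": -4.0},
}


def classification_scores(beta_values, signatures=SIGNATURES):
    """Return one score in [0, 1] per condition for a single sample.

    beta_values maps probe ID -> methylation level (0 = unmethylated,
    1 = fully methylated). Each condition is scored independently, so a
    single profile yields scores for all conditions at once.
    """
    scores = {}
    for condition, model in signatures.items():
        # Weighted sum of the sample's methylation levels at the signature probes.
        z = model["intercept"] + sum(
            weight * beta_values[probe]
            for probe, weight in model["weights"].items()
        )
        # The logistic function squashes the sum into the range 0 to 1.
        scores[condition] = 1.0 / (1.0 + math.exp(-z))
    return scores


# One toy sample scored against both hypothetical conditions.
sample = {"probe_1": 0.9, "probe_2": 0.1, "probe_3": 0.5}
scores = classification_scores(sample)
for condition, score in scores.items():
    print(f"{condition}: {score:.3f}")
```

In this toy example the sample scores high for the first hypothetical condition and sits at the midpoint for the second, which is how one profile can be read against several conditions concurrently.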
The idea is that, once we’ve built these scores for any condition that has an epigenetic signature (and we have created a database of references), we could introduce those scores and the database into a clinical setting. Any lab in the world could generate DNA methylation profiles with a basic piece of equipment – and, when paired with microarray screening or exome sequencing, it could form an ad hoc diagnostic. Best of all, it’s a self-learning system, meaning that the more samples the database includes, the more refined the methylation signatures and scores become – our study demonstrated over 99 percent sensitivity and very high specificity.
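For readers less familiar with the two metrics, the sketch below shows how sensitivity and specificity can be computed from classification scores. The labels and scores are toy values, not data from the study, and the 0.5 cut-off is an assumption made for illustration.

```python
def sensitivity_specificity(true_labels, scores, threshold=0.5):
    """Compute (sensitivity, specificity) for scores cut at a threshold.

    true_labels: True if the patient truly has the condition.
    scores: classification scores between 0 and 1.
    threshold: assumed cut-off; a score at or above it is called positive.
    """
    tp = fn = tn = fp = 0
    for is_affected, score in zip(true_labels, scores):
        called_positive = score >= threshold
        if is_affected and called_positive:
            tp += 1  # affected patient correctly detected
        elif is_affected:
            fn += 1  # affected patient missed
        elif called_positive:
            fp += 1  # unaffected patient wrongly flagged
        else:
            tn += 1  # unaffected patient correctly cleared
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data only: three affected and four unaffected samples.
labels = [True, True, True, False, False, False, False]
toy_scores = [0.97, 0.88, 0.35, 0.10, 0.62, 0.05, 0.20]
sens, spec = sensitivity_specificity(labels, toy_scores)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the share of affected patients the test detects; specificity is the share of unaffected patients it correctly clears.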
There’s so much more we can explore with methylation and machine learning, and until we have databases with tens or even hundreds of thousands of signatures and scores, I don’t think we’re realizing its full potential. In other words, even with such useful information and high sensitivity, I think we’re only at the very early stages of taking epigenomics in this capacity to the clinical setting.
I’d compare it to the introduction of microarray screening in the pediatric population 15 years ago. Back then, only a dozen or so microdeletions were known – but now that millions of children have undergone screening and been included in the databases, we’ve seen the publication of hundreds and hundreds of papers that refine and define new microdeletion/duplication syndromes.
I think the machine learning approach is going to be high-throughput and computational without necessarily requiring a prior hypothesis – but I also think that hypothesis-driven analysis will allow us to derive more accurate data than we would otherwise. In some of our earlier publications, my lab used different kinds of statistical methods, but over the last year and a half, we’ve narrowed down our machine learning approaches. Now, they are more accurate and systematic, allowing us to derive signatures from conditions that would have eluded us before.
The pediatric perspective
Changes seen in pediatric patients are often associated with an underlying genotype. When we see a genome-wide methylation change, it is very highly specific and involves dozens or even hundreds of loci spread across the genome. Pediatric or constitutional hereditary testing is low-hanging fruit from an analytical standpoint, and introducing it into routine clinical care seems achievable – so that’s what we’re gearing up to do next.
There is a plethora of labs that do genetic testing, and there are hundreds, if not thousands, of patients who have reports with VUS. We now have the ability to resolve these through a simple peripheral blood DNA methylation test. Once this is rolled out into clinical settings, I think it’s only a matter of time until we’re able to tackle imprinting disorders and conditions like Fragile X. All it takes is some time to grow the size of the database, which will let us capitalize on the additional statistics to discover new things.
For now, we’re focused on pediatric constitutional hereditary conditions, but we have ongoing projects to expand beyond that. I think the technology is broadly applicable to areas outside those types of conditions. The issue is that, especially in oncology, DNA methylation changes are often not as specific as what we’re seeing in these constitutional “epi-syndromes.” DNA methylation is one of the only changes that is ubiquitous across cancer types, which means it could yield valuable information if we can develop a full understanding of its complexities.
We are also in the process of validating our technology for BRCA1 and BRCA2 gene deletion testing. The question here is: are these patients also affected by the “loss” of these genes through DNA methylation changes? The question is clinically significant because certain tumor subtypes that display a loss of BRCA1 or BRCA2 genes are responsive to PARP inhibitor therapy. We’ve performed DNA methylation testing on buccal swabs and many other tissue types, including tumor tissues, but the end goal is to perform it in peripheral blood – a liquid biopsy for DNA methylation.
Building databases
From my perspective, there are very few challenges standing in the way of making this testing method a clinical reality. The actual technology used for testing is widely available and not proprietary. The challenges mostly revolve around the licensing and reimbursement of testing. This is a completely new clinical diagnostic approach and, as such, clinical utility will need to be better defined. In the future, clinical guidelines will need to be written to address the most appropriate utilization of such testing along with existing clinical diagnostic technologies.
One significant limitation is the time and effort needed to create databases that can safely and efficiently store data. There are two main solutions: you can either try to build your own database over time, or you can partner with others to become an informatics/database resource for other labs that want to do the same. I think the latter option is where the future of information in this field is heading. I believe we’ll arrive at a point where databases – and the large cohorts of data within them – are shared and mutually accessible. At the moment, there are private databases and patent-locked data – but when that information becomes open to all, it enables more approaches and interpretations than any private collection of information, no matter how large. Resources like ClinVar (2), funded by the NIH, benefit the scientific community by helping to facilitate such data sharing. I predict that these types of databases are going to become increasingly important over time. Currently, they are primarily focused on curating genetic information – but I think that, as epigenetic information becomes more clinically useful, databases will expand. We’re sitting atop the tip of an epigenomics iceberg; I think that it will be the next big field to expand in the clinical diagnostic community.
About a decade ago when I was a postdoc in Toronto, I regularly chatted with colleagues in the epigenetics journal club about the hypothetical idea of genome-wide epigenomics as a clinically useful tool. We always thought it might eventually happen, but we never even dreamt about the possibility of using global epigenetics to direct immediate patient care – and now, it seems like that may soon be a reality.
Bekim Sadikovic is Head of Molecular Genetics at London Health Sciences Centre (LHSC), and Associate Professor at Western University, London, Canada.
1. E Aref-Eshghi et al., “Genomic DNA methylation signatures enable concurrent diagnosis and clinical genetic variant classification in neurodevelopmental syndromes”, Am J Hum Genet, 102, 156–174 (2018). PMID: 29304373.
2. O Griffith, H Rehm, “Variant database collaborations – for cancer and beyond”, The Pathologist, 40, 40–43 (2018). Available at: bit.ly/2umjGXd.