Research Activities
Characterizing the cancer genome

Cancer is a disease of the genome that is driven by a combination of possible germline risk-alleles together with a set of “driver” somatic mutations that are acquired during the expansion of increasingly fitter clones. In order to generate a comprehensive list of all inherited germline events as well as the somatic events that occurred during life, we are developing and applying highly sensitive and specific tools for detecting different types of mutations in massively-parallel sequencing data. The volume and complexity of these data require developing computational tools using state-of-the-art statistical and machine learning approaches to extract the signal from the noise. Among these tools are MuTect (Cibulskis, et al., Nature Biotechnology 2013), dRanger & BreakPointer (Bass, Lawrence, et al., Nature Genetics 2011; Chapman, et al., Nature 2011; Drier, et al., Genome Research 2012), SegSeq (Chiang, Getz, et al., Nature Methods 2009), CapSeg (Landau, Carter, Stojanov, et al., Cell 2013), HapSeg (Carter, et al., Nature Precedings 2011), MSMuTect (Maruvka, et al., Nature Biotechnology 2017), POLYSOLVER (Shukla, et al., Nature Biotechnology 2015), and RNA-MuTect (Yizhak, et al., Science 2019), as well as tools to detect various forms of contamination and artifacts, including ContEst (Cibulskis, McKenna, et al., Bioinformatics 2011), DeToxoG (Costello, et al., Nucleic Acids Research 2013), and deTiN (Taylor-Weiner, Stewart, et al., Nature Methods 2018).

Detecting cancer-associated genes

We analyze the detected somatic events (see above) across a cohort of samples, searching for genes and pathways, as well as non-coding genomic elements, that show significant signals of positive selection. To that end, we construct a statistical model of the background mutational processes and then detect genes that deviate from it. We have developed tools for detecting significantly gained or lost genes in cancer, including GISTIC (Beroukhim, Getz, et al., PNAS 2007; Mermel, et al., Genome Biology 2011), and genes with increased density or irregular patterns of mutations, including the MutSig suite of tools (Getz, Höfling, et al., Science 2007; Chapman, et al., Nature 2011; Lawrence, Stojanov, Polak, et al., Nature 2013; Lawrence, et al., Nature 2014; Rheinbay, et al., Nature 2017), CLUMPS/EMPRINT (Kamburov, et al., PNAS 2015), MSMutSig (Maruvka, et al., Nature Biotechnology 2017), NetSig (Horn, Lawrence, et al., Nature Methods 2017), and “driver”/“passenger” hotspots (Hess, et al., Cancer Cell 2019). Our work demonstrated the need to accurately model the heterogeneity of mutability across patients, sequence contexts, and the genome when searching for cancer genes.
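
The idea of testing each gene against a background model can be illustrated with a minimal sketch. It assumes one per-base background mutation rate per gene and a Poisson approximation to the mutation count; it is not the MutSig implementation, which models heterogeneity across patients, sequence contexts, and genomic position. The function names, gene labels, and rates below are hypothetical.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def gene_p_value(n_mutations, gene_length_bp, n_samples, background_rate):
    """Probability of seeing at least n_mutations under the background model.

    background_rate is an assumed per-base, per-sample mutation rate.
    """
    expected = gene_length_bp * n_samples * background_rate
    return poisson_sf(n_mutations, expected)

# Two hypothetical genes with identical mutation counts but different
# background mutability (for example, due to local genomic context).
p_low_background = gene_p_value(12, 1500, 200, 2e-6)
p_high_background = gene_p_value(12, 1500, 200, 3e-5)
print(f"low-background gene:  p = {p_low_background:.2e}")
print(f"high-background gene: p = {p_high_background:.2e}")
```

Both toy genes carry the same number of mutations, but only the one with the low background rate stands out, which is why an accurate background model matters when nominating cancer genes.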

Driver discovery in non-coding regions of the genome

Beyond being a major contributor to The Cancer Genome Atlas (TCGA) projects, leading various analyses in many of the TCGA manuscripts (including co-chairing the papillary thyroid cancer paper, the first to use whole-genomes to study tumors without coding drivers (Integrated genomic characterization of papillary thyroid cancer, Cell 2014)), we helped lead TCGA into its next phase with the Pan-cancer Analysis of Whole Genomes (PCAWG) effort of TCGA and ICGC, contributing to several of the 23 papers released across the Nature journals in February 2020, including the flagship paper (The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium, Nature 2020). In one of our studies, we searched for drivers in the non-coding regions (~99% of the genome) in whole genomes from cancer patients by employing our statistically rigorous strategies to analyze both point mutations and structural variants (Rheinbay, Nielsen, Abascal, Wala, Shapira, et al. Nature 2020). Although we discovered a few novel non-coding driver genes (e.g., point mutations in the 5′ region of TP53), our study revealed that cancer drivers in non-coding regions are relatively rare, with the vast majority of drivers (87%) occurring in just the ~1% of the genome that is protein-coding.

Advances in mutational signatures

We were the first to use a Bayesian version of non-negative matrix factorization (NMF) for mutational signature discovery, uncovering key mechanisms by which cancers accumulate mutations. We have now optimized the performance of our SignatureAnalyzer algorithm (Kasar, Kim, et al., Nature Communications 2015; Kim, Mouw, Polak, et al., Nature Genetics 2016) by leveraging GPU computing (Taylor-Weiner, Aguet, et al., Genome Biology 2019), accelerating it to run ~200 times faster and enabling us to study massive datasets and obtain more accurate results. We demonstrated that transcriptional and replication asymmetries in mutational signatures can be used to study how and when the signatures are generated, which led us to describe a new mechanism of transcription-coupled damage and to find that APOBEC affects DNA during replication (Haradhvala, Polak, et al., Cell 2016). By studying a mutational signature that is most common in breast cancer and is associated with germline mutations in BRCA1/2 genes and loss of homologous recombination repair, we found that promoter methylation of RAD51C can also cause this signature (Polak, Kim, Braunstein, et al., Nature Genetics 2017). We also showed that concurrent loss of mismatch repair and polymerase proofreading creates a unique signature, not represented by a linear combination of the two associated signatures (Haradhvala, Kim, Maruvka, et al., Nature Communications 2018).

Moreover, as part of the PCAWG efforts described above, we collaborated with other international leaders to describe the most comprehensive set of mutational signatures thus far, derived from a large number of whole genomes and whole exomes (Alexandrov, Kim, et al., Nature 2020). We have also applied our signature analysis tools to tumors with microsatellite instability (MSI): our MSIDetect analysis revealed that a unique cohort of constitutional MMRD syndrome cases has distinct MS indel signatures that can be used to correctly classify them as MSI, even using normal cells from these patients (Chung, Maruvka, et al., Cancer Discovery 2020).
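
To make the factorization behind signature discovery concrete, here is a minimal sketch of plain NMF with multiplicative updates on a toy mutation-count matrix. It assumes a fixed number of signatures and a squared-error objective, whereas the Bayesian approach described above infers the number of signatures from the data; this is not the SignatureAnalyzer code. The toy matrix uses 4 mutation channels for brevity (real analyses typically use 96 trinucleotide contexts), and all names and values are illustrative.

```python
import random

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor V (channels x samples) into W (channels x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr)) ** 0.5

# Toy data: two hypothetical signatures (columns) mixed across five samples.
W_true = [[8, 1], [1, 8], [6, 0], [0, 5]]
H_true = [[10, 0, 5, 2, 7], [0, 10, 5, 8, 1]]
V = matmul(W_true, H_true)

W, H = nmf(V, k=2)
print(f"reconstruction error: {frobenius_error(V, W, H):.4f}")
```

The columns of `W` play the role of signatures and the rows of `H` the role of per-sample activities; both stay non-negative because the updates only multiply by non-negative ratios.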

Heterogeneity and clonal evolution of cancer

Cancer samples are heterogeneous, containing a mixture of normal cells and cancer cells that often represent multiple subclones. We developed, and continue to develop, tools for characterizing the heterogeneity of cancer samples using copy-number and mutation data measured on bulk samples, including ABSOLUTE (Carter, et al., Nature Biotechnology 2012) and Phylogic (Landau, Carter, Stojanov, et al., Cell 2013), and now also using single cells. We recently used this concept as the foundation for our extended PhylogicNDT suite of tools for analyzing tumor heterogeneity, from which we can infer the clonality of mutations, estimate the number of subclones, and infer their phylogenetic relationships as well as their distribution over space and time (Leshchiner, Livitz, Gainor, Rosebrock, et al., bioRxiv 2019). These tools can analyze tumor evolution, heterogeneity, and dynamics based on multiple samples from the same patient that have been harvested longitudinally (e.g., pre- and post-treatment) or spatially (e.g., across multiple organs, or within the same tumor), enabling us to study resistance to therapy and introduce these concepts into clinical trial strategies. PhylogicNDT has been used to address high-priority questions in cancer biology, including: (i) detecting cancer clones; (ii) inferring phylogenetic trees; (iii) inferring order and timing of events in individual patients and across subsets of patients; (iv) associating mutational signatures to each branch in the phylogenetic tree; and (v) estimating the clonal composition of each tumor sample (Parikh, Leshchiner, Elagina, et al., Nature Medicine 2019; Gruber, Bozic, Leshchiner, Livitz, et al., Nature 2019; Gerstung, Jolly, Leshchiner, Dentro, Gonzalez, et al., Nature 2020 (PCAWG efforts in whole genomes); Dentro, Leshchiner, Haase, et al., Cell 2021).
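
A central quantity in this kind of analysis is the cancer cell fraction (CCF) of a mutation, the fraction of cancer cells that carry it. The sketch below gives a simple point estimate from read counts, assuming the sample purity, the local tumor copy number, and the number of mutated copies (multiplicity) are already known. Tools such as ABSOLUTE and PhylogicNDT work with full probability distributions and cluster mutations across samples; the function name and the 0.9 clonality cutoff here are hypothetical.

```python
def cancer_cell_fraction(alt_reads, ref_reads, purity, tumor_cn, multiplicity=1):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    Uses the standard mixture relation
        VAF = purity * multiplicity * CCF / (purity * tumor_cn + 2 * (1 - purity))
    solved for CCF and capped at 1.0.
    """
    vaf = alt_reads / (alt_reads + ref_reads)
    # Average copies of the locus per cell in the tumor/normal mixture:
    # tumor cells contribute tumor_cn copies, normal cells contribute 2.
    avg_copies = purity * tumor_cn + 2.0 * (1.0 - purity)
    ccf = vaf * avg_copies / (purity * multiplicity)
    return min(ccf, 1.0)

# Hypothetical mutations in a sample with 60% purity and diploid tumor copy number.
mutations = {"mut_A": (30, 70), "mut_B": (12, 88)}
for name, (alt, ref) in mutations.items():
    ccf = cancer_cell_fraction(alt, ref, purity=0.6, tumor_cn=2)
    label = "clonal" if ccf >= 0.9 else "subclonal"  # illustrative cutoff
    print(f"{name}: CCF = {ccf:.2f} ({label})")
```

Comparing such estimates for the same mutation across longitudinal or spatial samples is what allows subclones to be tracked as they expand or shrink.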

Pre-cancer to cancer transition and cancer cell-of-origin

By studying mutations in normal cells and pre-cancer lesions, we can detect clonal expansions in pre-malignant tissues. We have observed clonal expansions across essentially all normal tissue types by performing careful analysis of RNA sequences from normal bulk tissues in the Genotype–Tissue Expression (GTEx) project to uncover somatic mutations using our RNA-MuTect tool (Yizhak, et al., Science 2019). Interestingly, our findings show that lung, sun-exposed skin, and esophagus (tissues routinely exposed to the environment) harbored the most clones, and that the number of clones increased with the age of the individual. These findings, which demonstrate the extent of somatic mosaicism in humans, have implications for interpreting screening results from cancerous and precancerous lesions, wherein detecting a known cancer mutation may not necessarily indicate the presence of cancer.

Moreover, in other efforts, we are leveraging the association between mutation density and the epigenetic state of a cell to reveal the landscape of tissues and cells of origin of all major cancer types, enabling a better understanding of the transition from pre-malignant clones to cancer (Kübler, Karlič, et al., bioRxiv 2019).

Drivers of resistance

We have an ongoing collaboration with IBM Research to study why cancers become drug resistant, a study that is fostering many new clinical collaborations with investigators across the US (currently over 20), each bringing their own scientific questions and cohort of patients with a specific cancer type and treatments.

This collaboration is focused on better understanding how cancers become resistant to specific therapies. We, along with our collaborators at the Broad (Genomics Platform, Dependency Map), are generating tumor genome and transcriptome sequencing data from patients who initially respond to treatment but then become drug-resistant. Together with our IBM colleagues, we use our arsenal of tools as well as artificial intelligence to analyze these data and identify genomic patterns that may help researchers and clinicians discover mechanisms of resistance and predict drug sensitivity and patient outcome. Concurrently, we employ new genome-editing methods to conduct large-scale cancer drug resistance studies in the laboratory, based on our analyses of tumor genome data, to help identify tumor-specific vulnerabilities.

This partnership is helping to expand our understanding of the basis of drug resistance in cancer, including both genetic and epigenetic mechanisms observed in patients, and to accelerate research across the cancer community to turn knowledge of resistance mechanisms into therapies.

An example highlighting this work is a study we published early in this partnership with Dr. Ryan Corcoran’s team at MGH (Parikh, Leshchiner, Elagina, et al., Nature Medicine 2019). This study used our PhylogicNDT suite of tools to compare liquid biopsies to standard tumor biopsies in gastrointestinal cancer patients who developed drug resistance. We found that liquid biopsies harbored genetic alterations associated with drug resistance that were not identified through standard tissue biopsies in 80% of cases. Moreover, our analysis revealed that individual patients developed not just one, but many resistance mechanisms, which were both shared and distinct across neoplasms. Beyond this published study, the broader IBM–Broad collaboration currently has many active sub-projects at various stages and across several cancer types, such as lymphoma, melanoma, lung, breast, gastrointestinal, brain, bladder, and head-and-neck cancers. These collaborative studies are revealing valuable and potentially clinically actionable insights into resistance mechanisms.

Development of tools for the detection and characterization of immune responses in cancer

We have developed a tool, POLYSOLVER (Shukla, et al., Nature Biotechnology 2015), for genotyping HLA alleles and identifying somatic mutations in these genes in tumors. We used mutation data and HLA haplotypes to infer neoantigens across cancer, and predicted neoantigens were used as part of a vaccination trial in melanoma and GBM.

Together with Dr. Nir Hacohen, we recently reported clustering-based analysis of single cells in melanoma patients receiving immune checkpoint blockade that showed two distinct states of CD8+ T cells associated with patient tumor regression or progression (Sade-Feldman, Yizhak, et al., Cell 2018). In addition to delineating the epigenetic landscape and clonality of these T cell states, this study further identified a transcription factor in CD8+ T cells, TCF7, that predicted positive clinical outcome in checkpoint-treated patients. Overall, this study presented a more generalized strategy for identifying predictors, mechanisms, and targets for enhancing checkpoint immunotherapy. Analysis of a much larger cohort with bulk DNA and RNA sequencing showed that expression signatures that combine genes reflecting the differentiation state of the melanoma cells with genes reflecting the infiltrating immune cells can improve the prediction of who will respond to immune checkpoint blockade therapy (Freeman, Sade-Feldman, et al., Cell Reports Medicine 2022).

Together with Drs. Steve Lipkin, Nir Hacohen, Zsofia Stadler, and Catherine Wu, we are using our understanding of MSI cancers to explore the use of vaccines to prevent or delay the development of tumors in patients that have a predisposition for tumors with MSI due to inherited defects in the mismatch repair pathway (Lynch syndrome).

Single-cell analysis of the tumor microenvironment

Our efforts to understand cancer biology at the single-cell level have made major advances in the last few years. For example, in close collaboration with Dr. Irene Ghobrial’s lab, we performed single-cell RNA sequencing of tumor cells in bone marrow biopsies from multiple myeloma (MM) patients to map out how the composition and expression of each immune population change during disease progression, from the earliest asymptomatic stages to overt MM. Specifically, our analysis revealed loss of memory cytotoxic T cells and major histocompatibility complex class II dysregulation in CD14+ monocytes (Zavidij, Haradhvala, Mouhieddine, et al., Nature Cancer 2020). Additional studies in the lab are using single-cell data to study MM tumor cells and their relationship to the microenvironment (Boiarsky, et al., medRxiv preprint 2022), as well as to understand the response and resistance to CAR-T therapy in diffuse large B-cell lymphoma (DLBCL) (Haradhvala, Leick, Maurer, Gohil, et al., medRxiv preprint 2022).

Proteogenomic analysis

Over the past decades, many post-translational modifications (PTMs), such as ubiquitination, phosphorylation, and acetylation, have been studied for their role in regulating cell signaling events that are key for all cellular physiological functions. Recent advances in mass spectrometry (MS) technologies have enabled measurement of protein levels as well as protein modifications across cancer. Most studies thus far have focused on a single type of modification in a specific tumor type. We seek to understand the joint underlying patterns of PTMs in molecular signaling pathways that are shared across multiple cancer types by studying changes in protein acetylation and phosphorylation. We are part of the NCI’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) efforts in collaboration with Drs. D.R. Mani, Steven Carr, Lewis Cantley, and Li Ding, among others.

We previously developed an innovative computational method called CLUMPS (clustering of mutations in protein structures) that identifies significant spatial clustering of mutations within the protein 3D structure and surface (Kamburov, et al., PNAS 2015). The key advantages of CLUMPS are that it takes into account the 3D distance between amino acids, not just the 1D distance along the linear genome, and that it uses homology modeling to map proteins to 3D structures. We are now applying a similar approach to identify clusters of altered PTM sites.
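
The benefit of using 3D rather than 1D distance can be illustrated with a minimal sketch. It scores how tightly mutated residues pack in space with a Gaussian-weighted pairwise proximity score and compares that score with random placements of the same mutation counts on the structure. This is an illustration of the general idea under assumed settings, not the CLUMPS implementation; the coordinates, mutation counts, kernel width, and function names are all hypothetical.

```python
import math
import random

def proximity_score(coords, counts, t=6.0):
    """Sum over mutated residue pairs of n_i * n_j * exp(-d_ij^2 / (2 t^2)).

    coords: list of (x, y, z) per residue; counts: {residue_index: n_mutations}.
    t is an assumed kernel width in the same units as the coordinates.
    """
    residues = sorted(counts)
    score = 0.0
    for a in range(len(residues)):
        for b in range(a + 1, len(residues)):
            i, j = residues[a], residues[b]
            d = math.dist(coords[i], coords[j])
            score += counts[i] * counts[j] * math.exp(-d * d / (2 * t * t))
    return score

def clustering_p_value(coords, counts, n_perm=2000, seed=0):
    """Empirical p-value: how often random placements score at least as high."""
    rng = random.Random(seed)
    observed = proximity_score(coords, counts)
    values = list(counts.values())
    hits = 0
    for _ in range(n_perm):
        positions = rng.sample(range(len(coords)), len(values))
        if proximity_score(coords, dict(zip(positions, values))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy protein of 30 residues laid out on a line, except that residues 2, 15,
# and 28 fold together into one pocket: distant in sequence, close in space.
coords = [(10.0 * i, 0.0, 0.0) for i in range(30)]
coords[2], coords[15], coords[28] = (0.0, 50.0, 0.0), (3.0, 50.0, 0.0), (0.0, 53.0, 0.0)
counts = {2: 5, 15: 4, 28: 6}

print(f"3D clustering p-value: {clustering_p_value(coords, counts):.4f}")
```

In this toy case the mutated residues are 13 positions apart along the sequence, so a 1D window would miss a cluster that the 3D score detects. The same scoring scheme could in principle be applied to altered PTM sites instead of mutations.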

Comprehensive landscape of Chernobyl-associated papillary thyroid cancers

In collaboration with Dr. Stephen Chanock’s team at NCI, we created a comprehensive genomic landscape of papillary thyroid cancers (PTCs) that arose as a consequence of the 1986 Chernobyl nuclear accident (Morton, Karyadi, Stewart, Bogdanova, Dawson, Steinberg, et al., Science 2021). We found radiation dose–dependent enrichment of gene-fusion drivers and structural alterations in the DNA that bore hallmarks of repair pathways. Overall, the data suggest that exposure-related double-stranded breaks in the DNA were an early carcinogenic event, enabling PTC growth later in life.

Current and Past Funders
Internal Broad
Internal MGH
Honors, Awards and Fellowships