04 April 2022

Cancer is a disease of the genome, driven by a combination of potential germline risk alleles and a set of “driver” somatic mutations acquired as increasingly fit clones expand. To generate a comprehensive catalog of both the inherited germline events and the somatic events that occur during life, we are developing and applying highly sensitive and specific tools for detecting different types of mutations in massively parallel sequencing data. The volume and complexity of these data demand computational tools built on state-of-the-art statistical and machine learning approaches to separate signal from noise. Among these tools are MuTect (Cibulskis, et al., Nature Biotechnology 2013), dRanger & BreakPointer (Bass, Lawrence, et al., Nature Genetics 2011; Chapman, et al., Nature 2011; Drier, et al., Genome Research 2012), SegSeq (Chiang, Getz, et al., Nature Methods 2009), CapSeg (Landau, Carter, Stojanov, et al., Cell 2013), HapSeg (Carter, et al., Nature Precedings 2011), MSMuTect (Maruvka, et al., Nature Biotechnology 2017), POLYSOLVER (Shukla, et al., Nature Biotechnology 2015), and RNA-MuTect (Yizhak, et al., Science 2019). We have also built tools to detect various forms of contamination and artifacts, including ContEst (Cibulskis, McKenna, et al., Bioinformatics 2011), DeToxoG (Costello, et al., Nucleic Acids Research 2013), and deTiN (Taylor-Weiner, Stewart, et al., Nature Methods 2018).