AbstractRetrospective data support overall survival (OS) advantage to high clonal tumor mutation burden (cTMB), high clonal neoantigen load (cNEO) and low intratumor heterogeneity (ITH) in cancer patients who receive immunotherapy. In order to explore this relationship prospectively with Vigil, a triple function targeted immunotherapy involving ovarian cancer patients in long term follow up of the Phase 2b VITAL trial, we developed an exome sequencing procedure and associated bioinformatics pipeline to determine clonal signal patterns. DNA libraries containing exome sequences tagged with unique molecular identifiers (UMI) were prepared from paired samples and sequenced on Illumina sequencers to high coverage depths of ~ 930X (tumor) and ~ 130X (normal). Raw sequence reads were processed into optimized binary alignment map (BAM) files, using the UMI information. The BAM files were inputted into modules for calling MHC-I alleles, annotating single nucleotide variants (SNVs) and small insertions/deletions (InDels), and for determination of allelic copy number. The outputs were used to predict the sequence of peptide neoantigens and to perform clonality analysis in order to assign each SNV and InDel in a patient tumor sample to a primary clone or subclone. The Clonal Neoantigen pipeline was further assessed using whole exome Illumina sequencing data from three previously published studies. Evaluation of the pipeline using synthetic sequencing data from a sub-clonal deconvolution tool benchmarking study, showed positive predictive value (PPV) and positive percent agreement (PPA) of > 97.5% and > 96.5%, respectively, for SNV and InDel detection with minimum requirements for variant density and allele fraction. Haplotype calls from the Clonal Neoantigen pipeline MHC-I/ MHC-II typing module matched a published benchmark for 91.5% of the calls in a sample of 99 patients. Analysis of exome sequencing data from 14 patients with advanced melanoma revealed a strong correlation between cTMB values determined by the Clonal Neoantigen pipeline as compared to those calculated from the published data (R2 = 0.99). Following validation, the wet lab process and Clonal Neoantigen pipeline was applied to a set of matched normal, tumor, and Vigil product samples from 9 (n = 27 samples) ovarian cancer subjects entered into the VITAL (CL-PTL-119) trial. Results demonstrated marked correlation (R2 = 0.98) of cTMB between tumor used to construct Vigil and Vigil product. Correlation between tumor and Vigil for the cNEO and ITH metrics, showed R2 values of 0.95 and 0.87, respectively. The consistency of the Clonal Neoantigen pipeline results with previously published data as well as the agreement between results for tumor and Vigil for the entire system provide a strong basis of support for utilization of this pipeline for prospective determination of cTMB, cNEO, and ITH values in clinical tumor tissue in order to explore possible correlative relationships with clinical response parameters.
IntroductionHigher levels of clonal tumor mutational burden (cTMB) and clonal neoantigen load (cNEO) along with lower levels of intra-tumor homogeneity (ITH) in tumor samples in a wide variety of malignancies have been associated with significantly improved overall survival following treatment with immune checkpoint inhibitors1,2,3,4,5,6. Mutations in the genomic DNA that occur early in the development of a tumor and are carried forward in all tumor cells are considered to be clonal. These clonal mutations are responsible for the biochemical transition from a normal cell to a cell that is associated with the development of the malignant state. Sub-clonal mutations arise later in the development of the tumor and are carried forward in daughter cells that make up only a sub-portion of the tumor7. Tumor mutational burden (TMB) is defined as the number of non-synonymous mutations per million base pairs (Mb) of genomic DNA8,9. cTMB is calculated in the same manner as TMB but includes only clonal mutations1,2. ITH is the estimated fraction of non-synonymous mutations in a population of tumor cells that are sub-clonal versus clonal3. Non-synonymous mutations in tumor cells may give rise to mutated proteins which are either ubiquitinated and processed into peptides that are presented on the cell surface in complex with MHC-1 molecules or they may be endocytosed by antigen-presenting cells and processed through the lysosomal pathway into peptides to be presented on the antigen-presenting cell surface also in complex with MHC-I10. Novel peptides derived from mutated proteins that are presented on the tumor cell surface in complex with MHC-I are termed neoantigens. Methods that generate a list of potential peptides derived from variant protein sequences and estimate the binding affinity of those peptides to the patient’s own MHC-1 molecules derived from the HLA-A, HLA-B, and HLA-C loci, can be used to predict with some level of accuracy which peptides will have a high affinity for their specific MHC molecules and therefore are likely to be presented on the cell surface11,12. The set of peptides that are derived from tumor-associated clonal mutations and are predicted to have high affinity for the patient’s MHC molecules, but are not encoded by the germline genome are termed clonal neoantigens. The cNEO is the count of these peptide neoantigens per Mb of genomic DNA.Vigil is a clinically tested precision immunotherapy that is prepared by harvesting autologous tumor cells. These cells are transfected with a dual plasmid expressing granulocyte macrophage colony stimulating factor (GM-CSF) as well as a bi-functional short hairpin (bi-sh) RNA that blocks the expression of furin, an enzyme responsible for TGFβ1 and TGFβ2 supply. Following irradiation and satisfaction of product release criteria, Vigil is injected intradermally into the patient from whom the tumor tissue used to construct Vigil was obtained13. The resulting down-regulation of TGFβ1 and β2 expression by furin knockdown and the expression of GM-CSF by the injected tumor cells as we have previously shown generate a dendritic cell response in the host that results in the expansion of effector cells that can target tumor neoantigens demonstrated effectively by ELISpot assay7. Vigil has been shown in multiple studies to elicit an effective dendritic cell response resulting in relapse free survival (RFS) and OS clinical benefit14. We thus hypothesized that levels of cTMB, cNEO and ITH will classify benefit and resistance of patients undergoing therapy with Vigil and will define clinical responders or non-responders. Moreover, we predicted that there would be a strong correlation across patients in cTMB, cNEO and ITH levels between autologous tumor used to construct Vigil and Vigil product. Consequently, our team constructed a Clonal Neoantigen pipeline and process algorithm to determine quantitative cTMB, cNEO, and ITH levels and to compare clonal signal expression between the tumor used to construct Vigil and Vigil product.ResultsClonal neoantigen pipeline developmentWe developed a complete bioinformatics pipeline that takes whole exome sequencing data from matched tumor and normal genomic DNA samples as input and outputs values for cTMB and ITH, and identifies a list of potential clonal neoantigens peptides expected to bind to the patient’s MHC-1 molecules with high affinity. By analyzing the allelic frequency and allele-specific copy number of tumor-specific variants using PyClone 6.1 it has been shown to be possible to identify non-synonymous “clonal” variants that are likely to be present in virtually all cells of the tumor15,16. The quantity of non-synonymous clonal variants per Mb of exome is the determined cTMB, whereas the fraction of non-clonal variants relative to total variants across the exome, again considering only non-synonymous variants, is the ITH. A list of potential clonal neoantigens is arrived at by in silico synthesis of 8–11-mer peptides associated with each non-synonymous clonal variant and examining their expected binding affinity to the patients predicted MHC Class I molecules12,17. A cNEO score which is the number of predicted high binding affinity peptides that span a clonal non-synonymous variant and have higher affinity than the corresponding reference-derived peptide sequence, per Mb of exome is then calculated.The procedural steps required for the Clonal Neoantigen pipeline were based on previously published work2,3,18, and are outlined in Fig. 1. The pipeline includes initial steps to process raw sequencing reads to optimized BAM files either with (step 1 and 2) or without UMIs (step 1 and 3). The optimized BAM files are then passed to a module for calling MHC-I and MHC-II haplotypes using HLA*LA19 (step 4), a module for calling and annotating SNVs and InDels using GATK Mutect2 and Funcotator (step 5), and a module for determining allelic copy number using FACETS20 (step 6). The output of variant calling and allelic copy number determination are used as input to clonal analysis using PyClone 6.115 (step 7) which produces a tabular report with tumor-associated variants assigned to individual clusters, each having an associated CCF (step 8). The outputs of the variant calling along with the MHC-1 haplotype calls are used as input for peptide neoantigen prediction using pVACSeq21 (step 9 and 10). Proprietary custom scripts are used to add information about predicted peptide neoantigens to the PyClone report, to filter the report to include only non-synonymous variants, and to summarize and calculate the key metrics for each patient tumor sample and write them into a TMB report (step 11). Additional information regarding the Clonal Neoantigen pipeline framework and processing steps as well as metric calculations are provided in Methods below.Fig. 1Flow diagram for the Clonal Neoantigen pipeline. The steps/ modules are numbered and described further in the main text. Output data from a step that is used as input for a downstream step is represented by blue arrows between steps. Red dotted line arrows indicate BAM files from the non-UMI alignment module flowing to the subsequent steps. Green arrows represent additional data fields from the output of variant calling and annotation and allelic copy number determination included in the PyClone variant level report.Full size imageOverview of clonal neoantigen pipeline performance verificationThe Clonal Neoantigen pipeline was qualified by using input Illumina-type whole exome NGS data from three different published studies and comparing the results generated from our pipeline or modules thereof to the published results. The developed Clonal Neoantigen pipeline contains modules for both UMI-based and non-UMI based alignment and production of BAM files. The UMI-based approach uses a combination of processing steps available in the fgbio software package and alignment with bwa-mem2 to identify multiple sequence reads derived from the same original genomic DNA molecule and combine them into a consensus sequence prior to additional alignment with bwa-mem2. This approach is expected to reduce the measured sequencing error rate and improve the accuracy of estimation of allelic frequency22,23. Although it would be desirable to verify the performance of both the UMI and non-UMI modules for BAM file generation, it was not possible to identify publicly available tumor-derived whole exome sequencing data generated using UMIs, and therefore in the pre-wet lab Clonal Neoantigen pipeline verification only the non-UMI module was used.Evaluation of core clonal neoantigen pipeline steps using simulated sequencing dataThe performance of the core portion of the pipeline, which included read trimming, BAM file generation, variant calling, allelic copy number determination and clonal analysis with PyClone version 6.1 was evaluated using simulated tumor exome sequencing data generated as part of a clonal deconvolution benchmarking study18. The authors of the study generated simulated sequencing reads from 9 different simulated tumor genomes and mixed each with simulated reads from a matching normal healthy genome in order to generate samples having 100%, 75%, 50%, or 25% tumor DNA. The tumor genomes were designed to simulate a tumor undergoing evolution with three to four generations of subclonal differentiation; each of these subclones have specific variants associated with them, with variants associated with subclones arising later in the evolution having lower allelic frequency (Supplemental Fig. 1, Supplemental Table 1). Simulated sequencing data from three of the tumor genomes, S1R1 containing a frequency of 1 × 10–6 and 2 × 10–7 SNVs and InDels, respectively, and S2R1, and S2R3 that contained a tenfold higher frequency of these mutation types (Supplemental Fig. 1), were analyzed in paired format with the simulated reads from their corresponding matching normal healthy genomes. The coverage depth analyzed was 250X for tumor and 60X for normal simulated sequencing reads. The PyClone variant-level preliminary report for each of the 12 simulated tumor DNA sequence samples from the Clonal Neoantigen pipeline was compared to a truth set from the Tanner et al. publication with respect to the variants detected versus those known to be in the truth set, the expected versus determined tumor purity, and the overall copy number for the chromosomal region spanning a variant versus the determined total copy number for that region (Table 1).Table 1 Comparison of PyClone variant-level report data to Tanner et al., truth set.Full size tableAnalytical specificity of variant detectionThe positive predictive value (PPV), calculated as the number of true positive variants detected divided by the total variants detected (TP/(TP + FP)) was > 97.5% for synthetic sequencing data derived from synthetic tumor genomes S2R1 and S2R3, regardless of tumor purity which ranged from 25 to 100% (Table 1), indicating that less than 2.5% of the variants in the final list were false positives for the simulated samples that had an average of 10 SNVs and 2 InDels per megabase of exome DNA. The system was less accurate for the S1R1 simulated tumor genome tumor which had a tenfold lower density of SNVs and InDels as compared to the other two simulated tumor genomes tested. However, it is not surprising that across the 50,446,305 nucleotide exome that 5–10 false positives would arise in the result set, and since the expected number of variants, combining SNVs and InDels, for tumor genome S1R1 was only 60, these few false positives had a disproportionate impact on the PPV. In fact, the maximum number of false positives for any of the samples was 6, yielding a worst case false positive rate of 1.19 * 10–7 when considering all nucleotides.Accuracy of prediction of CCF valuesThe accuracy of prediction of the CCF of the true positive detected variants was assessed by calculating the correlation coefficient to the known CCF values in the truth set. Excluding the 25% purity simulated tumor genomes, the correlation between the truth set CCF values and the determined CCF values was > 92% for all simulated tumor sequencing data samples (mean 95.0%). In the Tanner simulated tumor genomes, 40% of the total variants are represented at 100% CCF, 15% at 50% CCF, and the remaining 45% of variants range from 1%—20% CCF (Supplemental Table I). Since virtually all of the variants are heterozygous, the allelic frequency is half of the CCF, so in the synthetic data from 25% tumor purity samples, 45% of the variants must have an effective allele fraction of 2.5% or less (20% * 0.5 * 0.25). The expected minimum allelic frequency (AF) detectable at 250X coverage at 99% efficiency is 5% AF24, and therefore 45% of the variants in the 25% tumor samples would be below the typical limit of detection, leading to greater error in CCF estimation.Allelic copy number determination – accuracyFACETS estimates the total and minor allele copy number for chromosomal regions, and proprietary code in the Clonal Neoantigen pipeline converts these values to major and minor allele copy number and provides it as input to PyClone vs. 6.1 (see Methods). The sum of the major and minor allele copy number taken from the PyClone results files was compared to the total copy number (ploidy) from the truth set for the chromosomal region spanning each variant. The mean absolute difference across all variants detected in a sample was 0.2 copies, with the poorest performing sample showing a mean difference of 0.32 copies (Table 1). This compared favorably to the mean absolute copy number difference from the truth set of 0.591, reported by Tanner et al. in Supplemental Table IV of their publication18, for 75% and 100% purity samples. These results show that the FACETS application is running properly as a component of the pipeline and the output is faithfully utilized and reported in the PyClone results files.Sensitivity of variant detectionTo examine the sensitivity of detection of variants by the Clonal Neoantigen pipeline, the variants in the Tanner data sets for were binned by expected CCF value and the positive percent agreement (PPA) was calculated for each bin as TP/ (TP + FN). This analysis was performed only for the S2R1 and S2R3 synthetic genomes due to their tenfold higher variant density relative to S1R1. As expected, the ability to detect specific expected variants in the simulated genomes decreased with the known CCF. When the expected CCF was greater than or equal to 0.5 the positive percent agreement (PPA) was > 96.5% and this was consistent down to 25% tumor purity (Fig. 2). The PPA values lower than 100% for variants with CCF ≥ 0.5 were due to a small number (0–3) variants that were not detected in each sample. The reason for the lack of detection of these variants was investigated and in each case it was due to removal in the FilterMutect2Calls quality filtering step based on one or more of the error modes comprising clustered events, orientation bias, or presence in the panel of normals (not shown). Thus, at CCF ≥ 0.5 the PPA for variant detection was 100% prior to quality filtering. As the expected CCF range drops, a greater percentage of variants become undetectable and this is exasperated as the percentage of tumor in the simulated tumor sample genome drops from 100 to 25% (Fig. 2).Fig. 2Percentage of variants detected by the range of CCF values in the truth set for (A) S2R1 and (B) S2R3 simulated tumor genomes. The PPA is plotted for four different samples by CCF split into four bins as labeled on the x-axis. Varying shades of gray to blue represent simulated sequencing data containing 25%, 50%, 75%, or 100% simulated tumor genome sequence content, with the balance from simulated normal genome.Full size imageEvaluation of the reliability of the entire clonal neoantigen pipeline using published data from Riaz et al4
The complete non-UMI version of the Clonal Neoantigen pipeline was qualified using available raw data from a Riaz et al. publication4 examining the relationship between the fraction of mutations that were clonal, the mutational load, the clonal mutational load, and levels of peptide neoantigens, among other markers, to the patient response to nivolumab treatment in advanced melanoma4. There were a total of 68 patients treated with nivolumab in the study but most had been previously treated with ipilimumab which impacted the make-up of predicted peptide neoantigens in the patient tumors (Supplemental Fig. 2). Therefore, patients that had not previously received ipilimumab therapy were selected for analysis; this included all 7 non-pre-treated patients that showed a partial or complete response (PR/CR) to nivolumab and the first 7 patients who did not respond to nivolumab (progressive disease, PD). Riaz et al., noted a consistently high fraction of clonal mutations (i.e. low ITH) in the PR/CR patients relative to the PD patients, and a significant survival advantage for patients with a high clonal mutation load (i.e. high cTMB).A strong correlation between the Riaz data and our tabulated results was observed for both ITH (R2 = 0.758) and cTMB (R2 = 0.990) (Table 2). However, ITH values from the Riaz data were an average of 37.1% lower as compared to our Clonal Neoantigen pipeline, and the Riaz cTMB scores averaged 63.7% higher than the values for the same samples from our pipeline (Table 2). The higher ITH and lower cTMB achieved with our Clonal Neoantigen pipeline resulted from a trend towards higher CCF scores in the Riaz data and not to a lower rate of detection of non-synonymous variants (Table 2).Table 2 Comparison of cTMB and ITH level for the same raw sequencing data analyzed by Riaz versus Clonal Neoantigen pipeline.Full size tableImportantly, the analysis of the Riaz raw data using the Clonal Neoantigen pipeline developed in this study reached a similar conclusion to the published results. The cTMB values were generally higher in patients likely to respond to nivolumab as compared to non-responders, as evidenced by the separation (p = 0.0068) between CR/PR and PD patients on the plot of Riaz’s versus our cTMB values (Fig. 3). The current Clonal Neoantigen pipeline predicts a set of peptides that are likely to bind to each patient’s predicted MHC I molecules and calculates a cNEO value based on the number of high-scoring peptides that are derived from variants associated with the primary clone. The cNEO values generated by the pipeline are reported in Table 2 for each of the 14 patients selected from the Riaz study for use in the qualification. As was observed for the relationship between patient response and cTMB values, cNEO levels were significantly (p = 0.0068) higher in therapy-responsive patients (CR/PR) as compared to the non-responsive patients (PD) (Fig. 4).Fig. 3Correlation between cTMB values generated by the Clonal Neoantigen pipeline as compared to those calculated from supplementary data tables of Riaz et al. Red and green filled circles represent patients with a complete/ partial response or progressive disease, respectively, in response to nivolumab treatment.Full size imageFig. 4Comparison of PR/CR and PD groups for our cNEO metric. Raw sequencing data from Riaz et al. for seven patients with partial or complete response to nivolumab (green bars) or stable disease (red bars) were analyzed using the Clonal Neoantigen pipeline to generate cNEO values. Filtering of high-scoring potential peptides to select likely peptide noeantigens and calculation of the cNEO metric was conducted as described in Methods. Dotted line illustrates hypothetical threshold defining potential correlation to clinical response with cNEO. P-value shown was calculated by application of a two tailed unpaired T-test on log-transformed cNEO values.Full size imageEvaluation of the HLA haplotyping module of the clonal neoantigen pipelinePublicly available exome sequencing data for 99 subjects that participated in the 1000 genomes project was utilized for evaluation of the HLA haplotyping module of the Clonal Neoantigen pipeline. The patients were selected from among 820 total patients used to evaluate several different sequencing-based HLA haplotyping tools25. CRAM-format sequencing data as well as a truth set containing the expected MHC-I and MHC-II haplotypes for each of the 99 patients were downloaded from the authors Github repository25. The sequencing data files were pre-processed as described in methods to generate BAM files which were used as input to the HLA haplotyping module. The available truth set was originally developed as part of a previous study, by Abi-Rached et al., using the same 1000 genomes input exome sequencing data and PolyPheMe software26. HLA*LA generates G-type or 3-field haplotype calls that are based on both the protein and underlying nucleotide sequences in the antigen recognition domain (ARD). The Abi-Rached truth set was available in 2-field resolution which is based only on the predicted amino acid sequence of the ARD25; thus HLA*LA output data was truncated to two fields for purposes of comparison. The HLA haplotyping module for the Clonal Neoantigen pipeline showed an initial percentage match to the Abi-Rached truth set of 86.9%—94.9% depending on the locus (Table 3). Because it was noted in Thuesen, et al., that some of the haplotype calls in the Abi-Rached truth set were proven to be incorrect, the calls that did not match Abi-Rached were also compared to the published calls from Thuesen et al.25, using the same HLA*LA software. From 92.9%—100% of the haplotype calls matched either the Abi-Rached truth set or those published by Thuesen. All 7 of the remaining unmatched calls in HLA-C were shown to be caused by a difference of a single allele which was called as C*02:10 by our bioinformatics module versus C*02:02 in the Abi-Rached truth set and the Thuesen HLA*LA data. The match rate to the truth set obtained by Thuesen for all 820 patients is shown for reference (Table 3).Table 3 Evaluation of HLA Haplotyping Bioinformatics Module.Full size tableImplementation of TWIST exome 2.0 and UMI-based processing to improve clonal variant identification in whole exome sequencingIn order to improve the accuracy of clonal variant detection, this study utilized the Exome 2.0 panel recently developed by Twist Biosciences, incorporated UMI oligonucleotides into the library preparation process, and utilized UMI information in the Clonal Neoantigen pipeline. Twist exome capture technology was found to provide broader and more uniform coverage of the exome as compared to industry standard Agilent exome technology27. Incorporating UMIs in library preparation and utilizing the incorporated UMI tags to combine duplicate reads into a consensus sequence improves the accuracy of low frequency variant calling28.A set of 27 cryopreserved samples from 9 stage III/IV ovarian cancer patients selected from the Phase IIB trial of Vigil, comprised of a tumor, a Vigil product, and a normal PBMC sample from each were processed through genomic DNA extraction, library preparation, exome sequencing, and bioinformatic analysis incorporating the UMI processing. Genomic DNA samples from ATCC comprising a matched set of a non-small cell lung cancer line (NCI-H2126) and a B lymphoblast cell line (NCI-BL2126) from the same patient were processed alongside the clinical trial samples. Prior to exome selection, DNA libraries were mixed at a 1:9.4 ratio of normal versus tumor and Vigil in order to target a yield of sequence reads that was 9.4X higher in tumor or Vigil as compared to PBMC. An average of 558 million, 544 million, and 65 million sequencing reads were obtained from Tumor, Vigil, and PBMC / normal control samples, respectively (Table 4).Table 4 Coverage statistics for exome sequencing experiment.Full size tableCoverage metrics were collected prior to and after de-duplication as a result of UMI processing. Approximately 36%, 34%, and 28% of on-target bases were eliminated as duplicates during UMI processing from tumor, Vigil, and PBMC/normal sample data, respectively; these duplicates were collapsed into consensus reads improving the accuracy of the base calls. The coverage uniformity appeared excellent with an average of over 98% of nucleotide positions in the target for PBMC / normal samples having coverage of at least 50X. Tumor and Vigil samples had coverage of at least 500X for an average of 88.6% and 90.5% of nucleotide positions in the target, and ~ 98% of bases covered at 250X or higher (Table 4, supplemental Excel file 1). The total TMB for NCI-H2126, calculated by summing the TMB for each cluster of variants in the tumor, was 22.5 which was nearly identical to the published value of ~ 22.09, supporting the conclusion that the alignment, UMI processing, variant calling modules, and proprietary reporting scripts of the pipeline were performing as expected.Comparison of key metrics between tumor and Vigil samplescTMB, ITH, and cNEO scores were calculated for the tumor and Vigil samples from the 9 Phase IIB trial patients as well as the control NCI-H2126 non-small cell lung carcinoma cell line. Values of ITH, cTMB, and cNEO showed strong correlations (R2) between tumor and Vigil of 0.87, 0.98, and 0.95, respectively (Fig. 5). For all samples except for one a primary clone was identified in each tumor and Vigil sample. No primary clone was identified for the tumor sample from patient 103_009 (Fig. 5, red shaded circle). A primary clone corresponds to a cluster of variants that have been assigned a CCF > 0.9 by PyClone 6.1, making it likely that the variants are present in nearly 100% of the cells making up the tumor. The highest CCF value for any cluster in the tumor for patient 103_009 was 0.879 (see Supplemental Excel File 2, TMB Summary). The Vigil sample for the same patient did have a cluster with a CCF > 0.9 but it included only 5 variants, versus an average of 44.2 variants for all clinical trial tumor and Vigil samples (see Supplemental Excel file 2, Variant Detection Tables).Fig. 5Plots of tumor vs. Vigil levels, for 9 ovarian cancer patients selected from the Vigil Phase IIB trial, for cTMB (A), cNEO (B) and ITH (C). Patient 103_0009 in which no clonal variants were detected, is represented by red shaded circle.Full size imageConsistency in detection of specific clonal variants between tumor and VigilThe similarity in cTMB, cNEO and ITH scores between tumor and Vigil for each patient suggested that the exome DNA sequence was highly similar between tumor and Vigil samples of the same patient, and that the entire system, including the wet-lab and bioinformatic steps were reproducible. However, it was important to confirm that the specific set of non-synonymous variants detected were similar between tumor and Vigil within patients and that the top scoring peptides derived from each of these variants were the same. For the purposes of this analysis, we focused on the non-synonymous clonal variants since these are the set of variants that are used to calculate the cTMB score and for the predictions of peptides used to calculate the cNEO score. Additionally, the clonal variants should have the highest raw allele fraction in the sequencing data and therefore should be detected with the highest reproducibility.For each patient the clonal variants that were detected in the tumor were compared to those found in the Vigil sample. Except for the two samples with ITH > 0.95 which had a low number of clonal variants, 82.2% or more of the clonal variants were also detected and classified as clonal in the paired Vigil sample. The search for the tumor clonal variants was expanded to include both clonal and non-clonal variants in Vigil. Regardless of ITH score, a minimum of 88.9% of the clonal tumor variants were detected in Vigil as clonal or non-clonal. In five out of 9 patients 100% of the clonal variants detected in tumor were found in the Vigil sample as either clonal or non-clonal (Table 5, see Supplemental Excel file 2 – Clonal-variants-tables to view individual clonal variants detected).Table 5 Clonal variants agreement between tumor and Vigil by percentage.Full size tableThe results were even better in the reverse comparison, where the clonal variants detected in Vigil were compared to those detected in tumor for each patient. Again except for the two patients with ITH > 0.95, a minimum of 88.9% of the Vigil clonal variants were detected and classified as clonal in the matching tumor sample from the matching patient. Ninety-eight percent or higher of the Vigil clonal variants were detected and classified as either clonal or non-clonal in the tumor sample, and in most patients there was a 100% match (Table 5).The results indicate that the detection of the variants by the wet lab and bioinformatics system was very reliable but that the assignment of variants to the primary clone or a subclone by PyClone 6.1 was subject to a greater level of variability. Detailed supporting data and the final PyClone reports which contain the primary data used for all analyses can be reviewed in Supplemental Excel File 2.Prediction of peptide neoantigen sequencesThe pVAC-SEQ component of the pipeline takes the variant list generated by Mutect2 and generates the protein sequence spanning each variants, and then generates the peptide sequences of varying lengths (default 8–11-mers) that span the variant amino acid sequence21. The peptides are then scored based on their binding affinity (IC50) to the patient’s own MHC molecules for which the haplotypes were predicted by HLA*LA from the sequencing data, and the binding affinity for the variant peptide sequence is compared to that for the wild-type peptide sequence derived from the same genomic coordinates. Best peptides are selected and reported in the final primary data as the variant-derived peptide with the highest binding affinity (lowest IC50) for the patient’s predicted MHC-I molecules where the binding-affinity of the variant-derived peptide is stronger than the wild-type binding affinity. These high scoring peptides for the method qualification experiment are reported for each patient in the ‘Best-peptides-table’ of Supplemental Excel File 2.Since the matching normal sample was used to determine the HLA haplotype for both the tumor and Vigil samples of a patient, it would be expected that the top variant for a given peptide sequence would be consistent regardless of sample origin. As expected, for all clonal variants detected in both sample types, the predicted best peptides were identical between the tumor and Vigil samples of the same patient. The result confirms that the pVAC-Seq portion of the pipeline as well as the reporting of the results are reliable.Sub-sampling analysisExcluding the tumor sample for patient 102_0003 that had only a 535X mean coverage, the minimum mean target coverage depth in the sequencing run described above was 853X for tumor and Vigil samples and 110X for normal (i.e. PBMC and non-tumor control) samples after UMI processing (Supplemental Excel file 1). However, in a large study, due to variability in pooling and sample quality it is expected that some tumor and normal samples would have as low as 650X and 80X mean target coverage depth. Re-analysis of sub-sampled FASTQs from the original sequencing run was therefore undertaken in order to establish that the reduction in coverage depth to these limits would not impact the reliability of determination of cTMB, cNEO and ITH values.Reads were randomly subsampled to a mean target depth, before UMI processing, of 940X and 100X for tumor/Vigil and PBMC/normal samples, respectively in order to achieve coverage depth after UMI processing and duplicate removal of around 650X for the tumor and Vigil samples and 80X for the PBMC and normal samples. The actual coverage depth achieved after UMI processing and duplicate removal ranged from 665-727X (mean 686X) for the tumor (except 1 outlier) and Vigil samples and 78-83X (mean 81X) for the PBMC and normal samples (Supplemental Excel file 3).The cTMB, cNEO and ITH scores were compared between the processed results of the original and subsampled data files (Supplemental Excel file 4). Tumor samples from all 9 patients plus the tumor control and 7 Vigil samples showed differences in cTMB, cNEO and ITH of 20% or less between the original analysis and the subsampled data (Table 6), demonstrating that a coverage depth of greater than or equal to 665X for tumor samples and 78X for PBMC or normal samples is sufficient to deliver accurate metrics for the tumor samples. However, the two Vigil samples with the highest mean ITH and the lowest mean cTMB showed large percentage differences in cTMB and cNEO between the original and sub-sampled data (Table 6). The results suggest that when the ITH is very high (e.g. > 0.95) the reproducibility of the measurements are more sensitive to reduction in coverage depth.Table 6 Comparison of cTMB, cNEO and ITH scores between original and sub-sampled data.Full size tableDiscussionA complete laboratory and bioinformatics workflow has been developed in order to rapidly and accurately identify tumor-associated variants within coding regions of the genome, determine the allelic copy number for chromosomal regions spanning the exome, and to sort the variants into clusters based on common CCF. The Clonal Neoantigen pipeline also further incorporates functions to perform MHC-1 HLA typing and prediction of potential peptide sequences spanning tumor-associated variant sequences and uses this information to generate a list of potential neoantigen peptides based on predicted binding affinity of the patient’s MHC-1 molecules to the variant-associated peptides. Together, this information can be used to generate detailed variant-level reports for each tumor sample that provide information about CCF, protein-impacted, the type of mutation, allelic frequency, and associated likely peptide neoantigens for each variant. A summary report is also generated which contains the calculated cTMB, cNEO and ITH scores for each tumor sample represented in the sequencing run.Application of this workflow revealed impressive similarity of clonal signal involving all parameters (cTMB, cNEO, ITH) including most notably cTMB between Vigil product and actual autologous tumor tissue used to construct Vigil. Generation of results for both wet-lab and Clonal Neoantigen pipeline, for matched tumor, Vigil, and PBMC (normal) samples from 9 ovarian cancer patients that participated in the VITAL trial of Vigil (CL-PTL-119)29, demonstrated the robustness of the platform as evidenced by correlation coefficients (R2) between Vigil product and autologous tumor sample used to construct Vigil of 0.9802, 0.9525, and 0.8678 for cTMB, cNEO, and ITH, respectively. The high percentages of the tumor exome covered at a minimum of 250X or 500X and PBMC/normal exome covered at 50X suggested that the lab and bioinformatic processing procedures were suitable to achieve a clinical diagnostic goal of being able to consistently detect variants present at a CCF of 25% or higher in biopsy samples containing 20% tumor content. The high-level of cross correspondence of clonal non-synonymous variants detected in the same patient between tumor and Vigil samples further demonstrated the platform reliability for clinical application.The laboratory and bioinformatic process steps implemented followed closely the steps originally elucidated by McGranahan and colleagues2,3,18. However, several modifications were made which were expected to improve the accuracy and throughput of the method. The utilization of the Twist Biosciences exome capture technology is expected to provide better exome coverage uniformity than the widely used Agilent exome capture technology27, allowing for more comprehensive detection of variants. The use of oligonucleotide adapters containing UMIs during the library preparation and the utilization of the UMI information to collapse duplicate reads into consensus reads is expected to improve the reliability of the base calls and the accuracy of the determined allele fraction for variants28. However, at the current sequencing depth, only about 35% of the reads were removed as duplicates during UMI processing using the fgbio software suite, suggesting that a maximum of 35% of the collapsed consensus sequencing reads contain information from more than one parent read. Greater advantages of using UMIs can be achieved as gross sequencing depth increases.Besides the introduction of UMI processing, several changes, relative to the McGranahan study, were made to the Clonal Neoantigen pipeline that are expected to improve the reliability of the cTMB, cNEO and ITH scores. Prior to utilization for variant calling and copy number determination, base quality scores were re-calibrated and the BAM file was filtered to only retain sequence reads overlapping the exome target. For variant calling, we used the combination of GATK Mutect2 and FilterMutectCalls, versus the McGranahan study which used the intersection of variants called by Mutect version 1.1.4 plus Varscan2 to select true variants. The sensitivity of GATK Mutect2 compared favorably to several alternative variant callers including VarScan218,30, and the combination of GATK Mutect2 and FilterMutectCalls allowed for precise identification and removal of germline variants and variants found in a panel of normal as well as false-positive variants resulting from sequencing artifacts31. FACETS was utilized in the Clonal Neoantigen pipeline for allelic copy number analysis because it was shown to be one of the best performing allele-specific copy number analysis tools specifically designed for sequencing data as compared to ASCAT, the tool that was used by McGranahan and colleagues, which was originally designed for microarray data18,20. PyClone 6.1 also replaced the original PyClone version 0.12.7 used in the earlier study, with the main advantage being processing speed and a built-in capability to produce reliable results for studies with only a single tumor sample per patients15. Similar substitutions in tools to characterize the clonal architecture of tumor cells have been made by other researchers in recent studies. For example, a combination of GATK Mutect2 and Strelka v 2.9.10 for variant calling, an improved HLA haplotyping tool (Optitype), and an improved MHC-I peptide binding affinity prediction tool (MHC Flurry 2.0.4) were used to identify a correlation between higher cTMB and higher cNEO and responsiveness of urothelial cancer to atezolizumab therapy1.Our approach to identification of potential peptide neoantigens was to use the patient’s own MHC-I haplotype calls generated by HLA*LA software, which was demonstrated to be one of the most accurate and efficient software tools of its class25, combined with variant calls from Mutect2/ FilterMutect2Calls as input data for pVACSeq version 4.0 which is able to generate the sequences for all non-synonymous variants associated peptides of length 8–11 amino acids and has the built-in capability to utilize 8 different MHC Class I algorithms for peptide – MHC-1 binding affinity predictions21. The count of high scoring peptides for each variant as well as the top scoring peptide sequences were then associated, based on the variant nucleotide change and genomic location, to the matching variants in the PyClone report in order to stratify the data as clonal / non-clonal enabling the automatic calculation of a cNEO score for each tumor sample. The suitability of this approach was demonstrated by the significant difference in cNEO scores between advanced melanoma patients who responded and failed to respond to Ipilimumab treatment.This Clonal Neoantigen pipeline was benchmarked using three different published datasets. Variant calling, determination of allelic copy number, and prediction of CCF for each variant were verified using synthetic NGS data generated and analyzed by the authors of a subclonal deconvolution benchmarking study18. The accuracy of MHC-I and MHC-II typing was determined using whole exome NGS data from 99 patient whole blood samples that were generated as part of the 1000 genomes project and compared to a truth set compiled by the authors of an HLA typing benchmarking study25. Finally, publicly available whole exome NGS data was used to verify the integrity of the entire non-UMI version of the pipeline by comparing cTMB and ITH values generated to those published by the authors of a study that identified a significant association in response to nivolumab in melanoma patients to cTMB and ITH values4.The analysis of HLA typing results for 99 patients and comparison of the MHC-I and MHC-II haplotype calls to the published truth set showed an overall 91.5% match between the implemented HLA haplotyping module and a truth set25,26, and a 97.4% match to either the truth set or the results from a previously published comparison of HLA haplotype calling tools25.Analysis of the simulated tumor sequencing data from Tanner et al.18 showed that the PPV [calculated as TP/(TP + FP)] of variant calls by reference to the published truth set, was > 97.5% at an SNV and InDel density in the exome of 1 × 10–5 and 2 × 10–6, respectively. The PPA which was calculated as TP/(TP + FN) and is equivalent to the analytical sensitivity, was at least 96.5% for all samples when considering variants with a known CCF of at least 0.5, and this PPA value could be adjusted to 100% if variants that were filtered out of the final data table by Mutect2 Filter were removed from consideration as true positives. The performance of the allelic copy number determination and CCF estimation by FACETS and PyClone 6.1, respectively, as part of the pipeline, was also deemed acceptable. Based on tumor sequencing depth of 250X in the Tanner data and the established relationship between coverage depth and variant detection, similar PPV and PPA metrics should be able to be achieved for lab sequencing data with a tumor genome coverage depth of 500X or higher for variants with known CCFs as low as 0.25 in samples containing as low as 20% tumor content.Processing of publicly banked raw sequencing data, from a 2015 study of the relationship between genomic biomarkers and responsiveness of melanoma tumors to nivolumab treatment, through the entire non-UMI version of the Clonal Neoantigen pipeline generated cTMB and ITH values that were highly correlated to the published values4,18,25. Also consistent with the publication’s conclusion, patients with partial response or a complete response to nivolumab, as compared to patients with stable disease had significantly higher values of cTMB and cNEO scores calculated by our pipeline.The current results, particularly with respect to the re-analysis of the Riaz et al. Ipilimumab-response correlation to cTMB and cNEO levels, reinforce the potential utility of cTMB and cNEO as separate possibly related markers for likely responsiveness of a wide variety of cancers to immunotherapy1,2,3,4,32. Immune checkpoint inhibitors and Vigil share a common biological mechanism of promoting the destruction of tumor cells. However, immune checkpoint inhibitors lack a key component of inducing an optimal anticancer immune response that Vigil provides. As we now demonstrate with preservation of clonal signals between the growing tumor of the patient and the Vigil product, Vigil provides clonal signals of cTMB and cNEO, which have the capacity to induce expansion of circulating clonal neoantigen-specific cytotoxic T-cells7,33,34,35. We hypothesize that Ovarian cancer patients with higher cTMB and/or cNEO levels may show greater responsiveness to Vigil particularly within the HRP subpopulation in which stable DNA repair activity will minimize expansion of subclonal neoantigen profiles. High cTMB and/or cNEO will be optimized with low subclonal neoantigen expression (low ITH)7,36,37,38,39. The current results also demonstrate that the wet lab and Clonal Neoantigen pipeline for generation of cTMB, cNEO, and ITH scores can be reliably used to investigate the relationship between overall survival and in particular cTMB levels for patients involved in the CL-PTL-119 VITAL trial who have been prospectively followed since randomization to Vigil or placebo as well as additional assessment of patients undergoing checkpoint inhibitor therapy and/or other immune therapies. It is also of interest to understand if targeting on clonal neoantigen targeting of various neoantigen based vaccines and/or CAR-T approaches will be improved via use of Clonal Neoantigen pipeline assessment to determine cTMB, cNEO and ITH.MethodsSample selection and DNA extractionA set of 9 patients were selected from among 91 patients that participated in the CL-PTL-119 VITAL trial of Vigil in stage III/IV ovarian cancer patients29. Patients were selected to adequately represent the previously studied molecular profiles of all patients in the study, including representation from BRCA wildtype, BRCA mutant, homologous recombination proficient (HRP) and homologous recombination deficient (HRD) patients. All patients included in the study signed informed consent for tissue procurement and treatment on the study according to the requirements of each individual institution and in accordance with the Declaration of Helsinki. The study was approved by a central IRB, WCG (Princeton, MJ). The study was conducted at 25 sites across the United States.Genomic DNA was previously isolated, in 2019 and 2020, from cryopreserved tumor cell suspensions and matched normal peripheral blood mononuclear cells (PBMCs) samples of the 9 patients using the Qiagen MagAttract HMW DNA Kit. Genomic DNA was also extracted from cryopreserved Vigil preparations for each of the 9 patients using the Perkin Elmer Chemagic360 instrument according to the manufacturer’s instructions. After extraction, the gDNA was quantified by UV spectrophotometry and checked for integrity by electrophoresis on 0.8% agarose gels. DNA was stored at -80C except when removed on ice to be used in library preparation procedures.Matched genomic DNA samples isolated from a non-small cell lung cancer cell line (NCI-H2126) and a B lymphoblast cell line (NCI-BL2126) derived from the same patient were obtained from the American Tissue Culture Collection (ATCC) The TMB for NCI-H2126 had been previously characterized9 and could be compared to the value obtained from our method.Whole exome sequencingDNA sequencing libraries for the 27 patient and 2 control samples were prepared from 50 nanograms of gDNA using the Twist Exome 2.0 kit following the kit instructions except that Twist UMI adapters were used instead of Twist Universal Adapters. Prior to hybridization-based enrichment of exome-containing DNA fragments, libraries were pooled in batches of 6–8 such that the concentration of each DNA library derived from PBMC samples was 10.64% of the concentration of tumor or Vigil DNA libraries in the same pool. The hybridization utilized the Twist exome v2.0.2 baits which covers a 36.46 Mb human exome. DNA library pools were shipped to Discovery Life Sciences for sequencing on the Illumina NovaSeq X instrument to obtain 552 + /- 35 million reads (mean + /- standard deviation) for each tumor or Vigil sample and 65 + /- 5 million reads for each PBMC or normal control sample. The read format was paired-end, 150 nucleotides each.Bioinformatics analysisClonal neoantigen pipeline frameworkThe key elements of the pipeline framework include a primary python script, a config file, and sample information file. The config file contains settings for the desired bioinformatics steps to be performed in the run, as well as the settings, and paths to annotation files needed for each step. The sample information table is a tab-delimited list of the sequencing data identifiers that links each identifier to a specific patient ID and its tumor status (Tumor or NAT). The primary python script generates tables from the configuration file and sample information file, identifies the bioinformatic steps to be performed and passes the configuration and sample information to secondary python scripts that manage each major bioinformatic processing step. The secondary scripts read the sample information and config file, write linux shell scripts for each sample or tumor/normal pair, and launch the shell scripts as SLURM jobs on compute nodes of a cluster. Each submitted shell script launches a Docker environment that contains the appropriate executables and dependencies for each step.Sequencing read preparationSequencing reads were trimmed with Trimmomatic version 0.39 to remove Illumina adapter sequences and low quality bases (Q < 25) from the 3’-ends. Additionally, non-template encoded nucleotides were removed from the 5’-end of sequencing reads obtained from public non-UMI containing sequencing data used for pipeline verification studies but not for the UMI-containing reads generated during this study.Generation of BAM files from non-UMI-containing sequencing dataTrimmed sequence reads were aligned to the hg38 genome with bwa-mem and then sorted using Samtools sort. Duplicate reads were marked with Picard MarkDuplicates. Coverage metrics were generated with Picard HsMetrics. Read group notation was then added to the BAM files using GATK AddOrReplaceReadGroups, and the base quality scores were recalibrated using GATK BaseRecalibrator and ApplyBQSR. Following base score recalibration, the BAM files were cropped to contain only sequences covered by the exome using SamTools View and the corresponding bed file for the exome. Finally, the previously marked duplicate reads were removed using Picard Markduplicates.Generation of BAM files from UMI-containing sequencing dataTrimmed sequencing reads contain UMIs that were processed using fgbio software according to the recommended Best Practices to generate consensus reads, remove duplicates, and align the consensus reads to the genome. The process steps comprised in order:
(1)
creating BAM files containing unaligned reads with the UMI sequences removed and stored in tags using fgbio FastqToBam given the read structure "5M2S + T 5M2S + T",
(2)
aligning reads to the hg38 reference genome using bwa-mem2,
(3)
sorting the reads using Samtool sort and then recovering headers and tags for unmapped reads using fgbio ZipperBams,
(4)
generating initial coverage metrics using picard CollectHsMetrics,
(5)
Grouping of reads based on identical UMIs sequences using fgbio GroupReadsbyUMI,
(6)
calling consensus reads based on UMI sequence and mapped position with fgbio CallMolecularConsensusReads, based on a minimum of 1 read and a minimum base quality of 20,
(7)
aligning these consensus reads to the hg38 reference genome using bwa-mem2,
(8)
sorting the alignments generated from the consensus reads using Samtools sort,
(9)
recovering headers and UMI tags with fgbio ZipperBams,
(10)
filtering mapped consensus reads to select reads based on at least 1 consensus reads having a minimum base quality of 20 and maximum base error rate of 0.2 using fgbio FilterConsensusReads, and finally,
(11)
generating an additional set of coverage metrics for the mapped consensus reads.
Following completion of the fgbio best practices processing steps, the alignment files containing the consensus reads were further processed through base quality score recalibration and cropping to retain only exome sequences as described above for the non-UMI-containing sequencing data. Note that during the alignment steps using bwa-mem2, alignments were piped directly to Samtools view and compressed to binary format using that tool in order to save disk space.Preparation of BAM files for HLA haplotype callingTrimmed raw unaligned sequence reads were converted to BAM files using fgbio FastqToBam. Samtools merge was used to combine these unmapped reads with the BAM files containing aligned, base quality score recalibrated, and exome-cropped alignments. In the case of non-UMI containing reads the BAM files containing aligned reads had duplicates marked but not removed. In the case of UMI-containing reads, FastqToBam was given the read structure "5M2S + T 5M2S + T", and the BAM files containing aligned reads used were the final output of the processing sequence described above for UMI-containing sequence reads. Merged BAM files were indexed with Samtools index. Note that the sequencing data from normal samples, and not tumor samples is utilized for HLA haplotype calling.HLA haplotypingThe bioconda package for HLA*LA19 along with the data package PRG_MHC_GRCh38_withIMGT were downloaded and installed per the ReadMe.md file at https://github.com/DiltheyLab/HLA-LA/. Merged BAM files, generated as described above were used as input for HLA*LA in order to generate haplotypes for each allele of the HLA Class I (MHC-I) loci A through G and the seven different HLA Class II (MHC-II) loci.Variant callingGATK Mutect2 was used to call tumor-specific SNVs and InDels in paired tumor/normal mode. Paths to the 1000 genomes hg38 panel of normals database and the AF-only gnomad hg38 germline resource were provided to Mutect2 in order to exclude variants that occur with some frequency in the healthy human population. GATK LearnReadOrientationModel was used to generate tables of orientation bias artifacts using the f1r2.tar.gz files generated by Mutect2 during the variant calling. GATK FilterMutect2Calls was used to add the codes “PASS” or various error-mode codes to the INFO column of the VCF files generated by Mutect2 given an additional sample-specific input table of orientation bias generated by LearnReadOrientationModel. The filtered VCF files were annotated with GATK Funcotator.Allelic copy number determinationAllele specific copy number was estimated for chromosomal segments comprising the majority of the non-telomeric and -centromeric long arm and short arm of each chromosome using FACETs v0.5.14 (Shen & Seshan, 2016). FACETs estimates and reports the start and stop position, the total copy number, and the minor allele copy number for each of those segments and estimates the overall tumor purity for the tumor sample (i.e. determines the level of contamination from normal tissue). The Docker image available for installation from https://github.com/vanallenlab/facets was used for this step according to the provided instructions. Briefly, single nucleotide polymorphism (snp)-pileup was used to prepare input files for FACETS using a table of the hg38 reference genome, a table of known snps (00-common_all_prefix.dict-hg38minimal.sort.vcf) and the tumor and normal BAM files as input using the options minimum map quality 15, minimum base quality 20, and pseudo-snps 100. The r_script.R script included in the Docker package was then run using the facets input built by snp-pileup using settings initial seed 1234, cval 250, min_nhet 15, specifying the hg38 genome build.Clonal analysis using PyClone version 6.1A proprietary code was developed to handle extraction and formatting of data from filtered Mutect2 VCF files and FACETs tabular output in order to build PyClone input files for each tumor sample. The information required for each variant was the mutation_id which contains the chromosomal position and sequence change, the allelic depth for the reference and variant sequence, the normal copy number, the major allele copy number, the minor allele copy number, and the overall tumor purity estimate for the sample. PyClone 6.115, with settings of a maximum of 20 clusters and a total of 50 restarts was then used to generates output which includes the assigned cluster and the estimated CCF for each mutation_id. Additional proprietary code was developed to add additional information from the Mutect2 VCF and FACETS output tables to the PyClone 6.1 output tables to facilitate downstream steps and interpretation; key information added includes the variant annotation information derived from Funcotator, the coverage depth for each allele in both the tumor and normal sample, and the input information from FACETs. The finished table was termed the “preliminary PyClone variant-level report” (Fig. 1).Determination of Total TMB, Clonal TMB, and Intra Tumor HeterogeneityPyClone 6.1 assigns each variant detected in a tumor to a cluster, each with an estimated CCF. The cluster-specific tumor mutational burden (TMB) was calculated as the number of non-synonymous polymorphisms (including missense, nonsense, nonstop, in-frame insertions, in-frame deletions, frameshifts, or changes in the start codon) assigned to the cluster by PyClone 6.1 divided by the length of the exome in Mb. The total TMB was calculated as the sum of the cluster-specific TMBs for a sample and the cTMB was defined as the TMB for the cluster with the highest CCF that is a minimum of 0.9. Tumor samples with no associated clusters having a CCF > = 0.9 do not have a primary clone and therefore their cTMB is reported as 0. Intratumor heterogeneity (ITH), was calculated as total TMB – cTMB. A proprietary script was developed to calculate each of the above metrics and assemble them into a tab-delimited text format “clonal TMB report”.Prediction of peptide neoantigensUnannotated, quality-filtered VCF files generated as described above using GATK Mutect2 and GATK FilterMutect2Calls were annotated using Ensembl VEP software40 using a hg38.fa file as reference. VEP software was implemented using the Docker ensemblorg/ensembl-vep available from Dockerhub according to the documentation provided by Ensembl, using the plugins Frameshift and Wildtype. Note that the INSTALL.pl script was used to pre-download and install the data cache and FASTA file for human GRCh38 along with the required plugins.pVACSeq software21 was used to generate a list of peptide antigens along with estimated MHC-1 molecule binding affinity and fold-difference in binding affinity between mutant and wild-type peptides for each non-synonymous tumor-associated variant identified by Mutect2 and annotated with VEP. pVACSeq was installed by pulling the Docker image griffithlab/pvactools from DockerHub. pVACSeq was executed using the VEP-annotated VCF and a string containing the patient’s predicted MHC-1 A, B, and C haplotypes, from HLA*LA software, as input using the following parameters: peptide length “8,9,10,11”, number of threads “10”, minimum fold change “1”, only process VCF rows with PASS in info, prediction algorithm “MHCflurry”41. pVAC-Seq generated all possible 8–11-mer peptide sequences that overlap with at least one variant amino acid associated with a non-synonymous variant as well as the corresponding peptides for the reference genome sequence and generated a predicted MHC-1 binding affinity (IC50) using the MHCflurry algorithm, the fold-change in binding affinity for reference versus mutant peptide, and a percentile rank for each peptide among all predicted peptides for a sample.The pVacSeq filter command was used to filter the peptide list to select peptides with IC50 < 500 nM, a fold-change difference in IC50 for reference versus variant peptide of > 1, and finally to select peptides falling among the lowest 5% of total peptides in Kb value (selected experiments only). A proprietary script was used to update the preliminary PyClone variant-level report to include the quantity of unique peptides predicted for each non-synonymous variant that were retained in the filtered list from pVacSeq Filter, and also the sequence of the mutant peptide with the lowest IC50 along with its’ IC50 score and the score of the corresponding peptide derived from the reference sequence. This final variant-level PyClone report was filtered to include only nonsynonymous variants.Creation of final report clonal neoantigen summary reportA proprietary script was developed to update the original “clonal TMB report” produced as described above, to include summary metrics for the count of predicted neantigen peptides that met the filter metrics. A NEO score for each cluster was calculated by summing the filtered peptide count for all non-synonymous variants associated with the cluster. The cNEO value for the tumor/ normal sample pair was then determined by dividing NEO for the primary clone by the size of the exome in Mb; a tumor with no primary clone will have a cNEO of zero.Versions of software used in the pipeline: Trimmomatic 0.39, fgbio 2.1.0, bwa 0.7.17, bwa2 2.2.1, Samtools 1.18, picard 2.27.5, bedtools 2.31.1, gatk 4.3.0.0, HLA*LA 1.0.3, VEP 111.0, pVACseq 4.0.4, FACETS 0.5.14–2, PyClone 0.13.1.Clonal neoantigen pipeline verification using published data setsVerification of HLA haplotyping moduleWhole exome sequencing data from the 1000 Genomes Project was utilized for verifying the performance of the HLA Haplotyping module of the Clonal Neoantigen pipeline. Data from the first 99 individuals from a list of 820 patients that were analyzed previously with several HLA haplotyping tools25 were downloaded from the European Bioinformatics Repository, in CRAM format, using the links provided in Thuesen’s GitHub repository (https://github.com/nikolasthuesen/hla-typing-benchmark/blob/main/reference_data/gold_standard_url_list.txt). The individuals selected were from the first 100 hundred entries in the linked list above; one entry was a duplicate so 99 individuals were analyzed. The sequence files were converted from CRAM to FASTQ, aligned to the human genome using bwa-mem, and the output BAM files were further processed according to the flow chart provided in the supplementary materials of the Thuesen paper. The pre-processed BAM files, which contained both aligned and unaligned reads, were used as input to the bioinformatics module that utilizes the HLA*LA software. The output haplotypes for the HLA-A, HLA-B, HLA-C, HLA-DQB, and HLA-DRB genes were compared to haplotypes generated for the same 1000 Genomes patients using the same input exome sequencing data and PolyPheMe software26. The final Abi-Rached set of HLA-1 and HLA-2 haplotype calls for 2693 individuals in the 1000 Genomes Project were downloaded from Thuesen’s GitHub archive.(https://github.com/nikolasthuesen/hla-typing-benchmark/blob/main/reference_data/2018_1129_HLA_types_full_1000_Genomes_Project_panel.txt). For the purposes of the comparison, if the truth set contained more than one potential haplotype for one or both alleles of an HLA locus, the data from the Clonal Neoantigen pipeline was considered correct if each of the two haplotypes identified were represented among the multiple options in the truth set.Verification of core clonal neoantigen pipeline performance with synthetic sequencing data.The performance of the core portion of the Clonal Neoantigen pipeline, which included the overall pipeline framework, processing of raw sequencing data files (FASTQ) to optimized BAM files, detection of tumor-related variants (SNPs and InDels), allelic copy number estimation, and estimation of CCF and clonal assignment for each variant utilized to generate TMB, cTMB, and ITH scores, was evaluated using simulated tumor exome sequencing data18. Simulated sequencing reads from genomes S1R1, S2R1, and S2R3 that had been mixed in silico with normal healthy genomic DNA to generate mixtures having 100%, 75%, 50%, and 25% tumor DNA, a total of 12 simulated tumor sequencing data samples, were chosen for analysis. The three tumor genomes, S1R1, S2R1, and S2R3 were selected for analysis because the simulated sequencing read mixtures at 50% and 25% tumor were only available for these three genomes. The mean target coverage depth was 250X for the tumor and 60X for the normal sample sequencing reads. The density of expected SNVs and InDels in simulated tumor genome S1R1 was 1 × 10–6 and 2 × 10–7 respectively, while the density of SNVs and InDels was tenfold higher in samples S2R1 and S2R3. The synthetic sequence reads were trimmed as described in the methods for the Tanner et al. publication, then processed as described above for non-UMI containing sequence data to generate base quality recalibrated, de-duplicated, and exome cropped BAM files. BAM files were then processed through variant calling, allelic copy number determination, clonal analysis and preparation of the clonal TMB report.The PyClone report for each of the 12 simulated tumor DNA sequence samples from the Clonal Neoantigen pipeline run (described in Sect. 5.4.6 above) was compared to a truth set available on Georgette Tanner’s GitHub (https://github.com/GeorgetteTanner/benchmarking/) with respect to the expected versus determined tumor purity, the variants detected versus those known to be in the truth set, and the overall copy number for the chromosomal region spanning a variant versus the determined total copy number for that region. For the purposes of matching variants between the Clonal Neoantigen pipeline output and the truth set, variants in the truth set that were present in low coverage region of the genome, as represented by normal sample coverage of < 30X were not considered as true positives. A + /- 3 nt genomic starting position window was used to match the identified variants to the truth set because of difference in encoding of the genomic start position for some insertions and deletions between mutect2 and the truth set.Evaluation of complete clonal neoantigen pipeline performance using published data from advanced melanoma patientsEvaluation of the complete Clonal Neoantigen pipeline was performed using publicly available sequencing data from advanced melanoma patients4. Illumina 75 nucleotide paired-end sequencing data, collected using the Agilent version 2 exome-kit, was downloaded from the NIH sequence repository (SRA: SRP095809; BioProject: PRJNA359359). Fourteen patients that had not previously received ipilimumab therapy were selected for analysis including all 7 non-pre-treated patients that showed a PR or CR to nivolumab and the first 7 patients which did not respond to nivolumab (progressive disease, PD). The 7 non-responding patients were selected arbitrarily in order to maintain a balanced N for the analysis, not with respect to any data provided in the Riaz publication. Pre-analysis of the patient data was not performed prior to patient selection for this study. FASTQ files from both tumor and normal were processed through the non-UMI module for generation of quality-score recalibrated, de-duplicated, and exome-cropped BAM files, followed by all subsequent processing steps as described above under Methods. The mean coverage depth for the sequencing data was 150X for both tumor and whole blood genomic DNA.cTMB and ITH values from the Clonal Neoantigen pipeline described in this publication were compared to values calculated from Riaz et al. supplementary data. cTMB and ITH values for comparison were computed from the list of non-synonymous mutations detected and their cellular prevalence values (equivalent to CCF) reported on column 5 in the PyClone reports that were archived in a GitHub project (https://www.github.com/riazn/bms038_analysis) described in the publication. For the purposes of the comparison of values between Riaz et al. and internal results, variants with a CCF of > 0.85 as reported in the Riaz PyClone data were considered primary clonal variants. Note that the current study utilized PyClone 6.1 versus the original release of PyClone used by Riaz et al. In each case, only non-synonymous variants were considered in the calculations. A two-tailed unpaired t test was applied to log-transformed cNEO values to compare PR/CR and PD groups.
Data availability
Sequence files are available in the NCBI SRA repository (PRJNA1175657). Other data sharing requests from qualified scientific and medical researchers can be sent to Gradalis, Inc. Requests will be reviewed by Gradalis, Inc based on the medical and scientific merit of the project, purpose and data availability. The clinical trial protocol and synopsis are available upon request. Requests should be directed to Laura Nejedlik at Gradalis, Inc. (lnejedlik@gradalisinc.com).
ReferencesBoll, L. M. et al. The impact of mutational clonality in predicting the response to immune checkpoint inhibitors in advanced urothelial cancer. Sci. Rep. 13(1), 15287 (2023).Article
ADS
CAS
PubMed
PubMed Central
MATH
Google Scholar
Litchfield, K. et al. Meta-analysis of tumor- and T cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell 184(3), 596-614 e14 (2021).Article
CAS
PubMed
PubMed Central
MATH
Google Scholar
McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351(6280), 1463–1469 (2016).Article
ADS
CAS
PubMed
PubMed Central
MATH
Google Scholar
Riaz, N. et al. Tumor and microenvironment evolution during immunotherapy with nivolumab. Cell 171(4), 934-949 e16 (2017).Article
CAS
PubMed
PubMed Central
MATH
Google Scholar
Snyder, A. et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N. Engl. J. Med. 371(23), 2189–2199 (2014).Article
PubMed
PubMed Central
MATH
Google Scholar
Van Allen, E. M. et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science 350(6257), 207–211 (2015).Article
ADS
PubMed
PubMed Central
MATH
Google Scholar
Nemunaitis, J. et al. Clonal neoantigen: Emerging “mechanism-based” biomarker of immunotherapy response. Cancers (Basel) 15(23), 5616 (2023).Article
CAS
PubMed
MATH
Google Scholar
Chalmers, Z. R. et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome. Med. 9(1), 34 (2017).Article
PubMed
PubMed Central
MATH
Google Scholar
Vega, D. M. et al. Aligning tumor mutational burden (TMB) quantification across diagnostic platforms: Phase II of the friends of cancer research TMB harmonization project. Ann. Oncol. 32(12), 1626–1636 (2021).Article
CAS
PubMed
MATH
Google Scholar
Leone, P. et al. MHC class I antigen processing and presenting machinery: organization, function, and defects in tumor cells. J. Natl. Cancer Inst. 105(16), 1172–1187 (2013).Article
CAS
PubMed
MATH
Google Scholar
Wells, D. K. et al. Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction. Cell 183(3), 818-834 e13 (2020).Article
CAS
PubMed
PubMed Central
MATH
Google Scholar
Hundal, J. et al. pVAC-Seq: A genome-guided in silico approach to identifying tumor neoantigens. Genome. Med. 8(1), 11 (2016).Article
PubMed
PubMed Central
MATH
Google Scholar
Senzer, N. et al. Phase I trial of “bi-shRNAi(furin)/GMCSF DNA/autologous tumor cell” vaccine (FANG) in advanced cancer. Mol. Ther. 20(3), 679–686 (2012).Article
CAS
PubMed
Google Scholar
Nemunaitis, J. et al. 2024 Gemogenovatucel-T (Vigil): bi-shRNA plasmid-based targeted immunotherapy. Futur. Oncol. https://doi.org/10.1080/14796694.2024.2376518 (2024).Article
Google Scholar
Gillis, S. & Roth, A. PyClone-VI: Scalable inference of clonal population structures using whole genome data. BMC Bioinform. 21(1), 571 (2020).Article
Google Scholar
Roth, A. et al. PyClone: Statistical inference of clonal population structure in cancer. Nat. Method. 11(4), 396–398 (2014).Article
CAS
MATH
Google Scholar
Creaney, J. et al. Strong spontaneous tumor neoantigen responses induced by a natural human carcinogen. Oncoimmunology 4(7), e1011492 (2015).Article
PubMed
PubMed Central
Google Scholar
Tanner, G. et al. Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data. Nat. Commun. 12(1), 6396 (2021).Article
ADS
CAS
PubMed
PubMed Central
MATH
Google Scholar
Dilthey, A. T. et al. HLA*LA-HLA typing from linearly projected graph alignments. Bioinformatics 35(21), 4394–4396 (2019).Article
CAS
PubMed
PubMed Central
Google Scholar
Shen, R. & Seshan, V. E. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucl. Acid. Res. 44(16), e131 (2016).Article
MATH
Google Scholar
Hundal, J. et al. pVACtools: A computational toolkit to identify and visualize cancer neoantigens. Cancer Immunol. Res. 8(3), 409–420 (2020).Article
CAS
PubMed
PubMed Central
MATH
Google Scholar
Frank, M. S. et al. Quantifying sequencing error and effective sequencing depth of liquid biopsy NGS with UMI error correction. Biotechniques 70(4), 226–232 (2021).Article
CAS
PubMed
MATH
Google Scholar
Osterlund, T. et al. UMIErrorCorrect and UMIAnalyzer: Software for consensus read generation, error correction, and visualization using unique molecular identifiers. Clin Chem 68(11), 1425–1435 (2022).Article
PubMed
MATH
Google Scholar
Petrackova, A. et al. Standardization of sequencing coverage depth in ngs: Recommendation for detection of clonal and subclonal mutations in cancer diagnostics. Front. Oncol. 9, 851 (2019).Article
PubMed
PubMed Central
Google Scholar
Thuesen, N. H. et al. Benchmarking freely available HLA typing algorithms across varying genes, coverages and typing resolutions. Front. Immunol. 13, 987655 (2022).Article
CAS
PubMed
PubMed Central
Google Scholar
Abi-Rached, L. et al. Immune diversity sheds light on missing variation in worldwide genetic diversity panels. PLoS One 13(10), e0206512 (2018).Article
PubMed
PubMed Central
Google Scholar
Yaldiz, B. et al. Twist exome capture allows for lower average sequence coverage in clinical exome sequencing. Hum. Genom. 17(1), 39 (2023).Article
CAS
MATH
Google Scholar
Kou, R. et al. Benefits and challenges with applying unique molecular identifiers in next generation sequencing to detect low frequency mutations. PLoS One 11(1), e0146638 (2016).Article
PubMed
PubMed Central
Google Scholar
Rocconi, R. P. et al. Gemogenovatucel-T (Vigil) immunotherapy as maintenance in frontline stage III/IV ovarian cancer (VITAL): A randomised, double-blind, placebo-controlled, phase 2b trial. Lancet. Oncol. 21(12), 1661–1672 (2020).Article
CAS
PubMed
Google Scholar
Maruzani, R. et al. Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection. BMC Genom. 25(1), 827 (2024).Article
CAS
Google Scholar
Benjamin, D. et al. Calling Somatic SNVs and Indels with Mutect2. bioRxiv https://doi.org/10.1101/861054 (2019).Article
MATH
Google Scholar
Rizvi, N. A. et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348(6230), 124–128 (2015).Article
ADS
CAS
PubMed
PubMed Central
MATH
Google Scholar
Goleva, E. et al. Our current understanding of checkpoint inhibitor therapy in cancer immunotherapy. Ann. Allerg. Asthma Immunol. 126(6), 630–638 (2021).Article
CAS
MATH
Google Scholar
Nemunaitis, J., Senzer, N. & Plato, L. Tumor vaccines and cellular immunotherapies. Ann. Transl. Med. 4(Suppl 1), S24 (2016).Article
PubMed
PubMed Central
Google Scholar
Rocconi, R. P. et al. Long-term follow-up of gemogenovatucel-T (Vigil) survival and molecular signals of immune response in recurrent ovarian cancer. Vaccines (Basel) 9(8), 894 (2021).Article
CAS
PubMed
MATH
Google Scholar
Wolf, Y. et al. UVB-Induced tumor heterogeneity diminishes immune response in melanoma. Cell 179(1), 219–235 (2019).Article
CAS
PubMed
PubMed Central
MATH
Google Scholar
Birkbak, N. J. et al. Tumor mutation burden forecasts outcome in ovarian cancer with BRCA1 or BRCA2 mutations. PLoS One 8(11), e80023 (2013).Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Strickland, K. C. et al. Association and prognostic significance of BRCA1/2-mutation status with neoantigen load, number of tumor-infiltrating lymphocytes and expression of PD-1/PD-L1 in high grade serous ovarian cancer. Oncotarget 7(12), 13587–13598 (2016).Article
PubMed
PubMed Central
MATH
Google Scholar
Germano, G. et al. Inactivation of DNA repair triggers neoantigen generation and impairs tumour growth. Nature 552(7683), 116–120 (2017).Article
ADS
CAS
PubMed
MATH
Google Scholar
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome. Biol. 17(1), 122 (2016).Article
PubMed
PubMed Central
MATH
Google Scholar
O’Donnell, T. J., Rubinsteyn, A. & Laserson, U. MHCflurry 2.0: improved pan-allele prediction of mhc class I-presented peptides by incorporating antigen processing. Cell Syst 11(4), 418–419 (2020).Article
PubMed
MATH
Google Scholar
Download referencesAcknowledgementsThe authors would like to thank Brenda Marr for her competent and knowledgeable assistance in the preparation of the manuscript.Author informationAuthors and AffiliationsFrontage Laboratories, Inc, Deerfield Beach, FL, USADavid Willoughby, Casey Nagel, Aman Pruthi, Nicholas Bild & Ericca StamperGradalis, Inc, 2545 Golden Bear Dr., Suite 110, Carrollton, Dallas, TX, 75006, USAErnest Bognar, Laura Stanbery, Gladice Wallraven, Donald Rao, Adam Walter & John NemunaitisAuthorsDavid WilloughbyView author publicationsYou can also search for this author in
PubMed Google ScholarErnest BognarView author publicationsYou can also search for this author in
PubMed Google ScholarLaura StanberyView author publicationsYou can also search for this author in
PubMed Google ScholarCasey NagelView author publicationsYou can also search for this author in
PubMed Google ScholarGladice WallravenView author publicationsYou can also search for this author in
PubMed Google ScholarAman PruthiView author publicationsYou can also search for this author in
PubMed Google ScholarNicholas BildView author publicationsYou can also search for this author in
PubMed Google ScholarEricca StamperView author publicationsYou can also search for this author in
PubMed Google ScholarDonald RaoView author publicationsYou can also search for this author in
PubMed Google ScholarAdam WalterView author publicationsYou can also search for this author in
PubMed Google ScholarJohn NemunaitisView author publicationsYou can also search for this author in
PubMed Google ScholarContributionsDW, EB, LS, and JN wrote the paper with input from all authors. DW, CN, AP, NB and ES developed the computational framework and laboratory analysis. EB, LS, JN, and AW conceived the study and provided oversight and management of the project. DR provided critical insight and scientific review. GW provided regulatory oversight. All authors provided critical feedback, analysis and manuscript review and editing.Corresponding authorCorrespondence to
John Nemunaitis.Ethics declarations
Competing interests
EB, LS, GW, DR, AW and JN are employees of Gradalis, Inc. JN is a board member of Gradalis, Inc. CN and AP are employed by Frontage Labs. DW and ES are recent previous employees of Frontage Labs. NB does not have conflict of interest to disclose.
Additional informationPublisher’s noteSpringer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Supplementary InformationSupplementary Information 1.Supplementary Information 2.Supplementary Information 3.Supplementary Information 4.Supplementary Information 5.Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Reprints and permissionsAbout this articleCite this articleWilloughby, D., Bognar, E., Stanbery, L. et al. Exome sequencing shows same pattern of clonal tumor mutational burden, intratumor heterogenicity and clonal neoantigen between autologous tumor and Vigil product.
Sci Rep 15, 8637 (2025). https://doi.org/10.1038/s41598-025-90136-7Download citationReceived: 28 September 2024Accepted: 11 February 2025Published: 13 March 2025DOI: https://doi.org/10.1038/s41598-025-90136-7Share this articleAnyone you share the following link with will be able to read this content:Get shareable linkSorry, a shareable link is not currently available for this article.Copy to clipboard
Provided by the Springer Nature SharedIt content-sharing initiative
KeywordsClonal tumor mutation burdenIntratumor heterogeneityAutologous tumorTumor mutation burdenNeoantigenVigil