HPVDetector pipeline identifies Salmonella sequences in gallbladder cancer samples
We performed PCR-based analysis of 26 gallbladder tumour and paired normal samples to detect Salmonella DNA using pan primers, as described earlier [12]. None of the gallbladder samples tested positive for Salmonella (data not shown). As a next step, we analysed whole exome data for these 26 samples (generated in house; manuscript in preparation) for Salmonella traces using the HPVDetector pipeline, modified to include the genome sequences of six common Salmonella isolates. In brief, the computational approach subtracts all reads that align to the human genome and aligns the remaining reads to a reference database of HPV and Salmonella genomes from NCBI. While HPV16 was detected in 1 gallbladder sample, Salmonella isolates were found across multiple samples: S. typhi Ty2 (3 samples), S. typhi CT18 (6 samples), S. typhimurium LT2 (10 samples), S. choleraesuis SCB67 (5 samples), S. paratyphi TCC (3 samples), and S. paratyphi SPB7 (7 samples). In total, Salmonella reads were found in 19 of 26 gallbladder tissues (tumour as well as adjacent normal tissues). Interestingly, 10 of these 19 samples were co-infected with multiple isolates, irrespective of gallstone status or patient gender (Fig. 1).
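The subtract-then-align logic described above can be illustrated with a minimal Python sketch. It is not HPVDetector's code. It assumes that reads have already been aligned, first to the human genome and then to the HPV/Salmonella database, by an external aligner that writes standard SAM text, and it treats reads as single-end. The function names and the reference name `S_typhi_CT18` are hypothetical; only the SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Set, Tuple

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def parse_sam(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (qname, flag, rname, seq) for each alignment line, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line (@HD, @SQ, ...)
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), fields[2], fields[9]


def non_human_reads(human_sam: Iterable[str]) -> Dict[str, str]:
    """Step 1 (subtraction): keep only reads that did NOT align to the human genome."""
    return {qname: seq for qname, flag, _, seq in parse_sam(human_sam) if flag & UNMAPPED}


def hits_per_reference(microbe_sam: Iterable[str], keep: Set[str]) -> Counter:
    """Step 2: count retained reads with a primary alignment to each HPV/Salmonella reference."""
    counts: Counter = Counter()
    for qname, flag, rname, _ in parse_sam(microbe_sam):
        if qname not in keep:
            continue  # read was human-derived and already subtracted
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # unaligned, or an extra alignment of an already-counted read
        counts[rname] += 1
    return counts
```

A read is counted against a microbial reference only if it both failed to align to the human genome and has a primary alignment to that reference; skipping secondary and supplementary records keeps multi-mapping reads from being counted twice.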
Annotation of the Salmonella reads found in gallbladder cancer samples
For each isolate, a variable number of overlapping reads of differing lengths were assembled into contigs based on a Clustal X2 multiple alignment (Additional file 2: Figure S1). The resulting unique contig sequences were annotated against the Salmonella gene annotations available from the National Center for Biotechnology Information (NCBI). In total, 114 reads from multiple Salmonella isolates were found in 19 of the 26 samples analysed. Of these, 47 reads mapped to Salmonella open reading frames (ORFs) encoding bacterial genes involved in metabolism or in the toxin-antitoxin system. The remaining 67 reads aligned to Salmonella ribosomal genes, as expected given their relatively high abundance (Additional file 3: Table S1).
HPVDetector pipeline is specific and highly sensitive in detecting true Salmonella traces
To assess the specificity of our assay, we re-analysed the whole exome data of all samples after reversing (but not complementing) each read with an in-house Perl script, as described earlier [11]; this simulates random sequence while retaining nucleotide composition and sequence complexity. No spurious Salmonella reads were detected in the reversed primary tumour whole exome data of any of the 26 samples (Additional file 4: Figure S2A), indicating that the pipeline is specific for Salmonella traces. To test sensitivity, the raw FASTQ file of primary tumour sample 16 T, which was positive for Salmonella reads, was down-sampled to 1X, 5X, 10X, 15X, 25X, 50X, 75X and 100X coverage using the DownsampleSam tool of the Picard toolkit (http://broadinstitute.github.io/picard/), as described earlier [11]. The resulting FASTQ files were analysed for Salmonella reads using the HPVDetector pipeline. Distinct Salmonella reads were detected at coverage as low as 10X, and the number of detected reads increased linearly with coverage (Additional file 4: Figure S2B).
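The reversal decoy and the down-sampling arithmetic can be sketched in Python. This is not the in-house Perl script of [11]; it is a hypothetical illustration, assuming standard four-line FASTQ records, of how reversing (not reverse-complementing) each read keeps its length and base composition while destroying its biological order. The helper `keep_probability` is likewise hypothetical: it computes the per-read retention fraction of the kind passed to Picard DownsampleSam's PROBABILITY argument, and the starting coverage used in the note below is an assumed placeholder, not a value from this study.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def reverse_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Reverse (NOT reverse-complement) the sequence and quality of each FASTQ record.

    Assumes plain four-line records: header, sequence, '+', quality.
    Quality is reversed too, so each base keeps its own quality score.
    """
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            yield header
            yield seq[::-1]   # same bases, opposite order
            yield plus
            yield qual[::-1]
            record = []


def same_composition(a: str, b: str) -> bool:
    """Check that two sequences contain exactly the same bases in the same counts."""
    return Counter(a) == Counter(b)


def keep_probability(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to retain to go from current_cov to roughly target_cov."""
    if current_cov <= 0 or target_cov <= 0:
        raise ValueError("coverages must be positive")
    return min(1.0, target_cov / current_cov)
```

For example, under an assumed starting coverage of 200X, reaching 10X would require keeping about 5% of reads (`keep_probability(200, 10)` gives 0.05). Because reads are retained at random, the achieved coverage is approximate rather than exact.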
Sanger validation of Salmonella reads identified in gallbladder cancer samples
We attempted to validate, by Sanger sequencing, the Salmonella read sequences identified by HPVDetector in 4 of 16 Salmonella-positive samples (Additional file 5: Figure S3).