How we store different types of variants in output

We define genetic lesions (SNVs, INDELs, CNAs) based on their chrom, pos, SYMBOL, mutation_key, mutation_det, ref, alt defined below. Different genetic lesions can have slightly different definitions of these features. superFreq reports SNVs, short INDELs, CNAs and clones. Below are examples on how the fields above are defined for each genetic lesion.

Single Nucleotide Variants (SNVs) and short INDELs reported by superFreq

  • chrom: chromosome;
  • pos: position of point mutation in base pairs, with reference to a genome build;
  • SYMBOL: NCBI gene symbol where the SNV occurs;
  • ref: reference allele;
  • alt: alternative allele;
  • mutation_key: unique key for the mutation usually defined as SYMBOL-pos-ref-alt. This is needed to summarise and look for the same mutation across different samples.
  • mutation_det: details of the mutation usually defined as SYMBOL Any annotation. This can be used to annotate plots with features of a variant. Since import_goi_superfreq() only imports output from superFreq, the annotation used to populate the mutation_det field for SNVs is the annotation provided by a variant annotation tool, like VEP or [VariantAnnotation](https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html. However, mutation_det can be updated at will to change plot labeling.
PID chrom pos SYMBOL ref alt mutation_key Consequence mutation_det variant_type
P1 chr1 4573828 KIT A G KIT-4573828-A-G nonsynonimous KIT nonsynonimous SNV

Copy Number Alterations (CNAs)

A CNA involves deletions or amplifications of genomic regions of different sizes. In this pipeline we use superFreq to call genomic alterations but the ouput could be adapted for other callers. superFreq reports a CNA specifying the chrom where the alteration occurs; the widths of the region involved (150Bbp in the example below); and the alteration type. For example, if AB represents the normal genotype, then 150Mbp A is a 150Mbp loss of one allele. Since, it is common practice to summarise mutations occurring on genes, the mutation_key is defined here as SYMBOL-genotype. This means that when plotting CNAs for one patient, there could be multiple instances of the same CNA reflecting the number of studyGenes involved in that CNA. CNAs are not annotated by a variant annotation tool which is why the mutation details, mutation_det, is simply SYMBOL width_of_CNA genotype.

chrom pos SYMBOL ref alt Consequence mutation_key mutation_det variant_type
chrX 276323 KDM6A KDM6A-A KDM6A 150Mbp A CNA
chrX 276323 SMC1A SMC1A-A SMC1A 150Mbp A CNA
chrX 276323 BCORL1 BCORL1-A BCORL1 150Mbp A CNA

Clones

In superFreq clones are collections of SNVs and CNAs that change together over time. Therefore, they won’t have a specific notation apart from identifiying how much a clone changes over time (using an estimate of clonality, similar to VAF) and how many events are involved in each clone. varikondo will be updated so that it will be possible to extract the genes involved in every clones as compyted by superFreq.

Within every patient there will be a finite number of clones identified (3 in the example below) as sumamrised by mutation_key. mutation_det reports how many anchors (events) are in each clone identified.

PID chrom pos SYMBOL ref alt mutation_key mutation_det variant_type
P1 1 clone (4 anchors) clones
P1 2 clone (10 anchors) clones
P1 3 clone (7 anchors) clones

INDELs reported by VarDict

  • chrom, pos, SYMBOL, ref, alt, mutation_key are the same as for SNVs.
  • mutation_det: details of the mutation defined as SYMBOL-Consequence. If the Consequence from a variant annotation tool is not available then this can be for example the exon where the variant occurs. In the example below Consequence was defined as ITD which stands for Internal Tandem Duplication which specifies the type of INDEL occurring in the FLT3 gene.
chrom pos SYMBOL ref alt mutation_key Consequence mutation_det variant_type
chr13 764739898 FLT3 GATGATGAT chr13-764739898- -GATGATGAT ITD exon13 FLT3-ITD exon13 INDEL