vignettes/how-variants-are-stored.Rmd
We define genetic lesions (SNVs, INDELs, CNAs) based on the fields `chrom`, `pos`, `SYMBOL`, `mutation_key`, `mutation_det`, `ref` and `alt`, defined below. Different genetic lesions can have slightly different definitions of these fields. `superFreq` reports SNVs, short INDELs, CNAs and clones. Below are examples of how the fields above are defined for each genetic lesion.

For SNVs, the fields are defined as follows:

- `chrom`: chromosome;
- `pos`: position of the point mutation in base pairs, with reference to a genome build;
- `SYMBOL`: NCBI gene symbol where the SNV occurs;
- `ref`: reference allele;
- `alt`: alternative allele;
- `mutation_key`: unique key for the mutation, usually defined as `SYMBOL-pos-ref-alt`. This is needed to summarise and look for the same mutation across different samples;
- `mutation_det`: details of the mutation, usually defined as `SYMBOL` followed by any annotation. This can be used to annotate plots with features of a variant.

Since `import_goi_superfreq()` only imports output from `superFreq`, the annotation used to populate the `mutation_det` field for SNVs is the one provided by a variant annotation tool, such as `VEP` or [`VariantAnnotation`](https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html). However, `mutation_det` can be updated at will to change plot labelling.

PID | chrom | pos | SYMBOL | ref | alt | mutation_key | Consequence | mutation_det | variant_type |
---|---|---|---|---|---|---|---|---|---|
P1 | chr1 | 4573828 | KIT | A | G | KIT-4573828-A-G | nonsynonymous | KIT nonsynonymous | SNV |
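
The way the SNV fields combine can be sketched in base R. This is only an illustration under stated assumptions, not the code used by `import_goi_superfreq()`: the helper name `make_snv_fields()` is hypothetical, and the input is assumed to be a data frame whose columns are named as in the table above.

```r
# Hypothetical helper: derive mutation_key and mutation_det for SNVs.
# Assumes columns SYMBOL, pos, ref, alt and Consequence are present.
make_snv_fields <- function(snvs) {
  # Key: SYMBOL-pos-ref-alt, so the same variant matches across samples
  snvs$mutation_key <- paste(snvs$SYMBOL, snvs$pos, snvs$ref, snvs$alt, sep = "-")
  # Label: gene symbol followed by the annotation, used to annotate plots
  snvs$mutation_det <- paste(snvs$SYMBOL, snvs$Consequence)
  snvs$variant_type <- "SNV"
  snvs
}

# Example row; pos is stored as an integer so that paste() never
# renders it in scientific notation.
snv <- data.frame(
  PID = "P1", chrom = "chr1", pos = 4573828L, SYMBOL = "KIT",
  ref = "A", alt = "G", Consequence = "nonsynonymous",
  stringsAsFactors = FALSE
)
snv <- make_snv_fields(snv)
snv[, c("mutation_key", "mutation_det", "variant_type")]
```

Because `mutation_det` is only a label, it can be overwritten afterwards without affecting `mutation_key`, which is the field used to match variants across samples.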
A CNA involves deletions or amplifications of genomic regions of different sizes. In this pipeline we use `superFreq` to call genomic alterations, but the output could be adapted for other callers. `superFreq` reports a CNA by specifying the `chrom` where the alteration occurs, the width of the region involved (150Mbp in the example below) and the alteration type. For example, if `AB` represents the normal genotype, then `150Mbp A` is a 150Mbp loss of one allele. Since it is common practice to summarise mutations occurring on genes, the `mutation_key` is defined here as `SYMBOL-genotype`. This means that when plotting CNAs for one patient, the same CNA can appear multiple times, once for each of the `studyGenes` involved in that CNA. CNAs are not annotated by a variant annotation tool, which is why the mutation details, `mutation_det`, are simply `SYMBOL width_of_CNA genotype`.

chrom | pos | SYMBOL | ref | alt | Consequence | mutation_key | mutation_det | variant_type |
---|---|---|---|---|---|---|---|---|
chrX | 276323 | KDM6A |  |  |  | KDM6A-A | KDM6A 150Mbp A | CNA |
chrX | 276323 | SMC1A |  |  |  | SMC1A-A | SMC1A 150Mbp A | CNA |
chrX | 276323 | BCORL1 |  |  |  | BCORL1-A | BCORL1 150Mbp A | CNA |
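
The one-row-per-gene layout in the table above can be sketched in base R. The helper `expand_cna()` and its arguments are hypothetical and are not part of `varikondo` or `superFreq`; `genes` stands in for the study genes that fall inside the altered region.

```r
# Hypothetical helper: expand one CNA call into one row per affected gene.
expand_cna <- function(chrom, pos, width, genotype, genes) {
  data.frame(
    chrom = chrom,
    pos = pos,
    SYMBOL = genes,
    # Key: SYMBOL-genotype
    mutation_key = paste(genes, genotype, sep = "-"),
    # Label: SYMBOL width_of_CNA genotype
    mutation_det = paste(genes, width, genotype),
    variant_type = "CNA",
    stringsAsFactors = FALSE
  )
}

# One 150Mbp loss on chrX that covers three study genes
cna <- expand_cna("chrX", 276323L, "150Mbp", "A",
                  genes = c("KDM6A", "SMC1A", "BCORL1"))
cna
```

A single CNA therefore yields as many rows as there are study genes in the region, which is why the same event can appear several times in a plot for one patient.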
In `superFreq`, clones are collections of SNVs and CNAs that change together over time. Therefore, they do not have a specific notation apart from identifying how much a clone changes over time (using an estimate of `clonality`, similar to `VAF`) and how many events are involved in each clone. `varikondo` will be updated so that it is possible to extract the genes involved in every clone as computed by `superFreq`.

Within every patient there will be a finite number of clones identified (3 in the example below), as summarised by `mutation_key`. `mutation_det` reports how many `anchors` (events) are in each clone identified.

PID | chrom | pos | SYMBOL | ref | alt | mutation_key | mutation_det | variant_type |
---|---|---|---|---|---|---|---|---|
P1 |  |  |  |  |  | 1 | clone (4 anchors) | clones |
P1 |  |  |  |  |  | 2 | clone (10 anchors) | clones |
P1 |  |  |  |  |  | 3 | clone (7 anchors) | clones |
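
As a rough sketch, the clone rows above could be assembled in base R from the number of anchors per clone. The `anchors` vector is a hypothetical input taken from the example table, not an object returned by `superFreq` or `varikondo`.

```r
# Hypothetical input: number of anchors in each clone found for patient P1.
anchors <- c(4L, 10L, 7L)

clones <- data.frame(
  PID = "P1",
  # Clones are simply numbered within the patient
  mutation_key = seq_along(anchors),
  # Label reports how many events (anchors) belong to the clone
  mutation_det = paste0("clone (", anchors, " anchors)"),
  variant_type = "clones",
  stringsAsFactors = FALSE
)
clones
```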
For short INDELs, `chrom`, `pos`, `SYMBOL`, `ref`, `alt` and `mutation_key` are the same as for SNVs.

- `mutation_det`: details of the mutation, defined as `SYMBOL-Consequence`. If the `Consequence` from a variant annotation tool is not available, then this can be, for example, the exon where the variant occurs. In the example below `Consequence` was defined as `ITD`, which stands for Internal Tandem Duplication and specifies the type of INDEL occurring in the FLT3 gene.

chrom | pos | SYMBOL | ref | alt | mutation_key | Consequence | mutation_det | variant_type |
---|---|---|---|---|---|---|---|---|
chr13 | 764739898 | FLT3 |  | GATGATGAT | chr13-764739898- -GATGATGAT | ITD exon13 | FLT3-ITD exon13 | INDEL |
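
The fallback described for INDELs can be sketched in base R. The helper `make_indel_det()` and its `fallback` argument are hypothetical; they only illustrate building `SYMBOL-Consequence` and substituting another description, such as the exon, when no `Consequence` is available.

```r
# Hypothetical helper: label an INDEL as SYMBOL-Consequence, falling back to
# another description (here the exon) when no Consequence is available.
make_indel_det <- function(symbol, consequence, fallback) {
  missing_csq <- is.na(consequence) | consequence == ""
  paste(symbol, ifelse(missing_csq, fallback, consequence), sep = "-")
}

# Consequence available from the annotation
with_csq <- make_indel_det("FLT3", "ITD exon13", fallback = "exon13")
# Consequence missing: the exon is used instead
without_csq <- make_indel_det("FLT3", NA, fallback = "exon13")
```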