List of programs
VAT Core Modules
snpMapper
snpMapper is a program to annotate a set of SNPs in VCF format. The program determines the effect of a SNP on the coding potential (synonymous, nonsynonymous, prematureStop, removedStop, spliceOverlap) of each transcript of a gene.
Usage
snpMapper <annotation.interval> <annotation.fa>
Inputs | Takes a VCF input from STDIN |
---|---|
Outputs | Outputs annotated SNPs in VCF format. The annotation information is captured as part of the INFO field. For details refer to the VCF format specification. |
Required arguments |
|
Optional arguments | None. |
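The usage above can be sketched as a small shell pipeline. This is a minimal sketch, not part of VAT itself: the function names `annotate_snps` and `show_info` and all file names are hypothetical, and it assumes that snpMapper writes the annotated VCF to STDOUT. The only VCF-specific fact used is that INFO is the eighth tab-separated column of each data line.

```shell
# Hypothetical wrapper: snpMapper reads a VCF from STDIN.
# $1: annotation in Interval format, $2: sequence file in FASTA format.
# Assumes the annotated VCF is written to STDOUT.
annotate_snps() {
    snpMapper "$1" "$2"
}

# Print CHROM, POS and INFO (column 8) for each data line of a VCF on STDIN.
show_info() {
    awk -F'\t' '!/^#/ { print $1 "\t" $2 "\t" $8 }'
}

# Demo of show_info on a made-up record (the INFO value is a placeholder,
# not real VAT output).
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t.\tPASS\tEXAMPLE=1\n' \
  | show_info

# With VAT installed and real input files, the full pipeline would look like:
# annotate_snps annotation.interval annotation.fa < input.vcf | show_info
```

The same pattern should apply to indelMapper, svMapper and genericMapper, which also read a VCF from STDIN, with the arguments adjusted to each program's usage line.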
indelMapper
indelMapper is a program to annotate a set of indels in VCF format. The program determines the effect of an indel on the coding potential (frameshift insertion, non-frameshift insertion, frameshift deletion, non-frameshift deletion, spliceOverlap, startOverlap, endOverlap) of each transcript of a gene.
Usage
indelMapper <annotation.interval> <annotation.fa>
Inputs | Takes a VCF input from STDIN |
---|---|
Outputs | Outputs annotated indels in VCF format. The annotation information is captured as part of the INFO field. For details refer to the VCF format specification. |
Required arguments |
|
Optional arguments | None. |
svMapper
svMapper is a program to annotate a set of structural variants (SVs) in VCF format. The program determines whether an SV overlaps with different transcript isoforms of a gene.
Usage
svMapper <annotation.interval>
Inputs | Takes a VCF input from STDIN |
---|---|
Outputs | Outputs annotated SVs in VCF format. The annotation information is captured as part of the INFO field. For details refer to the VCF format specification. |
Required arguments |
|
Optional arguments | None. |
genericMapper
genericMapper is a program to annotate a number of different variants in VCF format. The program checks whether a variant overlaps with entries in the specified annotation set (it does not determine the effect on the coding potential).
Usage
genericMapper <annotation.interval> <nameFeature>
Inputs | Takes a VCF input from STDIN |
---|---|
Outputs | Outputs the annotated variants in VCF format. The annotation information is captured as part of the INFO field. |
Required arguments |
|
Optional arguments | None. |
vcfSummary
vcfSummary is a program to aggregate annotated variants across genes and samples.
Usage
vcfSummary <file.vcf.gz> <annotation.interval>
Inputs | None. |
---|---|
Outputs | Generates two output files. The first file, named file.geneSummary.txt, contains the number of variants categorized by type for each gene. The second file, named file.sampleSummary.txt, summarizes the number of variants categorized by type for each sample. |
Required arguments |
|
Optional arguments | None. |
vcf2images
vcf2images is a program to generate an image for each gene to visualize the effect of the annotated variants.
Usage
vcf2images <file.vcf.gz> <annotation.interval> <outputDir>
Inputs | None. |
---|---|
Outputs | Generates an image in PNG format for each gene that has at least one annotated variant. |
Required arguments |
|
Optional arguments | None. |
vcfSubsetByGene
vcfSubsetByGene is a program to subset a VCF file with annotated variants by gene.
Usage
vcfSubsetByGene <file.vcf.gz> <annotation.interval> <outputDir>
Inputs | None. |
---|---|
Outputs | Generates a VCF file for each gene that has at least one annotated variant. |
Required arguments |
|
Optional arguments | None. |
vcfModifyHeader
vcfModifyHeader is a program to modify the header line (part of the meta-lines) in a VCF file. Specifically, it assigns each sample to a group or population (these assignments are used by other programs including vcfSummary).
Usage
vcfModifyHeader <oldHeader.vcf> <groups.txt>
Inputs | None. |
---|---|
Outputs | Generates a VCF header file. |
Required arguments |
|
Optional arguments | None. |
Auxiliary programs
gencode2interval
gencode2interval converts a GENCODE annotation file (in GTF format) to the Interval format.
Usage
gencode2interval
Inputs | Takes a GENCODE annotation file in GTF format from STDIN |
---|---|
Outputs | Outputs the GENCODE annotation file in Interval format to STDOUT |
Required arguments | None. |
Optional arguments | None. |
Note: Remove all header lines in the annotation file before running gencode2interval. Also filter out coding transcripts that do not have an annotated start or stop, as follows:
grep -v '^#' gencode.v19.annotation.gtf | awk '/\t(HAVANA|ENSEMBL)\t(CDS|start_codon|stop_codon)\t/ {print}' | grep -v mRNA_end_NF | grep -v mRNA_start_NF > gencode.v19.annotation.filtered.gtf
gencode2interval < gencode.v19.annotation.filtered.gtf > gencode.v19.annotation.filtered.interval
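To see what the filter in the note keeps and drops, it can be tried on a tiny GTF-like file first. The sketch below uses only standard tools; every record in it is made up for illustration and the file names are hypothetical. It applies the same grep/awk chain as the note and counts the surviving lines.

```shell
# Build a tiny tab-separated GTF-like file; all records are invented.
tmp=$(mktemp -d)
{
    printf '##description: example header\n'
    printf 'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n'
    printf 'chr1\tHAVANA\tCDS\t100\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n'
    printf 'chr1\tENSEMBL\tstart_codon\t100\t102\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n'
    printf 'chr1\tHAVANA\tCDS\t300\t400\t.\t+\t0\tgene_id "G2"; transcript_id "T2"; tag "mRNA_end_NF";\n'
    printf 'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
} > "$tmp/example.gtf"

# Same filter chain as in the note: drop header lines, keep only
# CDS/start_codon/stop_codon records from HAVANA or ENSEMBL, then drop
# records tagged mRNA_end_NF or mRNA_start_NF.
grep -v '^#' "$tmp/example.gtf" \
  | awk '/\t(HAVANA|ENSEMBL)\t(CDS|start_codon|stop_codon)\t/ {print}' \
  | grep -v mRNA_end_NF \
  | grep -v mRNA_start_NF > "$tmp/example.filtered.gtf"

kept=$(wc -l < "$tmp/example.filtered.gtf" | tr -d ' ')
echo "records kept: $kept"
```

In this example the header line, the gene and exon records, and the CDS record tagged mRNA_end_NF are all removed, which leaves the untagged CDS and start_codon records.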
interval2sequences
Module to retrieve genomic/exonic sequences for an annotation set in Interval format.
Usage
interval2sequences <file.2bit> <file.annotation> <exonic|genomic>
Inputs | None. |
---|---|
Outputs | Reports the extracted sequences in FASTA format |
Required arguments |
|
Optional arguments | None. |
Note: Change (cd) into a directory where you have write permission before running interval2sequences, since it may create temporary files. |
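One way to follow this note is to run the program from a freshly created scratch directory. This is a minimal sketch: the input paths in the commented command are hypothetical, and redirecting the FASTA output to a file assumes that interval2sequences reports the sequences on STDOUT.

```shell
# Create a scratch directory that is known to be writable and move into it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Confirm write permission before starting a long run.
if [ -w . ]; then
    echo "working directory is writable: $PWD"
fi

# With VAT installed, and hypothetical input paths, the call would look like:
# interval2sequences /path/to/genome.2bit /path/to/annotation.interval exonic > annotation.fa
```

Using absolute paths for the inputs keeps the command valid after the directory change.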
External programs
bgzip/tabix
Tabix is a generic tool that indexes position-sorted files in tab-delimited formats to facilitate fast retrieval. This tool was developed by Heng Li. For more information, consult the tabix documentation page.
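Because tabix expects position-sorted input, a VCF file usually needs to be sorted before it is compressed and indexed. The sketch below shows one way to do this with standard tools; the VCF records and file names are made up for illustration. The compression and indexing step runs only if bgzip and tabix are found on the PATH.

```shell
tmp=$(mktemp -d)
# Tiny made-up VCF with data lines out of order.
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr2\t50\t.\tA\tG\t.\tPASS\t.\n'
    printf 'chr1\t200\t.\tC\tT\t.\tPASS\t.\n'
    printf 'chr1\t30\t.\tG\tA\t.\tPASS\t.\n'
} > "$tmp/example.vcf"

# Keep header lines first, then sort data lines by chromosome name
# and by numeric position.
{
    grep '^#' "$tmp/example.vcf"
    grep -v '^#' "$tmp/example.vcf" | sort -t "$(printf '\t')" -k1,1 -k2,2n
} > "$tmp/example.sorted.vcf"

# Compress and index only if bgzip and tabix are installed.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c "$tmp/example.sorted.vcf" > "$tmp/example.sorted.vcf.gz"
    tabix -p vcf "$tmp/example.sorted.vcf.gz"
fi
```

The resulting .vcf.gz file has the form of the `<file.vcf.gz>` argument taken by programs such as vcfSummary; whether those programs also need the accompanying index is not stated here.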
VCF tools
VCF tools is a suite of modules for manipulating VCF files. For more information, consult the documentation page.