Structural annotation

HISAT2 RNA-Seq mapping for BRAKER

For paired-end reads I usually use split files, which I first put into one folder. In this folder, I use the following commands to print a comma-separated list of all first or all second files of the pairs, so that I can pass these lists to the HISAT2 command:
```
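# for full paths, forward and reverse read files respectively
# (sorted, so that both lists name the samples in the same order):
find "$PWD" -name "*_1*" | sort | paste -sd ',' -
find "$PWD" -name "*_2*" | sort | paste -sd ',' -

# and for relative paths:
find . -name "*_1*" | sort | paste -sd ',' -
find . -name "*_2*" | sort | paste -sd ',' -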
```
Whether these commands work depends on the filenames. For the files I use, the forward and reverse files usually have something like “_1” or “_2” in their names, respectively. The filenames should also not contain whitespace.
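HISAT2 matches the files given to `-1` and `-2` by their position in the lists, so it can help to check that every forward file has a matching reverse file before building the lists. A minimal sketch, assuming the pairs are named `<sample>_1.fq.gz` and `<sample>_2.fq.gz`; the function name `check_pairs` is made up for illustration:

```shell
# check_pairs DIR: print the number of forward read files without a matching reverse file.
# Assumption: pairs are named <sample>_1.fq.gz / <sample>_2.fq.gz (adjust the suffixes as needed).
# The body runs in a subshell, so the variables do not leak into the calling shell.
check_pairs() (
    dir=$1
    missing=0
    for fwd in "$dir"/*_1.fq.gz; do
        # skip the unexpanded pattern if there are no forward files at all
        [ -e "$fwd" ] || continue
        rev="${fwd%_1.fq.gz}_2.fq.gz"
        if [ ! -e "$rev" ]; then
            echo "missing mate for: $fwd" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing"
)
```

For example, `[ "$(check_pairs .)" -eq 0 ] && echo "all pairs complete"` could be run in the folder with the reads before calling the `find` commands.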

HISAT2 indexing and mapping commands ‘chained’ together:

```
# -p is the number of threads
hisat2-build -p 27 genome.fasta genome.fasta.index \
&& hisat2 -p 27 -x genome.fasta.index \
-1 rna-reads_sample1_1.fq.gz,rna-reads_sample2_1.fq.gz \
-2 rna-reads_sample1_2.fq.gz,rna-reads_sample2_2.fq.gz \
-S rnaseq_mapping.sam
```
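The GeMoMa ERE step further down reads a sorted BAM file, whereas the HISAT2 command writes SAM. A minimal sketch of the conversion, assuming samtools is installed (it is not part of the original notes); the function name, the output file name and the `DRY_RUN` switch are made up for illustration:

```shell
# sam_to_sorted_bam IN.sam OUT.bam [THREADS]
# Sorts by coordinate and indexes the result with samtools (assumed to be installed).
# THREADS is passed to 'samtools sort -@' (additional threads, default 4).
# With DRY_RUN=1 the commands are only printed, not executed.
sam_to_sorted_bam() (
    sam=$1
    bam=$2
    threads=${3:-4}
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$*"
        else
            "$@"
        fi
    }
    run samtools sort -@ "$threads" -o "$bam" "$sam" &&
    run samtools index "$bam"
)
```

A call could look like `sam_to_sorted_bam rnaseq_mapping.sam rnaseq_mapping.sorted.bam 27`; setting `DRY_RUN=1` first shows the two samtools commands without running them.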

GeMoMa

GeMoMa has many programs and options; only the ones I use most are mentioned here.

Reference assemblies and annotations are needed, and RNA-Seq hints can optionally be included. I start with the program GeMoMaPipeline:

```
# More reference species can be added with further 's=own ...' lines;
# 'i' is an optional ID/abbreviation for a species.
# ERE.m takes the mapped RNA-Seq reads (BAM; SAM files are also possible);
# alternatively, already 'extracted' hints can be used.
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI GeMoMaPipeline \
t=target_genome.fasta \
s=own i=RefSp a=reference-species_1.gff g=genome1.fasta \
s=own i=RS2 a=reference-species_2.gff g=genome2.fasta \
r=MAPPED ERE.m=Mapped_rna_seq1.bam ERE.m=Mapped_rna_seq2.bam \
outdir=/path/to/out/directory/ pc=true pgr=true o=true Extractor.r=true \
GAF.f="start=='M' and stop=='*' and (isNaN(score) or score/aa>='0.75')" \
AnnotationFinalizer.r=NO threads=28 >> gemoma.log 2>&1
```
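After the pipeline has finished, a quick sanity check is to count the feature types in the resulting GFF. A minimal sketch using awk on the third GFF column; the function name `count_features` is made up for illustration, and which feature types appear depends on the GeMoMa output:

```shell
# count_features FILE.gff: print how often each feature type (GFF column 3) occurs.
# Comment lines starting with '#' and lines with fewer than three columns are skipped.
count_features() {
    awk -F '\t' '!/^#/ && NF >= 3 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" \
        | LC_ALL=C sort
}
```

With the output directory from the command above, this could be called as `count_features /path/to/out/directory/final_annotation.gff`.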

Combining GeMoMa results with BRAKER results:

In short, the GeMoMa steps are:
1. extract RNA-Seq evidence (ERE)
2. annotation evidence (AnnotationEvidence)
3. annotation filter (GAF)

Extract RNA-Seq Evidence:

```
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI ERE \
m=/path/to/genome.RNA.sorted.bam \
outdir=/combi/out/1
```

Annotation evidence (BRAKER GTF):

```
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI AnnotationEvidence \
a=/path/to/braker.gtf g=/path/to/genome.fasta \
c=UNSTRANDED coverage_unstranded=/combi/out/1/coverage.bedgraph \
i=/combi/out/1/introns.gff \
outdir=/combi/out/2
```

Combination without a score filter (only predictions with a start and a stop codon are kept):

```
# one of the annotations given with g is the output of the GeMoMaPipeline
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI GAF \
g=/combi/out/2/annotation_with_attributes.gff \
g=/path/to/out/directory/final_annotation.gff \
f="start=='M' and stop=='*'" \
outdir=/combi/out/3
```