structural annotation #
HISAT2 RNA-Seq mapping for BRAKER #
- For paired-end reads I usually work with split files (one file per mate), which I first collect in one folder. In this folder, I use the following commands to print a comma-separated list of all first or all second files of the pairs, which I can then use in the HISAT2 command:
```bash
# for full paths, forward and reverse read files respectively
# (sort keeps both lists in the same order; find alone guarantees no order)
find "$PWD" -name "*_1*" | sort | paste -sd ',' -
find "$PWD" -name "*_2*" | sort | paste -sd ',' -

# and for relative paths:
find . -name "*_2*" | sort | paste -sd ',' -
```
- Whether these commands work depends on the filenames. In the files I use, the forward and reverse files usually have something like “_1” or “_2” in their names, respectively. The filenames should also not contain whitespace.
- HISAT2 indexing and mapping commands ‘chained’ together:
```bash
# -p sets the number of threads
hisat2-build -p 27 genome.fasta genome.fasta.index \
&& hisat2 -p 27 -x genome.fasta.index \
-1 rna-reads_sample1_1.fq.gz,rna-reads_sample2_1.fq.gz \
-2 rna-reads_sample1_2.fq.gz,rna-reads_sample2_2.fq.gz \
-S rnaseq_mapping.sam
```
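HISAT2 expects the `-1` and `-2` lists to name the mates in the same order, so it can help to build both lists the same way and check them before mapping. The following is only a minimal sketch: the helper names `join_reads` and `check_pairs` are made up for illustration, and it assumes that mates are named `<sample>_1.fq.gz` and `<sample>_2.fq.gz`, which may not match your files.

```bash
#!/usr/bin/env bash
# Sketch: build sorted, comma-separated read lists for hisat2 -1 / -2.
# Assumption: mates are named <sample>_1.fq.gz and <sample>_2.fq.gz.

# join_reads DIR SUFFIX
# Prints all files below DIR whose name ends in SUFFIX, sorted, joined by commas.
join_reads() {
    find "$1" -type f -name "*$2" | LC_ALL=C sort | paste -sd ',' -
}

# check_pairs FWD_LIST REV_LIST
# Returns 0 only if both lists have the same length and every forward file
# has its mate at the same position in the reverse list.
check_pairs() {
    local -a fwd rev
    local i
    IFS=',' read -r -a fwd <<< "$1"
    IFS=',' read -r -a rev <<< "$2"
    [ "${#fwd[@]}" -eq "${#rev[@]}" ] || return 1
    for i in "${!fwd[@]}"; do
        # swap the trailing _1.fq.gz for _2.fq.gz and compare with the mate
        [ "${fwd[$i]%_1.fq.gz}_2.fq.gz" = "${rev[$i]}" ] || return 1
    done
}

# Possible usage (not executed here):
#   fwd=$(join_reads "$PWD" "_1.fq.gz")
#   rev=$(join_reads "$PWD" "_2.fq.gz")
#   check_pairs "$fwd" "$rev" \
#     && hisat2 -p 27 -x genome.fasta.index -1 "$fwd" -2 "$rev" -S rnaseq_mapping.sam
```

Matching on the full suffix instead of `*_1*` also avoids picking up files that merely contain `_1` somewhere in the sample name.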
GeMoMa #
- GeMoMa has many modules and options; only the ones I use most are covered here.
- Reference assemblies and annotations are needed, and RNA-Seq hints can optionally be included. Starting with the module GeMoMaPipeline:
```bash
# More reference species can be added with further "s=own ..." lines;
# 'i' is an optional ID/abbreviation for a species.
# r=MAPPED takes the mapped reads given with ERE.m (a SAM file is possible too);
# 'extracted' hints are another option.
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI GeMoMaPipeline \
t=target_genome.fasta \
s=own i=RefSp a=reference-species_1.gff g=genome1.fasta \
s=own i=RS2 a=reference-species_2.gff g=genome2.fasta \
r=MAPPED ERE.m=Mapped_rna_seq1.bam ERE.m=Mapped_rna_seq2.bam \
outdir=/path/to/out/directory/ pc=true pgr=true o=true Extractor.r=true \
GAF.f="start=='M' and stop=='*' and (isNaN(score) or score/aa>='0.75')" \
AnnotationFinalizer.r=NO threads=28 >> gemoma.log 2>&1
```
- Combining GeMoMa results with BRAKER results:
- In short, the GeMoMa commands:
- extract RNA-Seq evidence (ERE)
- annotation evidence (AnnotationEvidence)
- annotation filter (GAF)
- Extract RNA-Seq Evidence:
```bash
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI ERE \
m=/path/to/genome.RNA.sorted.bam \
outdir=/combi/out/1
```
- Annotation evidence (BRAKER annotation, here a GTF file):
```bash
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI AnnotationEvidence \
a=/path/to/braker.gtf \
g=/path/to/genome.fasta \
c=UNSTRANDED coverage_unstranded=/combi/out/1/coverage.bedgraph \
i=/combi/out/1/introns.gff \
outdir=/combi/out/2
```
- Combination without filter:
```bash
# one of the annotations given with g is the output of the GeMoMaPipeline run
java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI GAF \
g=/combi/out/2/annotation_with_attributes.gff \
g=/path/to/out/directory/final_annotation.gff \
f="start=='M' and stop=='*'" \
outdir=/combi/out/3
```
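After each of these steps it can be useful to check quickly what ended up in a GFF/GTF file, for example to compare the number of genes or transcripts before and after combining. The helper below is a rough sketch and not part of GeMoMa; the name `count_features` is made up, and it only assumes the standard tab-separated 9-column layout with the feature type in column 3.

```bash
#!/usr/bin/env bash
# Sketch: count the feature types (column 3) in a GFF/GTF file.
# Assumption: tab-separated, 9 columns, comment lines start with '#'.

# count_features FILE
# Prints one "<count><TAB><type>" line per feature type, sorted by type.
count_features() {
    awk -F '\t' '
        !/^#/ && NF >= 9 { n[$3]++ }                       # skip comments and malformed lines
        END { for (t in n) printf "%d\t%s\n", n[t], t }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2
}

# Possible usage (not executed here):
#   count_features /path/to/out/directory/final_annotation.gff
#   count_features /path/to/braker.gtf
```

Which feature types show up depends on the tool that wrote the file, so the counts are best compared between files from the same tool.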