structural annotation

HISAT2 RNA-Seq mapping for BRAKER

  • For paired-end reads, I usually use split files (one file per mate), which I first put into one folder. In this folder, I use the following commands to print a comma-separated list of all first (forward) or all second (reverse) files of the pairs, which can then be passed to the HISAT2 command:
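    # for full paths, forward and reverse read files respectively
    # (sorted, so that both lists normally follow the same sample order):
    find "$PWD" -name "*_1*" | sort | paste -sd ','
    find "$PWD" -name "*_2*" | sort | paste -sd ','

    # and for relative paths:
    find . -name "*_1*" | sort | paste -sd ','
    find . -name "*_2*" | sort | paste -sd ','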
    
  • These commands depend on the file names. In my files, forward and reverse reads usually contain “_1” and “_2” in their names, respectively; a name that contains both (e.g. sample_1_2.fq.gz) would end up in both lists, so adjust the patterns if needed. The file names should also not contain whitespace or commas, since the lists are comma-separated.
  • HISAT2 indexing and mapping commands, chained with && so that mapping only starts if indexing succeeded:
    # -p sets the number of threads (here for both indexing and mapping)
    hisat2-build -p 27 genome.fasta genome.fasta.index \
    && hisat2 -p 27 -x genome.fasta.index \
    -1 rna-reads_sample1_1.fq.gz,rna-reads_sample2_1.fq.gz \
    -2 rna-reads_sample1_2.fq.gz,rna-reads_sample2_2.fq.gz \
    -S rnaseq_mapping.sam
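The listing and mapping steps above can also be wrapped into two small bash functions. This is only a sketch under assumptions: make_pair_lists and map_reads are hypothetical helper names, reads are assumed to be named <sample>_1.fq.gz / <sample>_2.fq.gz, DRY_RUN=1 is a convention of this sketch that only prints the commands, and the final samtools sort step (assuming samtools is installed) produces a sorted BAM file like the genome.RNA.sorted.bam used in the GeMoMa commands. Deriving each reverse file name from its forward file, instead of listing both independently, keeps the mates paired even for unusual names.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, the <sample>_1.fq.gz / <sample>_2.fq.gz naming
# scheme and the DRY_RUN variable are assumptions, not part of HISAT2.

# make_pair_lists DIR
# Prints the comma-separated forward list (line 1) and reverse list (line 2).
# Each reverse file is derived from its forward file, so the n-th entries of
# both lists always belong to the same sample.
make_pair_lists() {
  local dir=$1 f mate
  local -a fwd=() rev=()
  while IFS= read -r f; do
    mate=${f%_1.fq.gz}_2.fq.gz
    if [[ ! -e $mate ]]; then
      echo "no mate found for $f" >&2
      return 1
    fi
    fwd+=("$f")
    rev+=("$mate")
  done < <(find "$dir" -maxdepth 1 -name '*_1.fq.gz' | LC_ALL=C sort)
  # join with commas (the first character of IFS) inside subshells
  (IFS=,; echo "${fwd[*]}")
  (IFS=,; echo "${rev[*]}")
}

# map_reads GENOME THREADS FWD_LIST REV_LIST PREFIX
# Builds the index only if it is missing, maps the reads to PREFIX.sam and
# sorts them into PREFIX.sorted.bam (assumes samtools is installed).
# With DRY_RUN=1 the commands are only printed.
map_reads() {
  local genome=$1 threads=$2 fwd=$3 rev=$4 prefix=$5
  local index=$genome.index
  local -a run=()
  [[ ${DRY_RUN:-0} == 1 ]] && run=(echo)

  # hisat2-build writes <index>.1.ht2 etc. (.ht2l for very large genomes)
  if [[ ! -e $index.1.ht2 && ! -e $index.1.ht2l ]]; then
    "${run[@]}" hisat2-build -p "$threads" "$genome" "$index" || return 1
  fi
  "${run[@]}" hisat2 -p "$threads" -x "$index" -1 "$fwd" -2 "$rev" \
    -S "$prefix.sam" || return 1
  "${run[@]}" samtools sort -@ "$threads" -o "$prefix.sorted.bam" "$prefix.sam"
}
```

For example, lists=$(make_pair_lists "$PWD") collects the pairs, and map_reads genome.fasta 27 "$(sed -n 1p <<< "$lists")" "$(sed -n 2p <<< "$lists")" rnaseq_mapping then runs the chained commands. Reverse files without a forward mate are not reported by this sketch.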
    

GeMoMa

  • GeMoMa has many programs and options; only the ones I use most are mentioned here.
  • Starting with the program GeMoMaPipeline: it needs reference assemblies and their annotations, and RNA-Seq hints can optionally be included:
    # more reference species can be added as further s=own blocks;
    # option 'i' is an optional ID/abbreviation for a species.
    # RNA-Seq evidence: mapped reads (BAM; SAM is also possible) or 'extracted' hints
    java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI GeMoMaPipeline \
    t=target_genome.fasta \
    s=own i=RefSp a=reference-species_1.gff g=genome1.fasta \
    s=own i=RS2 a=reference-species_2.gff g=genome2.fasta \
    r=MAPPED ERE.m=Mapped_rna_seq1.bam ERE.m=Mapped_rna_seq2.bam \
    outdir=/path/to/out/directory/ pc=true pgr=true o=true Extractor.r=true \
    GAF.f="start=='M' and stop=='*' and (isNaN(score) or score/aa>='0.75')" \
    AnnotationFinalizer.r=NO threads=28 >> gemoma.log 2>&1
    
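  • Combining GeMoMa results with BRAKER results:
    • in short, the GeMoMa commands are:
      1. ERE: extract RNA-Seq evidence
      2. AnnotationEvidence: add the RNA-Seq evidence to the BRAKER annotation
      3. GAF (GeMoMa Annotation Filter): combine and filter the annotations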
    1. Extract RNA-Seq evidence (ERE):
      java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI ERE \
      m=/path/to/genome.RNA.sorted.bam \
      outdir=/combi/out/1
    
    2. Annotation evidence for the BRAKER annotation (here braker.gtf):
      java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI AnnotationEvidence a=/path/to/braker.gtf \
      g=/path/to/genome.fasta c=UNSTRANDED coverage_unstranded=/combi/out/1/coverage.bedgraph i=/combi/out/1/introns.gff \
      outdir=/combi/out/2
    
    3. Combination with GAF (no score filter, only complete gene models are kept):
      # the second g= is the final annotation from the GeMoMaPipeline run above
      java -jar /path/to/GeMoMa/GeMoMa-1.9.jar CLI GAF \
      g=/combi/out/2/annotation_with_attributes.gff \
      g=/path/to/out/directory/final_annotation.gff \
      f="start=='M' and stop=='*'" outdir=/combi/out/3
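To avoid writing the s=own blocks and the three combination commands by hand, they can be generated and chained in a small bash script. This is a sketch under assumptions: the helper names gemoma_ref_args and combine_braker_gemoma, the tab-separated reference table (ID, annotation GFF, genome FASTA), the GEMOMA_JAR variable and DRY_RUN are all hypothetical. The GeMoMa parameters themselves are only the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, the reference table format, GEMOMA_JAR and
# DRY_RUN are assumptions; the GeMoMa parameters are those used above.
GEMOMA_JAR=${GEMOMA_JAR:-/path/to/GeMoMa/GeMoMa-1.9.jar}

# gemoma_ref_args TABLE
# TABLE is tab-separated: ID <TAB> annotation.gff <TAB> genome.fasta.
# Prints one GeMoMaPipeline argument per line; empty and '#' lines are skipped.
gemoma_ref_args() {
  local table=$1 id gff fasta
  # "|| [[ -n $id ]]" also handles a last line without a trailing newline
  while IFS=$'\t' read -r id gff fasta || [[ -n $id ]]; do
    [[ -z $id || $id == \#* ]] && continue
    printf '%s\n' s=own "i=$id" "a=$gff" "g=$fasta"
  done < "$table"
}

# combine_braker_gemoma BAM BRAKER_ANNOTATION GENOME GEMOMA_FINAL OUTDIR
# Runs ERE, AnnotationEvidence and GAF in order, stopping at the first failure.
# With DRY_RUN=1 the commands are only printed.
combine_braker_gemoma() {
  local bam=$1 braker=$2 genome=$3 gemoma_final=$4 out=$5
  local -a run=()
  [[ ${DRY_RUN:-0} == 1 ]] && run=(echo)

  # 1. extract RNA-Seq evidence (introns.gff, coverage.bedgraph)
  "${run[@]}" java -jar "$GEMOMA_JAR" CLI ERE m="$bam" outdir="$out/1" || return 1
  # 2. add the RNA-Seq evidence to the BRAKER gene models
  "${run[@]}" java -jar "$GEMOMA_JAR" CLI AnnotationEvidence a="$braker" \
    g="$genome" c=UNSTRANDED coverage_unstranded="$out/1/coverage.bedgraph" \
    i="$out/1/introns.gff" outdir="$out/2" || return 1
  # 3. combine with the GeMoMaPipeline result, keeping complete gene models
  "${run[@]}" java -jar "$GEMOMA_JAR" CLI GAF \
    g="$out/2/annotation_with_attributes.gff" g="$gemoma_final" \
    f="start=='M' and stop=='*'" outdir="$out/3"
}
```

For example, mapfile -t refs < <(gemoma_ref_args references.tsv) reads the reference arguments into an array that can be passed to GeMoMaPipeline as "${refs[@]}" instead of the individual s=own lines.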