statistics

FASTA/FASTQ base count:

  • FASTA with awk (usually preinstalled):
    awk 'NF && !/^>/ {total += length} END {print total}' *.fasta
    
  • You can pass one or more files; the output is the total number of bases across all of them.
  • FASTA/FASTQ with seqtk, parallel and awk:
    parallel seqtk size ::: *.fasta | awk -F'\t' '{sum += $2} END {print sum}'

    # or
    parallel seqtk size ::: *.fastq | awk -F'\t' '{sum += $2} END {print sum}'

    # without awk
    parallel seqtk size ::: *.fastq # This prints the number of reads and the base count per file

    # for just one file without parallel:
    seqtk size file.fastq
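
To sanity-check the awk approach, here is a minimal sketch on tiny made-up files. The file names `demo.fasta` and `demo.fastq` and their contents are illustrative assumptions, not part of the original notes. The FASTQ variant is also an assumption: it only works for strict 4-line records (no wrapped sequence lines), so seqtk remains the safer choice for real data.

```shell
# Create a tiny two-record FASTA file (hypothetical demo data),
# including a wrapped sequence and an empty line.
printf '>seq1\nACGT\nAC\n\n>seq2\nGGG\n' > demo.fasta

# Skip empty lines (NF is 0) and header lines (starting with '>'),
# then add up the length of every remaining line.
fasta_total=$(awk 'NF && !/^>/ {total += length} END {print total}' demo.fasta)
echo "$fasta_total"   # 4 + 2 + 3 = 9

# Create a tiny two-read FASTQ file (hypothetical demo data).
printf '@r1\nACGTA\n+\nIIIII\n@r2\nGG\n+\nII\n' > demo.fastq

# Assumption: every record is exactly 4 lines, so the sequence
# is always line 2 of each group of 4.
fastq_total=$(awk 'NR % 4 == 2 {total += length} END {print total}' demo.fastq)
echo "$fastq_total"   # 5 + 2 = 7
```

If the numbers match what you expect for a small file, the same one-liners can be pointed at the real data.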