Checking sequence quality using FastQC

Quality check using FastQC

The first thing you should do when you get new sequence data, either DNA or RNA, is to run a tool such as FastQC to check the quality of the reads, the presence of sequencing adapters, GC content, etc. FastQC is available on Abel via the command module load fastqc.
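FastQC reports each of these checks as a module (for example Per base sequence quality, Adapter Content and Per sequence GC content) with a PASS, WARN or FAIL status. If you run fastqc with --extract (or unzip the _fastqc.zip archives afterwards), each report folder should contain a summary.txt with one tab-separated line per module: status, module name, input file. Below is a minimal sketch of a hypothetical bash helper, fastqc_flags (not part of FastQC), that prints only the modules that did not pass. It assumes that three-column summary.txt layout.

```shell
# Hypothetical helper (not part of FastQC): print every module that did not PASS.
# Each argument is a summary.txt file from an extracted FastQC report.
# Assumed line format (tab-separated): STATUS, module name, input file name.
# Output columns (tab-separated): input file, status, module name.
fastqc_flags() {
    awk -F'\t' 'NF >= 3 && $1 != "PASS" { printf "%s\t%s\t%s\n", $3, $1, $2 }' "$@"
}
```

For example, fastqc_flags ../Analyses/FastQC/*_fastqc/summary.txt gives a quick overview of which samples deserve a closer look in the HTML reports.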

The easiest way to run FastQC is simply to type fastqc *.fastq.gz inside the directory with the sequence data (assuming your sequence files end with .fastq.gz).
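FastQC writes an HTML report and a zip archive for each input file, named after the input with the .gz and .fastq/.fq extensions removed (sample.fastq.gz gives sample_fastqc.html and sample_fastqc.zip). The sketch below assumes that naming scheme. fastqc_report_name and missing_fastqc_reports are hypothetical bash helpers, not part of FastQC, that list the files in the current directory that still lack a report, e.g. after an interrupted run.

```shell
# Hypothetical helper: print the HTML report name FastQC is expected to write
# for an input file (assumes FastQC strips a trailing .gz/.bz2 and then
# .fastq/.fq before appending _fastqc.html).
fastqc_report_name() {
    local name
    name=$(basename -- "$1")
    name=${name%.gz}
    name=${name%.bz2}
    name=${name%.fastq}
    name=${name%.fq}
    printf '%s_fastqc.html\n' "$name"
}

# Hypothetical helper: list *.fastq.gz / *.fq.gz files in the current
# directory that have no report in OUTDIR (default: current directory).
missing_fastqc_reports() {
    local outdir=${1:-.} f
    for f in *.fastq.gz *.fq.gz; do
        [ -e "$f" ] || continue   # skip a pattern that matched nothing
        [ -e "$outdir/$(fastqc_report_name "$f")" ] || printf '%s\n' "$f"
    done
}
```

Run it from the directory with the sequence data, e.g. missing_fastqc_reports ../Analyses/FastQC, and rerun FastQC only on the files it prints.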

If you have many sequence files, it is wise to run FastQC as a Slurm job. Below is a script that loops over all files ending with .fq.gz and runs FastQC on each of them (change the pattern to *.fastq.gz if that is how your files are named). Just paste this text into a file called fastqc_loop.slurm and submit it by typing sbatch fastqc_loop.slurm:

#!/bin/bash
#SBATCH --job-name=fastqc
#SBATCH --account=uio
#SBATCH --time=5:00:00
#SBATCH --mem-per-cpu=8G
#SBATCH --output=slurm-%j.out

## Set up job environment:
source /cluster/bin/jobsetup
module purge   # clear any inherited modules
set -o errexit

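module load fastqc

# FastQC does not create the output directory itself
mkdir -p ../Analyses/FastQC

# Run FastQC on every compressed FASTQ file in RawSeqs
for i in ../SequenceData/RawSeqs/*.fq.gz
do
        fastqc -o ../Analyses/FastQC/ "$i"
done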

Here I assume that you submit the job from a directory (preferably named Scripts) on the same level as the SequenceData directory, and that the raw sequences are in a directory named RawSeqs inside SequenceData. The directory Analyses (with a FastQC directory inside) should also be on the same level as SequenceData and Scripts (see this setup).
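If you are starting a new project, this directory layout can be created in one go. A minimal sketch, where make_fastqc_layout is a hypothetical bash helper and the directory names are the ones used in this post:

```shell
# Hypothetical helper: create the directory layout assumed by
# fastqc_loop.slurm under a project root (default: current directory).
make_fastqc_layout() {
    local root=${1:-.}
    mkdir -p "$root/Scripts" \
             "$root/SequenceData/RawSeqs" \
             "$root/Analyses/FastQC"
}
```

For example, run make_fastqc_layout ~/myproject, copy the reads into ~/myproject/SequenceData/RawSeqs, put fastqc_loop.slurm in ~/myproject/Scripts and run sbatch fastqc_loop.slurm from there. By default Slurm starts the job in the directory you submitted it from, which is why the relative paths in the script resolve correctly.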
