RNA-seq Workflow using STAR

This tutorial introduces a beginner-friendly, copy-paste-ready RNA-seq workflow optimized for personal PCs using: FastQC -> fastp -> STAR -> samtools -> featureCounts -> MultiQC. The executable script supports Single-End and Paired-End FASTQ.gz inputs and produces BAMs, gene counts, and a MultiQC report.

System

Minimum PC requirements

Download

All workflow files are hosted on GitHub. Choose one of the two download options below.

Option 1: Full package (Download ZIP)

On GitHub, click the Code button and choose Download ZIP. This downloads the complete package.

Option 2: Main script only

If you only want the main script, download or copy rnaseq_pipeline.sh from the repository and follow the step-by-step instructions on this page.

Warning This workflow has been successfully tested on local personal PCs, but it is still under active development and may require minor adjustments in some environments. If you encounter errors or issues, please report them to: rsat2026@gmail.com. Your input helps improve the tool.

Project structure

Project location (important): Place your RNA-seq project (including reference/STAR_index) on a local disk under the system root or your home directory. Do not use shared, synced, or network folders (e.g., OneDrive, Dropbox, Google Drive, or Windows /mnt/c/... paths in WSL2), as they can be slow and cause file errors.

Recommended locations

Avoid: shared or mounted paths like /mnt/c/.../OneDrive/...

Keep your STAR index directly under the main project folder (as shown here).

my_rnaseq_project/
  rnaseq_pipeline.sh
  fastqs/                         # input FASTQ.gz files here
    sample1_R1.fastq.gz
    sample1_R2.fastq.gz
    ...
  reference/
    GRCh38.primary_assembly.genome.fa.gz
    gencode.v44.primary_assembly.annotation.gtf.gz
    GRCh38.primary_assembly.genome.fa
    gencode.v44.primary_assembly.annotation.gtf
    STAR_index/               # STAR index folder (GENOME_DIR)
  rna_output/                 # pipeline output (OUTPUT_DIR) will be generated automatically after the run completes.
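
The location advice above can be checked before any long-running step. The sketch below is a minimal example, not part of rnaseq_pipeline.sh: the function name is_safe_project_path is hypothetical, and the patterns only cover the folder names mentioned in this section.

```shell
# Hypothetical helper (not part of rnaseq_pipeline.sh): succeed only if the
# path does not look like a Windows mount in WSL2 or a cloud-synced folder.
is_safe_project_path() {
  case "$1" in
    /mnt/[a-z]/*|*OneDrive*|*Dropbox*|*"Google Drive"*) return 1 ;;
    *) return 0 ;;
  esac
}

# Check the current directory (run this from inside the project folder).
if is_safe_project_path "$PWD"; then
  echo "OK: $PWD looks like a local path"
else
  echo "WARNING: $PWD looks like a mounted or synced folder; move the project to your home directory"
fi
```

A pattern match like this cannot detect every network or synced location, so treat an "OK" as a quick sanity check rather than a guarantee.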
    

Install Ubuntu on Windows (WSL2)

Run these commands in Windows Command Prompt (CMD) or PowerShell:

Command
wsl --install -d Ubuntu

Reboot the computer, then launch Ubuntu from the Start menu and continue the workflow inside the Ubuntu terminal.

Install Conda + Mamba (beginner-friendly)

Linux / Ubuntu / WSL2

1) Install Miniforge (Conda) and initialize it:

Command
cd ~
curl -L -o Miniforge3.sh https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3.sh -b -p "$HOME/miniforge3"
echo 'export PATH="$HOME/miniforge3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
conda init

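2) Restart the terminal, then install Mamba into the base environment (recent Miniforge installers already bundle Mamba, so this step may simply report that it is already installed):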

Command
conda install -n base -c conda-forge mamba -y

macOS

1) Install Miniforge (Conda):

Command
cd ~
# Apple Silicon (M1/M2/M3):
curl -L -o Miniforge3.sh https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
# Intel Mac (older):
# curl -L -o Miniforge3.sh https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh

bash Miniforge3.sh -b -p "$HOME/miniforge3"
echo 'export PATH="$HOME/miniforge3/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
conda init zsh

2) Restart the terminal, then install Mamba into the base environment:

Command
conda install -n base -c conda-forge mamba -y

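Important (macOS only): the pipeline script needs Bash 4 or newer, but macOS ships with Bash 3.2. Install a modern Bash and gawk with Homebrew (this assumes Homebrew is already installed):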

Command
brew install bash gawk
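
To confirm that the Bash you are running meets the Bash 4+ requirement, you can inspect the built-in BASH_VERSION variable. This is a small sketch; the function name bash_is_modern is hypothetical and not part of the pipeline script.

```shell
# Succeeds when the given version string has a major version of 4 or higher.
bash_is_modern() {
  local major="${1%%.*}"        # keep only the text before the first dot
  [ "${major:-0}" -ge 4 ]
}

# BASH_VERSION is set by Bash itself, e.g. "3.2.57(1)-release" or "5.2.15(1)-release".
if bash_is_modern "${BASH_VERSION:-0}"; then
  echo "Bash $BASH_VERSION is new enough"
else
  echo "Bash ${BASH_VERSION:-unknown} is too old; use the Homebrew bash instead"
fi
```

Run the check with the same bash binary you plan to use for the pipeline, since the system bash and the Homebrew bash report different versions.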

Install required tools (FastQC, MultiQC, fastp, STAR, samtools, featureCounts, etc.)

Create an isolated environment and install all tools:

Command
mamba create -n rnaseq -c conda-forge -c bioconda \
  fastqc multiqc fastp star samtools subread python pigz wget -y

conda activate rnaseq   # conda activate works once "conda init" has been run

Verify installs:

Command
fastqc --version
multiqc --version
fastp --version
STAR --version
samtools --version
featureCounts -v
python --version
pigz --version
wget --version
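
Instead of reading nine version printouts, you can ask the shell which of the tools are missing from PATH. The sketch below uses the standard command -v built-in; the function name missing_tools is hypothetical.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools fastqc multiqc fastp STAR samtools featureCounts python pigz wget)
if [ -z "$missing" ]; then
  echo "All tools found"
else
  echo "Missing tools (is the rnaseq environment active?):"
  echo "$missing"
fi
```

This only confirms that each command can be found; the version commands above are still the way to confirm that each tool actually starts.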

Download genome FASTA + GTF

Command
cd ~
mkdir -p my_rnaseq_project/{fastqs,reference}
cd my_rnaseq_project/reference

Run the commands below to download the annotation (GTF) and the reference genome (FASTA).

Command
wget -O gencode.v44.primary_assembly.annotation.gtf.gz \
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.primary_assembly.annotation.gtf.gz

wget -O GRCh38.primary_assembly.genome.fa.gz \
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/GRCh38.primary_assembly.genome.fa.gz

Unzip both gz files using pigz:

Command
pigz -dk gencode.v44.primary_assembly.annotation.gtf.gz
pigz -dk GRCh38.primary_assembly.genome.fa.gz
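
Large downloads are sometimes cut off, and a truncated .gz file can cause confusing errors later. As a quick check, the sketch below runs the standard gzip -t integrity test on each archive; the function name check_gz is hypothetical.

```shell
# Report whether each gzip file passes an integrity test (gzip -t).
# A missing, truncated, or non-gzip file is reported as BAD.
check_gz() {
  local f
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "OK  $f"
    else
      echo "BAD $f"
    fi
  done
}

# Run from inside the reference/ folder.
check_gz gencode.v44.primary_assembly.annotation.gtf.gz GRCh38.primary_assembly.genome.fa.gz
```

If a file is reported as BAD, delete it and repeat the wget command for that file.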

Build STAR index

Warning From this point onward, run every command, including the STAR index build, from inside your project directory (cd ~/my_rnaseq_project), because the paths below are relative to it.

Create the STAR index folder directly under reference/:

Command
cd ~/my_rnaseq_project
mkdir -p reference/STAR_index

Build the STAR index:

Command (standard)
STAR \
  --runMode genomeGenerate \
  --runThreadN 8 \
  --genomeDir reference/STAR_index \
  --genomeFastaFiles reference/GRCh38.primary_assembly.genome.fa \
  --sjdbGTFfile reference/gencode.v44.primary_assembly.annotation.gtf \
  --sjdbOverhang 100 \
  --genomeSAindexNbases 14 \
  --limitGenomeGenerateRAM 16000000000

Command (low-RAM)
mkdir -p output   # create the log folder first; tee cannot create missing directories
STAR \
  --runMode genomeGenerate \
  --runThreadN 4 \
  --genomeDir reference/STAR_index \
  --genomeFastaFiles reference/GRCh38.primary_assembly.genome.fa \
  --sjdbGTFfile reference/gencode.v44.primary_assembly.annotation.gtf \
  --sjdbOverhang 100 \
  --genomeSAindexNbases 12 \
  --genomeSAsparseD 2 \
  --genomeChrBinNbits 18 \
  2>&1 | tee output/star_genomeGenerate.lowram.log

Warning STAR index generation is a critical step and may take hours. Try the standard build first; if it fails with a memory error, use the low-RAM build, which produces a sparser, more compact genome index and runs on most personal PCs. The complete STAR index folder for the human genome is roughly 15 to 20 GB.
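
If you are unsure which build to try first, you can look at how much RAM the machine reports. The sketch below is only a rule of thumb: the function names are hypothetical, and the 16 GB threshold is an assumption taken from the --limitGenomeGenerateRAM 16000000000 value in the standard command, not a guarantee that the standard build will succeed.

```shell
# Total physical RAM in whole GB (rounded), on Linux/WSL2 or macOS.
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB
    awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' /proc/meminfo
  else
    # macOS reports bytes through sysctl
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  fi
}

# Assumed rule of thumb: 16 GB or more -> try the standard build first.
suggest_star_build() {
  if [ "${1:-0}" -ge 16 ]; then
    echo "standard"
  else
    echo "low-RAM"
  fi
}

ram=$(total_ram_gb)
echo "Detected about ${ram:-0} GB RAM; suggested build: $(suggest_star_build "${ram:-0}")"
```

On WSL2 the reported value is the memory assigned to the Linux environment, which can be lower than the RAM installed in the PC.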

Run the main script

1) Put FASTQ.gz into fastqs/

Command
ls fastqs/*.gz
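
For paired-end data it helps to confirm that every R1 file has its R2 mate before starting. The sketch below assumes the sample_R1.fastq.gz / sample_R2.fastq.gz naming shown in the project structure; how rnaseq_pipeline.sh itself detects pairs is not described here, and the function name list_samples is hypothetical.

```shell
# For every *_R1.fastq.gz in a directory, print "<sample> paired" when the
# matching *_R2.fastq.gz exists, otherwise "<sample> single".
list_samples() {
  local dir="$1" r1 sample
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue            # no match: the glob stays literal
    sample=$(basename "$r1" _R1.fastq.gz)
    if [ -e "$dir/${sample}_R2.fastq.gz" ]; then
      echo "$sample paired"
    else
      echo "$sample single"
    fi
  done
}

list_samples fastqs
```

A sample you expected to be paired but that prints as "single" usually means the R2 file is missing or named differently.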

2) Make sure the script is executable

Command (No output means success)
chmod +x rnaseq_pipeline.sh

3) Run (Linux / Ubuntu / WSL2)

Command
bash rnaseq_pipeline.sh \
  -i fastqs \
  -o rna_output \
  -g reference/STAR_index \
  -a reference/gencode.v44.primary_assembly.annotation.gtf \
  -t 4 \
  --stranded 0

4) Run (macOS, using Homebrew Bash 4+)

Find the path of your Homebrew bash:

Command
which -a bash

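Then run the script with the Homebrew bash path. The example uses /opt/homebrew/bin/bash, the usual location on Apple Silicon; on Intel Macs it is usually /usr/local/bin/bash: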

Command
/opt/homebrew/bin/bash rnaseq_pipeline.sh \
  -i fastqs \
  -o rna_output \
  -g reference/STAR_index \
  -a reference/gencode.v44.primary_assembly.annotation.gtf \
  -t 4 \
  --stranded 0
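
Because a run can take a long time, it can be worth checking the three inputs passed with -i, -g, and -a before launching it. The sketch below is a hypothetical pre-flight check, not something the pipeline script is known to do; it relies only on the fact that STAR writes files named Genome, SA, and SAindex into the index folder.

```shell
# Hypothetical pre-flight check: print one line per problem found and
# return non-zero if anything is missing.
preflight() {
  local fastq_dir="$1" index_dir="$2" gtf="$3" problems=0 f
  if ! ls "$fastq_dir"/*.gz >/dev/null 2>&1; then
    echo "no .gz files in $fastq_dir"; problems=1
  fi
  # Genome, SA and SAindex are core files written by STAR genomeGenerate.
  for f in Genome SA SAindex; do
    if [ ! -s "$index_dir/$f" ]; then
      echo "missing or empty $index_dir/$f"; problems=1
    fi
  done
  if [ ! -s "$gtf" ]; then
    echo "missing or empty $gtf"; problems=1
  fi
  return "$problems"
}

# Run from inside the project directory.
if preflight fastqs reference/STAR_index reference/gencode.v44.primary_assembly.annotation.gtf; then
  echo "Inputs look complete"
fi
```

A missing SA or SAindex file usually means the index build did not finish, so repeat the Build STAR index step before running the pipeline.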

Warning Processing time varies by dataset and hardware; a single paired-end sample may take 30 minutes or more. If the job freezes, is killed, or stops with errors, the usual cause is insufficient RAM. To fix this, lower the thread count (-t, e.g., 2), close other programs, or run the workflow on a PC with more RAM. If you are unsure, copy the error message into an AI tool for help diagnosing it. The default thread count is conservative for personal PCs; on stronger machines you can raise -t (e.g., 8) for faster runs.
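
One way to tell a killed job from an ordinary error is the exit status. A shell reports 137 when the command it ran was terminated by SIGKILL, which is the signal the Linux out-of-memory killer sends. The wrapper below is a sketch with a hypothetical name, run_and_explain; whether rnaseq_pipeline.sh passes a 137 from STAR through as its own exit status is an assumption.

```shell
# Run any command, then print a short explanation of its exit status.
run_and_explain() {
  local status=0
  "$@" || status=$?
  case "$status" in
    0)   echo "finished without errors" ;;
    137) echo "killed by SIGKILL (exit 137); often the out-of-memory killer, so try a lower -t" ;;
    *)   echo "failed with exit code $status; check the messages above" ;;
  esac
  return "$status"
}

# Example with a harmless command; for a real run, put the pipeline command
# after run_and_explain, e.g.:
#   run_and_explain bash rnaseq_pipeline.sh -i fastqs -o rna_output \
#     -g reference/STAR_index \
#     -a reference/gencode.v44.primary_assembly.annotation.gtf -t 2 --stranded 0
run_and_explain true
```

An exit status of 137 is a strong hint, not proof, of a memory problem, since any SIGKILL produces the same number.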

Where results are saved