RNA-seq Workflow using STAR

This tutorial introduces a beginner-friendly, copy-paste-ready RNA-seq workflow optimized for personal PCs using: FastQC -> fastp -> STAR -> samtools -> featureCounts -> MultiQC. The executable script supports Single-End and Paired-End FASTQ.gz inputs and produces BAMs, gene counts, and a MultiQC report.

System

Windows: Use WSL2 + Ubuntu and run the workflow inside the Ubuntu terminal.
macOS (MacBook): Works as-is, but you must run with Bash 4+ (macOS default bash is often 3.2).
Linux/Ubuntu: Works as-is using bash rnaseq_pipeline.sh ...

Minimum PC requirements

CPU: 4 cores minimum (8+ recommended)
RAM: 32 GB recommended for smoother STAR indexing/mapping
Disk: plan for 50-150+ GB free (FASTQs + BAMs + STAR outputs + QC)
Internet: needed to download tools and reference files
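
The requirements above can be checked from a terminal before installing anything. The following is a rough pre-flight sketch, not part of rnaseq_pipeline.sh. The helper name `meets` and the thresholds are illustrative assumptions: 4 cores, 30 GB of reported RAM (a 32 GB machine usually reports slightly less), and 50 GB of free disk.

```shell
#!/usr/bin/env bash
# Pre-flight check (illustrative; not part of rnaseq_pipeline.sh).

# meets VALUE MINIMUM -> prints OK or LOW
meets() {
  if [ "$1" -ge "$2" ] 2>/dev/null; then echo "OK"; else echo "LOW"; fi
}

# CPU cores (getconf works on Linux, WSL2, and macOS)
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 0)

# Total RAM in GB: /proc/meminfo on Linux/WSL2, sysctl on macOS
if [ -r /proc/meminfo ]; then
  ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
else
  ram_bytes=$(sysctl -n hw.memsize 2>/dev/null || echo 0)
  ram_gb=$((ram_bytes / 1024 / 1024 / 1024))
fi

# Free disk space in GB on the filesystem that holds the home directory
disk_gb=$(df -Pk "${HOME:-/}" 2>/dev/null | awk 'NR==2 {printf "%d", $4 / 1024 / 1024}')

echo "CPU cores:      $cores ($(meets "$cores" 4))"
echo "RAM (GB):       $ram_gb ($(meets "$ram_gb" 30))"
echo "Free disk (GB): $disk_gb ($(meets "$disk_gb" 50))"
```

A LOW result does not necessarily block the workflow; it only signals that the low-RAM index build or a lower thread count may be the safer choice.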

Download

All workflow files are hosted on GitHub. Choose one of the two options below.

Option 1: Full package (Download ZIP)

On GitHub, click the Code button and choose Download ZIP. This downloads the complete package.

Option 2: Main script only

If you only want the main script, download or copy rnaseq_pipeline.sh from the repository and follow the step-by-step instructions on this page.

Warning: This workflow has been successfully tested on local personal PCs, but it is still under active development and may require minor adjustments in some environments. If you encounter errors or issues, please report them to: rsat2026@gmail.com. Your input helps improve the tool.

Project structure

Project location (important): Place your RNA-seq project (including reference/STAR_index) on a local disk under the system root or your home directory. Do not use shared, synced, or network folders (e.g., OneDrive, Dropbox, Google Drive, or Windows /mnt/c/... paths in WSL2), as they can be slow and cause file errors.

Recommended locations

WSL2: /home/<user>/my_rnaseq_project (inside the Linux root filesystem)
Linux/macOS: ~/my_rnaseq_project (local home directory)

Avoid: shared or mounted paths like /mnt/c/.../OneDrive/...
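
To make the location rule concrete, here is a minimal sketch that flags a path that looks like a Windows mount in WSL2 or a synced cloud folder. The function name `is_risky_path` and its pattern list are assumptions based only on the examples named above; other sync tools would need their own patterns.

```shell
#!/usr/bin/env bash
# is_risky_path PATH -> exit status 0 if PATH looks like a Windows drive
# mounted in WSL2 or a synced cloud folder, 1 otherwise.
# The pattern list is illustrative, not exhaustive.
is_risky_path() {
  case "$1" in
    /mnt/[a-z]|/mnt/[a-z]/*) return 0 ;;                        # Windows drives in WSL2
    *OneDrive*|*Dropbox*|*"Google Drive"*|*GoogleDrive*) return 0 ;;  # synced folders
    *) return 1 ;;
  esac
}

# Check the directory you are currently in
if is_risky_path "$PWD"; then
  echo "Warning: $PWD looks like a synced or Windows-mounted folder."
else
  echo "OK: $PWD looks like a local path."
fi
```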

Keep your STAR index directly under the main project folder (as shown here).

my_rnaseq_project/
  rnaseq_pipeline.sh
  fastqs/                         # input FASTQ.gz files here
    sample1_R1.fastq.gz
    sample1_R2.fastq.gz
    ...
  reference/
    GRCh38.primary_assembly.genome.fa.gz
    gencode.v44.primary_assembly.annotation.gtf.gz
    GRCh38.primary_assembly.genome.fa
    gencode.v44.primary_assembly.annotation.gtf
    STAR_index/               # STAR index folder (GENOME_DIR)
  rna_output/                 # pipeline output (OUTPUT_DIR); generated automatically by the run
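
Before running anything, it can help to confirm that the folder matches the tree above. The sketch below lists expected items that are not present yet; the function name `missing_items` is hypothetical, and the list of paths is taken from the tree.

```shell
#!/usr/bin/env bash
# missing_items PROJECT_DIR -> prints each expected path that is absent.
# The expected paths come from the project tree shown above.
missing_items() {
  project=$1
  for item in rnaseq_pipeline.sh fastqs reference reference/STAR_index; do
    [ -e "$project/$item" ] || echo "$item"
  done
}

# Example: check the project in the current directory
missing=$(missing_items .)
if [ -z "$missing" ]; then
  echo "Project layout looks complete."
else
  echo "Missing from project:"
  echo "$missing"
fi
```

Early in the tutorial it is normal for reference/STAR_index to be reported as missing, since it is only created in the index-building step.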

Install Ubuntu on Windows (WSL2)

Run these commands in Windows Command Prompt (CMD) or PowerShell:

Command

wsl --install -d Ubuntu

Reboot the computer, then launch Ubuntu from the Start menu and continue the workflow inside the Ubuntu terminal.
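
Once the Ubuntu terminal is open, a quick way to confirm that it is running under WSL2 is to look at the kernel release string. This sketch assumes the stock WSL2 kernel, whose release name normally ends in "microsoft-standard-WSL2"; a custom kernel may not match. The function name `is_wsl2_kernel` is hypothetical.

```shell
#!/usr/bin/env bash
# is_wsl2_kernel RELEASE -> exit status 0 if the kernel release string
# looks like a stock WSL2 kernel, 1 otherwise.
is_wsl2_kernel() {
  case "$1" in
    *[Mm]icrosoft*WSL2*) return 0 ;;
    *) return 1 ;;
  esac
}

# uname -r prints the running kernel release
if is_wsl2_kernel "$(uname -r)"; then
  echo "Running inside WSL2."
else
  echo "Not a WSL2 kernel (expected on native Linux or macOS)."
fi
```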

Install Conda + Mamba (beginner-friendly)

Linux / Ubuntu / WSL2

1) Install Miniforge (Conda) and initialize it:

Command

cd ~
curl -L -o Miniforge3.sh https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3.sh -b -p "$HOME/miniforge3"
echo 'export PATH="$HOME/miniforge3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
conda init

2) Restart the terminal, then install Mamba into the base environment:

Command

conda install -n base -c conda-forge mamba -y

macOS

1) Install Miniforge (Conda):

Command

cd ~
# Apple Silicon (M1/M2/M3):
curl -L -o Miniforge3.sh https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
# Intel Mac (older):
# curl -L -o Miniforge3.sh https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh

bash Miniforge3.sh -b -p "$HOME/miniforge3"
echo 'export PATH="$HOME/miniforge3/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
conda init zsh   # enables "conda activate"; takes effect in a new terminal window

2) Install Mamba:

Command

conda install -n base -c conda-forge mamba -y
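
The choice of Miniforge installer in step 1 depends on the operating system and CPU architecture, which `uname` reports. As a small sketch, the hypothetical helper `miniforge_installer` below maps that output to the three installer names used in this tutorial; any other platform is reported as unsupported rather than guessed.

```shell
#!/usr/bin/env bash
# miniforge_installer OS ARCH -> prints the installer file name used above.
# Only the three combinations shown in this tutorial are mapped.
miniforge_installer() {
  case "$1-$2" in
    Linux-x86_64)  echo "Miniforge3-Linux-x86_64.sh" ;;
    Darwin-arm64)  echo "Miniforge3-MacOSX-arm64.sh" ;;   # Apple Silicon
    Darwin-x86_64) echo "Miniforge3-MacOSX-x86_64.sh" ;;  # Intel Mac
    *) echo "unsupported platform: $1 $2" >&2; return 1 ;;
  esac
}

# uname -s gives the OS name, uname -m the CPU architecture
if installer=$(miniforge_installer "$(uname -s)" "$(uname -m)"); then
  echo "Download: https://github.com/conda-forge/miniforge/releases/latest/download/$installer"
fi
```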

Important (macOS only): the script needs Bash 4+. Install a modern bash (and gawk) with Homebrew:

Command

brew install bash gawk
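
To see whether the bash that is first on your PATH satisfies the Bash 4+ requirement, you can compare its major version. This is a minimal sketch; the function name `bash_is_modern` is hypothetical.

```shell
#!/usr/bin/env bash
# bash_is_modern VERSION -> exit status 0 if the major version is 4 or higher.
bash_is_modern() {
  major=${1%%.*}          # "5.2.15(1)-release" -> "5"
  [ "$major" -ge 4 ] 2>/dev/null
}

# Ask whichever bash is first on PATH for its version string
version=$(bash -c 'echo "$BASH_VERSION"')
if bash_is_modern "$version"; then
  echo "bash $version is new enough."
else
  echo "bash $version is too old; use the Homebrew bash instead."
fi
```

On macOS the system /bin/bash will still report the old version even after the Homebrew install, which is why the run step calls the Homebrew bash by its full path.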

Install required tools (FastQC, MultiQC, fastp, STAR, samtools, featureCounts, etc.)

Create an isolated environment and install all tools:

Command

mamba create -n rnaseq -c conda-forge -c bioconda \
  fastqc multiqc fastp star samtools subread python pigz wget -y

conda activate rnaseq

Verify installs:

Command

fastqc --version
multiqc --version
fastp --version
STAR --version
samtools --version
featureCounts -v
python --version
pigz --version
wget --version
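
Instead of reading through each version banner, the same verification can be done in one pass by checking that every executable is on the PATH. This is a sketch with a hypothetical helper name, `missing_tools`; it confirms presence only, not versions.

```shell
#!/usr/bin/env bash
# missing_tools NAME... -> prints every command name not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Executable names provided by the packages installed above
missing=$(missing_tools fastqc multiqc fastp STAR samtools featureCounts python pigz wget)
if [ -z "$missing" ]; then
  echo "All tools found."
else
  echo "Not found on PATH (is the rnaseq environment active?):"
  echo "$missing"
fi
```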

Download genome FASTA + GTF

Command

cd ~
mkdir -p my_rnaseq_project/{fastqs,reference}
cd my_rnaseq_project/reference

Run the commands below to download the reference annotation (GTF) and genome (FASTA).

Command

wget -O gencode.v44.primary_assembly.annotation.gtf.gz \
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.primary_assembly.annotation.gtf.gz

wget -O GRCh38.primary_assembly.genome.fa.gz \
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/GRCh38.primary_assembly.genome.fa.gz

Unzip both gz files using pigz:

Command

pigz -dk gencode.v44.primary_assembly.annotation.gtf.gz
pigz -dk GRCh38.primary_assembly.genome.fa.gz
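
Large downloads are sometimes cut short, and a truncated archive can cause confusing errors later. As a hedged sketch, the hypothetical helper `corrupt_gz` below uses the standard `gzip -t` integrity test on the two archives, which remain in place because pigz was run with -k.

```shell
#!/usr/bin/env bash
# corrupt_gz FILE... -> prints each file that fails the gzip integrity test.
# A missing file is reported too, because gzip -t cannot read it.
corrupt_gz() {
  for f in "$@"; do
    gzip -t "$f" 2>/dev/null || echo "$f"
  done
}

# Run from inside reference/; nothing is printed when both archives are intact.
corrupt_gz gencode.v44.primary_assembly.annotation.gtf.gz \
           GRCh38.primary_assembly.genome.fa.gz
```

If a file name is printed, delete that file and download it again before building the index.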

Build STAR index

From this point onward, run all commands, including STAR index building, from inside your project directory (cd ~/my_rnaseq_project).

Create the STAR index folder directly under reference/:

Command

cd ~/my_rnaseq_project
mkdir -p reference/STAR_index

Build the STAR index:

Command (Standard)


STAR \
  --runMode genomeGenerate \
  --runThreadN 8 \
  --genomeDir reference/STAR_index \
  --genomeFastaFiles reference/GRCh38.primary_assembly.genome.fa \
  --sjdbGTFfile reference/gencode.v44.primary_assembly.annotation.gtf \
  --sjdbOverhang 100 \
  --genomeSAindexNbases 14 \
  --limitGenomeGenerateRAM 16000000000
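
A note on --sjdbOverhang: the STAR manual describes the ideal value as read length minus 1 and states that the default of 100 works well in most cases. If you want to check your own data, the sketch below estimates the read length from the first 10,000 reads of a FASTQ file. The function names and the sample file name are illustrative assumptions, and the estimate should be taken from untrimmed reads.

```shell
#!/usr/bin/env bash
# max_read_length FASTQ_GZ -> longest read among the first 10,000 reads.
# In FASTQ, every 4-line record has the sequence on its 2nd line.
max_read_length() {
  gzip -dc "$1" 2>/dev/null | head -n 40000 |
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# suggested_overhang READ_LENGTH -> READ_LENGTH - 1
suggested_overhang() {
  echo $(( $1 - 1 ))
}

# Example with the sample file name used in this tutorial
fq=fastqs/sample1_R1.fastq.gz
if [ -f "$fq" ]; then
  len=$(max_read_length "$fq")
  echo "Longest read: $len bp; suggested --sjdbOverhang: $(suggested_overhang "$len")"
fi
```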

Command (low-RAM)

STAR \
  --runMode genomeGenerate \
  --runThreadN 4 \
  --genomeDir reference/STAR_index \
  --genomeFastaFiles reference/GRCh38.primary_assembly.genome.fa \
  --sjdbGTFfile reference/gencode.v44.primary_assembly.annotation.gtf \
  --sjdbOverhang 100 \
  --genomeSAindexNbases 12 \
  --genomeSAsparseD 2 \
  --genomeChrBinNbits 18 \
  2>&1 | tee reference/star_genomeGenerate.lowram.log

STAR index generation is a critical step and may take hours. Try the standard build first; if memory errors occur, use the low-RAM version, which builds a more compact genome index and runs reliably on most personal PCs. The complete STAR index folder for the human genome is roughly 15 to 20 GB.
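
Because an interrupted build can leave a partial index behind, it is worth confirming that the core index files exist and are non-empty before mapping. The sketch below checks four files that a finished STAR index normally contains (Genome, SA, SAindex, genomeParameters.txt). The helper name `index_missing_files` is hypothetical, and the check is a sanity test rather than a full validation.

```shell
#!/usr/bin/env bash
# index_missing_files INDEX_DIR -> prints core index files that are absent or empty.
index_missing_files() {
  for f in Genome SA SAindex genomeParameters.txt; do
    [ -s "$1/$f" ] || echo "$f"
  done
}

# Run from the project directory
missing=$(index_missing_files reference/STAR_index)
if [ -z "$missing" ]; then
  echo "Index looks complete:"
  du -sh reference/STAR_index      # total size of the index folder
else
  echo "Index incomplete; missing or empty files:"
  echo "$missing"
fi
```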

Run the main script

1) Put FASTQ.gz into `fastqs/`

Command

ls fastqs/*.gz
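
For paired-end data, a missing mate file is an easy mistake to catch at this point. The sketch below assumes the naming shown in the project structure (sample_R1.fastq.gz and sample_R2.fastq.gz) and lists every R1 file without a matching R2. The helper name `unpaired_r1` is hypothetical, and how rnaseq_pipeline.sh itself detects pairs is not covered here. For single-end data, files listed by this check are expected.

```shell
#!/usr/bin/env bash
# unpaired_r1 DIR -> prints every *_R1.fastq.gz in DIR that has no
# matching *_R2.fastq.gz next to it.
unpaired_r1() {
  for r1 in "$1"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # swap the R1 suffix for R2
    [ -e "$r2" ] || echo "$r1"
  done
}

# Run from the project directory; no output means every R1 has an R2
unpaired_r1 fastqs
```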

2) Make sure the script is executable

Command (No output means success)

chmod +x rnaseq_pipeline.sh

3) Run (Linux / Ubuntu / WSL2)

Command

bash rnaseq_pipeline.sh \
  -i fastqs \
  -o rna_output \
  -g reference/STAR_index \
  -a reference/gencode.v44.primary_assembly.annotation.gtf \
  -t 4 \
  --stranded 0

4) Run (macOS using Brew Bash 4+/5)

Find your brew bash path:

Command

which -a bash

Then run using the brew bash path (example shown):

Command

/opt/homebrew/bin/bash rnaseq_pipeline.sh \
  -i fastqs \
  -o rna_output \
  -g reference/STAR_index \
  -a reference/gencode.v44.primary_assembly.annotation.gtf \
  -t 4 \
  --stranded 0

Warning: Processing time varies by dataset and hardware; a paired-end sample may take 30 minutes or more. If the job freezes, is killed, or shows errors, the usual cause is insufficient RAM. To resolve this, reduce the thread count (-t, e.g., 2), close other programs, or run the workflow on a PC with more RAM. If unsure, copy the error message into an AI tool for help resolving the memory issue. The default thread count is set conservatively for personal PCs, but you can increase -t (e.g., 8) on stronger machines for faster execution.
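
If you are unsure which -t value to start with, one simple heuristic is to leave one core free for the rest of the system and stay within the range mentioned above. The rule in this sketch (cores minus one, clamped to 2..8) is an assumption for illustration, not a setting taken from rnaseq_pipeline.sh, and it does not account for RAM.

```shell
#!/usr/bin/env bash
# suggest_threads CORES -> CORES minus one, clamped to the range 2..8.
suggest_threads() {
  t=$(( $1 - 1 ))
  if [ "$t" -lt 2 ]; then t=2; fi
  if [ "$t" -gt 8 ]; then t=8; fi
  echo "$t"
}

# getconf reports the number of online CPU cores on Linux and macOS
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
echo "Suggested starting value: -t $(suggest_threads "$cores")"
```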

Where results are saved

rna_output/multiqc/multiqc_report.html (open in browser)
rna_output/SUMMARY.tsv (per-sample summary)
rna_output/counts/combined_counts.txt (combined gene counts matrix)
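
After a run, a quick way to confirm that it finished is to check that the three result files listed above exist and are not empty. This is a minimal sketch with a hypothetical helper name, `missing_outputs`; it does not inspect the file contents.

```shell
#!/usr/bin/env bash
# missing_outputs OUTPUT_DIR -> prints expected result files that are absent or empty.
missing_outputs() {
  for f in multiqc/multiqc_report.html SUMMARY.tsv counts/combined_counts.txt; do
    [ -s "$1/$f" ] || echo "$f"
  done
}

# Run from the project directory
missing=$(missing_outputs rna_output)
if [ -z "$missing" ]; then
  echo "All expected result files are present."
else
  echo "Missing or empty in rna_output/:"
  echo "$missing"
fi
```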
