This GitHub repository includes codes and scripts that demonstrate the use of dada2
and phyloseq
(and associated tools and R packages) to analyze 16S rDNA amplicon sequencing data. An working example is included in the example
folder.
Disclaimer: Do not use any of the provided codes and scripts in production without fully understanding of the contents. I am not responsible for errors or omissions or for any consequences from use of the contents and make no warranty with respect to the currency, completeness, or accuracy of the contents from this GitHub repository.
I strongly recommend using conda to manage the software and R packages required for this analysis.
- We will create an environment with R >= 4.1 and Python 3 to perform majority of the analysis.
- We will create an environment with Python 2.7 to run
export2graphlan
andgraphlan
as they do not support Python 3. - If you are unable to install
picrust2
>= 2.4 or higher in the main environment with Python 3 due to conflicts, you will need to create a separate environment to run the latestpicrust2
. - Use
conda activate my_env
to activate an environment namedmy_env
, andconda deactivate
to deactivate an environment. Learn more about managing environments here.
# Create main "16S" env
# Remove `picrust2=2.4` from the list if it causes conflicts
conda create -n 16S -c conda-forge -c bioconda r-base=4.1 python=3 curl libcurl openssl boost-cpp \
r-doparallel r-devtools r-rcurl r-httr r-magick r-png r-ggplot2 r-data.table r-phangorn r-ape r-gridextra \
r-ggbeeswarm r-ggrepel r-vegan r-tidyverse r-gtools r-r.utils bioconductor-dada2 bioconductor-phyloseq \
bioconductor-decipher bioconductor-deseq2 bioconductor-shortread bioconductor-biostrings \
bioconductor-biomformat bioconductor-aldex2 cutadapt raxml raxml-ng lefse picrust2=2.4
# Use "conda activate 16S" to activate this environment
# Create "graphlan" env
conda create -n graphlan export2graphlan graphlan
# Use "conda activate graphlan" to activate this environment
# Create "picrust2" env if it causes conflicts when setting up the "16S" env
conda create -n picrust2 picrust2=2.4
# Use "conda activate picrust2" to activate this environment
1. PCR primer trimming using cutadapt
- Conda environment: 16S
- Script: "run_trimming.pl"
- Cutadapt can run on multiple CPU cores in parallel under Python 3
- See "2a - Option 2" to run
cutadapt
in R
Usage: perl run_trimming.pl project_folder fastq_folder forward_primer_sequence reverse_primer_sequence
- Example:
- project_folder: PRJEB27564
- fastq_folder: raw
- forward_primer_sequence: 5'-CCTACGGGNGGCWGCAG-3'
- reverse_primer_sequence: 5'-GACTACHVGGGTATCTAATCC-3'
Usage: perl run_trimming.pl PRJEB27564 raw CCTACGGGNGGCWGCAG GACTACHVGGGTATCTAATCC
2. Run DADA2 workflow
- Conda environment: 16S
- Script:
- Option 1: "dada2-per-run-processing.R" - Trimming already performed using
cutadapt
in terminal/console - Option 2: "cutadapt-and-dada2-per-run-processing.R" - Execute
cutadapt
in R by usingsystem2
function
- Option 1: "dada2-per-run-processing.R" - Trimming already performed using
- dada2 filtering and trimming
filterAndTrim
- Amplicon sequence variant (ASV) inference
- Script: "dada2-create-phyloseq-obj.R"
- Taxonomy and species assignment
- Sample data preparation
- Phylogenetic tree construction using RAxML and raxml-ng
- Phyloseq object construction
3. Explore microbiome profiles using phyloseq
- Conda environment: 16S
- Script: "phyloseq-analysis.R"
- Conda environment: 16S and graphlan
- Script: "lefse-analysis.R"
- Linear discriminant analysis (LDA) effect size (LEfSe) analysis
- Create cladogram using export2graphlan and GraPhlAn. Require Python 2.7
- Conda environment: 16S or picrust2 (depending on where
picrust2
is installed) - Script: "picrust2-aldex-analysis.R"
- Silva taxonomic training data formatted for DADA2 (Silva version 138.1)
https://zenodo.org/record/4587955#.YNWax3VKhkY
Note: These files are intended for use in classifying prokaryotic 16S sequencing data and are not appropriate for classifying eukaryotic ASVs.
- silva_nr99_v138.1_train_set.fa.gz
- silva_species_assignment_v138.1.fa.gz
- Create from NCBI 16SMicrobial blast DB
Install BLAST
conda install -c bioconda blast
or
sudo apt-get install ncbi-blast+
Download NCBI's 16S rRNA BLAST DB
wget ftp://ftp.ncbi.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz
tar zxf 16S_ribosomal_RNA.tar.gz
Convert 16SMicrobial BLAST DB into FASTA format
blastdbcmd -db 16S_ribosomal_RNA -entry all -out 16SMicrobial.fa
gzip 16SMicrobial.fa
http://evomics.org/phyloseq/taxa_summary-r/
Use gunzip taxa_summary.R.gz
to extract file