This document introduces TaxaNorm, an R package for normalizing microbiome taxa data. Here, we go through how to install the package and how to use it to normalize, analyze, and visualize microbiome data. TaxaNorm implements a zero-inflated negative binomial (ZINB) method to normalize microbiome data.
There are three main steps in using this package:
Load and QC Input Data: The package includes an example data set, stored as an object from the phyloseq package, that shows the format needed for analysis.
Running ZINB Normalization Function: The TaxaNorm_Normalization function is run on the input data from the previous step. This function implements the ZINB method for normalization.
Visualization and Quality Control: Finally, the package provides built-in functions for visualization and quality control of the normalized data.
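For readers unfamiliar with the ZINB distribution named in the steps above, the following is a minimal base-R sketch of a generic ZINB probability mass function. The function name dzinb and the parameters pi0, mu, and size are illustrative assumptions and not part of TaxaNorm; the model fitted by the package may be parameterized differently.

```r
# Generic zero-inflated negative binomial pmf (illustrative only, not TaxaNorm code).
# pi0  : probability of a structural (excess) zero
# mu   : mean of the negative binomial component
# size : dispersion parameter, as in stats::dnbinom()
dzinb <- function(x, pi0, mu, size) {
  nb <- stats::dnbinom(x, size = size, mu = mu)
  # Zeros arise either structurally (pi0) or from the negative binomial itself
  ifelse(x == 0, pi0 + (1 - pi0) * nb, (1 - pi0) * nb)
}

# Probability of a zero count without and with zero inflation
p_zero_nb   <- stats::dnbinom(0, size = 2, mu = 5)
p_zero_zinb <- dzinb(0, pi0 = 0.3, mu = 5, size = 2)

# The pmf should sum to (approximately) one over a long enough range of counts
total_mass <- sum(dzinb(0:10000, pi0 = 0.3, mu = 5, size = 2))
```

The zero-inflation term raises the probability of a zero above what the negative binomial alone would give, which is why this family is a natural fit for sparse microbiome count tables.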
TaxaNorm requires the packages phyloseq and microbiome, which are available from Bioconductor.
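As a sketch of how these two dependencies are typically installed, the snippet below uses the standard BiocManager workflow. It assumes a working R installation with access to CRAN and Bioconductor, and it does not install TaxaNorm itself.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the two Bioconductor dependencies named above
BiocManager::install(c("phyloseq", "microbiome"))
```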
For the newest, but potentially unstable, version of the package, direct installation from GitHub is also supported.
Basic Usage
library(TaxaNorm)
library(phyloseq)  # provides sample_data()

# load the built-in example data (a phyloseq object)
data("TaxaNorm_Example_Input", package = "TaxaNorm")

# run normalization
TaxaNorm_Example_Output <- TaxaNorm_Normalization(data = TaxaNorm_Example_Input,
                                                  depth = NULL,
                                                  group = sample_data(TaxaNorm_Example_Input)$body_site,
                                                  meta.data = NULL,
                                                  filter.cell.num = 10,
                                                  filter.taxa.count = 0,
                                                  random = FALSE,
                                                  ncores = 1)

# run diagnosis test
Diagnose_Data <- TaxaNorm_Run_Diagnose(Normalized_Results = TaxaNorm_Example_Output,
                                       prev = TRUE, equiv = TRUE,
                                       group = sample_data(TaxaNorm_Example_Input)$body_site)
The built-in example data, a phyloseq object, is loaded with data("TaxaNorm_Example_Input", package = "TaxaNorm"), as in the example above.
We have prepared several QC figures that summarize the characteristics of the input data and give preliminary criteria for pre-filtering rare, low-information taxa before any analysis. Pre-filtering improves the power and computational efficiency of the algorithm. Users who have already cleaned or pre-processed their data can skip this step.
Here we provide a popular option: keep only taxa that have a count of filter.taxa.count or more in at least filter.sample.num samples, where filter.sample.num can be set to any value or to the sample size of the smallest group of samples. By default, we use filter.taxa.count = 1 and filter.sample.num = 10. These criteria are also built into the main function, TaxaNorm_Normalization().
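library(phyloseq)    # provides prune_taxa()
library(microbiome)  # provides abundances()
filter.sample.num <- 10  # minimum number of samples
filter.taxa.count <- 1   # minimum count per sample
# keep taxa with a count of at least filter.taxa.count in at least filter.sample.num samples
taxaIn <- rowSums(abundances(TaxaNorm_Example_Input) >= filter.taxa.count) >= filter.sample.num
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input)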
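To make the filtering rule concrete, here is a small base-R sketch on a made-up count matrix. It follows the rule as stated in the text (a count of filter.taxa.count or more in at least filter.sample.num samples). The matrix, the taxon and sample names, and the threshold of 3 samples are invented for illustration and are not taken from the example data.

```r
# Toy count matrix: rows are taxa, columns are samples (made-up numbers)
counts <- matrix(
  c(0, 0, 0, 5,   # taxon_A: present in 1 sample
    3, 1, 0, 2,   # taxon_B: present in 3 samples
    0, 0, 0, 0,   # taxon_C: never observed
    7, 9, 4, 6),  # taxon_D: present in all 4 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon_", LETTERS[1:4]), paste0("sample_", 1:4))
)

filter.taxa.count <- 1  # minimum count for a taxon to count as present in a sample
filter.sample.num <- 3  # minimum number of samples in which it must be present

# Number of samples in which each taxon reaches the count threshold
n_present <- rowSums(counts >= filter.taxa.count)

# Keep taxa that are present in at least filter.sample.num samples
taxaIn <- n_present >= filter.sample.num
filtered <- counts[taxaIn, , drop = FALSE]
```

With these toy values only taxon_B and taxon_D pass the filter; taxon_A and taxon_C are dropped because they reach the count threshold in fewer than three samples.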
Users can also apply their own filtering criteria. Alternatively, a basic pre-filter keeps only the taxa (rows) that have at least 10 reads in total.
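A minimal sketch of the total-read filter described above, using the phyloseq functions taxa_sums() and prune_taxa(). It assumes the phyloseq and TaxaNorm packages are installed; the object name TaxaNorm_Example_Filtered is introduced here for illustration only.

```r
library(phyloseq)  # provides taxa_sums() and prune_taxa()

# Load the built-in example data (a phyloseq object)
data("TaxaNorm_Example_Input", package = "TaxaNorm")

# Flag taxa whose reads, summed over all samples, total at least 10
keep <- taxa_sums(TaxaNorm_Example_Input) >= 10

# Drop all other taxa; the result is stored under a new, illustrative name
TaxaNorm_Example_Filtered <- prune_taxa(keep, TaxaNorm_Example_Input)
```

Storing the result under a new name leaves the unfiltered input available for comparison.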
Running the normalization returns a TaxaNorm_Results object. This object contains the input data, the raw data, the normalized data (normdata), the ecdf, the model parameters, and the convergence information.