Title: | Microbe-Metabolite Interactions-Based Metabolic Profiles Predictor |
---|---|
Description: | Implements a computational framework to predict microbial community-based metabolic profiles with 'O2PLS' model. It provides procedures of model training and prediction. Paired microbiome and metabolome data are needed for modeling, and the trained model can be applied to predict metabolites of analogous environments using new microbial feature abundances. |
Authors: | Wenli Tang [aut, cre] , Guangchuang Yu [aut, ths] |
Maintainer: | Wenli Tang <[email protected]> |
License: | GPL (>= 3.0) |
Version: | 0.1.1 |
Built: | 2024-11-06 02:56:47 UTC |
Source: | https://github.com/yulab-smu/mminp |
Calculation of the orthogonal variable influence on projection
calOrthVIP(SSDAO, SSD, loading)
SSDAO | a value of sum of squares (SSDao in step2) for each deflated matrix |
SSD | the sum of square values |
loading | the normalized loading matrices |
Calculation of the predictive variable influence on projection
calPredVIP(SSXAP, SSYAP, SSD, loading)
SSXAP | the sum of squares values of deflated X matrix for the predictive VIPO2PLS |
SSYAP | the sum of squares values of deflated Y matrix for the predictive VIPO2PLS |
SSD | the sum of square values |
loading | the normalized loading matrices |
This function throws an error if x is not a numeric matrix or a data frame with all numeric-alike variables, or if any element of x is NA.
checkInputdata(x)
x | A matrix or data frame. |
No return value
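The checks described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package source: `check_input_sketch` is a hypothetical name, and the sketch assumes that "numeric-alike" simply means every column is numeric.

```r
# Hypothetical helper mirroring the documented checks (not the package code).
check_input_sketch <- function(x) {
  if (is.data.frame(x)) {
    # Assumption: every column must already be numeric.
    if (!all(vapply(x, is.numeric, logical(1)))) {
      stop("x must contain only numeric variables")
    }
    x <- as.matrix(x)
  }
  if (!is.matrix(x) || !is.numeric(x)) {
    stop("x must be a numeric matrix or data frame")
  }
  if (anyNA(x)) {
    stop("x must not contain NA")
  }
  invisible(NULL)  # no return value, as documented
}

ok <- matrix(c(0.2, 0.8, 0.5, 0.5), nrow = 2)
check_input_sketch(ok)  # passes silently

bad <- ok
bad[1, 1] <- NA
# Capture the error message instead of stopping the script.
res <- tryCatch(check_input_sketch(bad), error = function(e) conditionMessage(e))
```

Here `res` holds the error message raised for the matrix containing an NA.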
Compare features' abundance obtained by prediction and measurement.
compareFeatures( predicted, measured, method = "spearman", adjmethod = "fdr", rsignif = 0.3, psignif = 0.05 )
predicted | A matrix or data frame. The feature table obtained by prediction. |
measured | A matrix or data frame. The feature table obtained by measurement. The abundances are expected to be normalized (i.e. proportion) or be preprocessed by |
method | A character string indicating which correlation coefficient is to be used for the |
adjmethod | A character string indicating correction method ( |
rsignif | A numeric ranging from 0 to 1, the minimum correlation coefficient for a feature to be considered well-predicted. |
psignif | A numeric ranging from 0 to 1, the maximum adjusted p value for a feature to be considered well-predicted. |
A list containing a table of correlation results and a vector of well-predicted features.
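A minimal base-R sketch of this kind of comparison follows. It is an illustration under stated assumptions rather than the package implementation: `compare_sketch` and the `res` element name are hypothetical, `cor.test` and `p.adjust` are assumed to be suitable stand-ins for the correlation and correction steps, and the thresholds are assumed to be inclusive.

```r
# Hypothetical sketch: per-feature correlation between predicted and measured
# abundances, followed by multiple-testing correction and thresholding.
compare_sketch <- function(predicted, measured, method = "spearman",
                           adjmethod = "fdr", rsignif = 0.3, psignif = 0.05) {
  common <- intersect(colnames(predicted), colnames(measured))
  res <- do.call(rbind, lapply(common, function(f) {
    ct <- cor.test(predicted[, f], measured[, f], method = method)
    data.frame(feature = f, r = unname(ct$estimate), p = ct$p.value,
               stringsAsFactors = FALSE)
  }))
  res$padj <- p.adjust(res$p, method = adjmethod)
  # Assumption: both thresholds are inclusive.
  well <- res$feature[res$r >= rsignif & res$padj <= psignif]
  list(res = res, wellPredicted = well)
}

# Toy data: m1 and m2 are monotonically related, m3 is reversed.
measured  <- cbind(m1 = 1:20, m2 = (1:20)^2, m3 = 1:20)
predicted <- cbind(m1 = 1:20 + 0.5, m2 = log(1:20), m3 = 20:1)
out <- compare_sketch(predicted, measured)
out$wellPredicted
```

In this toy case only `m1` and `m2` pass both thresholds; `m3` is strongly but negatively correlated, so it fails the `rsignif` cut-off.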
Filter features of input table according to prevalence and/or abundance.
filterFeatures(x, prev = NA, abund = NA)
x | A matrix or data frame. |
prev | A numeric ranging from 0 to 1, the minimum prevalence of features to be retained. If set to NA, no prevalence filtering is applied. |
abund | A numeric greater than 0, the minimum abundance (mean) of features to be retained. If set to NA, no abundance filtering is applied. |
A filtered feature table will be returned.
data(train_metag)
d <- filterFeatures(train_metag, prev = 0.8)
dim(train_metag)
dim(d)
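To make the two criteria concrete, here is a small base-R sketch on toy data. It is not the package code: `filter_sketch` is a hypothetical name, and it assumes that prevalence means the fraction of samples (rows) in which a feature is greater than zero and that abundance means the column mean.

```r
# Hypothetical sketch of prevalence/abundance filtering (samples in rows).
filter_sketch <- function(x, prev = NA, abund = NA) {
  x <- as.matrix(x)
  keep <- rep(TRUE, ncol(x))
  if (!is.na(prev)) {
    # Assumption: prevalence = share of samples with a non-zero value.
    keep <- keep & (colMeans(x > 0) >= prev)
  }
  if (!is.na(abund)) {
    # Assumption: abundance = mean value across samples.
    keep <- keep & (colMeans(x) >= abund)
  }
  x[, keep, drop = FALSE]
}

toy <- cbind(K1 = c(0.5, 0.4, 0.6, 0.5),
             K2 = c(0.5, 0,   0,   0),
             K3 = c(0,   0.6, 0.4, 0.5))
by_prev  <- filter_sketch(toy, prev = 0.7)   # keeps K1 and K3
by_abund <- filter_sketch(toy, abund = 0.4)  # keeps K1 only
```

With both arguments left at NA the table is returned unchanged, matching the documented meaning of NA.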
Get the number of components using the cross-validation procedure of O2-PLS
get_Components( metag, metab, compmethod = NULL, n = 1:10, nx = 0:5, ny = 0:5, seed = 1234, nr_folds = 3, nr_cores = 1 )
metag | Training data of sequence features' relative abundances. Must have the exact same rows (subjects/samples) as |
metab | Training data of metabolite relative abundances. Must have the exact same rows (subjects/samples) as |
compmethod | A character string indicating which cross-validation procedure of O2PLS is used to estimate the components; must be one of NULL, "cvo2m" or "cvo2m.adj". If set to NULL, the procedure is chosen according to the number of features. |
n | Integer. Number of joint PLS components. Must be positive. More details in |
nx | Integer. Number of orthogonal components in |
ny | Integer. Number of orthogonal components in |
seed | A random seed to make the analysis reproducible, default is 1234. |
nr_folds | Positive integer. Number of folds to consider. Note: |
nr_cores | Positive integer. Number of cores to use for CV. You might want to use |
A data frame of component numbers
Get the number of components from the cross-validation procedure of O2PLS
get_cvo2mComponent(x)
x | List of class "cvo2m", produced by |
A data frame of component numbers
This model was built using MMINP.train with preprocessed values from the package datasets.
A list containing an 'o2m' model, results of correlation analysis between metabolites of training data and its predicted values, components number, re-estimate information and iteration number of modeling.
data(MMINP_trained_model)
This function predicts potential metabolites in a new microbial community using a trained MMINP model. If genes in the model do not appear in newdata, this procedure fills them with 0. Note that this function does not center or scale the new microbiome matrix, so newdata should be preprocessed in advance.
MMINP.predict(model, newdata, minGeneSize = 0.5)
model | List of class |
newdata | New matrix of microbial genes, each column represents a gene. |
minGeneSize | A numeric between 0 and 1, the minimal size of genes in the model that must be contained in newdata. |
The model must be of class 'mminp' or 'o2m'. The columns of newdata must be microbial genes.
Predicted Data
data(MMINP_trained_model)
data(test_metag)
test_metag_preprocessed <- MMINP.preprocess(test_metag, normalized = FALSE)
pred_metab <- MMINP.predict(model = MMINP_trained_model$model, newdata = test_metag_preprocessed)
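The zero-filling of genes that are missing from newdata can be pictured with the base-R sketch below. It only illustrates the idea and is not how the package does it internally: `align_newdata_sketch` is a hypothetical helper, the gene IDs are toy values, and `minGeneSize` is assumed to be the fraction of model genes that must be present in newdata.

```r
# Hypothetical sketch: align newdata to the genes a model expects,
# filling genes that are absent from newdata with 0.
align_newdata_sketch <- function(model_genes, newdata, minGeneSize = 0.5) {
  newdata <- as.matrix(newdata)
  present <- model_genes %in% colnames(newdata)
  # Assumption: minGeneSize is the minimal fraction of model genes in newdata.
  if (mean(present) < minGeneSize) {
    stop("too few model genes are present in newdata")
  }
  aligned <- matrix(0, nrow = nrow(newdata), ncol = length(model_genes),
                    dimnames = list(rownames(newdata), model_genes))
  aligned[, model_genes[present]] <- newdata[, model_genes[present], drop = FALSE]
  aligned  # genes unknown to the model are dropped
}

model_genes <- c("K00001", "K00002", "K00003")
newdata <- cbind(K00003 = c(0.3, -0.1), K00001 = c(1.2, 0.4), K99999 = c(0, 0))
rownames(newdata) <- c("s1", "s2")
aligned <- align_newdata_sketch(model_genes, newdata)
```

The aligned matrix has one column per model gene, in model order, with `K00002` filled with zeros.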
Before doing MMINP analysis, abundances of both microbial features and metabolites should be preprocessed. Both measurements are expected to be transformed to relative abundance (i.e. proportion) and be log-transformed. To meet the need of O2-PLS method, data must be scaled.
MMINP.preprocess( data, normalized = TRUE, prev = NA, abund = NA, transformed = "none", scaled = TRUE )
data | A numeric matrix or data frame containing measurements of metabolites or microbial features. |
normalized | Logical, whether to transform measurements into relative abundance or not. |
prev | A numeric ranging from 0 to 1, the minimum prevalence of features to be retained. If set to NA, no prevalence filtering is applied. |
abund | A numeric greater than 0, the minimum abundance (mean) of features to be retained. If set to NA, no abundance filtering is applied. |
transformed | Character, the transformation method: "boxcox", "log", or "none". |
scaled | Logical, whether to scale the columns of data or not. |
The rows of data must be samples and columns of data must be metabolites or microbial features.
The filtering process (prev and abund) is applied before the log/boxcox transformation and scaling.
A preprocessed numeric matrix for analysis of MMINP.
data(train_metag)
d <- MMINP.preprocess(train_metag)
d <- MMINP.preprocess(train_metag, prev = 0.3, abund = 0.001)
d[1:5, 1:5]
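The order of the preprocessing steps can be illustrated with a base-R sketch. It is a simplified stand-in, not the package implementation: `preprocess_sketch` is a hypothetical name, the pseudocount `pseudo` used before the log is an assumption, and the filtering and "boxcox" options are left out.

```r
# Hypothetical sketch: relative abundance -> optional log -> column scaling.
preprocess_sketch <- function(data, normalized = TRUE, transformed = "none",
                              scaled = TRUE, pseudo = 1e-6) {
  data <- as.matrix(data)
  if (normalized) {
    data <- data / rowSums(data)  # samples are rows, so each row sums to 1
  }
  if (transformed == "log") {
    data <- log(data + pseudo)    # assumption: small pseudocount avoids log(0)
  }
  if (scaled) {
    data <- scale(data)           # center and scale each feature (column)
  }
  data
}

counts <- rbind(s1 = c(10, 30, 60), s2 = c(20, 20, 60),
                s3 = c(50, 25, 25), s4 = c(5, 15, 80))
colnames(counts) <- c("K1", "K2", "K3")

rel    <- preprocess_sketch(counts, scaled = FALSE)       # proportions only
scaled <- preprocess_sketch(counts, transformed = "log")  # log, then scale
```

After scaling, every column has mean 0 and standard deviation 1, which is the form the O2-PLS step expects according to the description above.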
This function contains three steps.
Step 1: Build an O2-PLS model and use it to predict the metabolite profile;
Step 2: Compare predicted and measured metabolite abundances, then filter out the metabolites that are predicted poorly (i.e. metabolites whose correlation coefficient is less than rsignif or whose adjusted p value is greater than psignif);
Step 3 (iteration): Re-build the O2-PLS model until all retained metabolites are well-fitted.
MMINP.train( metag, metab, n = 1:3, nx = 0:3, ny = 0:3, seed = 1234, compmethod = NULL, nr_folds = 3, nr_cores = 1, rsignif = 0.4, psignif = 0.05, recomponent = FALSE )
metag | Training data of sequence features' relative abundances. Must have the exact same rows (subjects/samples) as |
metab | Training data of metabolite relative abundances. Must have the exact same rows (subjects/samples) as |
n | Integer. Number of joint PLS components. Must be positive. More details in |
nx | Integer. Number of orthogonal components in |
ny | Integer. Number of orthogonal components in |
seed | A random seed to make the analysis reproducible, default is 1234. |
compmethod | A character string indicating which cross-validation procedure of O2PLS is used to estimate the components; must be one of NULL, "cvo2m" or "cvo2m.adj". If set to NULL, the procedure is chosen according to the number of features. |
nr_folds | Positive integer. Number of folds to consider. Note: |
nr_cores | Positive integer. Number of cores to use for CV. You might want to use |
rsignif | A numeric ranging from 0 to 1, the minimum correlation coefficient for a feature to be considered well-predicted. |
psignif | A numeric ranging from 0 to 1, the maximum adjusted p value for a feature to be considered well-predicted. |
recomponent | Logical, whether to re-estimate the components during each iteration. |
A list containing
model | O2PLS model |
trainres | Final correlation results between predicted and measured metabolites of training samples |
WFM | Well-fitted metabolites |
components | Components number. If |
re_estimate | Re-estimate information, i.e. whether re-estimate components or not during each iteration |
trainnumb | Iteration number |
data(test_metab)
data(test_metag)
a <- MMINP.preprocess(test_metag[, 1:20], normalized = FALSE)
b <- MMINP.preprocess(test_metab[, 1:20], normalized = FALSE)
mminp_model <- MMINP.train(metag = a, metab = b, n = 3:5, nx = 0:3, ny = 0:3, nr_folds = 2, nr_cores = 1)
length(mminp_model$trainres$wellPredicted)
O2PLS-VIP, an approach for variable influence on projection (VIP) in O2PLS models, is a model-based method for judging the importance of variables. For both X and Y data blocks, it generates VIP profiles for (i) the predictive part of the model, (ii) the orthogonal part, and (iii) the total model.
O2PLSvip(x, y, model)
x | Training data of sequence features' relative abundances. Must have the exact same rows (subjects/samples) as |
y | Training data of metabolite relative abundances. Must have the exact same rows (subjects/samples) as |
model | List of class |
It generates 6 VIPO2PLS profiles in total:
Two VIP profiles for the predictive components, which uncover the X- and Y-variables that are more important for the model interpretation in relation to the variation correlated with the Y- and X-data matrices, respectively;
Two VIP profiles for the orthogonal components, one for the X-block and one for the Y-block, which uncover the X- and Y-variables that are more relevant in relation to the variation uncorrelated with the Y- and X-data matrices, respectively;
Two VIP profiles for the total model (i.e. including the contributions of both predictive and orthogonal components), one for the X-block and one for the Y-block, which point to the X- and Y-variables that are more significant for the whole model.
A list containing
xvip | For the X-block, the VIP profiles for the predictive part of the model, the orthogonal part, the total model. |
yvip | For the Y-block, the VIP profiles for the predictive part of the model, the orthogonal part, the total model. |
Galindo-Prieto B, Trygg J, Geladi P. A new approach for variable influence on projection (VIP) in O2PLS models. Chemometrics and Intelligent Laboratory Systems 2017; 160: 110–124.
data(test_metab)
data(test_metag)
a <- MMINP.preprocess(test_metag[, 1:20], normalized = FALSE)
b <- MMINP.preprocess(test_metab[, 1:20], normalized = FALSE)
mminp_model <- MMINP.train(metag = a, metab = b, n = 3:5, nx = 0:3, ny = 0:3, nr_folds = 2, nr_cores = 1)
length(mminp_model$trainres$wellPredicted)
vipres <- O2PLSvip(a, b, mminp_model)
head(vipres$xvip)
head(vipres$yvip)
This function is the print method for MMINP.train.
## S3 method for class 'mminp' print(x, ...)
x | A model (an object of class "mminp") |
... | additional parameters |
Brief information about the object.
Estimation of the sum of squares of deviations
ssd(x)
x | matrix |
the sum of squares of deviations
This dataset was built from the NLIBD dataset (Franzosa et al., 2019) by converting the original HMDB IDs into KEGG compound IDs and removing unassigned and repeated features.
A data frame of metabolite relative abundances (i.e. proportion), with 65 subjects in rows and 130 KEGG compound IDs in columns.
Franzosa EA et al. (2019). Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nature Microbiology 4(2):293-305.
data(test_metab)
This dataset was built from the NLIBD dataset (Franzosa et al., 2019) by converting the original UniRef90 IDs into KEGG Orthology (KO) IDs and removing unassigned and repeated features.
A data frame of gene family relative abundances (i.e. proportion), with 65 subjects in rows and 629 KEGG Orthology (KO) IDs in columns.
Franzosa EA et al. (2019). Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nature Microbiology 4(2):293-305.
data(test_metag)
This dataset was built from the PRISM dataset (Franzosa et al., 2019) by converting the original HMDB IDs into KEGG compound IDs and removing unassigned and repeated features.
A data frame of metabolite relative abundances (i.e. proportion), with 155 subjects in rows and 135 KEGG compound IDs in columns.
Franzosa EA et al. (2019). Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nature Microbiology 4(2):293-305.
data(train_metab)
This dataset was built from the PRISM dataset (Franzosa et al., 2019) by converting the original UniRef90 IDs into KEGG Orthology (KO) IDs and removing unassigned and repeated features.
A data frame of gene family relative abundances (i.e. proportion), with 155 subjects in rows and 733 KEGG Orthology (KO) IDs in columns.
Franzosa EA et al. (2019). Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nature Microbiology 4(2):293-305.
data(train_metag)