Title: | Finds Missing Links and Metric Confidence Intervals in Ecological Bipartite Networks |
---|---|
Description: | Provides methods to deal with undersampling in ecological bipartite networks from Terry and Lewis (2020) Ecology <doi:10.1002/ecy.3047>. Includes tools to fit a variety of statistical network models and sample coverage estimators to highlight the most likely missing links. Also includes simple functions to resample from observed networks to generate confidence intervals for common ecological network metrics. |
Authors: | Chris Terry [aut, cre, cph] |
Maintainer: | Chris Terry <[email protected]> |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2024-11-04 04:34:42 UTC |
Source: | https://github.com/jcdterry/cassandra |
Calls CoverageEstimator() to calculate host-level coverage deficit, then divides this by the number of unobserved interactions of that host.
CalcHostLevelCoverage(list)
list | Network List |
A network list, with 'C_defmatrix', a matrix of probabilities based on coverage deficit, and 'OverallChaoEst' an estimate of the overall coverage deficit of the network.
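The division step described for this function can be sketched in base R. This is an illustration only, not the package's implementation: `host_deficit_per_link()` is a hypothetical helper, the toy matrix is invented, and the host-level deficit is approximated by the simple value f1/n rather than by a call to CoverageEstimator().

```r
# Hypothetical helper (not part of the package): spread each host's coverage
# deficit evenly over that host's unobserved interactions.
host_deficit_per_link <- function(web) {
  apply(web, 1, function(counts) {
    n  <- sum(counts)                 # total observations for this host
    f1 <- sum(counts == 1)            # interactions seen exactly once
    n_unobserved <- sum(counts == 0)  # cells with no observation
    if (n == 0 || n_unobserved == 0) return(NA_real_)
    deficit <- f1 / n                 # ASSUMPTION: stand-in for CoverageEstimator()
    deficit / n_unobserved
  })
}

# Invented toy web: rows = hosts (focal layer), columns = wasps (response layer)
toy_web <- matrix(c(5, 1, 0, 0,
                    2, 2, 1, 0,
                    1, 0, 0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("host", 1:3), paste0("wasp", 1:4)))
per_link <- host_deficit_per_link(toy_web)
print(per_link)
```

Hosts with few observations and many empty cells end up with a larger per-link value than well-sampled hosts, which is the intuition behind the coverage-deficit matrix.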
Compute Basic Confidence Intervals
ComputeCI(df)
df | A data frame produced by |
a dataframe detailing confidence intervals at each tested sample size
data(Safariland, package = 'bipartite')
X <- RarefyNetwork(Safariland, n_per_level = 100)
ComputeCI(X)
An estimate of the sample coverage, which tries to use the most appropriate method
CoverageEstimator(x, cutoff = 5, BayesPrior = "Flat")
x | A vector of integers, the observed sample counts |
cutoff | When to switch from binomial model to Chao1 estimator |
BayesPrior | Prior to use. Either 'Flat' or 'Jeffereys'. |
Sample coverage is defined as the probability that the next interaction drawn is of a type already seen; the coverage deficit (C_def) is the complementary probability that it is of a type not yet seen.
If the sample size is at or below the cutoff (default 5), or if all the samples are singletons, coverage is calculated as the posterior mean of a binomial model using a flat prior (this can be changed to a Jeffreys prior via BayesPrior).
If there are singletons but no doubletons, the Turing-Good estimate is used: c_hat = 1 - (f1/n)
If there are both singletons and doubletons, the Chao1 index is used:
c_hat = 1 -( (f1/n) * ( (f1*(n-1))/((n-1)*(f1+(2*f2))) ) )
c_hat, the estimated coverage. (i.e. 1- C_def)
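The two frequency-based branches can be written out as a short base R sketch. `coverage_sketch()` is a hypothetical name and is not the package function: it transcribes the two formulas exactly as printed in this section and deliberately omits the small-sample binomial branch, whose details are not given here.

```r
# Hypothetical sketch of the two frequency-based branches described in the text.
# The small-sample binomial (Bayesian) branch is intentionally left out.
coverage_sketch <- function(x) {
  x  <- x[x > 0]
  n  <- sum(x)        # sample size
  f1 <- sum(x == 1)   # number of singletons
  f2 <- sum(x == 2)   # number of doubletons
  if (f2 == 0) {
    # Turing-Good estimate
    1 - (f1 / n)
  } else {
    # Chao1-based estimate, transcribed as printed in the documentation
    1 - ((f1 / n) * ((f1 * (n - 1)) / ((n - 1) * (f1 + (2 * f2)))))
  }
}

cov_no_doubletons   <- coverage_sketch(c(6, 3, 1, 1))     # singletons only
cov_with_doubletons <- coverage_sketch(c(5, 3, 2, 1, 1))  # singletons and doubletons
```

The coverage deficit for either case is then 1 minus the returned value.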
Converts a network in the base bipartite package format into a list format. N.B. Throughout, this package uses 'hosts' to refer to the focal layer and 'wasps' to refer to the response layer, although these could equally be 'plants' and 'pollinators'.
CreateListObject(web)
web | A web in the format specified by the bipartite package. Rows = focal layer, columns = response layer |
A network list for use with other functions in EcoLinkPredict package
data(Safariland, package = 'bipartite')
demolist <- CreateListObject(Safariland)
str(demolist)
Internal function called by PredictLinks()
Fits the coverage deficit, Trait, Centrality, Matching-Centrality and SBM models by sequentially calling the individual functions.
FitAllModels(list, RepeatModels = 10)
list | A network list |
RepeatModels | How many times to fit each model from different starting points. Uses best half (rounding up) |
A network list including the model fit
Fit a model that contains both a trait-matching and a centrality term based on Rohr et al. (2016)
FitBothMandC( list, N_runs = 10, maxit = 10000, method = "Nelder-Mead", ExtraSettings = NULL )
list | Network List |
N_runs | Number of different start points for k2 and lambda to try. The best (maximum likelihood) half will be used to construct the probability matrix |
maxit | Maximum number of iterations, passed to optim. Default = 10000 |
method | Passed to optim, default = 'Nelder-Mead' |
ExtraSettings | Other control settings to pass to optim() |
Network list with added 'B_par', the best-fitting parameters, and 'M_ProbsMatrix', the probability matrix
Rohr, R.P., Naisbit, R.E., Mazza, C. & Bersier, L.-F. (2016). Matching-centrality decomposition and the forecasting of new links in networks. Proc. R. Soc. B Biol. Sci., 283, 20152702
Repeatedly fits a centrality model to a binary interaction network to return a probability matrix
FitCentrality( list, N_runs = 10, maxit = 10000, method = "Nelder-Mead", ExtraSettings = NULL )
list | Network List |
N_runs | Number of start points to try. The best (maximum likelihood) half will be used to construct the probability matrix |
maxit | Maximum number of iterations, passed to optim. Default = 10000 |
method | Passed to optim, default = 'Nelder-Mead' |
ExtraSettings | Other control settings to pass to optim() |
Network list with added 'C_par', best fitting parameters, C_ProbsMatrix, the probability matrix
Repeatedly fits a latent trait model to a binary interaction network to return a probability matrix
FitMatching( list, N_runs = 10, maxit = 10000, method = "Nelder-Mead", ExtraSettings = NULL )
list | Network List |
N_runs | Number of start points for k2 and lambda to try. The best (maximum likelihood) half will be used to construct the probability matrix |
maxit | Maximum number of iterations, passed to optim. Default = 10000 |
method | Passed to optim, default = 'Nelder-Mead' |
ExtraSettings | Other control settings to pass to optim() |
The optimiser is started at values derived from the row sums and column sums of a CCA analysis, which correspond closely to latent traits by matching closely related species together.
The k2 and lambda parameters are started from points drawn from a uniform distribution on [0, 1].
Network list with added 'M_par', the best-fitting parameters, and 'M_ProbsMatrix', the probability matrix
Fit SBM Model
FitSBM(list, n_SBM = 10, G = NULL)
list | Network List |
n_SBM | Number of SBM models to fit. Default is 10. The top half (rounding up) are retained and averaged to produce a probability matrix. |
G | The number of groups to divide the top layer and the focal layer into. |
Network list with 'SBM_ProbsMat', a matrix of probabilities assigned to each possible interaction, 'SBM1', the best model fit derived from Optimise_SBM(), and 'SBM_G', the number of fitted groups.
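The "keep the top half (rounding up) and average" rule used by the fitting functions can be illustrated with a minimal base R sketch. `average_best_half()` and the `Probs` element are hypothetical names introduced for this example, and the three fits are invented; the package's internal bookkeeping may well differ.

```r
# Hypothetical helper: keep the best ceiling(n / 2) fits by log-likelihood
# and average their probability matrices element-wise.
average_best_half <- function(fits) {
  logliks <- vapply(fits, function(f) f$LogLik, numeric(1))
  n_keep  <- ceiling(length(fits) / 2)
  keep    <- order(logliks, decreasing = TRUE)[seq_len(n_keep)]
  Reduce(`+`, lapply(fits[keep], function(f) f$Probs)) / n_keep
}

# Invented example: three "fits" with made-up log-likelihoods
fits <- list(
  list(LogLik = -12, Probs = matrix(0.2, 2, 2)),
  list(LogLik = -10, Probs = matrix(0.4, 2, 2)),
  list(LogLik = -30, Probs = matrix(0.9, 2, 2))
)
avg_probs <- average_best_half(fits)
print(avg_probs)
```

With three fits, two are retained (rounding up), so the poorly fitting third model does not contribute to the averaged matrix.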
Core model adapted from: "Sampling bias is a challenge [...]: lessons from a quantitative niche model" by Jochen Fründ, Kevin S. McCann and Neal M. Williams
make_true_and_sample_web( seed = NULL, specpar = 1, n_hosts = 10, n_wasps = 10, TargetTrueConn = 0.5, SampleObs = 1000, abun_mean = 5, abun_sdlog = 1, traitvsnested = 0.5, hosttrait_n = "two" )
seed | Random number generator seed, if specified. |
specpar | Specialisation parameter, equal to 1/sd of the normal curve that defines the consumption range |
n_hosts | Number of focal level species (e.g. hosts, flowers) |
n_wasps | Number of non-focal level species (e.g. parasitic wasps, pollinators) |
TargetTrueConn | Proportion of possible interactions to keep |
SampleObs | Number of samples to draw |
abun_mean | Mean abundance level (log scale). |
abun_sdlog | Distribution of abundance level (SD log scale). |
traitvsnested | The relative balance between the nestedness generator and the trait-based generator |
hosttrait_n | Number of trait dimensions. Default 'two', uses two traits, with one dominant. 'single' and 'multi' retained from Fründ et al. |
Abundances are generated to match a log-normal distribution, but without introducing noise.
A network list containing 'obs', a matrix of observations, 'TrueWeb', a matrix of the 'true' drawn web, and a number of other properties of these networks.
make_true_and_sample_web()
Designed to be called by FitSBM()
Optimise_SBM(i = NULL, A, G, N_Rounds_max = 500, plot = FALSE)
i | Seed |
A | Binary Interaction Matrix |
G | Number of Groups |
N_Rounds_max | Maximum number of rounds to keep drawing |
plot | If set to TRUE, plots the progress of likelihood improvement, used to check if convergence is good. |
Based on optimising algorithm described in Larremore, D.B., Clauset, A. & Jacobs, A.Z. (2014). Efficiently inferring community structure in bipartite networks. Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys., 90, 1-12
Initially all species are randomly assigned to groups. Then, one at a time, each species is swapped into a different group and the likelihood of the model assessed (with SBMLik()).
The best model of all these swaps is then selected (even if it is worse) and used in the next round of swapping.
This fits the 'degree-corrected' biSBM model of Larremore et al., which is generally better when there are broad degree distributions.
The swapping is repeated until either N_Rounds_max is reached or, most commonly, the best model in the last 20 rounds is within 0.1 log-likelihood units of the best overall (implying that it has stopped improving).
A list containing 'LogLik' (the maximum likelihood found) 'SB_H', the group assignments of the host, 'SB_W', the group assignments of the other level, and 'Omega_rs', the interaction probabilities between groups.
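The swap-based search can be sketched in base R under some loud simplifications. `block_loglik()` and `swap_search()` are hypothetical names: the likelihood is a plain Bernoulli block model used as a stand-in for SBMLik() (it is not degree-corrected), the number of rounds is fixed, and the early-stopping rule is omitted. The sketch only shows the control flow of "try every single-species swap, accept the best one even if worse, remember the best state seen".

```r
# ASSUMPTION: simplified Bernoulli block-model log-likelihood, a stand-in for
# SBMLik(); it is NOT the degree-corrected biSBM likelihood.
block_loglik <- function(A, g_row, g_col, G) {
  ll <- 0
  for (r in seq_len(G)) {
    for (s in seq_len(G)) {
      block <- A[g_row == r, g_col == s, drop = FALSE]
      if (length(block) == 0) next
      p  <- min(max(mean(block), 1e-6), 1 - 1e-6)  # clamp to avoid log(0)
      ll <- ll + sum(block) * log(p) + sum(1 - block) * log(1 - p)
    }
  }
  ll
}

swap_search <- function(A, G = 2, n_rounds = 30, seed = 1) {
  stopifnot(G >= 2)
  set.seed(seed)
  g_row <- sample(G, nrow(A), replace = TRUE)  # random initial assignment
  g_col <- sample(G, ncol(A), replace = TRUE)
  start_ll <- block_loglik(A, g_row, g_col, G)
  best <- list(LogLik = start_ll, SB_H = g_row, SB_W = g_col)
  for (round in seq_len(n_rounds)) {
    round_ll <- -Inf
    for (i in seq_len(nrow(A) + ncol(A))) {
      is_row  <- i <= nrow(A)
      idx     <- if (is_row) i else i - nrow(A)
      current <- if (is_row) g_row[idx] else g_col[idx]
      for (new_g in setdiff(seq_len(G), current)) {
        r2 <- g_row
        c2 <- g_col
        if (is_row) { r2[idx] <- new_g } else { c2[idx] <- new_g }
        ll <- block_loglik(A, r2, c2, G)
        if (ll > round_ll) {
          round_ll <- ll
          round_r  <- r2
          round_c  <- c2
        }
      }
    }
    # Accept the best single swap of this round, even if it is worse than before
    g_row <- round_r
    g_col <- round_c
    if (round_ll > best$LogLik) {
      best <- list(LogLik = round_ll, SB_H = g_row, SB_W = g_col)
    }
  }
  best$StartLogLik <- start_ll
  best
}

# Invented block-structured binary web
A <- rbind(cbind(matrix(1, 3, 3), matrix(0, 3, 3)),
           cbind(matrix(0, 3, 3), matrix(1, 3, 3)))
fit <- swap_search(A, G = 2)
print(fit$LogLik)
```

Accepting the best swap even when it lowers the likelihood lets the search leave local optima, which is why the best state seen so far is tracked separately from the current state.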
Optimiser wrapper for network models
Optimiser( i = NULL, maxit = 10000, method = "Nelder-Mead", A, N_p, fixedSt_P = c(), N_unif_P = 0, func, ExtraSettings = NULL )
i | RNG Seed to set |
maxit | Maximum number of iterations to be passed to optim (default is 10000) |
method | Optimiser method to pass to optim. Default is |
A | Interaction Presence-Absence matrix |
N_p | Number of parameters to draw from a normal distribution |
fixedSt_P | Vector of fixed parameters to pass |
N_unif_P | Number of parameters to take from a uniform distribution |
func | Function to optimise |
ExtraSettings | Additional settings to pass to control |
A 'fit' object from optim, with a few of the input parameters attached.
Takes the output from other functions (including PredictLinks()) to visualise the fit to the data and predictions of missing links.
PlotFit( list, Matrix_to_plot, OrderBy = "Default", addDots = TRUE, title = NULL, Combine = "+", RemoveTP = FALSE, GuidesOff = TRUE )
list | A list-format network (output from xxx) |
Matrix_to_plot | Which matrix / matrices to plot. One or more of 'C_def','C', 'M', 'B', 'SBM' |
OrderBy | How to order the plot. One of 'Default','Degree','Manual', 'LatentTrait','SBM', 'AsPerMatrix' |
addDots | Should dots be added to show observations. TRUE, FALSE or 'Size', to plot by interaction strength |
title | A title. By default it will use the value of Matrix_to_plot |
Combine | How should multiple matrices be combined. Either '+', which averages them (default), or '*', which multiplies them |
RemoveTP | Should true positives be set to NA in order to highlight differences in predictions. Default is FALSE |
GuidesOff | Should the legends be switched off. Defaults to TRUE |
See the vignette for a more thorough description and examples.
A ggplot object, which by default will print to the device, but can be added to in order to make further tweaks.
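The two options for the Combine argument amount to an element-wise mean or an element-wise product of the selected probability matrices. A minimal base R sketch, using a hypothetical helper `combine_matrices()` and invented matrices rather than PlotFit()'s internals:

```r
# Hypothetical helper mirroring the documented meaning of Combine:
# "+" averages the matrices, "*" multiplies them element-wise.
combine_matrices <- function(mats, Combine = "+") {
  if (Combine == "+") {
    Reduce(`+`, mats) / length(mats)
  } else if (Combine == "*") {
    Reduce(`*`, mats)
  } else {
    stop("Combine must be '+' or '*'")
  }
}

# Invented probability matrices from two models
m1 <- matrix(c(0.2, 0.4, 0.6, 0.8), 2, 2)
m2 <- matrix(c(0.4, 0.4, 0.2, 1.0), 2, 2)
averaged   <- combine_matrices(list(m1, m2), "+")
multiplied <- combine_matrices(list(m1, m2), "*")
```

Averaging keeps a link prominent if any one model rates it highly, whereas multiplying only keeps links that every model supports.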
## Not run:
data(Safariland, package = 'bipartite')
Predictions <- PredictLinks(Safariland)
PlotFit(Predictions, Matrix_to_plot = 'SBM')
## End(Not run)
Used to plot the output from RarefyNetwork(). See the vignette.
PlotRarefaction(df)
df | A data frame produced by RarefyNetwork |
A ggplot
data(Safariland, package = 'bipartite')
X <- RarefyNetwork(Safariland, n_per_level = 100)
PlotRarefaction(X)
First calls CreateListObject to convert a matrix suitable for the bipartite package into a list structure.
PredictLinks(web, RepeatModels = 10)
web | A web in the format specified by the bipartite package. Rows = focal layer, columns = response layer |
RepeatModels | How many times to fit each model from different starting points. Uses best half (rounding up) |
Then it calls FitAllModels to fit each of the missing link models in turn.
A network list including a large number of outputs.
## Not run:
data(Safariland, package = 'bipartite')
PredictLinks(Safariland)
## End(Not run)
Resamples empirical network observations at a range of sampling levels and calls the networklevel() function from the bipartite package to calculate network metrics.
RarefyNetwork( web, n_per_level = 1000, frac_sample_levels = seq(0.2, 1, l = 5), abs_sample_levels = NULL, metrics = "info", PARALLEL = FALSE, cores = 2, output = "df", ... )
web | A matrix format web, as for |
n_per_level | How many samples to take per sample level. Default is 1000. |
frac_sample_levels | Sequence of fractions of original sample size to resample at. |
abs_sample_levels | If supplied, vector of absolute sample sizes to use to override |
metrics | Vector of metrics to calculate. Will be passed to |
PARALLEL | Logical. If TRUE, will use parallel package to speed up metric calculation. Default = FALSE |
cores | If using parallel, how many cores to use. Default = 2 |
output | String specifying output. If 'plot' will return a ggplot facetted by metric using |
... | Additional arguments to pass to |
Can return either a data frame of raw metrics, a ggplot or a data frame of 'confidence intervals'.
These CIs are calculated from the set of resamples by ordering the metric values and taking the values ranked at the 5th and 95th percentiles. This method is very similar to that employed by Casas et al. (2018), Assessing sampling sufficiency of network metrics using bootstrap, Ecological Complexity 36:268-275.
Note that confidence intervals for many metrics, particularly qualitative ones, will be biased by the issue of false-negatives. Resampling of observations will not introduce missing links.
By default, the sizes of the resamples are proportional to the original sample size, which is defined as the sum of the supplied web. If a specific set of sample sizes is wanted, use abs_sample_levels.
It is also possible to extrapolate how an increased sample size may lead to increased confidence in a metric: set the frac_sample_levels sequence to go beyond 1.
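The resample-then-take-percentiles procedure can be sketched in base R without the bipartite package. Everything here is illustrative: the toy web is invented, `resample_metric()` is a hypothetical helper, connectance (the fraction of non-zero cells) stands in for the metrics returned by networklevel(), and resampling is assumed to be with replacement (a multinomial draw). quantile() with its default interpolation is used as a convenient approximation to picking ranked values.

```r
set.seed(42)

# Invented toy web of interaction counts
toy_web <- matrix(c(10, 3, 0,
                     4, 0, 1,
                     0, 2, 6), nrow = 3, byrow = TRUE)

# ASSUMPTION: connectance as a stand-in metric (fraction of non-zero cells)
connectance <- function(web) mean(web > 0)

# Hypothetical helper: resample observations at a fraction of the original
# sample size and recompute the metric for each resample.
resample_metric <- function(web, frac, n_rep = 200) {
  size <- round(sum(web) * frac)     # original sample size = sum of the web
  prob <- as.vector(web) / sum(web)  # each cell drawn in proportion to its count
  replicate(n_rep, {
    draw <- rmultinom(1, size = size, prob = prob)
    connectance(matrix(draw, nrow = nrow(web)))
  })
}

vals <- resample_metric(toy_web, frac = 0.5)
ci   <- quantile(vals, probs = c(0.05, 0.95))
print(ci)
```

Because cells with zero observations have zero probability of being drawn, no resample can exceed the observed connectance, which illustrates why resampling cannot correct for false negatives.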
Either a dataframe or a ggplot object. See details.
data(Safariland, package = 'bipartite')
RarefyNetwork(Safariland, n_per_level = 100)
Adds a dataframe that defines each interaction as true positive, false negative or true negative
SortResponseCategory(list)
list | Network list |
A network list object with 'ObsSuccess', a dataframe detailing all the interactions and whether they are true positives, false negatives or true negatives
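The three categories can be illustrated with a small base R sketch that compares an observed matrix with a known 'true' web, as produced for simulated networks. `classify_interactions()` and its output column names are hypothetical, the matrices are invented, and the sketch assumes every observed interaction is genuinely present in the true web.

```r
# Hypothetical helper: label each possible interaction by comparing the
# observed web ('obs') with the true web ('TrueWeb').
classify_interactions <- function(obs, TrueWeb) {
  category <- ifelse(obs > 0, "TruePositive",
                     ifelse(TrueWeb > 0, "FalseNegative", "TrueNegative"))
  data.frame(
    host     = as.vector(row(obs)),   # row index of each cell
    wasp     = as.vector(col(obs)),   # column index of each cell
    category = as.vector(category),
    stringsAsFactors = FALSE
  )
}

# Invented 2 x 2 example
obs     <- matrix(c(3, 0, 0, 1), 2, 2)
TrueWeb <- matrix(c(1, 1, 0, 1), 2, 2)
cats <- classify_interactions(obs, TrueWeb)
print(cats)
```

The false negatives are the missing links that the fitted models are trying to rank above the true negatives.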
The function assumes FitAllModels() has already been run. It is a wrapper for 'SortResponseCategory()' and 'TestAUC()'
TestAllModels(list)
list | A network list |
the network list with added AUC data. Key values are 'AUC', a dataframe with the AUC of each model and many combinations.
Test via AUC the predictive capacity of each model or combination of models
TestAUC(list)
list | Network List |
a list with 'DataforAUC', a data frame with each interaction as a row and the predictions of each model, and 'AUC', a data frame with the predictive capacity of all the models and many combinations
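For readers unfamiliar with AUC, a rank-based calculation can be written in a few lines of base R. `rank_auc()` is a hypothetical helper and not the routine used by TestAUC(); the scores and labels are invented, and the sketch assumes each row is a candidate interaction with a model score and a known true status.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
# score   = model-assigned probability for each candidate interaction
# is_link = TRUE if the interaction truly exists
rank_auc <- function(score, is_link) {
  n_pos <- sum(is_link)
  n_neg <- sum(!is_link)
  r <- rank(score)  # average ranks, so ties count as half
  (sum(r[is_link]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Invented example: one true link is ranked below a non-link
score   <- c(0.9, 0.2, 0.8, 0.1)
is_link <- c(TRUE, TRUE, FALSE, FALSE)
auc <- rank_auc(score, is_link)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1 to every true link being scored above every non-link.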