Package 'cassandRa' reference manual

Title:	Finds Missing Links and Metric Confidence Intervals in Ecological Bipartite Networks
Description:	Provides methods to deal with undersampling in ecological bipartite networks, from Terry and Lewis (2020) Ecology <doi:10.1002/ecy.3047>. Includes tools to fit a variety of statistical network models and sample coverage estimators to highlight the most likely missing links. Also includes simple functions to resample from observed networks to generate confidence intervals for common ecological network metrics.
Authors:	Chris Terry [aut, cre, cph]
Maintainer:	Chris Terry <[email protected]>
License:	GPL-3
Version:	0.2.0
Built:	2025-03-04 04:18:26 UTC
Source:	https://github.com/jcdterry/cassandra

Estimated probabilities of missing links based on the host Level Coverage Deficit

Description

Calls CoverageEstimator() to calculate host-level coverage deficit, then divides this by the number of unobserved interactions of that host.

Usage

CalcHostLevelCoverage(list)

Arguments

`list`	Network List

Value

A network list, with 'C_defmatrix', a matrix of probabilities based on coverage deficit, and 'OverallChaoEst' an estimate of the overall coverage deficit of the network.
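
As a rough illustration of the division described above, the following base R sketch spreads each host's coverage deficit evenly over that host's unobserved interactions. The matrix `obs` and the deficits in `c_def` are made-up values, and setting observed cells to zero is an assumption of this sketch rather than documented behaviour of CalcHostLevelCoverage().

```r
# Toy observation matrix: rows = hosts, columns = wasps (made-up counts)
obs <- matrix(c(3, 0, 1, 0,
                1, 1, 0, 0),
              nrow = 2, byrow = TRUE)

# Assumed host-level coverage deficits (in practice 1 - estimated coverage)
c_def <- c(0.2, 0.5)

# Number of unobserved interactions for each host
n_unobs <- rowSums(obs == 0)

# Deficit shared equally among each host's unobserved links
per_link <- c_def / n_unobs

# Probability matrix; observed cells are set to 0 (an assumption of this sketch)
C_def_sketch <- (obs == 0) * per_link
```

Each row of `C_def_sketch` sums to that host's coverage deficit, so the deficit is fully allocated to the host's unobserved links.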

Compute Basic Confidence Intervals

Description

Compute Basic Confidence Intervals

Usage

ComputeCI(df)

Arguments

`df`	A data frame produced by `RarefyNetwork()`

Value

a dataframe detailing confidence intervals at each tested sample size

Examples

data(Safariland, package = 'bipartite')
X <- RarefyNetwork(Safariland, n_per_level = 100)
ComputeCI(X)

Coverage Estimator, using Chao1 Index, Turing-Good or Binomial depending on what is possible

Description

An estimate of the sample coverage, which tries to use the most appropriate method

Usage

CoverageEstimator(x, cutoff = 5, BayesPrior = "Flat")

Arguments

`x`	A vector of integers, the observed sample counts
`cutoff`	When to switch from binomial model to Chao1 estimator
`BayesPrior`	Prior to use. Either 'Flat' or 'Jeffereys'.

Details

Sample coverage is defined as the probability that the next interaction drawn is of a type already seen. The coverage deficit (1 - coverage) is the probability that it is of a type not yet seen.

If the sample size is at or below the cutoff (default 5), or if all the samples are singletons, coverage is calculated as the posterior mean of a binomial model using a flat prior (this can be changed to a Jeffreys prior via BayesPrior).

If there are singletons but no doubletons, the Turing-Good estimate is used: c_hat = 1 - (f1/n)

If there are both singletons and doubletons, the Chao1 index is used:

c_hat = 1 -( (f1/n) * ( (f1*(n-1))/((n-1)*(f1+(2*f2))) ) )
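
The two count-based branches above can be sketched in base R as follows. `coverage_sketch()` is a hypothetical helper written for this illustration, not the package's CoverageEstimator(): it transcribes the two formulas as printed and omits the binomial branch used for small or all-singleton samples.

```r
# Hypothetical helper: Turing-Good and Chao1-style coverage as printed above
coverage_sketch <- function(x) {
  x  <- x[x > 0]        # keep only interaction types that were observed
  n  <- sum(x)          # total number of observations
  f1 <- sum(x == 1)     # singletons
  f2 <- sum(x == 2)     # doubletons
  if (f2 == 0) {
    # No doubletons: Turing-Good estimate
    return(1 - f1 / n)
  }
  # Singletons and doubletons present: formula as given in the Details
  1 - (f1 / n) * ((f1 * (n - 1)) / ((n - 1) * (f1 + 2 * f2)))
}

coverage_sketch(c(1, 3, 4))     # no doubletons: 1 - 1/8 = 0.875
coverage_sketch(c(1, 1, 2, 5))  # f1 = 2, f2 = 1, n = 9: 8/9
```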

Value

c_hat, the estimated coverage. (i.e. 1- C_def)

Generates a network list from a food web

Description

Converts a network from the matrix format used by the bipartite package into a list format. N.B. Throughout, this package uses 'hosts' to refer to the focal layer and 'wasps' to refer to the response layer, although these could equally be 'plants' and 'pollinators'.

Usage

CreateListObject(web)

Arguments

`web`	in format specified by the bipartite package. Rows = focal layer, columns = response layer

Value

A network list for use with other functions in this package

Examples

data(Safariland, package = 'bipartite')
demolist<-CreateListObject(Safariland)
str(demolist)

Fit all the models

Description

Internal function called by PredictLinks(). Fits the coverage deficit, Trait, Centrality, Matching-Centrality and SBM models by sequentially calling the individual functions.

Usage

FitAllModels(list, RepeatModels = 10)

Arguments

`list`	A network list
`RepeatModels`	How many times to fit each model from different starting points. Uses best half (rounding up)

Value

A network list including the model fit

Fit Matching-Centrality Model

Description

Fit a model that contains both a trait-matching and a centrality term based on Rohr et al. (2016)

Usage

FitBothMandC(
  list,
  N_runs = 10,
  maxit = 10000,
  method = "Nelder-Mead",
  ExtraSettings = NULL
)

Arguments

`list`	Network List
`N_runs`	Number of different start points for k2 and lambda to try. The best (maximum likelihood) half will be used to construct the probability matrix
`maxit`	Maximum number of iterations, passed to optim(). Default = 10000
`method`	Passed to optim, default = 'Nelder-Mead'
`ExtraSettings`	Other control settings to pass to optim()

Value

Network list with added 'B_par', the best-fitting parameters, and 'M_ProbsMatrix', the probability matrix

References

Rohr, R.P., Naisbit, R.E., Mazza, C. & Bersier, L.-F. (2016). Matching-centrality decomposition and the forecasting of new links in networks. Proc. R. Soc. B Biol. Sci., 283, 20152702

Fit Centrality Model

Description

Repeatedly fits a centrality model to a binary interaction network to return a probability matrix

Usage

FitCentrality(
  list,
  N_runs = 10,
  maxit = 10000,
  method = "Nelder-Mead",
  ExtraSettings = NULL
)

Arguments

`list`	Network List
`N_runs`	Number of start points to try. The best (maximum likelihood) half will be used to construct the probability matrix
`maxit`	Maximum number of iterations, passed to optim(). Default = 10000
`method`	Passed to optim, default = 'Nelder-Mead'
`ExtraSettings`	Other control settings to pass to optim()

Value

Network list with added 'C_par', the best-fitting parameters, and 'C_ProbsMatrix', the probability matrix

Fit Latent Trait (Matching Model)

Description

Repeatedly fits a latent trait model to a binary interaction network to return a probability matrix

Usage

FitMatching(
  list,
  N_runs = 10,
  maxit = 10000,
  method = "Nelder-Mead",
  ExtraSettings = NULL
)

Arguments

`list`	Network List
`N_runs`	Number of start points for k2 and lambda to try. The best (maximum likelihood) half will be used to construct the probability matrix
`maxit`	Maximum number of iterations, passed to optim(). Default = 10000
`method`	Passed to optim, default = 'Nelder-Mead'
`ExtraSettings`	Other control settings to pass to optim()

Details

The optimiser is started at values derived from the row-sums and column-sums of a CCA analysis, which correspond closely to latent traits by matching closely related species together.

The k2 and lambda parameters are started from points drawn from a uniform distribution on [0, 1].

Value

Network list with added 'M_par', the best-fitting parameters, and 'M_ProbsMatrix', the probability matrix

Fit SBM Model

Description

Fit SBM Model

Usage

FitSBM(list, n_SBM = 10, G = NULL)

Arguments

`list`	Network List
`n_SBM`	Number of SBM models to fit. Default is 10. The top half (rounding up) are retained and averaged to produce a probability matrix.
`G`	The number of groups to divide the top layer and the focal layer into.

Value

Network list with 'SBM_ProbsMat', a matrix of probabilities assigned to each possible interaction, 'SBM1', the best model fit derived from Optimise_SBM(), and 'SBM_G', the number of fitted groups.
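
The "top half (rounding up), then average" rule used here and in the other fitting functions can be sketched in base R. The log-likelihoods and probability matrices below are made-up stand-ins for real model fits, and the variable names are hypothetical.

```r
# Made-up log-likelihoods for five fitted models
logliks <- c(-120, -118, -125, -119, -130)

# Made-up 2 x 2 probability matrices, one per fit
prob_mats <- list(
  matrix(0.1, 2, 2), matrix(0.2, 2, 2), matrix(0.3, 2, 2),
  matrix(0.4, 2, 2), matrix(0.5, 2, 2)
)

# Keep the top half, rounding up: 3 of the 5 fits
n_keep <- ceiling(length(logliks) / 2)
keep   <- order(logliks, decreasing = TRUE)[seq_len(n_keep)]

# Element-wise mean of the retained matrices
avg_probs <- Reduce(`+`, prob_mats[keep]) / n_keep
```

With these numbers the fits at positions 2, 4 and 1 are retained, so every cell of `avg_probs` is the mean of 0.2, 0.4 and 0.1.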

Make an artificial bipartite network with some properties of ecological networks, then sample from it

Description

Core model adapted from: "Sampling bias is a challenge [...]: lessons from a quantitative niche model" by Jochen Frund, Kevin S. McCann and Neal M. Williams.

Usage

make_true_and_sample_web(
  seed = NULL,
  specpar = 1,
  n_hosts = 10,
  n_wasps = 10,
  TargetTrueConn = 0.5,
  SampleObs = 1000,
  abun_mean = 5,
  abun_sdlog = 1,
  traitvsnested = 0.5,
  hosttrait_n = "two"
)

Arguments

`seed`	Random number generator seed, if specified.
`specpar`	Specialisation parameter, equal to 1/sd of the normal curve that defines the consumption range
`n_hosts`	Number of focal level species (e.g. hosts, flowers)
`n_wasps`	Number of non-focal level species (e.g. parasitic wasps, pollinators)
`TargetTrueConn`	Proportion of possible interactions to keep
`SampleObs`	Number of samples to draw
`abun_mean`	Mean abundance level (log scale).
`abun_sdlog`	Distribution of abundance level (SD log scale).
`traitvsnested`	The relative balance between the nestedness generator and the trait-based generator
`hosttrait_n`	Number of trait dimensions. Default 'two', uses two traits, with one dominant. 'single' and 'multi' retained from Frund et al.

Details

Abundances are assigned by generating abundances that match a log-normal distribution (but without introducing noise)
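
One way to obtain abundances that follow a log-normal shape without random noise is to evaluate the log-normal quantile function at evenly spaced probabilities. Whether make_true_and_sample_web() does exactly this is not stated here, so the sketch below is an assumption, as is the mapping of `abun_mean` and `abun_sdlog` onto `meanlog` and `sdlog`.

```r
n_species  <- 10   # e.g. n_hosts
abun_mean  <- 5    # treated here as the mean on the log scale (assumption)
abun_sdlog <- 1    # standard deviation on the log scale

# Evenly spaced probabilities strictly inside (0, 1)
p <- ppoints(n_species)

# Deterministic abundances lying on the log-normal quantile curve
abund <- qlnorm(p, meanlog = abun_mean, sdlog = abun_sdlog)
```

Because no random draw is involved, repeated calls return identical abundances for the same settings.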

Value

A network list containing 'obs', a matrix of observations, 'TrueWeb', a matrix of the 'true' drawn web, and a number of other properties of these networks.

Examples

make_true_and_sample_web()

Custom optimiser function for SBM models

Description

Designed to be called by FitSBM()

Usage

Optimise_SBM(i = NULL, A, G, N_Rounds_max = 500, plot = FALSE)

Arguments

`i`	Seed
`A`	Binary Interaction Matrix
`G`	Number of Groups
`N_Rounds_max`	Maximum number of rounds to keep drawing
`plot`	If set to TRUE, plots the progress of likelihood improvement, used to check if convergence is good.

Details

Based on optimising algorithm described in Larremore, D.B., Clauset, A. & Jacobs, A.Z. (2014). Efficiently inferring community structure in bipartite networks. Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys., 90, 1-12

Initially all species are randomly assigned to groups. Then, one at a time, each species is swapped into a different group and the likelihood of the model assessed (with SBMLik()).

The best model of all these swaps is then selected (even if it is worse than the current model) and used in the next round of swapping.

This fits the 'degree-corrected' biSBM model of Larremore et al., which is generally better when there are broad degree distributions.

This is repeated until either N_Rounds_max is reached or, most commonly, the best model in the last 20 rounds is within 0.1 log-likelihood units of the best overall (implying that the fit has stopped improving).

Value

A list containing 'LogLik', the maximum likelihood found, 'SB_H', the group assignments of the host, 'SB_W', the group assignments of the other level, and 'Omega_rs', the interaction probabilities between groups.

Optimiser wrapper for network models

Description

Optimiser wrapper for network models

Usage

Optimiser(
  i = NULL,
  maxit = 10000,
  method = "Nelder-Mead",
  A,
  N_p,
  fixedSt_P = c(),
  N_unif_P = 0,
  func,
  ExtraSettings = NULL
)

Arguments

`i`	RNG Seed to set
`maxit`	Maximum number of iterations to be passed to optim (default is 10000)
`method`	Optimiser method to pass to optim. Default is 'Nelder-Mead'
`A`	Interaction Presence-Absence matrix
`N_p`	Number of parameters to draw from a normal distribution
`fixedSt_P`	Vector of fixed parameters to pass
`N_unif_P`	Number of parameters to take from a uniform distribution
`func`	Function to optimise
`ExtraSettings`	Additional settings to pass to the control argument of optim()

Value

A 'fit' object from optim(), with a few of the input parameters attached.
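
The general pattern described by the arguments (a starting vector built from normal draws, fixed values and uniform draws, then handed to optim()) can be sketched in base R. The objective `toy_func` is a hypothetical stand-in for a model likelihood, and the order in which the three kinds of starting value are combined is an assumption of this sketch.

```r
set.seed(1)  # plays the role of the seed argument i

N_p       <- 2        # parameters started from a normal draw
fixedSt_P <- c(0.5)   # fixed starting values
N_unif_P  <- 1        # parameters started from a uniform draw

# Assemble the starting vector (ordering is an assumption of this sketch)
start <- c(rnorm(N_p), fixedSt_P, runif(N_unif_P))

# Toy objective standing in for a model's negative log-likelihood
toy_func <- function(par) sum((par - c(1, 2, 3, 4))^2)

# optim() minimises by default; maxit is passed through the control list
fit <- optim(par = start, fn = toy_func, method = "Nelder-Mead",
             control = list(maxit = 10000))

fit$par          # estimated parameters
fit$convergence  # 0 indicates successful convergence
```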

Plot the fitted network models

Description

Takes the output from other functions (including PredictLinks()) to visualise the fit to the data and predictions of missing links.

Usage

PlotFit(
  list,
  Matrix_to_plot,
  OrderBy = "Default",
  addDots = TRUE,
  title = NULL,
  Combine = "+",
  RemoveTP = FALSE,
  GuidesOff = TRUE
)

Arguments

`list`	A list-format network (e.g. output from PredictLinks())
`Matrix_to_plot`	Which matrix / matrices to plot. One or more of 'C_def','C', 'M', 'B', 'SBM'
`OrderBy`	How to order the plot. One of 'Default','Degree','Manual', 'LatentTrait','SBM', 'AsPerMatrix'
`addDots`	Should dots be added to show observations. TRUE, FALSE or 'Size', to plot by interaction strength
`title`	A title. By default it will use the value of Matrix_to_plot
`Combine`	How should multiple matrices be combined. Either '+' which averages them (default), or '*' which multiplies them
`RemoveTP`	Should true positives be set to NA in order to highlight differences in predictions. Default is FALSE
`GuidesOff`	Should the legends be switched off. Defaults to TRUE

Details

See the vignette for a more thorough description and examples.

Value

A ggplot object, which by default will print to the device, but which can be added to in order to make further tweaks
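
To make the two `Combine` options concrete, the base R sketch below combines two made-up probability matrices by averaging and by element-wise multiplication. The matrix names are hypothetical, and any rescaling that PlotFit() may apply before plotting is not shown.

```r
# Two made-up probability matrices for the same 2 x 2 network
C_probs   <- matrix(c(0.2, 0.8, 0.5, 0.1), nrow = 2)
SBM_probs <- matrix(c(0.4, 0.6, 0.1, 0.3), nrow = 2)

# Combine = '+': element-wise average of the matrices
combined_plus <- (C_probs + SBM_probs) / 2

# Combine = '*': element-wise product of the matrices
combined_times <- C_probs * SBM_probs
```

Averaging keeps a link prominent if either model rates it highly, whereas multiplication only favours links that both models rate highly.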

Examples


## Not run: 
data(Safariland, package = 'bipartite')
Predictions<- PredictLinks(Safariland)
PlotFit(Predictions, Matrix_to_plot = 'SBM')

## End(Not run)

Plot Metric Response To Network Rarefaction

Description

Used to plot the output from RarefyNetwork(). See vignette!

Usage

PlotRarefaction(df)

Arguments

`df`	A data frame produced by RarefyNetwork

Value

A ggplot

Examples

data(Safariland, package = 'bipartite')
X <- RarefyNetwork(Safariland, n_per_level = 100)
PlotRarefaction(X)

Generates a network list from a food web and fits all network models

Description

First calls CreateListObject to convert a matrix suitable for the bipartite package into a list structure.

Usage

PredictLinks(web, RepeatModels = 10)

Arguments

`web`	in format specified by the bipartite package. Rows = focal layer, columns = response layer
`RepeatModels`	How many times to fit each model from different starting points. Uses best half (rounding up)

Details

Then it calls FitAllModels to fit each of the missing link models in turn.

Value

A network list including a large number of outputs.

Examples

## Not run: 
data(Safariland, package = 'bipartite')
PredictLinks(Safariland)

## End(Not run)

Recalculate Network Metrics With Rarefied Webs

Description

Resamples empirical network observations at a range of sampling levels and calls the networklevel() function from the bipartite package to calculate network metrics.

Usage

RarefyNetwork(
  web,
  n_per_level = 1000,
  frac_sample_levels = seq(0.2, 1, l = 5),
  abs_sample_levels = NULL,
  metrics = "info",
  PARALLEL = FALSE,
  cores = 2,
  output = "df",
  ...
)

Arguments

`web`	A matrix format web, as for `bipartite`
`n_per_level`	How many samples to take per sample level. Default is 1000.
`frac_sample_levels`	Sequence of fractions of original sample size to resample at.
`abs_sample_levels`	If supplied, vector of absolute sample sizes to use to override `frac_sample_levels`. Default = NULL
`metrics`	vector of metrics to calculate. Will be passed to `index` of `networklevel()`. Default = 'info'
`PARALLEL`	Logical. If TRUE, will use parallel package to speed up metric calculation. Default = FALSE
`cores`	If using parallel, how many cores to use. Default = 2
`output`	String specifying output. If 'plot' will return a ggplot facetted by metric using `PlotRarefaction()`. If 'CI' will return a data frame (using `ComputeCI()`) containing 5 columns: Metric, LowerCI, UpperCI, Mean, SampleSize. Otherwise will return a data frame of the raw recalculated metrics, with a separate column for each metric, and the last column specifying the resample size.
`...`	Additional arguments to pass to `networklevel`. e.g. empty.web=FALSE

Details

Can return either a data frame of raw metrics, a ggplot or a data frame of 'confidence intervals'.

These CIs are calculated from the set of resamples by ordering the metric values and taking the values ranked at the 5th and 95th percentiles. This method is very similar to that employed by Casas et al. (2018), Assessing sampling sufficiency of network metrics using bootstrap, Ecological Complexity 36:268-275.
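
A percentile interval of this kind can be sketched in base R with quantile(). The vector `resampled_metric` holds simulated values standing in for a metric recalculated on each resample. Note that quantile() interpolates between ranks by default, which may differ slightly from a strictly rank-based rule.

```r
set.seed(42)

# Simulated stand-in for a metric recalculated on 1000 resampled webs
resampled_metric <- rnorm(1000, mean = 0.3, sd = 0.05)

# Values at the 5th and 95th percentiles form the interval
ci <- quantile(resampled_metric, probs = c(0.05, 0.95))

ci                      # lower and upper bounds
mean(resampled_metric)  # mean of the resampled values
```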

Note that confidence intervals for many metrics, particularly qualitative ones, will be biased by the issue of false-negatives. Resampling of observations will not introduce missing links.

By default the size of resamples are taken to be proportional to the original sample size. Original sample size is defined as the sum of the supplied web. If a specific set of sample sizes is wanted, use abs_sample_levels
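
As an illustration of resampling at a fraction of the original sample size, the base R sketch below draws interactions with replacement in proportion to their observed counts. The toy matrix `web` and the sampling scheme are assumptions made for this example and are not taken from the RarefyNetwork() source.

```r
set.seed(1)

# Toy web: rows = focal layer, columns = response layer (made-up counts)
web <- matrix(c(5, 0, 2,
                1, 3, 0),
              nrow = 2, byrow = TRUE)

n_orig <- sum(web)            # original sample size: sum of the web
size   <- round(0.5 * n_orig) # resample at half the original size

# Draw cell indices with probability proportional to the observed counts
idx <- sample(length(web), size = size, replace = TRUE,
              prob = as.vector(web))

# Tally the draws back into a matrix with the same shape as the web
resampled <- matrix(tabulate(idx, nbins = length(web)), nrow = nrow(web))
```

Cells that were zero in the original web can never be drawn, which is why resampling alone cannot recover unobserved links.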

It is also possible to extrapolate how increased sample size may lead to increased confidence in a metric. Set the sequence passed to frac_sample_levels to go beyond 1.

Value

Either a dataframe or a ggplot object. See details.

Examples

data(Safariland, package = 'bipartite')
RarefyNetwork(Safariland, n_per_level = 100)

Adds a dataframe that defines each interaction as true positive, false negative or true negative

Description

Adds a dataframe that defines each interaction as true positive, false negative or true negative

Usage

SortResponseCategory(list)

Arguments

`list`	Network list

Value

A network list object with 'ObsSuccess', a dataframe detailing all the interactions and whether they are True Positives, False Negatives or True Negatives
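
The three categories can be illustrated in base R when a 'true' web is known, as it is for simulated networks. The matrices, the category labels and the column names below are made up for this sketch and need not match the package's 'ObsSuccess' output.

```r
# Made-up 'true' web (1 = interaction exists) and observed counts
true_web <- matrix(c(1, 1, 0, 1), nrow = 2)
obs      <- matrix(c(3, 0, 0, 1), nrow = 2)

# Observed links are true positives; unobserved links are false negatives
# if they exist in the true web, and true negatives otherwise
category <- ifelse(obs > 0, "True Positive",
                   ifelse(true_web > 0, "False Negative", "True Negative"))

# One row per possible interaction
ObsSuccess_sketch <- data.frame(
  host     = as.vector(row(obs)),
  wasp     = as.vector(col(obs)),
  category = as.vector(category),
  stringsAsFactors = FALSE
)
```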

Test the models by AUC

Description

The function assumes FitAllModels() has already been run. It is a wrapper for 'SortResponseCategory()' and 'TestAUC()'

Usage

TestAllModels(list)

Arguments

`list`	A network list

Value

the network list with added AUC data. Key values are 'AUC', a dataframe with the AUC of each model and many combinations.

Test via AUC the predictive capacity of each model or combination of models

Description

Test via AUC the predictive capacity of each model or combination of models

Usage

TestAUC(list)

Arguments

`list`	Network List

Value

a list with 'DataforAUC', a data frame with each interaction as a row and the predictions of each model, and 'AUC', a data frame with the predictive capacity of all the models and many combinations
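
For readers unfamiliar with AUC, the base R sketch below computes it from ranks (the Mann-Whitney formulation): the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. `auc_rank()` is a hypothetical helper and the scores are made up. TestAUC() may use a different routine internally.

```r
# Hypothetical helper: rank-based AUC
auc_rank <- function(score, is_positive) {
  r  <- rank(score)          # average ranks handle ties
  n1 <- sum(is_positive)     # number of positives (e.g. missing links)
  n0 <- sum(!is_positive)    # number of negatives (e.g. true absences)
  (sum(r[is_positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up predicted probabilities for five unobserved interactions
score           <- c(0.9, 0.6, 0.7, 0.2, 0.1)
is_missing_link <- c(TRUE, TRUE, FALSE, FALSE, FALSE)

auc_rank(score, is_missing_link)  # 5 of 6 positive-negative pairs ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation of missing links from true absences.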
