# Spartan

Successful integration of computer simulation with real-world experimentation requires that the relationship between the simulation and the real-world system be established. Spartan, described in our 2013 paper in PLoS Computational Biology, is a package of statistical techniques designed specifically to characterise this relationship. The techniques reveal the influence that pathways and components have on simulation behaviour, offering biological insight into the system under study.

Spartan is open source, implemented within the R statistical environment, and freely available both from the Comprehensive R Archive Network (CRAN) and from this page. Use of the package is demonstrated in the tutorial published in the R Journal. Example simulation data for each technique are available in the tabs below.


#### Techniques and Tutorial Example Data


Consistency analysis contrasts distributions of responses from stochastic simulations, all generated using the same fixed set of parameter values and each containing the same number of simulation samples. By varying the number of samples in each distribution, the analysis determines how many are required to obtain statistically consistent distributions, so that a response can be attributed to the parameter values rather than to randomness in the simulation.
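The idea can be sketched in a few lines of Python. This is a minimal illustration and not Spartan's implementation: `toy_simulation` is a hypothetical stand-in for a stochastic simulator, and the Vargha-Delaney A statistic is assumed here as one possible way of scoring how different two distributions are (0.5 means no difference).

```python
import random

def a_statistic(xs, ys):
    """Vargha-Delaney A: chance that a draw from xs exceeds one from ys (ties count half)."""
    wins = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(xs) * len(ys))

def toy_simulation(rng):
    """Hypothetical stochastic simulator run at one fixed parameter set."""
    return rng.gauss(100.0, 15.0)

def max_a_deviation(sample_size, n_distributions, rng):
    """Largest |A - 0.5| when each distribution is compared with the first one."""
    dists = [[toy_simulation(rng) for _ in range(sample_size)]
             for _ in range(n_distributions)]
    reference = dists[0]
    return max(abs(a_statistic(reference, d) - 0.5) for d in dists[1:])

rng = random.Random(42)
# Same parameters throughout; only the number of runs per distribution changes.
deviations = {n: max_a_deviation(n, 20, rng) for n in (5, 50, 300)}
for n, dev in deviations.items():
    print(n, round(dev, 3))
```

The smallest sample size whose deviation stays below a chosen threshold would be taken as the number of runs needed per parameter set; the threshold itself is a judgement call for the modeller.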

Tutorial Simulation Data

Robustness analysis determines how robust a simulation response is to the alteration of a single parameter. A set of parameters of interest is identified, and each is assigned a range of values within which it could lie. The value of each parameter is then perturbed independently, with all other parameters held at their calibrated values, and the responses are compared to determine the impact of the change in parameter value.
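A one-at-a-time sweep of this kind might be sketched as follows. The model, parameter names, and ranges are hypothetical, and a deterministic toy response is assumed for brevity; with a stochastic simulator each point would instead be a distribution of responses compared against the baseline distribution.

```python
def toy_model(params):
    """Hypothetical deterministic stand-in for a simulation response."""
    return params["growth"] * 10.0 + params["decay"] ** 2

def one_at_a_time(model, baseline, ranges, steps):
    """Sweep each parameter across its range while the others stay at baseline."""
    results = {}
    for name, (low, high) in ranges.items():
        rows = []
        for i in range(steps):
            value = low + (high - low) * i / (steps - 1)
            params = dict(baseline)   # copy so other parameters keep calibrated values
            params[name] = value
            rows.append((value, model(params)))
        results[name] = rows
    return results

baseline = {"growth": 0.5, "decay": 2.0}                 # assumed calibrated values
ranges = {"growth": (0.0, 1.0), "decay": (1.0, 3.0)}     # assumed plausible ranges
results = one_at_a_time(toy_model, baseline, ranges, steps=5)
for name, rows in results.items():
    print(name, rows)
```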

Tutorial Simulation Data

Though Robustness Analysis elucidates the effects of perturbing one parameter, it cannot show any non-linear effects that occur when two or more are adjusted simultaneously. A Global Sensitivity Analysis technique is needed to identify such effects and to indicate which parameters have the greatest influence on the simulation output. With latin-hypercube sampling, the values of a set of parameters are perturbed simultaneously, allowing the correlation between parameter value and response to be calculated for each parameter.
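As a rough sketch of the sampling and correlation steps, the Python below draws a latin-hypercube design and ranks parameters by correlation with the response. The toy model and parameter names are hypothetical, and a plain Spearman rank correlation is assumed for simplicity; it is not presented as the statistic Spartan itself reports.

```python
import random

def latin_hypercube(n_samples, ranges, rng):
    """One value per stratum for every parameter, with strata shuffled independently."""
    columns = {}
    for name, (low, high) in ranges.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns[name] = [low + (high - low) * (s + rng.random()) / n_samples
                         for s in strata]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]

def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def toy_model(params):
    """Hypothetical response: strongly driven by 'a', weakly by 'b'."""
    return 10.0 * params["a"] ** 2 + 0.1 * params["b"]

rng = random.Random(1)
ranges = {"a": (0.0, 1.0), "b": (0.0, 1.0)}
samples = latin_hypercube(200, ranges, rng)
responses = [toy_model(s) for s in samples]
correlations = {name: spearman([s[name] for s in samples], responses)
                for name in ranges}
print(correlations)
```

Because every parameter changes in every sample, a strong correlation for one parameter indicates influence that persists regardless of the values taken by the others.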

Tutorial Simulation Data

Like latin-hypercube sampling, the eFAST technique perturbs a selection of parameter values simultaneously. Parameter sampling and response analysis are conducted using the eFAST approach (extended Fourier Amplitude Sensitivity Test). Values for each parameter are chosen using Fourier frequency curves through the parameter's range of potential values, with a chosen number of values taken from points along each curve. Though all parameters are perturbed simultaneously, the method focuses on one parameter of interest in turn by giving it a very different sampling frequency from that assigned to the other parameters. The technique is fairly complex, and we recommend that those applying it study the references in the tutorial.
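To give a feel for the sampling step only, the sketch below draws values along a sinusoidal search curve of the form x(s) = 0.5 + arcsin(sin(ωs + φ))/π, which is the curve commonly used in descriptions of eFAST. The parameter names and the frequencies 8 and 1 are illustrative assumptions; the published method derives the frequencies from the number of samples and harmonics, and the Fourier analysis of the responses is not shown.

```python
import math

def search_curve(n_samples, frequency, phase, low, high):
    """Values along x(s) = 0.5 + asin(sin(w*s + phase)) / pi, scaled to [low, high]."""
    values = []
    for k in range(n_samples):
        s = -math.pi + 2.0 * math.pi * (k + 0.5) / n_samples
        unit = 0.5 + math.asin(math.sin(frequency * s + phase)) / math.pi
        values.append(low + (high - low) * unit)
    return values

def turning_points(values):
    """Count direction changes, a rough proxy for how fast a column oscillates."""
    count = 0
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if (cur - prev) * (nxt - cur) < 0:
            count += 1
    return count

n_samples = 65
design = {
    # The parameter of interest gets a high frequency (illustrative value).
    "param_of_interest": search_curve(n_samples, 8, 0.0, 0.0, 1.0),
    # The remaining parameters share a low frequency (illustrative value).
    "other_param": search_curve(n_samples, 1, 0.0, 10.0, 20.0),
}
for name, column in design.items():
    print(name, turning_points(column))
```

The distinct frequency is what lets the later Fourier analysis separate the variance attributable to the parameter of interest from that of the others.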

Tutorial Simulation Data

From Version 1.3, Spartan's statistical analyses can be combined with simulations generated in NetLogo.

This integrates NetLogo’s parameter sweep function, BehaviorSpace, with an extended version of Spartan, enabling local and global sensitivity analyses to be performed on NetLogo models. With Spartan, the researcher can automatically create NetLogo experiment files for both local (individual parameter) and global (latin-hypercube and Fourier frequency) analyses, run these experiments in NetLogo, and receive detailed statistical information on the influence each parameter has on simulation response. This information is vital for translating a simulation result into a hypothesis grounded in the system being studied. The tutorial for this technique uses a slightly modified version of the Virus transmission and perpetuation model from the NetLogo model library (available for download below).

The tutorial can be found in the list of vignettes above, and uses the following downloads:
Sample Data for NetLogo Robustness Analysis (contains example R scripts)
Sample Data for NetLogo Latin-Hypercube Analysis (contains example R scripts)
Sample Data for NetLogo eFAST Analysis (zip, 34,191kb)

Spartan 3.0 uses five machine learning algorithms (neural network, random forest, Gaussian process model, general linear model, and support vector machine) to generate emulations of a simulator from a set of simulation results produced using latin-hypercube sampling, and then generates an ensemble from those emulations. This supports work in our paper, currently under review, in which we show that emulation can support the engineering and analysis of simulations of biological systems by mitigating the time and resource constraints associated with resource-intensive simulations and statistical analysis. The package includes means of creating emulations and ensembles and assessing their performance, as well as wrappers that enable an emulator to be used within a sensitivity analysis routine, within Approximate Bayesian Computation routines, and within multi-objective evolutionary computation routines to establish parameter values (see next tab).

Spartan Tutorial Files for Emulation (zip, 3,545kb)

Applying Approximate Bayesian Computation (ABC) to deduce the posterior distribution of the parameters could provide an increased understanding of simulation behaviour. The EasyABC package in R implements ABC techniques and is well supported and documented. To run any of the techniques in EasyABC, one has to provide a model that processes each generated parameter value set and returns output responses. To make the ensemble models produced by Spartan compatible with EasyABC, we have provided a wrapper that acts as the model input to the EasyABC call. The wrapper normalises the EasyABC-generated parameter set if need be, produces predictions using the ensemble, and then rescales the predictions to their original scale.
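The shape of such a wrapper can be illustrated with a short Python sketch. Everything named here is hypothetical: `toy_ensemble_predict` stands in for a trained ensemble, the ranges are invented, and a simple rejection sampler is assumed in place of EasyABC, whose interface is not reproduced.

```python
import random

PARAM_RANGES = {"rate": (0.0, 10.0)}   # hypothetical ranges used to train the emulator
OUTPUT_RANGE = (0.0, 50.0)             # hypothetical range of the simulation response

def toy_ensemble_predict(normalised):
    """Hypothetical ensemble working on [0, 1]-scaled inputs and outputs."""
    return normalised["rate"] ** 2

def emulator_model(params):
    """Wrapper: normalise inputs, predict, rescale the prediction to original units."""
    normalised = {}
    for name, value in params.items():
        low, high = PARAM_RANGES[name]
        normalised[name] = (value - low) / (high - low)
    scaled = toy_ensemble_predict(normalised)
    out_low, out_high = OUTPUT_RANGE
    return out_low + (out_high - out_low) * scaled

def rejection_abc(model, observed, tolerance, n_draws, rng):
    """Keep prior draws whose predicted response lies within tolerance of observed."""
    accepted = []
    for _ in range(n_draws):
        candidate = {name: rng.uniform(low, high)
                     for name, (low, high) in PARAM_RANGES.items()}
        if abs(model(candidate) - observed) <= tolerance:
            accepted.append(candidate)
    return accepted

rng = random.Random(7)
posterior = rejection_abc(emulator_model, observed=12.5, tolerance=1.0,
                          n_draws=5000, rng=rng)
print(len(posterior), "accepted draws")
```

Keeping the scaling inside the wrapper means the sampler only ever sees parameters and responses in their original units, which is the role the wrapper plays in the text above.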

Instructions on using the ABC technique in Spartan can be found in Technique 9 of the vignette.

To determine emulator inputs that correspond to a desired output configuration, Spartan includes the non-dominated sorting genetic algorithm II (NSGA-II), a multi-objective evolutionary algorithm. In this heuristic scheme a solution is called non-dominated, Pareto optimal, Pareto efficient or non-inferior if none of the objective functions can be improved in value without degrading some of the other objective values. If the Pareto front comprises more members than the population size, a subset is selected, composed of those Pareto members having the largest fitness differences between their immediate neighbours, summed over all objectives. If the Pareto front comprises fewer members than the population size, members of the next front (those dominated only by members of the first front) are selected in the same manner, and so on until the entire population has been selected. New solutions are generated through crossover of parents with mutation.
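The sorting and neighbour-distance ideas can be sketched in Python as below. This is an illustrative reading of the scheme for minimisation problems, with made-up objective values; it omits crossover, mutation, and the selection loop, and is not the code Spartan calls.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(points):
    """Peel off successive Pareto fronts; returns lists of indices into points."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(points[j], points[i])
                                  for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(points, front):
    """Sum over objectives of the normalised gap between each member's neighbours."""
    distance = {i: 0.0 for i in front}
    n_objectives = len(points[front[0]])
    for m in range(n_objectives):
        ordered = sorted(front, key=lambda i: points[i][m])
        span = points[ordered[-1]][m] - points[ordered[0]][m]
        # Boundary members are always kept, so they get infinite distance.
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if span == 0:
            continue
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            distance[ordered[k]] += gap / span
    return distance

# Hypothetical objective values (two objectives, both minimised).
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_fronts(points)
distances = crowding_distance(points, fronts[0])
print(fronts)
print(distances)
```

When a front has to be truncated to fit the population, the members with the largest distance would be retained, which favours solutions spread evenly along the front.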

Each candidate solution is assessed by a user-defined fitness function, which NSGA-II seeks to minimise. For the case study described in the vignette and in our upcoming paper, we use this exemplar function: Example Fitness Function for Technique 10 (1kb download)

Version 2.3 adds functionality to compare the results of Spartan analyses at selected simulation timepoints. This functionality is described in a new paper currently in review.