DEBtool Toolbox: lib_pet

Routines for the add_my_pet collection. The aim is to
  • estimate DEB parameters from a collection of data sets
  • present goodness of fit and accuracy
  • evaluate implied properties

Preparation of an entry

Make sure that the paths to DEBtool_M and AmPtool and their subdirectories have been set in Matlab. An AmP entry consists of a set of 4 files: 3 Matlab functions (mydata_my_pet.m, predict_my_pet.m, pars_init_my_pet.m) and a Matlab script (run_my_pet.m), where my_pet is replaced by the scientific name of your species with spaces replaced by _. The mydata-file sets data and references, the pars_init-file specifies the model type and initial parameter values, the predict-file computes expectations for the data, given parameter values, and the run-file runs the parameter estimation procedure. The notation in these files follows the DEB notation rules. The 4 files are discussed in some more detail below:
mydata
Copy-and-rename the template file DEBtool_M/lib/pet/mydata_my_pet.m in your current directory, and have a look at mydata-files for related species for editing inspiration. Temperatures have to be specified in Kelvin; the function C2K converts Celsius to Kelvin and, since Celsius is more intuitive, all entries use this function. Bibkeys follow the structure: at most 4 characters for the first author, at most 4 for the second author (if present), and 4 for the year.

Consult Wikipedia for taxonomic info, and AmPeco for available eco-codes. The fields data_0 and data_1 refer to the zero- and uni-variate data sets that are specified in the mydata-file. See the AmP Estimation page for allowed codes; the data sets themselves don't need to have these names, the concepts matter. E.g., while data-code ap, for age at puberty, is in the list, our advice is to work instead with tp, time since birth at puberty, so that uncertainty about the age at birth (due to uncertainty about the start of development) does not affect the results too much. COMPLETE quantifies the level of data-completeness.
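As an illustrative sketch (the codes and the COMPLETE value are only an example):

  metaData.data_0 = {'ab'; 'tp'; 'am'; 'Lb'; 'Li'; 'Wwb'; 'Wwi'; 'Ri'}; % zero-variate data codes
  metaData.data_1 = {'t-L'};                                            % uni-variate data codes
  metaData.COMPLETE = 2.5;                                              % level of data-completeness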

The zero-variate data (in the sequence of their units) are filled first, then the uni-variate data. Contrary to the above-mentioned fields, the data-fields can be entry-specific, but all data must have units, label, bibkey and, if time is in the units, also temp, units.temp and label.temp. The comment-field is optional, but we encourage its frequent use, e.g. with remarks about accuracy. Remove empty fields.
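A minimal sketch of one zero-variate and one uni-variate data set (the values and the bibkey Smit2020 are made up for illustration):

  data.ab = 21;      units.ab = 'd';  label.ab = 'age at birth';  bibkey.ab = 'Smit2020';
    temp.ab = C2K(25); units.temp.ab = 'K'; label.temp.ab = 'temperature';
    comment.ab = 'accuracy of the start of development is unknown';

  data.tL = [ ...   % time since birth (d), total length (cm)
      0  0.45;
     30  1.20;
     90  2.10];
  units.tL = {'d', 'cm'};  label.tL = {'time since birth', 'total length'};
  temp.tL = C2K(25); units.temp.tL = 'K'; label.temp.tL = 'temperature';
  bibkey.tL = 'Smit2020';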

The weights only need to be changed if more or less emphasis in terms of goodness of fit is desired. Avoid large numbers (weight multipliers by a factor 3 or 5 already have a big effect); the value zero means that the predictions for that data set do not contribute to the loss function, so do not affect parameter estimation. The weights for the pseudo-data generally do not require editing; the default values are chosen such that pseudo-data hardly affects parameter estimation when the real data determine the parameters well.
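For instance (a sketch, using the data-set names of the example above):

  weights.tL = 3 * weights.tL; % give the time-length data more emphasis
  weights.ab = 0;              % ab no longer contributes to the loss function, but is still predicted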

Plots with the same labels and units can be combined into one plot by assigning the data sets to a group and setting a caption (see the sketch below). Please observe that AmP follows the lava colour scheme from high to low: white, via red and blue, to black. Female data come first, followed by male data, with the consequence that female data are plotted in red and male data in blue. For data at several temperatures: high temperatures first; the same for food levels. The presentation of results can (optionally) be customized with the function custom_results_my_pet. A detailed account of the method is presented on the AmP Estimation page.
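A sketch of such a grouping, with field names as used in existing entries (tL_f and tL_m are hypothetical female and male time-length data sets):

  set1 = {'tL_f', 'tL_m'}; comment1 = {'Data for females (red), males (blue)'};
  metaData.grp.sets = {set1};
  metaData.grp.comment = {comment1};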

Discussion points can have optional references, set by metaData.bibkey.D1 = bibkey;, with bibkey replaced by the correct key. Discussion points are required if your entry is a modification of an existing one. Please explain what is new and why. Facts must have references.

The links to websites, as used in the species-toolbar, depend on the species; general websites are always included (if they have the species), but the rest is taxon-specific (fishes, molluscs, etc). See existing entries.

Acknowledgements are optional; you can refer here to, e.g., grant numbers.

References follow the rules of BibTeX, and this program is actually run to produce the web-presentation. Make sure that all data have a reference. If you measured the data yourself, please select type "Misc" and mention this in the field "note". Do something similar for personal communications. Please notice that the type of bib-entry is not a free choice; a limited number of types is allowed by BibTeX.
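In the mydata-file each reference is coded as a string that is later passed to BibTeX; as a sketch, copied in structure from existing entries, an entry for the DEB book looks like:

  bibkey = 'Kooy2010'; type = 'Book'; bib = [ ...
    'author = {Kooijman, S.A.L.M.}, ' ...
    'year = {2010}, ' ...
    'title = {Dynamic Energy Budget theory for metabolic organisation}, ' ...
    'publisher = {Cambridge Univ. Press}, ' ...
    'address = {Cambridge}'];
  metaData.biblist.(bibkey) = ['''@', type, '{', bibkey, ', ' bib, '}'';'];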

predict
Copy-and-edit a predict-file from a related species. The species-list shows the data-types for all entries, which can be used to find predict-files of related species that specify additional predictions for your data. Notice that all assigned variables have a comment that starts with a specification of their units. Notice also that predictions can depend on the model type, as specified in the pars_init-file. The model type is, however, not a free choice: related species have the same model type. Make sure that all data specified in the mydata-file are predicted, but no more.

The predict-file starts with extracting all parameters, as set in the pars_init-file, and all data, as set in the mydata-file. The Matlab function parscomp_st is run to compute frequently used (simple) functions of the parameters, called compound parameters, which do not depend on food level.
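The first lines of a predict-file thus typically look like (a sketch):

  function [prdData, info] = predict_my_pet(par, data, auxData)
    % unpack par, data and auxData into the function workspace
    cPar = parscomp_st(par);                                            % compute compound parameters
    vars_pull(par); vars_pull(cPar); vars_pull(data); vars_pull(auxData);
    % ... temperature correction and predictions follow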

Then temperature correction factors, TC, are computed for all data sets. All parameters are specified at the reference temperature of 20 °C, and the character T is added to the name of a variable to indicate that it is temperature-corrected. This has to be done for all variables that have time in their dimensions.
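For example (a sketch, assuming an Arrhenius temperature T_A and reference temperature T_ref among the parameters, and the data sets ab and tL of the sketches above):

  TC_ab = tempcorr(temp.ab, T_ref, T_A); % -, temperature correction factor for the ab data
  TC_tL = tempcorr(temp.tL, T_ref, T_A); % -, temperature correction factor for the tL data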

The predictions start with a number of scaled times, scaled lengths and scaled rates that concern the full life cycle for later use, followed by the required expectations for zero-variate data following the life cycle, which are written to the structure prdData. Notice that the sequence reflects the dimensions of the variables. This is not essential, but convenient for checking.
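A sketch for the std model, restricted to age and physical length at birth:

  % life cycle
  pars_tp = [g k l_T v_Hb v_Hp];                   % compose parameter vector
  [t_p, t_b, l_p, l_b, info] = get_tp(pars_tp, f); % -, scaled ages and lengths at puberty and birth

  % birth
  L_b  = L_m * l_b;                                % cm, structural length at birth at f
  Lw_b = L_b/ del_M;                               % cm, physical length at birth at f
  aT_b = t_b/ k_M/ TC_ab;                          % d, age at birth at f and T

  % pack to output
  prdData.ab = aT_b;
  prdData.Lb = Lw_b;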

The final step is the specification of all required expectations of uni-variate data-sets; notice how the first column is used to modify expectations for the second one. Auxiliary data can be used for a similar purpose.
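A sketch for time-length data at constant scaled food density f (von Bertalanffy growth after birth, with Lw_b from the zero-variate sketch above):

  % uni-variate data: time-length
  Lw_i = L_m * (f - l_T)/ del_M;                        % cm, ultimate physical length at f
  rT_B = TC_tL * k_M/ 3/ (1 + f/ g);                    % 1/d, von Bertalanffy growth rate
  EL   = Lw_i - (Lw_i - Lw_b) * exp( - rT_B * tL(:,1)); % cm, expected length at time since birth
  prdData.tL = EL;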

Most entries assume data-specific constant temperatures and/or food levels. Quite a few entries, however, work with time-varying temperatures and/or food levels. In that case predictions must be based on specifications in terms of ordinary differential equations (ode's), which must be integrated with one of Matlab's ode-solvers. Entries that integrate ode's can be located with the AmPtool function select_predict, by searching for the string "ode" with the code [species, nm] = select_predict('ode'). Some entries even reconstruct temperature and/or food level trajectories from other data. Search the database for examples; see the AmPtool manual for how to do this search efficiently.
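A minimal sketch of such a prediction, with dget_L a hypothetical user-written derivative function in the predict-file and tT a hypothetical (time, temperature) knot-matrix passed via the mydata-file:

  % time-length under a time-varying temperature trajectory tT
  [t, L] = ode45(@(t,L) dget_L(t, L, f, tT, par, cPar), tL(:,1), L_b); % tL(:,1) must be strictly increasing
  prdData.tL = L/ del_M;                                               % cm, expected physical length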

pars_init
Copy-and-edit a pars_init-file from a related species. The specification of the model type and the reference temperature are then already correct. Even the parameter values rarely need modification, unless the ultimate body sizes are very different. The setting of free determines which parameters will be estimated and which are fixed. The first block of parameters is obligatory for the chosen model type; the second block is ad hoc for the current entry and can be extended.
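A few typical lines (a sketch; the values are only placeholders):

  par.z = 5;        free.z = 1;      units.z = '-';            label.z = 'zoom factor';
  par.p_M = 18;     free.p_M = 1;    units.p_M = 'J/d.cm^3';   label.p_M = '[p_M], vol-spec somatic maint';
  par.del_M = 0.2;  free.del_M = 1;  units.del_M = '-';        label.del_M = 'shape coefficient';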
run
Copy-and-edit a run-file from any species. The run-file starts with setting a global variable for the species that will be estimated and with checking the entry for consistency.

Depending on option settings, results will be printed to screen and/or a .mat file and/or .html file. This .mat file can be read with printmat('my_pet').

First make sure that estim_options('results_output', 3); is selected, to save the .mat file and report the result in your browser. Then comment out estim_options('method', 'no'); by placing % in front of it. You start parameter estimation from the initial setting as specified in pars_init_my_pet with estim_options('pars_init_method', 2);, using a rather small maximum number of iteration steps (we frequently use 500). Then continue estimation with estim_options('pars_init_method', 1);, in which case the initial parameter values are taken from results_my_pet.mat, which was created in the previous run. Then continue just by repeating the last command (arrow-up + enter) until convergence, or lack of further progress. The significance of this sequence is that during the estimation iterations the simplex shrinks, and a re-start restores its original size. In this way you reduce (but do not eliminate) the probability of arriving at a local minimum of the loss function that is not global.
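A typical option block in the run-file for this first phase looks like (a sketch):

  close all; global pets
  pets = {'my_pet'};                     % replace by the scientific name of your species
  check_my_pet(pets);                    % check the entry for consistency

  estim_options('default');
  estim_options('max_step_number', 500); % rather small maximum number of iteration steps
  estim_options('pars_init_method', 2);  % start from pars_init_my_pet
  estim_options('results_output', 3);    % save the .mat file and report in the browser
  estim_options('method', 'nm');         % Nelder-Mead simplex method
  estim_pars;                            % run the estimation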

You can reduce or enhance the effect of particular data sets/points by changing weight coefficients in the mydata-file. If you are satisfied with the results, use mat2pars_init to copy the results from the .mat file into the pars_init-file. Then activate the line estim_options('method', 'no'); again, and select estim_options('results_output', 4); to see the implied properties in your browser, or estim_options('results_output', 5); for comparison with related species. If all seems fine, your entry is ready for submission.
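The command sequence for this final phase might look like (a sketch; mat2pars_init is called from the entry directory):

  mat2pars_init;                        % copy parameter values from results_my_pet.mat into pars_init_my_pet.m
  estim_options('method', 'no');        % no further estimation
  estim_options('pars_init_method', 2); % read the (updated) pars_init-file
  estim_options('results_output', 4);   % implied properties in the browser (5: compare with related species)
  estim_pars;                           % produce the report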

For curators only

The core code estim_pars is a macro around the regression function petregr_f (with filters, or petregr without; model-specific filters prevent the estimation process from sampling outside the allowed parameter space); options can be set with estim_options. Fix or release settings of parameters and chemical parameters are always taken from pars_init_my_pet; the parameter values might be set by get_pars if estim_options('pars_init_method', 0) (and parameters are free, not fixed) or are set by results_my_pet.mat if estim_options('pars_init_method', 1). The function matisinit can be used to check if the values in results_my_pet.mat equal those in pars_init_my_pet. If so, the .mat file was not produced via estim_pars and method-option 0 was used in combination with output-option 1 or 2.
filters
This regression function uses filters for the various models, such as filter_std, while warnings are specified by e.g. warning_std. Filter violations are reported by print_filterflag in estim_pars. Customized filters can be built into the predict-file, directly after the unpacking of inputs, by conditionally emptying the output prdData, setting info = 0 and returning.
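A sketch of such a customized filter, placed directly after the unpacking of inputs:

  % customized filter: maturity at birth must stay below maturity at puberty
  if E_Hb >= E_Hp
    prdData = []; info = 0; return
  end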
weights/errors
Weight coefficients are set by setweights. Relative errors are computed by mre_st; these means are taken over the absolute values of the relative errors.
pseudodata
Pseudodata are data for (simple functions of) parameter values; these data and their predictions still might differ. They are used to avoid unrealistic values for poorly determined parameters. Pseudodata are added with addpseudodata, removed by rmpseudodata and predicted by predict_pseudodata.
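In the mydata-file the pseudodata is added by template lines like the following (a sketch); its weight can then be adjusted:

  [data, units, label, weights] = addpseudodata(data, units, label, weights);
  % give a poorly determined parameter more freedom by reducing its pseudo-data weight, e.g. for kappa:
  weights.psd.kap = 0.1 * weights.psd.kap;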
automated initial estimates
Automated initial parameter estimates are generated with get_pars, which is a macro around get_pars_2 up to get_pars_9. The specific density of biomass is set by get_d_V on the basis of taxonomic relationship.
several species
The code allows for parameter estimation of several species simultaneously. The function mydata_pets concatenates mydata-files, predict_pets concatenates predict-files, and result_pets does the same with results files.