Toolbox: lib/regr

Functions for the identification of parameters of user-defined models are parked in this domain. They also quantify the uncertainty in parameter values and show the goodness of fit in graphs. All functions use a Maximum Likelihood criterion for parameter estimation; the underlying assumptions, the algorithm and the number of variables are indicated in the names of the routines.

The string regr in the function names stands for "regression", and assumes a constant normally distributed scatter around a deterministic user-defined model. The script-files mydata_regr and mydata_regr2 show examples of applications of the regression routines.

The string surv in the function names stands for "survivor", and assumes a multinomially distributed number of surviving individuals. The first independent variable always has the interpretation "time". The script-files mydata_surv and mydata_surv2 show examples of applications of the survivor routines.

The string vc in the function name (nrvcregr, nmvcregr, nmvcregr2, pvcregr, mrevc) stands for "variation coefficient". These functions assume that the scatter is (independently) normally distributed with a variation coefficient (i.e. the ratio of the standard deviation and the mean) that is constant.

The following table gives a survey of available parameter-identification routines:
general data survival data
algorithm 1 var 2 vars 1 var 2 vars 3 vars
newton raphson nrregr nrvcregr nrregr2 scsurv scsurv2 scsurv3
nelder mead nmregr nmvcregr nmregr2 nmvcregr2 nmsurv nmsurv2 nmsurv3
genetic algorithm garegr garegr2 gasurv gasurv2 gasurv3
par-covariances pregr pvcregr pregr2 psurv psurv2 psurv3
goodness of fit ssq ssq2 dev dev2 dev3
show routines shregr shregr2 shsurv shsurv2
example script mydata_regr mydata_regr2 mydata_surv mydata_surv2 mydata_surv3

The uni-variate routines can handle multiple samples, each described by its own model, which allows for the implementation of fuzzy constraints on parameter values. Particular parameters of DEB models occur in respiration as well as in growth data; the maximum throughput rate in a chemostat is frequently known, knowledge that can be used in fitting substrate and biomass data by including this function of parameter values as a new data set.

The Newton Raphson method converges typically fast for models that are sufficiently smooth, but it has a small domain of attraction; the initial parameter estimate must be chosen within that domain for convergion. The jacobian (i.e. the matrix of derivatives of the loss function with respect to the parameters) is obtained numerially using a fixed (small) step in the parameter values. The "best" parameter values are typically rather accurate. The method of scores that is used for survival data, is a Newton Raphson method, applied to the expected values, rather than observed one.

The Nelder Mead method converges much slower, but has a much larger domain of attraction. Since it does not use derivatives it typically works well on models that are not smooth. An initial parameter estimate must still be specified in the domain of attraction to the "best" value. The "best" parameter values are typically less accurate, and it is a good idea to run a Newton Raphson method on the result of a Nelder Mead method.

The Genetic Algorithm converges slowly, and uses a stochastic rather than a deterministic algorithm to arrive at the "best" value. It does not require an initial parameter estimate, just a range in which the "best" value must be. The end result is less sensitive to local minima. It stops if a fixed number of steps does not lead to further improvement (which does not garantee that no improvement is possible). Since genetic algorithms maximize functions, the method is run on minus the weighted sum of squared deviations. It is a good idea to run a Newton Raphson method on the result of a genetic algorithm.

All methods suffer from the problem that the result can represent a local rather than a global minimum (in decreasing sensitivity of the three methods). It is always a good idea to check the result graphically (using the "show" routines that start with the string sh, and to use common sense (both for checking the goodness of fit and the parameter values). Since the user-interface of the methods is rather similar, only the instructions for the Newton Raphson method are more detailed.s


shregr

Plots model predictions and data(optional). Plot options can be set with routine shregr_options.

Input as in nrregr. The plot-range can set optionally in a vector of length 2, with the lower and upper boundaries. The columns >1 in the parameter matrix are ignored. If column 2 in the data matrix is not present, model predictions are plotted only. The colums >2 in the data matrices are ignored.

Example of use, assuming that function_name, pars, and data xy are defined properly: shregr('function_name', pars, xy), or if the user-defined function codes for two data sets, for instance shregr('function_name', pars, xy1, xy2).


shregr_options

Sets options for plotting routines shregr and shsurv. This routine sets default values which you can overwrite, once you have run shregr.

Input:

or

nrregr2

Finds parameter values in (non-linear) weighted least-squares regression problems like nrregr, but for 2 independent variables, rather than a single one. So this routine fits a surface, no a curve. See nmregr2 for the simplex method, and garegr2 for a genetic algorithm in 2 independent variables.

Options in the numerical procedure can be set by nrregr_options.

Input:

Output:

The iteration is terminated if the norm, i.e. the sum of squared derivetives of sum of squared deviations with respect to the iterated parameters, is less than the maximum norm or if the number of iterations exceeds a maximum values (see nrregr_options).

For advices see nrregr. There is no need to use the values of the second dependent variable in the user defined function. This allows the implementation of a multi-variable function in compact form, where all clumns are treated as dependent variables that have the same values for the single independent variable. Use nrregr, if the dependent variables have different values for the independent variable. Both matrices for the independent variables can have more than one column, to pass information about other variables to the user-defined function. See cnplim for an application.

Example of use, assuming that 'xvalues', 'yvalues', 'zdata', 'weights', function 'function_name' and initial paramer estimates 'ipars' are properly defined: pars = nrregr2('function_name', ipars, xvalues, yvalues, zdata, weights). See script file mydata_regr2.m for an example of specification. If progression seems hopeful, but the number of iterations not large enough, you can continue with pars = nrregr2('function_name', pars, xvalues, yvalues, zdata). Alternatively you can increase the maximum number of iterations with nrregr_options.


nmregr2

Finds parameter values in (non-linear) weighted least-squares regression problems using Nelder Mead's simplex method. The convergence is usually slow, but the domain of attraction is rather large. See nrregr2 for the definition of the user-defined function and nmregr for one independent variable with multiple samples.

Options in the numerical procedure can be set by nmregr_options.

It is usually a good idea to run nrregr2 on the result of nmregr2.


nmvcregr2

Finds parameter values in Maximum Likelihood regression problems using Nead Melder's simplex method. The standard deviation is here taken to be proportional to the mean (unlike nmregr2, where the standard deviation is assumed to be constant).

Options in the numerical procedure can be set by nmregr_options.


garegr2

Finds parameter values in (non-linear) weighted least-squares regression problems using a genetic algorithm. The (stochastic) convergence is usually slow, but it does not require an initial estimate, only a range. The process starts from an initial set of (randomly generated) trials. See nrregr2 for the definition of the user-defined function and garegr for one independent variable with multiple samples.

Options in the numerical procedure can be set by garegr_options.

It is usually a good idea to run nrregr2 on the result of garegr2.


pregr2

Calculates covariance matrix and standard deviations of parameters in regression models, like pregr, but for 2 independent variables.

Input as in nrregr2.

Output (4 items):

The elements in the covariance and correlation matrices equal zero for parameters that have code 0 in the second row of the parameter input matrix. The values are the maximum likelihood estimates in the case of a identially normally distributed scatter distribution. Therefore, no corrections for bias are made.

Example of use, assuming that function_name, pars, xvalues, yvalues, zdata, and weights are defined properly: [cov, cor, sd, ss] = pregr2('function_name', pars, xvalues, yvalues, zdata, weights) .


ssq2

Calculates weighted sum of squared deviations from model predictions.

Input as in nrregr2.

Output:

Example of use, assuming that function_name, pars, xvalues, yvalues, zdata, and weights are defined properly: ssq2('function_name', pars, xvalues, yvalues, zdata, weights).


shregr2

Plots model predictions and data (optionally). Options can be set by shregr2_options. It allows you to plot a response surface in 3 dimensions, together with your data, or sets of xz-curves or yz-curves. The data are projected on the surface, and on the curves, to show the difference between the data and model predictions. These projections help to reveal which data-point relates to which curve.

Inputs as in nrregr2; the data matrix is optional. The ranges of the two independent variables can be set optionally in two 2-vectors with upper and lower values.

Example of use, assuming that function_name, pars, xvalues, yvalues, and zdata are defined properly: shregr2('function_name', pars, xvalues, yvalues, zdata) or shregr2 ('function_name', pars, xvalues, yvalues).


shregr2_options

Sets options for plotting routine shregr2. This routine sets default values which you can overwrite, once you have run shregr2.

Input:

or

scsurv

Finds maximum likelihood estimates from survival data using the method of scores, with numerically obtained values for the jacobian. It can deal with an arbitrary number of samples, which might share one or more parameters. The convergence is usually fast, but the domain of attraction can be small, depending on data and model. See nmsurv for simplex method, and scsurv2 and nmsurv2 for 2 independent variables.

Options in the numerical procedure can be set by scsurv_options.

Input:

Output:

The iteration is terminated if the norm, i.e. the sum of squared derivetives of the deviance with respect to the iterated parameters, is less than the maximum norm or if the number of iterations exceeds a maximum values (see scsurv_options).

Example of use, assuming that data and function function_name and initial paramer estimates ipars are properly defined: pars = scsurv('function_name', ipars, data) or pars = scsurv('function_name', ipars, data1, data2) (or more data sets, depending on the definition of the model functions.). See script file mydata_surv.m for an example of specification. If progression seems hopeful, but the number of iterations not large enough, you can continue with pars = scsurv('function_name', pars, data). Alternatively you can increase the maximum number of iterations with scsurv_options.


scsurv_options

Sets options for parameter finding routine scsurv and scsurv2 and scsurv3. These routines set default value which you can overwrite. See nrregr_options for further explanation.

nmsurv

Finds maximum likelihood estimates from survival data using Nelder Mead's simplex method. The convergence is usually slow, but the domain of attraction is rather large. See scsurv for the definition of the user-defined function, and scsurv2 and nmsurv2 for 2 independent variables and scsurv3 and nmsurv3 for 3 independent variables.

Options in the numerical procedure can be set by nmsurv_options.

It is usually a good idea to run scsurv on the result of nmsurv.


nmsurv_options

Sets options for parameter finding routine nmsurv, nmsurv2 and nmsurv3. These routines set default value which you can overwrite. See nmregr_options for further explanation.

gasurv

Finds maximum likelihood estimates from survival data using a genetic algorithm. The (stochastic) convergence is usually slow, but it does not require an initial estimate, only a range. The process starts from an initial set of (randomly generated) trials. See scsurv for the definition of the user-defined function, and scsurv2, nmsurv2 and gasurv2 for 2 independent variables and scsurv3, nmsurv3 and gasurv3 for 3 independent variables. See garegr for a description of the outputs.

Options in the numerical procedure can be set by gasurv_options.

It is usually a good idea to run scsurv on the result of gasurv.


gasurv_options

Sets options for parameter finding routine gasurv, gasurv2 and gasurv3. These routines set default value which you can overwrite. See garegr_options for further explanation.

psurv

Calculates covariance matrix and standard deviations of parameters in survivor models.

Input: user-defined function name, data matrix, and parameter values, as in scsurv.

Output:

The elements in the covariance and correlation matrices equal zero for parameters that have code 0 in the second row of the parameter input matrix. The values are the maximum likelihood estimates, therefore, no corrections for bias are made.

Example of use, assuming that function_name, pars, and tn1 (and possibly more data matrices) are defined properly: [cov, cor, sd, ss] = psurv('function_name', pars, tn1, tn2, ...).


dev

Calculates the deviance: two times the difference between the log likelihood function and its maximum (which based on the multinomial distribution where the death probabilities correspond with the observed relative frequencies). The maximum likelihood estimates minimize the deviance. See dev2 for the surface-equivalent.

Input: user-defined function name, data matrix, and parameter values, as in scsurv.

Output: scalar with deviance. Identical with 4th output of psurv.

Example of use, assuming that function_name, pars, and tn1 (and possibly more data matrices) are defined properly: dev('function_name', pars, tn1, tn2, ...).


plsurv

Calculates the profile likelihood function for a parameter that is indicated with code 2 in the second column of the parameter matrix. The other parameters with positive codes are ml-estimated using the simplex method for each evaluation of the target parameter. See plsurv2 for response surfaces.

Input: user-defined function name, parameter values, and one or more data matrices as in scsurv. The last argument is a 2 or 3 -vector with the lower and upper boundaries of the indicated parameter. The value of that parameter in the parameter matrix must be within the range. The optional element 3 specifies the numer of parameter evaluations; default: 100.

Output:

Example of use, assuming that function_name, pars, tn1 (and possibly more data matrices) and Range are defined properly: plsurv('function_name', pars, tn1, tn2, ..., Range).


shsurv

Plots model predictions and data (optional). Plot options can be set with routine shsurv_options.

Input: user-defined function name, parameter values, and one or more data matrices as in scsurv. The plot-range can set optionally in a vector of length 2, with the lower and upper boundaries. The columns >1 in the parameter matrix are ignored. If column 2 in the data matrix is not present, model predictions are plotted only. The colums >2 in the data matrices are ignored.

Example of use, assuming that function_name, pars, and data tn are defined properly: shsurv('function_name', pars, tn), or if the user-defined function codes for two data sets, for instance shsurv('function_name', pars, tn1, tn2).

Set options for plotting routine shsurv with shregr_options.


scsurv2

Finds maximum likelihood estimates from survivor data like scsurv, but for an additional independent variable, rather than time only. So this routine fits a surface, no a curve.

Options in the numerical procedure can be set by scsurv_options.

Input:

Output:

The iteration is terminated if the norm, i.e. the sum of squared derivetives of the deviance with respect to the iterated parameters, is less than the maximum norm or if the number of iterations exceeds a maximum values (see scsurv_options).

Example of use, assuming that 'tvalues', 'yvalues', 'numbers', function 'function_name' and initial paramer estimates 'ipars' are properly defined: pars = scsurv2('function_name', ipars, tvalues, yvalues, numbers). See sample file 'mydata_surv2.m' for an example of specification. If progression seems hopeful, but the number of iterations not large enough, you can continue with pars = scsurv2('function_name', pars, tvalues, yvalues, numbers). Alternatively you can increase the maximum number of iterations with scsurv_options.


nmsurv2

Finds maximum likelihood estimates from survival data using Nelder Mead's simplex method. The convergence is usually slow, but the domain of attraction is rather large. See scsurv2 for the definition of the user-defined function, and scsurv and nmsurv for one independent variable (time) with multiple samples.

Options in the numerical procedure can be set by nmsurv_options.

It is usually a good idea to run scsurv2 on the result of nmsurv2.


gasurv2

Finds maximum likelihood estimates from survival data using a genetic algorithm. The (stochastic) convergence is usually slow, but it does not require an initial estimate, only a range. The process starts from an initial set of (randomly generated) trials. See scsurv2 for the definition of the user-defined function, and scsurv, nmscsurv and gasurv for one independent variable (time) with multiple samples.

Options in the numerical procedure can be set by gasurv_options.

It is usually a good idea to run scsurv2 on the result of gasurv2.


psurv2

Calculates covariance matrix and standard deviations of parameters in survivor models, like psurv, but for 2 independent variables.

Input: user-defined function name, 2 vectors of independent variables, a matrix with surviving individuals, and parameter values, as in scsurv2.

Output:

The elements in the covariance and correlation matrices equal zero for parameters that have code 0 in the second row of the parameter input matrix. The values are the maximum likelihood estimates, therefore, no corrections for bias are made.

Example of use, assuming that function_name, pars, tvalues, yvalues, numbers, are defined properly: [cov, cor, sd, ss] = pregr2('function_name', pars, tvalues, yvalues, numbers).


dev2

Calculates deviance, see dev.

Input: user-defined function name, 2 vectors of independent variables, a matrix with surviving individuals, and parameter values, as in scsurv2.

Output: scalar with the deviance. Identical with 4th output of psurv2.

Example of use, assuming that function_name, pars, tvalues, yvalues, numbers are defined properly: dev2('function_name', pars, tvalues, yvalues, numbers).


plsurv2

Calculates the profile likelihood function for a parameter that is indicated with code 2 in the second column of the parameter matrix. The other parameters with positive codes are ml-estimated using the simplex method for each evaluation of the target parameter. See plsurv for the multiple sample case.

Input: user-defined function name, parameter values, times, second dependent variable, data matrix as in scsurv2. The last argument is a 2 or 3 -vector with the lower and upper boundaries of the indicated parameter. The value of that parameter in the parameter matrix must be within the range. The optional element 3 specifies the numer of parameter evaluations; default: 100.

Output:

Example of use, assuming that function_name, pars (parameter matrix), t (time-vector), y (vector with second dependent variable), tny (matrix with numbers of survivors) and Range (2-vector with lower and upper boundary for target parameter) are defined properly: plsurv('function_name', pars, t, y, tny, Range).


Back to the DEBtool page