AmPeps: AmP Entry Prepare System

This page presents guidance for how to prepare an entry for Add-my-Pet (AmP). An AmP entry has 4 source files:
  1. mydata_my_pet.m: Matlab function that sets 5 Matlab structures (data, auxData, metaData, txtData, weights) from scratch
  2. pars_init_my_pet.m: Matlab function that sets 3 Matlab structures (par, metaPar, txtPar) from metaData
  3. predict_my_pet.m: Matlab function that sets 1 Matlab structure (prdData) and a boolean (info) from (par, data, auxData)
  4. run_my_pet.m: Matlab script that runs a checking procedure and the parameter estimation procedure.
AmPeps does not include the parameter estimation itself, only the procedure that prepares for it by creating the 4 source files, though not necessarily with the correct parameter values. The 4 files are all you need to start the estimation procedure, which is meant to arrive at the correct parameter values, starting from the initial values as specified in the pars_init file. If the estimation procedure has produced an acceptable result, you can copy these values back into the pars_init file with function mat2pars_init. Only then are the source files ready for submission to an AmP curator. After checking, he/she will pass the entry to the uploader, who adds the species to the taxonomic tree of AmP and the entry to the collection. In this addition step, many implied traits are evaluated and reported as part of the collection on the web. An essential element is that the parameter values and implied traits are added to AmPdata, a large structure that can be downloaded from the site (see dropdown COLLECTION), so that the traits of the entry can be compared with those of the many other AmP species with AmPtool.

AmPeps has 2 phases: the initiation phase and the post-editing phase.

AmPeps initiation phase

The initiation phase writes a proposal for the 4 AmP source files, using Matlab function AmPeps; these 4 files might still require post-editing. They are not written directly, but via structures data, auxData, metaData and txtData, which are produced by AmPgui. Start Matlab and first cd to a directory where you want to create the 4 source files. Make sure that the current directory has no files named "results_*". Type AmPeps in the Matlab command window; it first opens this page and the AmPeco-page for guidance, shows 4 figures that help in completing eco-codes and starts AmPgui.
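The requirement that the working directory holds no "results_*" files can be checked before typing AmPeps. The lines below are a minimal sketch that uses only standard Matlab functions; they are not part of AmPeps, and the printed messages are illustrative.

```matlab
% Sketch: check the current directory for leftover results_* files
workDir = pwd;                                    % directory where the 4 source files will be written
leftovers = dir(fullfile(workDir, 'results_*'));  % struct array; empty if nothing matches
if isempty(leftovers)
    fprintf('No results_* files in %s; the directory is ready for AmPeps.\n', workDir);
else
    fprintf('Found %d results_* file(s); move or remove them first:\n', numel(leftovers));
    fprintf('  %s\n', leftovers.name);            % one line per file name
end
```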

AmPgui: the graphical user interface of AmP

During the AmPgui session you will see various dialog windows to edit; proceed by hitting the "OK" button and end the session via save/pause. While AmPgui is running, clicking on other windows to bring them to the front is blocked, but you can start other processes from the Windows taskbar or start-button, e.g. to copy information for pasting in AmPgui. The main dialog window cannot be moved on the screen, but the others can. AmPgui allows you to take a pause (even followed by quitting Matlab) and resume the session later. This pause-action stores a results_my_pet.mat file in the current directory (with "my_pet" replaced by your species), and the resume-action loads it again, on the assumption that the current directory has no other "results_" files (since the name of your species is not known to AmPeps when resuming). You might use pause every now and then to update results_my_pet.mat as a backup and just continue the AmPgui-session.

The main dialog of AmPgui shows a font colour for each topic: green means OK, red means that further editing is required, black means that editing is facultative. The topic "discussion" shows black at the start, when there are no discussion points (discussion points are generally facultative), but if you add a data set for males, for instance, it can become red, since you are supposed to add a discussion point that explains in which parameters males are supposed to differ from females. AmPgui does not read the discussion points, so it does not know whether one of them actually deals with this topic. It is important to realise that these colour indicators only reflect internal consistency, not, e.g., whether the data contain enough information to estimate the parameters after AmPeps has done its job. You should specify at least one data set to proceed with AmPeps, but a single data set only allows the estimation of very few parameters, if any.

The "author"-field and all "bibkey"-fields allow multiple items: separate them using ",".

You can edit the various fields in any sequence, but the preferred sequence is: start with "species" (for reasons explained below), then "links" (because the sites can help you find out where the species occurs, which you need to complete "ecoCode"), and edit "biblist" last, after entering all "facts" and "data" (when all bibkeys are known).

species

AmPgui assumes that you start by filling the species-field. It first checks whether your species is in the Catalog of Life website; if not, it opens this site for selection of the proper name. Copy the proper name of the species into the species-field of AmPgui. It then checks whether the species is already in AmP; if so, it assumes that you want to prepare a new modification and you are asked to exit AmPgui and proceed to the post-editing phase. Any data that has been filled in during this AmPgui-session is then lost, which is why you had better start by filling in the name of the species. If the species-name is accepted and not yet present in AmP, AmPgui fills the taxonomic relationships and the common name (if present in CoL) automatically. This might all take a while; an OK button appears when the process is completed.

ecoCode

The naming of the codes is standardized and given on the page AmPeco, which is why this page was already opened by AmPeps. Since details matter, the codes are filled from selection lists; you can make multiple selections by pressing ctrl. The codes for habitat and food require a stage-indicator; AmPgui will ask for it, so please watch the Matlab window for communication. The fields climate and ecozone can only be filled when you know where the species occurs. This is the reason why the websites were opened; the wiki-page typically provides the required info.

grp

If your entry has several uni-variate data sets with identical x- and y-labels, they can be grouped in figures that compare data with predictions. The choice of the colors is from high to low in the lava-color scheme: from white to black, via red and blue. So choose first females, then males, or for different temperatures from high to low temperatures.
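As an illustration of how such a group ends up in a mydata-file, the lines below are a sketch modelled on existing AmP entries. The data set names tL_f and tL_m are hypothetical, and the field names sets and comment of metaData.grp are assumptions to be checked against a current entry. Females are listed first, so they receive the colour that comes first in the scheme.

```matlab
% Sketch: group two uni-variate data sets with identical labels in one figure
set1 = {'tL_f', 'tL_m'};                  % hypothetical names: females first, then males
comment1 = {'Data for females, males'};   % caption for the grouped figure
metaData.grp.sets = {set1};               % one cell per group
metaData.grp.comment = {comment1};        % one comment per group
```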

discussion

Discussion points relate to general remarks you might have about the entry; remarks that relate to particular data are supposed to be mentioned in the comment field for that data. Since you might feel the need for a discussion during the parameter estimation session (after AmPeps is completed), the mydata-file proposed by AmPeps will have at least one discussion field. Bibkeys for discussion points are facultative.

facts

Facts are supposed to present valuable information from the literature about the species, which is why each fact needs to have a bibkey. If no facts are filled, AmPeps will remove the facts-field after leaving AmPgui, and you will not find it in the proposed mydata-file.

links

You are asked to complete identification-fields for a number of websites that relate to your species (so not the whole address). These fields are presented as part of the address-field of the browser (the top line), and are typically a species-name (watch the exact spelling, including "_" or "-") or a number. The AmP-supported websites are classified as general or taxon-specific and are already opened by AmPgui (using the classification of your species), and the ids are described. So you only need to use the search-field on those pages to find the page of your species.

biblist

AmP uses the BibTeX format in the section "biblist". Each item in a biblist is linked to a bibkey, and each data set is also linked to a bibkey. The naming of the bibkeys is standardized: take at most 4 characters of the name of the first author, at most 4 again for the second author, if present (no more than 2 authors are used), followed by 4 digits for the year. During the AmPgui session the bibitems in the biblist are structured, but this structure is flattened by AmPeps upon leaving AmPgui. You don't need to know this, but it is one of the reasons why you cannot copy a results_my_pet.mat file of an AmP species into your current directory and hit "resume". Make sure that all data sets have a bibkey, and that all bibkeys have an associated bibitem in the references. The font-color in the biblist shows whether some bibitems are still missing; also watch the screen-output in the Matlab-window.
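The naming rule for bibkeys can be sketched in a few lines of Matlab. This is an illustration of the rule as described above, not code taken from AmPgui; the author names are hypothetical.

```matlab
% Sketch: compose a bibkey from author surnames and a year
authors = {'Janssen', 'Kooijman', 'Lika'};  % hypothetical surnames, first author first
year = 2020;
nAuthors = min(numel(authors), 2);          % no more than 2 authors contribute
bibkey = '';
for i = 1:nAuthors
    name = authors{i};
    bibkey = [bibkey, name(1:min(4, numel(name)))];  % at most 4 characters per author
end
bibkey = sprintf('%s%04d', bibkey, year);   % append the 4-digit year
disp(bibkey)
```

With a single author the same lines give a key such as Jans2020.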

BibTeX converts titles of the bibtype "Article" to lowercase, except the first letter. Protect letters from this by placing curly brackets around them, like "{L}ineus". Avoid nesting. Scientific names should be set in italics, like in "\emph{Passer domesticus}".

data

If you measured the data yourself in 2020, and your name is e.g. "Janssen", choose bibkey Jans2020, select bibtype "Misc" in editing biblist, and fill in your name, the year, and a note field where you specify that you measured the data yourself. If, on the other hand, particular data is required to estimate some parameters but is unknown, guess it, use bibkey "guess" and explain in a comment for that data set on what basis. If a data set involves time (e.g. all rates), it requires auxData of the type "temp" (standing for temperature). The GUI presents a field for it. It might well be that the real temperature is unknown, in which case you need to guess it and mention that in a comment for that data set.
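A guessed data point with a guessed temperature might look as follows in a mydata-file. This is a sketch modelled on existing entries: the numerical values are invented for illustration, and the way the fields are packed into auxData and txtData is an assumption to be checked against a current entry.

```matlab
% Sketch: a zero-variate data point that involves time, with guessed value and temperature
data.ab = 5;             units.ab = 'd';        label.ab = 'age at birth';
bibkey.ab = 'guess';                            % no literature source available
temp.ab = 20 + 273.15;   units.temp.ab = 'K';   label.temp.ab = 'temperature';  % 20 C in K
comment.ab = 'value and temperature are guessed, based on related species';

% pack the fields into the structures that the mydata function returns
auxData.temp = temp;
txtData.units = units;   txtData.label = label;
txtData.bibkey = bibkey; txtData.comment = comment;
```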

Data sets might have associated auxData (other than of the type temperature), required for the computation of the prediction of that data set (for instance that the reproduction rate is for a particular length or weight, rather than for ultimate size). Since AmPeps writes a proposal for the predict-file (after leaving AmPgui), it is best to add such auxData in the post-editing phase, since this is hard to standardize and requires post-editing of the predict-file as well.

Most uni-variate data in AmP were extracted from graphs in pdfs, which were copied from the pdf using the Windows Snipping Tool (under Windows Accessories), which places the image on the clipboard, from where it can be pasted into Jorn Bruggeman's PlotReader for digitisation. (Select all data in the data-window, copy them with ctrl-c, select fixed-point, English, and 3 decimals, and paste them in the AmPgui window with ctrl-v.) Use the comment field for the data as much as possible, e.g. to enlarge on the accuracy of the data.

COMPLETE

This indicator for the level of completeness of the data, a number between 0 and 10, has to be estimated based on the criteria that are listed in the window. Some data types of lower levels are typically not available, others might be guessed, so we subtract "punish" points and interpolate. Many entries have COMPLETE = 2.5.

After AmPgui in AmPeps

If all required fields are filled and AmPgui is closed via "save/pause", AmPeps does some extra checks and might re-open AmPgui; otherwise it edits the structures and prints the mydata- and run-files. It copies the pars, txtPar and metaPar structures of the most related AmP species, as identified by AmPtool function clade, to write the pars_init file after editing the species name. The model type is read from metaPar and determines which code is selected from the structure DEBtool_M/lib/pet/prdCode.mat for the specification of the predict-file. After that, the auxiliary parameters of the pars_init file are edited to match the needs of the predict-file. During the process, it checks and completes fields, all automatically after exiting AmPgui.

AmPeps opens the resulting 4 source-files in the Matlab editor for post-editing as the final action of the Matlab function AmPeps. Consult source files of other entries for inspiration for further editing, such as adding data types that were not included in AmPgui. See further "Starting an estimation" on DEBwiki.

If you already have a mydata file, but not yet the other 3 source files, you can still use AmPimport('mydata_my_pet'), with "my_pet" replaced by the name of your species, to edit the entry further and proceed to writing the 4 source-files. Exit AmPgui via "save/pause".

AmPeps post-editing phase

The post-editing phase uses the Matlab editor. This phase has fewer restrictions and allows, e.g., the addition of new data types in the mydata-file, which also requires code for the computation in the predict-file and, possibly, the addition of new parameters in the pars_init-file. The initiation phase assumes that data are in the preferred units (d, cm, g, J, K). If a data set actually uses e.g. years, rather than days, you need to convert to days in this post-editing step, by e.g. data.tWw(:,1) = data.tWw(:,1) * 365;. Notice that all parameters in the pars_init file use the preferred units, and so does the predict-file. If, for some reason, it is important to deviate from this choice, and the mydata-file uses different units, you need to edit the predict-file. The units that the predict-file uses are specified in the comment for each assignment of variables. If AmPeps was unable to specify the required code in the predict-file for certain data sets, as specified in the mydata-file, the missing specification is identified, and you have to add the code in the predict-file. Watch the red colors in the Matlab-editor for the need for further editing.
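A slightly fuller sketch of such a conversion in the mydata-file is given below. The data set tL and its numbers are hypothetical; the conversion factors (365 d per yr, 10 mm per cm) are the only substance.

```matlab
% Sketch: convert a digitized data set from (yr, mm) to the preferred units (d, cm)
data.tL = [0.5 25; 1.0 40; 2.0 55];    % hypothetical: time since birth (yr), length (mm)
data.tL(:,1) = data.tL(:,1) * 365;     % yr -> d
data.tL(:,2) = data.tL(:,2) / 10;      % mm -> cm
units.tL = {'d', 'cm'};                % units after conversion
label.tL = {'time since birth', 'length'};
```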

The far-reaching automation of AmPeps was only possible because DEB notation is well standardized; the code follows this notation as closely as possible. Notice, for instance, that ap stands for "age at puberty" and tp for "time since birth at puberty". The difference is obviously age at birth: ab = ap - tp. AmPeps only uses tp, not ap, so that uncertainty about a delay in the onset of development, which affects ab, does not also affect the prediction for ap. AmPeps assumes that the leading name of uni-variate data sets identifies the type of data. So tWw is an (n,2)-array with the interpretation of time (in d) and wet weight (in g) for the columns. If more than one data set has this structure, append "_something", e.g. tWw_f for females and tWw_m for males. Males are always given "_m" in names of variables; females have "_f" facultatively, and are taken to be the default sex. Notice also that Lw is used in the predict-file to specify physical length, as opposed to L for structural length. Since the latter cannot be measured directly, the mydata-file uses L for physical length: one of the many differences between the real (in mydata) and the fantasy (in predict) worlds. It pays to keep them well separated.
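The naming convention for uni-variate data sets can be illustrated by splitting names at the first underscore. The lines below are a sketch of the convention only, not the code that AmPeps uses; the list of names is hypothetical.

```matlab
% Sketch: split data set names into the leading name (data type) and an optional suffix
names = {'tWw', 'tWw_f', 'tWw_m', 'tL_m'};   % hypothetical data set names
leads = cell(size(names));                   % leading names, e.g. 'tWw'
isMale = false(size(names));                 % true if the name ends in '_m'
for i = 1:numel(names)
    [leads{i}, suffix] = strtok(names{i}, '_');  % suffix is '' when there is no underscore
    isMale(i) = strcmp(suffix, '_m');            % anything else counts as the default sex
    fprintf('%-6s type: %-4s male: %d\n', names{i}, leads{i}, isMale(i));
end
```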

The logical structure of AmPeps for its interplay between structures and code.