JAABA's ground-truthing mode quantitatively measures the classifier's performance, defined as its accuracy at predicting the behavior class for frames it was not trained on. To do this, the user manually labels a random selection of frames without looking at the classifier's predictions on these frames. These labels serve as the ground truth against which the classifier's predictions are compared. Note that these labels are not added to the training set.
To run JAABA in ground-truthing mode, select Open in Ground-Truthing Mode from the File menu. This can only be done when no JAABA project is currently open. Then, select the JAABA project for which you want to perform ground-truthing.
Next, you will need to modify the experiment list. To do this, select the menu File->Change Experiment List, then add the experiment directories you want to use for ground-truthing. Where possible, we recommend that you do not ground-truth on experiments you trained the classifier on. Ground-truthing on new experiments gives you the best sense of how your classifier will generalize to videos it has never seen.
To get an accurate estimate of JAABA's performance, we suggest using an unbiased method to choose frames to label. JAABA provides several methods to select frames for labeling that are all available under the menu View->Suggest Ground-Truthing Intervals:
The suggested intervals are shown by a cyan line below the manual label timeline.
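To illustrate what an unbiased choice of frames can look like, here is a minimal Python sketch that draws fixed-length intervals uniformly at random over all trajectories. It is not a description of JAABA's own selection methods. The function name, the parameters, and the 0-based frame indexing are all assumptions made for this example.

```python
import random

def suggest_random_intervals(trajectory_lengths, interval_length, n_intervals, seed=None):
    """Draw intervals uniformly at random over all valid (trajectory, start) pairs.

    trajectory_lengths: dict mapping a trajectory id to its number of frames
                        (hypothetical input format, frames indexed from 0).
    Returns a list of (trajectory_id, first_frame, last_frame) tuples.
    """
    rng = random.Random(seed)
    # Number of start frames at which a full interval still fits in the trajectory.
    valid_starts = {
        tid: n - interval_length + 1
        for tid, n in trajectory_lengths.items()
        if n >= interval_length
    }
    if not valid_starts:
        raise ValueError("no trajectory is long enough for the requested interval length")

    ids = list(valid_starts)
    weights = [valid_starts[tid] for tid in ids]
    intervals = []
    for _ in range(n_intervals):
        # Weighting by the number of valid starts makes every start frame equally likely.
        tid = rng.choices(ids, weights=weights, k=1)[0]
        start = rng.randrange(valid_starts[tid])
        intervals.append((tid, start, start + interval_length - 1))
    return intervals
```

Intervals drawn this way may overlap. A real selection scheme might reject overlapping draws or weight the sampling differently.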
In ground-truthing mode, you label fly trajectories just as you would in the normal training mode.
We recommend performing ground-truthing in Advanced Mode (select this under the Edit menu). In Advanced Mode, JAABA provides five buttons to label behavior: Important [Behavior], [Behavior], Important None, None, and Unknown.
Frames that are clearly the behavior or clearly not the behavior should be labeled as Important [Behavior] and Important None, respectively. A high error rate on these important frames would make you reluctant to use the classifier.
Frames in which the behavior or not-behavior is somewhat present but borderline should be labeled as [Behavior] and None, respectively. These are frames for which the user is not confident in the behavior class, or for which there are other reasons the classifier might find the frame difficult (e.g., tracking is poor for that frame). The user is less demanding about the classifier's performance on these frames and would be satisfied if the classifier got them mostly right.
If JAABA is started in normal ground-truthing mode, only three buttons are provided: [Behavior], None, and Unknown. Frames labeled as [Behavior] and None are treated as Important [Behavior] and Important None, respectively, when reporting the ground-truth performance.
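The label handling described above can be sketched as a small lookup. This is an illustrative Python sketch, not JAABA's code. The plain-string label names and the function name are hypothetical, and leaving Unknown unchanged is an assumption, since the text does not say how Unknown frames are reported.

```python
# Hypothetical label names, written as plain strings for illustration.
NORMAL_TO_IMPORTANT = {
    "Behavior": "Important Behavior",
    "None": "Important None",
}

def label_for_reporting(label, advanced_mode):
    """Return the label category used when reporting ground-truth performance."""
    if advanced_mode:
        # Advanced Mode labels already distinguish important from borderline frames.
        return label
    # Normal mode: Behavior and None count as their Important counterparts.
    # Any other label (e.g. Unknown) is passed through unchanged (assumption).
    return NORMAL_TO_IMPORTANT.get(label, label)
```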
After labeling ground truth, the user can measure the classifier's accuracy on the ground-truth labels by selecting the menu item Classifier->Compute Ground Truth Performance. If a classifier is loaded, JAABA uses the classifier's predictions to compute the performance. If scores are loaded, JAABA uses the scores. If both scores and a classifier are loaded, JAABA uses the scores. JAABA then returns a table showing the types of errors made.
The columns of the table correspond to the classifier's predictions, and the rows correspond to the manual labels. Each element of the table gives the number and, in parentheses, the percentage of frames with the given type of manual label that received the given prediction. Percentages are computed over rows.
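The precedence between scores and classifier described above amounts to a short decision rule, sketched here in Python. The function name and return values are hypothetical. Returning `None` when neither is loaded is an assumption, since the text does not describe that case.

```python
def prediction_source(scores_loaded, classifier_loaded):
    """Pick which predictions are compared against the ground-truth labels."""
    if scores_loaded:
        # Scores take precedence, whether or not a classifier is also loaded.
        return "scores"
    if classifier_loaded:
        return "classifier"
    # Neither is loaded: nothing to evaluate (assumed behavior, not from the text).
    return None
```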
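As a rough illustration of how such a table can be computed, the following Python sketch cross-tabulates manual labels against predictions and normalizes the percentages over each row. It is not JAABA's implementation. The function name and the string label values in the usage example are assumptions, and the row and column names are taken from whatever values appear in the input.

```python
from collections import Counter

def ground_truth_table(manual_labels, predictions):
    """Cross-tabulate manual labels (rows) against predictions (columns).

    Returns {row: {column: (count, percent_of_row)}}.
    """
    if len(manual_labels) != len(predictions):
        raise ValueError("manual_labels and predictions must have the same length")

    counts = Counter(zip(manual_labels, predictions))
    rows = sorted(set(manual_labels))
    cols = sorted(set(predictions))

    table = {}
    for row in rows:
        # Each row comes from at least one labeled frame, so the total is never zero.
        row_total = sum(counts[(row, col)] for col in cols)
        table[row] = {
            col: (counts[(row, col)], 100.0 * counts[(row, col)] / row_total)
            for col in cols
        }
    return table

# Usage with made-up labels: four frames labeled "Important Behavior", two labeled "None".
manual = ["Important Behavior"] * 4 + ["None"] * 2
predicted = ["Behavior", "Behavior", "Behavior", "None", "None", "Behavior"]
example = ground_truth_table(manual, predicted)
```

Because percentages are taken over rows, each row reads as "of the frames the user labeled this way, what fraction received each prediction".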
The columns are:
The rows are:
By default, the classifier's predictions are hidden so that they don't bias the user's labeling. If the user wants to see the predictions (e.g., to make sure that the correct files are loaded), they can select View->Show Predictions.