JAABA: Ground-Truthing

JAABA's ground-truthing mode quantitatively measures the classifier's performance, defined as its accuracy at predicting the behavior class for frames it was not trained on. To do this, the user manually labels a random selection of frames without looking at the classifier's predictions on these frames. These labels are used as the ground truth against which the classifier's predictions are compared. Note that these labels are not added to the training set.

Preparing Data for Ground-Truthing
- Running JAABA in Ground-Truthing Mode
- Setting Videos to Ground-Truth
Selecting Frames to Ground Truth
- Navigating to Suggestions
Labeling in Ground-Truthing Mode
Ground-Truth Performance
Saving Labels

Preparing Data for Ground-Truthing

Running JAABA in Ground-Truthing Mode

To run JAABA in ground-truthing mode, select Open in Ground-Truthing Mode from the File menu. This can only be done when no JAABA project is currently open. Then, select the JAABA project for which you want to perform ground-truthing.

Setting Videos to Ground-Truth

Next you will need to modify the experiment list. To do this, select the menu File->Change Experiment List, then add the experiment directories you want to use for ground-truthing. In most cases, we recommend that you do not ground-truth on experiments you trained the classifier on, if it can be avoided. This will give you the best sense of how your classifier will generalize to videos it has never seen.

Selecting Frames to Ground Truth

To get an accurate estimate of JAABA's performance, we suggest using an unbiased method to choose frames to label. JAABA provides several methods to select frames for labeling that are all available under the menu View->Suggest Ground-Truthing Intervals:

Random: The user can ask JAABA to suggest random intervals to label by selecting View->Ground Truth Suggestion -> Random. The user is prompted for the size of the interval and the number of such intervals. Then, JAABA randomly selects such intervals for the user to label.
Balanced Random: For behaviors that occur rarely, randomly selected frames will contain very few positive frames to label. Thus, the number of frames that the user needs to label to obtain an accurate estimate of the false negative rate will be quite large. In such cases, users can select View->Ground Truth Suggestion -> Balanced Random. As with Random, this will ask the user for the size of the interval and the number of intervals per experiment. However, in Balanced Random, JAABA selects intervals to label randomly, but increases the weight of intervals that contain predicted positive frames. The weight of a given interval is the sum of the weights of its constituent frames, and the weights of individual frames are set so that the total weight of predicted positive and predicted negative examples is the same. To use this method of suggestion, the scores must already be computed and imported for all the movies.

The suggested intervals are shown by a cyan line below the manual label timeline.
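
The Balanced Random weighting described above can be sketched as follows. This is a minimal illustration, not JAABA's actual implementation: the function names, the use of non-overlapping candidate windows, the normalization of each class to a total weight of 0.5, and sampling without replacement are all assumptions made for the example.

```python
import random


def frame_weights(predicted_positive):
    """Per-frame weights chosen so that predicted-positive and
    predicted-negative frames carry the same total weight.
    (Normalizing each class to 0.5 is an assumption.)"""
    n_pos = sum(1 for p in predicted_positive if p)
    n_neg = len(predicted_positive) - n_pos
    w_pos = 0.5 / n_pos if n_pos else 0.0
    w_neg = 0.5 / n_neg if n_neg else 0.0
    return [w_pos if p else w_neg for p in predicted_positive]


def interval_weights(predicted_positive, interval_size):
    """Weight of each candidate interval = sum of its frames' weights.
    Candidates are non-overlapping windows (an assumption)."""
    weights = frame_weights(predicted_positive)
    starts = range(0, len(weights) - interval_size + 1, interval_size)
    return [(s, sum(weights[s:s + interval_size])) for s in starts]


def suggest_balanced_random(predicted_positive, interval_size, n_intervals, seed=None):
    """Draw up to n_intervals intervals, with probability proportional
    to interval weight, without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    candidates = interval_weights(predicted_positive, interval_size)
    chosen = []
    while candidates and len(chosen) < n_intervals:
        weights = [w for _, w in candidates]
        if sum(weights) <= 0:
            break
        i = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        start, _ = candidates.pop(i)
        chosen.append((start, start + interval_size - 1))
    return sorted(chosen)


# Example: a rare behavior predicted on only the last 10 of 100 frames.
predictions = [False] * 90 + [True] * 10
print(suggest_balanced_random(predictions, interval_size=10, n_intervals=3, seed=0))
```

In this example the single window containing predicted positive frames has weight 0.5, the same as the nine predicted-negative windows combined, so it is far more likely to be suggested than under plain Random selection.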

Navigating to Suggestions

To navigate to the suggested frames to label:

Use the Switch Target interface to find animals and videos for which some frames have been suggested. The Switch Target interface gives information about the number of frames that have been suggested by JAABA for each animal.
Under Go->Navigation Preferences, set Shift+Arrows to jump to Ground Truth Suggestions. Once this is done, Shift+Right and Shift+Left will jump to the next and previous suggested intervals to label.

Labeling in Ground-Truthing Mode

In ground-truthing mode, you can label trajectories for flies as you would in the normal training mode. These labels are used as the ground truth against which the classifier's predictions are compared.

We recommend performing ground-truthing in Advanced Mode (select this under the Edit menu). In Advanced Mode, JAABA provides five buttons to label behavior: Important [Behavior], [Behavior], Important None, None and Unknown.

Frames that are clearly the behavior or clearly not the behavior should be labeled as Important [Behavior] and Important None, respectively. A high error rate on these important frames would cause you to be reluctant to use the classifier.

Frames in which the behavior or not-behavior is somewhat present but that are borderline should be labeled as [Behavior] and None. These are frames for which either the user is not confident in the behavior class or there are other reasons the classifier might find the frame difficult (e.g. tracking is poor for that frame). The user is less demanding about the classifier's performance on these frames and would be satisfied if the classifier got them mostly right.

If JAABA is started in normal ground-truthing mode, only 3 buttons are provided: [Behavior], None and Unknown. Frames labeled as [Behavior] and None are treated as Important [Behavior] and Important None while reporting the ground-truth performance.

Ground-Truth Performance

After labeling ground truth, the user can get a measure of the classifier's accuracy on the ground-truth labels by selecting the menu item Classifier-> Compute Ground Truth Performance. If a classifier is loaded then JAABA will use the classifier's predictions to compute the performance. If the scores are loaded then they are used to compute the performance. If both scores and classifier are loaded in, then JAABA uses scores to compute the performance. JAABA will return a table showing the types of errors made.

Screen capture of the ground-truth accuracy table

The columns of the table correspond to the classifier's predictions, and the rows correspond to the manual labels. Each element of the table gives the number and (percent) of frames with the given type of manual label that have the given prediction. Percentages are computed over rows.

The columns are:

[Behavior] Predicted: Frames predicted as the project behavior, Chase in the example table.
Not Predicted: Depending on the classifier parameters set, some frames may not receive a prediction. This happens for frames whose scores lie between the score thresholds that are set using Classifier -> Set confidence thresholds. Both thresholds are 0 by default, so there is a prediction for each frame and the middle column should be all zeros.
None Predicted: Frames predicted as not the behavior.

The rows are:

[Behavior] Important: Frames that were labeled as important examples of the behavior (Chase in the example table).
[Behavior]: Frames that were labeled as the behavior (e.g. Chase), regardless of their importance.
None Important: Frames that were labeled as important examples of not the behavior.
None All: Frames that were labeled as not the behavior, regardless of their importance.
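
The table layout described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not JAABA's code: the function and key names are hypothetical, positive scores are assumed to indicate the behavior, the two confidence thresholds are modeled as a lower and an upper score bound, and a score exactly equal to a threshold is assumed to count as a prediction.

```python
COLUMNS = ("behavior_predicted", "not_predicted", "none_predicted")
ROWS = ("behavior_important", "behavior_all", "none_important", "none_all")


def column_for(score, low=0.0, high=0.0):
    """Map a classifier score to a table column.
    Scores strictly between the thresholds get no prediction."""
    if low < score < high:
        return "not_predicted"
    return "behavior_predicted" if score >= high else "none_predicted"


def performance_table(labels, scores, low=0.0, high=0.0):
    """labels: (is_behavior, is_important) per ground-truth frame.
    Returns {row: {column: (count, percent_of_row)}}."""
    counts = {row: dict.fromkeys(COLUMNS, 0) for row in ROWS}
    for (is_behavior, is_important), score in zip(labels, scores):
        col = column_for(score, low, high)
        prefix = "behavior" if is_behavior else "none"
        # The "all" rows count every labeled frame, regardless of importance.
        counts[prefix + "_all"][col] += 1
        if is_important:
            counts[prefix + "_important"][col] += 1
    table = {}
    for row, cells in counts.items():
        total = sum(cells.values())
        # Percentages are computed over rows.
        table[row] = {
            col: (n, 100.0 * n / total if total else 0.0)
            for col, n in cells.items()
        }
    return table


# Four labeled frames: (is_behavior, is_important) and made-up scores.
labels = [(True, True), (True, False), (False, True), (False, True)]
scores = [1.2, -0.3, -2.0, 0.5]
for row, cells in performance_table(labels, scores).items():
    print(row, cells)
```

For labels made in normal (non-Advanced) mode, every frame would be passed with `is_important=True`, since those labels are treated as important when performance is reported.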

Viewing Predictions

By default, the classifier's predictions are hidden so that they don't bias the user's labeling. If the user wants to see the predictions for some reason (e.g. to make sure that the correct files are loaded), they can select View -> Show Predictions.

Saving Labels

The user can save the ground-truth labels to the current JAABA project by selecting the menu item File->Save. Note that this will not affect any labels used for training the classifier, as both the experiments and labels for ground-truthing are stored in a separate field from those used for training.
