Machine learning is about pattern matching. Using a "training set" of publications that have been labelled as "relevant" or "irrelevant" to a given topic, a machine-learning algorithm can be trained to identify relevant publications, based on the pattern of words in the titles and/or abstracts of these publications.
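To make the idea of learning from "the pattern of words" concrete, here is a minimal sketch of a word-count classifier (a naive Bayes model with add-one smoothing). This is only an illustration: the source does not say which algorithm is used here, and the titles, labels, and function names below are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training set: (title, label) pairs labelled by a human.
training_set = [
    ("global catastrophic risk from engineered pandemics", "relevant"),
    ("existential risk and the future of humanity", "relevant"),
    ("risk of human extinction from asteroid impacts", "relevant"),
    ("crop yields under organic farming", "irrelevant"),
    ("soil nutrients and crop rotation", "irrelevant"),
    ("farming practices and bird populations", "irrelevant"),
]

def train(labelled_titles):
    """Count how often each word appears under each label."""
    word_counts = {"relevant": Counter(), "irrelevant": Counter()}
    doc_counts = Counter()
    for title, label in labelled_titles:
        word_counts[label].update(title.lower().split())
        doc_counts[label] += 1
    return word_counts, doc_counts

def predict(model, title):
    """Return the label whose word pattern best matches the title."""
    word_counts, doc_counts = model
    vocabulary = set(word_counts["relevant"]) | set(word_counts["irrelevant"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Start from the share of training publications with this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in title.lower().split():
            # Add-one smoothing so unseen words do not give a zero probability.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training_set)
label_a = predict(model, "extinction risk from pandemics")
label_b = predict(model, "crop rotation and soil")
print(label_a, label_b)
```

A real training set would contain many more publications, and a production model would likely use abstracts as well as titles.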
Machine learning is not perfect. Like humans, a machine-learning algorithm can make mistakes. We can quantify these mistakes by testing the performance of the trained algorithm on a "test set" of publications that have also been labelled as "relevant" or "irrelevant" (these must be different publications from those in the "training set"). When we do this test, we can see that there is a trade-off between "precision" and "recall". Precision is the proportion of the publications that the algorithm predicts to be relevant that are truly relevant. Recall is the proportion of the truly relevant publications that the algorithm predicts to be relevant.
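The two definitions can be written out as a short calculation. The following is a minimal sketch using a made-up test set of eight publications; the labels and the function name are assumptions for illustration only.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where True means "relevant"."""
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(1 for truth, pred in pairs if truth and pred)
    false_positives = sum(1 for truth, pred in pairs if not truth and pred)
    false_negatives = sum(1 for truth, pred in pairs if truth and not pred)
    predicted_relevant = true_positives + false_positives
    truly_relevant = true_positives + false_negatives
    precision = true_positives / predicted_relevant if predicted_relevant else 0.0
    recall = true_positives / truly_relevant if truly_relevant else 0.0
    return precision, recall

# Hypothetical test set: human labels versus the algorithm's predictions.
y_true = [True, True, True, False, False, False, False, True]
y_pred = [True, True, False, True, False, False, True, True]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example, the algorithm predicts five publications to be relevant, of which three truly are (precision 3/5), and it finds three of the four truly relevant publications (recall 3/4).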
Table 1: Trade-off between precision and recall
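| Topic | Model | Recall | Precision | Positives | True Positives |
|---|---|---|---|---|---|
| Existential risk | "High recall" | 0.9603 | 0.2309 | 973 | 225 |
| Existential risk | "Medium recall" | 0.7550 | 0.3465 | 389 | 135 |
| Existential risk | "Low recall" | 0.5033 | 0.4691 | 144 | 68 |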
By selecting a model with high recall, you will get more "false positives" (irrelevant publications that the algorithm predicts to be relevant), but you will also get fewer "false negatives" (relevant publications that the algorithm predicts to be irrelevant). By selecting a model with low recall, you will get fewer irrelevant publications to sort through (and thus you will save time), but you will also lose some relevant publications. You should select a model based on the amount of time you have and your preference for either high precision or high recall. Please note that publications are shown in order of decreasing predicted relevance. Therefore, the same publications (those with the highest predicted relevance) are shown on the first pages of the bibliographies for all models (low, medium, and high recall).
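One way to picture this trade-off is as a cut-off on a single ranked list: a lower cut-off keeps more publications (higher recall, lower precision), while the top of the list stays the same. The sketch below assumes that the models differ only in such a cut-off, which the source does not state; the scores, labels, and threshold values are all hypothetical.

```python
# Hypothetical test set: (predicted relevance score, truly relevant?),
# already sorted by decreasing predicted relevance.
scored = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.55, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, True), (0.05, False),
]

def evaluate(scored, threshold):
    """Return (precision, recall, number of positives) at a score cut-off."""
    predicted = [(score, relevant) for score, relevant in scored if score >= threshold]
    true_positives = sum(1 for _, relevant in predicted if relevant)
    total_relevant = sum(1 for _, relevant in scored if relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall, len(predicted)

# Hypothetical cut-offs; a lower cut-off gives a higher-recall model.
thresholds = {"low recall": 0.85, "medium recall": 0.50, "high recall": 0.08}
results = {name: evaluate(scored, cutoff) for name, cutoff in thresholds.items()}

for name, (precision, recall, positives) in results.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} positives={positives}")
```

Because every cut-off is applied to the same ranking, the two highest-scoring publications appear first in all three results, which mirrors the note above about the first pages of the bibliographies.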
In Table 1, the number of publications that are likely to be truly relevant is in the column called "True Positives" (the last column). Thus, in the first row in the table, the model predicts that 973 publications are relevant ("Positives"). Of these publications, 225 are likely to be truly relevant, based on the precision of the model (true positives = positives × precision). However, this is only an estimate, based on the performance of this model on the test set. This model is not likely to perform identically on the test set and the new set of unassessed publications.
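The arithmetic behind the "True Positives" column can be checked directly. This sketch uses the values from Table 1 and assumes that the third and fourth columns are recall and precision, respectively, which is consistent with the model names and with the formula above; the variable names are illustrative.

```python
# (model, recall, precision, positives), copied from Table 1.
models = [
    ("High recall", 0.9603, 0.2309, 973),
    ("Medium recall", 0.7550, 0.3465, 389),
    ("Low recall", 0.5033, 0.4691, 144),
]

estimates = {}
for name, recall, precision, positives in models:
    # Estimated true positives = positives x precision, rounded to whole publications.
    true_positives = round(positives * precision)
    # The remaining predicted-relevant publications are estimated false positives.
    false_positives = positives - true_positives
    estimates[name] = (true_positives, false_positives)
    print(f"{name}: ~{true_positives} truly relevant, ~{false_positives} irrelevant")
```

The rounded results match the last column of the table (225, 135, and 68), and they also show the cost of high recall: roughly 748 of the 973 publications in the high-recall bibliography are expected to be irrelevant.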