The Existential Risk Research Assessment (TERRA)

The problem: an overwhelming volume of research

An overwhelming volume of research has been published, and more is being published all the time. It is taking an increasingly long time to find all of the publications that are relevant to our research. We need new methods of efficiently searching for relevant publications, and we need these methods to be systematic, to minimize bias in the publications that we read and the conclusions that we reach [1].

The solution: a semi-automated system for finding relevant research

This system uses volunteers from The Existential Risk Research Network to identify relevant publications in a set of search results, and then it uses machine learning to identify similar publications in new sets of search results. It is a "recommender system" or "recommendation engine".

1. Humans do some of the work

We search a database [2] for publications that might be relevant to existential risk. The results of this search are a "corpus" of publications that might be relevant. Using the titles and/or abstracts of the publications in this corpus, we assess the relevance of each publication, labelling it as "relevant" or "irrelevant".
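As a rough illustration of this labelling step, a corpus can be thought of as a list of records that each carry a title, an optional abstract, and a relevance label that starts out empty. The sketch below is only an assumption about how such records might look; the names Publication, label, and unassessed are hypothetical, and the two example titles are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Publication:
    """One search result awaiting assessment (hypothetical record layout)."""
    title: str
    abstract: str = ""
    relevant: Optional[bool] = None  # None means "not yet assessed"

    def text(self) -> str:
        # Text shown to the assessor: title and/or abstract, whichever exist.
        return " ".join(part for part in (self.title, self.abstract) if part)


def label(pub: Publication, relevant: bool) -> Publication:
    """Record a human decision: True = relevant, False = irrelevant."""
    pub.relevant = relevant
    return pub


def unassessed(corpus: List[Publication]) -> List[Publication]:
    """Publications that no one has labelled yet."""
    return [pub for pub in corpus if pub.relevant is None]


# Invented example corpus, for illustration only.
corpus = [
    Publication("Global catastrophic risks from engineered pandemics"),
    Publication("A survey of garden snail habitats", "We count snails."),
]
label(corpus[0], True)
remaining = unassessed(corpus)
```

Keeping "not yet assessed" distinct from "irrelevant" matters here, because only publications that a human has actually labelled should be used to train the algorithm.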

2. Machines do some of the work

We "train" a machine-learning algorithm to identify relevant publications in the corpus, by telling it which publications are relevant and which are not, and giving it the titles and/or abstracts of these publications [3]. We then set up an automated and regularly scheduled search for new publications, using the same search strategy [4] as we used before, and the trained algorithm tells us which of the new publications are likely to be relevant.
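To make the idea of "training" concrete, here is a minimal sketch of a text classifier written in plain Python. It uses a naive Bayes model with add-one smoothing, which is our choice for illustration and not necessarily the algorithm that this system uses. The class name NaiveBayes, the tokenize helper, and the four training titles are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Illustrative two-class naive Bayes text classifier (an assumption)."""

    def fit(self, texts, labels):
        # Count words separately for relevant (True) and irrelevant (False).
        # Both classes must appear in the training data.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, lab in zip(texts, labels):
            self.word_counts[lab].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_score(self, text, lab):
        # Log prior plus log likelihood, with add-one (Laplace) smoothing.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[lab] / total_docs)
        total_words = sum(self.word_counts[lab].values())
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[lab][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        """Return True if the text looks more relevant than irrelevant."""
        return self.log_score(text, True) > self.log_score(text, False)


# Invented training titles with human labels, for illustration only.
texts = [
    "existential risk from artificial intelligence",
    "global catastrophic risk and human extinction",
    "risk factors for knee injury in football",
    "soil moisture and crop yield",
]
labels = [True, True, False, False]
model = NaiveBayes().fit(texts, labels)

likely_relevant = model.predict("extinction risk from asteroid impact")
likely_irrelevant = model.predict("crop yield under soil drought")
```

A real corpus would contain far more labelled publications than this, and a model trained on so few examples should not be trusted; the point is only to show how human labels and text become predictions for new search results.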

3. Humans do some more of the work

Because the algorithm is not perfect, we double-check the publications that it predicts to be relevant, as a method of quality control. If we agree that a publication is relevant, then we add it to our bibliography, which is published on this website as a resource for the research community. Thus, this is a semi-automated system [5]. Humans are still part of the process, but we save time by not searching through all of the publications that the machine has (correctly or incorrectly) identified as irrelevant.
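The quality-control step can be sketched as two small functions: one that splits new search results into a review queue and a set-aside pile, and one that adds a publication to the bibliography only when a human agrees with the machine. This is a simplified sketch under our own assumptions. The function names triage and review are hypothetical, the keyword rules stand in for the trained algorithm and the human reviewer, and the titles are invented.

```python
def triage(new_publications, predict_relevant):
    """Split new search results by the machine's prediction."""
    review_queue, set_aside = [], []
    for title in new_publications:
        if predict_relevant(title):
            review_queue.append(title)  # a human will double-check these
        else:
            set_aside.append(title)     # not reviewed, which saves time
    return review_queue, set_aside


def review(review_queue, human_agrees, bibliography):
    """Add a publication only if the human reviewer agrees it is relevant."""
    for title in review_queue:
        if human_agrees(title) and title not in bibliography:
            bibliography.append(title)
    return bibliography


# Invented titles and stand-in decision rules, for illustration only.
new_publications = [
    "Extinction risk from asteroid impacts",
    "Credit risk in retail banking",
    "Snail habitats in gardens",
]


def predict_relevant(title):
    # Stand-in for the trained algorithm.
    return "risk" in title.lower()


def human_agrees(title):
    # Stand-in for a volunteer's judgement.
    return "extinction" in title.lower()


review_queue, set_aside = triage(new_publications, predict_relevant)
bibliography = review(review_queue, human_agrees, [])
```

In this toy run the machine flags two of the three titles, the human rejects one of them, and one publication reaches the bibliography. The third title is never reviewed, so if the machine was wrong to set it aside, that mistake goes unnoticed; this is the trade-off described above.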

Why not Google it?

Why do we do all this, when we could search for "existential risk" or any other topic in a search engine?

Transparent and repeatable searching

The algorithms that are used by search engines are "black boxes" that are not open to public scrutiny and may give different results for different users. In contrast, our methods are transparent, and therefore they are open to scrutiny and they are repeatable, both of which are of critical importance to scientific progress (improving the methods over time, and gauging our confidence in the results).

Collaborative and cumulative results

It is inefficient or impossible for everyone who is interested in a topic, such as existential risk, to go through all of the search results by themselves. We share the work between many people (and machines), and we also share the results. If someone wants to know about this topic, then they will have access to a systematically and transparently collected bibliography that represents a vast amount of collective work and knowledge, rather than having to "reinvent the wheel" by doing their own search.

Notes

[1] For more information on this problem and its solutions, please see O'Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M., Ananiadou, S. (2015). Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic Reviews, 4:5 (DOI).

[2] We search the Scopus database at present, but we plan to search additional databases (such as Web of Science) in the future.

[3] Please see Methods and Machine Learning for details.

[4] We use a set of keywords, defined by members of the research community and refined over time. Please see Methods for details.

[5] For examples of similar semi-automated systems, please see Lyon, A., Grossel, G., Burgman, M., Nunn, M. (2013). Using internet intelligence to manage biosecurity risks: a case study for aquatic animal health. Diversity and Distributions, 19, 640-650 (DOI).