Blog: Word2vec

How to quickly find relevant literature on 3R

(posted by: Nils Hijlkema, 1/9/2021)

Replacement, Refinement and Reduction. 3R, for short.

These principles must be considered when setting up experiments that would normally require animal studies: it is a legal requirement in the EU, to be precise.

But it’s also highly time-consuming for Animal Welfare Officers and other researchers, not to mention complicated. After all, you will need to browse through enormous amounts of research to find available literature that helps to identify the best method for your experiment.

Fortunately, there is a new technology to help you.

Word2vec

Word2vec is a smart new text mining tool that uses a neural network model to learn word associations from a large corpus of text. Once trained, the model can detect synonymous words or suggest additional words for a partial sentence.

This lets you find relevant literature without first having to come up with a set of keywords for a Boolean search.
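To make the idea of learned word associations a little more concrete, here is a minimal sketch of how related words can be looked up once every word has a vector. The three-dimensional vectors below are invented by hand for illustration; a real word2vec model (trained, for example, with a library such as gensim) learns vectors with many more dimensions from a large corpus. The names `TOY_VECTORS`, `cosine_similarity` and `most_similar` are hypothetical helpers, not part of any TenWise tool.

```python
import math

# Hand-made toy vectors, for illustration only. A trained word2vec model
# would learn such vectors from text instead of having them typed in.
TOY_VECTORS = {
    "cornea": [0.9, 0.1, 0.0],
    "corneal": [0.8, 0.2, 0.1],
    "keratinocytes": [0.7, 0.3, 0.2],
    "rabbit": [0.1, 0.9, 0.1],
    "mouse": [0.0, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, top_n=2):
    """Return the top_n words whose vectors lie closest to the given word."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for neighbour, score in most_similar("cornea", TOY_VECTORS):
    print(f"{neighbour}: {score:.2f}")
```

With these toy numbers, ‘corneal’ and ‘keratinocytes’ come out as the nearest neighbours of ‘cornea’, while ‘rabbit’ and ‘mouse’ score much lower.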

How word2vec works

We have used word2vec to create a ‘Concept Network’ as shown in the picture below.

This Concept Network is generated by machine learning. It consists of clusters of nodes connected by lines. The nodes represent words that are specific to 3R-related papers. A line between two nodes indicates that the two words are related.
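The structure described above can be sketched in a few lines of code. This is only an illustration of the general idea, not the method used to build the actual Concept Network: the similarity scores are invented, the 0.7 threshold is an arbitrary assumption, and `build_network` and `find_clusters` are hypothetical helper names.

```python
# Invented pairwise similarity scores between words, for illustration only.
SIMILARITIES = {
    ("cornea", "corneal"): 0.95,
    ("corneal", "keratinocytes"): 0.80,
    ("keratinocytes", "carcinogens"): 0.72,
    ("russell", "burch"): 0.90,
    ("russell", "russel"): 0.88,
    ("burch", "cornea"): 0.15,
}


def build_network(similarities, threshold=0.7):
    """Draw a line between two words when their similarity reaches the threshold."""
    network = {}
    for (a, b), score in similarities.items():
        network.setdefault(a, set())
        network.setdefault(b, set())
        if score >= threshold:
            network[a].add(b)
            network[b].add(a)
    return network


def find_clusters(network):
    """Group words that are linked to each other, directly or indirectly."""
    seen = set()
    clusters = []
    for start in sorted(network):
        if start in seen:
            continue
        cluster = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(network[node] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


for cluster in find_clusters(build_network(SIMILARITIES)):
    print(cluster)
```

With these made-up scores the words fall into two clusters: one with the author names and one with the biological terms. The weak link between ‘burch’ and ‘cornea’ stays below the threshold, so no line is drawn between them.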

An example: finding 3R literature for keratinocytes and carcinogens

For instance, the enlargement at the right shows the terms ‘keratinocytes’, ‘carcinogens’, ‘corneal’ and ‘cornea’ clustered together. The computer has identified these words as descriptors of animal alternatives. This means that if some of these words appear in an abstract or title together with the word ‘draize’, which refers to an animal test, chances are that the paper is about an alternative to the Draize test.
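As a rough sketch of how such a cluster could be used in practice, the snippet below flags a text when it mentions ‘draize’ together with at least two of the cluster terms. The example texts are made up, the minimum of two terms is an arbitrary assumption, and the function names are hypothetical; a real screening tool would need more than this simple word match.

```python
import re

ANIMAL_TEST_TERM = "draize"
# Cluster terms taken from the example above.
ALTERNATIVE_TERMS = {"keratinocytes", "carcinogens", "corneal", "cornea"}


def tokenize(text):
    """Lower-case the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matched_alternative_terms(text):
    """Cluster terms found in the text, but only if 'draize' is mentioned too."""
    words = tokenize(text)
    if ANIMAL_TEST_TERM not in words:
        return set()
    return words & ALTERNATIVE_TERMS


def looks_like_alternative(text, min_terms=2):
    """True when the text combines 'draize' with enough cluster terms."""
    return len(matched_alternative_terms(text)) >= min_terms


# Made-up example texts, not taken from real publications.
examples = [
    "An in vitro model using human corneal keratinocytes as a replacement for the Draize eye test.",
    "Draize eye irritation scores in rabbits after exposure to detergents.",
    "Corneal thickness and cornea curvature in healthy adults.",
]
for text in examples:
    print(looks_like_alternative(text), "-", text)
```

Only the first text is flagged: the second mentions the Draize test without any cluster terms, and the third contains cluster terms but no reference to the animal test.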

Surprise findings

Another interesting cluster of nodes in the Concept Network contains four concepts, ‘russell’ (and ‘russel’, a common typo by authors), ‘burch’ and ‘SETAC’, that appear to be highly predictive of whether an article is about alternative experimental models. This cluster is highlighted at the bottom right. These are clearly not biological terms. Instead, ‘russell’ and ‘burch’ are the names of two authors who are often cited in publications on 3R (SETAC is the Society of Environmental Toxicology and Chemistry). A closer look shows that Russell & Burch are considered the founding fathers of the 3R principles, which they introduced in 1959. Authors often refer to them when describing the 3R principles in relation to their own work. This shows that less obvious words can also be good descriptors for finding specific articles on 3R.

Save time and ease the search

To conclude, machine learning models such as word2vec can yield visually strong and informative networks. Such networks can also capture relevant search words on a given topic, which is certainly useful for finding animal replacement studies. By using these search words, expert researchers can save valuable time. It also helps those who are less experienced in a field to find and select relevant research articles to read. To learn more, read this online tutorial: https://medium.com/@zafaralibagh6/simple-tutorial-on-word-embedding-and-word2vec-43d477624b6d

At TenWise we are always exploring existing and new technologies that add benefit to our KMAP database covering over 500,000 million biological keywords that describe human physiology.

Learn more about TenWise, AI within 3R research and our ongoing collaboration: https://www.linkedin.com/feed/update/urn:li:activity:6785133163770322944