
The growing volume of biomedical literature is both a help and a hindrance to health economics and outcomes research (HEOR). New text-mining tools, however, are making literature-based information retrieval substantially more efficient than conventional keyword searching. Three techniques are of particular value to HEOR: named entity recognition, biomedical ontologies, and visualization.
When conducting HEOR, researchers may spend significant effort formulating keyword queries that are broad enough to capture all relevant literature yet precise enough to home in on the content that matters to their question. Keywords can have different meanings in different contexts, and multi-word phrases may be misinterpreted unless researchers use search-engine-specific syntax. These problems arise because keywords are not always good proxies for the concepts a researcher is investigating.
Named Entity Recognition
One way to improve on keyword searching is to index the concepts discussed within a paper, presentation or other content. Concepts can be partially represented by noun phrases within a text. For example, “pro-inflammatory cytokines”, “IL-1β-mediated cellular changes” and “nerve growth factor” are concepts associated with pain related to inflammation. A text-mining operation known as named entity recognition identifies words and phrases that describe specific types of entities, such as drug names, biological processes and diagnostic terms. Once named entities are identified, they can be indexed in ways that allow more precise searching than simple keyword matching.
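To make the indexing idea concrete, here is a minimal Python sketch of dictionary-based named entity recognition followed by a concept index. It assumes a small hand-written dictionary (`ENTITY_DICTIONARY`, hypothetical) that maps terms to entity types, and a hypothetical three-document corpus. Production tools typically rely on curated vocabularies or trained statistical models rather than exact string lookup, so treat this as an illustration of the indexing step, not of any particular tool.

```python
import re

# Hypothetical entity dictionary: surface form -> entity type.
# Real NER tools use curated vocabularies or trained models; this toy list is illustrative only.
ENTITY_DICTIONARY = {
    "pro-inflammatory cytokines": "protein",
    "nerve growth factor": "protein",
    "inflammation": "biological_process",
    "ibuprofen": "drug",
}


def find_entities(text):
    """Return (term, entity_type) pairs from the dictionary that occur in text."""
    lowered = text.lower()
    found = []
    for term, entity_type in ENTITY_DICTIONARY.items():
        # \b prevents matches inside longer words; re.escape handles hyphens and spaces safely.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, entity_type))
    return found


def build_concept_index(docs):
    """Map each recognized entity to the set of document IDs that mention it."""
    index = {}
    for doc_id, text in docs.items():
        for term, _ in find_entities(text):
            index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical mini-corpus.
docs = {
    "doc1": "Nerve growth factor levels rise during inflammation.",
    "doc2": "Pro-inflammatory cytokines drive pain sensitization.",
    "doc3": "Ibuprofen reduces inflammation-related pain.",
}

index = build_concept_index(docs)
print(sorted(index["inflammation"]))  # ['doc1', 'doc3']
print(find_entities(docs["doc3"]))    # [('inflammation', 'biological_process'), ('ibuprofen', 'drug')]
```

Because the index is keyed by recognized entities rather than individual words, a query for “nerve growth factor” is matched as one phrase instead of three independent keywords.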
Biomedical Ontology
Named entity recognition is even more effective when combined with a biomedical ontology. Ontologies describe relationships between concepts. In the Medical Subject Headings (MeSH) ontology, for example, inflammation is a type of pathological process; it also generalizes more specific concepts such as neurogenic inflammation and septic shock. Ontologies enable concept-based searching by letting researchers move readily from a concept to broader or narrower terms.
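Moving between broader and narrower terms can be sketched as a walk over parent links. The fragment below is a hypothetical, hand-coded hierarchy that follows the inflammation example in the text; the real MeSH tree contains additional intermediate levels and uses its own descriptor names, so the structure here is deliberately simplified.

```python
# Hypothetical toy fragment following the example in the text: child -> parent links.
# The actual MeSH tree has more intermediate levels; this is simplified for illustration.
PARENT = {
    "inflammation": "pathological process",
    "neurogenic inflammation": "inflammation",
    "septic shock": "inflammation",
}


def broader(concept):
    """Return ancestors of a concept, from the nearest to the most general."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def narrower(concept):
    """Return all descendants of a concept (more specific terms), depth-first."""
    result = []
    for child, parent in PARENT.items():
        if parent == concept:
            result.append(child)
            result.extend(narrower(child))
    return result


def expand_query(concept):
    """Concept-based search terms: the concept itself plus everything narrower."""
    return [concept] + narrower(concept)


print(broader("septic shock"))       # ['inflammation', 'pathological process']
print(expand_query("inflammation"))  # ['inflammation', 'neurogenic inflammation', 'septic shock']
```

Expanding a query downward pulls in studies indexed under more specific terms, while walking upward suggests more general terms to try when a search returns too little.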
Named entities and related ontology concepts allow biomedical literature repositories to link documents in ways that are not possible with keyword indexing. Consider a researcher who has found a useful review article. The references cited in the review are potentially of interest as well. If the researcher starts with reviews but is primarily interested in more narrowly focused studies, only a small number of those references may be worth pursuing. To identify them, she may need to read not only the review but also many of the cited abstracts. With concept-based linking enabled by named entity recognition and biomedical ontologies, researchers can navigate from the review to other documents based on the concepts the review discusses, including related documents that are not listed in its references.
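A minimal sketch of concept-based linking, assuming each document has already been annotated with concepts by an NER step. The document IDs, annotations, reference list, and the simplified `PARENT` map are all hypothetical. Two documents are linked when they share a concept after each concept set is expanded with its broader ontology terms, so a study tagged only with septic shock still connects to a review about inflammation even though the review does not cite it.

```python
# Hypothetical concept annotations, as an NER step might produce them.
DOC_CONCEPTS = {
    "review": {"inflammation", "nerve growth factor"},
    "study_a": {"nerve growth factor"},
    "study_b": {"septic shock"},
    "study_c": {"ibuprofen"},
}
# Hypothetical reference list: the review cites only study_a.
REFERENCES = {"review": {"study_a"}}
# Simplified, hypothetical child -> parent ontology links.
PARENT = {"neurogenic inflammation": "inflammation", "septic shock": "inflammation"}


def with_ancestors(concepts):
    """Add every broader ontology concept to a set of concepts."""
    expanded = set(concepts)
    for concept in concepts:
        node = concept
        while node in PARENT:
            node = PARENT[node]
            expanded.add(node)
    return expanded


def related_documents(doc_id):
    """Rank other documents by the concepts they share with doc_id after expansion."""
    source = with_ancestors(DOC_CONCEPTS[doc_id])
    links = []
    for other, concepts in DOC_CONCEPTS.items():
        if other == doc_id:
            continue
        shared = source & with_ancestors(concepts)
        if shared:
            links.append((other, sorted(shared)))
    # More shared concepts first; ties broken by document ID.
    return sorted(links, key=lambda link: (-len(link[1]), link[0]))


links = related_documents("review")
print(links)  # [('study_a', ['nerve growth factor']), ('study_b', ['inflammation'])]
print([doc for doc, _ in links if doc not in REFERENCES["review"]])  # ['study_b']
```

Ranking by the number of shared concepts is only one simple choice; a real system might instead weight concepts by specificity or by how rarely they occur in the corpus.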
Visualization
Visualization is the third key component of text-mining tools. Rather than reading text summaries of search results, researchers can explore graphical or network layouts of related documents or concepts. Text-mining visualization tools can display high-level views of a large corpus of papers as well as fine-grained views of small regions of the network of related concepts and documents.
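One way to feed such a layout is to export a document-concept network to a graph-drawing tool. The sketch below writes the network in the Graphviz DOT language; the annotations are hypothetical, and the `neighborhood` helper (also hypothetical) shows how a fine-grained view can be cut out of the full network before rendering.

```python
# Hypothetical concept annotations for a tiny corpus.
DOC_CONCEPTS = {
    "review": {"inflammation", "nerve growth factor"},
    "study_a": {"nerve growth factor"},
    "study_b": {"septic shock", "inflammation"},
    "study_c": {"ibuprofen"},
}


def to_dot(doc_concepts, name="literature"):
    """Serialize a bipartite document-concept network as an undirected Graphviz DOT graph."""
    lines = [f"graph {name} {{"]
    for doc in sorted(doc_concepts):
        lines.append(f'  "{doc}" [shape=box];')  # documents drawn as boxes
    for concept in sorted(set().union(*doc_concepts.values())):
        lines.append(f'  "{concept}" [shape=ellipse];')  # concepts drawn as ellipses
    for doc in sorted(doc_concepts):
        for concept in sorted(doc_concepts[doc]):
            lines.append(f'  "{doc}" -- "{concept}";')  # "--" is DOT's undirected edge
    lines.append("}")
    return "\n".join(lines)


def neighborhood(doc_concepts, concept):
    """Keep only documents that mention one concept: a fine-grained view of the network."""
    return {doc: concepts for doc, concepts in doc_concepts.items() if concept in concepts}


print(to_dot(DOC_CONCEPTS))  # high-level view of the whole corpus
print(to_dot(neighborhood(DOC_CONCEPTS, "inflammation"), name="inflammation_view"))
```

Saved to a file, the output could be rendered by a Graphviz layout engine, for example `neato -Tsvg literature.dot -o literature.svg`. Interactive text-mining tools would typically compute and update such layouts on the fly rather than through a static export.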
The upshot: Text-mining techniques like named entity recognition, biomedical ontologies, and visualization enable more efficient literature searching in complex domains such as health economics and outcomes research.