Natural language processing

Natural language processing (NLP) is an area of active research in artificial intelligence concerned with human languages. NLP programs use human-written text or human speech as data for analysis. Their goals range from generating insights from text or recorded speech to generating text or speech.

The first area of natural language processing to gain wide use in radiology was speech recognition. In earlier literature, speech recognition was often referred to as voice recognition, but the trend in nomenclature is towards distinguishing the two terms, with only speech recognition implying the use of dictated recordings to create reports. In many radiology practices, radiologists routinely use speech recognition programs to create reports.

Increasing research in artificial neural networks has sparked interest in natural language processing algorithms, such as topic modeling, that can automate the labeling of images using their accompanying reports. One example is the NIH chest x-ray dataset ChestX-ray8, whose image labels were text-mined from the associated radiology reports.

Due to the brevity, limited vocabulary, and structured nature of radiology reports, many different types of algorithm have proven successful at annotating radiology reports.
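At the simplest end of this range, a report can be annotated with a keyword lexicon plus a negation check, and the resulting report-level labels can then be attached to the corresponding images. The following is a minimal sketch loosely inspired by NegEx-style negation detection, not the method used for any particular dataset; the finding labels, trigger phrases, negation cues, and the function name `annotate_report` are hypothetical and purely illustrative.

```python
import re

# Hypothetical finding lexicon: label -> trigger phrases (illustrative, not validated).
FINDINGS = {
    "pneumothorax": ["pneumothorax"],
    "effusion": ["pleural effusion", "effusion"],
    "consolidation": ["consolidation", "airspace opacity"],
}

# Hypothetical negation cues that must appear before a finding in the same sentence.
NEGATION_CUES = ["no", "without", "negative for", "free of", "absence of"]


def annotate_report(text: str) -> dict[str, bool]:
    """Return {label: bool}: True if a finding is mentioned and not negated."""
    labels = {label: False for label in FINDINGS}
    # Split into sentences; radiology report sentences are usually short.
    for sentence in re.split(r"[.\n]+", text.lower()):
        for label, triggers in FINDINGS.items():
            for trigger in triggers:
                match = re.search(r"\b" + re.escape(trigger) + r"\b", sentence)
                if not match:
                    continue
                # Only the text before the trigger can negate it; "but"/"however"
                # end the scope of an earlier negation cue.
                preceding = re.split(r"\bbut\b|\bhowever\b", sentence[: match.start()])[-1]
                negated = any(
                    re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                    for cue in NEGATION_CUES
                )
                if not negated:
                    labels[label] = True
    return labels
```

For example, `annotate_report("Small right pleural effusion. No pneumothorax.")` returns `{'pneumothorax': False, 'effusion': True, 'consolidation': False}`. Validated tools use much larger lexicons, fixed negation windows, and uncertainty cues such as "possible" or "cannot exclude", which this sketch ignores.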

Areas of active research in applying natural language processing to radiology include natural language understanding (NLU) tasks such as topic modeling, other forms of information extraction, and keyword searching. Natural language processing also encompasses natural language generation (NLG).
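Keyword searching over a collection of reports is commonly built on an inverted index, which maps each word to the set of reports that contain it. The sketch below is a minimal illustration assuming reports are held in a dictionary of hypothetical report IDs to text; the function names are illustrative, and a practical system would add stemming, synonym handling, and negation awareness.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(reports: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: token -> set of report IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for report_id, text in reports.items():
        for token in tokenize(text):
            index[token].add(report_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of reports that contain every query keyword (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Plain keyword matching does not understand context: a search for "consolidation" also returns a report stating "No consolidation", which is one reason information extraction goes beyond keyword lookup.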

Practical points

Several organizations have undertaken efforts to standardize radiology reports. One by-product of standardized reports is that they are more amenable to rule-based and/or decision tree NLP algorithms. At present, however, considerable progress has also been made in interpreting free-text reports using algorithms that apply statistical operations to matrices derived from the text, such as latent Dirichlet allocation (LDA).
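Latent Dirichlet allocation models each report as a mixture of topics and each topic as a probability distribution over words, and it is fitted from document-topic and topic-word count matrices. The sketch below fits LDA with collapsed Gibbs sampling, one common fitting method (widely used libraries such as scikit-learn instead use variational inference). It is a minimal illustration only: the hyperparameters `alpha`, `beta`, `iterations`, and `seed`, the function name `lda_gibbs`, and the toy report phrases are assumptions, not values from any published radiology study.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of word tokens.
    Returns (doc_topic, topic_word, vocab) with estimated probabilities
    doc_topic[d][k] and topic_word[k][w].
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count matrices derived from the text.
    ndk = [[0] * K for _ in docs]        # tokens in document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]    # times word w is assigned to topic k
    nk = [0] * K                         # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    # Smoothed point estimates of the two distributions.
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


# Usage on a hypothetical toy corpus of report phrases.
reports = [
    "small left pleural effusion",
    "large right pleural effusion",
    "right lower lobe consolidation pneumonia",
    "left upper lobe consolidation pneumonia",
]
docs = [r.split() for r in reports]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:3]
    print(k, [vocab[w] for w in top])
```

Because Gibbs sampling is stochastic, the discovered topics depend on the seed and the number of iterations, and a corpus this small may not separate cleanly; real applications use thousands of reports.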