National Library of Medicine NLM-Gene: towards automatic gene indexing in PubMed articles

Gene indexing is part of the NLM’s MEDLINE citation indexing efforts for improving literature retrieval and information access. Currently, gene indexing is performed manually by expert indexers. To assist this time-consuming and resource-intensive process, we have developed NLM-Gene, an automatic tool for finding gene names in the biomedical literature using advanced natural language processing and deep learning methods. Its performance has been assessed on gold-standard evaluation datasets, and the tool is to be integrated into the production MEDLINE indexing pipeline.