PubMed users frequently use author names in queries for retrieving scientific literature. However, author name ambiguity (different authors share the same name) may lead to irrelevant retrieval results. Thus we have developed a machine-learning method to score the features for disambiguating a pair of papers with ambiguous names.
Subsequently, agglomerative clustering is employed to collect all papers belong to the same authors from those classified pairs. Disambiguation performance is evaluated with manual verification of random samples of pairs from clustering results, with a higher accuracy than other state-of-the-art methods. It has been integrated into PubMed to facilitate
author name searches.
Date
Oct 2022
Source URL
Organization Type
Government