The deduplication algorithm is applied to nonpublic data in the FDA Adverse Event Reporting System (FAERS) to identify duplicate reports. A natural language processing system extracts relevant clinical features from the unstructured free-text FAERS narratives. Both structured and unstructured data are then used in a probabilistic record linkage approach that scores pairs of reports by evaluating multiple data fields and applying a relative weight to each field. Reports flagged as potential duplicates are then placed in groups, which makes related FAERS reports easier to identify during case series evaluation for safety issues of concern.
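The scoring and grouping steps described above can be illustrated with a minimal sketch. It is not the FDA implementation: the field names, the per-field agreement probabilities, the threshold, and the sample reports are all hypothetical, and the weights follow a generic log-likelihood-ratio (Fellegi-Sunter style) scheme chosen only for illustration. Features that would come from the narrative text are assumed to have been extracted already.

```python
import math
from itertools import combinations

# Hypothetical per-field parameters (not taken from FAERS):
# m = P(field agrees | reports are duplicates)
# u = P(field agrees | reports are not duplicates)
FIELD_PARAMS = {
    "age": (0.90, 0.05),
    "sex": (0.95, 0.50),
    "event_date": (0.85, 0.01),
    "drug": (0.95, 0.10),
    "reaction": (0.90, 0.05),
}


def pair_score(a, b):
    """Sum per-field log-likelihood-ratio weights for one pair of reports."""
    score = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes no evidence either way
        if va == vb:
            score += math.log2(m / u)  # agreement weight (positive)
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def group_duplicates(reports, threshold):
    """Link pairs scoring at or above threshold, then return connected groups."""
    parent = {rid: rid for rid in reports}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for r1, r2 in combinations(reports, 2):
        if pair_score(reports[r1], reports[r2]) >= threshold:
            parent[find(r1)] = find(r2)

    groups = {}
    for rid in reports:
        groups.setdefault(find(rid), set()).add(rid)
    # Only groups with more than one report are potential duplicates.
    return [g for g in groups.values() if len(g) > 1]


# Made-up sample reports for illustration only.
REPORTS = {
    "A": {"age": 54, "sex": "F", "event_date": "2021-03-02",
          "drug": "warfarin", "reaction": "haemorrhage"},
    "B": {"age": None, "sex": "F", "event_date": "2021-03-02",
          "drug": "warfarin", "reaction": "haemorrhage"},
    "C": {"age": 30, "sex": "M", "event_date": "2020-01-01",
          "drug": "ibuprofen", "reaction": "rash"},
}

if __name__ == "__main__":
    print(group_duplicates(REPORTS, threshold=8.0))
```

In this sketch, reports A and B agree on every populated field and are linked, while C disagrees on all fields and stays ungrouped. Comparing every pair is quadratic in the number of reports, so a real system would likely restrict comparisons to candidate pairs first; that step is omitted here.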
Date
Oct 2022
Source URL
Organization Type
Government