Combining Investigator Data from Three U.S. Government Databases
Editor's note: Ron and Romiya recently collaborated with Norman Goldfarb to publish an article on this topic in the Journal of Clinical Research Best Practices.
The ClinicalTrials.gov database is rich with data, but extracting meaningful information that can move research and practice forward is often a challenge. Several researchers have remarked on the incompleteness of, and inconsistency between, data elements in the database. This often forces us to make assumptions when interpreting the clinical trial and investigator landscape.
The quality of our decisions depends on the quality of our information. Stakeholders report that sponsors search for “perfect” investigators, which creates barriers to clinical trial participation for new investigators. When identifying and pre-selecting qualified and experienced sites and investigators for industry-sponsored interventional trials, we should make these choices with the best available information. Fortunately, in addition to ClinicalTrials.gov, at least two other public data sources (CMS and FDA) contain some of the missing data elements. By matching and merging data from all three databases into a single record for each investigator, we found that complete records could be identified for 7,936 (15.7%) US investigators in 2017.
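To make the matching-and-merging idea concrete, here is a minimal sketch in Python. It is not the method used in the paper: the matching key (last name, first initial, state), the field names, and the sample records are all assumptions made up for illustration, and a real matcher would need far more careful rules.

```python
from collections import defaultdict

def match_key(record):
    """Build a crude matching key: lowercased last name, first initial, state.

    The key design is an illustrative assumption, not the paper's rule.
    """
    first = record["first_name"].strip().lower()
    last = record["last_name"].strip().lower()
    state = record["state"].strip().upper()
    return (last, first[:1], state)

def merge_sources(sources):
    """Merge records from several sources into one record per matching key.

    `sources` maps a source label to a list of record dicts. For each field,
    the first non-empty value encountered is kept.
    """
    merged = defaultdict(lambda: {"sources": set(), "fields": {}})
    for label, records in sources.items():
        for record in records:
            entry = merged[match_key(record)]
            entry["sources"].add(label)
            for field, value in record.items():
                if value not in (None, ""):
                    entry["fields"].setdefault(field, value)
    return dict(merged)

# Made-up records; identifiers are placeholders, not real people or trials.
sources = {
    "clinicaltrials": [
        {"first_name": "Ana", "last_name": "Lopez", "state": "TX", "nct_id": "NCT00000000"},
    ],
    "cms": [
        {"first_name": "ANA", "last_name": "LOPEZ ", "state": "tx", "npi": "0000000000"},
        {"first_name": "Ben", "last_name": "Ito", "state": "CA", "npi": "1111111111"},
    ],
    "fda": [
        {"first_name": "Ana", "last_name": "Lopez", "state": "TX", "specialty": ""},
    ],
}

merged = merge_sources(sources)
multi = sum(1 for entry in merged.values() if len(entry["sources"]) >= 2)
print(f"{len(merged)} unique investigators, {multi} matched from two or more sources")
```

Keeping the set of contributing sources on each merged record is what makes it possible to report how many investigators were matched from two or more sources.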
For the study, we analyzed industry-sponsored interventional trials in the US with start dates in 2017. Here are some of the findings:
Across the three databases, there were 65,890 investigator records in 2017 (before accounting for overlaps between sources).
After matching, we identified 50,414 unique investigators, of whom 20,474 (40.1%) were matched from two or more sources.
38,383 (76.1%) investigators could be matched with one or more clinical trial identifiers.
Of the investigators who appear in 2017, 29% first appear in that year.
While the paper shows that we can overcome some of the challenges other researchers and developers have experienced in analyzing the clinical trial investigator landscape, the industry can do better by enforcing standards for research site, investigator, and city names. Without consistent naming, algorithms must be augmented with human curation.
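As a rough illustration of why inconsistent names push work onto human curators, the sketch below normalizes two name strings and sorts the pair into "match", "review", or "different". The normalization rules, the use of Python's standard difflib similarity ratio, and the 0.85 review threshold are all assumptions chosen for the example, not the approach described in the paper.

```python
import re
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())

def classify_pair(a, b, review_threshold=0.85):
    """Return 'match', 'review', or 'different' for two name strings.

    Exact matches after normalization are accepted automatically; close but
    non-identical pairs are routed to a human reviewer. The threshold is an
    arbitrary illustrative value.
    """
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return "match"
    ratio = SequenceMatcher(None, na, nb).ratio()
    return "review" if ratio >= review_threshold else "different"

for pair in [("St. Louis", "st louis"), ("Pittsburgh", "Pittsburg"), ("Boston", "Houston")]:
    print(pair, "->", classify_pair(*pair))
```

With these settings, a punctuation or capitalization difference is resolved automatically, a one-letter spelling variant is flagged for review, and two clearly different cities are kept apart. A pair such as "Saint Louis" and "St. Louis" would need an abbreviation table or a human decision, which is exactly the gap that naming standards would close.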
We conclude the report with a call for cleaner, more comprehensive inputs to make trial registration data meaningful, so that we have a fairer chance of understanding investigator demographics and activity. Some changes are needed at the policy level; however, there are steps we can take now to “do better” in providing the database inputs.