Beta Update: Cleaning the Investigator Database

I appreciate all the support as we evolve TrialIO!

In between prospect demos, we were heads down on product development last week. Here is the update:

Site / Investigator Data Cleansing

Last week we undertook a major sub-project to "clean" the investigator and site profiles. I will detail the statistics in a separate blog post, but here are some high-level details that might interest you:

  • 497,530 investigator records, culled from 903,933 total entries in ClinicalTrials.gov (CT.gov), Bioresearch Monitoring (BMIS), and CMS Sunshine data.
  • 296,203 records have at least one Trial ID attached, and 110,127 records have an attached email address.
  • 244,901 of the 296,203 are sourced uniquely from ClinicalTrials.gov, and 13,451 are sourced uniquely from Sunshine. The balance are composites of the three sources: CT.gov, Sunshine, and BMIS.
  • There were hundreds of misspelled cities and towns and thousands of mismatched site names. Using text processing, we fixed the addresses and reduced the number of unique site names from 290,381 to 141,886. This cleanup has a dramatic impact on de-duplication quality.
  • The database is international. The United States accounts for 209,042 (42%) of the investigator records. Thus, more than half (58%) of the investigators are based outside the US.
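
For readers curious what this kind of text processing can look like, here is a minimal sketch in Python. It is not the TrialIO pipeline. The reference city list, the abbreviation map, the similarity cutoff, and the function names (fix_city, normalize_site_name, group_sites) are all assumptions made up for illustration. It snaps misspelled city names to a known list with fuzzy matching and collapses site-name variants into one normalized key so that duplicates group together.

```python
import difflib
import re
from collections import defaultdict

# Hypothetical reference list of correctly spelled cities (illustration only).
KNOWN_CITIES = ["Boston", "Chicago", "Houston", "Philadelphia"]

# Hypothetical abbreviation map used to reconcile site-name variants.
ABBREVIATIONS = {
    "univ": "university",
    "hosp": "hospital",
    "ctr": "center",
    "med": "medical",
}


def fix_city(raw, known=KNOWN_CITIES, cutoff=0.8):
    """Return the closest known city, or the trimmed input if nothing is close."""
    lookup = {city.lower(): city for city in known}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else raw.strip()


def normalize_site_name(raw):
    """Lowercase, replace punctuation with spaces, and expand abbreviations."""
    text = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def group_sites(names):
    """Group raw site names that share the same normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_site_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(fix_city("Philadelpia"))
    sites = [
        "Univ. of Chicago Med Ctr",
        "University of Chicago Medical Center",
        "Boston Hosp",
    ]
    for key, variants in group_sites(sites).items():
        print(key, "<-", variants)
```

Each group of raw names that shares a normalized key can then be treated as one site, which is the kind of reduction in unique site names described above. A real cleanup would need a much larger city reference and abbreviation list, plus manual review of borderline matches.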

Site / Investigator Browsing

You can now search investigators and sites by name. View an animated demo snippet of the investigator browser here.

 

Photo by Andy Fitzsimon on Unsplash