Duplicate patients in your analyzable sample - Kidney Health Education and Research Group

Hi all,
For those of you working on papers, abstracts, or analyses with mixed patient pools (e.g. your sample combines the patients in Barriers and the patients in ETO), you will run into the issue of patient matching. Your sample can include patients who enrolled in and completed both the Barriers and the ETO study. Those patients appear twice, which double-counts them and can distort your results, so the extra records should be removed.

Some of you may have noticed that we have a variable in the MergedSet.dta dataset called “partial_signature”. This variable is an ‘encrypted’ key built from a patient’s first-name initial + last-name initial + DOB,
e.g. my signature would be NE01211995.
This information is based on what recruiters put into the master database around enrollment. It is then encrypted to something like 089a4088aa291c0554c0aef0e837581b, so you cannot work backwards to the original signature, which would identify the patient. The same patient gets the same encrypted value in every study, which is what makes matching possible.
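
Before removing anything, it can help to see how many signatures repeat. The sketch below uses a small made-up dataset (the signature values "aaaa", "bbbb", "cccc" and the variable n_extra are hypothetical stand-ins, not real data) and Stata's built-in duplicates command. On the real data you would start from your own restricted sample instead of the input block.

```stata
* Toy data: hypothetical signatures, one missing signature
clear
input str4 partial_signature researchstudy2
"aaaa" 1
"aaaa" 3
"bbbb" 1
"cccc" 3
""     1
end

* Summarize how many copies of each non-missing signature exist
duplicates report partial_signature if !mi(partial_signature)

* n_extra = number of other rows sharing the same signature (0 = unique)
duplicates tag partial_signature if !mi(partial_signature), generate(n_extra)

* Show only the rows whose signature appears more than once
list partial_signature researchstudy2 if n_extra > 0 & !mi(n_extra)
```

Rows with a missing signature are left out of the check on purpose, since a blank signature says nothing about whether two records belong to the same patient.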

Using this variable, you can now remove these duplicate patients. This is what your do-file code will look like:

use MergedSet.dta, clear //import dataset
keep if real_enrollment_status == 3 //keep enrolled patients
keep if researchstudy2 == 1 | researchstudy2 == 3 //keep barriers and eto patients

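gen doedesc = -doe //negate doe so the largest doe sorts first
bysort partial_signature (doedesc): gen dup = cond(_N == 1, 0, _n) if !mi(partial_signature) //0 = unique, 1,2,... = copies within a signature
drop if dup > 1 & !mi(dup) //keep the first copy; rows with a missing signature are never dropped
drop doedesc dup //remove helper variables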

The first 3 lines are what you will edit, depending on who you are keeping in your defined analysis sample.
The last 4 lines are fixed. Do not edit these. They find the duplicates in your sample and remove them, keeping the record with the largest doe for each signature.
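
To see what the four fixed lines do, here is a minimal worked example on made-up data. The signature values and doe numbers are hypothetical. Only the variable names partial_signature and doe come from the code above.

```stata
* Toy data: "aaaa" appears twice, "bbbb" once, two rows have no signature
clear
input str4 partial_signature doe
"aaaa" 100
"aaaa" 250
"bbbb" 300
""     400
""     500
end

* Same four fixed lines as in the do-file
gen doedesc = -doe
bysort partial_signature (doedesc): gen dup = cond(_N == 1, 0, _n) if !mi(partial_signature)
drop if dup > 1 & !mi(dup)
drop doedesc dup

* Check: each non-missing signature now appears exactly once
duplicates report partial_signature if !mi(partial_signature)
list
```

Of the two "aaaa" rows, only the one with doe 250 survives, and both rows with a blank signature are kept. The & !mi(dup) part matters because Stata treats a missing value as larger than any number, so dup > 1 on its own would also drop every row with a missing signature.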

Once again, a reminder that Stata is available remotely through our network license, and you should not have patient data on your personal machines.

Please feel free to reach out to me if you ever have any questions.

Thanks,
Nathaniel Edwards
