A de-identified data set is a data set that meets both of the following criteria:
- Does not identify any individual that is a subject of the data.
- Does not provide any reasonable basis for identifying any individual that is a subject of the data.
A data set is de-identified under the HIPAA Privacy Rule by one of the following methods:
- Safe Harbor Method
- Expert Determination Method
Safe Harbor Method: removal of HIPAA identifiers. This method includes both of the following provisions:
- Removal of all 18 elements enumerated in the Privacy Rule that could be used to identify the individual or the individual's relatives, household members, and employers (when applicable):
- Names
- Geographic subdivisions smaller than a state, including street address, city, county, precinct, and ZIP code
- All elements of dates (except year) for dates that are directly related to an individual, and all ages over 89 and all elements of dates (including year) indicative of such age
- Telephone numbers
- Fax numbers
- Email addresses
- Social security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate or license numbers
- Vehicle identification/serial numbers, including license plate numbers
- Device identification/serial numbers
- Universal Resource Locators (URLs)
- Internet protocol (IP) addresses
- Biometric identifiers, including finger and voice prints
- Full face photographs and comparable images
- Any other unique identifying number, characteristic, or code
Note on #2: ZIP codes, counties, census tracts, and other equivalents must be removed. The first three digits of a ZIP code may be included in a de-identified data set only if the area formed by all ZIP codes sharing those three digits contains more than 20,000 people. Many levels of geographic identifiers are permitted in a Limited Data Set.
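The three-digit ZIP rule can be sketched in code. This is a minimal illustration, not a compliance tool: the function name `generalize_zip` and the `zip3_population` lookup are hypothetical, and the population figures in any real lookup would have to come from current Census data. Replacing a disallowed prefix with "000" follows the wording of the Privacy Rule itself.

```python
from typing import Mapping

# Threshold stated in the Safe Harbor rule for three-digit ZIP areas.
POPULATION_THRESHOLD = 20_000


def generalize_zip(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Return the three-digit ZIP prefix if its area exceeds the threshold, else "000".

    `zip3_population` is an assumed lookup from three-digit prefix to the
    population of all ZIP codes sharing that prefix. Unknown prefixes are
    treated as below the threshold, which is the conservative choice.
    """
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        raise ValueError(f"not a 5-digit or ZIP+4 code: {zip_code!r}")
    prefix = digits[:3]
    if zip3_population.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return "000"
```

Treating missing prefixes as too small means an incomplete lookup table errs toward removing detail rather than leaking it.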
Note on #3: Many records contain dates of service or other events that imply age. A HIPAA de-identified data set may not include the day, the month, or any other information more specific than the year of an event. For instance, "January 1, 2009" and "January 2009" are both considered to contain PHI.
Dates “directly related to the individual” include not only birth and death dates but also dates of service (appointment, biopsy, surgery, etc.).
Dates are permitted in a Limited Data Set.
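The date and age rules in item #3 can be illustrated with a short sketch. The function names below are hypothetical, and the "90+" label is one possible way to express the single "age 90 or older" category that the Privacy Rule allows for aggregated ages; a real pipeline would also need to handle free-text dates, which this sketch does not attempt.

```python
from datetime import date
from typing import Optional, Union

# Ages above this value may not be reported individually under Safe Harbor.
MAX_REPORTABLE_AGE = 89


def year_only(event_date: date) -> int:
    """Reduce a date directly related to an individual to its year."""
    return event_date.year


def generalize_age(age: int) -> Union[int, str]:
    """Return the age unchanged up to 89; aggregate older ages into "90+"."""
    if age < 0:
        raise ValueError("age cannot be negative")
    return age if age <= MAX_REPORTABLE_AGE else "90+"


def generalize_birth_year(birth: date, as_of: date) -> Optional[int]:
    """Return the birth year, or None when it would indicate an age over 89.

    `as_of` is the reference date used to compute the age (an assumption of
    this sketch, for example the date the data set is prepared).
    """
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    age = as_of.year - birth.year - (0 if had_birthday else 1)
    return birth.year if age <= MAX_REPORTABLE_AGE else None
```

Returning `None` for the birth year of someone over 89 reflects the requirement that all date elements, including the year, be removed when they would indicate such an age.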
Note on #18: According to the OCR Guidance on Satisfying the Safe Harbor Method, examples include:
- identifying number: study-specific subject identification numbers
- identifying code: barcodes designed to be unique for each patient for tracking purposes
- identifying characteristic: anything that distinguishes an individual and allows for identification; this may also be called an “indirect identifier”
- No actual knowledge: the covered entity or its workforce (e.g., the principal investigator) has no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual who is the subject of the information.
Expert Determination Method: de-identification based on statistical analysis. To be considered de-identified under this method, an individual with knowledge of and experience with generally accepted statistical and scientific methods for rendering information not individually identifiable must certify that the data is de-identified. In making that determination, the expert must find that the risk is very small that the information could be used, alone or in combination with other reasonably available information, to identify any individual who is a subject of the data. The methods and results of the analysis must also be documented and retained by the principal investigator, who provides them to the covered entity upon request.
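The rule does not prescribe a particular statistical method, so the following is only an illustration of the kind of measure an expert might examine: the size of the smallest group of records that share the same combination of indirectly identifying fields. The function name, field names, and interpretation are assumptions of this sketch; a group-size count alone is not an expert determination.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def smallest_group_size(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
) -> int:
    """Return the size of the smallest group of records with identical
    values across the given (hypothetical) indirectly identifying fields.

    A result of 1 means at least one record is unique on those fields,
    which would suggest a higher risk of re-identification.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    if not counts:
        raise ValueError("no records supplied")
    return min(counts.values())
```

An expert would weigh a figure like this together with what other information is reasonably available to a recipient, and would document both the method and the result.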
Refer also to Michigan Medicine Policy 01-04-340 on De-identification and Re-identification of Protected Health Information (PHI).