De-identified Data Sets

Sep 9, 2020 10:15 pm

NOTE: This page provides HIPAA-related guidance on “de-identified data sets,” applicable only to data based on Protected Health Information (usually medical records). Other federal regulations enforced by the IRB have different standards and definitions for “de-identified,” which may impact IRB regulatory status. See heading below “Contrast with Common Rule.”

  • Definition

    A de-identified data set is a data set that meets both of the following criteria:

    • Does not identify any individual that is a subject of the data.
    • Does not provide any reasonable basis for identifying any individual that is a subject of the data.

    A dataset is de-identified under the HIPAA Privacy Rule by one of the following means:

    • Safe Harbor Method
    • Expert Determination Method

    The Safe Harbor Method relies on removing HIPAA identifiers and includes both of the following provisions:

    • Removal of all 18 elements enumerated in the Privacy Rule that could be used to identify the individual or the individual's relatives, household members, and employers (when applicable)
      1. Name
      2. Geographic subdivisions smaller than a state
      3. All elements of dates (except year) for dates that are directly related to an individual, and all ages over 89 and all elements of dates (including year) indicative of such age
      4. Telephone numbers
      5. Fax numbers
      6. Email addresses
      7. Social security numbers
      8. Medical record numbers
      9. Health plan numbers
      10. Account numbers
      11. Certificate or license numbers
      12. Vehicle identification/serial numbers, including license plate numbers
      13. Device identification/serial numbers
      14. Universal Resource Locators (URLs)
      15. Internet protocol (IP) addresses
      16. Biometric identifiers, including finger and voice prints
      17. Full face photographs and comparable images
      18. Any other unique identifying number, characteristic, or code

      Note on #2: ZIP codes, counties, census tracts, and other equivalents must be removed. The first 3 digits of a ZIP code may be included in a de-identified data set only when the area formed by all ZIP codes sharing those 3 digits contains more than 20,000 people. Many levels of geographic identifiers are permitted in a Limited Data Set.

      Notes on #3: Many records contain dates of service or other events that imply age. Elements of dates that are not permitted in a HIPAA-de-identified dataset include the day, month, and any other information that is more specific than the year of an event. For instance, "January 1, 2009" and "January 2009" are both considered to contain PHI.

      Not only birth or death dates, but also dates of service (appointment, biopsy, surgery, etc.) are considered dates “directly related to the individual.”

      Dates are permitted in a Limited Data Set.

      Note on #18: According to OCR Guidance on Satisfying the Safe Harbor Method, examples include:

      • identifying number - study-specific subject identification numbers
      • identifying code - barcodes designed to be unique for each patient for tracking purposes
      • identifying characteristic - anything that distinguishes an individual and allows for identification; this may also be called an “indirect identifier”
    • The covered entity or its workforce, e.g., the principal investigator, has no actual knowledge that the remaining information could be used alone or in combination with other information to identify the individual who is the subject of the information.
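
    The generalization rules in the notes on #2 and #3 can be sketched in code. The following is a minimal illustration only, not an approved de-identification tool. The function names and the `restricted_prefixes` parameter are hypothetical, and the prefix passed in the usage lines is a placeholder: the actual set of 3-digit ZIP prefixes covering 20,000 or fewer people must come from current Census data. The "000" replacement and the single "90 or older" age category follow the Privacy Rule's Safe Harbor text.

```python
from datetime import date


def generalize_zip(zip_code: str, restricted_prefixes: set) -> str:
    """Keep only the first 3 digits of a ZIP code.

    Prefixes listed in restricted_prefixes (areas with 20,000 or fewer
    people; the caller must supply this set) are replaced with "000".
    """
    prefix = zip_code.strip()[:3]
    return "000" if prefix in restricted_prefixes else prefix


def generalize_date(d: date) -> int:
    """Reduce a date to its year; month and day are dropped."""
    return d.year


def generalize_age(age: int) -> str:
    """Report ages over 89 as a single '90+' category."""
    return "90+" if age > 89 else str(age)


# Placeholder prefix set, for illustration only.
example_restricted = {"036"}
print(generalize_zip("48109", example_restricted))
print(generalize_zip("03601", example_restricted))
print(generalize_date(date(2009, 1, 1)))
print(generalize_age(92))
```

    This sketch covers only three of the 18 elements. It does not handle the requirement to remove years that indicate an age over 89, and it does not address the "actual knowledge" provision, which cannot be checked mechanically.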

    The Expert Determination Method is based on statistical analysis. To be considered de-identified under this method, an individual with knowledge of and experience with generally accepted statistical and scientific methods for rendering information not individually identifiable must certify that the data is de-identified. In making this determination, the expert must find that the risk is very small that the information could be used (either alone or in combination with other reasonably available information) to identify any individual who is a subject of the data. Additionally, the methods and results of the analysis must be documented and retained by the principal investigator to provide to the covered entity upon request.
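
    As one illustration of the kind of statistical check an expert might run, the sketch below counts how many records share each combination of indirect identifiers and reports the smallest group. This is an assumption for illustration: the Privacy Rule does not prescribe a particular metric or threshold, and the function name, field names, and sample records are hypothetical. A small result suggests that some individuals may be distinguishable; judging whether the risk is "very small" remains the expert's determination.

```python
from collections import Counter


def smallest_group_size(records, indirect_identifiers):
    """Return the size of the smallest group of records that share
    the same values for the given indirect identifiers.

    Returns 0 for an empty data set.
    """
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return min(counts.values(), default=0)


# Hypothetical sample records, for illustration only.
sample = [
    {"zip3": "481", "year": 2009, "sex": "F"},
    {"zip3": "481", "year": 2009, "sex": "F"},
    {"zip3": "482", "year": 2010, "sex": "M"},
]
print(smallest_group_size(sample, ["zip3", "year", "sex"]))
```

    In this sample the third record is the only one with its combination of values, so it would warrant further generalization or suppression before any certification.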

    Refer also to Michigan Medicine Policy 01-04-340 on De-identification and Re-identification of Protected Health Information (PHI).

  • Creating a De-Identified Data Set

    Michigan Medicine Policy 01-04-340 (level-2 login required) permits its workforce to create de-identified data sets for research purposes. Before accessing the PHI, researchers should seek a determination from the IRB to confirm appropriate de-identification by filling out an eResearch Regulatory Management (eResearch or eRRM) application.

  • Research involving a De-identified Data Set

    Researchers intending to obtain an already de-identified data set are encouraged, but not required, to seek a determination from the IRB by filling out an eResearch Regulatory Management (eResearch or eRRM) application for “Activities not regulated as human subjects research.”

    • Health information that has been properly de-identified according to the HIPAA Privacy Rule is not considered to be PHI.
    • Research on non-identifiable information, or on coded private information where the researchers never have access to “re-identify,” does not qualify as “research involving human subjects” per OHRP Guidance.
    • The U-M Human Research Protections Program does not require formal IRB determination for activities falling outside the definitions of “research involving human subjects” (HRPP Operations Manual Part 4.V).

    If you are sharing data outside U-M, open an "Outgoing DUA" Unfunded Agreement (UFA) form in eResearch Proposal Management (eRPM). The Medical School Office of Research Data & Biospecimen sharing expects a formal DUA for external sharing of any individual-level clinical data, even if de-identified.

    If you are receiving a dataset from an outside entity that requires a formal DUA, use the “Incoming DUA” Unfunded Agreement (UFA) form in eResearch Proposal Management (eRPM). DUAs may not be required for HIPAA-de-identified data.

  • Retaining a code to permit re-identification

    The HIPAA Privacy Rule permits a covered entity or its workforce to assign to, and retain with, de-identified health information a code or other means of record identification if that code:

    1. is not derived from or related to the information about the individual, and
    2. could not be translated to identify the individual.

    The covered entity may not use or disclose the code or other means of record identification for any purpose other than re-identification, and may not disclose its method of re-identifying the information.
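
    One way to satisfy the "not derived from" condition is to generate each code at random rather than computing it from the medical record number or any other record content. The sketch below is illustrative only; the function name and sample record IDs are hypothetical. The returned mapping is the re-identification key: it would stay with the covered entity, and only the random codes would travel with the de-identified data.

```python
import secrets


def assign_random_codes(record_ids):
    """Map each record ID to a unique random code.

    Codes come from a random source, so they carry no information
    about the individual and cannot be translated back without the
    mapping itself.
    """
    key = {}
    used = set()
    for record_id in record_ids:
        code = secrets.token_hex(8)  # 16 hex characters
        while code in used:          # regenerate on the rare collision
            code = secrets.token_hex(8)
        used.add(code)
        key[record_id] = code
    return key


# Hypothetical record IDs, for illustration only.
crosswalk = assign_random_codes(["rec-1", "rec-2", "rec-3"])
print(len(crosswalk))
```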

    A table showing data elements permitted in de-identified data and limited data sets is available through the References section of UMHS Policy 01-04-032 on Limited Data Sets.

  • Contrast with Common Rule

    Under the Common Rule, a dataset is “de-identified” only when no one could “re-identify” the data: not the recipients, nor the data provider, nor anyone else. If the data were “coded,” any “key to the code” must be destroyed to “de-identify” the dataset.

    The Common Rule does not recognize as “de-identified” information that retains a code to permit re-identification: rather, this is “coded” information which is “indirectly identifiable.” Therefore, a dataset can be “identifiable” under Common Rule definitions while also meeting HIPAA “de-identified” criteria.

Contact us at 734-763-4768 (Fax 734-763-1234)

2800 Plymouth Road, Building 520, Room 3214, Ann Arbor, MI 48109-2800

A list of IRBMED staff is available in the Personnel Directory, or view the list of Regulatory Teams.

Last Updated: May 22, 2023 2:00 PM