De-identified Data Sets

Apr 17, 2019 10:00 am

NOTE: This page provides HIPAA-related guidance on “de-identified data sets,” applicable only to data based on Protected Health Information (usually medical records). Other federal regulations enforced by the IRB have different standards and definitions for “de-identified,” which may affect IRB regulatory status. See the “Contrast with Common Rule” heading below.

  • Definition

    A de-identified data set is a data set that meets both of the following criteria:

    • Does not identify any individual that is a subject of the data.
    • Does not provide any reasonable basis for identifying any individual that is a subject of the data.

    A data set is de-identified under the HIPAA Privacy Rule by one of the following methods:

    • Safe Harbor Method
    • Expert Determination Method

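    Safe Harbor Method. This method requires both of the following provisions: removal of the 18 HIPAA identifiers numbered below, and the absence of actual knowledge described in the bullet after the list.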

    1. Name
    2. Geographic subdivisions smaller than a state. 
    3. All elements of dates (except year) for dates that are directly related to an individual, and all ages over 89 and all elements of dates (including year) indicative of such age
    4. Telephone numbers
    5. Fax numbers
    6. Email addresses
    7. Social security numbers
    8. Medical record numbers
    9. Health plan numbers
    10. Account numbers
    11. Certificate or license numbers
    12. Vehicle identification/serial numbers, including license plate numbers
    13. Device identification/serial numbers
    14. Universal Resource Locators (URLs)
    15. Internet protocol (IP) addresses
    16. Biometric identifiers, including finger and voice prints
    17. Full face photographs and comparable images
    18. Any unique identifying number, code, or other similar information.
    • The covered entity or its workforce, e.g., the principal investigator, has no actual knowledge that the remaining information could be used alone or in combination with other information to identify the individual who is the subject of the information
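
    As a rough illustration of the first provision, the sketch below drops identifier fields from a single record. The field names and the sample record are hypothetical and would need to be mapped to the columns in an actual data source. It does not address identifiers buried in free text, and it cannot establish the second provision (no actual knowledge), which remains a human judgment.

```python
# Hypothetical field names standing in for Safe Harbor identifiers.
SAFE_HARBOR_FIELDS = frozenset({
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
})


def drop_identifier_fields(record, identifier_fields=SAFE_HARBOR_FIELDS):
    """Return a copy of record without any field named in identifier_fields."""
    return {key: value for key, value in record.items()
            if key not in identifier_fields}


# Made-up record for illustration only.
sample = {"name": "Jane Doe", "ssn": "000-00-0000",
          "diagnosis": "asthma", "visit_year": 2009}
print(drop_identifier_fields(sample))
```

    The function returns a new dictionary and leaves the source record unchanged, so the identified original can stay under its existing access controls.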

    Note on #2: ZIP codes, counties, census tracts, and equivalent geographic units must be removed. The first three digits of a ZIP code may be included in a de-identified data set when the area formed by all ZIP codes sharing those three digits has more than 20,000 residents. Many levels of geographic identifiers are permitted in a Limited Data Set.
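
    The ZIP code rule in the note on #2 can be sketched as follows. The `population_by_prefix` lookup is a hypothetical table that would have to be built from current census figures. Replacing low-population prefixes with `000` follows the Safe Harbor text of the Privacy Rule rather than anything stated on this page; dropping the field entirely is a more conservative alternative.

```python
POPULATION_THRESHOLD = 20_000  # the prefix area must exceed this population


def generalize_zip(zip_code, population_by_prefix):
    """Return the 3-digit ZIP prefix if its area is large enough, else '000'.

    population_by_prefix is a hypothetical dict mapping a 3-digit prefix to
    the combined population of all ZIP codes that share that prefix.
    """
    prefix = zip_code.strip()[:3]
    if population_by_prefix.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return "000"


# Made-up population figures for illustration only.
populations = {"481": 500_000, "059": 12_000}
print(generalize_zip("48109-2800", populations))
print(generalize_zip("05901", populations))
```

    Prefixes missing from the lookup table are treated as low-population, so an incomplete table errs toward removing detail.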

    Note on #3: Many records contain dates of service or other events that imply age. Elements of dates that are not permitted in a HIPAA de-identified data set include the day, month, and any other information that is more specific than the year of an event. For instance, "January 1, 2009" and "January 2009" are both considered to contain PHI, whereas "2009" alone is not. Dates are permitted in a Limited Data Set.
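
    A minimal sketch of the date and age handling in identifier #3 and the note above is shown below. The function names are hypothetical. Grouping ages over 89 into a single "90+" category reflects the aggregation the Privacy Rule allows and is not stated on this page.

```python
from datetime import date


def year_only(event_date):
    """Reduce a date to its year, discarding the month and day."""
    return event_date.year


def generalize_age(age):
    """Return the age as text, collapsing every age over 89 into '90+'."""
    return "90+" if age > 89 else str(age)


print(year_only(date(2009, 1, 1)))
print(generalize_age(42))
print(generalize_age(93))
```

    This sketch does not handle the related requirement that, for individuals over 89, even the year of a date indicative of that age (a birth year, for example) must be removed.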

    Expert Determination Method, based on statistical analysis. To be considered de-identified under this method, an individual with knowledge of and experience with generally accepted statistical and scientific methods for rendering information not individually identifiable must certify that the data is de-identified. In making such a determination, the individual must find that the risk is very small that the information could be used (either alone or in combination with other reasonably available information) to identify any individual who is a subject of the data. Additionally, the methods and results of the analysis must be documented and retained by the principal investigator to provide to the covered entity upon request.

    Refer also to UMHS Policy 01-04-340 on De-identification and Re-identification of Protected Health Information (PHI).

  • Creating a De-Identified Data Set

    UMHS Policy permits its workforce to create de-identified data sets for research purposes. Before accessing the PHI, researchers should seek a determination from the IRB to confirm appropriate de-identification by filling out an eResearch Regulatory Management (eResearch or eRRM) application.

  • Research involving a De-identified Data Set

    Researchers intending to obtain an already de-identified data set are encouraged, but not required, to seek a determination from the IRB by filling out an eResearch Regulatory Management (eResearch or eRRM) application for “Activities not regulated as human subjects research.”

    If you are sharing data outside the covered entity or receiving data from outside U-M under the terms of a Data Use Agreement (DUA), the DUA should be processed through the Unfunded Agreement (UFA) form in eResearch Proposal Management (eRPM).

  • Publishing a case study

    Ideally, written HIPAA authorization for the use of patient PHI should be obtained before publishing “case studies” based on the records of only one or two patients. Researchers should seek a determination from the IRB by filling out an eResearch Regulatory Management (eResearch or eRRM) application for “Activities not regulated as human subjects research,” “Case Studies – Clinical” subtype, and should address whether authorization was or will be obtained.

  • Retaining a code to permit re-identification

    The HIPAA Privacy Rule permits a covered entity or its workforce to assign to, and retain with, de-identified health information a code or other means of record identification if that code

    1. is not derived from or related to the information about the individual, and
    2. could not be translated to identify the individual.

    The covered entity may not use or disclose the code or other means of record identification for any other purpose than re-identification, and may not disclose its method of re-identifying the information.
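
    One way to satisfy both conditions is to generate each code at random and keep the linking table separately. The sketch below is only an illustration under that assumption: `assign_codes` and the sample record IDs are hypothetical, and where the linking table is stored and who may access it are left to local policy.

```python
import secrets


def assign_codes(record_ids):
    """Map each source record ID to a random code unrelated to the record.

    The returned dict is the re-identification key. It would be retained by
    the covered entity and never shipped with the de-identified data.
    """
    crosswalk = {}
    used = set()
    for record_id in record_ids:
        code = secrets.token_hex(8)  # 16 random hexadecimal characters
        while code in used:          # regenerate on the rare collision
            code = secrets.token_hex(8)
        used.add(code)
        crosswalk[record_id] = code
    return crosswalk


# Made-up record IDs for illustration only.
key_table = assign_codes(["MRN-0001", "MRN-0002"])
print(len(key_table))
```

    A hash of the medical record number, by contrast, would be derived from information about the individual, which condition 1 appears to rule out.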

    A table showing data elements permitted in de-identified data and limited data sets is available through the References section of UMHS Policy 01-04-032 on Limited Data Sets.

  • Contrast with Common Rule

    Under the Common Rule a dataset is “de-identified” only when no one could “re-identify” the data: not the recipients, nor the data provider, nor anyone else. If the data were “coded,” any “key to the code” must be destroyed to “de-identify” the dataset.

    The Common Rule does not recognize as “de-identified” information that retains a code to permit re-identification: rather, this is “coded” information which is “indirectly identifiable.” Therefore, a dataset can be “identifiable” under Common Rule definitions while also meeting HIPAA “de-identified” criteria.


Contact us at 734-763-4768 (Fax: 734-763-1234)

2800 Plymouth Road, Building 520, Room 3214, Ann Arbor, MI 48109-2800

A list of IRBMED staff is available in the Personnel Directory, or view the list of Regulatory Teams.

Last Updated: April 17, 2019 10:00 AM