Training AI for clinical or research use in healthcare requires feeding algorithms patient data, and lots of it. This exposes data custodians, typically hospitals, to legal risk on several fronts. Chief among the worries are complying with HIPAA, de-identifying patient data and otherwise protecting patients’ privacy.
Two attorneys who specialize in such matters break down the constituent concerns in a podcast posted online this month. Beth Pitman, JD, and Shannon Hartsfield, JD, both with the Holland & Knight firm in Nashville, center their discussion on Dinerstein v. Google. However, they cover principles and precedents that transcend any one case. Here are slices of their legal expertise drawn from the discussion and lightly edited for clarity and conciseness.
Q1. From a hospital law team’s perspective, how is AI development different from other reasons for sharing de-identified patient data with outside organizations?
A. The way the AI works—and the thing that’s so great about AI—is it can accumulate a large amount of data, and not just from one source. The de-identified data comes initially from the healthcare provider and goes to the AI system. It may have been de-identified correctly at the healthcare provider location. But then, because AI is learning and has access to so much other information that’s independent of the healthcare provider’s information, that information could be used to re-identify an individual. That’s one of the unique underlying concerns with AI.
Q2. What are HIPAA’s guidelines for sharing de-identified patient info with tech companies where someone could, if they wanted to, re-identify patients?
A. Under what we call the HIPAA de-identification safe harbor, you have to remove 18 specific identifiers. But then you also can’t have actual knowledge that the information could be used alone or in combination with other information to identify patients. So the question becomes, “When I’m handing this de-identified data over to the AI developer, might they be able to re-identify it?”
Q3. What about a case in which patient data is not fully de-identified but is, instead, limited by contract to a particular AI use case?
A. Information that is not fully de-identified may qualify as a limited dataset, but any AI use of a limited dataset must comply with HIPAA. Disclosing a limited dataset is perfectly permissible under HIPAA as long as there is a HIPAA-compliant data use agreement and both the limited dataset and the purpose of the disclosure conform to HIPAA. When healthcare providers and tech companies work together, the key is to look at the details of the specific situation you’re dealing with and then carefully analyze those facts to make sure that you’re complying with HIPAA.
Q4. What constitutes a limited dataset?
A. “Limited dataset” is a term defined in HIPAA. It excludes certain direct identifiers, such as the name, the postal address, Social Security number and medical record number. It can include some other demographic information, like your ZIP code, and other elements in a medical record, like admission, discharge and service dates. It is not completely de-identified, but it includes only a limited amount of information, which is why it’s called a limited dataset. Limited datasets may be used under HIPAA for research purposes and also for other specific purposes listed in the HIPAA rules, as long as there’s a data use agreement in place that meets HIPAA’s requirements and sufficiently protects the information.
Q5. How far along is U.S. healthcare in its thinking on AI vis-à-vis patient privacy?
A. Given the ways AI develops and learns, its use in healthcare continues to raise a lot of issues and concerns related to when a healthcare provider can appropriately disclose protected health information through the technology and how it can be disclosed. This is still very much an evolving area. Of course, the uses of AI in healthcare have not really been fully fleshed out. The issues it raises will probably continue growing and developing for many years to come.
Listen to the podcast (or read the full transcript) here.