Medical Data Security: What You Need to Know Before Deploying AI
Your clinic is evaluating an AI solution for patient engagement. The demo is impressive, the ROI has been calculated, the team is excited. Then your legal counsel asks: "Where exactly will our patients' data be stored?"
That is the right question. And it is the one to start with in any AI procurement conversation in healthcare. This article gives you a framework: what to verify, what requirements apply, and how to avoid the trap of an attractive interface with unacceptable data processing terms.
Why Medical Data Is a Special Category
Health data is the most sensitive category of personal information. Not just because the law says so — though it does — but because the consequences of a breach or misuse are the most severe.
A leaked credit card gets replaced. Leaked health data can cost someone their job, their insurance coverage, or expose a condition they kept private. This is not hypothetical: documented cases exist across many countries.
Health data includes:
- Diagnoses, medical history, lab results.
- Prescriptions and medications.
- Mental health and substance use records (a special category within a special category).
- Genetic data.
- Pregnancy, HIV status, oncology records.
All of this is processed with patient consent, for specific purposes, with defined technical safeguards. Passing this data to an AI system without a proper legal framework is not "just a technical question."
Core Risks When Using AI in Healthcare
Risks fall into three categories: technical, legal, and operational.
Technical risks:
- Data breach in transit. Patient data is sent to the AI provider's server over an unencrypted channel or through an insecure API.
- Data breach at rest. Data is stored in a cloud environment without proper access segregation or encryption.
- Use of data for model training. Some providers use user inputs to fine-tune their models — your patient data becomes part of the training set.
Legal risks:
- Processing personal health data without adequate patient consent.
- Cross-border data transfers without a valid legal basis.
- Using data for purposes not covered by the original consent.
Operational risks:
- Vendor lock-in without a data export path.
- No documented incident response plan for a security breach.
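One practical mitigation for the in-transit and training risks above is to strip direct identifiers before any text leaves your perimeter. The sketch below is a minimal, illustrative redaction pass, not real de-identification: HIPAA's Safe Harbor method lists 18 identifier types, and regexes alone cannot catch them all. The patterns and sample text are assumptions for demonstration.

```python
import re

# Illustrative patterns for a few common direct identifiers.
# Real de-identification requires far broader coverage than this.
PATTERNS = {
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[PHONE]": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

note = "Pt. follow-up, SSN 123-45-6789, call 555-867-5309, jdoe@mail.com"
print(redact(note))
# -> Pt. follow-up, SSN [SSN], call [PHONE], [EMAIL]
```

Even with redaction in place, the contractual safeguards discussed below remain necessary; redaction reduces exposure but does not create a legal basis for processing.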
On-Premise vs Cloud: What Should a Clinic Choose
This is not a question of "which is more secure." It is a question of "which matches your requirements and capabilities."
Cloud deployment is appropriate when:
- The provider stores data in a jurisdiction you can verify and enforce against (for US clinics, US-based hosting simplifies HIPAA oversight, though HIPAA itself does not mandate a storage location).
- A signed Business Associate Agreement (BAA) is in place.
- The provider explicitly does not use your data to train models.
- The clinic lacks technical resources to manage its own infrastructure.
On-premise deployment is warranted when:
- You process sensitive specialty data (behavioral health, oncology).
- Internal security policies prohibit sending data to external cloud services.
- You are subject to heightened regulatory requirements.
- You have the technical team to maintain and update the system.
On-premise LLM deployment is a real option today. Modern language models for clinical applications can run on clinic-owned hardware. Requirements: a GPU server (40+ GB VRAM for mid-size models), DevOps expertise, and an ongoing maintenance budget.
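The VRAM figure above can be sanity-checked with simple arithmetic: weights times bytes per parameter, plus headroom for the KV cache and activations. The helper below is a rough sketch; the 20% overhead factor is an assumption, and real usage varies with context length and batch size.

```python
def estimate_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to serve a model: weights * precision * overhead.

    bytes_per_param: 2.0 for fp16/bf16, 1.0 for int8, 0.5 for 4-bit.
    overhead: ~20% headroom for KV cache and activations (assumption).
    """
    return params_billions * bytes_per_param * overhead

# A 13B model in fp16 needs roughly 31 GB, so it fits a 40 GB card;
# the same model 4-bit quantized needs roughly 8 GB.
for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"13B @ {label}: ~{estimate_vram_gb(13, bpp):.0f} GB")
```

This is also why quantization matters in procurement conversations: the same clinical model can require a $30,000 GPU or a commodity one depending on precision, with some accuracy trade-off.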
What Must Be in Your Contract with an AI Provider
The contract with your AI vendor is not a formality. Here is what must be explicitly stated:
- Purposes of data processing. Only the purposes for which you use the service. No "to improve our product" without your explicit, separate consent.
- Location of data storage. Specific country and jurisdiction, named in the contract. HIPAA does not dictate where PHI is hosted, but you cannot verify safeguards on infrastructure you cannot identify.
- Subprocessors. A full list of third parties the provider shares data with. You need to know the entire chain.
- Data retention period. After contract termination, data must be deleted within an agreed timeframe.
- Incident response procedure. How many hours until you are notified of a breach (72 hours per GDPR is a good benchmark).
- Right to audit. The ability to verify compliance with contract terms.
- Explicit prohibition on training models with your data. This must be a named clause, not implied.
HIPAA, GDPR, and State Laws: Key Requirements in Plain Language
HIPAA (USA). Applies to covered entities and their business associates. Core requirements for AI solutions:
- A Business Associate Agreement (BAA) is mandatory with any AI vendor processing PHI — no exceptions.
- Minimum necessary standard: access only the data needed for the specific function.
- Technical safeguards: encryption, access controls, audit logs.
- Security incident procedures: documented response plan and timely notification.
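The audit-log requirement is worth a concrete illustration. The sketch below shows one common design, a hash-chained log in which every entry embeds the hash of the previous one, so any after-the-fact edit is detectable. This is a minimal teaching example, not a production audit system; the field names and sample users are invented for illustration.

```python
import hashlib
import json
import time

def append_entry(log: list, user: str, action: str, record_id: str) -> dict:
    """Append a tamper-evident audit entry: each entry stores the
    SHA-256 hash of the previous one, so silent edits break the chain."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"ts": time.time(), "user": user, "action": action,
             "record": record_id, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log: list) -> bool:
    """Recompute every hash; any modified entry invalidates the chain."""
    prev = "0" * 64
    for e in log:
        body = {k: v for k, v in e.items() if k != "hash"}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["prev"] != prev or digest != e["hash"]:
            return False
        prev = e["hash"]
    return True

log = []
append_entry(log, "dr_smith", "view", "patient-1042")
append_entry(log, "nurse_lee", "update", "patient-1042")
print(verify_chain(log))   # True
log[0]["user"] = "intruder"
print(verify_chain(log))   # False
```

When evaluating a vendor, ask whether their audit trail has this tamper-evident property, who can read it, and how long entries are retained.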
GDPR (EU). Applies when processing data of EU residents. Health data is a special category under Article 9. Additional requirements:
- Data Protection Impact Assessment (DPIA) required before deploying AI processing special category data at scale.
- Patient right to erasure and data portability.
- Notification of a supervisory authority within 72 hours of a breach.
State laws (CCPA, NY SHIELD, etc.). Several US states have enacted additional privacy legislation that may impose stricter requirements than HIPAA in certain contexts. Review applicable state law for your patient population.
For more on AI implementation in clinical settings, see: AI in the Clinic: How to Automate Triage Without the Headache.
Frequently Asked Questions
Can ChatGPT or other public AI services process patient data?
No. Public AI services (ChatGPT, Gemini, and others) are not designed for processing protected health information. Their standard consumer terms do not include a Business Associate Agreement, and without a BAA, sending PHI to them is a HIPAA violation; enterprise agreements with appropriate safeguards are a separate arrangement. For real patient data, use specialized solutions with the proper legal framework in place.
What is an on-premise LLM and how does it work?
An on-premise LLM is a language model deployed on the clinic's own servers, so data never leaves the organization's perimeter. This requires a GPU server (such as an NVIDIA A100 or H100), installation and configuration, and a team for ongoing support. Initial infrastructure costs typically fall in the $50,000–$150,000 range. Appropriate for large health systems with strict security requirements.
Is a BAA sufficient to make an AI tool HIPAA-compliant?
A BAA is necessary but not sufficient. It establishes the legal framework but does not automatically mean the technical implementation is compliant. You still need to verify encryption standards, access controls, audit logging, and incident response procedures. A BAA without technical due diligence is not compliance — it is just paperwork.
How can we verify that our AI provider is not training models on our data?
Only through a contract with an explicit prohibition and an audit right. Technical verification without access to the provider's infrastructure is not feasible. This is why vendor selection is always both a question of legal guarantees and a question of trust in the provider's practices.
What is the clinic's liability if there is a data breach through the AI system?
The clinic as a covered entity remains the primary responsible party regardless of whether the breach originated with the vendor. A well-drafted contract allows recourse against the vendor, but does not remove the clinic's direct obligations to patients and regulators under HIPAA and applicable state law.
What is a DPIA and when is it required?
A Data Protection Impact Assessment (DPIA) is a structured evaluation of the data processing risks associated with a new technology. Under GDPR, it is mandatory when deploying technology that processes special categories of data at scale. Even where not legally required — for example, for US-only operations — conducting a DPIA is a sound practice that identifies risks before deployment rather than after an incident.
Symptomatica is an informational reference service. Not a medical service; does not diagnose or prescribe treatment. For any symptoms, please consult a doctor.