Why this evaluation is harder than it looks

Radiology AI software is not like most enterprise software. The consequences of getting it wrong are clinical, not just operational. A data privacy failure in a dose management system is not only a regulatory fine: it is a breach of protected health information (PHI) under HIPAA. An integration failure does not mean the sales pipeline breaks: it means dose data stops flowing and compliance reporting fails.

At the same time, the vendor landscape is genuinely varied. There are products built by clinicians who understand the workflow and products built by software teams who have never seen a DICOM file. There are honest products with realistic performance claims and products whose demo environments bear little resemblance to what gets deployed. Distinguishing between them requires asking questions that vendors can't answer with a slide deck.

The demos are designed to impress. The questions below are designed to find out what happens after the demo — in your environment, on your PACS, with your patient mix, on day 180 of deployment rather than day 1.

68% of healthcare AI projects fail in post-pilot rollout
2–3× typical true TCO vs initial licence cost
6 mo average time from contract to live in large hospitals

1. Where does patient data live during processing, and after?

This is the first question and the most important. Healthcare data governance requirements mean you need a clear, written answer — not a vague reassurance. There are three possible answers, each with different implications.

On-premise: Processing happens entirely on servers inside your network. PHI never leaves your infrastructure. This is the most secure option and the one most clinical governance committees will prefer. It requires you to provision server hardware (or a private VM), but it eliminates data residency concerns entirely.

Private cloud: Processing happens on dedicated cloud infrastructure, typically in a specific data centre region, under a signed Business Associate Agreement (BAA) if HIPAA applies. The data leaves your building but stays in a controlled environment. Acceptable for many departments, but requires reviewing the BAA carefully.

Shared cloud / SaaS: Your data is processed in the same infrastructure as other customers. This is the highest-risk option for PHI. Some vendors structure this compliantly; many do not disclose it clearly.

Good answer: All processing happens on your servers via Docker. Zero external network dependencies. PHI never leaves your infrastructure.
Red flag: All processing is cloud-based and the vendor says "we're HIPAA compliant", but offers no BAA, no data residency details, and no clear answer on data retention after processing.

2. How exactly does it integrate with our PACS, and what happens when the PACS changes?

Every vendor says their product integrates with "any DICOM-compliant PACS." This is technically true and practically meaningless. DICOM compliance covers the basics; real-world integration involves DICOM modality worklist, RDSR parsing, routing rules, and — in many cases — vendor-specific PACS extensions that are anything but standard.

Ask for a list of specific PACS systems the vendor has deployed against in production (not tested in a lab — deployed in a live clinical environment). Ask which PACS they have a tested, maintained connector for. Ask what happens when you upgrade your PACS version.

Also ask: does integration require a dedicated server acting as a DICOM intermediary, or does it connect directly? If an intermediary is required, who maintains it, and what happens if it goes down?

Good answer: Native tested connectors for Orthanc, Sectra, and Agfa, with DIMSE Q/R and C-STORE support for everything else. PACS upgrades are tested against new versions within 30 days of release.
Red flag: Uses a third-party DICOM router that you are responsible for licensing and maintaining. PACS connector is "custom-built per deployment" with no documentation of what "custom-built" means.

3. What does deployment actually require, and how long does it really take?

Every vendor's sales timeline is optimistic. The question is not how long deployment takes at a greenfield site with a single modality and a cooperative IT team — it is how long it takes at a site like yours, with your PACS vendor, your number of modalities, your IT change management process, and your clinical governance requirements.

Ask for references from hospitals with a similar profile to yours: the same or a comparable PACS vendor, similar bed count, similar modality mix. Ask those reference sites specifically how long deployment took and what the surprises were. Ask what the vendor's standard implementation team looks like: a dedicated engineer, a shared resource, or documentation and self-serve?

For on-premise deployments, ask for the server specifications required. A system that nominally runs "on your infrastructure" but requires a $40,000 GPU server and a full-time Linux administrator is not the same as one that runs on a standard hospital VM.

Good answer: Docker-based deployment. Reference hospitals with similar PACS going live in 3–5 days. Minimum server spec is a standard 8-core VM with 32 GB RAM, with no GPU required for dose management.
Red flag: Typical deployment takes "6–12 weeks" with no further detail. Reference sites provided are all academic medical centres with dedicated IT infrastructure teams.

4. Where is the AI accuracy validated, and on patients like mine?

AI performance in radiology degrades predictably when the patient population differs from the training data. A segmentation model trained predominantly on healthy adults at academic medical centres may perform poorly on elderly patients, paediatric patients, post-surgical anatomy, or obesity-related anatomical variation. A dose anomaly detection model trained on one CT vendor may miss subtleties specific to a different manufacturer's dose encoding.

Ask for peer-reviewed validation publications — not white papers, not internal testing reports. Published papers mean the methodology was reviewed by independent scientists. Ask specifically for external validation (a held-out dataset from different institutions) rather than cross-validation on the training set.
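To make the distinction concrete, here is a minimal sketch of what "external validation" means in terms of data splitting: every case from the held-out institutions is kept out of model development. The site names, the case records, and the function name are hypothetical and are not taken from any vendor's methodology.

```python
def split_by_institution(cases, held_out_sites):
    """Split cases so that held-out institutions never appear in development data."""
    held_out_sites = set(held_out_sites)
    development = [c for c in cases if c["institution"] not in held_out_sites]
    external = [c for c in cases if c["institution"] in held_out_sites]
    return development, external


# Hypothetical case list: only the institution label matters for the split.
cases = [
    {"id": 1, "institution": "site_a"},
    {"id": 2, "institution": "site_a"},
    {"id": 3, "institution": "site_b"},
    {"id": 4, "institution": "site_c"},
    {"id": 5, "institution": "site_c"},
]

development, external = split_by_institution(cases, held_out_sites=["site_c"])

dev_sites = {c["institution"] for c in development}
ext_sites = {c["institution"] for c in external}
print("Development sites:", sorted(dev_sites))
print("External validation sites:", sorted(ext_sites))
```

Cross-validation on the training set, by contrast, shuffles cases from the same institutions between folds, so the scanners, protocols, and patient mix stay the same. When you read a validation paper, check that the development and external site lists do not overlap.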

For segmentation tools, ask for per-structure Dice Similarity Coefficient (DSC) results, not an averaged figure. An average DSC of 0.88 across 100 structures can hide the fact that the 20 most clinically important structures achieve 0.72 — acceptable for some applications but not for surgical planning.
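The arithmetic behind that example can be checked with a short sketch. It assumes a hypothetical result set in which 20 critical structures score 0.72 and the other 80 score 0.92, which is one way to arrive at an average of 0.88. The 0.80 review threshold is also an assumption for illustration, not a clinical standard.

```python
from statistics import mean


def dice(pred: set, truth: set) -> float:
    """Dice Similarity Coefficient between two sets of voxel indices."""
    if not pred and not truth:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Hypothetical per-structure DSC results, not real vendor data.
per_structure = {f"critical_{i}": 0.72 for i in range(20)}
per_structure.update({f"other_{i}": 0.92 for i in range(80)})

overall = mean(per_structure.values())
critical = mean(v for k, v in per_structure.items() if k.startswith("critical_"))

# Assumed review threshold: list every structure that falls below it.
below_threshold = sorted(k for k, v in per_structure.items() if v < 0.80)

print(f"Average DSC over all structures: {overall:.2f}")
print(f"Average DSC over critical structures: {critical:.2f}")
print(f"Structures below 0.80: {len(below_threshold)}")
```

The headline average looks strong, yet every one of the 20 critical structures falls below the assumed threshold. Only the per-structure table reveals that.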

Good answer: External validation published in a peer-reviewed journal. Per-structure DSC results available for review. Validation dataset includes patients from institutions outside the training set, with representative age and BMI distribution.
Red flag: "Validated on thousands of cases" with no published methodology. DSC figures quoted without specifying which structures or patient demographics. No external validation data available.

5. What does the true cost of ownership look like beyond year one?

The licence fee is often the smallest component of the total cost of a healthcare AI deployment. Components that can exceed it include implementation and integration labour (internal IT time, not just vendor time), ongoing model updates and revalidation, infrastructure costs for on-premise deployments, training and change management for clinical staff, and the internal resource required to actually use the system (running queries, reviewing reports, investigating flagged cases).

Ask for a written total cost of ownership estimate covering at least three years. Ask specifically about model update policy: when the AI model is updated, is revalidation included in the licence? Who bears the cost if integration breaks after a PACS upgrade? Is training included or charged per session?
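A three-year estimate can be as simple as the sketch below, which adds up the cost components named above for each year and compares the total with the licence fee alone. Every figure is a made-up placeholder, and the class and field names are hypothetical. Substitute the numbers from the vendor's written estimate and your own internal costing.

```python
from dataclasses import dataclass, astuple


@dataclass
class YearCosts:
    licence: float
    integration_labour: float    # internal IT time plus vendor time
    model_revalidation: float    # model updates and revalidation
    infrastructure: float        # servers or VMs for on-premise deployments
    training: float              # clinical staff training and change management
    operating_staff_time: float  # running queries, reviewing reports, flagged cases

    def total(self) -> float:
        return sum(astuple(self))


# Placeholder figures in an arbitrary currency, for illustration only.
years = [
    YearCosts(50_000, 40_000, 10_000, 15_000, 10_000, 20_000),  # year 1
    YearCosts(50_000, 5_000, 10_000, 5_000, 2_000, 20_000),     # year 2
    YearCosts(50_000, 5_000, 10_000, 5_000, 2_000, 20_000),     # year 3
]

tco = sum(y.total() for y in years)
licence_total = sum(y.licence for y in years)
ratio = tco / licence_total

print(f"Three-year TCO: {tco:,.0f}")
print(f"Three-year licence cost: {licence_total:,.0f}")
print(f"TCO as a multiple of licence cost: {ratio:.2f}x")
```

Laying the estimate out per year makes it obvious which lines the vendor has left blank, and which costs recur after the implementation year.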

Also ask about the exit terms. If you decide to switch vendors after two years, can you export your historical data in a standard format? Or does your dose history and segmentation archive live in a proprietary database that you lose access to if you stop paying?

Good answer: Flat annual licence inclusive of model updates, PACS connector maintenance, and standard training. Data is stored in open formats (DICOM, CSV) and fully exportable at any time.
Red flag: Model updates billed separately. Integration support charged hourly. Historical data stored in a proprietary format with no documented export pathway.

EVALUATION FRAMEWORK: WHAT GOOD ANSWERS LOOK LIKE

Q1: Data Residency. Where does PHI go during processing?
Good answer signals: ✓ On-prem via Docker ✓ Zero external dependencies ✓ Written network diagram
Red flags: ✗ "We're HIPAA compliant" ✗ No BAA offered ✗ No data residency docs

Q2: PACS Integration. Which systems in live production?
Good answer signals: ✓ Named PACS (Orthanc, Sectra…) ✓ DIMSE + C-STORE tested ✓ PACS upgrade process clear
Red flags: ✗ "Any DICOM-compliant" ✗ Third-party DICOM router ✗ No named production sites

Q3: Deployment Reality. Timeline for sites like ours?
Good answer signals: ✓ Docker, days not months ✓ Standard VM spec given ✓ Similar-site references
Red flags: ✗ "6–12 weeks typical" ✗ Only academic medical centre (AMC) references ✗ GPU server required

Q4: Clinical Validation. External validation data?
Good answer signals: ✓ Peer-reviewed publication ✓ Per-structure DSC given ✓ External validation set
Red flags: ✗ "Validated on 1000s" ✗ Averaged DSC only ✗ No external dataset

Q5: True Cost of Ownership. 3-year TCO including updates and data portability?
Good answer signals: ✓ Flat annual licence ✓ Model updates included ✓ Open-format data export ✓ Written 3yr TCO estimate
Red flags: ✗ Updates billed separately ✗ Proprietary data format ✗ No export pathway ✗ Hourly support charges
Evaluation framework — five questions with concrete signals distinguishing genuine answers from sales-deck responses.

The questions behind the questions

Running through all five questions is a single underlying theme: what happens after the sale? Every vendor performs well in the demo. The ones that perform well in the second year of deployment are the ones whose answers to the above questions are specific, documented, and verifiable by reference customers.

A vendor who can't give you a clear answer on data residency either hasn't built the infrastructure properly or doesn't want you to look too closely at it. A vendor who can't name specific PACS systems they've deployed against in production has likely built their product in a lab environment. A vendor who can't provide per-structure validation DSC figures either doesn't have the data or knows it doesn't look good.

Contrast this with the questions that don't discriminate well: "Is it HIPAA compliant?" (everyone says yes), "Does it integrate with our PACS?" (everyone says yes), "Is it AI-powered?" (meaningless without specifics). These questions filter out no one at the demo stage.

A note on on-premise vs cloud

The data residency question is particularly live for radiology departments, because the clinical governance answer and the IT convenience answer point in opposite directions. On-premise is more secure and satisfies most governance committees without detailed due diligence. Cloud is easier to deploy and update. The right answer depends on your trust board's risk appetite, your IT team's capacity, and, critically, whether the vendor actually offers a genuine on-premise option or one that is "on-premise" in name but phones home to cloud services for core processing.

Ask to see a network diagram of the data flow in the on-premise deployment. If any arrow in that diagram points to an external server for processing (as opposed to update distribution or licence validation), you are not looking at a true on-premise product.
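If the vendor supplies the data flows as a list rather than a picture, the same check can be sketched in a few lines. Everything here is an assumption for illustration: the internal address ranges, the purpose labels, and the flow records are hypothetical, and the external addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical hospital address ranges; replace with your own.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# External traffic that does not disqualify an on-premise product.
ALLOWED_EXTERNAL_PURPOSES = {"update_distribution", "licence_validation"}


def is_internal(address: str) -> bool:
    """True if the address falls inside one of the hospital's own networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in INTERNAL_NETWORKS)


def external_processing_flows(flows):
    """Return flows that leave the network for anything other than allowed purposes."""
    return [
        flow for flow in flows
        if not is_internal(flow["dest"])
        and flow["purpose"] not in ALLOWED_EXTERNAL_PURPOSES
    ]


# Hypothetical flows transcribed from a vendor's network diagram.
flows = [
    {"dest": "10.1.2.20", "purpose": "dicom_receive"},
    {"dest": "10.1.2.30", "purpose": "dose_processing"},
    {"dest": "203.0.113.10", "purpose": "licence_validation"},
    {"dest": "198.51.100.7", "purpose": "dose_processing"},
]

violations = external_processing_flows(flows)
for flow in violations:
    print(f"External processing flow: {flow['dest']} ({flow['purpose']})")
```

In this made-up list, only the last flow is flagged: it sends dose processing to an address outside the hospital's networks. A single entry like that is enough to treat the "on-premise" label with suspicion.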

Key takeaways