healthcare-eval-harness

Official

by Api.AirforcePrepends a system promptTesting & QA000 uses202,700

Patient safety evaluation harness for healthcare application deployments. Automated test suites for CDSS accuracy, PHI exposure, clinical workflow integrity, and integration compliance. Blocks deployments on safety failures.

open-sourceclaude-codetesting-qaaffaan-m

What this skill does

When applied, it prepends a system prompt before your request is sent — no extra calls and no change to how you are billed beyond the added tokens.

---
name: healthcare-eval-harness
description: Patient safety evaluation harness for healthcare application deployments. Automated test suites for CDSS accuracy, PHI exposure, clinical workflow integrity, and integration compliance. Blocks deployments on safety failures.
origin: Health1 Super Speciality Hospitals — contributed by Dr. Keyur Patel
version: "1.0.0"
---

# Healthcare Eval Harness — Patient Safety Verification

Automated verification system for healthcare application deployments. A single CRITICAL failure blocks deployment. Patient safety is non-negotiable.

> **Note:** Examples use Jest as the reference test runner. Adapt commands for your framework (Vitest, pytest, PHPUnit, etc.) — the test categories and pass thresholds are framework-agnostic.

## When to Use

- Before any deployment of EMR/EHR applications
- After modifying CDSS logic (drug interactions, dose validation, scoring)
- After changing database schemas that touch patient data
- After modifying authentication or access control
- During CI/CD pipeline configuration for healthcare apps
- After resolving merge conflicts in clinical modules

## How It Works

The eval harness runs five test categories in order. The first three (CDSS Accuracy, PHI Exposure, Data Integrity) are CRITICAL gates requiring 100% pass rate — a single failure blocks deployment. The remaining two (Clinical Workflow, Integration) are HIGH gates requiring 95%+ pass rate.

Each category maps to a Jest test path pattern. The CI pipeline runs CRITICAL gates with `--bail` (stop on first failure) and enforces coverage thresholds with `--coverage --coverageThreshold`.

### Eval Categories

**1. CDSS Accuracy (CRITICAL — 100% required)**

Tests all clinical decision support logic: drug interaction pairs (both directions), dose validation rules, clinical scoring vs published specs, no false negatives, no silent failures.

```bash
npx jest --testPathPattern='tests/cdss' --bail --ci --coverage
```

**2. PHI Exposure (CRITICAL — 100%

Use this skill

Per request

Add a "skill" field with the skill’s ID to your chat completion request. It is applied server-side before your prompt is sent — no extra calls.

{
  "model": "gpt-4o-mini",
  "skill": "imp-4417fdb2-bb10-4e27-bab8-cea70dc28753",
  "messages": [{ "role": "user", "content": "…" }]
}

Always on — no field to send

Install the skill, enable it in your dashboard and (optionally) limit it to specific models. It then applies automatically to every matching request — with no "skill" field to send each time.

Set it up in your dashboard

More skills

node-connect

Diagnose OpenClaw Android, iOS, or macOS node pairing, QR/setup code, route, auth, and connection failures.

1password

Set up and use 1Password CLI for sign-in, desktop integration, and reading or injecting secrets.

apple-notes

Create, view, edit, delete, search, move, or export Apple Notes via the memo CLI on macOS.

apple-reminders

List, add, edit, complete, or delete Apple Reminders and reminder lists via remindctl.

bear-notes

Create, search, and manage Bear notes via grizzly CLI.

blogwatcher

Monitor blogs and RSS/Atom feeds for updates using the blogwatcher CLI.

blucli

BluOS CLI (blu) for discovery, playback, grouping, and volume.

camsnap

Capture frames or clips from RTSP/ONVIF cameras.