JHU — Nikki Piccini

Context

Teaching Surgeons Is One of Medicine's Hardest Problems

With over 1.7 million procedures performed annually, cataract surgery is the most common surgical procedure in the United States. Yet skill development has historically relied on subjective, inconsistent human feedback. A surgeon in training might wait days for a senior colleague to review a recorded procedure and offer notes. Those delays stretch the feedback loop, which slows improvement and carries downstream consequences for patients.

Circlage, a startup founded inside Johns Hopkins University, had spent years solving the hard technical problem. Using neural networks trained on expert-annotated video, their AI could segment a recorded cataract procedure into discrete phases and score a surgeon's technique against established benchmarks. The model worked. What didn't exist yet was an interface surgeons could actually use.

This meant that my job was not to make an existing product better, but to figure out what this product should be. What information did surgeons actually need from an AI evaluator? In what form, and at what granularity? And most importantly, what should it feel like to receive AI feedback on your own professional surgical technique?

My Role

Lead UX — From Accelerator Prototype to Shipped Platform

I joined this engagement through the studio I was working with, embedded alongside the Circlage founders inside a tech accelerator at the start of the project. As the product expanded in scope, so did my role. I owned the full UX research and strategy lifecycle — from initial discovery through IA and concept validation — with two UI designers and two developers supporting me across design and build phases.

My primary responsibility on the UX side was research, strategy, and information architecture. On the team management side, I coordinated between the studio's designers and developers while serving as the principal point of contact with the Circlage and JHU stakeholder group, translating a complex, multi-party institutional environment into actionable direction for the people I was leading.

What I owned

Research, Strategy & IA

User research and synthesis, UX strategy, information architecture, concept definition, usability testing, and stakeholder alignment across JHU, Circlage, and the studio.

Where I supported

UI & Prototyping

UI design and high-fidelity prototyping were led by the two designers I was managing. I contributed directional feedback and cross-functional review, not production design.

Design Constraints

Three Non-Negotiables That Shaped Everything

Before any research or strategy work could produce actionable direction, I needed to map the constraints that would bound every design decision downstream. Three dominated, and unlike most product constraints, all three had real institutional weight behind them.

HIPAA Compliance

Surgical video is protected health information. Every interaction with that data — upload, storage, access, display — had to be designed with HIPAA in mind from the start.

ADA Accessibility

JHU's institutional requirements mandated full ADA compliance. In a clinical context, that meant designing for a professional user base that could include users with a range of visual or motor needs.

Architectural Scalability

The founders wanted to expand beyond cataract surgery. The IA and design system couldn't be hardcoded to one procedure. They had to work as a platform from the start, with a structure that could absorb additional surgical domains over time.

Treating these constraints as design inputs fundamentally changed how I structured the research phase and what questions I asked of both the JHU compliance team and the surgeons themselves.

Research

Understanding What "Useful Feedback" Means to a Surgical Expert

I structured the research phase around a core question that most AI product teams skip: what makes feedback feel credible or dismissible to someone with deep domain expertise and high professional stakes?

Surgeons are not passive recipients of feedback. They have finely tuned internal models of their own performance, built over years of training and mentorship. Introducing an AI evaluator into that context means navigating a specific kind of resistance: not general skepticism about technology, but the professional friction that comes from having your technique assessed by a system you didn't train and can't fully interrogate.

The insight that most shaped the product strategy was a surprising one: surgeons didn't distrust AI scoring simply because it was AI. They distrusted it when it gave them a number with no reason and no context. The same score from a senior colleague, delivered with context, nuance, and an explanation of the "why," would be acted on immediately. Without that scaffolding, it was just a grade.

That finding reframed the entire design problem. The challenge was not the standard-issue one of presenting scores well visually. It was designing a feedback system that was epistemically legible: one where a surgeon could trace the logic from their video, to the AI's analysis, to a specific area for improvement, and decide for themselves how much weight to place on it.

What the research covered

The research phase focused on three areas: how surgeons currently receive and process performance feedback in clinical training; what makes that feedback feel authoritative versus dismissible in a high-stakes professional context; and where an AI-generated score might create anxiety or resistance if not carefully framed — and how to design against those failure modes proactively.

Those three threads fed directly into the IA, the interaction logic of the score display, and the hierarchy of information in the feedback report — particularly decisions about what level of detail was surfaced at which point in the experience.

Visual reference — Standard Beagle Studio case study (public)

Dashboard and score report UI as published by the studio

Information Architecture

System Development over Screen Design

The IA challenge on Circlage was multi-layered. I had to reconcile four structures simultaneously: the AI model's output (what it actually produced), the surgeon's mental model of their own procedure (how they narrated their performance internally), the clinical rubric that informed the scoring (the expert benchmark hierarchy), and the platform's need to scale across future surgical domains without a rebuild.

These four structures didn't map onto each other cleanly. The AI segmented video into phases that didn't always correspond to how surgeons described their own technique. The rubric surfaced different things depending on whether you were a trainee or an attending. And the scalability requirement meant I couldn't design an IA that assumed a fixed number of steps or a fixed procedure-specific vocabulary.

Core IA decisions — abstracted

Content layers

Aggregate score dashboard

Procedure-phase breakdown

Individual technique scores with rubric rationale

Historical performance trends

Personal video library

System design principles

Procedure-agnostic structure throughout

Role-sensitive content hierarchy (trainee vs. attending)

Score and rationale always structurally paired

HIPAA-compliant data access surfaces

Progress view elevated above single-session performance

The decision I'm most proud of

Making score and rationale structurally inseparable. In earlier concept explorations, a surgeon could see their aggregate score without being one step away from the phase-level breakdown and reasoning. My research showed this was the exact pattern that eroded trust — a top-line number with no visible path to the thinking behind it.

I pushed to restructure the hierarchy so the rubric breakdown was always one level away from any score surface, not buried at the bottom of a drill-down. The IA enforced the right mental model regardless of where a user entered the experience.
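The pairing of score and rationale can be pictured as a data-model constraint. The sketch below is illustrative only and is not the platform's actual implementation; the names `Rationale`, `Score`, and `aggregate`, and all of their fields, are hypothetical. It shows one way a score object could refuse to exist without its reasoning, so that any surface rendering a score also has the rationale on hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rationale:
    # Hypothetical fields: which rubric criterion was applied, in which
    # procedure phase, and a plain-language explanation of the result.
    criterion: str
    phase: str
    explanation: str


@dataclass(frozen=True)
class Score:
    value: float
    rationale: Tuple[Rationale, ...]

    def __post_init__(self) -> None:
        # Refuse to create a bare number: every score carries its reasoning.
        if not self.rationale:
            raise ValueError("a score must be paired with at least one rationale")


def aggregate(scores: List[Score]) -> Score:
    """Roll phase-level scores into one top-line score that keeps every rationale."""
    if not scores:
        raise ValueError("cannot aggregate an empty list of scores")
    mean = sum(s.value for s in scores) / len(scores)
    merged = tuple(r for s in scores for r in s.rationale)
    return Score(value=mean, rationale=merged)
```

Because the aggregate is itself a `Score`, a dashboard built on this shape could not show a top-line number without also holding the phase-level reasoning behind it. The simple mean is an assumption made for brevity, not a claim about how the real scoring works.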

Designing for AI

Calibrated Confidence as a Design Principle

Most AI interfaces fall into one of two failure modes: they present model output with the same visual authority as established fact, or they hedge so aggressively that the output feels meaningless. Neither worked for a clinical training context where surgeons needed to develop a calibrated, long-term relationship with the system.

The principles I developed during the strategy phase governed how AI output was represented throughout the product — in the score report, in the feedback language, and in the hierarchy of the dashboard itself.

Score and reasoning are always surfaced together

No score appears without the rubric criteria and phase-level breakdown that produced it. The surgeon should always be able to trace the "why" without having to hunt for it. This was an IA decision before it was a visual one.

Design for expert judgment, not expert deference

The interface positions the AI as a structured input to a surgeon's own assessment, rather than a verdict to accept or reject. Language, visual hierarchy, and framing all reinforce the mental model of a mentor, not a grading system.

Progress over performance

The historical trend view is elevated in the hierarchy — not buried beneath individual session scores. The platform's value compounds over time, and the IA reflects that. A single session score is a data point; a trend is an insight.

One tension I navigated throughout strategy: the founders reasonably wanted to demonstrate the AI's accuracy and confidence. Designing for expert users meant arguing that expressing calibrated uncertainty was a feature rather than a weakness. A surgeon is more likely to trust a system that acknowledges its own limits, and will use it more often and with more confidence as a result.

HIPAA & Systems Thinking

Compliance as Information Architecture

HIPAA requirements don't just affect what data you store. They shape what you can surface, to whom, under what authentication conditions, and how failure states are communicated. During the strategy phase, well before any design work was in production, I worked directly with JHU's compliance and security teams to map those implications onto the IA and user flows.

Three specific decisions came directly out of that work:

Access scoping by role

The system needed to distinguish between a trainee's personal video library and any aggregated institutional view. These are meaningfully different data surfaces with different consent, access, and audit trail implications. They had to be architecturally separate and visually distinct from the start.
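As a rough illustration of what "architecturally separate" can mean, here is a minimal sketch in which the two data surfaces are distinct types with distinct access rules, and every access decision is written to an audit log. Everything in it is an assumption made for the example: the `Surface` enum, the `can_view_institutional` flag, and the shape of the audit entries are hypothetical and are not drawn from the real system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Surface(Enum):
    PERSONAL_LIBRARY = "personal_library"      # one surgeon's own videos
    INSTITUTIONAL_VIEW = "institutional_view"  # aggregated, cross-user data


@dataclass(frozen=True)
class User:
    user_id: str
    can_view_institutional: bool = False  # hypothetical permission flag


@dataclass(frozen=True)
class AccessRequest:
    user: User
    surface: Surface
    library_owner_id: Optional[str] = None  # only meaningful for personal libraries


def is_allowed(request: AccessRequest) -> bool:
    # A personal library is visible only to its owner.
    if request.surface is Surface.PERSONAL_LIBRARY:
        return request.library_owner_id == request.user.user_id
    # The institutional view requires an explicit, separate permission.
    return request.user.can_view_institutional


def check_access(request: AccessRequest, audit_log: List[dict]) -> bool:
    """Decide access and record the decision, whether granted or denied."""
    allowed = is_allowed(request)
    audit_log.append({
        "user_id": request.user.user_id,
        "surface": request.surface.value,
        "allowed": allowed,
    })
    return allowed
```

Keeping the two surfaces as separate cases, rather than one view with a filter, makes it harder for a later change to blur them by accident.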

Error states that don't expose PHI

When something failed during video processing, the system needed to communicate that failure clearly without surfacing any identifying information about the underlying data. I designed a set of error patterns specific to this context — distinguishing between system failures, processing delays, and access errors in a way that was informative without being a compliance liability.
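A pattern like this can be sketched as a mapping from failure category to fixed, pre-approved copy, with the underlying detail routed to a secured log instead of the screen. The sketch below assumes that approach; the three categories come from the description above, but the message wording, the `to_user_facing` function, and the `secure_log` dictionary are hypothetical stand-ins.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class FailureKind(Enum):
    SYSTEM_FAILURE = "system_failure"
    PROCESSING_DELAY = "processing_delay"
    ACCESS_ERROR = "access_error"


# Hypothetical copy: fixed strings, so no file name, patient detail, or
# user identifier can leak into what the surgeon sees.
USER_MESSAGES: Dict[FailureKind, str] = {
    FailureKind.SYSTEM_FAILURE: "Something went wrong on our side. Please try again.",
    FailureKind.PROCESSING_DELAY: "Your video is still being analyzed. Check back shortly.",
    FailureKind.ACCESS_ERROR: "You don't have access to this item.",
}


@dataclass(frozen=True)
class UserFacingError:
    kind: FailureKind
    message: str
    reference: str  # opaque id that support staff can match to the secured log


def to_user_facing(
    kind: FailureKind, internal_detail: str, secure_log: Dict[str, str]
) -> UserFacingError:
    """Build a display-safe error; the raw detail goes only to the secured log."""
    reference = uuid.uuid4().hex
    secure_log[reference] = internal_detail  # stand-in for a real audited log sink
    return UserFacingError(kind=kind, message=USER_MESSAGES[kind], reference=reference)
```

The surgeon sees which of the three situations applies and gets a reference to quote, while anything identifying stays server-side.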

Session and authentication flow

I advised on how the UI handled session state and re-authentication so that it met JHU's security standards without creating the kind of friction surgeons would try to work around. One edge case needed particular attention: surgeons accessing the platform between cases in a clinical setting.

Results

A Shipped Platform, a Scalable System, and a Gold Award

The platform launched, passed usability testing with strong results across all sessions, and won a Hermes Creative Award Gold in 2022. Circlage has continued operating as an active training platform within Johns Hopkins, with the AI and platform team working toward expanding coverage beyond cataract surgery — a trajectory the IA and design system were built to support from the start.

100%

Usability completion rate

Across all moderated sessions. Users navigated the score report and feedback flow independently with strong satisfaction scores, including the multi-step rationale drill-down.

Gold

Hermes Creative Award, 2022

Web Application / SaaS Product Design category. Awarded to the studio engagement — publicly cited in the studio's published case study.

1

Procedure-agnostic design system

The IA and component structure were built to accommodate additional surgical domains without a structural rebuild — a non-negotiable from the founders from day one.

3

Compliance frameworks designed into — not onto — the product

HIPAA, ADA, and JHU security standards all treated as design inputs from the research phase, instead of legal reviews at the end.

Reflection

What Designing for AI Actually Means

The hardest problems on this project were epistemological. How do you help a user develop a calibrated relationship with a system they can't fully interrogate? How do you design for expert judgment rather than expert deference? How do you build an IA that's honest about what the AI knows and doesn't know, without undermining the product's value proposition?

Those questions don't have interface-level answers alone. They require research to understand the user's relationship to their own expertise, strategy to define what the system should and shouldn't claim to know, and an information architecture that structurally enforces the right mental model, regardless of which screen a user lands on first.

The deepest lesson from this project was that trust is an architecture problem before it's a design problem. You can't solve it with copy tone or visual styling. The structure of the information has to make the reasoning visible, and it has to do that by default rather than as something a user has to opt into finding.

What I would do differently

I would push earlier and harder for a longitudinal research component — something that followed a small cohort of surgeons using the platform over several months rather than relying entirely on moderated sessions. The trust and calibration questions I was most interested in are inherently temporal; they require observing how a user's relationship with an AI system changes over time instead of how they respond to it on first contact. That kind of research takes longer to set up, but it produces the kind of insight that reshapes both UX decision-making and product strategy.

What this project demonstrates about how I work

I'm most useful on problems where the design question isn't yet fully formed. The challenge here wasn't "design a score display" — it was "figure out what a trustworthy AI feedback system for expert users actually requires, and build the structural logic for it." That kind of work starts in research, moves through strategy, and resolves in an IA that the rest of the team can build on. That's the kind of engagement I'm most interested in doing more of.

Designing the Interface for an AI Surgical Mentor