Meta | Reality Labs

Defining Reliability in Simulation Tools

How understanding the developer view of reliability shaped roadmap and product prioritization decisions


Context

Prior research surfaced reliability as a recurring pain point for developers who used simulation and debugging tools. While stakeholders broadly agreed that reliability was a problem, discussions consistently stalled on a basic question: what does “reliable” actually mean to developers?

Without a shared definition, teams struggled to evaluate whether existing improvements addressed real developer needs or supported future product direction. As a result, insights often remained abstract and disconnected from decisions about where to invest or what to deprioritize.

The work involved global collaboration across teams in the US and Zurich, spanning multiple technical perspectives and product surfaces within Scene API, a mixed-reality framework for capturing background data.


Research Questions

The research set out to answer two questions:

1. How do developers define reliability when using simulation and debugging tools?
2. How should that developer-defined understanding of reliability guide product direction and prioritization for Scene API?

Without answering the first question, teams lacked a shared language for discussing reliability. Without answering the second, insights risked being disconnected from real product decisions.


Leadership

I was brought in as the owner to define the problem and lead the work end to end. I owned research framing, study design, synthesis, and cross-team alignment. I partnered closely with stakeholders across time zones, aligning on scope, interpreting findings collaboratively, and translating insights into clear implications for product direction.

The goal was not to validate individual features, but to establish a shared, developer-informed understanding of reliability that teams could use to make decisions with confidence.


Findings

Research showed that developers did not experience reliability as a single system attribute, but as a judgment shaped by trust, consistency, and transparency. The value of these dimensions was not only in defining reliability, but in revealing what happened when reliability was questioned.

  • Using clarity to mitigate risk
    When system limits were unclear, developers overestimated where the tool could be trusted. Silent failures and edge cases were more damaging to confidence than visible breakdowns, particularly when developers assumed coverage that did not exist. In these cases, reliability broke down not because the system failed outright, but because its boundaries were difficult to interpret.
  • Why consistency mattered
    Developers struggled to trust a system that behaved reasonably but unpredictably over time. Even when behavior was directionally useful, inconsistency made it difficult to form stable expectations and introduced uncertainty in workflows where predictability mattered more than raw capability. This variability reduced confidence in moments when developers needed to rely on the tool.
  • How transparency shaped expectations
    Reliability was most fragile when there was a gap between how developers expected the system to behave and what they observed in practice. Speed gains without confidence increased the risk of misinterpretation and rework, and technically acceptable behavior was often discounted when it conflicted with expectations. In these moments, transparency mattered less as a feature and more as a condition for trust.

Across these dimensions, reliability functioned less as a quality to optimize and more as a decision condition. Developers did not treat reliability as an abstract attribute, but as a signal of whether they could act. Traditional metrics were insufficient to capture this judgment, particularly when decisions were ambiguous or costly.

These findings reframed reliability from a system property to measure into a risk to manage, rooted in how developers judged whether they could act on what they were seeing.


Impact

Since reliability is experienced as a judgment rather than a metric, clarity is critical in AI-driven tools like Scene API. By establishing a shared, developer-informed definition of reliability, the work gave teams a common language for evaluating tradeoffs and making decisions with greater confidence. Roadmap discussions shifted from abstract debates about improvement to concrete questions about whether changes would meaningfully increase developer trust.

This reframing clarified where team investment would have the greatest impact and where it would not, enabling confident prioritization and safer deprioritization.
