When we showed early RUQA to a friend who's an HR director, the first thing she asked was: "How do you stop people from gaming the system?" The second thing she asked was: "When do you start using the scores?"
The second question is more important than the first.
Trust in evaluation systems is asymmetric. It's slow to build, fast to destroy. A team that finds out — six months in — that the AI scores were "actually being used" the whole time will never trust the system again. And worse, neither will the next team they work with.
So we wrote the calibration period into the product instead of into a policy doc.
For the first 90 days a workspace exists, every score RUQA computes is shown to the user, shown to their manager, and explicitly excluded from any export, any roll-up, any compensation-adjacent surface. Even our own analytics. The number exists, the user sees it improving as RUQA learns their voice, but it has no weight.
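To make that rule concrete, here is a minimal sketch in TypeScript. The surface names and the isScoreVisible helper are illustrative assumptions, not RUQA's actual API; the point is simply that during calibration a score reaches exactly two surfaces and nothing else.

```typescript
// Surfaces a computed score could appear on. Names are hypothetical.
type Surface = "user_dashboard" | "manager_view" | "export" | "rollup" | "analytics";

// During calibration a score is visible only to the user and their manager.
// Exports, roll-ups, and analytics see nothing until calibration ends.
function isScoreVisible(surface: Surface, inCalibration: boolean): boolean {
  if (!inCalibration) return true;
  return surface === "user_dashboard" || surface === "manager_view";
}

isScoreVisible("export", true);          // false: nothing leaves the workspace
isScoreVisible("manager_view", true);    // true: the manager still sees it
isScoreVisible("export", false);         // true: after calibration expires
```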
This is unusual for two reasons. First, most enterprise software does the opposite — it starts collecting data immediately and lets you "configure" what's used. Second, it costs us money: we provide the full product for 90 days knowing some teams will use it as a free trial and leave.
Both of those are fine. The reason: a team that walks away after 90 days having seen the algorithm in action is a better long-term outcome than a team that adopts under coercion and games the system for the next two years.
The implementation is small but specific. There's a calibration_until field on every workspace. While that date is in the future, the API excludes scores from every export and the UI shows a banner. After it passes, the data stays, including the 90 days of "calibration" data, which becomes an honest record of what the team's work looked like before the scores carried any weight.
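A sketch of what that gate might look like, again with hypothetical names rather than RUQA's actual code; only the calibration_until field and the 90-day window come from the description above.

```typescript
// Illustrative workspace record. calibration_until is set once at creation.
interface Workspace {
  id: string;
  createdAt: Date;
  calibrationUntil: Date; // createdAt + 90 days
}

const CALIBRATION_DAYS = 90;

function newWorkspace(id: string, now: Date = new Date()): Workspace {
  const until = new Date(now.getTime() + CALIBRATION_DAYS * 24 * 60 * 60 * 1000);
  return { id, createdAt: now, calibrationUntil: until };
}

// Single source of truth for "are we still calibrating?"
function inCalibration(ws: Workspace, now: Date = new Date()): boolean {
  return now.getTime() < ws.calibrationUntil.getTime();
}

// Exports return nothing while calibration is active. Nothing is deleted at
// the boundary: the same rows simply start shipping once the date passes.
function exportScores<T>(ws: Workspace, scores: T[], now: Date = new Date()): T[] {
  return inCalibration(ws, now) ? [] : scores;
}

// The UI banner is keyed off the same field, so the two can never drift apart.
function showCalibrationBanner(ws: Workspace, now: Date = new Date()): boolean {
  return inCalibration(ws, now);
}
```

Keeping the export filter and the banner on one field is the design choice that matters here: there is no separate flag someone can flip to start "using" the scores early without the banner disappearing too.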
That's the only honest baseline anyone has. We don't want to lose it.