How a 60-Person SaaS Company Fixed Its OKR Rollout: A Case Study in Getting the Framework Right

A detailed case study following one B2B SaaS company through a failed OKR launch, a structured reset, and the specific changes that made the second cycle produce real alignment.

Fieldnote is a composite case, constructed from patterns that appear repeatedly in documented OKR rollouts. The company, team names, and specific metrics are illustrative — but the failure modes, reset steps, and outcomes reflect what actually happens when organizations course-correct from a broken OKR implementation.


The Setup: A Team That Adopted OKRs for the Wrong Reason

Fieldnote was a 60-person B2B SaaS company that had grown from seed-stage scrappiness to a business with real enterprise customers, a sales team, and a product organization starting to struggle with coordination.

The head of product, call her Nadia, had read Measure What Matters over a long weekend. She pushed for OKRs as the solution to a problem she could feel but hadn’t fully diagnosed: the product and engineering teams were working hard but seemed to be optimizing for different things. Engineering cared about tech debt and uptime. Product cared about feature velocity. Sales cared about three specific features enterprise customers kept asking for. Nobody was wrong, but the vectors weren’t adding up.

Leadership agreed to try OKRs. They announced the rollout two weeks before Q2 began.


Cycle 1: What Went Wrong

The first cycle produced a company-level OKR document that was technically well-formatted. It had seven Objectives and 21 Key Results. Most of the Key Results looked like this:

  • Complete migration to new infrastructure
  • Hire two senior engineers
  • Launch SCIM provisioning for enterprise customers
  • Update the onboarding documentation
  • Conduct three customer advisory board sessions

Every item on the list was reasonable. None of them were Key Results in the sense Grove and Doerr intended. They were project milestones — activities the team planned to do, formatted as outcomes.

The weekly check-in cadence, which had been proposed as a 30-minute Friday slot, quietly disappeared by week 3. There was nothing meaningful to discuss. Either you had done the thing or you hadn’t.

At the quarter’s end, the team scored 14 out of 21 Key Results as complete. The retrospective lasted 20 minutes. The consensus was that OKRs seemed useful in theory but generated a lot of overhead for results you could have gotten from a project tracker.


The Reset: Three Structural Changes

Nadia recognized the failure mode after a closer reading of Grove’s High Output Management. The team agreed to reset before Q3 with three specific changes.

Change 1: Outcome-only Key Results

The team spent a full afternoon converting their Q2 activity-based Key Results to outcome-based ones. The exercise was revealing.

“Complete migration to new infrastructure” became “Reduce average API response time from 340ms to under 150ms.” “Launch SCIM provisioning” became “Enable five enterprise customers to provision accounts without manual IT involvement, reducing average time-to-provisioned from 12 days to under 3.” “Conduct three customer advisory board sessions” became “Identify and validate two high-priority enterprise feature gaps, resulting in at least one added to the H2 roadmap.”

These rewrites forced the team to articulate what the activity was supposed to accomplish — something they hadn’t explicitly discussed before.
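
The shape of a converted Key Result can be made concrete in code. Below is a minimal sketch in Python of the first rewrite above, expressed as data; the field names are hypothetical and the objective text is invented for illustration.

  from dataclasses import dataclass

  @dataclass
  class KeyResult:
      """An outcome-based Key Result: a metric, a baseline, and a target."""
      objective: str   # which Objective this KR supports
      metric: str      # what changes in the world, not what gets done
      baseline: float  # where the metric stands at cycle start
      target: float    # where it must stand at cycle end
      unit: str        # e.g. "ms", "days", "customers"

  # Fieldnote's first rewrite, expressed as data. The activity
  # ("complete the migration") disappears; only the outcome it
  # was supposed to produce remains.
  api_latency = KeyResult(
      objective="Make the platform enterprise-ready",  # hypothetical wording
      metric="average API response time",
      baseline=340.0,
      target=150.0,
      unit="ms",
  )

  # Litmus test: if you cannot fill in baseline and target,
  # you have written a task, not a Key Result.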

Change 2: Explicit Committed vs. Aspirational Labeling

The team categorized every Key Result before the cycle began. Infrastructure reliability targets and the enterprise provisioning KR were labeled Committed. The revenue pipeline KR and new-segment acquisition KR were labeled Aspirational.

This single change transformed the retrospective conversation. When a Committed KR scored 0.8 at the end of Q3, the team could discuss what went wrong without defensiveness — because the label clarified that 0.8 on a Committed KR actually did mean something had gone wrong. When an Aspirational KR scored 0.65, the conversation was about what drove the result and what to do differently next cycle — not about who had underperformed.
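
The norm behind those reactions follows the convention Doerr documents: a Committed KR is expected to land at 1.0, while Aspirational KRs are calibrated so roughly 0.7 is a healthy result. A minimal sketch of that interpretation logic follows; the exact thresholds are illustrative, not Fieldnote’s written policy.

  def interpret_score(label: str, score: float) -> str:
      """Translate a 0.0-1.0 KR score into what it signals under each label."""
      if label == "committed":
          # Committed KRs are expected to hit 1.0; a miss is a real problem.
          return "on track" if score >= 0.95 else "something went wrong; find the root cause"
      if label == "aspirational":
          # Aspirational KRs are calibrated so ~0.7 is a healthy result.
          return "healthy stretch result" if score >= 0.6 else "examine what drove the shortfall"
      raise ValueError(f"unknown label: {label!r}")

  # The two retrospective cases described above:
  print(interpret_score("committed", 0.8))      # something went wrong; ...
  print(interpret_score("aspirational", 0.65))  # healthy stretch result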

Change 3: Weekly Check-In With a Structured Agenda

The Friday check-in was reinstated with a three-question format for each Key Result:

  1. Current metric value
  2. Confidence score (1–10: will this KR hit its target on the current trajectory?)
  3. Top blocker

The third question changed the energy in the room. By week 4 of Q3, the product team had surfaced that the provisioning KR was blocked by a dependency on an enterprise customer’s IT procurement process, which nothing in Fieldnote’s engineering org controlled. Surfacing that dependency let the team redefine the Key Result’s success criteria around what Fieldnote could actually influence.
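
The three questions map onto a simple record, which is what makes the check-ins easy to aggregate later. A minimal sketch, with hypothetical field names and an illustrative confidence threshold:

  from dataclasses import dataclass

  @dataclass
  class CheckIn:
      """One Friday check-in entry: the three questions, as data."""
      kr: str               # which Key Result
      team: str
      current_value: float  # question 1: current metric value
      confidence: int       # question 2: 1-10, hit target at this trajectory?
      blocker: str          # question 3: top blocker ("" if none)

  def at_risk(entries: list[CheckIn], threshold: int = 5) -> list[CheckIn]:
      """Surface the KRs whose owners doubt the current trajectory."""
      return sorted(
          (e for e in entries if e.confidence <= threshold),
          key=lambda e: e.confidence,
      )

  # Week 4 of Q3, roughly as described above: the provisioning KR
  # carries a blocker engineering does not control, which is exactly
  # the information the check-in exists to move.
  entries = [
      CheckIn("API latency under 150ms", "platform", 210.0, 8, ""),
      CheckIn("5 customers self-provision", "product", 1.0, 3,
              "enterprise customer's IT procurement process"),
  ]
  for entry in at_risk(entries):
      print(entry.kr, entry.confidence, entry.blocker)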


What Cycle 2 Produced

The Q3 retrospective looked nothing like Q2’s.

The company-level OKR set had been reduced from seven Objectives to four. Key Results numbered 14 instead of 21. The scoring conversation took 90 minutes because there was actually something to discuss.

The Committed KRs averaged 0.92. The Aspirational KRs averaged 0.61. On its face, 0.61 looks worse than Q2’s 0.67 overall average. But the leadership team recognized that the Q3 Aspirational KRs were genuinely harder — they were measuring real outcomes rather than activity completion. A 0.61 on “Grow net new ARR in the $100K+ deal segment from $0 to $280K” was more meaningful information than a 1.0 on “Conduct three customer advisory board sessions.”

Sales had created a pipeline, closed one $150K deal, and had two more in late stages. The score was 0.54 — and the team had a clear picture of exactly where the deals were and what would have needed to happen differently to hit 0.7.


The Insight That Changed the Organizational Dynamic

The most consequential change wasn’t the Key Result rewriting or the labeling — it was a meta-change that happened in week 5 of Q3.

The CEO, who had been supportive of OKRs in the abstract, started attending the weekly check-ins. Not to supervise, but because she found the blocker conversation useful. Within two weeks, she had surfaced that the engineering team was blocked on a dependency she could resolve directly with a vendor.

That single unblocking — which would never have surfaced through the previous reporting structure — saved approximately three weeks of engineering delay.

The team began to understand that the OKR check-in wasn’t a reporting mechanism. It was a coordination mechanism. The check-in’s job was to move information about blockers to people who could remove them, faster than the normal organizational hierarchy allowed.

That insight — which is explicit in Grove’s original formulation — had been completely absent from the first cycle because the first cycle had nothing worth reporting on.


How AI Tools Fit Into This Story

By Q4, Nadia’s team had started using AI assistance in two places in the OKR workflow.

The first was the Key Result quality check. Before publishing the Q4 OKR set, she ran the draft Key Results through a structured AI prompt: “Review these Key Results. Identify any that describe activities rather than outcomes. Rewrite the activity-based ones as measurable outcomes, and flag any that lack a clear baseline.” The output wasn’t perfect — some rewrites missed important context — but it systematically caught the same category of writing mistake that had derailed Q2.

The second was the weekly check-in prep. Each Friday morning, the team lead for each function submitted a brief update, and Nadia used an AI assistant to synthesize across all four teams: “Identify the Key Results with the lowest confidence scores and summarize the stated blockers. What patterns appear across multiple teams?” That synthesis, which previously would have taken 20 minutes to compile manually, took 3 minutes and consistently surfaced cross-team patterns that would otherwise have been invisible.
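
A minimal sketch of that synthesis step, assuming an OpenAI-style chat-completions client; the model name and the plain-text hand-off of the check-in updates are assumptions, not a record of Fieldnote’s actual tooling.

  from openai import OpenAI  # assumes the openai Python package is installed

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  SYNTHESIS_PROMPT = (
      "Identify the Key Results with the lowest confidence scores and "
      "summarize the stated blockers. What patterns appear across "
      "multiple teams?"
  )

  def synthesize_checkins(checkin_text: str) -> str:
      """Run the Friday synthesis prompt over the week's check-in updates."""
      response = client.chat.completions.create(
          model="gpt-4o",  # illustrative model choice
          messages=[
              {"role": "system",
               "content": "You review weekly OKR check-ins across teams."},
              {"role": "user",
               "content": SYNTHESIS_PROMPT + "\n\n" + checkin_text},
          ],
      )
      return response.choices[0].message.content

The Key Result quality check from the first workflow fits the same helper; only the prompt text changes.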

Beyond Time’s OKR module makes the second workflow even more direct — the platform aggregates confidence scores across a team’s Key Results weekly, flags declining trends before they become end-of-quarter surprises, and connects OKR status to the weekly planning calendar. The connection between “KR is at risk” and “what is scheduled this week to address it” is where most OKR implementations have a gap.


What “Working OKRs” Actually Looks Like

By the end of Q4, Fieldnote’s OKR implementation didn’t look dramatically different from the outside. There was still a document, still quarterly planning, still weekly reviews.

What had changed was the conversation quality. When teams set goals, they could articulate why the specific numbers mattered — not just that they’d been set in a planning session. When the weekly check-in surfaced a problem, the team had a clear path from “blocked” to “unblocked” that usually involved someone specific making a decision.

The organization had stopped doing OKRs and started using them.

The difference, as Grove argued 40 years ago, is whether the framework changes what decisions get made — or just creates a documentation layer around the decisions that would have been made anyway.


Where to Start If Your First Cycle Failed

A failed first cycle is not a reason to abandon OKRs. It is usually a reason to run a more honest retrospective about which of the common failure modes actually applied.

Before your next planning cycle, audit your previous OKRs against three questions:

  1. Are the Key Results measuring outcomes or activities?
  2. Did you explicitly distinguish Committed from Aspirational goals?
  3. Did your weekly check-in produce any actual unblocking decisions?

If your Key Results were measuring activities and the answer to the other two questions is no, your next cycle has a clear starting point. Run the rewrite exercise Fieldnote used, converting planned activities into measurable outcomes, and attach a Committed or Aspirational label to every Key Result.

That is enough structure to make the second cycle meaningfully different from the first.


Tags: OKR case study, OKR implementation, how to fix OKRs, objectives and key results, goal setting, SaaS planning

Frequently Asked Questions

  • What is the most common reason a first OKR cycle fails?

    Most first cycles fail because teams write activity-based Key Results instead of outcome-based ones, and because the scoring norm is unclear — teams don't know whether they're being graded against a 70% aspirational standard or a 100% committed standard.
  • How long does it take to get OKRs working well?

    Most practitioners report that OKRs begin producing meaningful alignment by the second or third cycle, assuming the framework is implemented with proper scoring norms, weekly check-ins, and a genuine separation from performance reviews.
  • Should the CEO set OKRs too?

    Yes. Doerr's documentation of Google's implementation shows that leadership OKR transparency is one of the framework's most powerful alignment mechanisms. If senior leadership exempts itself from OKRs, teams interpret that as a signal that OKRs are for accountability, not direction.