Why Focus Scores Are Misleading (And What to Track Instead)

A number that tells you how focused you were sounds useful. It is usually not.

The focus score — popularized by RescueTime and adopted by several other productivity apps — has a compelling surface logic: track which applications you use, classify them as productive or unproductive, and produce a daily number that rises when you use “productive” apps and falls when you do not.

The problem is not with the measurement intent. It is with what is actually being measured.

What Focus Scores Actually Measure

RescueTime, Timing, and similar tools work by categorizing applications. Chrome might be “neutral” by default; your writing app might be “very productive.” The score is a weighted average of how much time you spent in each category.

But application categories do not map onto cognitive engagement. Consider what gets categorized identically in these systems:

Reading a primary research paper in Chrome and reading political commentary for 45 minutes
Writing a first draft in Google Docs and fixing typos in a document you wrote three weeks ago
Reviewing code in VS Code and making trivial formatting changes

The application is the same. The cognitive depth is completely different. The focus score cannot see the difference.
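A minimal sketch of that blind spot, assuming a simple time-weighted average. The category names, weights, and app-to-category mapping below are hypothetical illustrations, not RescueTime's or Timing's actual values or formula:

```python
# Hypothetical category weights on a 0-100 scale. Real apps define their own
# values, and users can usually adjust them.
CATEGORY_WEIGHTS = {
    "very_productive": 100,
    "neutral": 50,
    "very_distracting": 0,
}

# Hypothetical mapping from application name to category.
APP_CATEGORIES = {
    "Google Docs": "very_productive",
    "VS Code": "very_productive",
    "Chrome": "neutral",
}


def focus_score(minutes_by_app):
    """Time-weighted average of category weights over all logged minutes."""
    total_minutes = sum(minutes_by_app.values())
    if total_minutes == 0:
        return 0.0
    weighted_sum = sum(
        CATEGORY_WEIGHTS[APP_CATEGORIES[app]] * minutes
        for app, minutes in minutes_by_app.items()
    )
    return weighted_sum / total_minutes


# Two very different days. The tracker only sees application names and
# durations, so both days reduce to exactly the same input.
day_first_draft_and_research_paper = {"Google Docs": 120, "Chrome": 45}
day_typo_fixes_and_commentary = {"Google Docs": 120, "Chrome": 45}

score_deep = focus_score(day_first_draft_and_research_paper)
score_shallow = focus_score(day_typo_fixes_and_commentary)
print(round(score_deep, 1), round(score_shallow, 1))
```

Nothing about what happened inside each application reaches the function, so the two days cannot score differently no matter how the weights are tuned.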

This is not a secret. RescueTime acknowledges in its own documentation that its productivity score reflects user-defined category preferences and is meant to be customized, not treated as an objective measure of cognitive performance. The marketing around the score, however, tends to imply an authority and precision that the underlying methodology does not support.

The Goodhart’s Law Problem

Even if app-category scoring were a reasonable proxy for focus, there is a second problem: using a score as a target.

The economist Charles Goodhart articulated this in the context of monetary policy: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” The popular paraphrase, usually credited to the anthropologist Marilyn Strathern, is blunter: when a measure becomes a target, it ceases to be a good measure.

Applied to focus scores, this plays out predictably. Once you know RescueTime is running and your score will reflect your application usage, you start making choices that improve the score rather than choices that improve your work. You keep your writing app open during low-quality sessions instead of closing your laptop and taking a thinking walk. You avoid switching to a browser for research because it will lower your score, even when the research is exactly what the work requires.

The measure you designed to understand your behavior ends up distorting your behavior. The score goes up; the quality of your work is unchanged or worse.

The Single-Number Problem

Focus scores carry a third flaw that is more fundamental: they collapse a multidimensional phenomenon into a single number.

Focus is not one thing. It has at least three distinct dimensions that behave independently:

Volume: How many hours of deep work did you actually do?

Session integrity: When you scheduled deep work, did those sessions run to completion, or did they get interrupted and abandoned?

In-session attention quality: During your sessions, how fragmented was your attention?

A person with excellent in-session attention but a schedule consumed by meetings will have low volume. A person with good volume and good in-session attention but an office environment full of interruptions will have low session integrity. A person with good volume and high session completion rate but an anxious relationship with difficult work will have high distraction counts within sessions.

These three profiles require completely different interventions. A single focus score hides which problem you actually have.
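As a rough sketch, the three dimensions can be computed from a hand-kept session log. The `Session` fields and the `summarize` helper are hypothetical names chosen for illustration, and the sample log is made up:

```python
from dataclasses import dataclass


@dataclass
class Session:
    actual_minutes: int   # how long the deep work session really lasted
    completed: bool       # did it run to its planned end?
    distractions: int     # tally of attention switches during the session


def summarize(sessions, days):
    """Return the three focus signals for a log covering `days` days."""
    total_hours = sum(s.actual_minutes for s in sessions) / 60
    return {
        # Volume: average deep hours per day.
        "deep_hours_per_day": total_hours / days if days else 0.0,
        # Session integrity: share of sessions that ran to completion.
        "completion_rate": (
            sum(s.completed for s in sessions) / len(sessions) if sessions else 0.0
        ),
        # In-session attention quality: distractions per hour of deep work.
        "distractions_per_hour": (
            sum(s.distractions for s in sessions) / total_hours if total_hours else 0.0
        ),
    }


# Made-up log covering two days.
log = [
    Session(actual_minutes=90, completed=True, distractions=3),
    Session(actual_minutes=30, completed=False, distractions=1),
    Session(actual_minutes=60, completed=True, distractions=2),
]
summary = summarize(log, days=2)
print(summary)
```

Keeping the three values separate, rather than blending them, is the point: each one can move without the others.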

Gloria Mark’s research on interruptions and attention recovery illustrates why in-session fragmentation specifically matters: each significant context switch carries a recovery cost measured in tens of minutes, not seconds. A session with eight distraction switches per hour — even if it is logged in a “productive” application — is doing far less cognitive work than a session with two.

The Plausible Defense of Focus Scores

It is worth taking the counterargument seriously.

The case for app-based focus scores is that they are zero-friction and complete. You do not have to remember to log anything. The data captures all your computer time, not just the sessions you choose to record. And for users who genuinely do not know how their time is distributed, even an imperfect measure can be a useful wake-up call.

All of that is true. App-based tracking has real value for time auditing — understanding your macro time distribution across different types of work. If you are surprised to learn you spend three hours a day in email when you thought it was one hour, that is useful information even if the “focus score” attached to it is noisy.

The problem is not the data collection. It is the single number synthesized from it, and the implicit claim that this number reflects how well you focused rather than just how you distributed your application usage.

What Three Numbers Tell You That One Cannot

Consider two knowledge workers with identical focus scores of 72 (on a 100-point scale):

Worker A: 3.5 deep hours per day, 85% session completion rate, 4.2 distractions per hour. This person has strong volume and well-protected sessions. Their in-session distraction rate is somewhat high, possibly because their work generates many small uncertainties that prompt checking behaviors. The intervention is targeted: reduce the conditions that trigger self-interruption within sessions.

Worker B: 1.2 deep hours per day, 48% session completion rate, 1.8 distractions per hour. This person has excellent in-session attention — when they get into deep work, they are genuinely focused. But sessions keep getting cut short, and the total volume of deep work is low. The intervention is structural: protect session time from interruption, and increase the number of deep work blocks scheduled.

Identical focus score. Completely different performance profiles. Completely different interventions.
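The comparison can be sketched as a small diagnostic. The thresholds below are hypothetical, picked only so that the two example workers land in different buckets. They are not research-backed cutoffs:

```python
# Hypothetical thresholds for illustration only.
MIN_DEEP_HOURS_PER_DAY = 3.0
MIN_COMPLETION_RATE = 0.75
MAX_DISTRACTIONS_PER_HOUR = 3.0


def diagnose(deep_hours, completion_rate, distractions_per_hour):
    """Return the list of problem areas suggested by the three signals."""
    issues = []
    if deep_hours < MIN_DEEP_HOURS_PER_DAY:
        issues.append("low volume: schedule more deep work blocks")
    if completion_rate < MIN_COMPLETION_RATE:
        issues.append("low session integrity: protect sessions from interruption")
    if distractions_per_hour > MAX_DISTRACTIONS_PER_HOUR:
        issues.append("fragmented attention: reduce triggers for self-interruption")
    return issues


# Both workers share a focus score of 72, which plays no part in the diagnosis.
worker_a = diagnose(deep_hours=3.5, completion_rate=0.85, distractions_per_hour=4.2)
worker_b = diagnose(deep_hours=1.2, completion_rate=0.48, distractions_per_hour=1.8)

print("Worker A:", worker_a)
print("Worker B:", worker_b)
```

Under these assumed cutoffs, Worker A is flagged only for in-session fragmentation, while Worker B is flagged for volume and session integrity. The shared score of 72 distinguishes neither.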

The three-metric approach — deep hours, session completion rate, distraction count per hour — does not require any special software. It requires honesty and five minutes per day.

How This Changes What You Do This Week

If you currently use a focus score as your primary productivity signal, the most useful shift is not to abandon the app but to stop treating the score as a measure of cognitive performance.

Use app-based tracking for what it does well: understanding where your hours go in aggregate. Is more time going to email than you realized? Is your calendar fragmented with short meetings in a way that leaves no room for sustained work? App data can answer those questions.

For focus quality, replace the score with three logged signals you collect yourself: a distraction tally during sessions, a completion note at the end of each session, and a rough hour count of genuine deep work. These three numbers, reviewed with AI at the end of each week, will tell you more than a daily focus score ever could.

The first step is to log just one of the three metrics — pick distraction count — for every deep work session you do this week.

Frequently Asked Questions

Why are focus scores from productivity apps inaccurate?

Focus scores from apps like RescueTime classify time by application category rather than by actual cognitive engagement. They cannot distinguish reading a research paper from browsing entertainment within the same browser, making the score a proxy for application preferences rather than attention quality.

What is Goodhart's Law and why does it apply to focus tracking?

Goodhart's Law states that when a measure becomes a target, it ceases to be a good measure. Applied to focus scores, once you start optimizing your RescueTime score, you make choices that improve the metric without necessarily improving your actual cognitive output.

What should I track instead of a focus score?

Track three distinct signals: deep hours per day (volume), session completion rate (environmental integrity), and distraction count per hour (in-session attention quality). Three dimensions produce more actionable information than any composite score.

Does having a high focus score mean you did good work?

Not necessarily. A high focus score means you spent time in applications categorized as productive. Whether the work you did in those applications was cognitively demanding, creative, or output-generating is invisible to the score.