Self-initiated · Research-through-design

Driftline

Does how we appraise our screen time predict wellbeing better than how much we use?

Role
Sole researcher & designer
Methods
Experience sampling · Multilevel modeling (R)
Timeline
June 2026
Status
Designed · pre-registered · fielding pending
Experience sampling · R / lme4 · Pre-registration (OSF) · Multilevel mediation · React Native · Research-through-design

Honest status: this is a designed and analytically validated project, not a fielded one. The interaction is prototyped, the study is pre-registered, and the analysis pipeline is built and validated on simulated data with known parameters. No participants have been run yet, so there are no empirical findings about real users. The numbers below are method validation, not results.

01

The wrong number

Digital-wellbeing tools almost all put one number in front of you: hours and minutes of use. I started from a measurement problem with that number.

It is poorly measured: a 2021 meta-analysis (Parry et al., Nature Human Behaviour) found that self-reported digital media use correlates only modestly with logged use, and the gap is worst for exactly the "problematic" use these apps care about. And even when measured well, it barely predicts anything: Orben & Przybylski (2019), analyzing data from more than 350,000 adolescents, found that technology use explains at most about 0.4% of the variance in wellbeing.

0.4%
of wellbeing variance explained by screen time (Orben & Przybylski, 2019)
350k+
people across the datasets behind that estimate
self-reported use vs. device logs rarely match (Parry et al., 2021)
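For a sense of scale, a share of variance explained can be converted into a correlation by taking its square root. This is a back-of-envelope conversion that assumes the 0.4% is read as an R² for a single association; it is not a figure quoted in this form from the original paper.

```latex
R^2 = 0.004 \quad\Rightarrow\quad |r| = \sqrt{R^2} = \sqrt{0.004} \approx 0.063
```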

The field is built on a metric that is both noisy and nearly non-predictive — while the variable that should matter, how a person appraises their own use, sits almost entirely unmeasured.

This continues a thread that runs through my research: subjective appraisal tends to predict psychological outcomes more strongly than objective behavioral measures — the same structure as fear-of-crime research, where perceived risk predicts fear far better than actual victimization risk. Driftline applies that lens to digital life.

02

Why the existing apps fall short

I ran a competitive analysis of the intentionality-app category (One Sec, ClearSpace, Opal, ScreenZen, Intently, and others). Two patterns repeat: they measure quantity (the noisy, non-predictive number), and they hand the work back to the user through timers, blocks, and willpower — treating the problem as a discipline failure rather than a product engineered to capture attention.

Importantly, the "pause before you open the app" mechanic is already well-established and even has a published study behind it, so I deliberately did not claim that as a contribution. The real gap is narrower: no app in the category treats appraisal as longitudinal data, pairs an intentionality read with an affect read, or closes the loop into a measurable research instrument.

03

The concept & design

Driftline replaces the minutes dashboard with two objects. First, an appraisal moment — a two-tap, experience-sampling check-in that captures a session on two axes:

Intentionality
drifted in ↔ chose it
Did you mean to be there, or did the session just happen to you?
Affect
depleted ↔ restored
A continuous slider, not a moral verdict — so the instrument doesn't manufacture the dichotomy it measures.

Second, the mirror — instead of reporting volume, it reflects the shape of your sessions back to you. Each session is a point in a 2×2 field, and two perceptual encodings carry the meaning without a legend: warmth = affect (restored sessions glow warm, depleting ones go cold) and blur = drift (sessions you drifted into render literally out of focus; chosen ones are sharp).
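The mirror's two encodings can be sketched as a pure function from a session's two appraisals to its visual channels. This is a minimal sketch in Python (the prototype itself is React), assuming both axes are stored on a hypothetical -1 to +1 scale and mapped linearly; `Session`, `encode`, and `MAX_BLUR_PX` are illustrative names, not the prototype's code.

```python
from dataclasses import dataclass

MAX_BLUR_PX = 8.0  # hypothetical maximum blur for a fully drifted-in session

@dataclass(frozen=True)
class Session:
    intentionality: float  # -1 = drifted in, +1 = chose it (hypothetical scale)
    affect: float          # -1 = depleted, +1 = restored (hypothetical scale)

def clamp(value: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, value))

def encode(s: Session) -> dict:
    """Map one session to its mirror encoding: warmth from affect, blur from drift."""
    i, a = clamp(s.intentionality), clamp(s.affect)
    return {
        "warmth": (a + 1) / 2,                 # 0 = cold (depleted) .. 1 = warm (restored)
        "blur_px": MAX_BLUR_PX * (1 - i) / 2,  # drifted in = blurred, chosen = sharp
        "quadrant": ("chosen" if i >= 0 else "drifted")
                    + "/" + ("restored" if a >= 0 else "depleted"),
    }

late_night_work = Session(intentionality=0.9, affect=-0.8)
print(encode(late_night_work)["quadrant"])  # chosen/depleted
```

Keeping the mapping as a pure function means the same session always renders the same way, so the 2×2 position and the two perceptual channels never disagree.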

In the working prototype, which holds no participant data yet, the headline view illustrates an insight that runs counter to the usual story: the most depleting sessions aren't mindless scrolling but intentional late-night work, and some "drift" is genuinely restorative. A minutes-based tool can never surface that, because minutes don't know how a session felt or whether it was chosen.

The design intent is that the measurement itself is the intervention — a brief, non-judgmental moment of noticing — while the reflection surfaces something non-obvious. A functional interactive prototype of this loop is built (React); the production target is React Native / Expo.

04

The research design

The concept is only worth building if the underlying hypothesis holds, so I designed a study to test it rather than assert it: a within-person experience-sampling study with signal-contingent prompts ~4×/day for 14 days, plus baseline and exit measures. Because each participant serves as their own control across pings, stable between-person confounders (personality, baseline wellbeing, chronic use level) are held constant in the within-person estimates; time-varying confounders, such as fatigue or time of day, are not, and have to be measured or modeled.
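Signal-contingent timing is often implemented as stratified random sampling: split the waking day into blocks and draw one prompt at a random time inside each. This is a minimal sketch assuming a hypothetical 09:00 to 21:00 window split into four 3-hour blocks; the window, block scheme, seed handling, and function names are illustrative, not taken from the pre-registration.

```python
import random
from datetime import date, datetime, time, timedelta

def daily_prompts(day: date, rng: random.Random, start_hour: int = 9,
                  end_hour: int = 21, n_prompts: int = 4) -> list[datetime]:
    """Draw one prompt at a uniformly random time inside each equal block of the window."""
    window_start = datetime.combine(day, time(start_hour))
    block = timedelta(hours=end_hour - start_hour) / n_prompts
    return [window_start + k * block
            + timedelta(seconds=rng.uniform(0, block.total_seconds()))
            for k in range(n_prompts)]

def study_schedule(first_day: date, n_days: int = 14, seed: int = 0) -> list[datetime]:
    """Full schedule for one participant; seeded so it can be regenerated exactly."""
    rng = random.Random(seed)
    return [p for d in range(n_days)
            for p in daily_prompts(first_day + timedelta(days=d), rng)]

schedule = study_schedule(date(2026, 6, 1))
print(len(schedule), schedule[0], schedule[-1])
```

Stratifying keeps prompts spread across the day while staying unpredictable to the participant; a real instrument would likely also enforce a minimum gap between adjacent prompts, which this sketch does not.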

I then wrote a full pre-registration using the field-standard ESM template (Kirtley et al., 2021), which locks the decisions most vulnerable to after-the-fact flexibility:

A subtle but critical decision: the appraisal of a session and the wellbeing outcome are measured as separate items with different referents, to avoid a circular model that predicts wellbeing from wellbeing.
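In a data schema, that separation means the session appraisal and the outcome are different fields answering different questions. This is a minimal sketch with hypothetical field names, item wording, and -1 to +1 scales; `PingResponse` and `model_row` are illustrative, not the study's actual instrument.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PingResponse:
    """One answered prompt. Field names, wording, and scales are hypothetical."""
    participant_id: str
    ping_index: int                # position in the 14-day x 4-ping schedule (0-55)
    # Referent: the most recent phone session (hypothetical wording: "Thinking about that session...")
    session_intentionality: float  # -1 = drifted in, +1 = chose it
    session_affect: float          # -1 = depleted, +1 = restored
    # Referent: the participant right now (hypothetical wording: "How do you feel at this moment?")
    current_mood: float            # the outcome, asked as its own item

    def __post_init__(self) -> None:
        for name in ("session_intentionality", "session_affect", "current_mood"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1, 1], got {value}")

def model_row(r: PingResponse) -> tuple[float, float]:
    """(predictor, outcome) pair: the outcome never reuses a session-appraisal field."""
    return r.session_affect, r.current_mood
```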

05

The analysis pipeline

Read this first: the numbers in this section come from synthetic data with effects I planted, run to prove the pre-registered models work and to size the real study. They are not findings about real people.

I built the full analysis pipeline in R (lme4 / lmerTest) and validated it by simulating a dataset to the pre-registration's exact shape (50 people × 14 days × 4 pings, ~65% compliance) with known true effects, then checking recovery. Every pre-registered estimate landed on its planted value within sampling error.
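The simulate-then-recover check can be sketched in a few dozen lines. This is Python rather than the project's R, and it uses a plain within-person (person-mean-centered OLS) estimator as a simplified stand-in for the lme4 model. `simulate_esm`, `within_slope`, the planted raw slopes (0.3 within, 0.7 between), the noise scales, and the missing-completely-at-random compliance model are all assumptions for illustration.

```python
import random
from collections import defaultdict

def simulate_esm(n_people=50, n_days=14, pings_per_day=4, compliance=0.65,
                 beta_within=0.3, beta_between=0.7, seed=1):
    """Simulate answered pings with planted within- and between-person slopes.

    x stands in for appraisal and y for mood. Each person gets a stable mean x
    (the between part) plus ping-to-ping deviations (the within part), and the
    two parts are given different true slopes.
    """
    rng = random.Random(seed)
    rows = []
    for person in range(n_people):
        person_mean_x = rng.gauss(0, 1)    # stable level of appraisal
        person_offset = rng.gauss(0, 0.5)  # stable level of mood (random intercept)
        for day in range(n_days):
            for ping in range(pings_per_day):
                if rng.random() >= compliance:
                    continue               # missed prompt, missing completely at random here
                dev_x = rng.gauss(0, 1)
                y = (person_offset + beta_between * person_mean_x
                     + beta_within * dev_x + rng.gauss(0, 1))
                rows.append({"person": person, "day": day, "ping": ping,
                             "x": person_mean_x + dev_x, "y": y})
    return rows

def within_slope(rows):
    """Pooled slope after person-mean centering x and y (within-person estimator)."""
    by_person = defaultdict(list)
    for r in rows:
        by_person[r["person"]].append((r["x"], r["y"]))
    sxy = sxx = 0.0
    for obs in by_person.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx

rows = simulate_esm()
print(len(rows), "answered pings; recovered within slope:", round(within_slope(rows), 2))
```

Centering removes each person's stable offset and mean appraisal, so the recovered slope targets only the planted within-person value.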

Recovered effect of appraisal on mood (simulated data)
Within-person (the causally relevant estimate): 0.29
Between-person: 0.67
Naive regression (ignores nesting): 0.49
Standardized coefficients. The naive estimate is a blend that equals neither, which is exactly why the multilevel decomposition is non-negotiable.
Validation result · 1
The centering decision changes the answer
Within-person 0.29, between-person 0.67, and naive regression 0.49, a blend of the two. Skip the multilevel step and the headline estimate overstates the within-person effect by about 70% (0.49 vs. 0.29).
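The centering contrast can be reproduced by hand on a toy dataset. In this hypothetical example (three people, three pings each; Python rather than R; plain OLS rather than lme4), the within-person slope is exactly 0.3 and the between-person slope exactly 0.7 by construction, and pooling all pings without regard to who they came from returns a blend of the two.

```python
from statistics import fmean

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical toy data: (appraisal x, mood y) for three pings per person.
data = {
    "A": ([-1, 0, 1], [-0.3, 0.0, 0.3]),
    "B": ([0, 1, 2], [0.4, 0.7, 1.0]),
    "C": ([1, 2, 3], [1.1, 1.4, 1.7]),
}

# Naive: pool every ping and ignore who it came from.
naive = slope([x for xs, _ in data.values() for x in xs],
              [y for _, ys in data.values() for y in ys])

# Within: person-mean center both variables, then pool the deviations.
cx, cy = [], []
for xs, ys in data.values():
    mx, my = fmean(xs), fmean(ys)
    cx += [x - mx for x in xs]
    cy += [y - my for y in ys]
within = slope(cx, cy)

# Between: one point per person (their mean x, their mean y).
between = slope([fmean(xs) for xs, _ in data.values()],
                [fmean(ys) for _, ys in data.values()])

print(f"within={within:.2f} between={between:.2f} naive={naive:.2f}")
# within=0.30 between=0.70 naive=0.50
```

In balanced data like this, the pooled slope is a weighted average of the within and between slopes, weighted by how much x varies at each level; here the two variances are equal, so the blend sits exactly halfway.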
Validation result · 2
The core hypothesis reproduces mechanically
Once appraisal enters the model, the effect of minutes collapses toward zero. The pipeline recovers "minutes barely predict once you know how it felt" from data where that pattern was planted; whether it holds in real data is what the study will test.
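The "minutes collapse once appraisal enters" logic is ordinary statistical adjustment, and a single-level sketch shows it. This is a hypothetical data-generating process (standardized, made-up minutes and appraisal variables; 5,000 independent sessions, so nesting is ignored), not the pre-registered multilevel mediation model: minutes are correlated with appraisal, but mood depends on appraisal alone.

```python
import random

def simple_slope(y, x):
    """OLS slope of y on a single predictor x."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def ols_two_predictors(y, x1, x2):
    """Slopes (b1, b2) of y on x1 and x2 jointly, via the centered normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    yc = [v - my for v in y]
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, yc))
    s2y = sum(v * w for v, w in zip(c2, yc))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(7)
n = 5000
appraisal = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical process: more minutes go with lower appraisal, but mood depends on appraisal only.
minutes = [-0.5 * a + rng.gauss(0, 1) for a in appraisal]
mood = [0.5 * a + rng.gauss(0, 1) for a in appraisal]

minutes_alone = simple_slope(mood, minutes)
minutes_adjusted, appraisal_adjusted = ols_two_predictors(mood, minutes, appraisal)
print(f"minutes alone: {minutes_alone:.2f}, minutes given appraisal: {minutes_adjusted:.2f}")
```

With only minutes in the model, minutes borrow appraisal's effect (a slope of about -0.2 in expectation under these assumptions); adjusting for appraisal drives the minutes slope to roughly zero.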
Validation result · 3
Power is counterintuitive — and it set the recruitment plan
A Monte Carlo simulation showed that the primary within-person test is well powered even at N≈15 (power comes mainly from occasions, not headcount), while between-person effects stay underpowered even at N=50. So: recruit for compliance and retention, not a large roster.
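Simulation-based power follows a simple loop: simulate many studies, test each one, count rejections. This is a minimal sketch in Python, assuming random intercepts only (no person-varying slopes), a hypothetical within-person slope of 0.3 in standardized units, 56 scheduled pings at 65% compliance, and a z approximation instead of lmerTest's test; `within_z` and `power` are illustrative names, not the project's pipeline.

```python
import math
import random

def within_z(n_people: int, beta_within: float, rng: random.Random,
             pings: int = 56, compliance: float = 0.65) -> float:
    """Simulate one study; return the z statistic for the pooled within-person slope."""
    pairs = []   # person-mean-centered (dx, dy) for every answered ping
    groups = 0
    for _person in range(n_people):
        k = sum(rng.random() < compliance for _ in range(pings))  # answered pings
        if k < 2:
            continue  # nothing to center
        # Stable person offsets would be removed by centering anyway, so they are omitted.
        xs = [rng.gauss(0, 1) for _ in range(k)]
        ys = [beta_within * x + rng.gauss(0, 1) for x in xs]
        mx, my = sum(xs) / k, sum(ys) / k
        pairs += [(x - mx, y - my) for x, y in zip(xs, ys)]
        groups += 1
    sxx = sum(dx * dx for dx, _ in pairs)
    b = sum(dx * dy for dx, dy in pairs) / sxx
    ssr = sum((dy - b * dx) ** 2 for dx, dy in pairs)
    df = len(pairs) - groups - 1  # one mean per person plus the slope
    return b / math.sqrt(ssr / df / sxx)

def power(n_people: int, beta_within: float, reps: int = 200, seed: int = 42) -> float:
    """Share of simulated studies whose two-sided z test rejects at alpha = .05."""
    rng = random.Random(seed)
    hits = sum(abs(within_z(n_people, beta_within, rng)) > 1.96 for _ in range(reps))
    return hits / reps

for n in (5, 15, 50):
    print(n, power(n, beta_within=0.3))
```

Because every answered ping adds information to the within-person slope, power climbs quickly with occasions even when N is small. If true slopes vary across people, headcount matters more, so this sketch is an optimistic bound rather than a substitute for the pre-registered simulation.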
Figure: Pipeline validation, full output. Four panels (centering contrast, parameter recovery, within-person relationship, power curves) showing recovery of known parameters and within/between power, from the reproducible R pipeline on simulated data.
06

What this demonstrates

Research-to-product reasoning — originating a design from a measurement insight in the literature, not redesigning an existing screen.
Methodological rigor — intensive-longitudinal design, pre-registration, multilevel modeling, mediation, simulation-based power, and causal reasoning about within- vs. between-person effects.
Honest scoping — a pressure-tested concept that names its own competitors and limitations rather than overclaiming.
End-to-end range — literature synthesis → interaction design → working prototype → reproducible statistical pipeline.
07

Honest status & roadmap

Literature synthesis & competitive analysis: Done
Concept & interaction design: Done
Working interactive prototype (core loop): Built (React)
Study design: Done
Pre-registration (OSF, public): Registered
Analysis pipeline (R), validated on simulated data: Done
Production instrument (React Native / Expo): Planned
Real-world fielding (participants): Not started
Empirical findings: None yet

Next steps

Build the production instrument with scheduled prompts and data export; run a small pilot to confirm timing and item comprehension; field the study; run the confirmatory analysis on real data and add the results here.

The pre-registration is public and timestamped: osf.io/7ugtn.

Reflection

Designing the study taught me more than running a quick survey would have. Forcing myself to separate appraisal from outcome, to pre-register a single primary hypothesis, and to validate the analysis on simulated data before touching real participants is a discipline I'll carry into applied research work — it's the difference between a finding you can defend and one you merely hope for.