20 posts tagged with "research"

Engineering Sabbaticals: Data on Returning Developer Output

· 10 min read
Artur Pan
CTO & Co-Founder at PanDev

A VP of Engineering at a 300-person company asked me a direct question: "We're debating a sabbatical policy. HR says it boosts retention. Finance says it costs 2 months of output per taker. Who's right?" The data we could pull answered it: both are right, but the effect sizes differ. Returning developers hit full output in 4-6 weeks (not the 8-12 commonly assumed), and 90-day retention for post-sabbatical engineers is measurably higher than for their pre-sabbatical cohort. The surprise is that commit quality in the ramp-up weeks is better than baseline, not worse.

The Society for Human Resource Management's 2023 Employee Benefits Survey shows 22% of US employers now offer formal sabbatical programs, up from 13% in 2018. Among tech companies the rate jumps to roughly 34% — driven partly by retention competition and partly by the post-2022 burnout reckoning. But most of the published data on sabbatical ROI comes from self-report surveys. Our IDE telemetry gives us something those surveys can't: what actually happens on the keyboard week-by-week when someone comes back.

Rubber Duck Debugging: Effectiveness Research (Data)

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

Ask 100 engineers about rubber duck debugging and 98 will nod knowingly. Ask them for evidence it works and most will cite The Pragmatic Programmer (1999). We can do better than 26-year-old folklore. Across 2,100 debugging sessions we instrumented in 2025, engineers who verbally narrated the bug to a colleague, to an inanimate object, or into a voice recorder solved it in a median of 31 minutes, compared to 48 minutes for silent debugging: a 35% reduction. The psychology research calls this the self-explanation effect (Chi et al., 1989), and it has 30+ years of replication in education research.

But the effect isn't uniform across bug types. For some classes of bugs, verbalization helps 42% of the time and does nothing 58% of the time. This article breaks down what our IDE data shows about when the duck earns its keep and when it's a ritual masquerading as technique.

Meeting-Free Days: What the Data Actually Shows

· 9 min read
Artur Pan
CTO & Co-Founder at PanDev

Teams with 2 meeting-free days per week show a median of 2h 34m of daily coding time — versus 1h 12m for teams with no policy. That's a 114% increase, measured from IDE heartbeat telemetry across 100+ B2B companies in our dataset. The same analysis reveals something less marketable: the gain flattens at 2 days. Teams running 3 meeting-free days don't see meaningfully more coding time than teams running 2. The third day produces coordination debt that offsets the focus benefit.
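
For readers who want to reproduce the number, here is a minimal sketch of how IDE heartbeats become a daily coding-time figure. The 15-minute timeout and the function names are illustrative assumptions, not our production pipeline:

```python
from datetime import datetime, timedelta

# Gaps between consecutive heartbeats shorter than this count as continuous
# coding; longer gaps are treated as context switches or meetings.
TIMEOUT = timedelta(minutes=15)  # assumed threshold, not the production value

def daily_coding_time(heartbeats: list[datetime]) -> timedelta:
    """Sum the gaps between consecutive heartbeats, ignoring gaps > TIMEOUT."""
    beats = sorted(heartbeats)
    total = timedelta()
    for prev, curr in zip(beats, beats[1:]):
        if curr - prev <= TIMEOUT:
            total += curr - prev
    return total

# One uninterrupted hour of beats at 5-minute intervals: 11 gaps of 5 minutes.
beats = [datetime(2026, 1, 5, 9, 0) + timedelta(minutes=5 * i) for i in range(12)]
print(daily_coding_time(beats))  # 0:55:00

# The headline claim is then plain arithmetic on the two medians:
# 2h 34m = 154 min vs 1h 12m = 72 min, and (154 - 72) / 72 ≈ 1.14, i.e. +114%.
```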

Meeting-free days are the most popular focus-time intervention of 2020-2026. Shopify's 2023 "no-meeting Wednesdays" rollout was widely copied; a 2024 MIT Sloan study reported 39% of surveyed tech companies have some form of meeting-free day policy. What those reports don't have: IDE-level behavioral data showing what actually changes when meetings are removed. This article does.

Pomodoro for Engineering: Does It Work for Coding? (Data)

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

The Pomodoro Technique says work for 25 minutes, break for 5, repeat. Francesco Cirillo invented it in the late 1980s for studying. Not for coding. Not for the kind of flow-state work engineers do. We looked at IDE heartbeat patterns from engineers who self-identify as Pomodoro users versus engineers who don't, and the results are uncomfortable for the method: strict 25/5 Pomodoro users averaged 42 minutes of actual focused coding per day. Engineers who ignored the timer averaged 2 hours 12 minutes. The timer was, for most of them, a scheduled interruption engine.

This isn't an anti-Pomodoro article. It's a data-driven look at why 25 minutes is the wrong interval for coding work and what intervals actually match how engineers flow. Cal Newport's Deep Work already argued this conceptually. What we can add is telemetry — our IDE data shows the specific breakpoints where coding sessions do and don't recover from interruption. The Pomodoro format interrupts right at the wrong place.

AI Agent Swarms for Developers: Multi-Agent Workflow Data

· 7 min read
Artur Pan
CTO & Co-Founder at PanDev

A single AI coding agent — Cursor Composer, Claude Code, GPT-4 with tools — solves about 38% of SWE-bench Verified tasks. Pair it with a critic agent, and that number jumps to 62%. A three-agent swarm (planner + coder + critic) hits 71%. A seven-agent swarm drops back to 54%. The shape of the curve is consistent across the five public benchmarks we reviewed: more agents help, until they don't.
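
The pattern itself is easy to sketch. Below is a minimal, hypothetical planner + coder + critic loop; call_llm is a stand-in for whatever model API you use, and none of this is the harness the benchmarks actually ran:

```python
def solve(task: str, call_llm, max_rounds: int = 3) -> str:
    """Planner + coder + critic loop. `call_llm(role, prompt) -> str` is a
    hypothetical stand-in for your model provider of choice."""
    plan = call_llm("planner", f"Break this task into steps:\n{task}")
    patch = call_llm("coder", f"Implement this plan as a patch:\n{plan}")
    for _ in range(max_rounds):
        review = call_llm("critic", f"Find defects in this patch, or say APPROVE:\n{patch}")
        if "APPROVE" in review:
            break  # the critic's sign-off is the loop's stop condition
        patch = call_llm("coder", f"Revise the patch to address:\n{review}\n\nPatch:\n{patch}")
    return patch

# Smoke test with a canned stub in place of a real model:
print(solve("fix the bug", lambda role, prompt: "APPROVE" if role == "critic" else "stub"))
```

Adding more roles means more hops through the loop per task, which is one plausible reading of why the seven-agent configuration loses ground.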

This post is a look at the actual data on multi-agent workflows for software engineering — what performs, what collapses, and what that means for how developers should use agent swarms in 2026. Our take is narrower than the hype: swarms are real, the gains are real, and the failure mode is also real and predictable.

Time Zones and Engineering Velocity: Real Data

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

A distributed team with 5 hours of timezone spread has a median lead time of 6.8 days per change. A colocated team in the same codebase — same language, same size, same PR size — has a median lead time of 3.2 days. That's not a rounding error. That's the timezone tax, and it roughly doubles with every additional 3-4 hours of spread. GitLab's 2023 remote-work report estimated "3-5 hours of overlap" as the sweet spot for async-friendly teams, and our IDE-heartbeat data across 100+ B2B companies says the same — but with the extra detail of where exactly the time goes.
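
As a back-of-envelope check, the two medians quoted imply a doubling interval of about 4.5 hours of spread. A sketch of that illustrative model (the exponential form is our assumption, not a fit from the article's dataset):

```python
BASELINE_DAYS = 3.2    # colocated median lead time quoted above
DOUBLING_HOURS = 4.5   # implied by the two quoted medians; the rough figure is 3-4h

def lead_time_days(spread_hours: float) -> float:
    # Lead time roughly doubles with every DOUBLING_HOURS of timezone spread.
    return BASELINE_DAYS * 2 ** (spread_hours / DOUBLING_HOURS)

print(lead_time_days(0))   # 3.2 days, the colocated baseline
print(lead_time_days(5))   # ~6.9 days, close to the 6.8-day median quoted
```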

This isn't an article about whether remote work is good (it is, for many teams). It's about the specific ways that timezone spread slows delivery, and what measurements tell you whether your distributed team is paying a 2× lead-time penalty or learning to live with it.

Code Ownership vs Collective: What the Data Shows

· 10 min read
Artur Pan
CTO & Co-Founder at PanDev

Two engineering orgs of identical size shipping at the same pace. Org A: every file has a named owner, PRs need their approval. Org B: anyone can merge to any part of the codebase after a peer review. Org A has 40% fewer bugs per KLOC. Org B recovers from a senior engineer leaving 3× faster. Microsoft Research (Bird et al., 2011, Don't Touch My Code: Examining the Effects of Ownership on Software Quality) ran this experiment across 3,000+ files in Windows Vista/7 and showed that files with a strongly-identified owner had significantly fewer post-release failures — but they also showed that high-ownership files were more likely to become a bottleneck.
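
To see where your own codebase sits, a simplified version of the Bird et al. ownership metric, the share of a file's changes made by its top contributor, can be computed straight from Git history. The sketch below is our reading of the metric, not the paper's full methodology, and the cutoff and path are illustrative:

```python
import subprocess
from collections import Counter

def ownership(path: str) -> float:
    """Fraction of a file's commits authored by its single largest
    contributor (a simplified take on Bird et al.'s ownership metric)."""
    log = subprocess.run(
        ["git", "log", "--follow", "--format=%an", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    authors = Counter(name for name in log.splitlines() if name)
    total = sum(authors.values())
    return max(authors.values()) / total if total else 0.0

# 0.75 is an illustrative "strongly owned" cutoff, not the paper's threshold.
print(ownership("src/billing/invoice.py") >= 0.75)  # hypothetical path
```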

This article compares three real ownership models — strong ownership, collective ownership, and the hybrid pattern — using the Microsoft data, Google's 2018 internal study on code review, and 100+ companies in our own IDE dataset. The goal: pick the model that fits your team's stage and work, not the one that fits the blog post you read last week.

7 Data Signals Your Engineer Is About to Quit (Before They Tell You)

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

The median tenure of a software engineer at a B2B company is 2.3 years (Stack Overflow 2025 Developer Survey). The median surprise of the engineer's manager when they resign is… also high. We matched IDE heartbeat data, Git activity, and task-tracker signals against 43 confirmed engineer resignations across 11 PanDev Metrics customer teams in 2025. Seven behavioral patterns showed up in the data 30-90 days before the resignation letter.

One of them is almost never on the standard "burnout signal" list. That's the one this post exists for.

AI-Generated Tests: Quality, Coverage, Trust (Real Measurement)

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

Copilot wrote 420 tests for your payments module in two days. Coverage went from 58% to 84%. Release confidence? Unchanged, maybe worse. A 2024 IEEE study (An Empirical Study on the Usage of Transformer Models for Code Completion, Ciniselli et al.) found LLM-generated tests pass the compiler 92% of the time but catch only 58-62% of injected mutations — the standard research test for "does this test actually verify anything." Human-written tests in the same study scored 78%. The ~20-percentage-point gap in mutation score is the real AI test quality story, not the coverage number everyone reports.
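
A toy example makes the coverage-versus-mutation-score gap concrete (all names here are ours, not the study's). The weak test executes the code, so it counts toward coverage, but it survives a one-operator mutant; the strong test pins the boundary and kills it:

```python
# A mutant flips one operator; a test "kills" the mutant only if it fails on it.
def apply_discount(total: float, threshold: float = 100.0) -> float:
    return total * 0.9 if total >= threshold else total

def apply_discount_mutant(total: float, threshold: float = 100.0) -> float:
    return total * 0.9 if total > threshold else total  # >= mutated to >

def weak_test(fn) -> bool:
    # A common shape for generated tests: runs the code (coverage!) but
    # asserts nothing about the boundary, so the mutant survives.
    return isinstance(fn(250.0), float)

def strong_test(fn) -> bool:
    # Pins the boundary case exactly, so the mutant is killed.
    return fn(100.0) == 90.0

assert weak_test(apply_discount) and weak_test(apply_discount_mutant)          # mutant survives
assert strong_test(apply_discount) and not strong_test(apply_discount_mutant)  # mutant killed
```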

This piece measures what AI-generated tests are good at, what they miss, and how to structure your pipeline so AI adds throughput without eroding release confidence.

Claude vs ChatGPT vs Copilot for Coding: 2026 Comparison

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

The AI coding tool market fragmented into four serious contenders by early 2026: GitHub Copilot, Cursor, Claude Code (Anthropic CLI), and ChatGPT with Code Interpreter. Marketing decks from all four claim "40% productivity boost" — the number is identical, and it's meaningless without measurement. We pulled IDE heartbeat and session data from 112 engineers across 14 B2B teams in Q1 2026 to see what actually saves time.

The punchline: Claude Code users save 54 minutes per day; Copilot users save 28. But the distribution is not what marketing implies: the best tool depends on the kind of work, not the team's "AI maturity".