Kubernetes Engineering Observability: What to Track in 2026

· 7 min read
Artur Pan
CTO & Co-Founder at PanDev

A platform team running 11 production Kubernetes clusters had 94,000 metrics scraped every 15 seconds, 2.4 TB of logs per day in Loki, and a Grafana instance with 340 dashboards. When their VP of Engineering asked "are our teams shipping reliably on K8s?", nobody could answer in under an hour. They had cluster observability. They had zero engineering observability.

These are two different problems. Cluster observability tells you whether pods are healthy. Engineering observability tells you whether engineering on top of those clusters is healthy — whether deployments are fast, whether rollbacks are rare, whether developers are waiting on infrastructure or fighting with it. Most K8s shops have solved the first and ignored the second. The 2024 CNCF annual survey reported that 68% of enterprise K8s users struggle with "making observability actionable", which is a polite way of saying they have metrics but no decisions come out of them.

HR + Engineering: Collaboration Playbook for Growing Teams

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

In 2024, LinkedIn's Workforce Report flagged "HR-Engineering misalignment" as the #2 reason scaling tech teams lose senior engineers, right behind compensation. The usual failure mode: HR designs job ladders on a generic template, Engineering runs calibration as an undocumented side-channel, and two months later the best senior engineer has left because their title never caught up with their responsibilities.

This is not an HR problem, and not an Engineering problem. It's a collaboration problem that surfaces every 6-12 months during promotion and compensation cycles. Here's a playbook for making the partnership actually work — who owns what, when, and which data gets shared.

Junior to Senior: Promotion Criteria Backed by Data

· 9 min read
Artur Pan
CTO & Co-Founder at PanDev

A 3.5-year engineer at a 120-person scaleup I worked with last year was "obviously senior" — by everyone's intuition. Her Git and IDE data told a different story: she was shipping more features than any senior on the team, but she wasn't reviewing PRs from people outside her squad, never owned a system-design proposal end-to-end, and her commits clustered in a narrow 2-component surface area. Her manager's gut said senior. The behavioral evidence said: ready in 6-9 months, not today. The 6-month data revisit confirmed it — she got there, and the promotion landed stronger than the intuition-based one would have.

Promotion decisions fail in two directions. Promote-too-early produces under-supported seniors who quietly under-perform and sometimes leave. Promote-too-late loses your best engineers to competitors who saw the readiness first. A 2023 First Round Review study on engineering careers found the single largest driver of senior-engineer regret was "promoted without being ready," cited by 41% of respondents. Data-backed criteria reduce both errors.

Travel and Hospitality Engineering: Booking Platform Teams

· 10 min read
Artur Pan
CTO & Co-Founder at PanDev

A former Expedia engineer told me the quote that should be pinned above every travel-engineering team's desk: "We don't ship software — we ship promises about the future availability of physical objects." An Amadeus GDS query returns inventory that's simultaneously being consumed by 50+ competing distribution channels. Your code has to reconcile that in under 400ms or the user gives up.

Phocuswright's 2024 travel-technology report pegs the global online-travel industry at $1.06 trillion in gross bookings, with roughly 38% flowing through technology platforms that sit between travelers and suppliers. Amazon Web Services' travel-vertical analysis documents that peak-season traffic on booking engines routinely exceeds 15× the yearly baseline, more extreme than any other e-commerce vertical except Black Friday retail. Engineering teams built on "just scale horizontally" assumptions discover, in their first December peak, that search-cache misses against an unreachable GDS trigger cascading failures 90 seconds deep.

AdTech Engineering: Data-Heavy Teams and Productivity

· 7 min read
Artur Pan
CTO & Co-Founder at PanDev

In our IDE dataset of 100+ B2B companies, engineers on AdTech platforms ship 38% fewer pull requests per month than engineers in SaaS tooling, yet produce more customer revenue per head. Meanwhile, The Trade Desk has disclosed that it processes over 13 million ad requests per second. Scale like that reshapes what "productive" means. A PR count that would look alarming in a consumer app is perfectly normal when a single configuration line ships to a system serving 10 million QPS.

AdTech engineering is different, and measuring it with generic DORA-only dashboards misses the point. This article lays out what data-heavy teams actually spend time on, what the numbers look like across the 14 AdTech companies in our dataset, and which productivity signals matter more than throughput for real-time bidding, attribution, and ad-server work.

Staff Engineer: Career Framework with Real Metrics

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

Will Larson's 2021 survey of 14 staff engineers at large tech companies produced a finding most ladders still ignore: only one in three senior engineers wants the Staff title, and of those, fewer than half make it in five years. The promotion is not a natural continuation of Senior. It's a role change — different work, different signals, different failure modes. Engineering ladders that treat it as "Senior+" produce stalled careers and a pile of ICs who quit for an EM job at another company.

This framework focuses on what actually predicts readiness, drawing on Larson's research, Tanya Reilly's The Staff Engineer's Path, and the patterns we see in delivery data across 100+ B2B engineering organizations.

Media and Streaming Engineering: Building for Peak Load

· 9 min read
Artur Pan
CTO & Co-Founder at PanDev

When Super Bowl LVIII streamed on CBS in 2024, peak concurrent viewers hit 123 million — a number that isn't a KPI, it's a physics problem. Disney+'s Ahsoka finale generated 14 million account logins in a 15-minute window. Netflix's Tyson-Paul fight in late 2024 failed visibly on Twitter because the streaming stack buckled at ~60 million concurrent streams. Media engineering is not optimizing for average throughput. It's optimizing for the one hour per quarter where your graphs go vertical.

The companies that do this well share a specific team shape, a specific release cadence, and a specific set of measurement habits that don't apply to most B2B SaaS. Pulling DORA metrics off a streaming platform and comparing them to a CRM is apples and typhoons. This is a field guide for the engineering leaders who run — or are about to run — a media platform through peak.

Principal Engineer: How to Measure Your Real Impact

· 8 min read
Artur Pan
CTO & Co-Founder at PanDev

A principal engineer at a 200-person fintech spent Q3 writing 180 lines of code. Her team shipped 340,000 lines in the same period. When her CTO looked at coding-time dashboards for a performance review, she almost got flagged as underperforming. What actually happened in Q3: she rewrote the payment reconciliation spec that unblocked two teams, mentored three senior engineers into tech-lead roles, and killed a six-month project that would have shipped something the market didn't want. Her measurable output was tiny. Her impact was the largest of any engineer in the company that quarter.

This is the principal engineer measurement paradox. Every staff-plus framework (Will Larson's, Tanya Reilly's The Staff Engineer's Path, the Google internal engineering ladder) acknowledges it: principal engineers are paid for judgment and force multiplication, not throughput. But most engineering orgs measure them like senior engineers with a bigger title. This article covers how to measure principal impact honestly, and how a principal should measure their own impact when the review conversation comes.

Logistics Engineering Metrics for Delivery Platform Teams

· 7 min read
Artur Pan
CTO & Co-Founder at PanDev

A delivery platform's engineering team runs a fundamentally different workload from a B2B SaaS team. The courier mobile app pings location every 3-5 seconds. The dispatcher console expects sub-200ms order assignments. Route-optimization jobs crunch combinatorial problems overnight and need to finish before dawn shifts start. A 2024 McKinsey report on last-mile logistics pegged the cost of a single hour of dispatcher downtime at $12,000-$35,000 for a mid-size regional carrier.

This shape of work changes which engineering metrics actually matter. The four DORA key metrics still apply, but the team-health and delivery-performance picture shifts. Here's the metric stack that fits logistics platform teams, and the places where "copy a SaaS DORA dashboard" misleads you.

Marketplace Engineering: Metrics for Two-Sided Products

· 9 min read
Artur Pan
CTO & Co-Founder at PanDev

A marketplace CTO told me the line I keep hearing: "My supply team ships fast, my demand team ships fast, and GMV still stagnates." The DORA dashboards were green on both sides. The matching engine was not. Two-sided products have a metric gap that single-sided SaaS doesn't: engineering output on one side of the marketplace only creates business value if it's matched by output on the other side.

Andreessen Horowitz's marketplace framework ranks liquidity — the probability that a listed item actually transacts within a window — as the single best predictor of marketplace health. That probability is an engineering outcome, not a marketing one. When search latency rises by 200ms, listed-item conversion drops measurably. When seller onboarding takes 14 days instead of 4, supply growth curves flatten within a quarter.
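
To make the liquidity definition above concrete, here is a minimal Python sketch that computes the share of listings that transact within a chosen window. The function name `liquidity`, the `(listed_at, sold_at)` tuple layout, and the 7-day window are assumptions made for illustration; they are not taken from the a16z framework or from any specific marketplace's data model.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical record shape: (listed_at, sold_at), with sold_at = None if unsold.
Listing = Tuple[datetime, Optional[datetime]]


def liquidity(listings: Sequence[Listing], window: timedelta) -> float:
    """Fraction of listings that transacted within `window` of being listed."""
    if not listings:
        return 0.0
    transacted = sum(
        1
        for listed_at, sold_at in listings
        if sold_at is not None and sold_at - listed_at <= window
    )
    return transacted / len(listings)


# Illustrative data only: four listings created on the same day.
day0 = datetime(2026, 1, 1)
sample = [
    (day0, day0 + timedelta(days=2)),   # sold inside the window
    (day0, day0 + timedelta(days=6)),   # sold inside the window
    (day0, day0 + timedelta(days=10)),  # sold, but too late to count
    (day0, None),                       # never sold
]
week_liquidity = liquidity(sample, timedelta(days=7))
```

This sketch counts every listing in the denominator, including ones listed too recently to have had a full window to transact. A production version would likely restrict the cohort to listings at least one window old.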