
Survivorship Bias in Web Performance

Why your analytics shows your site as faster than it is, and why FCP is a key metric to optimise for.

Published on 16 Feb 2022

❤️ 111 Likes 🔁 42 Reposts 💬 21 Comments 🔗 20 Shares

Introduction

Way back in 2010, an engineer at YouTube kicked off Project Feather: the goal was to reduce the weight of the popular video player page to improve performance. The project was a success: the page weight was reduced from 1,200kB to 98kB and the total number of requests was cut by 90%.

Project Feather's analytics did not reflect the improvements, though: the aggregate page load time actually increased. Further investigation revealed that traffic from countries with poor connectivity had dramatically increased, because users who previously could not even load the page were now able to watch videos. These new visitors had relatively slow experiences, which pushed the aggregate load time up.

There are similar stories from a wide range of websites:

the bank who decided not to invest in mobile web because customers preferred desktop online banking to mobile (the mobile site was so slow it was barely usable)
the retailer who had no iPad 1 traffic so stopped testing on iPad 1 (the website failed to load on iPad 1)
the retailer who had very low Android traffic and assumed their target demographic were iPhone users (Android experiences were twice as slow as iPhone)

There is a term for this issue: Survivorship Bias. If you search for the term you'll likely see a variation of this image:

diagram of a plane with red spots to show an aggregate of bullet strikes — The classic illustration of survivorship bias.

The story goes that statistician Abraham Wald used damage logs for aircraft returning from sorties in World War II to identify where reinforcing armour should be applied. The red dots show where bullet holes were found, so the logical assumption would be to reinforce those areas. Wald, however, surmised that the areas without red dots were the places where a hit would be critical and the aircraft would not return, so the reinforcements should be applied to those areas.

Our decisions are influenced by the data we have, but what about the data we don't have? In the military example above it is clear where the data is missing, but the web is slightly more complex. The users who have the worst experiences are likely to be phantom bounces: they don't appear in your analytics or intelligence tools because they don't hang around long enough for the app to load and analytics to fire.

Speed Bias

Google is experimenting with a new web performance metric: abandonment. Not to be confused with traditional abandonment metrics like cart abandonment, this is to measure how many users leave a site before it even loads. As part of this research, Nicolás Peña Moreno in the Chrome Speed Metrics team used Chrome data to measure the impact of performance on page abandonment rate — i.e. how many visitors leave a page before it loaded.

The metric used for the performance side of this equation was First Contentful Paint (FCP), so abandoners were potential visitors who left the site before anything useful was painted on the screen. It's worth noting at this point that it is very difficult for website owners to track this data themselves as it is unlikely that any analytics or tracking code has had a chance to run before FCP (and nor should it!).

I've taken the data that Nicolás shared and created a visualisation below to show the scale of the issue. Unfortunately no sample sizes were included in the results so we can't measure statistical significance at the higher FCP numbers, but the trends speak for themselves:

On Chrome Android, every second that FCP is delayed results in an additional 2.6 percentage points of abandonment. To clarify with an example: if your FCP was exactly 4s for all Chrome Android users, 17.3% of them would abandon before your analytics could record them. So if your analytics shows one million users, roughly another 209,000 abandoned before being logged (1,000,000 / ( 1 - 0.173 ) ≈ 1,209,000 actual visitors). Reducing your FCP by one second would cut abandonment by 2.9 percentage points to 14.4%, raising reported visits by about 35,000 (( 1,000,000 / ( 1 - 0.173 ) ) * ( 1 - 0.144 ) ≈ 1,035,000).
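The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `actual_visits` and the variable names are made up for illustration, and the 17.3% and 14.4% abandonment rates are simply the figures quoted above for a 4s FCP and for a one second improvement.

```python
def actual_visits(reported: float, abandonment_rate: float) -> float:
    """Estimate true visits when a share of visitors leave before analytics fires."""
    return reported / (1 - abandonment_rate)


reported_at_4s = 1_000_000
rate_at_4s = 0.173  # abandonment quoted in the text for a 4s FCP
rate_at_3s = 0.144  # abandonment quoted in the text after a one second improvement

total = actual_visits(reported_at_4s, rate_at_4s)  # visitors who tried to load the page
unseen = total - reported_at_4s                    # visitors analytics never recorded
reported_at_3s = total * (1 - rate_at_3s)          # what analytics would show at 3s
gain = reported_at_3s - reported_at_4s             # extra reported visits

print(f"actual visits:        {total:,.0f}")
print(f"unseen at 4s FCP:     {unseen:,.0f}")
print(f"reported at 3s FCP:   {reported_at_3s:,.0f}")
print(f"extra reported visits: {gain:,.0f}")
```

The key point is that the reported figure is divided by the share of visitors who stayed, not multiplied by the abandonment rate, because the abandoners are missing from the reported total in the first place.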

It is clear that poor performance reduces your traffic numbers, and that thousands of potential visitors may be leaving your site without a trace. It is also clear that mobile visitors are less tolerant of slow sites.

There is a secondary issue here: the impact on performance metrics.

It is fair to assume from this data that slower websites don't appear as slow as they are (as the slowest experiences will lead to abandonment) and, as we saw with the YouTube Feather example, performance improvements may not be reflected directly in your data. If we take the values from the chart above and apply them to a typical (log-normal) performance distribution we can predict the impact on reported metrics. A sample data set of 10,000 visits is created and the mobile abandonment statistics are applied to see the difference in results. The table below shows the outcome; in the interactive version of this article the goal mean of the distribution can be adjusted to observe the changes.

Metric	Actual Visits	Excluding Abandons	Difference	Relative Difference
Visits	10,000	7,712	-2,288	-22.9%
Median FCP	4.06s	3.67s	-390ms	-9.6%
75% FCP	6.89s	6.01s	-881ms	-12.8%
90% FCP	11.13s	9.28s	-1,847ms	-16.6%
95% FCP	14.82s	11.98s	-2,845ms	-19.2%

Whilst this data is entirely hypothetical, I hope it helps bring to life the statistics on mobile abandonment shared earlier. Slow sites are more affected by this issue, and so are slow visitors: is your analytics showing a low level of traffic from Android devices because there really are a low number of visitors, or just a high abandonment rate? The difference at high percentiles is worth noting too: in the table above the 75th percentile FCP is reported almost 0.9 seconds faster than reality, and the 90th percentile almost two seconds faster!
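For readers who want to reproduce the experiment offline, here is a rough Python sketch of the same idea. It is not the code behind the table: σ = 0.8 and the keep rule `0.92 - 0.027 * FCP` are taken from the author's replies in the comments, while the function names, the fixed seed and the choice to parameterise the distribution by its median (μ = ln(median)) are assumptions made here for simplicity. The output will be close to the table but will not match it exactly.

```python
import math
import random


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (p between 0 and 100)."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def simulate(median_fcp=4.06, sigma=0.8, visits=10_000, seed=1):
    """Return (all FCP values, FCP values that survive abandonment), both sorted."""
    rng = random.Random(seed)
    mu = math.log(median_fcp)  # the median of a log-normal distribution is exp(mu)
    actual = sorted(rng.lognormvariate(mu, sigma) for _ in range(visits))
    # A visit is recorded with probability 0.92 - 0.027 * FCP (never below zero),
    # so slower visits are more likely to vanish from the reported data.
    kept = [fcp for fcp in actual if rng.random() < max(0.0, 0.92 - 0.027 * fcp)]
    return actual, kept


actual, kept = simulate()
print(f"visits: {len(actual):,} actual, {len(kept):,} reported")
for p in (50, 75, 90, 95):
    a, k = percentile(actual, p), percentile(kept, p)
    print(f"p{p} FCP: {a:.2f}s actual, {k:.2f}s reported, {(k - a) * 1000:+.0f}ms")
```

Lowering `median_fcp` simulates a faster site: fewer visits are dropped and the gap between actual and reported percentiles narrows.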

Actions to take

It is extremely difficult to measure the impact of early abandonment as a site owner. It is possible to compare analytics with logs, though, and determine the user agents or regions which have the highest disparity between intent to load a page (from server or CDN logs for documents) and successfully loading the page (analytics or RUM data). This strategy has some issues: some requests in your logs will be from crawlers which don't execute your analytics JavaScript, and there will also be some disparity between user agents due to visitors using tracker blockers.
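A comparison like this can be sketched in Python. Everything here is hypothetical: the `capture_rates` helper, the segment names and the counts are invented for illustration, and in practice the two dictionaries would be built from your own server or CDN logs and your analytics export for the same time period.

```python
def capture_rates(log_requests, analytics_hits):
    """Return (segment, requests, hits, capture rate) rows, worst capture rate first."""
    rows = []
    for segment, requests in log_requests.items():
        if requests == 0:
            continue  # nothing requested, so there is no rate to compute
        hits = analytics_hits.get(segment, 0)
        rows.append((segment, requests, hits, hits / requests))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical counts of HTML document requests (from logs) and page views
# (from analytics or RUM) for the same period, keyed by browser segment.
log_requests = {"Chrome Android": 120_000, "Safari iOS": 90_000, "Chrome Desktop": 150_000}
analytics_hits = {"Chrome Android": 84_000, "Safari iOS": 76_500, "Chrome Desktop": 139_500}

for segment, requests, hits, rate in capture_rates(log_requests, analytics_hits):
    print(f"{segment:<15} {requests:>8,} requests {hits:>8,} hits {rate:6.1%} captured")
```

Segments at the top of the output have the largest gap between intent and a recorded visit. As noted above, crawlers and tracker blockers also contribute to that gap, so it is a prompt for investigation and not a direct measure of abandonment.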

It is possible to beacon data early in a page load in order to better capture visitors who abandon before traditional analytics scripts fire. Nic Jansma has written up an analysis of strategies which shows that beacon capture (on a fast site) can be improved from 86.4% to 92.8% by listening to pagehide and visibilitychange events in addition to onload. This still requires the JavaScript to execute before the visitor abandons and for the beacon to not be blocked by tracker blockers.

Whilst there are strategies to improve our data collection for abandoners, we will always lose some data. The best method to reduce abandonment and increase captured traffic is... improving performance! There are two reasons for this:

faster page loads mean that analytics events fire earlier and are more likely to capture the visitor
faster page loads are a better experience, so visitors are more likely to stay on the page

Both of these are positive outcomes!

Whilst the focus in web performance has recently been on the Core Web Vitals, I encourage you to analyse your First Contentful Paint with as much scrutiny as Largest Contentful Paint, Cumulative Layout Shift or First Input Delay. FCP was used by Nicolás in the abandonment study because it is such an important indicator of user experience.

Web Page Test filmstrip showing a gap of blank screen before first contentful paint — FCP marks when your visitor can see something useful

FCP marks the context switch from the previous page or a blank screen to showing something that the visitor actually expects to see, so in my analysis it is often the primary metric I focus on improving. If FCP is high, visitors are more likely to think that the page is not working; they have time to reconsider what action they were taking and can easily hit the back button.

Note that FCP isn't a great cross-browser metric, so your mileage may vary when tracking it over time and across browsers and operating systems. That does not necessarily mean that it is a bad metric; it's just complex to implement consistently across the different browser engines.

The key points for optimising FCP are universal, though:

reduce time to first byte (TTFB): use a CDN, cache HTML pages
reduce blocking JS and CSS
defer non-critical JS
remove render blocking third-party tags

Expect to see the unexpected in your real user analytics when shipping performance improvements! Technically reducing FCP may in fact increase your reported FCP as more visitors on low-end devices and connections successfully load your pages. Expect to see traffic numbers change for different device types, and combine your performance metrics with business metrics to get a more holistic view of the success of each technical performance improvement.

Join the conversation!

The comments on this page are fed by tweets, using brid.gy and webmention.io. Send a webmention or use a link below to interact via twitter.

21 Comments

Patrick Meenan 16 Feb 2022

Have you found a good way to account for the mix shift changes when measuring the metrics in RUM (i.e. when optimizing for FCP)? It's somewhat demoralizing and hard to explain when you see huge gains in lab tests, expect gains in RUM but barely see the aggregates move.

Simon Hearne 16 Feb 2022

I've had the same frustrations, no 100% solution but: I focus on a specific subset of users (e.g. Android / US / Wifi) to narrow the results and spot changes. Some changes only show after weeks as new users appear / old abandons re-visit. #WebPerf is more art than science!

Brian Louis Ramirez 16 Feb 2022

Great article! Survivorship bias is super interesting and relevant to putting the “known knowns” into perspective – especially as the % of users that allow all-out tracking is on the decline. Loved how you made the chart interactive, btw.

𝕋𝕂 16 Feb 2022

In practice, do you have a dashboard with different subsets of users? Like: FCP for User Subset 1 (where User Subset 1 is "Android / US / Wifi"), LCP for User Subset 1, and so on. Each metric for each subset of users. Or do you have a different approach in practice?

Sergey Chernyshev 16 Feb 2022

Love the interactive chart / table - reminded me of ux-speed-calculator.netlify.app

Tanner Hodges 16 Feb 2022

Wow, I've never actually heard the WWII story behind "survivorship bias"—wild. 🙏 This is fantastic info, thanks for sharing.

Tanner Hodges 16 Feb 2022

❓ Dumb question but in the interactive chart, what is a "goal mean of the distribution"? Not entirely sure what I'm changing as I play with the chart… (On that note, do you recommend any good "stats for dummies" resources? Particularly for perf-related analysis?)

Erwin / (perceived) performance / accessibility 17 Feb 2022

Interesting topic! I often explain that improving pagespeed could result in a higher reported bouncerate in your analytics as well (because: more tracked users means also tracking users that were already leaving the page soon in time). This often helps making them understand.

Jarosław Jarosik 17 Feb 2022

and nowadays yt won't even allow you to prebuffer the whole video so if your bandwidth is lower than bitrate then tough luck

Simon Hearne 17 Feb 2022

It adjusts μ in the log normal random function when generating the distribution. Not exactly changing the mean but it moves the distribution to achieve a rough mean at the set value. So lowering μ shows a distribution of a faster hypothetical site. en.m.wikipedia.org/wiki/Log-nor…

Tanner Hodges 17 Feb 2022

Gotcha. I'm still wrapping my head around stats, so this helps a lot—sampleLogNormal(Mean/2.9,σ) makes more sense now.

Tanner Hodges 17 Feb 2022

❓ One more quick question: How did you decide on these other values? • σ = 0.8 • Keep sampledRaw where random() < (0.92 - datum.val*0.027) 🙏 Thanks again!

Philip Tellis 💉💉💉 17 Feb 2022

This is one of the reasons boomerang has always measured abandonments (departure before load)... Unfortunately too many people think that delaying analytics is the solution and they end up losing this data.

Simon Hearne 17 Feb 2022

σ = 0.8 gives a close approximation to a real distribution of performance (eye-balled). Keep func uses the abandonment data from earlier chart (8% y-intercept, 0.027 pts/sec gradient). If you add a `compact` signal to the spec you can play with values: vega.github.io/editor/#/

Thomas A. Powell 17 Feb 2022

I point out that your server log files can provide a data smell if we look. Using JS analytics esp deferred or bottom of the page misses lots of things as well as bots. Had a site with ~3.5m unique in logs vs ~2.7m in JS. That's a lot of bots or load fails and a smelly clue!

Thomas Güttler 18 Feb 2022

Has anyone looked at htmx.org? AFAIK this method (fragments over the wire) can help to get a good FCP.

Simon Hearne 18 Feb 2022

I’m not sure how this would improve FCP. The fastest pages are static HTML + inline CSS with no JS dependencies.

Thomas Güttler 18 Feb 2022

It depends on how you use htmx. For example: the first page is pure html. The html contains additional hx-attributes. These attributes add interactivity to the page which is needed only after the page is loaded. This means the 10k htmx library is not needed for the first paint. …

Simon Hearne 18 Feb 2022

This is true of any static HTML page though, I’m not sure what htmx adds except developer convenience?

htmx.org 18 Feb 2022

if htmx approximates a static HTML page for first paint, i'd be good w/ that

Tanner Hodges 22 Feb 2022

🤘 Perfect, this is super helpful. Very much appreciated.


20 Comments

Malte Ubl 16 Feb 2022

This effect is extremely common. If you make your website demonstrably faster while the performance metrics measured in the field go down BUT your revenue (or whatever end to end gross business success metric you are using) goes up, then this is probably what is happening.

Joseph Scott 16 Feb 2022

If you look at #webperf data please read this

Dave Peiris 16 Feb 2022

Love this post from @simonhearne - and completely agree with focusing on improving FCP. Also - Blackbird data across millions of sessions frequently shows a strong correlation between FCP and ecom conversion rates

Sergey Chernyshev 16 Feb 2022

Very important topic for web performance - your data probably has significant chunk missing.

Barry Pollard 16 Feb 2022

Great post from @simonhearne . Your website analytics only show those visitors that had a good enough experience to hang around long enough to be measured!

Mariusz Michalski 17 Feb 2022

#webperf #performance

Danilo Velasquez 17 Feb 2022

Great article! This not only happen in performance but in other things as well. I remember a migration that we discarded because of low traffic. Imagine our surprise when we noticed that the old stack was sending every tracking event X3!!!

Alex Russell 17 Feb 2022

This effect is something I've seen reported by many teams over the years. Improving perf often *tanks* metrics as more users that previously bounced become the marginal successful user. We're buried so deep in JS bloat and we haven't even begun to account for the costs.

Linnea Bak 17 Feb 2022

Fantastic article on the importance of FCP and how to consider visitors who leave a site before Analytics Tracking has loaded. Since Google doesn’t want to serve results too slow for users to access, the Chrome Speed Team is looking at a new web performance metric: ABANDONMENT

Senthil P 18 Feb 2022

Survivorship Bias in Web Performance simonhearne.com/2022/survorshi…

Jean-Pierre Vincent 18 Feb 2022

Survivorship bias in your analytics means that the slowest users show up less in the stats: simonhearne.com/2022/survorshi… and that improving performance may not show up in the RUM numbers

Egor Lynov 18 Feb 2022

A good article on how your site's metrics don't necessarily tell you about your users, because many users may be staying away from your site not because they don't need it, but because it is impossible to use simonhearne.com/2022/survorshi…

Jess Peck 🐍🤖 22 Feb 2022

simonhearne.com/2022/survorshi… you know any blogpost with this image is going to be a banger

Adam Phillips 23 Feb 2022

Great article. Worth a read #webperf simonhearne.com/2022/survorshi…

FullStack Bulletin 23 Feb 2022

Survivorship Bias in Web Performance simonhearne.com/2022/survorshi…

WP Rocket ™ 23 Feb 2022

Do you know why the First Contentful Paint (FCP) is a crucial metric to optimize? Even though it’s not a Core Web Vital, First Contentful Paint (FCP) is a great indicator of user experience and is closely related to user abandonment. 👇 simonhearne.com/2022/survorshi…

Friday Front-End 02 Mar 2022

Survivorship Bias in Web Performance, by @SimonHearne simonhearne.com/2022/survorshi…

Eco Web Hosting 09 Mar 2022

"Google is experimenting with a new web performance metric: abandonment. Not to be confused with traditional abandonment metrics like cart abandonment, this is to measure how many users leave a site before it even loads." @simonhearne talks survivor bias. simonhearne.com/2022/sur…

Alex Devero 25 Mar 2022

Survivorship Bias in Web Performance simonhearne.com/2022/survorshi… #webdev #dev #webdesign

rolgalan 20 Nov 2022

«The users who have the worst experiences are likely to be phantom bounces: they don't appear in your analytics or intelligence tools because they don't hang around long enough for the app to load and analytics to fire.» simonhearne.com/2022/survorshi…

