Browser Wars: Breaking down marketshare stats

There has been a lot of commotion around browsers over the last few years as corporations and individuals alike obsess over whether their browser is gaining marketshare or simply fading from the limelight. But there is a serious, festering detail behind the tracking services currently cited by the press (Neowin too) that needs to be talked about, because the numbers will only become more obscure as these services grow more complicated in the way they collect data.

What is the devil in the detail, the underlying cause for confusion, the fundamental issue that we all need to understand to make us better citizens of the net? That detail is the post-processing applied when browser marketshare statistics are collected and distributed, and how easily those numbers can be manipulated, slanted, or inaccurately reported, whether on purpose or not.

There are two primary sources of data for market share on the net: StatCounter and Net Applications. The purpose of this article is not to decide which source is better, far from it, but to provide insight into how each source is compiled and what strengths and weaknesses each brings to the table. Simply taking either source at face value is a recipe for making uninformed decisions and spreading inaccurate information based on faulty data.

Each service is attempting to accurately portray what is happening with browser marketshare, but they go about it in different ways. Which is better? We hope to provide you with the information to make that decision for yourself.

Taking a deeper dive, let’s look at Net Applications and how it processes its data. In an attempt to remove anomalies from the data it collects, Net Applications actively scrubs that data to remove what it considers to be noise, in the hope of providing higher quality output. It also balances its data through geoweighting, but how does Net Applications geoweight? Their explanation is below:

The Net Market Share data is weighted by country. We compare our traffic to the CIA Internet Traffic by Country table, and weight our data accordingly. For example, if our global data shows that Brazil represents 2% of our traffic, and the CIA table shows Brazil to represent 4% of global Internet traffic, we will count each unique visitor from Brazil twice. This is done to balance out our global data. All regions have differing markets, and if our traffic were concentrated in one or more regions, our global data would be inappropriately affected by those regions. Country level weighting removes any bias by region.
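The weighting described in the quote can be sketched in a few lines of Python. This is only an illustration of the stated rule (reference share divided by observed share), not Net Applications' actual code. The function names, the two-region split, the browser labels and all visitor counts are hypothetical; only the Brazil figures of 2% and 4% come from the quote.

```python
def country_weights(observed_share, reference_share):
    """Weight for each country = reference traffic share / observed traffic share."""
    return {
        country: reference_share[country] / observed
        for country, observed in observed_share.items()
        if observed > 0 and country in reference_share
    }


def weighted_browser_share(visitors_by_country, weights):
    """Combine per-country unique-visitor counts into one weighted global share."""
    totals = {}
    for country, by_browser in visitors_by_country.items():
        weight = weights.get(country, 1.0)  # left unweighted if no reference figure
        for browser, count in by_browser.items():
            totals[browser] = totals.get(browser, 0.0) + weight * count
    grand_total = sum(totals.values())
    return {browser: total / grand_total for browser, total in totals.items()}


# Brazil is 2% of the sample but 4% of the reference table (figures from the quote).
# "Rest of world" and every visitor count are made-up illustration values.
observed = {"Brazil": 0.02, "Rest of world": 0.98}
reference = {"Brazil": 0.04, "Rest of world": 0.96}
weights = country_weights(observed, reference)

visitors = {
    "Brazil": {"Browser A": 150, "Browser B": 50},
    "Rest of world": {"Browser A": 4000, "Browser B": 5800},
}
shares = weighted_browser_share(visitors, weights)
```

In this sketch each Brazilian visitor counts twice, matching the quote, while visitors from everywhere else are scaled down slightly so the weighted total stays the same size as the original sample.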

How else is Net Applications currently scrubbing its data? As of February, Net Applications removes Chrome's pre-renders, which it says accounted for 4.3% of Chrome’s daily visits that month. Net Applications also bases its data on unique visits rather than the total number of hits, which, in its opinion, provides a better representation of browser marketshare.
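To get a feel for what removing pre-renders does to a headline number, here is a small arithmetic sketch. The 4.3% figure is the one quoted for February; the function name and the 20% starting share are hypothetical, and the calculation assumes nothing else about the data changes.

```python
def share_without_prerenders(raw_share, prerender_fraction):
    """Share of a browser after dropping the given fraction of its visits.

    raw_share: the browser's share of all visits before scrubbing (0..1).
    prerender_fraction: fraction of that browser's visits that were pre-renders.
    """
    removed = raw_share * prerender_fraction  # dropped visits, as a share of the total
    return (raw_share - removed) / (1.0 - removed)  # renormalise over what is left


# 4.3% is the figure quoted for February; the 20% raw share is hypothetical.
adjusted = share_without_prerenders(raw_share=0.20, prerender_fraction=0.043)
```

The adjusted share falls by slightly less than 4.3% of the raw share, because removing the pre-renders also shrinks the total that the share is measured against.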

To put it simply, Net Applications takes its raw data, removes pre-rendering hits, weights the data by country (to achieve appropriate regional representation) and then shows it to the public. What this boils down to is that if you agree with their methodology, this is the best look at the market data. But if you fear that their algorithms could be inaccurate, or that a biased individual scrubbing the data could skew it in one direction, the reports immediately invite skepticism.

Net Applications is attempting to normalize the data, to remove the fluff that could present an inaccurate picture of the market. As soon as you start cleansing data, though, the opportunity to introduce bias is present. To its credit, Net Applications does attempt to come clean about its techniques by being transparent about how it achieves its results.

StatCounter, on the other hand, is all about providing the raw data. From their perspective, there is no need to scrub any of the data, as it should be up to the end user to decide whether to filter what StatCounter provides in order to achieve more accurate results.

This method has both advantages and disadvantages. One advantage is that there is no human interaction with the data; it is simply gathered and distributed, which gives you a clean look at the raw numbers. Another is that it is a true reflection of all the information gathered, as it shows how browsers interact with websites in their natural state. But there are several downsides as well.

When looking strictly at raw data, you need to account for anomalies. Without proper review, a browser that changes the way it fetches pages could skew the data. For example, StatCounter does not view Chrome’s pre-rendering as an issue and, as such, does not remove it from its data, a notable difference from Net Applications. Additionally, StatCounter counts all hits rather than unique visits, so if someone is mashing F5 when a particular site is running slow (or millions of people are), those repeated hits could distort the data.
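The difference between counting every hit and counting unique visitors is easy to see on a toy log. The sketch below is purely illustrative: the log format, the function names and the data are hypothetical, not either service's real pipeline.

```python
from collections import Counter


def share_by_hits(hits):
    """hits: list of (visitor_id, browser) pairs, one per page load."""
    counts = Counter(browser for _visitor, browser in hits)
    total = sum(counts.values())
    return {browser: n / total for browser, n in counts.items()}


def share_by_unique_visitors(hits):
    """Count each (visitor, browser) pair once, however often it reloads."""
    counts = Counter(browser for _visitor, browser in set(hits))
    total = sum(counts.values())
    return {browser: n / total for browser, n in counts.items()}


# Hypothetical log: visitor "v1" reloads a slow page five times in Browser A,
# while "v2" and "v3" each load it once in Browser B.
log = [("v1", "Browser A")] * 5 + [("v2", "Browser B"), ("v3", "Browser B")]

by_hits = share_by_hits(log)
by_unique = share_by_unique_visitors(log)
```

In this toy log Browser A takes 5 of the 7 hits but accounts for only 1 of the 3 unique visitors, which is the kind of gap the F5 scenario describes.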

What you ultimately end up with is unrefined data that has the potential for misinterpretation or deviations from the norm. While there are many “what ifs” with the data, until it’s properly combed over, as Net Applications does, it has potential for error. But Net Applications, in turn, can unintentionally skew the data while trying to normalize it. Either way there is unintended noise within the data that you accept or reject: if you accept the noise, you subscribe to StatCounter; if you reject it, you subscribe to Net Applications.

But how data is cleaned is not the only factor one must consider. Both services are based on a sample of the population, as it is impossible to track every browser in use today. StatCounter boasts that its sample is based on 15 billion hits a month across its 3+ million member sites, while Net Applications collects data from 40K+ websites that receive 160 million unique visits a month.

Given those figures, StatCounter has the larger sample of the two, which means you must decide whether Net Applications' smaller population is representative of the whole.
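One caution when comparing the two samples is that they are reported in different units. A quick back-of-the-envelope calculation, using only the figures quoted above and treating the "3+ million" and "40K+" site counts as exact (an assumption), gives the average monthly volume per site:

```python
# Figures as quoted in the article; both site counts are stated as minimums.
statcounter_hits_per_month = 15_000_000_000
statcounter_sites = 3_000_000
netapps_unique_visits_per_month = 160_000_000
netapps_sites = 40_000

# Average monthly volume per member site, in each service's own unit.
statcounter_hits_per_site = statcounter_hits_per_month / statcounter_sites
netapps_visits_per_site = netapps_unique_visits_per_month / netapps_sites
```

That works out to roughly 5,000 hits per site for StatCounter and 4,000 unique visits per site for Net Applications. Since a single unique visitor can generate many hits, the two headline numbers are not a like-for-like comparison.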

Net Applications and StatCounter take two different approaches that result in two different figures for how each browser is doing in the market. Which one is better? That is for you to decide, but remember this: each has its own faults, and a case can be made that either service's data is inaccurate.

It is important to pick one method and stick to it. If you bounce between the two services, your view becomes muddied and your interpretations will be flawed.

Image Credit: Silicon Angel