John Smart, 2013: exponential increase in search query length

Fascinating theories. Is search query length still showing signs of exponential growth, empirically speaking?

Here is some raw data which could be analyzed to see whether the growth is truly exponential or just linear.
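One rough way to tell exponential from linear growth is to fit both models to the yearly averages and compare the errors. The sketch below is only an illustration: `fit_line` and `compare_fits` are names made up for this example, it uses plain least squares, and it knows nothing about the layout of the raw data, so you would have to supply your own lists of years and average words per query.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def compare_fits(years, avg_words):
    """Compare a linear and an exponential model of average query length.

    Returns the sum of squared errors of each model, measured in words,
    plus the doubling time implied by the exponential fit.
    """
    t = [y - years[0] for y in years]  # years since the first observation

    # Linear model: words = a + b*t
    a, b = fit_line(t, avg_words)
    linear_sse = sum((a + b * ti - w) ** 2 for ti, w in zip(t, avg_words))

    # Exponential model: fit a line to log(words), then transform back
    log_a, log_b = fit_line(t, [math.log(w) for w in avg_words])
    exp_sse = sum(
        (math.exp(log_a + log_b * ti) - w) ** 2 for ti, w in zip(t, avg_words)
    )

    # A flat or falling series has no doubling time
    doubling = math.log(2) / log_b if log_b > 0 else math.inf
    return {"linear_sse": linear_sse, "exp_sse": exp_sse, "doubling_years": doubling}
```

With only about ten yearly points, both models may fit almost equally well, so a small difference between the two errors should be read as suggestive at best.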

Disclaimer: the analysis below is extremely rough, and I recommend looking at the raw data if you need this for anything important. Also, I can’t vouch for the raw data’s accuracy; it’s just information I found randomly on the web.

This is a back-of-the-envelope analysis using only the USA data for January of each year, except where there was no January data, in which case I used the February data. I treated 10+ word queries as 10-word queries, which if anything understates the true average. Note also that the Y axis is truncated. The result seems to conflict with other data: for example, this site lists the average query length to Google as 4.29 words as of 2012.

For some reason this data seems to show a drop-off in 2013; aside from that, the average query length in words seems to be steadily growing. Of course, for many searches we can find precisely what we’re looking for with one or two words, so we should certainly expect a leveling off at some point. And one of Google’s latest innovations is to allow a person to ask follow-up questions, such as “Who is the current U.S. President?” followed by “How old is he?” This could shorten query lengths slightly. In conversation with other humans we often ask one-word follow-up questions such as “really?” or “right?”, which we would not currently ask of Google.
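For anyone who wants to redo the averaging, here is a minimal sketch of the calculation described above, with 10+ word queries counted as 10 words. The function name and the dictionary layout are assumptions made for illustration; the raw data would need to be reshaped into word count and number (or share) of queries first.

```python
def average_query_length(counts, cap=10):
    """Weighted mean words per query.

    counts maps a query's word count to how many queries (or what share
    of queries) had that length. Anything longer than `cap` words is
    counted as `cap` words, which pulls the average down a little.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no queries to average")
    return sum(min(words, cap) * n for words, n in counts.items()) / total
```

As a made-up example, 50 one-word queries, 30 two-word queries and 20 twelve-word queries give (50 + 60 + 200) / 100 = 3.1 words once the twelve-word queries are capped at 10.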

On a related note, a recent study found that the “optimum length for an email is 50 to 125 words”. If we had true AI, we might make a 50 to 125 word request, or series of requests, of Google, fully explaining what we are looking for as we might with a friend or colleague.

I would be curious to hear anyone’s opinion on whether search length really dropped off in 2013 and, if so, why. There are a lot of intersecting factors. Google has no true competitors, is constantly being “gamed”, and keeps adjusting its algorithm. The data or my analysis could be wrong or affected by random variance. The rise of interconnectivity and easier access to other people could also be a factor. For example, we might now send an email knowing it will be answered quickly, when in the past we would have spent time constructing a complex search. We might also take our more complicated questions to other humans on interest-based social platforms such as Reddit or Twitter. If you have a 10+ word question on programming, for instance, you’re probably better off emailing a friend or posting on Stack Overflow.

However, if you look at the raw data, even 10+ word searches are growing. And currently most searches cluster at 4 words or fewer. We have a long way to go before we are talking conversationally with our computers.

At the end of the above clip, John Smart noted that the average length of a human-to-human question is 11 to 14 words. He also stated in the video that he thought search queries would reach this point by 2019, and that by then every child would have a cell phone because they would be dirt cheap. The latter prediction definitely looks like it will be correct.

words per search

The quote below, from his essay, seems to disagree with my analysis. Emphasis added.

Predicting the CI Emergence

When can we expect the CI’s emergence? In March 2005 Google’s director of search Peter Norvig noted that their average query is now about 2.5 words per query, by comparison to 1.3 on Alta Vista in its heyday, circa 1998. In subsequent email conversation with him he has told me that the actual number is “closer to 2.6 or 2.7.” This is an initial doubling time of only seven years, if this is a quasiexponential function.

It appears that the growth of the CI as a complex adaptive technological system is in the early phase of an S-curve, well before the inflection point, and thus its growth will continue to look exponential for some time to come.

[2008 Note: Average query length to Google now exceeds 4 words, apparently just this month. This is more early evidence that this phase of search query length growth will remain exponential up to the inflection point.]

In my opinion this average search query length, averaged across all the leading search engines of the day (Google, Yahoo!, Bing, etc.) will be one of the key numbers to watch to gauge the growing effectiveness of statistical natural language processing (statistical NLP) in creating a conversational front end for the internet and all our other complex technologies in the 21st century.
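As a rough check on the arithmetic in the quote, the sketch below computes the doubling time implied by 1.3 words in 1998 and about 2.6 in 2005, then extrapolates to the 11 to 14 word range of human-to-human questions. This is one way to see where a date near 2019 could come from, not necessarily how Smart arrived at it, and it naively assumes the same exponential rate holds the whole way. The function names are made up for this example.

```python
import math


def doubling_time(start_value, end_value, years_elapsed):
    """Years needed to double, assuming steady exponential growth."""
    return years_elapsed * math.log(2) / math.log(end_value / start_value)


def year_reaching(target, base_value, base_year, doubling_years):
    """Year in which the value reaches `target` at the given doubling time."""
    return base_year + doubling_years * math.log2(target / base_value)


if __name__ == "__main__":
    # Figures from the quote: 1.3 words circa 1998, about 2.6 words in 2005
    d = doubling_time(1.3, 2.6, 2005 - 1998)
    print(f"doubling time: {d:.1f} years")
    for target in (11, 14):
        year = year_reaching(target, 2.6, 2005, d)
        print(f"{target} words reached around {year:.1f}")
```

Under those assumptions the doubling time is seven years, and the 11 to 14 word range would be reached somewhere between roughly 2019.6 and 2022.0. The 4.29 words reported for 2012 falls short of what this curve predicts for that year (about 5.2), which is consistent with growth slowing down.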
