Google PageRank, Main and Supplemental Index, is there a connection?

Matt Cutts on PageRank and the Supplemental Index

In a comment in his "Fall weather forecast" blog update in October 2006, Matt wrote,

PageRank is the primary factor determining whether a url is in the main web index vs. the supplemental results

And again at the beginning of this year (2007), Matt Cutts wrote in his "Infrastructure status, January 2007" blog update,

Having urls in the supplemental results doesn't mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank.

Is there a connection between PageRank and the total number of indexed pages?

I thought that if PageRank influences the number of pages in the Supplemental Index, then almost by definition it should also have a direct influence on the number of pages in Google's Main Index. Even more so, the number of pages indexed at all, whether in the Main or Supplemental Index, would be essentially dependent on PageRank.

If a site doesn't have enough PageRank to support all its pages in the Main Index, it would only seem to make sense that insufficient PageRank could cause some pages to not even make it into the Supplemental Index and therefore not be indexed at all.

Could changing PageRank cause the number of pages indexed and SERPs positions to fluctuate?

Assuming no on-site changes and no significant changes in competing pages, is it possible that some pages could go totally missing from SERPs, and others seem to drop out of sight all the way back to the last page, when nothing appears to have changed?

Matt Cutts, in the same Infrastructure Status update, said as much when he wrote,

If you used to have pages in our main web index and now they're in the Supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past.

Beyond that, if PageRank can influence which index a given page is in, Main or Supplemental, or whether it is even indexed at all, then it is not too hard to imagine what happens when a given page's PageRank decreases. Whether the cause is the one Matt mentioned or simply one or more inbound links no longer existing, the page could move from the Main Index to the Supplemental Index or drop out of the index altogether.

What if nothing changed and a page still drops from Main to Supplemental or disappears completely?

Since Toolbar PageRank is only updated 3 or 4 times a year, a given page's PageRank could change significantly without any outward sign. That is, unless the page is moved to the Supplemental Index or dropped totally from both the Main and Supplemental Index, either of which is hard not to notice.

Also, it is technically possible for a page to show a Toolbar PageRank of 6 and yet be in the Supplemental Index, due either to a decrease in incoming PageRank somewhere, to the addition of a large number of new pages to the site or, more likely, to a combination of both.

What if nothing changed for a given page and nothing changed with regards to inbound links to it?

Going one step further, if the PageRank of a given page were to drastically change, the fact that PageRank flows through a site could lead to a knock-on effect potentially causing a reduction of PageRank across an entire site depending on the site's internal linking.

So in effect even though nothing about a given page changes, changes with respect to another page or pages could become a factor.

If a significant source of PageRank is lost, pages that outwardly appear to have sufficient PageRank could all of a sudden have none, or significantly less than before, according to Google's internal processing, and so appear to mysteriously disappear when they had previously looked healthy and untouchable.
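To make the knock-on idea concrete, here is a minimal sketch using the classic published PageRank formula with a damping factor of 0.85. It is only an illustration under stated assumptions: the six-page link graph and the page names are made up, and nothing here is claimed to match Google's actual internal processing. The only difference between the two graphs is that the external page "ext" stops linking to "home".

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iterate the classic PageRank formula on {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


# Hypothetical toy graph: "ext" and "other" are outside pages,
# "home", "a", "b" and "c" make up the site being watched.
before = {
    "ext": ["home", "other"],
    "other": ["ext"],
    "home": ["a", "b"],
    "a": ["c", "home"],
    "b": ["home"],
    "c": ["home"],
}
# Same graph, except the external link to "home" has gone away.
after = dict(before, ext=["other"])

rank_before = pagerank(before)
rank_after = pagerank(after)
site_pages = ["home", "a", "b", "c"]

for page in site_pages:
    print(f"{page}: {rank_before[page]:.4f} -> {rank_after[page]:.4f}")
```

Note that page "c" never had a link from "ext" and nothing about "c" itself changes between the two graphs, yet its score drops along with every other page on the toy site.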

Altering PageRank and observing the effect on the number of Primary and Supplemental indexed pages

I wanted to see if I could duplicate the effect I have described using this site itself, which I have used for other experiments as well. I think I have proven, at least to myself, that the process described above is not only possible but very likely a cause of the fluctuations some sites see in their pages' performance in Google's SERPs.

What I set out to do was to see if I could reproduce a change in the number of pages indexed, both Primary and Supplemental as well as see if I could effect a direct change in SERPs due to a change in PageRank.

Of course trying to effect a positive change over the time period I had available for this experiment would have required Blackhat techniques to manipulate PageRank and since I have some hopes of Google not banishing this site to the Google graveyard forever, trying the opposite seemed the obvious choice.

Please note that I manipulated this site's PageRank not using any Blackhat techniques but instead, setting out to destroy a portion of the PageRank that I have worked hard to build organically.

It might also be useful to mention that since this site had only 20 pages, at the time, available for indexing and all were in Google's Main Index already, the only direction to go was down.

Google method to my SEO madness?

What I did was select some pages which, although not crucial to the majority of Google traffic to this site, were significant contributors to its overall PageRank. I then disallowed Google access to them.
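For anyone unfamiliar with how access is disallowed, a robots.txt rule is the usual route. The following is only a sketch under that assumption: the paths and the example.com host are made up, and it simply uses Python's standard urllib.robotparser module to check which URLs a Googlebot rule set would block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders, not real pages.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /articles/old-page-1.html
Disallow: /articles/old-page-2.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def googlebot_allowed(path):
    """Return True if the rules above let Googlebot fetch the given path."""
    return parser.can_fetch("Googlebot", "https://example.com" + path)


for path in ["/index.html", "/articles/old-page-1.html"]:
    status = "allowed" if googlebot_allowed(path) else "blocked"
    print(path, status)
```

Checking the rules this way before publishing them helps avoid blocking more of a site than intended.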

I also requested their removal using the Google Webmaster Tools URL removal interface knowing that I could cancel the request after the pages were removed, which would theoretically restore them to their former glory.

At this point it would be good to make another note: what I did, don't do that!

When I cancelled the URL removal and re-allowed Google access, the removed pages did not instantly snap back, and it is taking a while for this site to recover. But recovering it is, and I expect that the whole cycle, from the start of the experiment to full recovery, will only take around a month or so.

What apparently happens when a page is requested to be removed is that any PageRank associated with the page at the time is discarded and if the page is at some point restored, the inbound link discovery process appears to start over.

A site's rise and fall, backwards

Four or five days after seeing the pages removed, I noticed the number of pages in Google's index, excluding the pages that were obviously removed, slowly decreasing over the next couple of days.

I had started out with 20 pages in the Main Index and none in the Supplemental, but then over the course of a couple of days I saw 18 Main and 1 Supplemental, then 17 and 1, and finally 15 in the Main Index and 1 in the Supplemental.

Also during that time, I noticed different results across different data centers although they all were generally decreasing in total pages indexed.

Previous to this experiment, other than when I had added new pages, I had never seen any differences between the number of pages indexed across Data Centers.

Seemingly correlated with the drop in the number of pages indexed, I also noticed a number of SERPs positions slip slightly where they had been pretty much consistent beforehand.

I didn't see enough of a consistent drop across the board to draw any conclusions, although if the SERPs positions return when the number of pages indexed returns, a stronger correlation would be suggested.

At this point I figured losing 25% of this site was far enough, so I re-allowed the bots and cancelled the URL removal.

And, although the following day, one Data Center had dropped as far as 13 and 1, (Main/Supplemental), other Data Centers were showing slowly increasing Main counts with a couple even showing 21 pages indexed, (Main + Supplemental).
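The counts reported on the way down can be tallied with a few lines of Python. This is just arithmetic on the Main/Supplemental figures quoted above. The function and field names are made up for illustration.

```python
# (Main, Supplemental) counts as reported in the text, in order of observation.
observations = [(20, 0), (18, 1), (17, 1), (15, 1)]


def summarize(observations):
    """Return totals and percentage losses relative to the first observation."""
    start_main, start_supp = observations[0]
    start_total = start_main + start_supp
    rows = []
    for main, supp in observations:
        total = main + supp
        rows.append({
            "main": main,
            "supplemental": supp,
            "total": total,
            "main_loss_pct": 100.0 * (start_main - main) / start_main,
            "total_loss_pct": 100.0 * (start_total - total) / start_total,
        })
    return rows


rows = summarize(observations)
for row in rows:
    print(row)
```

The last row works out to a 25% loss from the Main Index, which is the figure quoted above, and a 20% loss in total pages indexed once the single Supplemental page is counted.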

PageRank and total pages indexed

As I have suspected for some time, having pages in Supplemental seems to suggest that there are additional pages not indexed at all in either index, Main or Supplemental.

This seems supported by the fact that the total number of pages indexed varied over the course of the experiment. PageRank was reduced, and pages then not only moved to the Supplemental Index but disappeared altogether. Then, as time went on, more and more pages came back into the index, which likely coincided with PageRank slowly being restored.

One interesting point to note: on the way down, there only ever seemed to be one page in the Supplemental Index, but as the total number of pages began rising again, the number of Supplemental pages rose faster than the number of pages in the Main Index. So overall, on the way down pages seemed to drop out of the index totally, while on the way back up they seemed more likely to spend time in the Supplemental Index before being moved to the Main Index.

How can changes in PageRank affect a site?

Since Google is constantly crawling pages and continually updating its internal data stores, PageRank is likely being recalculated on a regular basis as well.

But recalculations are not guaranteed to deliver positive results and could in fact be negative, considering knock-on effects where a reduction in the PageRank of one page leads to a reduction in the PageRank of the pages it links to, which in turn could lead to a reduction in the PageRank of the pages they link to, and so on into infinity.

And although organic links from reliable sites are not likely to change in a negative direction all that often, links from less reliable sites, which may in turn be linked to from other less reliable sites, could lead to what amounts to drastic changes for any site which had previously enjoyed a certain level of PageRank from those less reliable sources.

It would almost seem like getting links to a site could be seen as investing in the stock market, i.e. many small investments over longer periods of time on safe investments compared to large investments over a shorter period of time on risky investments.

Conclusions and further questions

It seems pretty straightforward that this experiment not only validates Matt Cutts' words on the subject of PageRank and the Supplemental Index but also strongly suggests that PageRank affects the total number of pages indexed at all, whether Main or Supplemental.

One also seems able to see the effects of a significant loss in PageRank by observing a number of different Data Centers over the course of days and sometimes even over the course of hours.

The evidence of the indexed page loss was a slow decrease in the total number of pages indexed, with very few pages becoming Supplemental before being dropped altogether. With a larger site, though, would the number of Supplemental pages increase as well, as a precursor to pages becoming de-indexed totally, or would pages just drop out without becoming Supplemental first?

I have seen how loss of PageRank can easily cause a drop in the number of pages in the Main Index as well as the total number of pages indexed, and I have seen how the number of pages indexed will then improve with an increase in PageRank. But is it possible for this single cycle to be repeated naturally, over and over again, over some period of time?

If a site owner sees some pages not showing as highly in the SERPs as they once did, although they appear to still have high PageRank, those pages may in fact only have enough PageRank to barely keep them out of the Supplemental Index, and so end up having their SERPs position impacted by the decrease in PageRank.

One might also see their site lose pages to the Supplemental Index and/or totally from the index due to PageRank changes but the first place one is likely to notice it is a general loss of Google generated traffic due to some of the site's more important pages going Supplemental or being de-indexed even though Toolbar PageRank remains the same.

Is it possible that what many may see as a "penalty" might actually be a more mundane and simple problem of loss of PageRank?

If a number of sites within the same market all take a dive, is it possible that similar linking schemes they employ are no longer as effective as they once were, so that what seems like a market-targeted failing of sites could really stem from a link scheme that they share or employed simultaneously?

How often does it happen that competitors will check out each other's linking methods and emulate them in an attempt to compete?

Only Google will ever know for sure, and we all know how tight-lipped and vague Google likes to be when it can get away with it.