Update: This update is up at the top so that some people may keep their blood pressure in check. Since I posted this, Google, or at least Matt Cutts on his personal blog, did indeed choose to communicate with me, saying that they did try to send a warning email. They posted it, and it was a good email, with sufficient information that if I had actually received it, it would have been perfectly reasonable as something to orient me to fix the problem on the TOA site. Over on Matt’s blog, I suggested making warning email content available through the Google Webmaster Tools interface. I think that is being considered. And, best of all, Google re-indexed the TOA site on 2006/12/05. Back to the original post:
Google has done a lot of cool and useful things. But in one area, Google has failed badly in setting its policy. It has to do with when Google decides to de-index a website, that is, to remove all references to an entire domain from any search returns that it provides. You might think that a company that prides itself upon advanced textual analysis and automated decision-making algorithms might provide helpful warning messages to webmasters concerning problems found in their sites. You would be wrong. [Google’s Matt Cutts says that Google did send a warning email that I did not receive. — WRE] If Google decides that there is a problem, they will de-index the entire site, never attempt to communicate with the webmaster concerning their action, and (this is the big problem part) they refuse to tell a webmaster what the problem was or where the problem occurred, whether or not the webmaster deliberately created the problem or was the victim of some of the all-too-common website cracking that happens nowadays. Google’s policy of keeping problems secret is harmful, and in fact favors cheaters over honest webmasters.
How do I know this? Because my website got cracked and Google decided to de-index it.
No pages from your site are currently included in Google’s index due to violations of the webmaster guidelines. Please review our webmaster guidelines and modify your site so that it meets those guidelines. Once your site meets our guidelines, you can request reinclusion and we’ll evaluate your site.
I got to that message about the TalkOrigins Archive (TOA) sometime early Friday morning. A post on the talk.origins newsgroup cc’d to my email gave me a heads-up that the TOA no longer appeared in the Google index. That site is also my responsibility, so I started looking into the problem.
First, I just tried looking at Google’s public information about the TOA. That was all happy news, with a cute check-mark graphic and the note that the TOA was successfully indexed last on November 27th. I then claimed the TOA as my website through Google’s Webmaster Tools and verified it. Then, I got the rather different site summary that is quoted at the beginning of this message.
So, what, precisely, was causing Google to not like us anymore? The essential lesson here is that Google would not tell us. That isn’t mere caprice; that is Google policy. I tried to find a more extensive explanation, since the first thing was labeled “Summary”. No such luck. Nothing more is available via Google Webmaster Tools. There was a listed phone number for Google, so I tried calling them. After proceeding as directed through their phone menu, I got a recorded message that all issues to do with indexing were handled through the web site and that Google did not offer any customer service via the phone. There was, of course, nothing further available from the web site, with the exception of the form for requesting reinclusion of a site. That form doesn’t tell you anything about your problem in particular.
My mission, whether I liked it or not, was to find and fix whatever problem the TOA might have, with no guidance as to what the problem was and nothing at all about where to start looking. Since the TOA site is 5,000+ separate pages, that could be quite the task. I started with the default site page. I pulled it up in my browser. It looked completely unexceptionable. I then opened up a “page source” view. There, I did find something wrong. At the bottom of the page, buried within an ASP function that prevented it from being visible on browsers, was a block of bad links, links that had nothing to do with the TOA. Checking the file on the server, I found that it was changed on 2006/11/18. There was no corresponding entry in the TalkOrigins Archive Delegation (TOAD) change log, where the authorized TOA volunteers note each change made to the site. We had been cracked.
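The date check described above, comparing file modification times against a change log, can be automated. What follows is only a minimal sketch, not the procedure actually used on the TOA: it assumes a local mirror of the site and that the change log has already been reduced to a set of relative file paths. The function name `unlogged_changes` and its parameters are hypothetical.

```python
import os
from datetime import datetime

def unlogged_changes(root, since, logged_paths):
    """List files under `root` modified at or after `since` (local time)
    whose relative path is not in `logged_paths` (taken from the change log)."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalise to forward slashes so paths match log entries on any OS.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            mtime = datetime.fromtimestamp(os.path.getmtime(full))
            if mtime >= since and rel not in logged_paths:
                suspects.append((rel, mtime))
    return sorted(suspects)
```

Calling `unlogged_changes("mirror", datetime(2006, 11, 1), logged)` would return every file touched since the start of that month that no volunteer recorded changing. A cracker can forge modification times, so an empty result is not proof of a clean site.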
Within ten minutes, I had the bad stuff out of the default page and uploaded the clean file to the server. I informed the TOAD group of what I had found out and requested Douglas Theobald check his local copy of the files for any further cracked files. He found none. Douglas suggested that I post something to the Google Webmaster Help Group, which I did. I then entered the reinclusion request, clarifying the three stipulations that Google requires by a checkbox, without which one cannot submit the reinclusion request. Let’s have a look at what Google says on that form:
By submitting this form, I acknowledge that:
* I believe this site has violated Google’s quality guidelines in the past.
* This site no longer violates Google’s quality guidelines.
* I have read and agree to abide by Google’s quality guidelines.
Tell us more about what happened: what actions might have led to any penalties, and what corrective actions have been taken. If you used a search engine optimization (SEO) company, please note that. Describing the SEO firm and their actions is a helpful indication of good faith that may assist in evaluation of reinclusion requests. If you recently acquired this domain and think it may have violated the guidelines before you owned it, let us know that below. In general, sites that directly profit from traffic (e.g. search engine optimizers, affiliate programs, etc.) may need to provide more evidence of good faith before a site will be reincluded.
Something to note here is that this goes beyond the mechanics to issues of ethics, what with all that emphasis upon “good faith”. Google can, of course, apply any standard they like, and include or exclude sites at their pleasure. Of course, Google doesn’t want to appear to be capricious; with the above statement, Google obviously wants to cast itself as a judge of the moral worth of sites, implying that they themselves are worthy of the role of judge in a court of equity. They absolutely invite evaluation of their own actions in an ethical framework, and my opinion is that Google doesn’t measure up in the sphere of how they handle de-indexing decisions. Certainly, they have the responsibility to keep their index from giving unwarranted weight to cheaters. That is not at issue here. What is at issue is their treatment of webmasters whose sites have acquired problems that may, or may not, actually be of their making.
Their stipulations for submitting a reinclusion request require an admission of guilt on the part of a webmaster who, as I found myself, could be the victim of a third party. Google’s policy of obscuring their reasons for de-indexing makes it much harder for honest, but cracker-victimized webmasters to return their sites to a state that is acceptable to Google. In fact, Google’s policy is far more burdensome upon honest webmasters than it is upon cheaters — the cheaters know what they have done that is out of compliance, and the honest webmasters have no such knowledge of where the problem may lie.
So I had to clarify my response to Google’s stipulations:
The TalkOrigins Archive has never deliberately violated Google’s quality guidelines. Our site has operated since 1995 in the same way, well before the origin of Google, and will continue to provide quality information to our readers even if Google ceases to exist as an entity. We never needed Google’s quality guidelines in order to make a quality website, and we would not lower our quality if Google decided to impose guidelines that were injurious to the standards that we have ourselves set and maintained.
I was extremely lucky. The damage to my site was limited and in the first place that I happened to look. Other honest webmasters might not be so lucky. They may have to undertake an arduous process of vetting pages, essentially having to second-guess the mind of the cracker in trying to locate a problem that Google knows the exact location of. Does that sound anything like equitable to you? It sure doesn’t to me.
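Vetting pages one by one can at least be narrowed down by machine. The sketch below is a rough illustration rather than a complete audit: it pulls absolute link targets out of raw page source with a regular expression, so it also sees links hidden from the browser view, and reports any whose host is not on a list the webmaster trusts. The names `foreign_links` and `allowed_hosts`, and the example hosts, are hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches href="http://..." or href='https://...' (quotes optional) in raw source.
HREF_RE = re.compile(r"""href\s*=\s*["']?(https?://[^"'\s>]+)""", re.IGNORECASE)

def foreign_links(source, allowed_hosts):
    """Return link URLs in `source` whose host is neither in `allowed_hosts`
    nor a subdomain of one of them."""
    found = []
    for url in HREF_RE.findall(source):
        host = (urlparse(url).hostname or "").lower()
        trusted = any(host == a or host.endswith("." + a) for a in allowed_hosts)
        if not trusted:
            found.append(url)
    return found

# Example with made-up hosts: one legitimate link, one hidden foreign link.
page = ('<a href="http://www.example.org/faqs/">FAQ</a>'
        '<div style="display:none"><a href="http://spam.example.net/x">x</a></div>')
print(foreign_links(page, {"example.org"}))
```

Run over every file in a local copy of a site, a check like this gives a short list of pages to inspect by hand. It will not catch links assembled by script at load time or injected content that contains no links at all.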
As I said in my post to the Google Webmaster Help group, the Google policy of obscuring de-indexing decisions is harmful.