Sunday, January 13, 2008

Robots.txt file blocked Googlebot from crawling

I spent many hours checking Webmaster Tools on my Google account. I found out that Googlebot was able to crawl and index only a few of my posts here and here because of the robots.txt file on my site. But I have no idea where that robots.txt file came from. Is it from the HTML code? From the template? Or somewhere else?
There were 55 of my post URLs listed as restricted by robots.txt, which means Googlebot could not crawl them. Maybe that's why only a few pages showed up when I checked what Google had indexed.
But I don't really understand what this means. Other websites put a robots.txt file on their site because they don't want search engines to index the content of their pages. That doesn't make sense for me, though. As bloggers, we want our post URLs to be indexed by Google.
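
To make the idea a bit more concrete, here is a small sketch of how a robots.txt rule decides whether a crawler may fetch a page. It uses Python's standard urllib.robotparser module. The two sample files, the is_allowed helper and the example.blogspot.com address are made up for illustration and are not taken from my blog.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A site that does not want any crawler to visit any page (illustrative).
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# An empty Disallow line means nothing is blocked (illustrative).
BLOCK_NOTHING = """\
User-agent: *
Disallow:
"""

# Hypothetical post address, used only for this example.
POST_URL = "http://example.blogspot.com/2008/01/my-post.html"

print(is_allowed(BLOCK_EVERYTHING, POST_URL))  # False
print(is_allowed(BLOCK_NOTHING, POST_URL))     # True
```

So a robots.txt file does not have to block a whole site. What matters is which paths its Disallow lines cover.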

If you want to check the status of your site, you can do it in Google Webmaster Tools here. Just log in to your Google account and add your site URL. Once you verify your site, you can view its status as of the last time Google crawled it. You can also see which URLs Google had a hard time crawling and why it could not crawl them.

Anyway, so much for that. Hopefully Googlebot will be able to index my upcoming posts on its next scheduled crawl.

1 buzz me:

Binh Nguyen said...

Are you talking about your Blogspot blog? Then you can't do anything about it, because that's Google's default. Blogspot doesn't allow indexing of RSS feeds or crawling of label pages.

Your post pages also have difficulty getting into the Google index because they are treated as duplicate content. It might take a long time before any page that is featured on the front page makes it into the index, especially pages with too many links.

Well, that's my own experience and knowledge. It's funny that the blog host owned by Google, the search engine, is causing duplicate content issues.
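
The restriction described in the comment above can be checked with a rough sketch like the one below. The robots.txt text is only a guess at what such a default might look like, assuming label pages live under /search and feeds under /feeds. The real Blogspot file may differ, and the URLs and the split_by_access helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, not the actual Blogspot file: block label/search pages
# and feeds, allow everything else.
BLOG_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /feeds
Allow: /
"""


def split_by_access(robots_txt, urls, agent="Googlebot"):
    """Sort URLs into (allowed, blocked) lists for the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed, blocked = [], []
    for url in urls:
        if parser.can_fetch(agent, url):
            allowed.append(url)
        else:
            blocked.append(url)
    return allowed, blocked


# Hypothetical URLs: one post page, one label page, one feed.
URLS = [
    "http://example.blogspot.com/2008/01/my-post.html",
    "http://example.blogspot.com/search/label/seo",
    "http://example.blogspot.com/feeds/posts/default",
]

allowed, blocked = split_by_access(BLOG_ROBOTS_TXT, URLS)
print("allowed:", allowed)
print("blocked:", blocked)
```

Under these assumed rules the post page stays crawlable while the label page and the feed land in the blocked list, which would match a report where many restricted URLs are not actual posts.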

