Robots.txt to Disallow All but Adsense Mediabot

May 7th, 2007 by Stefan Juhl

Sometimes I’m well on my way to making something far more complicated than it needs to be. Today I was on the verge of cloaking my robots.txt file for some creative domain parking stuff.

My simple goal was to block all bots but the Google Adsense Mediabot (Mediapartners-Google/2.1), since I didn’t want to blast out millions of duplicate pages to the search engine spiders, yet I wanted to monetize the websites through Adsense.

My first thought was that if I just made the robots.txt files disallow all bots then Mediabot wouldn’t crawl them either, and that wouldn’t be good. The first solution that occurred to me was cloaking it. Not smart…

That seemed like overdoing it, so after using my extraordinary googling skills ;-) I found the solution, which is dead simple: start your robots.txt file with a record for Mediapartners-Google* that has an empty Disallow line, meaning nothing is off limits to it, and put the rules for all other bots after it, as in the example below.

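# Adsense Mediabot: an empty Disallow means nothing is blocked
User-agent: Mediapartners-Google*
Disallow:

# All other bots: stay out of /content/ (use "Disallow: /" to block the whole site)
User-agent: *
Disallow: /content/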
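
If you want to sanity-check rules like these before putting them live, a minimal sketch using Python's standard urllib.robotparser module could look like the one below. The example.com host and the can_crawl helper are made up for illustration. I've also dropped the trailing * from the Mediapartners-Google line, on the assumption that this parser matches agent names by substring and doesn't treat that asterisk as a wildcard. Real crawlers may read the file differently, so take it as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Same rules as in the post, minus the trailing "*" on the first
# User-agent line (assumption: this parser matches names by substring).
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /content/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(user_agent, path):
    """Hypothetical helper: may this user agent fetch this path?"""
    # example.com is a placeholder host, not a real parked domain.
    return parser.can_fetch(user_agent, "http://example.com" + path)


if __name__ == "__main__":
    for agent in ("Mediapartners-Google/2.1", "Googlebot/2.1"):
        for path in ("/content/page.html", "/index.html"):
            print(agent, path, can_crawl(agent, path))
```

With these rules, Mediabot should come out as allowed everywhere, while any other bot should be refused under /content/ and allowed elsewhere.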

Today’s reminder: before coding something “really smart”, step back and consider if there’s already a simple solution to the problem.

Posted in White Hat SEO, Monetization

2 Responses to “Robots.txt to Disallow All but Adsense Mediabot”

  1. Mark Says:

    Hi.. Not too hot on my robots.txt, but my logic tells me the wildcard would override the mediapartners allow?

  2. Stefan Juhl Says:

    Yes, one could see it that way. But I think the logic is that if a robot finds a record addressed specifically to it, it follows that record and doesn’t go on to apply the general rules.

    We should also remember that the search engines and especially Google tend to make their own standards ;-)
