Ultimate WordPress Robots.txt for Silo SEO
Robots.txt for Silo SEO
The robots.txt file is like a roadmap: each road is an access point, and robots.txt lets you control access to specific roads.
"Robots" refers to bots. A bot is generally an automated crawler that works through your site by locating and following links, drawing out a roadmap as it goes.
The problem with bots is that they don’t know when and where to stop. They have zero intelligence; they just collect a site’s data and take it back home for their bigger brothers to crunch and analyze. The bigger brothers are the more powerful automated machines that sit in server farms and crunch numbers based on rule sets, in this case algorithms. These algorithms are used to determine what your site is about and, eventually, its ranking in the SERPs (search engine results pages).
The trouble is that both the bots and the algorithms start to get confused if you have a messy site structure. WordPress in particular contains so many overlapping roads that it can, and does, make our sites look more like a set of roundabouts.
Silo SEO WordPress Robots.txt
Never allow indexing of your cgi-bin, for the love of god.
- User-agent: *
- Disallow: /cgi-bin
Next up we need to tell the bots not to bother indexing our private WordPress directories.
- Disallow: /wp-admin
- Disallow: /wp-includes
- Disallow: /wp-content/plugins
- Disallow: /wp-content/cache
- Disallow: /wp-content/themes
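As a quick sanity check, the prefix rules so far can be tested with Python's standard-library `urllib.robotparser`. This is only a sketch: `yourdomain.com` and the sample paths are placeholders, and this parser does plain prefix matching, so it is only suitable for rules that contain no `*` or `$` wildcards.

```python
from urllib.robotparser import RobotFileParser

# The rules collected so far (plain prefixes, no wildcards).
rules = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(path):
    """Return True if a generic bot may fetch the given path."""
    # yourdomain.com is a placeholder host, not a real site.
    return parser.can_fetch("*", "http://yourdomain.com" + path)


if __name__ == "__main__":
    for path in ["/wp-admin/options.php",
                 "/wp-content/themes/style.css",
                 "/my-post.html"]:
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Running it prints one line per sample path, which makes it easy to spot a rule that blocks more (or less) than intended before uploading the file.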
We also need to block access to our feeds. Why would we want the bots crawling through our feeds, right? We want them crawling our on-site content so it ranks well in the SERPs.
- Disallow: /feed
- Disallow: /*/feed
Next up is comments. We want our comments treated as part of the on-site content, not picked up from our comment feed.
- Disallow: /comments
You also don’t want the bots indexing author archives, because it just adds more and more onsite duplicate content.
- Disallow: /author
Another one that adds duplicate content is our tag archives.
- Disallow: /tag
And believe it or not, the date archives are also a problem for SEO, so let’s just block the entire archives out of the search engines.
- Disallow: /archives
And just to make sure the bots don’t go near the date archives put this in.
- Disallow: /2010/*
- Disallow: /2011/*
- Disallow: /2012/*
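Since each year needs its own line, this list has to grow along with the blog. Here is a small sketch of generating those lines, assuming a hypothetical helper named `date_archive_rules` (it is not part of WordPress or any plugin). Keep in mind that these rules also match any post whose permalink starts with the year.

```python
def date_archive_rules(first_year, last_year):
    """Build one Disallow line per year, in the same /YYYY/* form used above."""
    return [f"Disallow: /{year}/*" for year in range(first_year, last_year + 1)]


if __name__ == "__main__":
    # Prints the three lines for 2010 through 2012, one per line.
    print("\n".join(date_archive_rules(2010, 2012)))
```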
You also don’t want any iframes being indexed. NOTE: this is pointless unless you create an iframes directory.
- Disallow: /iframes
In the Basic Bogan Training Module 3.2 WordPress SEO we structure our sites using the .html extension, so you can block individual pages at the robots.txt level. For example: don’t index my contact page, privacy policy or web site agreement. I don’t like my footprints getting indexed, so I block most of this stuff out before it even reaches the index.
- Disallow: /privacy-policy.html
- Disallow: /web-site-agreement.html
You also don’t want your categories being indexed. We cut this out in the Basic Bogan Training, but you can do it here as well. Note: don’t add this to your robots.txt unless you have followed along in Module 3.2 WordPress SEO in the Basic Bogan Training.
- Disallow: /category/*/*
And forget indexing trackbacks.
- Disallow: */trackback
Cool, now we are looking sweet in terms of WordPress Silo SEO.
But you also don’t want certain file types being indexed. For example, type this into Google:
Google filetype:xlsx
Scary, right? I can remember doing all sorts of crazy stuff with this back in the day; people had no idea Google was indexing file types.
Here is a good starting list of file extensions to block. You can also make up your own file extensions and block them so you can store hidden files, which works well.
- User-agent: Googlebot
- Disallow: /*.php$
- Disallow: /*.js$
- Disallow: /*.inc$
- Disallow: /*.css$
- Disallow: /*.gz$
- Disallow: /*.wmv$
- Disallow: /*.cgi$
- Disallow: /*.xhtml$
- Disallow: /*.xlsx$
- Disallow: /*.doc$
- Disallow: /*.pdf$
- Disallow: /*.zip$
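To see what these wildcard patterns actually catch, here is a minimal sketch of a matcher. It assumes Google-style matching, where `*` stands for any run of characters and a trailing `$` anchors the end of the URL. The function names and sample paths are made up for illustration, and real crawlers may differ in the details.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google-style matching: '*' matches any run of characters,
    a trailing '$' anchors the end of the URL, and everything else is
    matched literally from the start of the path.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(rule_path, url_path):
    """Return True if the Disallow pattern matches the URL path."""
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    print(is_blocked("/*.pdf$", "/files/report.pdf"))       # pattern matches
    print(is_blocked("/*.pdf$", "/files/report.pdf?dl=1"))  # '$' stops the match
    print(is_blocked("/*.gz$", "/sitemap.xml.gz"))          # a gzipped sitemap matches too
    print(is_blocked("/*.xlsx $", "/sheet.xlsx"))           # a stray space breaks the pattern
```

Two things stand out under this literal reading: a `.gz` rule also catches a gzipped sitemap, and a pattern with a space before the `$` never matches a normal URL.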
Because we blocked the wp-* directories, you will need to change your wp-content/uploads location to another directory. I suggest you just create one called images.
- Allow: /images
Now just add a link to your sitemap (take this out for mass blog installs). You will need to install the XML Sitemap plugin to generate this file.
- Sitemap: http://yourdomain.com/sitemap.xml.gz
That’s it, you’re now solid. Forget paying for Silo plugins or whatever. If you want more, I suggest you check out Module 3.2 WordPress SEO in the Basic Bogan Training so you can get your permalinks perfect for SEO.
Below is the full robots.txt file. If you copy and paste the code below into a .txt file called robots.txt and upload it into your site’s root directory, the bots will treat your site as a Silo SEO WordPress blog.
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /feed
Disallow: /*/feed
Disallow: /comments
Disallow: /author
Disallow: /tag
Disallow: /archives
Disallow: /2010/*
Disallow: /2011/*
Disallow: /2012/*
Disallow: /iframes
Disallow: /privacy-policy.html
Disallow: /web-site-agreement.html
Disallow: /category/*/*
Disallow: */trackback

User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$

User-agent: *
Allow: /images

Sitemap: http://yourdomain.com/sitemap.xml.gz




My site doesn't have a robots directory. How do I make it? It also doesn't have a robots.txt file. How can I create this? I also want to create my sitemaps. Any help with the sitemap?
Kevin,
Just create robots.txt in your site's root directory, but it should already be there if you are using WordPress. Same with the sitemap: it will be created if you install Google XML Sitemaps, or you can burn your feed in Google Webmaster Tools and use that as a sitemap.
Hello again…
I've a question. Do you need a robots.txt if you use the "robots-meta" plugin you mentioned before? Does one conflict with the other, or if I use this, can I skip using robots meta plugin?
As always, any response appreciated
Yep. It will overwrite your settings, and I suggest using it. Down the bottom of Robots Meta you will see that it lets you write to robots.txt. Also make sure you update the .htaccess file, which is also down the bottom.
With regards to User-agent: Googlebot, why don't you just remove the Googlebot section and disallow those file types (e.g. php, css) for all robots?
Mainly because Googlebot is the one bot that seems to try and index everything.
Hey there,
Two questions for you:
A: What is the reasoning for blocking category pages?
B: My blog is installed in a subdirectory. Specifically http://www.mydomain.com/magazine and not at the root. The robots.txt file is obviously in the root. Do I need to add magazine/ to all of the directories you advise to block?
Example: Disallow: magazine/wp-admin
By the way, great, great article. Any response would be appreciated.
hey mate,
A. One simple reason is that they can nuke double listings: instead of having two articles in the SERP/Google etc., you will get a category page and an article. That's one of many reasons for me. Also consider the flow across the site: categories are not needed for bots, only for people and navigation, not for SERP traffic.
B. Just to be 100% sure on that, use the Robots Meta plugin, and also install the XML Sitemap Generator plugin. You cannot go wrong running those plugins to configure your blogs.
Thanks so much for the advice!
One last question though…will the Robots Meta plugin account for fact that the robots.txt file is in the domain root and not the blog root?
-Arron
Not 100% sure, more like 99%, but to double-check, once the plugin is installed enter this into your browser:
yourdomainname.com/robots.txt
You can also look at other people's configurations. For example, here is Bogan Marketing's:
http://boganmarketing.com/robots.txt
Make sure the robots.txt reflects the /magazine directory as you said. If not, copy the robots.txt and manually add /magazine to it.
The last thing you should do to make sure Google is crawling your site is set up Webmaster Tools through your Google account and add your site. Then if something is wrong and it's not being crawled, you will be notified in Webmaster Tools.
Good Luck
Nice write-up. Thanks for taking the time to actually spell out why you are blocking specific things. This makes me consider your post much more valuable than those that said "Blindly copy this robots.txt and you will be all set".
Some questions:
1) "Disallow: /*.xhtml $" — I believe the space is an error and should be removed, right?
2) You blocked your privacy-policy. I thought Google wants to see this file (maybe as per AdSense TOS). By blocking it, does it simply mean that it will not get indexed – or that Google won't even know about it?
3) My site consists of PAGES only. Is there anything that you can think of that I might need to do since I only use PAGES and not POSTS?
Thanks again for your write-up. I hope you find a few minutes to answer my questions.
Sorry Ted your comment went into spam.
1. Yup remove the space.
2. It should be fine. If you're told you do not have a privacy policy then that's incorrect; show them you do and explain it is not allowed to be indexed. I find the reviewers at Google to be its main problem, and honestly I don't think half of them even know what they're doing, especially with AdWords.
3. Not really, interesting only using pages, but with robots meta plugin installed you can control how the robots treat each page anyway, so shouldn't be a problem.
Hey,
Just to clarify – I've got my website at the mydomain.com, but my WP install is at mydomain.com/wp-install. For my Robots.txt do I disallow: /wp-install/wp-admin etc? What about the Feed/Comments/Tag etc lines, do I disallow them at wp-install/ also?
Tom, that's a strange location and name. Why not blog or something else? But yes, you need to make sure all locations are an exact match in robots.txt.
Seems pretty common from what I've seen – leaving the blog at the root but having the WP core files in a subfolder for neatness/security.
Thanks for the response, thought that was the case with the /wp-* files but wasn't sure re the other ones (which don't seem to exist anywhere on my server…)
Oh cool, I was wondering wtf was going on with your naming scheme, mate. As for the security, that's so-so; really your best security is backups, I think, and with stuff like BackupBuddy for WordPress you can create mirrors so you don't ever lose anything on your blogs.
Also, now that the blog itself is not located in wp-install or whatever, I would probably just use /wp-install/* and treat the rest as a normal blog.
Thanks for that, will look into BackupBuddy!
Or you could just use the import export function, what I do with a lot of my blogs is have local copies mirrored, so I install them on a wamp server.. You can use the export import in wordpress for this, works well or full backups using backupbuddy your call.
Hi Kilwa,
With this setup we are trying to mimic the silo site structure which is very strict and focuses on limiting unnecessary links.
Hello friend,
First of all, congratulations for the great post, I looked at various sites for pre-prepared robots.txt file for WordPress, and it was this blog that I found the best, I have bookmarked your blog. xD
At the moment I'm having a doubt.
You said that the robots.txt file must be inserted into the main directory of the domain.
My domain is in the following format: http://www.mydomain.com/ http://www.mydomain.com/blog/ http://www.mydomain.com/forum/
If I put the robots file inside the directory http://www.mydomain.com/ it will also be used for my forum in this case, am I right?
I noticed in your robots file looks like:
Ex: Disallow: /wp-admin
But shouldn't it look like this: Disallow: /blog/wp-admin?
I would be grateful for your attention.
Ps.: You made the right choice to use the IntenseDebate as comments manager.
Sorry for bad English, I am Brazilian …
Hey JCMais,
Yup you are right if your blog is in /blog/ then you will need to have the robots.txt reflect its exact location.
You are welcome to copy mine and just add the /blog/ in front of any wordpress locations, should work fine.
Or check Module 3.2 WordPress SEO on how to configure robots meta, as that will handle the robots.txt
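The manual prefixing described above can be sketched in a few lines of Python. This assumes the whole blog lives under one subdirectory. The helper name `prefix_rules` and the `/blog` location are made up for illustration, and patterns that already start with `*` are left alone because they match under any directory.

```python
def prefix_rules(robots_txt, subdir):
    """Prepend a subdirectory to every Allow/Disallow path in robots_txt.

    subdir is a hypothetical install location such as "/blog".
    User-agent and Sitemap lines are left unchanged.
    """
    subdir = "/" + subdir.strip("/")
    out = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        name, path = field.strip(), value.strip()
        is_rule = bool(sep) and name.lower() in ("allow", "disallow")
        if is_rule and path and not path.startswith("*"):
            out.append(f"{name}: {subdir}{path}")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /wp-admin\nDisallow: /feed\nDisallow: */trackback"
    print(prefix_rules(sample, "blog"))
```

Check the output by eye before uploading it, since a wrong prefix can block more of the site than intended.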
Hi Bogan
Really enjoyed this post, great work and thanks for the help so far.
One question:
My blog posts are located and viewed at http://www.example.com/2010/11/
If I Disallow: /2010/*, my posts at http://www.example.com/2010/11/blah-blah-blah won't get indexed, right? This is a problem, but there are things in my /2010/ folder that don't need indexing, or shouldn't get indexed. What should I do? Any help would be great.
I have changed the permalink to /%category%/%postname% so my blog posts display at /blog/blah-blah. Would this mean my blog posts would get indexed and the rest excluded? Hoping so!
Hey Nellie,
So you changed it up to /category/postname, no more 2010 in the URL?
if so copy paste the file below and you should be good to go.
http://boganmarketing.com/robots.txt
PS: Could you send me a review copy of your "Alpha Domainer" course? I want to have a look at it.
a tip:
Disallow: /wp-*
^above will replace the below code, thus being more simple/effective:
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
—————————–
Disallow: /20*
^above will replace the below code, thus being more simple/effective:
Disallow: /2010/*
Disallow: /2011/*
Disallow: /2012/*
—————————–
Disallow: */feed*
^above will replace the below code, thus being more simple/effective:
Disallow: /feed
Disallow: /*/feed
—————————–
Disallow: */trackback*
^above will replace the below code, thus being more simple/effective:
Disallow: /trackback
—————————–
Add this: Disallow: *.php
(which means; do not crawl or index any URL ending with “.php”)
under: User-agent: *
—————————–
have fun guys
Hey breeze,
Thanks for the tips, but I think its better for people to actually see what they're exactly blocking and understand the robots.txt file and its uses.
Hey,
Well, my code tells people exactly what is being replaced, and looking at it, the "new" code becomes pretty self-explanatory.
Yeh, having another look at your code, it does look cleaner and more streamlined, and it doesn't take away any real detail that might make peeps wonder wtf they are blocking.
if you want a review discount use the contact page and I will hook you up.
I agree with Bogan, but there are a couple of things in this article I'm wondering about.
1. Why create a new storage directory for images like /images when you didn't block the whole wp-content directory (images are stored by default in wp-content/uploads)?
2. Some search engine robots cannot read a .gz sitemap, so what if you want to serve both an .xml.gz and an .xml sitemap?
Great info – but is there any chance you could expand it to include best practices for MultiSite?
@MadtownLems, it depends on how you want to use your multisites.
- Will each site use the same robots.txt?
- Will each site have a different variation of the main robots.txt?
- Will one or more sites be "private" thus disallowing robots at all?
When you create multisites in WordPress, it automatically uses the main robots.txt file.
You may not see it in your sub-directory, as the robots.txt file used for the other sites is actually virtual and served from the main site's robots.txt.
To have a robots.txt for each site, you'll need to find a WordPress plugin to manage them separately for you.
Here is a great resource for managing multisite robots.txt – http://wpmu.org/wpmu-robotstxt-globally/
Hey MadTownLems,
Is this for a massive MU install, 10,000-plus pages, or just a few? And is it for wildcard domains?
And thnx breeze for the info above.
Sorry, I am real new to WordPress so bear with me. Once I create the new directory for my images, is my site going to lose all its images? If so, what do I have to do to transfer all my old images into the new directory, or does it move all the old photos to the new directory automatically?
You won't lose any images by simply creating a new directory images. The only thing this will do is save all images after that in a different place. As long as you don't delete the old directory then all your pics will be there and all pics from that point on will be saved in a new place. No biggie.
Thanks for the info.
Yup, that's spot on. Thank you, thetruth. The location of your existing images will stay the same, and the new location will be sitename.com/images/filename.jpg
Can i use .xml instead of .xml.gz in http://yourdomain.com/sitemap.xml.gz?
Thank you very much! It was VERY helpful!
I’m still confused about the robots.txt, but thanks for this advice.
Great info, just used your guide to construct my robots.txt. I had not gotten around to it in over 5 years…but this Panda update is really concerning and I want to make sure that the site is squeaky clean for Google.
I stuck to the following:
User-agent: *
Disallow: /wp-content/themes/
Disallow: /wp-admin/
Disallow: /cgi-bin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /wp-content/plugins/
Disallow: /comments/
Disallow: /wp-content/cache/
Allow: /
I didn't want to get too crazy and become one of those horror stories where traffic is halved for writing something wrong. BTW Google Webmaster Tools has a small tool that will let you create the robots.txt file which you can download once you are done.
Great robots.txt tutorial for wordpress. I needed confirmation about disallowing certain files such as /author and /tag. I found so much partial information out there and many examples that were just bits and pieces or had errors. Yours is very thorough and clear. Thank you.
Why would I want to disallow cgi-bin, which is one level up from my WordPress installation? It's included in basically every robots.txt for WordPress, though other directories on a higher level are not. If the crawler also sniffs outside of the WP root directory, other folders should be excluded, too. But if it doesn't, there is no need to exclude cgi-bin. Right or wrong?
Also, what's the difference between /feed and /*/feed? And why is it /category/*/* and not simply /category/. When tested with the Google Webmaster Tools there is no difference.
Thanks for the tutorial, dude. Everything is a little older; do you have a more current 2011 version, or does this still apply to today's Google algorithm updates etc.?
I used the exact robots.txt you showed, my site is a sales blog for a MLM does this setup you show benefit me by using the same setup???
I don't want my whole wordpress website to be indexed by the search engines so is it possible if I follow the above mentioned steps or need some thing different?
Thanks for sharing. This was pretty comprehensive compared to other lists I found.
WARNING! the robots text file at the top of this page CAN cause errors when you try to submit a site map to google.
This is because it is blocking the .gz extension which is the compressed file format for site maps that google and all search engines prefer.
It is very informative. Loved the whole article! Thanks for sharing.
I'm so excited. I really appreciate sharing this great post. Keep up your excellent work.
Excellent post. I want to thank you for this informative read. I will bookmark this site and visit again..
Thanks for your fantastic guide. I just created robots.txt for my new blog
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/
Allow: /wp-content/uploads/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: */trackback/
Disallow: */feed/
Disallow: */comments/
Disallow: /author
Disallow: /archives
Disallow: /iframes
Disallow: /tag/
Disallow: /*?*
Disallow: /*?
http://www.mydomain.com/sitemap.xml
What do you think
I always allow indexing of your cgi-bin because i am so lazy to learn something more about wordpress
Wonderful website, really informative and engaging.
I was looking around for robot txt information and discovered this site. Thanks for the info.
Very useful info. I used quite a bit of it. I think by default wordpress also creates feeds with urls like
/rss
/rss2
/atom
/rdf
I therefore recommend adding rules like:
Disallow: /rss
Disallow: /*/rss
Disallow: /rss2
Disallow: /*/rss2
Disallow: /atom
Disallow: /*/atom
Disallow: /rdf
Disallow: /*/rdf
Weheheh, nice tips my bro.
But when we use Robots Meta, can we skip using robots.txt?
What a great article, thank you!
When my blog gets indexed by Google, it indexes the post itself (which I want) but it also indexes the blog home page, which I probably don't want. I've got the XML-sitemap generator plugin installed and I see there's an option for "sitemap content" where home page is checked. Is this something I should uncheck to have the home page delisted in Google? Many thanks!
Very useful article indeed. I have a question: you said that search engines cannot access the uploads folder and that we should add a new folder and Allow: /images. Does this mean I have to move all my uploads to the images folder and then modify all my site content to point to the new image links? Or is there an easier way to avoid this?