
What is Robots.txt?

The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine robots) how to crawl and index pages on their website.

Cheat Sheet

Block all web crawlers from all content

User-agent: * 
Disallow: /

Block a specific web crawler from a specific folder

User-agent: Googlebot 
Disallow: /no-google/

Block a specific web crawler from a specific web page

User-agent: Googlebot 
Disallow: /no-google/blocked-page.html

Sitemap Parameter

User-agent: * 
Disallow: 
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
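
The cheat-sheet rules above can be checked without touching a live site. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `parse_rules` and the crawler name `OtherBot` are assumptions made for illustration, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

def parse_rules(text):
    """Build a parser from robots.txt text (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# "Block all web crawlers from all content"
block_all = parse_rules("User-agent: *\nDisallow: /\n")

# "Block a specific web crawler from a specific folder"
block_folder = parse_rules("User-agent: Googlebot\nDisallow: /no-google/\n")

# "Sitemap Parameter": an empty Disallow value blocks nothing
sitemap_rules = parse_rules(
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://www.example.com/none-standard-location/sitemap.xml\n"
)

page = "http://www.example.com/no-google/blocked-page.html"
print(block_all.can_fetch("Googlebot", page))     # every crawler is blocked
print(block_folder.can_fetch("Googlebot", page))  # Googlebot is blocked here
print(block_folder.can_fetch("OtherBot", page))   # other crawlers are not
print(sitemap_rules.site_maps())                  # sitemap URLs listed in the file
```

Note that this parser follows the original protocol only, so it says nothing about how a particular search engine treats extensions such as wildcards.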

Optimal Format

Robots.txt needs to be placed in the top-level directory of a web server in order to be useful. Example: http://www.example.com/robots.txt
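
Because the file is only looked for at the top level of a host, its location can be derived from any page URL. A small sketch, assuming a hypothetical helper named `robots_txt_url`:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/shop/item.html?id=3"))
print(robots_txt_url("http://blog.example.com/2015/post/"))
```

The second call shows that a subdomain resolves to its own robots.txt file rather than the one on the root domain.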

What is Robots.txt?

The Robots Exclusion Protocol (REP) is a group of web standards that regulate web robot behavior and search engine indexing. The REP consists of the following:

The original REP from 1994, extended 1997, defining crawler directives for robots.txt. Some search engines support extensions like URI patterns (wild cards).
Its extension from 1996 defining indexer directives (REP tags) for use in the robots meta element, also known as "robots meta tag." Meanwhile, search engines support additional REP tags with an X-Robots-Tag. Webmasters can apply REP tags in the HTTP header of non-HTML resources like PDF documents or images.
The Microformat rel-nofollow from 2005 defining how search engines should handle links where the A Element's REL attribute contains the value "nofollow."

Robots Exclusion Protocol Tags

Applied to a URI, REP tags (noindex, nofollow, unavailable_after) steer particular tasks of indexers, and in some cases (nosnippet, noarchive, noodp) even query engines at runtime of a search query. Unlike crawler directives, REP tags are interpreted differently by each search engine. For example, Google wipes out even URL-only listings and ODP references on their SERPs when a resource is tagged with "noindex," but Bing sometimes lists such external references to forbidden URLs on their SERPs. Since REP tags can be supplied in META elements of X/HTML contents as well as in HTTP headers of any web object, the consensus is that contents of X-Robots-Tags should overrule conflicting directives found in META elements.

Microformats

Indexer directives put as microformats will overrule page settings for particular HTML elements. For example, when a page's X-Robots-Tag states "follow" (there's no "nofollow" value), the rel-nofollow directive of a particular A element (link) wins.

Although robots.txt lacks indexer directives, it is possible to set indexer directives for groups of URIs with server-side scripts acting on site level that apply X-Robots-Tags to requested resources. This method requires programming skills and a good understanding of web servers and the HTTP protocol.
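
As a rough illustration of such a site-level script, the sketch below wraps a Python WSGI application and appends an `X-Robots-Tag` header for groups of URIs. The rule table, the path patterns, and the names `x_robots_header` and `add_x_robots` are all assumptions made for the example, not part of any standard.

```python
import fnmatch

# Hypothetical rules: path pattern -> X-Robots-Tag value. First match wins.
X_ROBOTS_RULES = [
    ("*.pdf", "noindex, nofollow"),
    ("/internal/*", "noindex"),
]

def x_robots_header(path):
    """Return an ("X-Robots-Tag", value) pair for path, or None if no rule applies."""
    for pattern, value in X_ROBOTS_RULES:
        if fnmatch.fnmatchcase(path, pattern):
            return ("X-Robots-Tag", value)
    return None

def add_x_robots(app):
    """WSGI middleware that adds the header to matching responses."""
    def wrapped(environ, start_response):
        header = x_robots_header(environ.get("PATH_INFO", ""))

        def patched_start(status, headers, exc_info=None):
            if header is not None:
                headers = list(headers) + [header]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)
    return wrapped
```

The same effect can usually be achieved in web server configuration; a script is only one option.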

Pattern Matching

Google and Bing both honor two pattern-matching characters that can be used to identify pages or sub-folders that an SEO wants excluded. These are simple wildcards rather than full regular expressions. The two characters are the asterisk (*) and the dollar sign ($).

* - which is a wildcard that represents any sequence of characters
$ - which matches the end of the URL
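
To make the two characters concrete, here is a minimal sketch that translates a robots.txt path pattern into a Python regular expression. It is an approximation of the matching described above, not any search engine's actual implementation, and the function names are hypothetical.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text; each * becomes "any sequence of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def pattern_matches(pattern, path):
    """True if the URL path is covered by the pattern (prefix match unless $ is used)."""
    return robots_pattern_to_regex(pattern).match(path) is not None

print(pattern_matches("/*.pdf$", "/files/report.pdf"))         # ends in .pdf
print(pattern_matches("/*.pdf$", "/files/report.pdf?page=2"))  # does not end in .pdf
print(pattern_matches("/private*/", "/private-area/page.html"))
```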

Public Information

The robots.txt file is publicly available. Anyone can see which sections of a server the webmaster has blocked the engines from. This means that if an SEO has private user information that they don't want publicly searchable, they should use a more secure approach, such as password protection, to keep visitors from viewing any confidential pages they don't want indexed.

Important Rules

In most cases, meta robots with parameters "noindex, follow" should be employed as a way to restrict indexation.
It is important to note that malicious crawlers are likely to completely ignore robots.txt and as such, this protocol does not make a good security mechanism.
Each "Disallow:" line may contain only one URL path.
Each subdomain on a root domain uses separate robots.txt files.
Google and Bing accept two specific pattern-matching characters for pattern exclusion (* and $).
The filename of robots.txt is case sensitive. Use "robots.txt", not "Robots.TXT".
Spacing is not an accepted way to separate query parameters. For example, "/category/ /product page" would not be honored by robots.txt.

SEO Best Practice

Blocking Page

There are a few ways to block search engines from accessing a given domain:

Block with Robots.txt

This tells the engines not to crawl the given URL, but that they may keep the page in the index and display it in results. (See image of Google results page below.)

Block with Meta NoIndex

This tells engines they can visit, but are not allowed to display the URL in results. This is the recommended method.

Block by Nofollowing Links

This is almost always a poor tactic. Using this method, it is still possible for the search engines to discover pages in other ways: through browser toolbars, links from other pages, analytics, and more.

Why Meta Robots is Better than Robots.txt

Below is an example of about.com's robots.txt file. Notice that they are blocking the directory /library/nosearch/.

Now notice what happens when the URL is searched for in Google.

Google has 2,760 pages from that "disallowed" directory. The engine hasn't crawled these URLs, so each one appears as a bare URL rather than a traditional listing.

This becomes a problem when these pages accumulate links. Those pages can then accumulate link juice (ranking power) and other query-independent ranking metrics (like popularity and trust), but they can't pass these benefits to any other pages since the links on them never get crawled.

In order to exclude individual pages from search engine indices, the noindex meta tag <meta name="robots" content="noindex"> is actually superior to robots.txt.
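
One way to confirm that a page actually carries the tag is to parse its HTML. The sketch below uses Python's standard `html.parser`; the class and function names are assumptions, and it only looks for a literal "noindex" directive in a `robots` meta element.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())

def is_noindex(html_text):
    """True if the document contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noindex" in finder.directives

print(is_noindex('<head><meta name="robots" content="noindex"></head>'))
print(is_noindex('<head><meta name="robots" content="index, follow"></head>'))
```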


How to use the meta name "robots" tag

How do you tell a search engine spider whether it should index only the first page of your website or the whole site? You use the so-called robots meta tag.

The robots meta tag is not the same as the file called robots.txt. You should use the two together. Both are used by search engines like Yahoo and Google. If you use this meta tag the wrong way, you might shut the search engines out, so its influence is significant.

<meta name="robots" content="selection">

Example meta tag robots

Add the following meta tag in the HTML source of your page:

<meta name="robots" content="index, follow">

The spider will now index this page and follow its links.
As a result, it will index not only the first page of your website but also the other pages it can reach from there.

You are also allowed to type it like this:

<meta name="robots" content="INDEX, FOLLOW">
<META NAME="robots" CONTENT="INDEX, FOLLOW">
<META NAME="robots" CONTENT="index,follow">

By changing index to noindex and follow to nofollow, you are able to influence the behaviour of the spider. If you don't want the search engine spider to crawl through your whole website, use the following meta tag:

<meta name="robots" content="index, nofollow">

The spider will index this page but will not follow the links on it.

<meta name="robots" content="noindex, follow">

The spider will not index this page but will follow its links to the rest of the pages on your website.

<meta name="robots" content="noindex, nofollow">

The spider will not index this page and will NOT follow its links to the rest of your webpages.

This metatag can also be typed with a / at the end of the tag:
<meta name="robots" content="index, nofollow" />
<meta name="robots" content="noindex, follow" />
<meta name="robots" content="noindex, nofollow" />

Where do you add this robot tag?

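You add this robots tag inside the head of a page, starting with the first index page, and it tells the spider whether that page should be indexed and whether its links should be followed. The tag applies only to the page that contains it, so add it to every page you want to control.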
Make sure that on every page relevant meta tags are added. Add keywords and phrases that are relevant and correspond to the text and the language on that specific page. It might be a lot of work to add specific meta tags to each page but you will notice in time that it works!

How To Use HTML Meta Tags

What Are Meta Tags?

HTML meta tags are officially page data tags that lie between the open and closing head tags in the HTML code of a document.

The text in these tags is not displayed, but it is parsable and tells browsers (or other web services) specific information about the page. Simply put, it “explains” the page so a browser can understand it.

Here's a code example of meta tags:

<head>
<title>Not a Meta Tag, but required anyway </title>
<meta name="description" content="Awesome Description Here">
<meta http-equiv="content-type" content="text/html;charset=UTF-8">
</head>

For more on the history of meta tags, see our post “Death of a Meta Tag”.

The Title Tag

Although the title tag appears in the head block of the page, it isn't actually a meta tag. What's the difference? The title tag is a required page “element” according to the W3C. Meta tags are optional page descriptors.

To learn more about best practices for title tag element, our post “How to Write Title Tags For Search Engine Optimization” tells you everything you need to know.

The Description Meta Tag

This is what the description tag looks like:

<meta name="description" content="Awesome Description Here">

Ideally, your description should be no longer than 155 characters (including spaces). However, check the search engine results page (SERP) of choice to confirm this. Some are longer and some are shorter. This is only a rule of thumb, not a definite “best practice” anymore.
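
The rule of thumb above is easy to check automatically. A minimal sketch, assuming a hypothetical helper `description_report` and treating 155 characters as an adjustable guideline rather than a hard limit:

```python
MAX_DESCRIPTION_LENGTH = 155  # rule of thumb from the text, not a fixed standard

def description_report(description, limit=MAX_DESCRIPTION_LENGTH):
    """Report the length of a description (spaces included) against the limit."""
    length = len(description)
    return {
        "length": length,
        "within_limit": length <= limit,
        "over_by": max(0, length - limit),
    }

print(description_report("Awesome Description Here"))
```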

The “description” meta tag helps websites in three important ways:

“Description” tells the search engine what your page or site is about: For the search engine to understand what your page is about, you need to write a good description. When Google's algorithm decides a description is badly written or inaccurate, it will replace that description with its own version of what is on the page. Wouldn't you prefer to describe your site to potential customers or visitors using your own words rather than leaving it in Google's artificial hands? Look at this example and judge for yourself:

“Description” helps with click through rates to your site: Writing a good description not only helps keep Google from rewriting it, but also helps you get more people clicking through to your site. A well-written description not only tells users what is on your page, but also entices them to visit your site. A description is what shows up here in the search engine results. It is like good window dressing. Sites with poor descriptions will get fewer click throughs and the search engines will demote your site in favor of other sites.
“Description” helps with site rankings: The common belief (based on what Google said in 2009) is that nothing in the description will help you get rankings. However, I have seen evidence to the contrary. Is it heavily weighted? No, but if you want some value on a secondary keyword (say an –ing –ed or –s), use it here.

Two other quick notes on meta description tags:

Empty Descriptions: Can a description be empty? Yes. When it is empty Google and Bing will fill it in for you. In fact, sometimes (e.g., for blogs) you may prefer Google's or Bing's version. (Personally though, I always fill it in whenever possible, preferring my version to theirs, but if you have a small staff, this isn't always practical.)
Quotes: Don't use double quotation marks (") in your description. They will likely cut off your description, because a double quote ends the content attribute. Use single quotes to avoid this issue.
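
If a description must contain double quotes, an alternative to rewording it is to escape them when the tag is generated. This is a sketch of that alternative rather than the approach recommended above; `description_meta` is a hypothetical helper built on Python's standard `html.escape`.

```python
from html import escape

def description_meta(description):
    """Render a description meta tag with quotes and other special characters escaped."""
    return '<meta name="description" content="{}">'.format(escape(description, quote=True))

print(description_meta('The "best" widgets'))
# <meta name="description" content="The &quot;best&quot; widgets">
```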

The Keywords Meta Tag

A long time ago in a galaxy far, far away, the “keywords” meta tag was a critical element for early search engines. Much like the dinosaurs, this tag is a fossil from ancient search engine times.

The only search engine that looks at the keywords anymore is Microsoft's Bing – and they use it to help detect spam. To avoid hurting your site, your best option is to never add this tag.

Or, if that's too radical for you to stomach, at least make sure you haven't stuffed 300 keywords in the hopes of higher search rankings. It won't work. Sorry.

If you already have keyword meta tags on your website, but they aren't spammy, there's no reason to spend the next week hurriedly taking them out. It's OK to leave them for now – just take them out as you're able, to reduce page weight and load times.

Other Meta Tags

There are many other meta tags, but none are really considered useful nowadays. Many of the tags that we used did things like:

Told the spider when to come back

<meta name="revisit-after" content="30 days">

Told the browser the distribution

<meta name="distribution" content="web">

Told the page to refresh

<meta http-equiv="refresh" content="30">

Told the page to redirect/refresh

<meta http-equiv="refresh" content="x_seconds; url=http://www.yourhost.com/pagetosendto.html">

We don't use these anymore, either because there are better ways (such as schema tagging or server-side methods), because the engines they used to work on no longer exist, or because Google has explicitly told us they are not great ideas (such as redirects at the page level).

NOTE: Schema tagging and rich data snippets are single-handedly the most important (and somewhat quietly announced) change to how your site interacts with the search engines and the search spiders. Learn it. Know it. Implement it.

Robots Meta Tag

The robots tag is still one of the most important tags. Not so much for the proper implementation, but the improper.

The robots meta tag lets you specify that a particular page should not be indexed by a search engine or if you do or do not want links on the page followed.

Believe it or not, it is still common for a site to be deindexed because someone accidentally added a noindex tag to the entire site. Understanding this tag is vitally important.

Here are the four implementations of the Robots Meta Tag and what they mean.

```
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
```
This means: "Do not Index this page. Do not follow the links on the page." Your page will drop OUT of the search index AND your links to other pages will not be followed. This will break the link path on your site from this page to other pages.

This tag is most often used when a site is in development. A developer will noindex/nofollow the pages of the site to keep them from being picked up by the search engines, then forget to remove the tag. When launching your new website, do not trust it has been removed. DOUBLE CHECK!
```
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
```
This means: "Do Index this page. Do not follow the links on the page." Your page WILL be in the index AND your links to other pages will not be followed. This will break the link path on your site from this page to other pages.
```
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
```
This means: "Do not Index this page. Do follow the links on the page." Your page will drop OUT of the index BUT your links to other pages will be followed. This will NOT break the link path on your site from this page to other pages.
```
<META NAME="ROBOTS" CONTENT="INDEX, FOLLOW">
```
This means: "Do Index this page. Do follow the links on the page." This means your page WILL be in the index AND your links to other pages will be followed. This will NOT break the link path on your site from this page to other pages.

NOTE: The robots tag may be ignored by less scrupulous spiders.

The Charset Tag

Finally, every page should declare its character set. In the U.S., that is typically UTF-8. Just make sure this tag is on your page if you're delivering HTML using English characters.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Conclusion

While meta tags aren't the magical solution that you may have heard, they still play an important role in helping your site get found in search engines. Enjoy your metas!


Robots.txt and Robots Metatags

Want to control those pesky bots? There are techniques available to help you control what they crawl, and what they index. This article will discuss the basics and point you at some resources for more information.

Robots.txt Overview

The most basic technique is the "robots.txt" file. This file allows you to tell search engine robots what parts of your site they cannot crawl. To start, you need to create a file called robots.txt, and it must live in the root directory for your domain. This means that if your site is "www.yourdomain.com" the robots.txt file must be located at "www.yourdomain.com/robots.txt". Do not place it anywhere else, because it will have no effect.

The basic technique is simple. To exclude all bots from your server, structure your robots.txt as follows:

User-agent: *
Disallow: /

You can choose to disable only certain bots, simply by specifying the bot name on the User-Agent line, instead of using the "*" to indicate all bots. You can also specify that only certain directories are protected, with a file similar to this one:

User-agent: *
Disallow: /cgi-bin/
Disallow: /php/
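
Files like the ones above can also be generated from a simple mapping, which helps when the rules are kept alongside other site configuration. A minimal sketch; the function name `build_robots_txt` and the input shape are assumptions for the example.

```python
def build_robots_txt(rules):
    """Build robots.txt text from {user_agent: [disallowed path prefixes]}."""
    blocks = []
    for agent, paths in rules.items():
        lines = ["User-agent: " + agent]
        if paths:
            lines += ["Disallow: " + path for path in paths]
        else:
            # An empty Disallow value means nothing is blocked for this agent.
            lines.append("Disallow:")
        blocks.append("\n".join(lines))
    # Separate the blocks for different agents with a blank line.
    return "\n\n".join(blocks) + "\n"

print(build_robots_txt({"*": ["/cgi-bin/", "/php/"]}))
```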

The definitive definitions for the robots.txt file can be found at this location.

Limitations of Robots.txt

Robots.txt is only obeyed by "well behaved" bots. It is not a solution to prevent your competitor from crawling your site, or some other party from mounting a malicious attack on your domain. You need to protect yourself by other means from these types of problems.

In addition, the fact that a search engine bot is not supposed to crawl your page does not mean it will not index it. It still may. Google, for example, will still index a page that it is not supposed to crawl if there is a link to that page from another site. If you look through Google search results you will sometimes see pages in the results that have just the URL with no title or description. That is a sure sign of a page that has been excluded by robots.txt but that someone else links to.

Robots Metatags

The Robots metatags are implemented within each web page. There are two parameters: Index/Noindex and Follow/Nofollow. Index relates to whether or not the page should be indexed. Follow relates to whether or not a page should be analyzed for the links on the page. Like all metatags, this one should show up in the <head> section of your web page. Here is the basic syntax:

<meta name="robots" content="noindex,nofollow">

While you can specify "index" or "follow", there is no need to do so, as these are the defaults for every page on your site. This is what a search engine will do if it finds no "robots" metatag. Here is the beauty of this metatag: Search engines are not supposed to index this page, even if another page links to it (and Google does obey this rule).

Be careful though: the robots metatag is new, and not all search engines obey it. I also learned in a discussion with Matt Cutts (of Google) that if you exclude the crawling of a page using the robots.txt file, the robots metatags on that page are ignored, because the crawler never fetches the page and so never sees them. The page may still be indexed even if its robots metatag specifies noindex. So if you truly do not want a page indexed by Google, do not mention it in the robots.txt file and rely on the robots metatags only.
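
The conflict described above can be detected mechanically: any page that relies on noindex but is also disallowed in robots.txt will never have its metatag read. The sketch below assumes a hypothetical helper `hidden_noindex_pages` and uses Python's standard `urllib.robotparser`, which implements the basic protocol without wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

def hidden_noindex_pages(robots_txt, noindex_paths, user_agent="Googlebot"):
    """Return the noindex pages the crawler is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in noindex_paths if not parser.can_fetch(user_agent, path)]

robots_txt = "User-agent: *\nDisallow: /private/\n"
pages_with_noindex = ["/private/old.html", "/archive/old.html"]
print(hidden_noindex_pages(robots_txt, pages_with_noindex))
```

Any path this returns should either be removed from robots.txt or stop relying on the metatag.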

The definitive definitions for the robots metatags can be found here. Read this article for information on metatags and SEO considerations.

Have fun controlling those bots!

Meta Tags Optimisation Tutorial

Want to get your website higher in the search results from a search engine? Then the best way is by the correct use of Meta tags.

As 95% of people using search engines only look at the top 20 search results, it is important to get your website as high as possible.

Several meta tags were introduced by the popular search engines Infoseek and AltaVista to help their search engines index web pages. Most search engines now use meta tags to index pages.

If you want your website indexed then consider using Meta tags.

Meta tags go in the head of your web page, in-between the HTML tags, <head> and </head>.

There are a number of different Meta tags that you can use, but the most important ones are the Description and the Keywords Meta tags as well as having a title for the web page.

Title

The title of your web page should also be in the document head and is the most important 'non content' part of your page, as it carries the most weighting with search engines. You should keep this below 60 characters and include your most important keywords first. If you have a company or brand name, it is often better to place this at the end, after your most important keywords for the page.


	<title>Meta Tags Optimisation Tutorial - Web Wiz SEO Article</title>

Description

This tag is often used by search engines for the description of your web page. It will not affect the ranking or indexing of your web page, so think of it as an advert for your page and write your description as a short ad of under 160 characters to encourage people to view your page.


	<meta name="description" content="Tips on how to optimise your Meta Tags to get your website higher in search engine results." />

Keywords

To help get your website up in the ratings you can supplement the title and description with a list of keywords, separated by commas, that someone might type into a search engine when looking for a site like yours. Google will ignore keywords meta tags, but other search engines do use this to help index your website.


	<meta name="keywords" content="meta tags,tutorial,training,html" />

Rating

This is used to give the web page a rating for its appropriateness for kids. The ratings are general, mature, restricted, and 14 years.


	<meta name="rating" content="general" />

The rest of the tags are not necessary but I shall run through them anyway.

Author

This is used to identify the author of the web page.


	<meta name="author" content="Web Wiz " />

Copyright

This one identifies any copyright information there is for the web page.


	<meta name="copyright" content="2015, Web Wiz " />

Expires

This meta tag is used by responsible web masters to let the search engine know when the page expires and can be removed from the search engines directory. It can either be set to never, or a date in the format day, month, year, eg. 28 June 2003.


	<meta name="expires" content="never"/>

Distribution

Tells the search engine who the page is meant for. It can be set to global (for everyone), local (for regional sites), or IU (for Internal Use).


	<meta name="distribution" content="global" />

Robots

This Meta tag is used to tell the search engine whether you want the web page indexed or not. You only really need to use this Meta tag if you DON'T want your web page indexed. The values for this tag are: -

index (default)	Index the page
noindex	Don't index the page
nofollow	Don't follow the links on this page
none	Same as "noindex, nofollow"


	<meta name="robots" content="noindex, nofollow" />
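
The values listed above reduce to two yes/no decisions, index and follow. As a rough sketch (the function name `parse_robots_content` is an assumption), a content value can be interpreted like this:

```python
def parse_robots_content(content):
    """Turn a robots meta content value into index/follow flags."""
    tokens = {token.strip().lower() for token in content.split(",") if token.strip()}
    if "none" in tokens:
        # "none" is shorthand for "noindex, nofollow".
        tokens |= {"noindex", "nofollow"}
    # Indexing and following are the defaults unless explicitly switched off.
    return {"index": "noindex" not in tokens, "follow": "nofollow" not in tokens}

print(parse_robots_content("noindex, nofollow"))
print(parse_robots_content("none"))
print(parse_robots_content("index"))
```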

Meta Tags Example

Below is an example of the head of a document containing Meta tags for search engines and a title for the web page: -


	<head>
	<title>Meta Tags Optimisation Tutorial - Web Wiz SEO Article</title>
	<meta name="description" content="Tips on how to optimise your Meta Tags to get your website higher in search engine results." />
	<meta name="keywords" content="meta tags, tutorial, training, HTML" />
	<meta name="rating" content="general" />
	<meta name="copyright" content="2015, Web Wiz " />
	<meta name="revisit-after" content="31 Days" />
	<meta name="expires" content="never">
	<meta name="distribution" content="global" />
	<meta name="robots" content="index" />
	</head>

Free Meta Tag Generator

If you need help making meta tags for your website, then why not use Web Wiz free Meta Tag Generator, and you will have your own meta tags for your site in a matter of minutes.

Metatags and SEO

The role of metatags is misunderstood by some members of the webmaster community. I still have people who are not in the business come up to me and talk about SEO as if it's all about picking keywords metatags. I try not to cringe when I get into these discussions. I once had an idea for an article about this. The article was going to be completely blank. Only those that were enterprising enough to look at the source would have seen the content of the article in the keyword metatags:

<meta name="keywords" content="keywords, metatags, do, not, matter" />

I have built tons of high ranking sites and not implemented any keywords metatags at all. In fact, Matt Cutts confirmed in 2009 that Google does not use the keywords metatag. The fundamental reason for search engines to ignore the keyword metatag is that spammers abused it in the early days of SEO.

Search engines prefer to focus all of their attention on "user visible" text. The user can't see the keywords metatags unless they view the source of your web page. It's this invisible aspect that makes the keywords metatag so attractive for a spammer to abuse it.

So what about metatags in general? This article will provide an overview of their value and the best way to use them.

Title Tag

The title tag is the single most important "on page" element in telling a search engine what your page is all about. Yes, the title tag is incredibly important. During the design of your site, you should have decided on the best keywords for each of your pages. For each page, you should use the most important keywords (note: keywords means "search phrases" as used in this article) in your title tag. For example, if the most important keyword for your page is "blue widgets", you may use a title tag such as:

<title>Blue Widgets from Blue Widget Manufacturing</title>

You can emphasize more than one keyword, but should limit it to no more than 3, as follows:

<title>Blue Widgets, Round Blue Widgets, and Square Blue Widgets</title>

Note that we trimmed off the company name in this example. There are some SEOs that recommend that it's best to keep your title tag to 65 characters or less. Our opinion is that longer title tags are probably not harmful, but the extra characters are ignored by some search engines. Given that the extra characters are ignored, we tend to keep ours less than 65 characters.

The Two Golden Rules of Title Tags

1. You should have a separate page for each major customer product/service/need that you address, and the title tag should focus on the unique content of the page.

2. Do not use the same title tag on more than one page. This just causes them to compete with each other in the search results. If the pages are not really different, then why would they both exist? This suggests a bad search engine experience, and the search engines don't like it either.
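
The second rule lends itself to a simple audit. The sketch below assumes you already have a mapping from page paths to title text (how you collect it is up to you) and uses a hypothetical helper `duplicate_titles` to list titles that appear on more than one page.

```python
from collections import defaultdict

def duplicate_titles(pages):
    """pages maps a URL path to its title text; return titles used more than once."""
    by_title = defaultdict(list)
    for path, title in pages.items():
        # Compare titles case-insensitively and ignore surrounding whitespace.
        by_title[title.strip().lower()].append(path)
    return {title: sorted(paths) for title, paths in by_title.items() if len(paths) > 1}

pages = {
    "/blue-widgets": "Blue Widgets from Blue Widget Manufacturing",
    "/round-blue-widgets": "Blue Widgets from Blue Widget Manufacturing",
    "/square-blue-widgets": "Square Blue Widgets",
}
print(duplicate_titles(pages))
```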

Description Metatag

This metatag also sees limited use by search engines. Like the keywords metatag, it is not generally speaking user visible. I know of no search engine that considers the content of the description metatag for page ranking purposes. However, a search engine may use your description metatag as the description of your page that it displays in search results.

Google, for example, does this if it can't find enough text on your page to develop a good page summary on its own. For that reason, you should make sure that you write a good description metatag for your pages. Since this description may show up in the search results shown by search engines, you want the description to be well written enough that it will help entice the user to click on the link to your site instead of a link to someone else's site.

Keep the description metatag crisp, just a few lines of text. Don't stuff it with keywords. Remember, search engines do not use this tag for ranking purposes. Write something that tells the user why they should come to your page - what benefit will they get by doing so. Straightforward, basic marketing. Here is a simple example:

<meta name="description" content="Blue Widgets: Low Cost, High Quality 
Blue Widgets available for Order Online.  Delivered to your Doorstep in 
48 hours or less." />

Keywords Metatag

So should you implement a keywords metatag? I never do. If someone in your company is insistent, then spend 10 seconds or less and pick some keywords that relate to the unique aspects of the page.

Robots Metatag

The Robots metatag is relatively new. Both Google and Bing support this tag. It is designed to allow you to tell a search engine when you do not want it to index your page, and/or when you do not want the search engine to look at or evaluate any of the links on your page. The basic format of the metatag is:

<meta name="robots" content="noindex,nofollow">

You can specify either attribute or both attributes. To specify neither, simply leave out the robots metatag. Click here for more detailed information on the robots metatag.
