SEOmoz The Beginners Guide To SEO 2012 PDF
Imagine the World Wide Web as a network of stops in a big city subway
system.
Each stop is its own unique document (usually a web page, but sometimes a PDF, JPG or other
file). The search engines need a way to crawl the entire city and find all the stops along the way,
so they use the best path available: links.
The link structure of the web serves to bind all of the pages together.
Through links, search engines' automated robots, called "crawlers" or "spiders," can reach the
many billions of interconnected documents.
Once the engines find these pages, they next decipher the code from them and store selected pieces
in massive hard drives, to be recalled later when needed for a search query. To accomplish the
monumental task of holding billions of pages that can be accessed in a fraction of a second, the
search engines have constructed datacenters all over the world.
These monstrous storage facilities hold thousands of machines processing large quantities of
information. After all, when a person performs a search at any of the major engines, they demand
results instantaneously; even a 1- or 2-second delay can cause dissatisfaction, so the engines work
hard to provide answers as fast as possible.
Providing Answers
Providing answers to user queries,
most frequently through lists of
relevant pages, through retrieval and
rankings.
Search engines are answer machines. When a person looks for something online, it requires the
search engines to scour their corpus of billions of documents and do two things: first, return only
those results that are relevant or useful to the searcher's query, and second, rank those results in
order of perceived usefulness. It is both relevance and importance that the process of SEO
is meant to influence.
To a search engine, relevance means more than simply finding a page with the right words. In the
early days of the web, search engines didn't go much further than this simplistic step, and their
results suffered as a consequence. Thus, through evolution, smart engineers at the engines devised
better ways to find valuable results that searchers would appreciate and enjoy. Today, hundreds of
factors influence relevance, many of which we'll discuss throughout this guide.
Over the 15 plus years that web search has existed, search
marketers have found methods to extract information about how
the search engines rank pages. SEOs and marketers use that data
There is perhaps no greater tool available to webmasters researching the activities of the engines than the freedom to use the search engines
to perform experiments, test theories and form opinions. It is through this iterative, sometimes painstaking process, that a considerable
amount of knowledge about the functions of the engines has been gleaned.
In this test, we started with the hypothesis that a link higher up in a page's code carries more
weight than a link lower down in the code. We tested this by creating a nonsense domain linking
out to three pages, all carrying the same nonsense word exactly once. After the engines spidered
the pages, we found that the page linked to from the highest link on the home page ranked first.
also available through patent applications made by the major engines to the United States Patent
Office. Perhaps the most famous among these is the system that spawned Google's genesis in the
Stanford dormitories during the late 1990s: PageRank, documented as Patent #6285999,
"Method for node ranking in a linked database." The original paper on the subject, "Anatomy of a
Large-Scale Hypertextual Web Search Engine," has also been the subject of considerable
study. To those whose comfort level with complex mathematics falls short, never fear. Although
the actual equations can be academically interesting, complete understanding evades many of the
most talented search marketers. Remedial calculus isn't required to practice SEO!
We like to say "Build for users, not search engines." When users have a bad experience at
your site, when they can't accomplish a task or find what they were looking for, this often
correlates with poor search engine performance. On the other hand, when users are happy with
your website, a positive experience is created, both with the search engine and the site providing
the information or result.
What are users looking for? There are three types of search queries users generally perform:
"Do" Transactional Queries - Action queries such as buy a plane ticket or listen to a song.
"Know" Informational Queries - When a user seeks information, such as the name of the
band or the best restaurant in New York City.
"Go" Navigation Queries - Search queries that seek a particular online destination, such as
Facebook or the homepage of the NFL.
When visitors type a query into a search box and land on your site, will they be satisfied with what
they find? This is the primary question search engines try to figure out millions of times per day.
The search engines' primary responsibility is to serve relevant results to their users.
It all starts with the words typed into a small box.
Why invest time, effort and resources on SEO? When looking at the broad picture of search engine
usage, fascinating data is available from several studies. We've extracted those that are recent,
relevant, and valuable, not only for understanding how users search, but to help present a
compelling argument about the power of search.
The second position receives 10.1%, the third 7.2%, the fourth
4.8%, and all others are under 2%.
A #1 position in Bing's search results averages a 9.66% clickthrough rate.
The total average CTR for first ten results was 52.32% for Google
and 26.32% for Bing.
But Wait...
Imagine you posted online a picture of your family dog. A human might describe it as "a black,
medium-sized dog - looks like a Lab, playing fetch in the park." On the other hand, the best
search engine in the world would struggle to understand the photo at anywhere near that level of
sophistication. How do you make a search engine understand a photograph? Fortunately, SEO
allows webmasters to provide "clues" that the engines can use to understand content. In fact,
adding proper structure to your content is essential to SEO.
Understanding both the abilities and limitations of search engines allows you to properly build,
format and annotate your web content in a way that search spiders can digest. Without SEO, many
websites remain invisible to search engines.
Mixed contextual signals. For example, the title of your blog post
Take a look at any search results page and you'll find the answer to why search marketing
has a long, healthy life ahead.
Ten positions, ordered by rank, with click-through traffic based on their relative position & ability to
attract searchers. Results in positions 1, 2 and 3 receive much more traffic than results down the
page, and considerably more than results on deeper pages. The fact that so much attention goes to so
few listings means that there will always be a financial incentive for search engine rankings. No
matter how search may change in the future, websites and businesses will compete with one another
for the traffic, branding, and visibility it provides.
Search engines are limited in how they crawl the web and
interpret content. A webpage doesn't always look the same to you
and me as it looks to a search engine. In this section, we'll focus on
specific technical aspects of building (or modifying) web pages so
they are structured for both search engines and human visitors
alike. This is an excellent part of the guide to share with your
programmers, information architects, and designers, so that all
parties involved in a site's construction can plan and develop a
search-engine friendly site.
In order to be listed in the search engines, your most important content should be in HTML text
format. Images, Flash files, Java applets, and other non-text content are often ignored or devalued
by search engine spiders, despite advances in crawling technology. The easiest way to ensure that
the words and phrases you display to your visitors are visible to search engines is to place them in the
HTML text on the page. However, more advanced methods are available for those who demand
greater formatting or visual display styles:
In the example above, Google's spider has reached page "A" and sees
links to pages "B" and "E". However, even though C and D might be
important pages on the site, the spider has no way to reach them (or
even know they exist.) This is because no direct, crawlable links point
to those pages. As far as Google is concerned, they might as well not
exist - great content, good keyword targeting, and smart marketing
won't make any difference at all if the spiders can't reach those pages
in the first place.
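The situation described above can be sketched as a reachability check over a link graph. This is a minimal illustration, not how any engine's crawler is actually built: the page names follow the example in the text, while the `reachable_pages` helper and the `site_links` dictionary are assumptions made for this sketch.

```python
from collections import deque


def reachable_pages(links, start):
    """Return the set of pages reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: A links to B and E; no crawlable link points to C or D.
site_links = {
    "A": ["B", "E"],
    "B": [],
    "C": [],
    "D": [],
    "E": ["A"],
}

crawled = reachable_pages(site_links, "A")
orphaned = sorted(set(site_links) - crawled)
print(sorted(crawled))  # ['A', 'B', 'E']
print(orphaned)         # ['C', 'D']
```

Running a check like this against your own internal link structure is one way to spot pages that no crawlable path leads to.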
In the above illustration, the "<a" tag indicates the start of a link. Link tags can contain images, text, or other objects, all of which provide a
clickable area on the page that users can engage to move to another page. This is the original navigational element of the Internet: "hyperlinks". The link referral location tells the browser (and the search engines) where the link points to. In this example, the URL
http://www.jonwye.com is referenced. Next, the visible portion of the link for visitors, called "anchor text" in the SEO world, describes the
page the link points to. The page pointed to is about custom belts, made by my friend from Washington D.C., Jon Wye, so I've used the
anchor text "Jon Wye's Custom Designed Belts". The </a> tag closes the link, so that elements later on in the page will not have the link
attribute applied to them.
This is the most basic format of a link - and it is eminently understandable to the search engines. The spiders know that they should add this
link to the engines' link graph of the web, use it to calculate query-independent variables (like Google's PageRank), and follow it to index the
contents of the referenced page.
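To make the anatomy of a link concrete, here is a small sketch that pulls the referral location and anchor text out of a link using Python's standard `html.parser` module. The URL and anchor text come from the description above; the exact markup string is a reconstruction (the original illustration is not reproduced here), and the `LinkExtractor` class is a name invented for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # pieces of anchor text seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # The closing </a> ends the link, so later text is not part of it.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Assumed markup, reconstructed from the description in the text.
html = '<a href="http://www.jonwye.com">Jon Wye\'s Custom Designed Belts</a> and more text'
extractor = LinkExtractor()
extractor.feed(html)
extractor.close()
print(extractor.links)
```

The trailing "and more text" is not captured, which mirrors the point that the closing tag stops the link attribute from applying to later elements.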
Submission-required forms
Although this relates directly to the above warning on forms, it's such
a common problem that it bears mentioning. Some webmasters
believe if they place a search box on their site, then engines will be
attempt to "submit" forms and thus, any content or links that would
either do not crawl or give very little weight to the links embedded
The links embedded inside the Panda site (from our above example)
pandas are listed and linked to on the Panda page, no spider can
reach them through the site's link structure, rendering them invisible
to the engines (and un-retrievable by searchers performing a query).
Search engines will only crawl so many links on a given page - not an
block access by rogue bots, only to discover that search engines cease
their crawl.
The Meta Robots tag and the Robots.txt file both allow a site
Frames or I-frames
Technically, links in both frames and I-Frames are crawlable, but
both present structural issues for the engines in terms of organization
and following. Unless you're an advanced user with a good technical
understanding of how search engines index and follow links in
frames, it's best to stay away from them.
Links can have lots of attributes applied to them, but the engines ignore nearly all of these, with
the important exception of the rel="nofollow" attribute. In the example above, by adding the
rel="nofollow" attribute to the link tag, we've told the search engines that we, the site owners, do
not want this link to be interpreted as the normal, "editorial vote."
Nofollow, taken literally, instructs search engines to not follow a link (although some do.) The
nofollow tag came about as a method to help stop automated blog comment, guest book, and link
injection spam (read more about the launch here), but has morphed over time into a way of
telling the engines to discount any link value that would ordinarily be passed. Links tagged with
nofollow are interpreted slightly differently by each of the engines, but it is clear they do not pass
as much weight as normal "followed" links.
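As a rough illustration of the distinction, the sketch below separates "followed" links from nofollowed ones by inspecting the `rel` attribute. It only shows how the attribute can be detected in markup; how much weight each engine actually assigns is not something this code models. The `FollowClassifier` class and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser


class FollowClassifier(HTMLParser):
    """Split links into followed and nofollowed based on the rel attribute."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. rel="external nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical page fragment with one editorial link and one user-submitted link.
page = (
    '<a href="http://example.com/editorial">A link we vouch for</a>'
    '<a href="http://example.com/comment-spam" rel="nofollow">A user-submitted link</a>'
)
classifier = FollowClassifier()
classifier.feed(page)
classifier.close()
print(classifier.followed)
print(classifier.nofollowed)
```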
Keyword Abuse
keywords into text, the url, meta tags and links. Unfortunately, this
Today, although search engines still can't read and comprehend text
(more on this below.) If your page targets the keyword phrase "Eiffel
Tower" then you might naturally include content about the Eiffel
Tower" onto a page with irrelevant content, such as a page about dog
breeding, then your efforts to rank for "Eiffel Tower" will be a long,
uphill battle.
On-Page Optimization
The Conclusion:
That said, keyword usage and targeting are still a part of the search
What should optimal page density look like then? An optimal page
for the phrase "running shoes" would thus look something like:
Use the keyword in the title tag at least once. Try to keep the
keyword as close to the beginning of the title tag as possible.
More detail on title tags follows later in this section.
Once prominently near the top of the page.
At least 2-3 times, including variations, in the body copy on the
page - sometimes a few more if there's a lot of text content. You
may find additional value in using the keyword or variations
more than this, but in our experience, adding more instances of a
term or phrase tends to have little to no impact on rankings.
At least once in the alt attribute of an image on the page. This not
only helps with web search, but also image search, which can
occasionally bring valuable traffic.
Once in the URL. Additional rules for URLs and keywords are
discussed later on in this section.
At least once in the meta description tag. Note that the meta
description tag does NOT get used by the engines for rankings,
but rather helps to attract clicks by searchers from the results
page, as it is the "snippet" of text used by the search engines.
Generally not in link anchor text on the page itself that points to
other pages on your site or different domains (this is a bit
complex - see this blog post for details).
The title tag of any page appears at the top of Internet browsing
software, and is often used as the title when your content is shared
through social media or republished.
Be mindful of length
Search engines display only the first 65-75 characters of a title tag in
the search results. (After this length, the engines show an ellipsis,
"...", to indicate when a title tag has been cut off.) This is also the
general limit allowed by most social media sites, so sticking to this
limit is generally wise. However, if you're targeting multiple
keywords (or an especially long keyword phrase) and having them in
the title tag is essential to ranking, it may be advisable to go longer.
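A quick way to sanity-check title length is to preview how a title might be cut. The sketch below is only an approximation: the engines' actual truncation rules are not documented here, the default `limit` of 70 is an assumed midpoint of the 65-75 range mentioned above, and `preview_title` is a helper name invented for this example.

```python
def preview_title(title, limit=70):
    """Approximate how a title might be shortened once it exceeds `limit` characters."""
    if len(title) <= limit:
        return title
    # Cut back to the last full word that fits, then append the ellipsis.
    cut = title[:limit].rsplit(" ", 1)[0]
    return cut + " ..."


short_title = "Running Shoes | Example Store"
long_title = (
    "Running Shoes for Trail, Road and Track: Reviews, Buying Guides, "
    "Size Charts and Care Tips | Example Store"
)

print(preview_title(short_title))
print(preview_title(long_title))
```

Note that in the long example the brand name at the end is what gets cut, which is worth keeping in mind when deciding where to place it.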
Leverage branding
At SEOmoz, we love to end every title tag with a brand name
mention, as these help to increase brand awareness, and create a
higher click-through rate for people who like and are familiar with a
brand. Sometimes it makes sense to place your brand at the
beginning of the title tag, such as your homepage. Since words at the
beginning of the title tag carry more weight, be mindful of what you
are trying to rank for.
Using keywords in the title tag means that search engines will
"bold" those terms in the search results when a user has performed a
query with those terms. This helps garner a greater visibility and a
higher click-through rate.
Meta Tags
Meta tags were originally intended to provide a proxy for information about a website's content.
Several of the basic meta tags are listed below, along with a description of their use.
Meta Robots
The Meta Robots tag can be used to control search engine spider activity (for all of the major
engines) on a page level. There are several ways to use meta robots to control how search engines
treat a page:
index/noindex tells the engines whether the page should be crawled and kept in the engines'
index for retrieval. If you opt to use "noindex", the page will be excluded from the engines. By
default, search engines assume they can index all pages, so using the "index" value is
generally unnecessary.
follow/nofollow tells the engines whether links on the page should be crawled. If you elect
to employ "nofollow," the engines will disregard the links on the page both for discovery and
ranking purposes. By default, all pages are assumed to have the "follow" attribute.
Example: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
noarchive is used to restrict search engines from saving a cached copy of the page. By
default, the engines will maintain visible copies of all pages they indexed, accessible to
searchers through the "cached" link in the search results.
nosnippet informs the engines that they should refrain from displaying a descriptive block
of text next to the page's title and URL in the search results.
noodp/noydir are specialized tags telling the engines not to grab a descriptive snippet
about a page from the Open Directory Project (DMOZ) or the Yahoo! Directory for display in
the search results.
The X-Robots-Tag HTTP header directive also accomplishes these same objectives. This
technique works especially well for content within non-HTML files, like images.
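The directives above can be read mechanically from a page's markup. The following sketch parses the example tag from this section and reports which behaviors remain allowed. It covers only the four directives shown in the returned dictionary, and the `RobotsMetaParser` class and `robots_policy` function are names made up for this illustration, not part of any engine's tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_policy(html):
    """Return what a page allows; anything not restricted defaults to allowed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = parser.directives
    return {
        "index": "noindex" not in found,
        "follow": "nofollow" not in found,
        "archive": "noarchive" not in found,
        "snippet": "nosnippet" not in found,
    }


policy = robots_policy('<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">')
print(policy)
```

A page with no robots meta tag yields all four values as `True`, matching the defaults described in the list above.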
Meta Description
The meta description tag exists as a short description of a page's content. Search engines do not
use the keywords or phrases in this tag for rankings, but meta descriptions are the primary source
for the snippet of text displayed beneath a listing in the results.
The meta description tag serves the function of advertising copy, drawing readers to your site from
the results and thus, is an extremely important part of search marketing. Crafting a readable,
compelling description using important keywords (notice how Google "bolds" the searched
keywords in the description) can draw a much higher click-through rate of searchers to your page.
Meta descriptions can be any length, but search engines generally will cut snippets longer than 160
characters, so it's generally wise to stay within this limit.
In the absence of meta descriptions, search engines will create the search snippet from other
elements of the page. For pages that target multiple keywords and topics, this is a perfectly valid
tactic.
Although these tags can have uses for search engine optimization, they are less critical to the
process, and so we'll leave it to Google's Webmaster Tools Help to answer in greater detail - Meta
Tags.
URLs, the web address for a particular document, are of great value from a search perspective.
They appear in multiple important locations.
Employ Empathy
Place yourself in the mind of a user and look at your URL. If you can
easily and accurately predict the content you'd expect to find on the
page, your URLs are appropriately descriptive. You don't need to
spell out every last detail in the URL, but a rough idea is a good
starting point.
Shorter is better
While a descriptive URL is important, minimizing length and trailing
slashes will make your URLs easier to copy and paste (into emails,
blog posts, text messages, etc) and will be fully visible in the search
results.
Go static
The best URLs are human readable without lots of parameters,
numbers and symbols. Using technologies like mod_rewrite for
Apache and ISAPI_rewrite for Microsoft, you can easily transform
Duplicate content is one of the most vexing and troublesome problems any website can face.
Over the past few years, search engines have cracked down on "thin" and duplicate content
through penalties and lower rankings.
Canonicalization happens when two or more duplicate versions of a webpage appear on
different URLs. This is very common with modern Content Management Systems. For example,
you might offer a regular version of a page and a "print optimized" version of the same content. Duplicate
content can even appear on multiple websites. For search engines, this presents a big problem:
which version of this content should they show to searchers? In SEO circles, this issue is often
referred to as duplicate content - described in greater detail here.
Instead, if the site owner took those three pages and 301-redirected them, the search engines would have only one,
stronger page to show in the listings from that site.
The Canonical URL tag attribute is similar in many ways to a 301 redirect from an SEO
perspective. In essence, you're telling the engines that multiple pages should be considered as one
(which a 301 does), without actually redirecting visitors to the new URL - often saving your
development staff considerable heartache.
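As a rough sketch of the idea, the code below reads the canonical declaration from each page and maps duplicate URLs onto a single preferred URL. The `<link rel="canonical" href="...">` markup is the standard form of the tag, though it is not shown in this guide; the example.com URLs, the `CanonicalFinder` class, and the `canonical_url` helper are all hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Find the first <link rel="canonical"> declaration in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_canonical = (attrs.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attrs.get("href")


def canonical_url(page_url, html):
    """Return the declared canonical URL, or the page's own URL if none is declared."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical or page_url


# Hypothetical regular and "print optimized" versions of the same article.
pages = {
    "http://www.example.com/article": "<html><head></head><body>Article</body></html>",
    "http://www.example.com/article?print=1": (
        '<html><head><link rel="canonical" href="http://www.example.com/article">'
        "</head><body>Article</body></html>"
    ),
}

consolidated = {url: canonical_url(url, html) for url, html in pages.items()}
print(consolidated)
```

Both URLs resolve to the same preferred address while visitors to the print version stay where they are, which is the contrast with a 301 redirect drawn above.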
For more about different types of duplicate content, this post by Dr. Pete deserves special
mention.
Rich Snippets
Ever see a 5 star rating in a search result? Chances are, the search
While the use of rich snippets and structured data is not a required
some circumstances.
(pictures.) There are several good resources for learning more about
<div>
SEO Conference<br/>
Event date:<br/>
May 8, 7:30pm
</div>
<div itemscope
itemtype="http://schema.org/Event">
<div itemprop="name">SEO Conference</div>
Event date:
Ask yourself...
Is the keyword relevant to your website's content? Will
searchers find what they are looking for on your site when they
search using these keywords? Will they be happy with what
Even the best estimates of value fall flat against the hands-
and also how hard it will be to rank for the given term. Are
wallet out!
Understanding the search demand curve is critical. To the right we've
included a sample keyword demand curve, illustrating the small
number of queries sending larger amounts of traffic alongside the
volume of less-searched terms and phrases that bring the bulk of our
search referrals.
Resources
Where do we get all of this knowledge about keyword demand and
keyword referrals? From research sources like these listed here:
Google Adwords Keyword Tool
Google Insights for Search
Google Trends Keyword Demand Prediction
Microsoft Advertising Intelligence
Wordtracker's Free Basic Keyword Demand
Google's AdWords Keyword tool is a common starting point for SEO
keyword research. It not only suggests keywords and provides
estimated search volume, but also predicts the cost of running paid
campaigns for these terms. To determine volume for a particular
keyword, be sure to set the Match Type to [Exact] and look under
Local Monthly Searches. Remember that these represent total
Crafting a thoughtful, empathetic user experience can ensure that your site is perceived positively
by those who visit, encouraging sharing, bookmarking, return visits and links - signals that trickle
down to the search engines and contribute to high rankings.
2. Machine Learning
In 2011 Google introduced the Panda Update to its ranking algorithm, significantly changing the
way it judged websites for quality. Google started by using human evaluators to manually rate
thousands of sites, searching for "low quality" content. Google then incorporated machine learning to
mimic the human evaluators. Once its computers could accurately predict what the humans would
judge a low quality site, the algorithm was introduced across millions of sites spanning the
Internet. The end result was a seismic shift which rearranged over 20% of all of Google's search
results. For more on the Panda update, some good resources can be found here and here.
3. Linking Patterns
The engines discovered early on that the link structure of the web could serve as a proxy for votes
and popularity - higher quality sites and information earned more links than their less useful,
lower quality peers. Today, link analysis algorithms have advanced considerably, but these
principles hold true.
All of that positive attention and excitement around the content offered by the
collection of links. The timing, source, anchor text, and number of links to the
new site are all factored into its potential performance (i.e., ranking) for
relevant queries at the engines.
Transactional Searches
Navigational Searches
Informational Searches
Fulfilling these intents is up to you. Creativity, high quality writing, use of examples, images, and
multimedia all help in crafting content that perfectly fits with a searcher's goals. Your reward is
satisfied searchers who demonstrate positive experience through engaged activity on your site or with
links to it.
For search engines that crawl the web, links are the streets
between pages. Using sophisticated link analysis, the engines can
discover how pages are related to each other and in what ways.
Since the late 1990s, search engines have used links as votes - representing the democracy of the
web's opinion about what pages are important and popular. The engines themselves have refined
the use of link data to a fine art, and complex algorithms create nuanced evaluations of sites and
pages based on this information.
Links aren't everything in SEO, but search professionals attribute a large portion of the engines'
algorithms to link-based factors (see Search Engine Ranking Factors). Through links, engines
can not only analyze the popularity of a website & page based on the number and popularity of
pages linking to them, but also metrics like trust, spam, and authority. Trustworthy sites tend to
link to other trusted sites, while spammy sites receive very few links from trusted sources (see
mozTrust). Authority models, like those postulated in the Hilltop Algorithm, suggest that links
are a very good way of identifying expert documents on a given subject.
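To give a feel for how "links as votes" can be turned into a score, here is a simplified sketch in the spirit of the published PageRank idea. It is not the algorithm any engine runs today, which is far more elaborate and not public. The damping value of 0.85 and the iteration count are assumptions chosen for illustration, and the four-page `graph` is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from high-scoring pages count for more."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: three pages link to "home"; nothing links to "blog".
graph = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
    "blog": ["home", "products"],
}

scores = pagerank(graph)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

The page that collects the most links from other pages ends up with the highest score, and the page nobody links to ends up with the lowest, which is the intuition behind popularity-based link metrics.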
Global Popularity
The more popular and important a site is, the more links from that
site matter. A site like Wikipedia has thousands of diverse sites
linking to it, which means it's probably a popular and important site.
To earn trust and authority with the engines, you'll need the help of
other link partners. The more popular, the better.
Local/Topic-Specific Popularity
The concept of "local" popularity, first pioneered by the Teoma
search engine, suggests that links from sites within a topic-specific
community matter more than links from general or off-topic sites.
For example, if your website sells dog houses, earning links from the
Society of Dog Breeders matters much more than earning links from
an off-topic, roller skating site.
Anchor Text
One of the strongest signals the engines use in rankings is anchor
text. If dozens of links point to a page with the right keywords, that
page has a very good probability of ranking well for the targeted
phrase in that anchor text. You can see examples of this in action with
searches like "click here", where many results rank solely due to the
anchor text of inbound links.
TrustRank
It's no surprise that the Internet contains massive amounts of spam.
Some estimate as much as 60% of the web's pages are spam. In order
to weed out this irrelevant content, search engines use systems for
measuring trust, many of which are based on the link graph. Earning
links from highly trusted domains can result in a significant boost to
this scoring metric. Universities, government websites and non-profit
organizations represent examples of high-trust domains.
Link Neighborhood
Spam links often go both ways. A website that links to spam is likely
spam itself, and in turn often has many spam sites linking back to it.
By looking at the totality of these links in aggregate, search engines
can understand the "link neighborhood" your website exists in. Thus,
it's wise to choose those sites you link to carefully and be equally
selective with the sites you attempt to earn links from.
Freshness
Link signals tend to decay over time. Sites that were once popular
often go stale, and eventually fail to earn new links. Thus, it's
important not only to earn links to your website, but also to continue
to earn additional links over time. Commonly referred to as
"FreshRank," search engines use the freshness signals of links to
judge current popularity and relevance.
Social Sharing
The last few years have seen an explosion in the amount of content
shared through social services such as Facebook, Twitter and
Google+. Although search engines treat socially shared links
differently than other types of links, they notice them nonetheless.
There is much debate among search professionals as to how exactly
search engines factor social link signals into their algorithm, but
there is no denying the rising importance of social channels.
Link building is an art. It's almost always the most challenging part of an SEO's job, but also the
one most critical to success. Link building requires creativity, hustle, and often, a budget. No two
link building campaigns are the same, and the way you choose to build links depends as much
upon your website as it does your personality. Below are three basic types of link acquisition.
As with any marketing activity, the first step in any link building campaign is the creation of goals
and strategies. Unfortunately, link building is one of the most difficult activities to measure.
Although the engines internally weigh each link with precise, mathematical metrics, it's impossible
for those on the outside to know this data.
SEOs rely on a number of signals to help build a rating scale of link value. Along with the data
from the link signals mentioned above, these metrics include the following:
One of the best ways to determine how well a search engine values a
given page is to search for some of the keywords and phrases that
page targets (particularly those in the title tag and headline). For
example, if you are trying to rank for the phrase "dog kennel",
earning links from pages that already rank for this phrase would help
significantly.
Competitor's Backlinks
the links that help them achieve this ranking. Using tools like Open
Site Explorer, SEOs can discover these links and target these
mozRank (mR) shows how popular a given web page is on the web.
Pages with high mozRank (popular) scores tend to rank better. The
more links to a given page, the more popular it becomes. Links from
unpopular websites.
getting linked-to by a page with few links is better than being linked-to
by the same page with many links on it (all other things being
Link building should never be solely about search engines. Links that
provide better search engine value for rankings, but also send
targeted, valuable visitors to your site (the basic goal of all Internet
It takes time, practice, and experience to build comfort with these variables as they
relate to search engine traffic. However, using your website's analytics, you should
be able to determine whether your campaign is successful.
Success comes when you see increases in search traffic, higher rankings, more
frequent search engine crawling, and increases in referring link traffic. If these
metrics do not rise after a link building campaign, it's possible you either
need to seek better quality link targets, or improve your on-page optimization.
If you have partners you work with regularly or loyal customers that love your brand, you
can use this to your advantage by sending out partnership badges - graphic icons that link
back to your site (like Google often does with their Adwords certification program). Just as
you'd get customers wearing your t-shirts or sporting your bumper stickers, links are the best
way to accomplish the same feat on the web. Check out this post on E-commerce links for
more.
This content and link building strategy is so popular and valuable that it's one of the few
recommended personally by the engineers at Google (source: USA Today & Stone
Temple). Blogs have the unique ability to contribute fresh material on a consistent basis,
participate in conversations across the web, and earn listings and links from other blogs,
including blogrolls and blog directories.
In the SEO world, we often call this "linkbait." Good examples might include David Mihm's
Local Search Ranking Factors, Compare the Meerkat, or the funny How Not To
humor to create a viral effect - users who see it once want to share it with friends, and
bloggers/tech-savvy webmasters who see it will often do so through links. These high quality,
editorially earned votes are invaluable to building trust, authority, and rankings potential
Be newsworthy.
Earning the attention of the press, bloggers and news media is an effective, time honored way
to earn links. Sometimes this is as simple as giving away something for free, releasing a
SEOs tend to use a lot of tools. Some of the most useful are provided by the
search engines themselves. Search engines want webmasters to create sites
and content in accessible ways, so they provide a variety of tools, analytics
and guidance. These free resources provide data points and opportunities
for exchanging information with the engines that are not provided
anywhere else.
Below we explain the common elements that each of the major search engines support and identify
why they are useful.
1. Sitemaps
Think of a sitemap as a list of files that give hints to the search
engines on how they can crawl your website. Sitemaps help search
engines find and classify content on your site that they may not have
found on their own. Sitemaps also come in a variety of formats and
can highlight many different types of content, including video,
images, news and mobile.
You can read the full details of the protocols at Sitemaps.org. In
addition, you can build your own sitemaps at XML-Sitemaps.com.
Sitemaps come in three varieties:
XML
Extensible Markup Language (Recommended Format)
This is the most widely accepted format for sitemaps. It is
extremely easy for search engines to parse and can be
produced by a plethora of sitemap generators. Additionally, it
allows for the most granular control of page parameters.
Relatively large file sizes. Since XML requires an open tag and
a close tag around each element, file sizes can get very large.
RSS
Txt
Text File
Extremely easy. The text sitemap format is one URL per line
up to 50,000 lines.
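As a rough illustration of the XML and text formats described above, the following Python sketch builds both from a list of URLs. The function names and the example URLs are hypothetical; the namespace is the one published in the Sitemaps.org protocol, and the 50,000-URL limit per file is applied to both formats.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Maximum number of URLs allowed in a single sitemap file.
MAX_URLS = 50000


def build_xml_sitemap(urls):
    """Return an XML sitemap (as a string) listing the given URLs."""
    if len(urls) > MAX_URLS:
        raise ValueError("a single sitemap file holds at most 50,000 URLs")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Each URL gets its own open and close tags, which is why XML
        # sitemaps grow large compared with the text format.
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_text_sitemap(urls):
    """Return a text sitemap: one URL per line."""
    if len(urls) > MAX_URLS:
        raise ValueError("a single sitemap file holds at most 50,000 URLs")
    return "\n".join(urls) + "\n"


# Hypothetical pages used only for illustration.
pages = ["http://www.example.com/", "http://www.example.com/about"]
xml_sitemap = build_xml_sitemap(pages)
text_sitemap = build_text_sitemap(pages)
print(xml_sitemap)
print(text_sitemap)
```

The XML version leaves room for optional per-URL elements, which is the "granular control" mentioned above; the text version trades that away for simplicity.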
2. Robots.txt
The robots.txt file, a product of the Robots Exclusion Protocol, is
a file stored on a website's root directory (e.g.,
www.google.com/robots.txt). The robots.txt file gives instructions to
automated web crawlers visiting your site, including search spiders.
By using robots.txt, webmasters can indicate to search engines which
areas of a site they would like to disallow bots from crawling as well
as indicate the locations of sitemap files and crawl-delay parameters.
You can read more details about this at the robots.txt Knowledge
Center page.
The following commands are available:
Disallow
Prevents compliant robots from accessing specific pages or folders.
Sitemap
Indicates the location of a website's sitemap or sitemaps.
Crawl Delay
Indicates how long (in seconds) a robot should wait between
successive requests to a server.
An Example of Robots.txt
# Robots.txt www.example.com/robots.txt
User-agent: *
Disallow:
# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
content.
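To see how a compliant robot might interpret rules like these, here is a minimal sketch using Python's standard urllib.robotparser module. The rules mirror the example above; the crawler name "friendlybot" and the page URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules modeled on the example above: everyone may crawl everything,
# except a crawler identifying itself as "spambot".
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: spambot
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

page = "http://www.example.com/page.html"  # hypothetical URL
normal_ok = parser.can_fetch("friendlybot", page)  # falls back to the "*" rules
spam_ok = parser.can_fetch("spambot", page)        # matches the spambot rules

print("friendlybot allowed:", normal_ok)
print("spambot allowed:", spam_ok)
```

Keep in mind that robots.txt is advisory: it only restrains robots that choose to comply.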
3. Meta Robots
The meta robots tag creates page-level instructions for search engine
bots.
The meta robots tag should be included in the head section of the
HTML document.
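As a sketch of how a crawler might read these page-level instructions, the Python below pulls the directives out of a meta robots tag using the standard html.parser module. The sample page and its "noindex, nofollow" value are assumptions chosen for illustration, not an example taken from this guide.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


# Hypothetical page: the tag sits in the head section, as described above.
SAMPLE_PAGE = """<html><head><title>Example</title>
<meta name="robots" content="NOINDEX, NOFOLLOW">
</head><body><h1>Hello World</h1></body></html>"""

meta_parser = MetaRobotsParser()
meta_parser.feed(SAMPLE_PAGE)
directives = meta_parser.directives
print(sorted(directives))
```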
4. Rel="Nofollow"
Remember how links act as votes? The rel="nofollow" attribute
allows you to link to a resource, while removing your "vote" for
search engine purposes. Literally, "nofollow" tells search engines not
to follow the link, but some engines still follow them for discovering
new pages. These links certainly pass less value (and in most cases no
juice) than their followed counterparts, but are useful in various
situations where you link to an untrusted source.
An Example of nofollow
<a href="http://www.example.com" title="Example"
rel="nofollow">Example Link</a>
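The following Python sketch shows one way a tool could separate followed links from nofollowed ones when auditing a page. The class name and the second link in the sample markup are hypothetical; how any particular engine treats each group is up to that engine.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Sorts a page's links into followed and nofollowed lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow external".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


SAMPLE = (
    '<p><a href="http://www.example.com" title="Example" '
    'rel="nofollow">Example Link</a> '
    '<a href="http://www.example.org">Trusted Link</a></p>'
)

classifier = LinkClassifier()
classifier.feed(SAMPLE)
print("followed:", classifier.followed)
print("nofollowed:", classifier.nofollowed)
```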
5. Rel="canonical"
Often, two or more copies of the exact same content appear on your
website under different URLs. For example, the following URLs can
http://www.example.com/default.asp
http://example.com/
http://example.com/default.asp
http://Example.com/Default.asp
To search engines, these appear as 5 separate pages. Because the
content is identical on each page, this can cause the search engines to
devalue the content and its potential rankings.
The canonical tag solves this problem by telling search robots which
page is the singular "authoritative" version which should count in
web results.
An Example of Rel="canonical" for the URL http://example.com/default.asp
<html>
<head>
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
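As a sketch of the idea, the Python below maps homepage variants like those listed above onto one authoritative URL and builds the matching canonical link element for the head section. The choice of http://www.example.com/ as the authoritative version, the treatment of default.asp as the homepage document, and the function names are all assumptions made for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, preferred_host="www.example.com",
                  default_docs=("default.asp",)):
    """Map a variant of a page's URL onto a single authoritative URL.

    Assumes every variant belongs to the same site, so the host is
    simply replaced with the preferred one.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    # Treat a default document name (in any letter case) as its directory.
    if last.lower() in default_docs:
        path = head + "/"
    # This sketch drops query strings and fragments entirely.
    return urlunsplit((parts.scheme, preferred_host, path, "", ""))


def canonical_link_tag(url):
    """Return the link element to place in the head section."""
    return '<link rel="canonical" href="%s" />' % escape(url, quote=True)


variants = [
    "http://www.example.com/default.asp",
    "http://example.com/",
    "http://example.com/default.asp",
    "http://Example.com/Default.asp",
]
canonical_set = {canonical_url(v) for v in variants}
tag = canonical_link_tag(canonical_url(variants[0]))
print(canonical_set)
print(tag)
```

Every variant collapses to the same URL, so each copy of the page can carry the same tag and point robots at one version.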
Settings
Geographic Target - If a given site targets users in a particular
location, webmasters can provide Google with information that will
help determine how that site appears in its country-specific search
results, and also improve Google search results for geographic
queries.
Preferred Domain - The preferred domain is the one that a
webmaster would like used to index their site's pages. If a webmaster
specifies a preferred domain as http://www.example.com and Google
finds a link to that site that is formatted as http://example.com,
Google will treat that link as if it were pointing at
http://www.example.com.
URL Parameters - You can indicate to Google information about
each parameter on your site, such as "sort=price" and
"sessionid=2". This helps Google crawl your site more efficiently,
ignoring those parameters that produce duplicate content and
increasing the number of unique pages Google can crawl on your site.
Crawl Rate - The crawl rate affects the speed of Googlebot's
Site Configuration
This important section allows you to submit sitemaps, test robots.txt
files, adjust sitelinks, and submit change of address requests when
you move your website from one domain to another. This area also
contains the "Settings" and "URL parameters" sections discussed in
the previous column.
+1 Metrics
When users share your content on Google+ with the +1 button, this
Diagnostics
Labs
The Labs section of Webmaster Tools contains reports that Google
considers still in the experimental stage, but important to
Key Features
Sites Overview - This interface provides a single overview of all
your websites' performance in Bing powered search results. Metrics
at a glance include clicks, impressions, pages indexed and number of
pages crawled for each site.
Crawl Stats - Here you can view reports on how many pages of your
site Bing has crawled and discover any errors encountered. As in
Google Webmaster Tools, you can also submit sitemaps to help Bing
discover and prioritize your content.
Index - This section allows webmasters to view and help control how
Bing indexes their web pages. Again, similar to settings in Google
Webmaster Tools, here you can explore how your content is
organized within Bing, submit URLs, remove URLs from search
results, explore inbound links and adjust parameter settings.
Traffic - The traffic summary in Bing Webmaster reports
impressions and click-through data by combining data from both
Bing and Yahoo search results. Reports here show average position as
well as cost estimates if you were to buy ads targeting each keyword.
Features
Identify Powerful Links - Open Site Explorer sorts all of your
inbound links by metrics that help you determine which links
are most important.
Find the Strongest Linking Domains - This tool shows you the
strongest domains linking to your domain.
Analyze Link Anchor Text Distribution - Open Site Explorer
shows you the distribution of the text people used when linking to
you.
Head to Head Comparison View - This feature allows you to
compare two websites to see why one is outranking the other.
Social Share Metrics - Measure Facebook Shares, Likes, Tweets,
and +1's for any URL.
Search engines have only recently started providing better tools to help webmasters improve their
search results. This is a big step forward in SEO and the webmaster/search engine relationship. That
said, the engines can only go so far with helping webmasters. It is true today, and will likely be true in
the future, that the ultimate responsibility for SEO is on the marketers and webmasters.
It is for this reason that learning SEO for yourself is so important.
In classical SEO times (the late 1990s), search engines had "submission" forms that were part of
the optimization process. Webmasters & site owners would tag their sites & pages with keyword
information, and "submit" them to the engines. Soon after submission, a bot would crawl and
include those resources in their index. Simple SEO!
Unfortunately, this process didn't scale very well, the submissions were often spam, and the
practice eventually gave way to purely crawl-based engines. Since 2001, not only has search engine
submission not been required, it has become virtually useless. The engines all publicly note that
they rarely use "submission" URLs, and that the best practice is to earn links from other sites. This
will expose your content to the engines naturally.
You can still sometimes find submission pages (here's one for Bing), but these are remnants of
a time long past, and are essentially useless to the practice of modern SEO. If you hear a pitch from
an SEO offering "search engine submission" services, run, don't walk, to a real SEO. Even if the
engines used the submission service to crawl your site, you'd be unlikely to earn enough "link
juice" to be included in their indices or rank competitively for search queries.
Once upon a time, much like search engine submission, meta tags (in
particular, the meta keywords tag) were an important part of the SEO
process. You would include the keywords you wanted your site to
rank for and when users typed in those terms, your page could come
up in a query. This process was quickly spammed to death, and
eventually dropped by all the major engines as an important ranking
signal.
It is true that other tags, namely the title tag (not strictly a meta tag,
but often grouped with them) and meta description tag (covered
previously in this guide), are of critical importance to SEO
best practices. Additionally, the meta robots tag is an important
tool for controlling spider access. However, SEO is not "all about
meta tags", at least, not anymore.
Ever see a page that just looks spammy? Perhaps something like:
"Bob's cheap Seattle plumber is the best cheap Seattle plumber for all
your plumbing needs. Contact a cheap Seattle plumber before it's too
late"
Not surprisingly, a persistent myth in SEO revolves around the
concept that keyword density - a mathematical formula that divides
the number of instances of a given keyword by the total number of
words on a page - is used by the search engines for relevancy & ranking
calculations.
Despite being proven untrue time and again, this myth has legs.
Many SEO tools still feed on the concept that keyword density is an
important metric. It's not. Ignore it and use keywords intelligently
and with usability in mind. The value from an extra 10 instances of
your keyword on the page is far less than earning one good editorial
link from a source that doesn't think you're a search spammer.
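For readers who run into this metric in SEO tools, the Python sketch below shows how keyword density is typically computed, using the spammy plumber copy quoted above. The tokenization rule and the function name are assumptions, and tools differ in how they count multi-word phrases; none of this changes the advice to ignore the number.

```python
import re


def keyword_density(text, phrase):
    """Occurrences of the phrase divided by the total number of words."""
    # Assumed tokenization: runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    if not words or n == 0:
        return 0.0
    # Count every position where the phrase appears word for word.
    hits = sum(
        1
        for i in range(len(words) - n + 1)
        if words[i:i + n] == phrase_words
    )
    return hits / len(words)


SPAMMY_COPY = (
    "Bob's cheap Seattle plumber is the best cheap Seattle plumber for all "
    "your plumbing needs. Contact a cheap Seattle plumber before it's too late"
)

density = keyword_density(SPAMMY_COPY, "cheap Seattle plumber")
print("keyword density: {:.1%}".format(density))
```

A high value here describes repetition, not relevance, which is why the copy above reads as spam.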
Put on your tin foil hats, it's time for the most common SEO conspiracy theory: spending on search
engine advertising (PPC) improves your organic SEO rankings.
In all of the experiences we've ever witnessed or heard about, this has never been proven nor has it
ever been a probable explanation for effects in the organic results. Google, Yahoo! & Bing all have
very effective walls in their organizations to prevent precisely this type of crossover.
At Google in particular, advertisers spending tens of millions of dollars each month have noted
that even they cannot get special access or consideration from the search quality or web spam
teams. So long as the existing barriers are in place and the search engines' cultures maintain their
separation, we believe that this will remain a myth. That said, we have seen anecdotal evidence
that bidding on keywords you already organically rank for can help increase your organic
click-through rate.
As long as there is search, there will always be spam. The practice of spamming the search engines
- creating pages and schemes designed to artificially inflate rankings or abuse the ranking
algorithms employed to sort content - has been rising since the mid-1990s.
With payouts so high (at one point, a fellow SEO noted to us that a single day ranking atop
Google's search results for the query "buy viagra" could bring upwards of $20,000 in affiliate
revenue), it's little wonder that manipulating the engines is such a popular activity on the web.
However, it's become increasingly difficult and, in our opinion, less and less worthwhile for two
reasons.
2. Smarter Engines
Search engines have done a remarkable job identifying scalable, intelligent
methodologies for fighting spam manipulation, making it dramatically more difficult
to adversely impact their intended algorithms. Complex concepts like TrustRank
(which SEOmoz's Linkscape index leverages), HITS, statistical analysis, historical
data and more have all driven down the value of search spam and made so-called
"white hat" tactics (those that don't violate the search engines' guidelines) far more
attractive.
More recently, Google's Panda update introduced sophisticated machine learning
algorithms to combat spam and low value pages at a scale never before witnessed
online. If the search engines' job is to deliver quality results, they have raised the bar
year after year.
This guide is not intended to show off specific spam tactics, but, due to the large
number of sites that get penalized, banned or flagged and seek help, we will cover the
various factors the engines use to identify spam so as to help SEO practitioners avoid
problems. For additional details about spam from the engines, see Google's
Webmaster Guidelines and Bing's Webmaster FAQs (pdf).
The important thing to remember is this: Not only do manipulative techniques not
help you in most cases, but often times they cause search engines to impose penalties
on your site.
Search engines perform spam analysis across individual pages and entire websites (domains).
We'll look first at how they evaluate manipulative practices on the URL level.
One of the most obvious and unfortunate spamming techniques, keyword stuffing, involves
littering repetitions of keyword terms or phrases into a page in order to make it appear more
relevant to the search engines. The thought behind this - that increasing the number of times a
term is mentioned can considerably boost a page's ranking - is generally false. Studies looking at
thousands of the top search results across different queries have found that keyword repetitions
play an extremely limited role in boosting rankings, and have a low overall correlation with top
placement.
The engines have very obvious and effective ways of fighting this. Scanning a page for stuffed
keywords is not massively challenging, and the engines' algorithms are all up to the task. You can
read more about this practice, and Google's views on the subject, in a blog post from the head of
their web spam team - SEO Tip: Avoid Keyword Stuffing.
One of the most popular forms of web spam, manipulative link acquisition relies on the search
engines' use of link popularity in their ranking algorithms to attempt to artificially inflate these
metrics and improve visibility. This is one of the most difficult forms of spamming for the search
engines to overcome because it can come in so many forms. A few of the many ways manipulative
links can appear include:
Reciprocal link exchange programs, wherein sites create link pages that point back and forth
to one another in an attempt to inflate link popularity. The engines are very good at spotting
and devaluing these as they fit a very particular pattern.
Link schemes, including "link farms" and "link networks" where fake or low value websites
are built or maintained purely as link sources to artificially inflate popularity. The engines
combat these through numerous methods of detecting connections between site registrations,
link overlap or other common factors.
Paid links, where those seeking to earn higher rankings buy links from sites and pages willing
to place a link in exchange for funds. These sometimes evolve into larger networks of link
buyers and sellers, and although the engines work hard to stop them (and Google in
particular has taken dramatic actions), they persist in providing value to many buyers &
sellers (see this post on paid links for more on that perspective).
Low quality directory links are a frequent source of manipulation for many in the SEO field.
A large number of pay-for-placement web directories exist to serve this market and pass
themselves off as legitimate with varying degrees of success. Google often takes action
against these sites by removing the PageRank score from the toolbar (or reducing it
dramatically), but won't do this in all cases.
There are many more manipulative link building tactics that the search engines have identified
and, in most cases, found algorithmic methods for reducing their impact. As new spam systems
emerge, engineers will continue to fight them with targeted algorithms, human reviews and the
collection of spam reports from webmasters & SEOs.
A basic tenet of all the search engine guidelines is to show the same content to the engine's
crawlers that you'd show to an ordinary visitor. This means, among other things, not to hide text in
the HTML code of your website that a normal visitor can't see.
When this guideline is broken, the engines call it "cloaking" and take action to prevent these pages
from ranking in their results. Cloaking can be accomplished in any number of ways and for a
variety of reasons, both positive and negative. In some cases, the engines may let practices
that are technically "cloaking" pass, as they're done for positive user experience
reasons. For more on the subject of cloaking and the levels of risk associated with various tactics
and intents, see this post, White Hat Cloaking, from Rand Fishkin.
Although it may not technically be considered "web spam," the engines all have methods to
determine if a page provides unique content and "value" to its searchers before including it in their
web indices and search results. The most commonly filtered types of pages are "thin" affiliate
content, duplicate content, and dynamically generated content pages that provide very little
unique text or value. The engines are against including these pages and use a variety of content
and link analysis algorithms to filter out "low value" pages from appearing in the results.
Google's 2011 Panda update took the most aggressive steps ever seen in reducing low quality
content across the web, and Google continues to update this process.
In addition to watching individual pages for spam, engines can also identify traits and properties
across entire root domains or subdomains that could flag them as spam. Obviously, excluding
entire domains is tricky business, but it's also much more practical in cases where greater
scalability is required.
Just as with individual pages, the engines can monitor the kinds of links and quality of referrals
sent to a website. Sites that are clearly engaging in the manipulative activities described above in a
consistent or seriously impacting way may see their search traffic suffer, or even have their sites
banned from the index. You can read about some examples of this from past posts - Widgetbait
Gone Wild or the more recent coverage of the JC Penney Google penalty.
Websites that earn trusted status are often treated differently from
those that have not. In fact, many SEOs have commented on the
"double standards" that exist for judging "big brand" and high
importance sites vs. newer, independent sites. For the search engines,
trust most likely has a lot to do with the links your domain has
were to post that same content to a page on Wikipedia and get those
same spammy links to point to that URL, it would likely still rank
tremendously well - such is the power of domain trust & authority.
Trust built through links is also a great method for the engines to
employ. A little duplicate content and a few suspicious links are far
more likely to be overlooked if your site has earned hundreds of links
from high quality, editorial sources like CNN.com or Cornell.edu. On
the flip side, if you have yet to earn high quality links, judgments may
be far stricter from an algorithmic view.
"back" button on their browser, and try another result. This indicates
that the result they served didn't meet the user's query.
It's not enough just to rank for a query. Once you've earned your
ranking, you have to prove it over and over again.
It can be tough to know if your site/page actually has a penalty or if things have changed, either in
the search engines' algorithms or on your site that negatively impacted rankings or inclusion.
Before you assume a penalty, check for the following:
Once you've ruled out the list below, follow the flowchart beneath for more specific
advice.
Errors
Errors on your site that may have inhibited or prevented crawling. Google's Webmaster Tools
is a good, free place to start.
Changes
Changes to your site or pages that may have changed the way search engines view your content.
(on-page changes, internal link structure changes, content moves, etc.)
Similarity
Sites that share similar backlink profiles, and whether they've also lost rankings - when the
engines update ranking algorithms, link valuation and importance can shift, causing ranking
movements.
Duplicate Content
Modern websites are rife with duplicate content problems, especially when they scale to large size.
Check out this post on duplicate content to identify common problems.
While this chart's process won't work for every situation, the logic has been uncanny in helping us identify spam penalties or
mistaken flagging for spam by the engines and separating those from basic ranking drops. This page from Google (and the
embedded YouTube video) may also provide value on this topic.
The task of requesting re-consideration or re-inclusion in the engines is painful and often
unsuccessful. It's also rarely accompanied by any feedback to let you know what happened or why.
However, it is important to know what to do in the event of a penalty or banning.
Hence, the following recommendations:
Remove/fix everything you can. If you've acquired bad
If you haven't already, register your site with the engine's
Webmaster Tools service (Google's and Bing's). This
Be aware that with the search engines, lifting a penalty is not their obligation or responsibility.
Legally, they have the right to include or reject any site/page for any reason. Inclusion is a privilege,
not a right, so be cautious and don't apply techniques you're unsure or skeptical of - or you could find
yourself in a very rough spot.
They say that if you can measure it, then you can improve it. In search
engine optimization, measurement is critical to success. Professional SEOs
track data about rankings, referrals, links and more to help analyze their
SEO strategy and create road maps for success.
Although every business is unique and every website has different metrics that matter, the
following list is nearly universal. Note that we're only covering those metrics critical to SEO - optimizing for the search engines. As a result, more general metrics may not be included. For a
more comprehensive look at web analytics, check out Choosing Web Analytics Key
Performance Indicators from Avinash Kaushik's excellent Web Analytics Blog.
Three major engines make up 95%+ of all search traffic in the US - Google and the Yahoo-Bing alliance. For most countries outside the US,
80%+ of search traffic comes solely from Google (with a few notable
exceptions including both Russia and China). Measuring the
contribution of your search traffic from each engine is critical for
several reasons:
The keywords that send traffic are another important piece of your
analytics pie. You'll want to keep track of these on a regular basis to
help identify new trends in keyword demand, gauge your
performance on key terms and find terms that are bringing
significant traffic that you're potentially under optimized for.
You may also find value in tracking search referral counts for terms
outside the "top" terms/phrases - those that are important and
valuable to your business. If the trend lines are pointing in the wrong
direction, you know efforts need to be undertaken to course correct.
Search traffic worldwide has consistently risen over the past 15 years,
so a decline in quantity of referrals is troubling - check for seasonality
issues (keywords that are only in demand certain times of the
week/month/year) and rankings (have you dropped, or has search
volume ebbed?).
When it comes to the bottom line for your organization, few metrics
matter as much as conversion. For example, in the graphic to the
right, 5.80% of visitors who reached SEOmoz with the query "SEO
Tools" signed up to become members during that visit. This is a much
higher conversion rate than most of the 1000s of keywords used to
How this will eventually play out is anyone's guess. In the meantime, smart SEOs and web analytics experts have devised workarounds to
try and recover some of this missing keyword data, although nothing can substitute for the real thing. Read more about dealing with (not
provided) keywords in this blog post.
Analytics Software
The Right Tools for the Job
Omniture
Fireclick
Mint
Sawmill Analytics
Clicktale
Coremetrics
Unica Affinium NetInsight
Additional Reading:
Analytics. Because of its broad adoption you can find many tutorials
and guides available online. Google Analytics also has the advantage
No matter which analytics software you decide is right for you, we also strongly recommend testing different versions of pages on
your site and making conversion rate improvements based on the results. Testing pages on your site can be as simple as using a
free tool to test two versions of a page header or as complex as using expensive multivariate software to simultaneously test
hundreds of variants of a page. There are many testing platforms out there, but if you're looking to put a first toe in the testing
waters, one free, easy to use solution we recommend is Google's Website Optimizer. It's a great way to get started running
tests that can inform powerful conversion rate improvements.
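When comparing two versions of a page, it helps to check whether the difference in conversion rate is larger than chance alone would produce. The Python sketch below applies a standard two-proportion z-test to hypothetical visit and conversion counts; it illustrates the kind of calculation involved and is not a description of how any particular testing tool works.

```python
import math


def two_proportion_z_test(conv_a, visits_a, conv_b, visits_b):
    """Compare the conversion rates of page versions A and B.

    Returns (rate_a, rate_b, z, p_value), where p_value is two-sided.
    """
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    # Pooled rate under the assumption that both versions convert equally.
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0:
        return rate_a, rate_b, 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical results: 1,000 visits to each version of a page header.
rate_a, rate_b, z, p_value = two_proportion_z_test(50, 1000, 58, 1000)
print("A: {:.1%}  B: {:.1%}  z = {:.2f}  p = {:.2f}".format(
    rate_a, rate_b, z, p_value))
```

With these made-up counts the p-value is well above the conventional 0.05 threshold, so the apparent lift for version B would not yet justify a change; a longer test with more visits would be needed.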
Google Trends
Available at Google.com/Trends - this shows keyword search
volume/popularity data over time. If you're logged into your Google
account, you can also get specific numbers on the charts, rather than
just trend lines.
Bing IP Query
e.g., ip:216.176.191.233 - this query will show pages that
Microsoft's engine has found on the given IP address. This can be
useful in identifying shared hosting and seeing what other sites are
hosted on a given IP address.
Microsoft Ad Intelligence
Available at Microsoft Advertising - a great variety of keyword
research and audience intelligence tools are provided by Microsoft,
primarily for search and display advertising. This guide won't dive
deep into the value of each individual tool, but they are worth
page to rank well, regardless of its content. The higher the Page
Authority, the greater the potential for that individual page to rank.
query) that are used by the search engines (e.g., Google's PageRank
or FAST's StaticRank). Search engines often rank pages with higher
This metric uses the same algorithm as mozRank but applies it to the
domain-level link graph. (A view of the web that only looks at
links from the seed set are then able to cast (lesser) trust-votes
through their links. This process continues across the web and the
# of Links - The total number of pages that contain at least one link
to this page. For example, if the Library of Congress homepage
(http://www.loc.gov/index.html) linked to the White House's
homepage (http://www.whitehouse.gov) in both the page content
and the footer, this would still be counted as only a single link.
# of Links - the quantity of pages that contain at least one link to the
domain. For example, if http://www.loc.gov/index.html and
http://www.loc.gov/about both contained links to
http://www.nasa.gov, this would count as two links to the domain.
that contain at least one page with a link to any page on this site. For
example, if http://www.loc.gov/index.html and
domain to nasa.gov.
Fluctuation in Search Engine Page and Link Count Numbers
The numbers reported in "site:" and "link:" queries are rarely precise,
and thus we strongly recommend not getting too worried about
fluctuations showing massive increases or decreases unless they are
accompanied by traffic drops. For example, on any given day, Yahoo!
reports between 800,000 and 2 million links to the SEOmoz.org
domain. Obviously, we don't gain or lose hundreds of thousands of
links each day, but the variability of Yahoo!'s indices means that
these reported numbers provide little guidance about our actual link
growth or shrinkage.
If you do see significant drops in links or pages indexed accompanied
by similar traffic referral drops from the search engines, you may be
experiencing a real loss of link juice (check to see if important links
that were previously sending traffic/rankings boosts still exist) or a
loss of indexation due to penalties, hacking, malware, etc. A thorough
analysis using your own web analytics and Google's Webmaster
Tools can help to identify potential problems.
Falling Search Traffic from a Single Engine
Falling Search Traffic from Multiple Engines
Chances are good that you've done something on your site to block crawlers or stop indexation.
This could be something in the robots.txt or meta robots tags, a problem with hosting/uptime, a
DNS resolution issue or a number of other technical breakdowns. Talk to your system
administrator, developers and/or hosting provider and carefully review your Webmaster Tools
accounts and analytics to help determine potential causes.
Individual Ranking Fluctuations
Gaining or losing rankings for a particular term/phrase or even several happens millions of times a
day to millions of pages and is generally nothing to be concerned about. Ranking algorithms
fluctuate, competitors gain and lose links (and on-page optimization tactics) and search engines
even flux between indices (and may sometimes even make mistakes in their crawling, inclusion or
ranking processes). When a dramatic rankings decrease occurs, you might want to carefully review
on-page elements for any signs of over-optimization or violation of guidelines (cloaking, keyword
stuffing, etc.) and check to see if links have recently been gained or lost. Note that with sudden
spikes in rankings for new content, a temporary period of high visibility followed by a dramatic
drop is common (in the SEO field, we refer to this as the "freshness boost").
Positive Increases in Link Metrics Without Rankings Increases
Many site owners assume that once they've done some "classic" SEO - on-page optimization, link
acquisition, etc. - they can expect instant results. This, sadly, is not the case. Particularly for new
sites, pages and content that's competing in very difficult results, rankings take time and even
earning lots of great links is not a sure recipe to instantly reach the top. Remember that the
engines need to not only crawl all those pages where you've acquired links, but index and process
them - given the almost certain use of delta indices by the engines to help with freshness, the
metrics and rankings you're seeking may be days or even weeks behind the progress you've made.
Contributors
We would like to extend a very heartfelt thank you to all of the people who contributed to this guide:
Urban Influence
Linda Jenkinson
Tom Critchlow
Will Critchlow
Dr. Pete
Hamlet Batista
chuckallied
lorisa
Optomo
identity
Pat Sexton
SeoCatfish
David LaFerney
Kimber
g1smd
Steph Woods
robbothan
RandyP
bookworm seo
Rafi Kaufman
Sam Niccolls
Danny Dover
Cyrus Shepard
Sha Menz
Casey Henry
and Rand Fishkin