Web Services

Search Engine Optimisation

What is a Search Engine?

A software program that collects information from computer systems, indexes it, and gives some of it back when you ask (submit a query). What you get back depends on:

  • How the search engine software works
  • What has been put into the search engine
  • Certain characteristics of your documents (many of which you can control)

How Does a Search Engine Work?

Collection of information:

  • The software 'crawls' with a 'spider' or 'bot', which:
    -  Collects information by following links and/or directories and finding files
  • You can control what is collected from your site; this depends on:
    -  What the crawler knows is available: some engines have to be told!
    -  Site maps or custom XML documents: you can build your own XML sitemap to suit Google
    -  Links to files from other files
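As a minimal sketch of building your own sitemap, the Python below writes an XML file in the sitemaps.org 0.9 format: a urlset containing one url entry per page, each with a loc and an optional lastmod date. The page URLs and dates are hypothetical examples; check the sitemaps.org protocol and Google's sitemap guidelines before publishing a real one.

```python
# Minimal sketch: build an XML sitemap following the sitemaps.org 0.9 protocol.
# The page URLs and dates used below are hypothetical examples.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of (url, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if lastmod:  # <lastmod> is optional in the protocol
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = [
        ("https://www.example.edu/", "2024-01-15"),
        ("https://www.example.edu/courses/web-services.html", None),
    ]
    print(build_sitemap(pages))
```

The resulting file is usually saved as sitemap.xml at the site root and then submitted to, or referenced for, the search engine.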

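NOTE: you can exclude search engines from all or part of your site, eg. with a robots.txt file. This is very important for sensitive information, but exclusion only asks well-behaved crawlers to stay away; truly sensitive content also needs access control.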
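A crawler that respects exclusions reads the site's robots.txt before fetching pages. This sketch uses Python's standard urllib.robotparser module with a hypothetical robots.txt to show which URLs a compliant bot would skip. Keep in mind that robots.txt is publicly readable and does not hide or protect anything by itself.

```python
# Sketch: how a well-behaved crawler checks robots.txt before fetching a page.
# The robots.txt rules and URLs below are hypothetical examples.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /staff/private/
Disallow: /drafts/
"""


def make_parser(robots_txt):
    """Build a RobotFileParser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for url in ["https://www.example.edu/courses/",
                "https://www.example.edu/staff/private/salaries.pdf"]:
        # can_fetch() applies the rules for the given user agent to the URL path
        print(url, "->", "crawl" if rp.can_fetch("Googlebot", url) else "skip")
```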

Building an index, which is a reference to:

  • Some of the words in a file and where in the file they are
  • Some words are left out (stop words), eg. the, a, an
  • Important parts of a page
  • You control what is in the important parts of a page:
    -  Page title – most important
    -  Link text – use concrete, informative words
    -  Headings, eg. H1, H2
    -  Body of the document, with important content higher up the page – you control the order of information
    -  Less important: metadata. Larger search engines prefer other page parts, although they may use metadata in the absence of any other usable text
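The index described above can be pictured as an 'inverted index' that maps each word to the places it occurs. This minimal sketch assumes plain-text input and a tiny, hypothetical stop-word list; real engines use their own lists and also record which page part (title, heading, body) each word came from.

```python
# Minimal inverted index sketch: word -> list of (document, position).
# The stop-word list is a small hypothetical sample, not any engine's real list.
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps a document name to its text; returns the inverted index."""
    index = defaultdict(list)
    for name, text in docs.items():
        # Positions count every word, so they reflect where the word sits in the file
        for position, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue  # stop words are left out of the index
            index[word].append((name, position))
    return dict(index)
```

For example, indexing "How to enrol in a unit" records "unit" at position 5 but stores nothing for "to", "in" or "a".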

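NOTE: There may be a limit on how much of each file is indexed:

  • 101 KB for HTML and 120 KB for PDF in Google (figures at the time of writing; Google's current documentation states that Googlebot processes the first 15 MB of a file)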
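Because only the first part of a large file may be indexed, it can be worth estimating how much of a page falls inside a limit. This sketch assumes 'KB' means 1024 bytes and measures the UTF-8 encoded size; the limit is a parameter, so pass whatever figure currently applies.

```python
# Sketch: estimate what fraction of a page falls inside an indexing size limit.
# Assumes 1 KB = 1024 bytes; 101 KB is the default only because the notes quote it.
def indexed_fraction(content: str, limit_kb: float = 101) -> float:
    """Return the fraction (0..1) of the UTF-8 encoded content within the limit."""
    size = len(content.encode("utf-8"))
    if size == 0:
        return 1.0
    limit_bytes = limit_kb * 1024
    return min(1.0, limit_bytes / size)


if __name__ == "__main__":
    page = "<p>" + "x" * 200_000 + "</p>"
    print(f"{indexed_fraction(page):.0%} of this page would be indexed")
```

Anything beyond the limit is simply not seen, which is another reason to keep important content near the top of the page.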

Ranking or weighting with algorithms:

  • A set of rules to decide which parts of a page and which kinds of words are most important
  • These differ from one engine to another
  • Most use the location/frequency method; the most important words are:
    -  Near the top of the page, in 'important' page parts
    -  Mentioned first and more than once, but not TOO often
    -  You control these
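To make the location/frequency idea concrete, here is a hypothetical scoring function for one search word on one page. The weights for each page part, the bonus for appearing early, and the cap on counted repetitions are invented for illustration; every engine uses its own (unpublished) values.

```python
# Sketch of a location/frequency score for one search word on one page.
# PART_WEIGHTS and MAX_COUNTED are hypothetical values chosen for illustration.
import re

PART_WEIGHTS = {"title": 5.0, "heading": 3.0, "link": 2.0, "body": 1.0}
MAX_COUNTED = 3  # occurrences beyond this add nothing to the score


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def location_frequency_score(word, page_parts):
    """page_parts maps a part name (eg. 'title', 'body') to its text."""
    word = word.lower()
    score = 0.0
    for part, text in page_parts.items():
        hits = [i for i, t in enumerate(words(text)) if t == word]
        if not hits:
            continue
        weight = PART_WEIGHTS.get(part, 1.0)
        # Frequency: count occurrences, but only up to MAX_COUNTED
        score += weight * min(len(hits), MAX_COUNTED)
        # Location: a bonus that shrinks the later the first occurrence appears
        score += weight / (1 + hits[0])
    return score
```

With these settings a page that has 'parking' in its title outscores one that mentions it once near the end of the body, and repeating a word 50 times scores no better than repeating it 3 times.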

Processing queries and display of results:

  • Queries and results:
    -  Search engines break down the query and return results in order of rank
  • Page-ranking engines, eg. UTAS search engine:
    -  Results based on presence of search words in 'important' parts of a page
    -  You control the important page parts in your site
  • Site-ranking engines, eg. Google:
    -  Results based on presence of 'important' words in 'important' sites with lots of inbound links
  • Or money:
    -  You can always pay for better placement, eg. Google, Yahoo
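Site-ranking by inbound links is often explained with the classic PageRank algorithm, in which each page passes a share of its own importance to every page it links to. The power-iteration sketch below is that textbook model with the commonly quoted damping factor of 0.85 and a hypothetical link graph; Google's actual ranking combines many other signals.

```python
# Sketch: textbook PageRank by power iteration (a simplified model only).
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its rank over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank
```

In a graph where three pages link to one 'hub' page, the hub ends up with the highest rank, which is the intuition behind 'important sites with lots of inbound links'.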

Some Ways to Get Excluded From Most Search Engines:

Google penalises most of these, particularly:

  • Anchor text spam (using the same few words as the link text in lots of inbound links)
  • Too many trivial changes to content
  • Certain parameters in dynamic URLs eg. &id=
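A quick self-audit for the anchor-text and dynamic-URL risks could look like this sketch. It assumes you already have a list of the link texts used in your inbound links (eg. from a backlink report) and flags any phrase that makes up more than half of them; that threshold is a hypothetical value, not a published Google rule.

```python
# Sketch: flag repetitive inbound anchor text and 'id' parameters in URLs.
# The 0.5 threshold is a hypothetical value chosen for illustration.
from collections import Counter
from urllib.parse import urlparse, parse_qs


def dominant_anchor_texts(anchor_texts, threshold=0.5):
    """Return anchor texts that make up more than `threshold` of all inbound links."""
    # Normalise case and whitespace so near-identical texts are counted together
    normalised = [" ".join(t.lower().split()) for t in anchor_texts]
    counts = Counter(normalised)
    total = len(normalised)
    return [text for text, n in counts.items() if n / total > threshold]


def has_id_parameter(url):
    """True if the URL's query string contains an 'id' parameter (even if empty)."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return "id" in query
```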

Some sites may not be excluded but instead filtered by 'safe' search options, eg. SafeSearch in Google filters adult content

Write Searchable HTML:

  • Write a unique page title relevant to the page contents – MOST IMPORTANT
  • Add some metadata, particularly the description and keywords meta tags (Google ignores the keywords tag, but other engines may use it)
  • Use structural elements properly eg.:
    -  Headings, H1, H2
  • Use descriptive plain-text link text, not bare URLs
  • Put keyword-rich text near the top of the page
    -  Use the inverted pyramid writing style, starting with the conclusion
  • Add keyword-rich alt text to images (where the image relates to the content and the alt text still makes sense to screen-reader users)
  • Prune your pages to remove irrelevant content; this moves important content closer to the top of the page!
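Several of these points can be checked automatically. This sketch uses Python's standard html.parser module to flag a missing title, a missing meta description, no H1, images with no alt attribute, and links whose text is a bare URL. The checks are illustrative assumptions, not what any search engine actually runs; images with an empty alt (decorative images) are deliberately not flagged.

```python
# Sketch: a small searchability checker built on the standard library HTMLParser.
from html.parser import HTMLParser


class SearchabilityChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.has_description = False
        self.h1_count = 0
        self.images_without_alt = 0
        self.url_link_texts = []
        self._in_title = False
        self._in_link = False
        self._link_text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.has_description = bool(attrs.get("content"))
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and "alt" not in attrs:
            self.images_without_alt += 1  # no alt attribute at all
        elif tag == "a":
            self._in_link, self._link_text = True, ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._in_link:
            text = self._link_text.strip()
            if text.startswith(("http://", "https://", "www.")):
                self.url_link_texts.append(text)
            self._in_link = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_link:
            self._link_text += data


def audit(html):
    """Return a list of human-readable problems found in the HTML."""
    checker = SearchabilityChecker()
    checker.feed(html)
    checker.close()
    problems = []
    if not checker.title.strip():
        problems.append("missing page title")
    if not checker.has_description:
        problems.append("missing meta description")
    if checker.h1_count == 0:
        problems.append("no H1 heading")
    if checker.images_without_alt:
        problems.append(f"{checker.images_without_alt} image(s) without alt text")
    for text in checker.url_link_texts:
        problems.append(f"link text is a bare URL: {text}")
    return problems
```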

Write Searchable Documents:

  • Fill out the document 'properties', eg. title and subject: this text may be displayed in search results
  • Use structural elements properly in Word and PDF documents, eg.:
    -  Headings
    -  Table of Contents
  • Split large documents into sections:
    -  Get more content indexed
    -  Documents will load faster
  • Provide alt text for images:
    -  Adds more 'indexable' text to your page
    -  If done well, improves accessibility for people with disabilities
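Splitting a long document is easiest along its existing headings. This sketch works on plain text extracted from a document and assumes, hypothetically, that heading lines start with "# "; each returned (heading, text) pair could then become its own page or file.

```python
# Sketch: split plain text into (heading, body) sections at heading lines.
# The "# " heading marker is an assumption for this example.
def split_at_headings(text, marker="# "):
    """Return (heading, body) pairs; heading is None for text before the first heading."""
    sections = []
    heading, body = None, []

    def flush():
        # Save the current section unless it is completely empty
        if heading is not None or any(line.strip() for line in body):
            sections.append((heading, "\n".join(body).strip()))

    for line in text.splitlines():
        if line.startswith(marker):
            flush()
            heading, body = line[len(marker):].strip(), []
        else:
            body.append(line)
    flush()
    return sections
```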

How to Test the Searchability of Your Documents:

Using the UTAS search engine:

  • Search using 'common language' terms (what do your clients call what you do?)
  • See where your document is ranked in results
  • Examine higher ranking documents, looking for your search term in the important page parts
  • Change your documents and wait for them to be re-indexed
  • You can also make a site-specific search with the UTAS search engine
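Keeping a record of where your page ranks for each 'common language' term makes it easier to see whether changes helped after re-indexing. This sketch assumes you paste in the ordered result URLs from each search yourself; it does not query the UTAS search engine or Google.

```python
# Sketch: report your page's position for each search term and which results outrank it.
# The result lists are supplied by hand; the URLs in any example are hypothetical.
def rank_report(my_url, results_by_term):
    """results_by_term maps a search term to its ordered list of result URLs.

    Returns term -> (position or None, list of higher-ranking URLs).
    """
    report = {}
    for term, results in results_by_term.items():
        if my_url in results:
            position = results.index(my_url) + 1  # positions start at 1
            report[term] = (position, results[:position - 1])
        else:
            report[term] = (None, list(results))  # every listed result outranks you
    return report
```

The higher-ranking URLs are the pages to examine for your search term in their important page parts.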

Using Google, which offers no user support:

  • Wait 4 to 8 weeks for the Google collection to be refreshed before testing again