May 01, 2008

ITGumbo: spicing IT up

IT Copywrite

Technology and application of technology.

ebizQ presents ITGumbo: a spicy blog network where vendors and IT professionals share ideas about creating Business Agility.

Linked Data for Search Engine Optimization

Search Engine Optimization (SEO) is a technique applied to increase the rank of a page in a Search Engine Result Page (SERP). A SERP is the list of URIs found for the user query, rendered on a single web page in descending PageRank order. Since most web users traverse only the URIs on the first SERP, or perhaps the next few, most web content providers want their web page URIs to be included in the first few SERPs.

The relative weight of a web page URI that determines its position in the SERP is called PageRank. Different Search Engines may use different PageRank algorithms. To determine the value of PageRank, most Search Engines use several factors:
  • the relevance of the keywords in link elements;
  • the values of @name, @title and @id in HTML elements;
  • the number of inlinks and outlinks;
  • the PageRank of the inlinking and outlinked web pages, etc.
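The author uses "PageRank" as a general name for the ranking weight. For comparison, the classic link-based PageRank published by Brin and Page can be sketched as a power iteration over a link graph. Each page shares its rank equally among its outlinks, and every page also receives a small baseline share controlled by a damping factor. This is a minimal sketch that assumes a damping factor of 0.85, a fixed 50 iterations and a hypothetical four-page graph. It uses links only, whereas real engines also weigh signals such as keyword relevance.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Classic link-based PageRank computed by power iteration.

    `links` maps each page URI to the URIs it links to (its outlinks).
    """
    # Every page that appears as a source or as a target of a link.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Each page keeps a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # Repeated outlinks to the same URI are counted once here.
            outlinks = set(links.get(page, []))
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked from A, B and D; nothing links to D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
# Descending order of rank is the order a SERP would list these URIs in.
serp = sorted(ranks, key=ranks.get, reverse=True)
print(serp)  # → ['C', 'A', 'B', 'D']
```

Counting each distinct outlink only once is a design choice made in this sketch, not a property of every search engine.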

Therefore, to achieve a high PageRank it is important to add popular and relevant outlinks to a web page. Outlinks are the URIs of other web pages that are embedded in this web page with an a element and its @href. Inlinks are the URIs of other web pages that include the URI of this web page in their content. Some web authors also use the cite element to refer to a web resource. Both inlinks and citations are counted by the Search Engine.
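A page's outlinks can be read directly from its markup. Its inlinks, by contrast, are only visible to a crawler that has read many pages and inverted their outlink lists. The sketch below uses only the standard library's html.parser. It collects a/@href outlinks and the text of cite elements from a page, then builds an inlink map. The page URIs and the cited title are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects outlinks (a/@href) and the text of cite elements in one page."""

    def __init__(self) -> None:
        super().__init__()
        self.outlinks: list[str] = []
        self.citations: list[str] = []
        self._in_cite = False
        self._cite_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.outlinks.append(href)
        elif tag == "cite":
            self._in_cite, self._cite_text = True, []

    def handle_endtag(self, tag):
        if tag == "cite" and self._in_cite:
            self.citations.append("".join(self._cite_text).strip())
            self._in_cite = False

    def handle_data(self, data):
        if self._in_cite:
            self._cite_text.append(data)


def collect(html: str) -> tuple[list[str], list[str]]:
    """Return (outlinks, citations) found in an HTML fragment."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.outlinks, collector.citations


def inlinks_from(outlink_map: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert page -> outlinks into page -> the set of pages linking to it."""
    inlinks: dict[str, set[str]] = {}
    for page, targets in outlink_map.items():
        for target in targets:
            inlinks.setdefault(target, set()).add(page)
    return inlinks


# Hypothetical pages A and C, both linking to page B.
site = {
    "http://example.org/A": '<p>See <a href="http://example.org/B">B</a>, '
                            'as described in <cite>A Hypothetical Book</cite>.</p>',
    "http://example.org/C": '<p>Also <a href="http://example.org/B">B</a>.</p>',
}
outlink_map = {uri: collect(html)[0] for uri, html in site.items()}
print(collect(site["http://example.org/A"])[1])  # → ['A Hypothetical Book']
print(sorted(inlinks_from(outlink_map)["http://example.org/B"]))
```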

Inserting outlinks is one of the primary steps in the creation of Linked Data, because it identifies the 'things' that are connected with this web page. However, repeatedly inserting the same outlink in a web page may not enhance its PageRank, because an intelligent Search Engine may count only unique URIs. A web document with too many outlinks can also be hard to read, because the reader cannot follow every reference; in such cases, inserting a 'gloss' is a better way to enhance readability. To add a gloss, add @title to the HTML a element, with the text to be displayed as its value. In web 3.0, repetition of outlinks may be avoided with @about and blank nodes.
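Below is a minimal sketch of how a crawler might count only unique outlinks while keeping the gloss. Relative and absolute forms of the same URI are resolved with urllib.parse.urljoin before comparison, and the @title of a link is kept as its gloss. The base URI and the page content are hypothetical, and real search engines apply further URI normalization that is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class GlossedLinks(HTMLParser):
    """Collects the unique outlinks of a page together with their gloss (@title)."""

    def __init__(self, base_uri: str) -> None:
        super().__init__()
        self.base_uri = base_uri
        self.total = 0                     # every a/@href, duplicates included
        self.glosses: dict[str, str] = {}  # unique absolute URI -> gloss text

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if tag != "a" or not href:
            return
        self.total += 1
        uri = urljoin(self.base_uri, href)  # relative and absolute forms compare equal
        gloss = attributes.get("title") or ""
        # Keep one entry per URI; take a gloss from a later copy if the first had none.
        if uri not in self.glosses or (gloss and not self.glosses[uri]):
            self.glosses[uri] = gloss


# Hypothetical page that links to the same RDFa URI twice (once relatively).
page = (
    '<p><a href="/specs/rdfa">RDFa</a> builds on '
    '<a href="/specs/rdf" title="Resource Description Framework">RDF</a>; see '
    '<a href="http://example.org/specs/rdfa" title="RDF in attributes">RDFa</a>.</p>'
)
links = GlossedLinks("http://example.org/articles/seo")
links.feed(page)
links.close()
print(links.total, len(links.glosses))  # → 3 2
for uri, gloss in links.glosses.items():
    print(uri, "-", gloss or "(no gloss)")
```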

If page 'A' refers to page 'B', the outlink is embedded in page 'A' as follows:

web 2.0:
<a href="page_B_uri">page_B_anchor_text</a>

web 3.0:
<a rel="dcterms:references" href="page_B_uri">page_B_anchor_text</a>

The difference between the two representations is the addition of the property dcterms:references in the latter. The syntax for this semantic annotation is given in the RDFa specification. This property identifies the relationship between 'thing' A and 'thing' B, which in this case is 'A references B'. Not every outlink has to be a reference; the relationship may be identified by any other metadata term.
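The value dcterms:references is a CURIE, a compact form of a full property URI. Assume the page declares the prefix with xmlns:dcterms="http://purl.org/dc/terms/" (the Dublin Core terms namespace). An RDFa processor would then expand the CURIE and emit the triple printed at the end of the sketch below, written in N-Triples. Only this single link pattern is handled. The URIs http://example.org/page_A and page_B are hypothetical stand-ins for page_A_uri and page_B_uri.

```python
from urllib.parse import urljoin

# Assumed prefix declaration on the page: xmlns:dcterms="http://purl.org/dc/terms/"
PREFIXES = {"dcterms": "http://purl.org/dc/terms/"}


def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand a CURIE such as dcterms:references into a full property URI.

    This sketch raises on an undeclared prefix; a real RDFa processor would
    typically just ignore such a value.
    """
    prefix, colon, reference = curie.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"no prefix declared for {curie!r}")
    return prefixes[prefix] + reference


def link_triples(page_uri: str, rel: str, href: str) -> list[tuple[str, str, str]]:
    """Triples produced by <a rel="..." href="..."> on page_uri (one per rel value)."""
    target = urljoin(page_uri, href)
    return [(page_uri, expand_curie(term, PREFIXES), target) for term in rel.split()]


def ntriples(triple: tuple[str, str, str]) -> str:
    """Serialize a triple of URIs in N-Triples syntax."""
    return "<{}> <{}> <{}> .".format(*triple)


# Hypothetical URIs standing in for page_A_uri and page_B_uri.
for t in link_triples("http://example.org/page_A", "dcterms:references", "page_B"):
    print(ntriples(t))
# → <http://example.org/page_A> <http://purl.org/dc/terms/references> <http://example.org/page_B> .
```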

For example, consider this RDF description of 'albert', copied from the Linked Data architecture note:

<rdf:Description rdf:about="#albert">
  <fam:child rdf:resource="#brian"/>
  <fam:child rdf:resource="#carol"/>
</rdf:Description>
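To make the triples in this description explicit, the sketch below parses a copy of it with Python's xml.etree.ElementTree. Two assumptions are added so that the XML is self-contained: the surrounding rdf:RDF element, and the namespace URI for the fam: prefix (http://example.org/fam#). The note does not say which vocabulary fam: refers to. Only rdf:Description elements with rdf:resource properties are handled, and relative URIs such as #albert are left unresolved.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FAM = "http://example.org/fam#"  # hypothetical namespace URI for the fam: prefix

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:fam="{FAM}">
  <rdf:Description rdf:about="#albert">
    <fam:child rdf:resource="#brian"/>
    <fam:child rdf:resource="#carol"/>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(xml_text: str) -> list[tuple[str, str, str]]:
    """Triples from rdf:Description elements whose properties use rdf:resource.

    Covers only this one RDF/XML pattern; literals, nested descriptions,
    rdf:ID and xml:base are not handled.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:
                continue
            # ElementTree reports tags as "{namespace}localname";
            # in RDF/XML the property URI is namespace + localname.
            namespace, _, local = prop.tag[1:].partition("}")
            triples.append((subject, namespace + local, obj))
    return triples


for triple in description_triples(DOC):
    print(triple)
```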
These RDF triples may be derived from the following web 3.0 content; the equivalent web 2.0 content is also given:

web 2.0:
<p>The names of albert's children are <a href="#brian">brian</a> and <a href="#carol">carol</a>.</p>

web 3.0:
<p about="#albert">The names of albert's children are <a rel="fam:child" href="#brian">brian</a> and <a rel="fam:child" href="#carol">carol</a>.</p>

Both versions render identically in a browser: "The names of albert's children are brian and carol."

Here the property fam:child expresses the relationship between albert and brian, and likewise between albert and carol.

Some web 2.0 Search Engines are powerful enough to scan the complete document for any word or phrase in the text. These Search Engines may infer the relationship between Albert and Brian by reading the text preceding or following the a element. The advantage of adding RDFa annotation in web 3.0 is that it makes the relationship information explicit, so that it can be extracted in RDF triple form:

<#albert> fam:child <#brian> .
<#albert> fam:child <#carol> .
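A toy extractor shows how the two versions differ to a machine. The web 2.0 markup yields no triples, while the web 3.0 markup yields one fam:child triple for each child. This sketch implements only a small subset of RDFa: the subject comes from @about, and each element with @href yields one triple per @rel value. It keeps predicates as CURIEs and assumes the page lives at the hypothetical URI http://example.org/family. It is not a conforming RDFa processor.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class MiniRDFa(HTMLParser):
    """A very small subset of RDFa: @about sets the subject for nested
    elements, and any element with @rel and @href yields one triple per rel
    value. Predicates are left as CURIEs; typeof, property, blank nodes,
    chaining and prefix handling are all out of scope.
    """

    def __init__(self, base_uri: str) -> None:
        super().__init__()
        self.base_uri = base_uri
        self.subjects = [base_uri]  # innermost current subject is last
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        about = attributes.get("about")
        subject = urljoin(self.base_uri, about) if about is not None else self.subjects[-1]
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            obj = urljoin(self.base_uri, href)
            for predicate in rel.split():
                self.triples.append((subject, predicate, obj))
        if tag not in VOID_ELEMENTS:  # void elements never get an end tag
            self.subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.subjects) > 1:
            self.subjects.pop()


def extract(html: str, base_uri: str) -> list[tuple[str, str, str]]:
    parser = MiniRDFa(base_uri)
    parser.feed(html)
    parser.close()
    return parser.triples


WEB2 = ('<p>The names of albert\'s children are <a href="#brian">brian</a> '
        'and <a href="#carol">carol</a>.</p>')
WEB3 = ('<p about="#albert">The names of albert\'s children are '
        '<a rel="fam:child" href="#brian">brian</a> and '
        '<a rel="fam:child" href="#carol">carol</a>.</p>')
BASE = "http://example.org/family"  # hypothetical URI of the page

print(extract(WEB2, BASE))  # → []
for s, p, o in extract(WEB3, BASE):
    print(f"<{s}> {p} <{o}> .")
```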

These RDF triples may be stored in a database that is usable by many other web applications. Search is not the only application on the WWW, but it is the basis of many other practical uses of web data. A search for Albert's children may return only the names Brian and Carol in web 2.0. In web 3.0 it may also give additional information about Brian and Carol, such as school name, home address, etc. This additional information may or may not be present on the same web page, but with Linked Data in web 3.0 it can be fetched from the URIs of Brian and Carol.
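The idea of fetching more data from Brian's and Carol's URIs can be sketched with an in-memory triple store. Here, fetching a URI is simulated with a dictionary; on the real web it would be an HTTP request followed by RDF or RDFa parsing. The ex:school and ex:address properties and all their values are hypothetical placeholders.

```python
Triple = tuple[str, str, str]

# Triples extracted from albert's page.
PAGE_TRIPLES = {("#albert", "fam:child", "#brian"), ("#albert", "fam:child", "#carol")}

# Hypothetical data published at each child's own URI. A dict stands in for the web.
PUBLISHED: dict[str, set[Triple]] = {
    "#brian": {("#brian", "ex:school", "Hypothetical School A"),
               ("#brian", "ex:address", "Hypothetical Street 1")},
    "#carol": {("#carol", "ex:school", "Hypothetical School B")},
}


class TripleStore:
    """In-memory set of triples with simple pattern matching (None = wildcard)."""

    def __init__(self, triples=()) -> None:
        self.triples: set[Triple] = set(triples)

    def update(self, triples) -> None:
        self.triples.update(triples)

    def match(self, s=None, p=None, o=None) -> list[Triple]:
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


def fetch(uri: str) -> set[Triple]:
    """Stand-in for dereferencing a URI on the web."""
    return PUBLISHED.get(uri, set())


def children_with_details(store: TripleStore, parent: str) -> dict[str, dict[str, str]]:
    """For each fam:child of parent, fetch its URI and collect what is said about it."""
    details = {}
    for _, _, child in store.match(parent, "fam:child"):
        store.update(fetch(child))  # follow the link and merge the published triples
        details[child] = {p: o for _, p, o in store.match(child)}
    return details


store = TripleStore(PAGE_TRIPLES)
# Names only, as a web 2.0 search might return them.
print([child for _, _, child in store.match("#albert", "fam:child")])  # → ['#brian', '#carol']
for child, info in children_with_details(store, "#albert").items():
    print(child, info)
```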

RDFa provides a syntax for publishing information, i.e. data plus context, so that machines can extract and process the relevant information and many different inferences can be derived from this data. Niche Search Engines may use the value of @rel to render the relationship in audio. A marketing agent may find the names of Albert's children in a particular locality and approach their guardians with special discounts on uniforms, teaching aids, new books, etc. The guardian of a new admission to a school may find other students in his or her locality. The WWW is a medium for publishing information, and RDFa and RDF+OWL are the technologies to represent this information in a structured manner so that it can be used in different contexts.
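The marketing agent's and the guardian's questions are both joins over triples, similar to a SPARQL basic graph pattern. The sketch below evaluates a list of triple patterns against a small set of triples; terms starting with ? are variables. The ex:locality property, the locality names and #dave are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

# Hypothetical linked data gathered from several pages.
DATA: list[Triple] = sorted({
    ("#albert", "fam:child", "#brian"),
    ("#albert", "fam:child", "#carol"),
    ("#brian", "ex:locality", "Northside"),
    ("#carol", "ex:locality", "Southside"),
    ("#dave", "ex:locality", "Northside"),  # dave: a hypothetical new admission
})


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple; None if it cannot."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check its existing value
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant that must match exactly
            return None
    return result


def solve(patterns: list[Triple], data: list[Triple]) -> list[Binding]:
    """Evaluate triple patterns as a join, like a SPARQL basic graph pattern."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        bindings = [extended
                    for binding in bindings
                    for triple in data
                    if (extended := unify(pattern, triple, binding)) is not None]
    return bindings


# Marketing agent: which of albert's children live in Northside?
print(solve([("#albert", "fam:child", "?child"),
             ("?child", "ex:locality", "Northside")], DATA))  # → [{'?child': '#brian'}]

# Guardian of dave: who else lives in dave's locality?
neighbours = [b for b in solve([("#dave", "ex:locality", "?place"),
                                ("?other", "ex:locality", "?place")], DATA)
              if b["?other"] != "#dave"]
print(neighbours)  # → [{'?place': 'Northside', '?other': '#brian'}]
```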

In the recent past, new web applications have been launched that either provide the metadata for semantic annotation or perform the annotation for specified vocabularies. Calais and TextWise are popular applications of this kind; hand annotation is another method.

Open Issues:

  • A common vocabulary, or a vocabulary repository, is required as a single place to find relevant normative metadata for semantic annotation.
  • A procedure is required that allows any web user to suggest adding a new metadata term to a vocabulary. Attribution of a new metadata term or vocabulary may not be necessary.
  • A procedure is required to eliminate redundant normative metadata as new vocabularies are defined and new metadata terms are introduced.

Conclusion: In web 3.0, Search Engines that focus on natural language processing and intelligent search may reconsider their PageRank algorithms. Search Engines will use SPARQL filters and solution sequence modifiers to select and arrange information. Since Linked Data enhances the authenticity of data, semantic annotation will be a contributing factor in PageRank determination; hence semantic annotation is a Search Engine Optimization technique. Linked Data will be used to generate data mash-ups for many different contexts.
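In SPARQL, FILTER restricts the solutions of a graph pattern. The solution sequence modifiers (ORDER BY, projection, DISTINCT, REDUCED, OFFSET, LIMIT) then arrange the resulting sequence. The sketch below mimics FILTER followed by ORDER BY, projection, DISTINCT, OFFSET and LIMIT over an in-memory list of solutions, applied in that order. The ex:score and ex:annotations terms, the page URIs and the scores are hypothetical. Ranking by annotation count only illustrates the author's argument; it is not known to be how any search engine works.

```python
from typing import Any, Callable, Optional

Solution = dict[str, Any]

# Roughly the SPARQL query this sketch imitates (ex: terms are hypothetical):
#   SELECT DISTINCT ?page ?score
#   WHERE { ?page ex:score ?score ; ex:annotations ?n . FILTER(?n > 0) }
#   ORDER BY DESC(?score)
#   LIMIT 2


def run(solutions: list[Solution],
        keep: Callable[[Solution], bool],
        order_key: Callable[[Solution], Any],
        descending: bool,
        variables: list[str],
        distinct: bool,
        offset: int = 0,
        limit: Optional[int] = None) -> list[Solution]:
    # FILTER: drop solutions that fail the test.
    filtered = [s for s in solutions if keep(s)]
    # ORDER BY
    ordered = sorted(filtered, key=order_key, reverse=descending)
    # Projection: keep only the selected variables.
    projected = [{v: s[v] for v in variables} for s in ordered]
    # DISTINCT: remove duplicate rows, keeping the first occurrence.
    if distinct:
        seen, unique = set(), []
        for row in projected:
            key = tuple(row.items())
            if key not in seen:
                seen.add(key)
                unique.append(row)
        projected = unique
    # OFFSET and LIMIT
    end = None if limit is None else offset + limit
    return projected[offset:end]


# Hypothetical solutions: a page, its score, and how many RDFa triples it carries.
solutions = [
    {"?page": "http://example.org/a", "?score": 0.42, "?n": 3},
    {"?page": "http://example.org/b", "?score": 0.91, "?n": 0},
    {"?page": "http://example.org/c", "?score": 0.77, "?n": 5},
    {"?page": "http://example.org/d", "?score": 0.65, "?n": 2},
]
result = run(solutions, keep=lambda s: s["?n"] > 0, order_key=lambda s: s["?score"],
             descending=True, variables=["?page", "?score"], distinct=True, limit=2)
print(result)
# → [{'?page': 'http://example.org/c', '?score': 0.77}, {'?page': 'http://example.org/d', '?score': 0.65}]
```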

References:
OWL, RDF, RDFa, SPARQL