December 23, 2007 6:51 AM
Semantic Annotation of Blogs in Web 3.0
In order to incorporate semantic annotation in blog posts, a standard ontology has to be used by blogging tools like MovableType, WordPress, etc. These tools use proprietary values for the id, class and other attributes of the HTML ‘div’ element to annotate the published content stored on the web server. The present practice of proprietary annotation has two limitations: the blogger has to register the blog in various forums by submitting the blog URI, and search engine indexing is based on keywords defined in the META element and in other HTML element attributes such as id. As described in the previous blog post, a method is required to automate the process of blog discovery, i.e., blog posts need to be classified so that they can be categorized.
Common Vocabularies for semantic annotation
The Semantically-Interlinked Online Communities (SIOC) core ontology (a W3C Member Submission by DERI Galway), Dublin Core Metadata, SKOS and FOAF are some of the vocabularies that may be used for semantic annotation of blog posts. A simplified version of a semantically annotated blog post may be achieved with the Dublin Core Metadata and FOAF vocabularies.
The namespace prefixes used in the table are:
xmlns:bg="http://purl.org/dc/elements/1.1/"
xmlns:bgterm="http://purl.org/dc/terms/"
xmlns:bgtype="http://purl.org/dc/dcmitype/"
| normative metadata | description |
|---|---|
| bg:contributor | comment posters |
| bg:creator | author of the post |
| bg:date | date of publication |
| bg:description | content of the post |
| bg:identifier | identifier for the post |
| bg:publisher | e.g. IT Gumbo |
| bg:relation | URI of related blog posts |
| bg:subject | keywords that shall classify the post |
| bg:title | title of the post |
| bgterm:abstract | either an abstract or selected lines of the post |
| bgterm:bibliographicCitation | for references |
| bgtype:Image | picture of the poster |
Other terms from Dublin Core Metadata, FOAF and SKOS may also be used. The DMOZ category can be added for post classification with the bg:service property.
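Because the ‘bg’ prefixes are local to this post, a semantic web application compares full IRIs rather than prefixes. The Python sketch below expands the QNames used in the table, assuming ‘bg’ is bound to http://purl.org/dc/elements/1.1/ as declared in table [2]; `expand_qname` is a hypothetical helper, not part of any blogging tool.

```python
# Prefix bindings used in this post. The 'bg' binding follows the xmlns:bg
# declaration in table [2]; all three are Dublin Core namespaces.
PREFIXES: dict[str, str] = {
    "bg": "http://purl.org/dc/elements/1.1/",
    "bgterm": "http://purl.org/dc/terms/",
    "bgtype": "http://purl.org/dc/dcmitype/",
}


def expand_qname(qname: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a QName such as 'bg:subject' into the full IRI it abbreviates."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    for term in ("bg:title", "bg:subject", "bgterm:abstract"):
        print(term, "->", expand_qname(term))
```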
User interface for semantically annotated blog posts
Bloggers need not use the SIOC vocabulary directly; instead, the vocabulary may be used by the blog tools. The blog tools need not use the SIOC RDF browser to publish blog content; they may instead use RDFa or microformats for semantic annotation and may include a GRDDL transformation. Semantic web applications shall use the GRDDL transformation to generate RDF/XML for the blog post (refer to table [2], column 2). The blogging tool shall use XHTML+RDFa to publish the blog post. It is not necessary to visually render the RDF/OWL classes and properties; they are embedded in the published content in order to annotate it for semantic web applications (refer to table [1], column 2). The published, semantically annotated blog post shall reside on the web server. There may be no visible change in the user interface for posting a blog or in the published post.
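GRDDL typically points to an XSLT transformation that produces the RDF/XML. As a rough stand-in, the following Python sketch reads the `about` and `property` attributes of markup like table [1], column 2 and collects subject/predicate/literal triples. It is a simplified parser, not a conforming RDFa or GRDDL processor; the class name, `extract_triples` and the base IRI are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, predicate, literal) triples from 'about' and
    'property' attributes. Simplified: rel/rev/resource, datatypes and
    CURIE expansion are not handled; predicates are kept as written."""

    def __init__(self, base_iri: str) -> None:
        super().__init__()
        self.base_iri = base_iri
        self.stack = []    # one frame per open element
        self.triples = []  # (subject, predicate, literal)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Inherit the enclosing subject unless 'about' is present;
        # about="" resolves to the base IRI, i.e. the blog post itself.
        subject = self.stack[-1]["subject"] if self.stack else self.base_iri
        if attributes.get("about") is not None:
            subject = urljoin(self.base_iri, attributes["about"])
        predicates = (attributes.get("property") or "").split()
        self.stack.append({"subject": subject, "predicates": predicates, "text": []})

    def handle_data(self, data):
        # Text counts towards every open element that carries 'property'.
        for frame in self.stack:
            if frame["predicates"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Assumes well-formed XHTML: each end tag closes the innermost frame.
        if not self.stack:
            return
        frame = self.stack.pop()
        literal = " ".join("".join(frame["text"]).split())  # collapse whitespace
        for predicate in frame["predicates"]:
            self.triples.append((frame["subject"], predicate, literal))


def extract_triples(xhtml: str, base_iri: str) -> list:
    parser = SimpleRDFaExtractor(base_iri)
    parser.feed(xhtml)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    post = ('<div about=""><h2 property="bg:title">Blogs in Web 3.0</h2>'
            '<p>Written by <br /><span property="bg:creator">Ila Nivas</span></p>'
            '<span property="bg:subject">SIOC</span></div>')
    # Hypothetical IRI for the published post.
    for triple in extract_triples(post, "http://example.org/blogs-in-web-3"):
        print(triple)
```

Space-separated values such as `property="bg:creator foaf:person"` yield one triple per predicate, which is how the sketch handles the creator line of table [1].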
The blogging tool may continue to use its proprietary XForms or HTML forms for the user input interface. The tool may either include QNames for the normative metadata terms in the XForms model, so that it generates the submission shown in table [2], column 1, or, if proprietary tags are used in the XForms model, a mapping to the normative metadata terms shall be required in order to publish XHTML+RDFa.
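A minimal Python sketch of the mapping approach, assuming a tool whose form fields have proprietary names; the field names, `FIELD_MAP` and `build_submission` are hypothetical. Each field is mapped to a normative metadata term in the ‘bg’ namespace and serialized as a `<blogpost>` submission in the shape of table [2], column 1 (foaf:person is left out to keep the mapping one-to-one).

```python
import xml.etree.ElementTree as ET

BG = "http://purl.org/dc/elements/1.1/"  # bound to 'bg', as in table [2]
ET.register_namespace("bg", BG)          # serialize with the 'bg' prefix

# Hypothetical proprietary form-field names mapped to normative metadata terms.
FIELD_MAP = {
    "post_title": "title",
    "post_body": "description",
    "author_name": "creator",
    "tags": "subject",
    "categories": "service",
    "blog_name": "publisher",
}


def build_submission(form: dict) -> ET.Element:
    """Translate proprietary form fields into a <blogpost> of bg:* elements."""
    root = ET.Element("blogpost")
    for field, values in form.items():
        if field not in FIELD_MAP:
            raise KeyError(f"no normative metadata term mapped for {field!r}")
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            ET.SubElement(root, f"{{{BG}}}{FIELD_MAP[field]}").text = value
    return root


if __name__ == "__main__":
    form = {
        "post_title": "Blogs in Web 3.0",
        "author_name": "Ila Nivas",
        "tags": ["SIOC", "RDFa", "Blogs"],
    }
    print(ET.tostring(build_submission(form), encoding="unicode"))
```

Keeping the mapping in one table means a tool can change its form layout without touching the published vocabulary.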
Tools that use XForms or HTML forms for user input, together with a standard ontology for semantic applications, must be able to publish web content with RDFa semantic annotation in order to make the content useful to semantic web applications.
Elimination of redundant metadata
If no new characteristics are added to a class or property, it is a better idea to use the base property than to define a new property as a sub-property of another property. Example: sioc:subject is defined as a sub-property of dc:subject. The issue here is that the sioc prefix expands to the http://rdfs.org/sioc/ns# namespace, while the dc prefix expands to the http://purl.org/dc/elements/1.1/ namespace. Another new vocabulary, http://some.org/newvocabulary/, may define a new property xyz:differentLocalPart as a sub-property of dc:subject. To answer a user query such as “applications of SIOC”, the search engine's SPARQL agent has to select all web resources that use any of these normative metadata terms with the value 'SIOC'. This may not be an easy task for the SPARQL agent, because the query must then find and include every evolving, similar normative metadata term, such as sioc:subject, dc:subject, xyz:differentLocalPart and so on.
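The burden on the query agent can be made concrete with a small Python sketch, assuming the agent holds the published triples and the rdfs:subPropertyOf declarations in memory; the resource IRIs are hypothetical. Matching dc:subject alone finds one resource; only after following every sub-property declaration does the agent find all three resources tagged ‘SIOC’.

```python
# Full IRIs for the three "subject" properties discussed above.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
SIOC_SUBJECT = "http://rdfs.org/sioc/ns#subject"
XYZ_PROPERTY = "http://some.org/newvocabulary/differentLocalPart"

# rdfs:subPropertyOf declarations as (sub-property, super-property) pairs.
SUB_PROPERTY_OF = [(SIOC_SUBJECT, DC_SUBJECT), (XYZ_PROPERTY, DC_SUBJECT)]

# Hypothetical published triples: (resource IRI, predicate IRI, value).
TRIPLES = [
    ("http://example.org/post/1", SIOC_SUBJECT, "SIOC"),
    ("http://example.org/post/2", DC_SUBJECT, "SIOC"),
    ("http://example.org/article/3", XYZ_PROPERTY, "SIOC"),
    ("http://example.org/post/4", DC_SUBJECT, "RDFa"),
]


def with_sub_properties(base: str, declarations) -> set[str]:
    """Return `base` plus every property declared, directly or transitively,
    as a sub-property of it."""
    found = {base}
    changed = True
    while changed:
        changed = False
        for sub, sup in declarations:
            if sup in found and sub not in found:
                found.add(sub)
                changed = True
    return found


def resources_about(triples, value: str, declarations) -> set[str]:
    """Resources whose dc:subject, or any known sub-property, equals `value`."""
    predicates = with_sub_properties(DC_SUBJECT, declarations)
    return {s for s, p, o in triples if p in predicates and o == value}


if __name__ == "__main__":
    print(sorted(resources_about(TRIPLES, "SIOC", [])))
    print(sorted(resources_about(TRIPLES, "SIOC", SUB_PROPERTY_OF)))
```

Every new vocabulary that introduces another sub-property adds one more declaration the agent must discover before its answer is complete.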
Note: Web users may use different prefixes for the same XML namespace. In this blog post the prefix ‘bg’ has been used to suggest that a standard ontology is required for blog annotation (continuing the discussion from the previous post on this topic). This standard ontology may be one of the existing common vocabularies (e.g. Dublin Core Metadata).
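The note can be checked with any namespace-aware XML parser: the prefix is only an abbreviation, and the parser reports the namespace IRI instead. A small sketch using Python's standard `xml.etree.ElementTree`, with two hypothetical one-element documents:

```python
import xml.etree.ElementTree as ET

# The same Dublin Core term written with two different prefixes.
WITH_BG = '<post xmlns:bg="http://purl.org/dc/elements/1.1/"><bg:subject>SIOC</bg:subject></post>'
WITH_DC = '<post xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:subject>SIOC</dc:subject></post>'


def expanded_child_tags(xml_text: str) -> list[str]:
    """Parse the document and return its children's tags in '{namespace}local' form."""
    return [child.tag for child in ET.fromstring(xml_text)]


print(expanded_child_tags(WITH_BG))  # ['{http://purl.org/dc/elements/1.1/}subject']
print(expanded_child_tags(WITH_BG) == expanded_child_tags(WITH_DC))  # True
```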
Advantages of standard ontology
The advantages of defining and using a standard ontology are:
- Redundant normative metadata is eliminated,
- Ambiguity due to similar vocabularies is resolved,
- WWW users shall not be inhibited from writing content for the ‘semantic web’; some web users may not use blog tools and may hand-type semantic annotation in the published content. A single source of information (i.e. one common vocabulary), with one standard ontology to annotate a ‘thing’, shall address the first two points.
- If normative metadata from the same vocabulary is used to annotate both blog posts and non-blog content, then more data can be found for a keyword or category. Example: in table [1], the predicate is bg:subject, the object is SIOC and the subject is the IRI of the blog post. Non-blog articles may also embed this property for the keyword/tag ‘SIOC’, so there shall be more IRIs in the IRI database for bg:subject='SIOC'.
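The last point can be sketched as an index from (predicate, object) pairs to IRIs. A minimal Python example, assuming hypothetical blog and non-blog IRIs and that both sources use the same bg:subject IRI:

```python
from collections import defaultdict

BG_SUBJECT = "http://purl.org/dc/elements/1.1/subject"  # bg:subject

# Hypothetical triples: (subject IRI, predicate IRI, keyword).
BLOG_TRIPLES = [
    ("http://example.org/blog/blogs-in-web-3", BG_SUBJECT, "SIOC"),
    ("http://example.org/blog/blogs-in-web-3", BG_SUBJECT, "RDFa"),
]
NON_BLOG_TRIPLES = [
    ("http://example.org/articles/sioc-overview", BG_SUBJECT, "SIOC"),
]


def build_index(*sources):
    """Map each (predicate, object) pair to the set of IRIs that use it."""
    index = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            index[(predicate, obj)].add(subject)
    return index


blog_only = build_index(BLOG_TRIPLES)
combined = build_index(BLOG_TRIPLES, NON_BLOG_TRIPLES)
print(len(blog_only[(BG_SUBJECT, "SIOC")]))  # 1
print(len(combined[(BG_SUBJECT, "SIOC")]))   # 2
```

If the non-blog article had used a different predicate IRI, its entry would land under a different key and the lookup for bg:subject='SIOC' would not see it.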
Conclusion: It is not necessary for the browser to render the ontology in XForms, HTML forms or the published content. The blogging tool must be able to validate and process the XML data received from the XForms or HTML forms user interface, and use RDFa to embed the standard ontology in the published content. The same normative metadata must be used to annotate blog posts and other published web content in order to provide more data for semantic web applications.
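The processing step in the conclusion might look like the following Python sketch, assuming the submission has the shape of table [2], column 1 and that bg:title and bg:creator are required (that choice is an assumption, not from the post); `submission_to_rdfa` is a hypothetical helper. It rejects malformed XML or missing terms and emits an RDFa fragment; non-‘bg’ elements such as foaf:person are skipped for brevity.

```python
import xml.etree.ElementTree as ET
from html import escape

BG = "http://purl.org/dc/elements/1.1/"  # 'bg' namespace, as in table [2]
REQUIRED_TERMS = ("title", "creator")    # assumption: minimal required terms


def submission_to_rdfa(xml_text: str) -> str:
    """Validate a <blogpost> submission and return an XHTML+RDFa fragment."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    ns_prefix = "{" + BG + "}"
    terms = [
        (child.tag[len(ns_prefix):], (child.text or "").strip())
        for child in root
        if child.tag.startswith(ns_prefix)  # only 'bg' terms are kept here
    ]
    present = {name for name, _ in terms}
    missing = [name for name in REQUIRED_TERMS if name not in present]
    if missing:
        raise ValueError(f"submission lacks required terms: {missing}")
    lines = [f'<div about="" xmlns:bg="{BG}">']
    for name, value in terms:
        lines.append(f'  <span property="bg:{name}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    submission = """<blogpost xmlns:bg="http://purl.org/dc/elements/1.1/">
      <bg:title>Blogs in Web 3.0</bg:title>
      <bg:creator>Ila Nivas</bg:creator>
      <bg:subject>SIOC</bg:subject>
    </blogpost>"""
    print(submission_to_rdfa(submission))
```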
Table [1]

Column 1: Web 2.0 annotation of blog post

```html
<body>
  <h2 id="archive-title">Blogs in Web 3.0</h2>
  <div class="entry">
    <p> Content of the post. </p>
    <div class="post_author_info">
      Written by <br />
      <span>Ila Nivas</span>
    </div>
  </div>
  Tags:
  <a href="Link to SIOC articles" rel="tag">SIOC</a>
  <a href="Link to RDFa articles" rel="tag">RDFa</a>
  <a href="Link to Blogs articles" rel="tag">Blogs</a>
</body>
```

Column 2: Semantic annotation of blog post

```html
<body>
  <div about="">
    <h2 property="bg:title">Blogs in Web 3.0</h2>
    <div property="bg:description"> Content of the post </div>
    <p>
      Written by <br />
      <span property="bg:creator foaf:person">Ila Nivas</span>
    </p>
    <!-- List of keywords -->
    <span property="bg:subject">SIOC</span>
    <span property="bg:subject">RDFa</span>
    <span property="bg:subject">Blogs</span>
    <!-- Category -->
    <span property="bg:service">Weblog</span>
    <span property="bg:service">Information Technology</span>
    <!-- Publisher -->
    <span property="bg:publisher">IT Gumbo</span>
  </div>
</body>
```
Table [2]

Column 1: XML data submitted to XML processor

```xml
<blogpost xmlns:bg="http://purl.org/dc/elements/1.1/"
          xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <bg:title>Blogs in Web 3.0</bg:title>
  <bg:description>Content of the post</bg:description>
  <bg:creator>Ila Nivas</bg:creator>
  <foaf:person>Ila Nivas</foaf:person>
  <bg:subject>SIOC</bg:subject>
  <bg:subject>RDFa</bg:subject>
  <bg:subject>Blogs</bg:subject>
  <bg:service>Weblog</bg:service>
  <bg:service>Information Technology</bg:service>
  <bg:publisher>IT Gumbo</bg:publisher>
</blogpost>
```

Column 2: RDF/XML data extracted from published content

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bg="http://purl.org/dc/elements/1.1/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="Blog IRI">
    <bg:title>Blogs in Web 3.0</bg:title>
    <bg:description>Content of the post</bg:description>
    <bg:creator>Ila Nivas</bg:creator>
    <foaf:person>Ila Nivas</foaf:person>
    <bg:subject>SIOC</bg:subject>
    <bg:subject>RDFa</bg:subject>
    <bg:subject>Blogs</bg:subject>
    <bg:service>Weblog</bg:service>
    <bg:service>Information Technology</bg:service>
    <bg:publisher>IT Gumbo</bg:publisher>
  </rdf:Description>
</rdf:RDF>
```


