November 12, 2007 1:26 AM
Web Resource Classification for Search Engine Taxonomy
As more data on the WWW is published with semantic annotations, categorizing web resources by characteristics that can be identified through normative metadata will be key to developing semantic web applications. Normative metadata from standard ontologies can be attached as web resource descriptors in a POWDER DR (Description Resource) or embedded in the web content with RDFa. Either way, it gives search engines the data they need to index resources and build a search engine (SE) taxonomy.
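The Python sketch below is a rough illustration of how RDFa-embedded metadata hands a crawler <predicate, object> data to index. It pulls property values out of a page using only the standard-library html.parser. It is a deliberate simplification: it ignores subject chaining (about/resource), it does not resolve prefixes such as dcmitype: to full IRIs, and the page and its example.org host are hypothetical. A real RDFa processor does considerably more.

```python
from html.parser import HTMLParser


class RDFaPropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from RDFa-style property attributes.

    Simplified: ignores about/resource subject chaining, does not expand
    prefixes (CURIEs) to full IRIs, and takes the value from the content
    attribute or else from the first non-blank text inside the element.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []        # (property, value) tuples in document order
        self._pending = None   # property still waiting for its element text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        if not prop:
            return
        if a.get("content") is not None:
            self.pairs.append((prop, a["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


# Hypothetical annotated page.
page = """
<div about="http://isp.example.org/">
  <span property="dcmitype:service">Internet Service Provider</span>
  <span property="dc:title" content="Example ISP"></span>
</div>
"""

parser = RDFaPropertyExtractor()
parser.feed(page)
parser.close()
print(parser.pairs)
```

Each pair keeps the predicate together with its value. An indexer needs exactly that to classify the resource.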
- Characteristics are properties of a 'thing'.
- Classification is the identification of characteristics for categorization.
- Categorization is the grouping of resources that share the same characteristics.
- Taxonomy is the process of classification.
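A small Python sketch can make these definitions concrete. It assumes that a characteristic is modeled as a (predicate, object) pair. The resource IRIs under example.org are hypothetical, and classify() and categorize() are illustrative names, not part of any standard API.

```python
from collections import defaultdict

# Characteristics: properties of a 'thing', modeled as (predicate, object)
# pairs attached to each (hypothetical) resource IRI.
characteristics = {
    "http://isp-a.example.org/": {("dcmitype:service", "Internet Service Provider")},
    "http://isp-b.example.org/": {("dcmitype:service", "Internet Service Provider")},
    "http://power.example.org/": {("dcmitype:service", "Electric Service Provider")},
}


def classify(resource):
    """Classification: identify the characteristics of a resource."""
    return characteristics.get(resource, set())


def categorize(resources):
    """Categorization: group resources that share the same characteristic."""
    categories = defaultdict(set)
    for resource in resources:
        for characteristic in classify(resource):
            categories[characteristic].add(resource)
    return dict(categories)


categories = categorize(characteristics)
for characteristic, members in categories.items():
    print(characteristic, sorted(members))
```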
Search engines classify web content using the values of HTML elements and attributes. Examples include the content attribute of a META element with name='keywords', heading elements, and the alt, name, title, hreflang and media attributes. These values may still be used for web resource classification, but normative metadata extends the vocabulary available for it. The difference lies in what carries the meaning. For HTML, only element and attribute values are used for classification, not tag names. For semantic web resources (subjects), both the RDF/OWL property (predicate) and its value (object) are used. Note that "tag names" here does not mean the value of the name attribute. It means element names such as META, TH, A and H1-H6.
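The hedged Python sketch below shows the traditional approach for contrast, using the standard-library html.parser. It collects only values (META keywords, heading text, and alt and title attribute values) and discards the tag names. The output is therefore a flat list of strings, with no predicate saying what each value means. The page content is hypothetical. Real search engines weigh these signals in ways this sketch does not attempt to model.

```python
from html.parser import HTMLParser


class HTMLValueExtractor(HTMLParser):
    """Collect the values traditionally indexed: META keywords content,
    heading text, and alt/title attribute values. Tag names are not kept."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.values = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "keywords":
            keywords = (a.get("content") or "").split(",")
            self.values += [k.strip() for k in keywords if k.strip()]
        for attr in ("alt", "title"):
            if a.get(attr):
                self.values.append(a[attr])
        if tag in self.HEADINGS:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.values.append(data.strip())


page = """<html><head>
<meta name="keywords" content="ISP, broadband, dial-up">
<title>Example ISP</title></head>
<body><h1>Internet Service Provider</h1>
<img src="logo.png" alt="Example ISP logo"></body></html>"""

extractor = HTMLValueExtractor()
extractor.feed(page)
print(extractor.values)
```

Nothing in the result records that "ISP" came from keywords or that "Internet Service Provider" was a heading. An RDF <predicate, object> tuple keeps that distinction.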
Search Engine Indexing
Search engines may use web resource classification to build a taxonomy for efficient search. Web resources (subjects) with the same classification, i.e. the same <predicate, object> tuple, are grouped into one category. A hierarchical taxonomy can therefore be built from the normative metadata of standard ontologies.
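Here is a minimal Python sketch of such a hierarchy. It assumes triples are plain (subject, predicate, object) string tuples. The first level is the predicate, the second level is the object, and each leaf holds the subjects that share that <predicate, object> tuple. The build_taxonomy() name and the example.org IRIs are hypothetical.

```python
from collections import defaultdict


def build_taxonomy(triples):
    """Two-level taxonomy: predicate -> object -> set of subject IRIs.

    Subjects sharing the same <predicate, object> tuple end up in the same
    leaf category.
    """
    taxonomy = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        taxonomy[predicate][obj].add(subject)
    return taxonomy


triples = [
    ("http://isp-a.example.org/", "dcmitype:service", "Internet Service Provider"),
    ("http://isp-b.example.org/", "dcmitype:service", "Internet Service Provider"),
    ("http://power.example.org/", "dcmitype:service", "Electric Service Provider"),
]

taxonomy = build_taxonomy(triples)
for predicate, branches in taxonomy.items():
    print(predicate)
    for obj, subjects in branches.items():
        print("   ", obj, "->", sorted(subjects))
```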

On the semantic web, all data is represented as <subject, predicate, object> triples. The object is the value of the predicate, i.e. the value of the subject characteristic that the predicate identifies. The object may be another web resource or a literal value. Hence every web resource that has the characteristic identified by a predicate should have one of a set of well-known values as its object. If the object is an integer-typed literal, the SE may use the predicate alone for classification. This web resource classification is used to build the SE taxonomy, with the IRI database indexed by the characteristic predicate and/or object. The figure illustrates the SE hierarchical taxonomy derived from the RDF graph of web resource <subject, predicate, object> triples.
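The integer-literal rule can be sketched as a key function in Python. This assumes integer-typed literals arrive as Python int values, and ex:employees is a hypothetical predicate. The paragraph above does not say why integers are dropped from the key. One reading is that an integer is not one of a small set of well-known values, but that is an interpretation, not a stated specification.

```python
from collections import defaultdict


def category_key(predicate, obj):
    """Category key for a <predicate, object> tuple.

    Integer-typed literals (modeled here as Python int) are dropped from the
    key, so classification uses the predicate alone. Any other object value
    is kept as part of the key.
    """
    if isinstance(obj, int) and not isinstance(obj, bool):
        return (predicate,)
    return (predicate, obj)


def build_iri_database(triples):
    """Index subject IRIs by category key (a stand-in for the IRI database)."""
    db = defaultdict(set)
    for subject, predicate, obj in triples:
        db[category_key(predicate, obj)].add(subject)
    return dict(db)


triples = [
    ("http://isp-a.example.org/", "dcmitype:service", "Internet Service Provider"),
    ("http://isp-a.example.org/", "ex:employees", 120),    # hypothetical predicate
    ("http://power.example.org/", "ex:employees", 4500),
]

db = build_iri_database(triples)
print(sorted(db[("ex:employees",)]))
```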
In the figure, each web resource in the IRI database has one of the following <predicate, object> tuples, either in the RDFa annotation of its web content or in a POWDER DR.
dcmitype:service "Internet Service Provider"
dcmitype:service "Electric Service Provider"
dcmitype:service <http://www.somestandards.com/ISP>
dcmitype:service <http://www.somestandards.com/ESP>
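The tuples above use two object forms: a quoted literal and an IRI in angle brackets. The minimal Python sketch below tells them apart. It assumes exactly the one-line "predicate object" layout shown. It is not a Turtle or N-Triples parser and does not handle escaped quotes.

```python
import re

# predicate, whitespace, then either "literal" or <IRI> as the object.
TUPLE_RE = re.compile(r'^(\S+)\s+(?:"([^"]*)"|<([^>]*)>)$')


def parse_tuple(line):
    """Parse one 'predicate object' line into (predicate, kind, value)."""
    match = TUPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized tuple: {line!r}")
    predicate, literal, iri = match.groups()
    if literal is not None:
        return predicate, "literal", literal
    return predicate, "iri", iri


lines = [
    'dcmitype:service "Internet Service Provider"',
    'dcmitype:service "Electric Service Provider"',
    "dcmitype:service <http://www.somestandards.com/ISP>",
    "dcmitype:service <http://www.somestandards.com/ESP>",
]

for predicate, kind, value in map(parse_tuple, lines):
    print(predicate, kind, value)
```

Keeping the kind alongside the value means a literal and an IRI with similar text are not confused when they are grouped.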
The web content providers have classified the web resources ('things'). The SE uses these classification characteristics to build a taxonomy of WWW IRIs by grouping all web resources that share the same characteristics. All web resources with the dcmitype:service property are further classified by the value of that property. When a user submits the natural language query "ISP", the search engine transforms it into the following SPARQL query.
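# Match resources whose dcmitype:service value is any known form of "ISP"
PREFIX dcmitype: <http://purl.org/dc/dcmitype/>
SELECT DISTINCT ?s WHERE {
  { ?s dcmitype:service "ISP" }
  UNION { ?s dcmitype:service "Internet Service Provider" }
  UNION { ?s dcmitype:service <http://www.somestandards.com/ISP> }
}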
Note: the search engine may add other expansions of the "ISP" abbreviation to the query. Users should enter the expansion they intend for a more efficient query.
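The hedged Python sketch below covers the query-expansion step. It looks the user's term up in a hypothetical EXPANSIONS table and emits one SELECT that matches any of the known values. It uses the SPARQL 1.1 VALUES clause, which (with DISTINCT) matches the same resources as a UNION of one triple pattern per value. The table contents, function names and default predicate are assumptions for illustration only.

```python
# Hypothetical expansion table: user term -> known (kind, value) objects.
EXPANSIONS = {
    "ISP": [
        ("literal", "ISP"),
        ("literal", "Internet Service Provider"),
        ("iri", "http://www.somestandards.com/ISP"),
    ],
}


def sparql_term(kind, value):
    """Serialize an object as a SPARQL IRI reference or string literal."""
    if kind == "iri":
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_sparql(term, predicate="dcmitype:service"):
    """Expand a user term and build one SELECT that matches any expansion."""
    # Unknown terms fall back to a single literal with the term itself.
    objects = EXPANSIONS.get(term, [("literal", term)])
    values = " ".join(sparql_term(kind, value) for kind, value in objects)
    return "\n".join([
        "PREFIX dcmitype: <http://purl.org/dc/dcmitype/>",
        "SELECT DISTINCT ?s WHERE {",
        f"  VALUES ?o {{ {values} }}",
        f"  ?s {predicate} ?o .",
        "}",
    ])


print(to_sparql("ISP"))
```

A term missing from the table falls back to a single literal. The user's own choice of expansion then determines what is matched.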
Conclusion: a web user's natural language query can be converted into a SPARQL query using the normative metadata that corresponds to the query terms. That query can then be answered by traversing the SE hierarchical taxonomy to retrieve web resources from the IRI database.
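The minimal Python sketch below answers an expanded query from the taxonomy rather than scanning every triple. It looks up the predicate branch once, then reads the leaf for each expanded object value and unions the results. The in-memory dictionaries stand in for the IRI database, and all names and IRIs are hypothetical.

```python
from collections import defaultdict


def build_taxonomy(triples):
    """predicate -> object -> set of subject IRIs."""
    taxonomy = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        taxonomy[predicate][obj].add(subject)
    return taxonomy


def answer(taxonomy, predicate, objects):
    """Traverse the predicate branch, then union the leaves of each object."""
    branch = taxonomy.get(predicate, {})
    results = set()
    for obj in objects:
        results |= branch.get(obj, set())
    return results


triples = [
    ("http://isp-a.example.org/", "dcmitype:service", "Internet Service Provider"),
    ("http://isp-b.example.org/", "dcmitype:service", "http://www.somestandards.com/ISP"),
    ("http://power.example.org/", "dcmitype:service", "Electric Service Provider"),
]

expanded = ["ISP", "Internet Service Provider", "http://www.somestandards.com/ISP"]
hits = answer(build_taxonomy(triples), "dcmitype:service", expanded)
print(sorted(hits))
```

Because IRIs and literals are both plain strings here, an IRI and a literal with identical text would collide. A fuller version would tag each object with its kind.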


