July 9, 2007 10:20 AM
Encoding normative metadata context
This post suggests that if normative metadata terms are represented by alphanumeric codes defined by the standards organization associated with the product, web services development becomes simpler and search more efficient. The Web Service Definition (WSD) and Functional Definition (FD) documents would then use only these normative metadata codes. The thesaurus-based context mediator would translate the natural language query into normative metadata codes.
Example: Normative metadata term code: xxxyyyzzz
- Standards organization identification: xxx
- Product vocabulary identification: yyy
- Metadata term identification: zzz
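The layout in the example can be sketched in Python. This is only an illustration: the fixed width of three characters per field is an assumption read off the "xxxyyyzzz" example, and `TermCode` and `parse_term_code` are hypothetical names, not part of any standard.

```python
from typing import NamedTuple


class TermCode(NamedTuple):
    organization: str  # "xxx": standards organization identification
    vocabulary: str    # "yyy": product vocabulary identification
    term: str          # "zzz": metadata term identification


def parse_term_code(code: str) -> TermCode:
    """Split a nine-character code into its three fixed-width fields."""
    # Assumption: each field is exactly three alphanumeric characters.
    if len(code) != 9 or not code.isalnum():
        raise ValueError(f"expected 9 alphanumeric characters, got {code!r}")
    return TermCode(code[0:3], code[3:6], code[6:9])


print(parse_term_code("xxxyyyzzz"))
```

A real scheme would probably need variable-width or delimited fields, since three characters per field limits how many organizations, vocabularies and terms can be coded.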
Assigning alphanumeric codes to metadata terms eliminates the ambiguity caused by synonyms in the vocabulary, because synonyms can be assigned the same code. The context mediator can identify a metadata term in the user query and translate it to its code. The code carries the metadata context within itself, thus connecting the language with the data. This method is analogous to the EPCglobal tag that is assigned to physical "things". Since all web resource subjects, predicates and objects are considered "things", the alphanumeric code assigns a unique identity to every "thing" on the World Wide Web. Within a web document the metadata term code is still used as a text term, i.e. by including the vocabulary namespace and referencing the term with a QName.
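The synonym handling can be sketched as a lookup in which several phrases share one code. The thesaurus entries and the code "xxxyyy001" below are placeholders invented for illustration, and a real context mediator would do far more than phrase matching.

```python
import re

# Hypothetical thesaurus: every synonym maps to the same term code.
# "xxxyyy001" is a placeholder, not a code defined by any real vocabulary.
THESAURUS = {
    "bit rate": "xxxyyy001",
    "data rate": "xxxyyy001",
    "throughput": "xxxyyy001",
}


def terms_to_codes(query: str) -> set:
    """Return the codes of all thesaurus phrases found in the query."""
    text = query.lower()
    return {
        code
        for phrase, code in THESAURUS.items()
        # Word boundaries avoid matching a phrase inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text)
    }


# Two differently worded queries resolve to the same code.
print(terms_to_codes("DSL data rate") == terms_to_codes("DSL bit rate"))
```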
The vocabulary that defines an alphanumeric code for a metadata term can be identified from the code itself. Introducing alphanumeric codes for normative metadata terms does not add any processing overhead for context mediators, which still process the natural language query to understand its context and find the corresponding vocabulary. The codes also enable new features in database applications: intelligent context search algorithms can be built on them. Textual terms require an exact match of vocabulary and metadata term, whereas an alphanumeric code allows search on a sub-code such as "xxxyyy". Thus a two-pass search, which first finds a web document that uses a given vocabulary and then searches for the metadata term within that document, can be replaced with a single-pass search based on the metadata term code.
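A sub-code search can be sketched as a prefix lookup over a sorted list of codes. The codes below are placeholders and `find_by_subcode` is a hypothetical helper. The sketch assumes the organization and vocabulary fields lead the code, so that a sub-code is always a prefix.

```python
from bisect import bisect_left


def find_by_subcode(sorted_codes, prefix):
    """Return every code that starts with the given sub-code.

    sorted_codes must be sorted, so that all matches are contiguous.
    """
    start = bisect_left(sorted_codes, prefix)  # first candidate position
    matches = []
    for code in sorted_codes[start:]:
        if not code.startswith(prefix):
            break  # past the contiguous run of matches
        matches.append(code)
    return matches


# Placeholder codes for two organizations and three vocabularies.
codes = sorted(["xxxyyy001", "xxxyyy002", "xxxqqq001", "wwwyyy001"])
print(find_by_subcode(codes, "xxxyyy"))  # all terms of one vocabulary
print(find_by_subcode(codes, "xxx"))     # all terms of one organization
```

The same lookup serves organization, vocabulary and full term codes, which is what makes a single pass sufficient in this sketch.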
Consider RDF triples:
| Subject | Predicate | Object |
|---|---|---|
| DSLBroadbandService | BitRate | _:2Mbps |
| _:2Mbps | rateValue | "2" |
| _:2Mbps | unitValue | "Mbps" |
In table form, with the codes as column names:

| Service_code | BitRate_code: rateValue_code | BitRate_code: unitValue_code |
|---|---|---|
| DSLBroadbandService_code | 2 | Mbps |
| DSLBroadbandService_code | 10 | Mbps |
| SatelliteBroadbandService_code | 200 | Mbps |
The table has a BitRate column with two sub-properties, rateValue and unitValue. These sub-properties are assigned the unique codes "aaabbb001" and "aaabbb002" respectively, where "aaa" identifies the standards organization (e.g. IEEE) that defines the vocabulary for broadband service, "bbb" uniquely identifies the broadband vocabulary among the vocabularies defined by this organization, and "001" and "002" identify the properties rateValue and unitValue. The table stores the actual values of the elements of the BitRate class for all instances of the broadband service record. A search for "Mbps broadband service" can be limited to records whose BitRate sub-property "aaabbb002" has the value "Mbps". The separate search for the vocabulary "aaabbb" is eliminated, and a direct search for the metadata term "aaabbb002" is executed.
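The direct search can be sketched over the three rows of the table above, stored as records keyed by term code. The record layout, the "service" key and the `search` function are assumptions made for illustration, not a prescribed storage format.

```python
RATE_VALUE = "aaabbb001"  # rateValue, as assigned in the text
UNIT_VALUE = "aaabbb002"  # unitValue, as assigned in the text

# The three rows of the table, with sub-property codes as keys.
RECORDS = [
    {"service": "DSLBroadbandService_code", RATE_VALUE: 2, UNIT_VALUE: "Mbps"},
    {"service": "DSLBroadbandService_code", RATE_VALUE: 10, UNIT_VALUE: "Mbps"},
    {"service": "SatelliteBroadbandService_code", RATE_VALUE: 200, UNIT_VALUE: "Mbps"},
]


def search(records, term_code, value):
    """Return the records whose field for term_code equals value."""
    return [r for r in records if r.get(term_code) == value]


# "Mbps broadband service": match directly on the unitValue code.
for record in search(RECORDS, UNIT_VALUE, "Mbps"):
    print(record["service"], record[RATE_VALUE], record[UNIT_VALUE])
```

No lookup of the "aaabbb" vocabulary happens first; the term code alone selects the column to compare.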
The BitRate class may be written as (rateValue_code = aaabbb001):

    <owl:Class rdf:ID="BitRate_code" />
    <owl:DatatypeProperty rdf:ID="rateValue_code">
      <rdfs:domain rdf:resource="#BitRate_code" />
      <rdfs:range rdf:resource="xs:integer_code" />
    </owl:DatatypeProperty>
The disadvantage of alphanumeric codes for normative metadata terms is that the codes must be embedded in the HTML schema of a web page before its information can be assimilated into the semantic web framework. Textual metadata terms, by contrast, allow information to be assimilated even when the terms are not added to the HTML schema, because the text can still be found by pattern matching.
Conclusion: The advantages of alphanumeric codes for normative metadata terms are:
- context is encoded within metadata term code,
- context search features can be added to database applications,
- more efficient search applications can be built.


