April 14, 2008

ITGumbo: spicing IT up

IT Copywrite

Technology and application of technology.

ebizQ presents ITGumbo: a spicy blog network where vendors and IT professionals share ideas about creating Business Agility.

March 2007 Archives

Web 3.0 - 'thing' vocabularies

The RDF specifications state that a URI can be used to refer to anything that is referred to in a statement. RDF allows URI references (URIrefs), with optional fragment identifiers, to identify the subject, predicate and object of a statement. RDF uses RDF/XML, an XML-based markup language, to represent this information in a machine-readable format.

The information is structured into a triple <subject, predicate, object>.

Subject – is the 'thing'
Predicate – is the property (characteristics) of the 'thing'
Object – is the value of this property of the 'thing'

The Subject, i.e. a 'thing', is identified by a URIref. When a URIref is assigned to a Predicate, the definition of a URI implies that this Predicate is a unique property of a 'thing'. Therefore all websites that sell (or refer to) this 'thing' can refer to the same Predicate URIref for this property of the 'thing'. Different web pages may assign different values to this property.

Predicate = Object URIref | literal value.

All web pages that refer to an Object URIref may refer to the same URIref.
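The triple structure described above can be sketched in a few lines of Python. This is only an illustration, not RDF syntax: the `Triple` type, the `values_for` helper and the example.org URIrefs are assumptions made up for the example. It shows two pages using the same Subject and Predicate URIrefs while assigning different values.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URIref of the 'thing'
    predicate: str  # URIref of the property of the 'thing'
    obj: str        # URIref or literal value of that property


# Hypothetical URIrefs, for illustration only.
THING = "http://example.org/things#dsl-package"
SPEED = "http://example.org/vocab#speed"

# Two web pages describe the same 'thing' with the same Predicate URIref,
# but each assigns its own value to the property.
page_a = Triple(THING, SPEED, "256Kbps")
page_b = Triple(THING, SPEED, "512Kbps")


def values_for(triples, subject, predicate):
    """Collect every object stated for the given (subject, predicate) pair."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(values_for([page_a, page_b], THING, SPEED))
```

Because both pages share the Predicate URIref, a program can gather the values without knowing anything else about either page.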

» Continue reading Web 3.0 - 'thing' vocabularies.

Search web data more efficiently

Web 3.0 will provide:

  • better methods to connect online data
  • efficient methods to interrogate data
  • computer-readable labels that can help filter data, for example to restrict access to pornographic websites
  • information classification
  • general-purpose metadata that will be consistent across web-sites
  • relationships between classification categories - ontologies

Web 3.0 will convert the existing web of documents into a web of databases. This will make the web easier to use, because the artificial intelligence in computers and browsers will understand the information as humans do. It will also simplify research.

Open market for consumer - Web 3.0

The semantic web is about converting human-readable information into machine-readable information. To allow crawlers to go up to the physical location of the information, the data semantics must be accessible; the internal representation of this data on the company server is immaterial. For example, 'Kbps' is a BitRate; on the web this is expressed in the metadata for 'BitRate' by a reference to a unique URI: www.commondefinitions.org/#kbps. In the company database the same information may be represented as Kbps=1, Mbps=2, etc. (a BitRate enumeration).
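As a rough sketch of this separation between internal representation and published semantics, the Python below maps the internal enumeration codes to public URIs. The Kbps URI is the one given above; the Mbps URI, the `BitRateUnit` enum and the `publish_bit_rate` function are assumptions added for illustration.

```python
from enum import Enum


class BitRateUnit(Enum):
    # Internal representation from the text: Kbps=1, Mbps=2.
    KBPS = 1
    MBPS = 2


# The Kbps URI comes from the text; the Mbps URI is a hypothetical
# fragment that follows the same pattern.
UNIT_URI = {
    BitRateUnit.KBPS: "www.commondefinitions.org/#kbps",
    BitRateUnit.MBPS: "www.commondefinitions.org/#mbps",
}


def publish_bit_rate(value: int, unit_code: int) -> dict:
    """Translate an internal (value, enum code) pair into public metadata."""
    unit = BitRateUnit(unit_code)
    return {"BitRate": value, "unit": UNIT_URI[unit]}


print(publish_bit_rate(256, 1))
```

A crawler only ever sees the URI; the enumeration codes stay private to the company database.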


What will be achieved with the semantic web? Consider one application, the "open market", which is what we will discuss here. I am not referring to the e-commerce company 'Open Market' that was acquired by FatWire Software and Soverain Software. I want to discuss the concept of an 'open market' where the consumer has access to information and is free to make decisions without any external influence. To provide an open market on the WWW, every web content provider must understand the same language and have an open door. By 'open door' I mean that the web content provider allows access to the information.


The issues to be addressed are:

» Continue reading Open market for consumer - Web 3.0.

Semantic Web - vague but exciting?

From "surface web" to "deep web" to:

  • mine the hidden data
  • bring discipline to the world wide web
  • to organize information systematically
  • to extract meaning

HTML --> XML --> RDF & OWL ...

How easy is it to find all the relevant information about a 'thing' in this world? Web 2.0 may provide the information, but how do we measure its relevance? Will the semantic web also control search engine bias?

An article on Tim Berners-Lee's vision for the WWW.

Element to Web Resource - way to Semantic Web

Consider an ISP that offers the following services: Dial-up, DSL, ISDN, Satellite, and WLAN. The ISP defines the following characteristics for every service: Connection Type (wired or wireless), Connection Speed (56Kbps, 128Kbps, 256Kbps, 512Kbps, 1Mbps, etc.), Installation Cost (Fixed Charges, Modem Cost), and Subscription Cost ($25/Month, $5/4GB, etc.).

HTML (formatting for display):

    <table>
    <tr>
    <th>Speed</th><th>Unit</th>
    <th>SCost</th><th>SType</th>
    </tr>
    <tr>
    <td>256</td><td>Kbps</td>
    <td>$5</td><td>monthly</td>
    </tr>
    <tr>
    <td>2</td><td>Mbps</td>
    <td>$60</td><td>monthly</td>
    </tr>
    </table>

XML (storage layout and logical structure):

    <ServicePackages>
    <Package>
    <Speed unit="Kbps">256</Speed>
    <SubsCost unit="Monthly">5</SubsCost>
    </Package>
    <Package>
    ...
    </Package>
    </ServicePackages>

RDF (web resources metadata):

    <rdf:Description rdf:nodeID="PkgWebRes">
    <cdef:Speed rdf:nodeID="SpdWebRes"/>
    <cdef:SubsCost rdf:nodeID="ScostWebRes"/>
    </rdf:Description>

    <rdf:Description rdf:nodeID="SpdWebRes">
    <cdef:BitRate rdf:nodeID="BRWebRes"/>
    </rdf:Description>

HTML provides formatting to the information that is displayed on the web page.

XML provides a method to structure this information into entities that have attributes, e.g. the characteristics of a service. Each entity is described in the XML document as an element with attribute specifications and child elements. A Document Type Definition (DTD) defines the permitted structure, while HTML and style sheets provide the formatting for the XML-structured information. The advantage of XML-structured information is that each element can be accessed and processed individually, like a database record. The element information can be exchanged in protocol messages and used to drive relational database operations. Before XML, web page content existed only in HTML format, which was not easily accessible for database operations.
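To illustrate how XML elements can be processed individually like database records, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The first package is the one from the XML listing; the second is filled in from the second row of the HTML table. The `load_packages` function and the record field names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# First package as in the XML listing; second package taken from the
# second row of the HTML table (2 Mbps, $60 monthly).
XML_DOC = """
<ServicePackages>
  <Package>
    <Speed unit="Kbps">256</Speed>
    <SubsCost unit="Monthly">5</SubsCost>
  </Package>
  <Package>
    <Speed unit="Mbps">2</Speed>
    <SubsCost unit="Monthly">60</SubsCost>
  </Package>
</ServicePackages>
"""


def load_packages(xml_text):
    """Turn each <Package> element into a flat, record-like dict."""
    root = ET.fromstring(xml_text)
    records = []
    for pkg in root.findall("Package"):
        speed = pkg.find("Speed")
        cost = pkg.find("SubsCost")
        records.append({
            "speed": int(speed.text),
            "speed_unit": speed.get("unit"),
            "cost": int(cost.text),
            "billing": cost.get("unit"),
        })
    return records


for record in load_packages(XML_DOC):
    print(record)
```

Each dict could be inserted into a relational table as one row, which is much harder to do reliably from the HTML table markup.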

The World Wide Web is a web of such HTML and XML documents. The XML entities are accessible within a limited scope: for example, credit card information is exchanged online with the finance service (bank), and the secure protocol connections between the consumer, merchant and bank ensure information security. The e-shop product information and the purchase order are communicated between the consumer and the merchant. XML has made it possible to exchange information within this limited scope. The unresolved issue is the time the consumer spends locating the best product. An e-shop may provide a sort mechanism or a search tool. However, the scope is limited to that e-shop.

The DSL service packages that were listed in tabular form with HTML are entered as database records by the XML definition; the information is still presented to the Search Engine as a text document. If every element identified in the XML document can be considered a web resource with an unambiguous URI, then metadata can be defined for this web resource. This metadata will empower Search Engines to make an intelligent search. Consider the service provided by the ISP as a web resource and the characteristics of this service as the metadata of this web resource. With speed and cost as data values in the metadata, a Search Engine can locate the best service.

RDF will define the DSL service package as a web resource with associated metadata. The phrase “256Kbps” in a search query will tell the Search Engine to query all metadata with the value '256Kbps' in the BitRate field. A more advanced Search Engine can then compare the SubscriptionCost in the results. The semantic web gives meaning to the phrase '256Kbps': rather than listing all web pages that contain the text '256Kbps', the Web 3.0 Search Engine will list only those pages that have this phrase in the BitRate field of their metadata.
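The difference between a text search and a metadata search can be sketched in Python as follows. This is a toy model, not how any real search engine works: the page records, their sources and costs are invented for illustration, and only the field names BitRate and SubscriptionCost come from the text.

```python
# Invented sample pages; only the field names come from the text.
PAGES = [
    {"source": "isp-a", "text": "DSL at 256Kbps",
     "BitRate": "256Kbps", "SubscriptionCost": 25},
    {"source": "isp-b", "text": "Budget 256Kbps plan",
     "BitRate": "256Kbps", "SubscriptionCost": 5},
    {"source": "isp-c", "text": "Fast 2Mbps plan",
     "BitRate": "2Mbps", "SubscriptionCost": 60},
    # Mentions the phrase in free text only; it has no BitRate metadata.
    {"source": "blog", "text": "I miss my old 256Kbps line"},
]


def keyword_search(pages, phrase):
    """Text search: every page containing the phrase anywhere."""
    return [p["source"] for p in pages if phrase in p["text"]]


def field_search(pages, bit_rate):
    """Metadata search: match the BitRate field only, cheapest first."""
    hits = [p for p in pages if p.get("BitRate") == bit_rate]
    hits.sort(key=lambda p: p["SubscriptionCost"])
    return [p["source"] for p in hits]


print(keyword_search(PAGES, "256Kbps"))
print(field_search(PAGES, "256Kbps"))
```

The keyword search also returns the blog page, while the field search returns only the two service offers and can rank them by SubscriptionCost.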

Resource Description Framework (RDF) is a language to represent the entities as web resources with the associated metadata. The challenges are:

  • Identification of web resources
  • Standard definition of metadata
  • Definition of common namespace
  • ...

Swoogle is a search engine that indexes all URIs containing a given term and all documents that contain this term in their metadata. For example, for the search term “CDMA”, Swoogle will index all URIs of the type http://*.CDMA.* and http://*.*.*/*/CDMA/*, and all documents with “CDMA” in their metadata. SHOE is another Semantic Search project that works on metadata and does not depend only on keyword density for indexing web pages.

Please refer to the W3C specifications for the correct language syntax.

Feasibility of Web 3.0, popularly known as Semantic Web

Product X is sold on-line by 'n' providers. Product X has features x1, x2, ... xn. Service providers S1 and S2 use XML tags to describe the product in the content of their web-sites:

S1:
xmlns:X="http://s1/xterms"

<X:x1>value</X:x1>
<X:x2>value</X:x2>
...
<X:xn>value</X:xn>
S2:
xmlns:X="http://s2/xterms"

<X:x1>value</X:x1>
<X:x2>value</X:x2>
...
<X:xn>value</X:xn>

Because each provider uses its own namespace (and possibly its own tag names) for similar objects, the usefulness of this structured description is limited to a particular web-site. A consumer who wants to buy a product has to study the web content of each service provider to find the best match for his/her requirements. Comparisons of the products and features may be found online in review articles. While the listings on the SERP may be biased by search engine (SE) or content provider methods, the review articles may provide individual or community opinions.
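The effect of provider-specific namespaces can be seen with Python's standard `xml.etree.ElementTree`, which expands every prefixed tag to its full namespace-qualified name. The wrapping `<item>` element and the `qualified_tags` helper are assumptions added so that each fragment parses on its own; the namespaces and the tag are those of S1 and S2 above.

```python
import xml.etree.ElementTree as ET

# The <item> wrapper is hypothetical; the namespaces and the x1 tag
# are the ones used by S1 and S2.
S1 = '<item xmlns:X="http://s1/xterms"><X:x1>value</X:x1></item>'
S2 = '<item xmlns:X="http://s2/xterms"><X:x1>value</X:x1></item>'


def qualified_tags(xml_text):
    """Return the namespace-qualified tag of each child element."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


print(qualified_tags(S1))
print(qualified_tags(S2))
print(qualified_tags(S1) == qualified_tags(S2))
```

Although both providers write `X:x1`, a parser sees two different element names, so a program cannot assume they describe the same feature without a shared vocabulary.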

The Web 3.0 philosophy is to make this representation of information more objective, so that it enables deductive reasoning and inference by intelligent computers.

Web 3.0 will provide:

  • common formats for integration and combination of data drawn from diverse sources
  • language to relate data to real world objects
  • notations to describe concepts, terms and relationships

This technology will make it possible to build a Search Engine that can compare the values of feature x1 of a product across providers, and so answer a consumer's natural language query such as: ISP for 256Kbps DSL broadband internet connection. Here the product feature x1 is 'internet connection' with the value 'DSL'. A DSL internet connection may in turn have features like link bandwidth and subscription cost.

The feasibility of Web 3.0 can be demonstrated by the existence of a Search Engine based on this technology. An interesting presentation


SERP – Search Engine Result Page