The Semantic Web

Ivory Towers and Missed Opportunities

At the heart of the Semantic Web is the philosophy that the world’s information can be rationalized.  Such a noble proposition extends from our innate desire to make sense of the world around us, and isn't unique to just the narrow section of the Artificial Intelligence community which co-founded the Semantic Web.  With this worldview comes the assertion of a universal truth, an all-conquering, logical explanation for any particular domain of discourse; a formalized ontology of the world around us. [1]

The vision of the Semantic Web is to create such digitized ontologies of the world’s digitized information as a means to enable computers to work in an unprecedented automated and rational fashion.  With these ontologies, we will be able to dictate information (i.e. metadata) which is meaningful to computers.  Combining this data with computer-understandable semantics will provide the framework to stimulate a ‘revolution of new possibilities’ [2] and bring an unseen synergy across the chaotic landscape of the internet, where applications and services currently operate in explicitly defined boundaries of interoperability.

The Semantic Web has undoubtedly matured somewhat since it was first unveiled.  Its core enabling technology, the Resource Description Framework (RDF), has been recommended by the World Wide Web Consortium (W3C) as the standardised method for web-based knowledge representation since 1999.[3]  Since then, a family of complementary technologies has sprouted from the W3C in an effort to germinate the semantic revolution which many of the Semantic Web’s champions have been hoping for.  A stable set of specifications has been available since 2004,[4] the most recent of these to be recommended being SPARQL, a query language for the Semantic Web, standardized on January 15th, 2008.[5]

What is clear, however, is that such a revolution hasn’t yet occurred.  The most prominent of the Semantic Web’s advocates agree to this extent: “Despite [...] significant drivers in defence, business and commerce, it’s still apparent that the Semantic Web isn’t with us on any scale.” [3] This essay aims to examine what I think are the core challenges facing the Semantic Web as it tries to become a mainstream medium for data interchange.  I shall begin by analysing where we are now in terms of the de-facto and de-jure standards being adopted by web developers today, what it was that made these popular, and contrasting these with Semantic Web technologies and the Semantic Web philosophy.  Following that I shall discuss the inherent need for knowledge representation on the web, and whether the Semantic Web fulfils this need better than the current alternatives.

Web 2.0

The current state of the art of public web applications and services in use today represents a convergence of technology, standardization and social attitudes.  This revolution in web design and social trends has reached critical mass over the past 3 years, after an arduous evolution of how designers leverage the internet’s potential as a platform; that is, as a sphere-of-interaction for connecting its users to one another, rather than merely networking the world’s computers on an information-bus.  Tim Berners-Lee, who is cited as first proposing the World Wide Web in 1989,[6] is convinced that this is entirely what the web was intended for all along.[7]  However true this may be, it is only recently that words such as Wiki and Blog have become part of the web designer’s everyday vernacular.  What these terms represent is a paradigm shift towards bottom-up, user-driven content, in what its advocates christened Web 2.0.[8]

What Web 2.0 actually means isn’t particularly clear, even to one of its most prominent champions, Tim O’Reilly.[9]  The term is fuzzy at best, and is largely just a moniker for the industry-driven hype of the time, but doubles up as an umbrella term for technologies such as Ajax (itself a name for a group of web-based technologies) and the techniques in which they are commonly used.  What is clear, however, is that Web 2.0 is all about people; it’s about participation, usability, mobility, and social connectivity.

Web 2.0 became popular as the result of a realisation that users add value to an application, although few people will go to the trouble of adding value to applications by explicit means. Therefore: “Web 2.0 companies set inclusive defaults for aggregating user data and building value as a side-effect of ordinary use of the application.” [9] The net result is user-driven content which progressively adds value to an application, albeit incidentally.  This is what’s known as The Architecture of Participation,[10] and is the driving impulse behind sites such as Last.fm (www.Last.fm, an internet radio and music community) and Digg (www.Digg.com, a community-based article popularity aggregator).  These sites relish and survive on the data their users provide and the inner communities they create; without their users, there would be no value, and without the value, there would be no community.  What’s interesting to observe here is the chicken-and-egg scenario of creating value on a web application which depends on its users, and the parallels this draws with the Semantic Web’s adoption.  A site such as del.icio.us (www.del.icio.us, the archetypal Web 2.0 bookmark-managing website, which I will discuss in more detail later on) depends on a critical mass of users to create any worthwhile content, yet it was conceived without any.  The connection with the Semantic Web is that it depends on a critical mass of metadata to be useful, yet web developers aren’t creating any en masse because it’s not worthwhile creating metadata when there are no significant tools which take advantage of it. Yet, as the past few years have demonstrated with Web 2.0, it’s possible to overcome this hurdle with the architecture of participation, and in particular, user-generated semantics.

Folksonomies

Folksonomy, also known as Social Classification, is a typical feature of a Web 2.0 application, and is created by harnessing the power of the tag: an electronic label which is pinned to an artefact (an image or a video, for example) to describe what it is.  Over time a group of users who are tagging content will create a tag cloud: a weighted list of tags which uses visual cues to denote the popularity of a particular tag.[11]  These atomic pieces of metadata can be used to aggregate or cluster content which is labelled by the same tag, thus providing a degree of keyword search capability not otherwise possible.  A Folksonomy is a distributed classification system similar in concept to the ontologies of the Semantic Web.  In reality, however, there is a sizable difference.

A tag cloud aggregating the interests of an American political forum.

Source: unknown.
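
To make the idea of a weighted list concrete, here is a minimal sketch of how a tag cloud might be derived from a set of taggings.  It is an illustration only, not how any particular site does it: the bookmark names, the linear scaling and the font-size range are all my own assumptions.

```python
from collections import Counter


def tag_cloud(taggings, min_size=10, max_size=32):
    """Map each tag to a font size scaled linearly by how often it was used."""
    counts = Counter(tag for tags in taggings.values() for tag in tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: min_size + (count - low) * (max_size - min_size) // span
        for tag, count in counts.items()
    }


def items_tagged(taggings, tag):
    """Return the items labelled with the given tag, sorted by name."""
    return sorted(item for item, tags in taggings.items() if tag in tags)


# Hypothetical bookmarks and the tags users attached to them.
taggings = {
    "page-a": {"design", "css"},
    "page-b": {"design", "blog"},
    "page-c": {"design", "css", "music"},
}
cloud = tag_cloud(taggings)
css_pages = items_tagged(taggings, "css")
```

The most frequently used tag receives the largest size and the least used the smallest, while `items_tagged` shows the clustering of content under a shared label.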

Folksonomies represent a structure which emerges organically as a result of individuals in a community managing their own information requirements.[3]  This is in complete contrast to the ontology, which instead of being a bottom-up, user-driven approach working via consensus, is a top-down (from the perspective of the user), one-size-fits-all approach to classification.  Clay Shirky, a detractor of the Semantic Web, maintains in his article Ontology is Overrated that tagging is a ‘radical break with previous [failed] categorization strategies’, and that the Semantic Web is an attempt to recreate ontological classifications which aren’t suited to the problem.[12]

Some of the supporting arguments in Shirky’s polemic are slightly misplaced.  In particular, he directly compares the taxonomies of old directory services, such as the Yahoo! Directory service, to ontologies. This is an incorrect analogy; ontologies are essentially for sharing information between applications, not necessarily for finding information.  I believe the main thesis of Shirky’s argument is correct, however.  There are fundamental problems with the approach the Semantic Web has adopted.  These problems arise from the fact that outside the boundaries of well-defined and formalized domains of discourse (e.g. epidemiology[3]), which have groups of expert ontologists, there is only an uncoordinated hodgepodge of people, few of whom understand the theories and underpinnings of the Semantic Web, let alone have the experience to use the available tools: “The list of factors making ontology a bad fit is [...] an almost perfect description of the web – largest corpus, most naive users, no global authority, and so on.” [12] The complexity issue facing the Semantic Web is reminiscent of the early days of the web, which was spearheaded by a small but mobilised group within the High Energy Physics community, well before there was any significant commercial uptake.  Advocates are quick to point out this correlation;[13,14] however, along with Shirky, I believe that the problem is more a matter of feasibility than of technicality.

Because the web is distributed, decentralised and falls under no particular authority (all aspects which made the internet popular in the first place), it becomes very difficult to define a consistent worldview on cultural information across independent development efforts.[15]  Such consistency is what a successful Semantic Web would depend upon.

A digression on the URI, and why it’s inadequate

RDF uses the Uniform Resource Identifier (URI) to build a vocabulary of terms which describe things uniquely.  URIs are powerful because they are decentralised, and anyone can create as many URIs as there are things worthy of describing.  This degree of flexibility is also a flaw, because people can use different URIs to describe the same thing, or the same URI to describe different things.  This is similar to a tag within a folksonomy, which in itself carries no semantics and requires a contextual interpretation which is inherently imprecise, and thus ambiguous.

The designers of the Semantic Web have taken account of the duplicate-URI problem in their design of the Web Ontology Language (OWL).  The owl:sameAs property can be used to identify synonyms across multiple ontologies.[1]  However, I see this as an evasion of the problem.  Firstly, it is not feasible for a developer to be familiar with the terms of every other ontology on the internet before he designs his own.  Furthermore, a URI can change over time, and may have to change if it was designed poorly.[16]  Thus, if data is to be used in unanticipated ways by conjoining it arbitrarily with other, feasibly related data sets (note that merely exposing the data in RDF doesn’t automatically make it of any particular use in a semantic web), and these data sets use different vocabularies to describe the same things, then a mapping must occur.  But in necessitating such a mapping, what have we achieved? Not much more than what presenting the data in vanilla schema-based XML would achieve in such a scenario.  Semantic Web advocates have placed much emphasis on the fact that this is a necessary trade-off for feasibility.[17]  I don’t believe I am the only one who thinks it is an unsatisfactory one, and in order to help solve this issue, community projects such as SWAG (swag.webns.net, which encompasses parts of other projects such as Dublin Core (dublincore.org) and FOAF (xmlns.com/FOAF/spec)) have been created to help ensure data interoperability on the Semantic Web by providing a URI database of commonly used terms and a standardised set of conventions for describing resources.  Such initiatives, while necessary, are antithetical to what made the original internet so successful.  In summary: the URI enables anyone to come up with the words in a dictionary, but it does not ensure we are using the same dictionary.
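
The mapping step described above can be sketched in a few lines.  This is a toy illustration in plain Python rather than a real RDF toolkit: triples are simple tuples, the example.com-style URIs are hypothetical, and choosing the alphabetically smaller URI as the representative is an arbitrary assumption of mine.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def canonical_map(triples):
    """Group URIs linked by owl:sameAs and pick one representative per group."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            root_a, root_b = find(subject), find(obj)
            if root_a != root_b:
                # Arbitrary choice: keep the alphabetically smaller URI.
                keep, drop = sorted((root_a, root_b))
                parent[drop] = keep
    return {uri: find(uri) for uri in list(parent)}


def merge(triples):
    """Rewrite every non-sameAs triple to use the representative URIs."""
    canon = canonical_map(triples)
    return {
        (canon.get(s, s), canon.get(p, p), canon.get(o, o))
        for s, p, o in triples
        if p != OWL_SAME_AS
    }


# Two hypothetical data sets which name the same person with different URIs.
triples = [
    ("http://a.example/people#ray", FOAF_NAME, "Ray"),
    ("http://b.example/users/ray", "http://b.example/terms#wrote",
     "http://b.example/posts/1"),
    ("http://a.example/people#ray", OWL_SAME_AS, "http://b.example/users/ray"),
]
merged = merge(triples)
```

Note that the two data sets only join up because someone asserted the owl:sameAs link by hand; without that third triple the facts about the same person stay separate.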

Bridging the gap between Folksonomy and Ontology

Tags are simple to understand, powerful, and are here today.  The act of tagging sometimes appears altruistic, but the fact is incidental; people first started tagging because it fitted their needs as opposed to the needs of others.  This is crucial.  Given a community of taggers, the activity becomes synergetic.  People in the Semantic Web community are swift to point out the failings of tagging as a system for classification.  Much of this criticism is legitimate, but some of it is exaggerated.  For instance, an obvious difference between the folksonomy and the ontology is that all tags fall within one namespace and so there is no opportunity for hierarchies to form.[18]  For example, take the following hierarchy: A Swallow is a Bird is an Animal.  Now, if someone tags a picture exclusively as swallow, then someone else who is searching for pictures of birds will not be able to use inference to find the picture aggregated within the results.  This argument has merit but has little practical significance.  Take, for instance, the 15 most commonly used tags on Del.icio.us: design, blog, tools, programming, webdesign, software, web2.0, music, google, art, css, photography, web, education, news.[19]  Analysing this list you’ll find little (if any) immediate requirement for hierarchies.  Further down the list (not shown) you’ll find Java, which would fall within the programming topic, but experienced taggers will often use both these tags should such an item require it.  When we begin to use multiple tags, the folksonomy no longer looks flat, and tags become multidimensional.[20]
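
The swallow example can be sketched as follows.  It is only an illustration under my own assumptions: the photo names are invented, and the hierarchy is a hand-written, acyclic lookup table rather than anything a real ontology language would produce.

```python
# Hypothetical "is a" hierarchy: each term points to its broader term.
BROADER = {"swallow": "bird", "bird": "animal"}


def expand(tag, broader=BROADER):
    """Return the tag plus every broader term reachable from it."""
    terms = [tag]
    while terms[-1] in broader:  # assumes the hierarchy contains no cycles
        terms.append(broader[terms[-1]])
    return terms


def search(taggings, query, infer=False):
    """Find items whose tags match the query, optionally via the hierarchy."""
    hits = []
    for item, tags in taggings.items():
        if infer:
            effective = {term for tag in tags for term in expand(tag)}
        else:
            effective = set(tags)
        if query in effective:
            hits.append(item)
    return sorted(hits)


photos = {
    "photo-1": {"swallow"},          # tagged exclusively as swallow
    "photo-2": {"swallow", "bird"},  # an experienced tagger uses both tags
    "photo-3": {"sunset"},
}
flat_hits = search(photos, "bird")
inferred_hits = search(photos, "bird", infer=True)
```

A flat search for bird misses the first photo, whereas the inferring search finds it; the second photo is found either way, which is the point about multiple tags making up for the missing hierarchy.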

Other inadequacies with tagging remain, such as its natural language dependency and the inability to establish horizontal relationships amongst terms and to distinguish these from the vertical hierarchies described above. There is significant research effort being undertaken in this area,[21] some of which has made it into the wild.[22]  Other pundits suggest that it is time for the Semantic Web community to embrace the virtues of tagging, and to adopt a unified representation of both the tag and the ontology.[23]  The web developer community, however, is shifting in an opposing direction with their rapid adoption of the microformat.

Microformats

The microformat is gaining traction as a credible yet simple approach to bridging the gap between the documents which pervade the internet, semantically meaningless to computers, and the more realistic ambitions of the Semantic Web’s exponents.  Essentially, microformats allow structured pieces of information to become tagged. Microformats are built directly upon existing web content and therefore exploit the mature and widespread internet data formats which define such documents (XHTML/HTML).  As such, they don’t require developers to learn the complex schemas of RDF and OWL to enrich their content with useful machine-readable metadata; they don’t even require developers to rethink the cognitive models of their existing data structures.  “Even though [microformats] sidestep the existing ‘technology stack’ of RDF, ontologies, and Artificial Intelligence-inspired processing tools, various microformats have emerged that parallel the goals of several well-known Semantic Web projects.” [24] Adoption of the microformat has been rapid chiefly due to its inherent simplicity and compatibility with existing web authoring techniques.  So far, the movement has grown into a de-facto standard for annotating contact, event and location data.  Sites such as Technorati (Technorati.com/contacts) and the IBM Employee Directory (ibm.com/contact/employees) allow people to conveniently exchange contact information, in the form of the hCard microformat (an HTML representation of vCard), with various applications such as the Microsoft Windows Vista Contacts application[25] and the Apple AddressBook.[26]  Furthermore, the creators of the world’s two most popular web browsers, Mozilla Firefox and Microsoft Internet Explorer, have expressed their desire to at least partially support microformats in upcoming versions of their products.[27,28]
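
To show how little is needed to read such markup, here is a rough sketch which pulls contact details out of an hCard using only Python’s standard html.parser module.  The class names vcard, fn, org, tel and email come from the hCard format; everything else is an assumption of mine, including the sample contact, the handful of properties handled, and the requirement that the markup be XHTML-style with every element explicitly closed and no nested cards.

```python
from html.parser import HTMLParser

HCARD_PROPERTIES = {"fn", "org", "tel", "email"}  # a small subset of hCard


class HCardParser(HTMLParser):
    """Collect the text of hCard properties found inside class="vcard" elements."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._stack = []  # one (is_card_root, property_or_None) per open element

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
        prop = next(iter(sorted(HCARD_PROPERTIES & classes)), None)
        self._stack.append((is_root, prop))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not any(is_root for is_root, _ in self._stack):
            return  # whitespace, or text outside any card
        # Walk outwards from the innermost element to the enclosing card root.
        for is_root, prop in reversed(self._stack):
            if prop is not None:
                card = self.cards[-1]
                card[prop] = (card.get(prop, "") + " " + text).strip()
                break
            if is_root:
                break


# Hypothetical contact marked up as an hCard within ordinary XHTML.
page = """
<div class="vcard">
  <span class="fn">Ray Example</span> works at
  <span class="org">Example Ltd</span>
  <a class="email" href="mailto:ray@example.org">ray@example.org</a>
</div>
<p class="fn">Not part of any card</p>
"""
parser = HCardParser()
parser.feed(page)
parser.close()
contacts = parser.cards
```

The page remains ordinary, human-readable XHTML; the only addition is a few class attributes which a tool can pick up to export the contact.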

The adoption of the microformat compared with RDF is analogous to that of RSS.  Version 1.0 of RSS is perhaps the largest in-use example of RDF available.  RSS, however, remained rather inconspicuous in the web authoring community until (much like microformats) it was stripped of RDF and extensively simplified in the subsequent version, 2.0.

The Semantic Web allows for the decentralized development of ontologies.  Microformats make no such ostentatious claims.  The microformat philosophy is to develop solutions for particular developers’ needs, not provide a ‘panacea for all taxonomies, ontologies, and other such abstractions.’ [29]  Furthermore, developers accept that these formats need to be agreed upon before they can become at all useful on the Web.  Microformats are individually small and well defined; they represent an evolutionary (rather than revolutionary) and pragmatic approach to developing a loosely connected ‘little-s’ Semantic Web, rather than the one-solution-fits-all approach which, by design, leaves the community with no agreed-upon schematics for achieving common goals collaboratively.

Conclusion

The Semantic Web stands at a precipice.  It has failed to deliver outside of academic circles for the same reasons which made the original web so successful.  While sound in theory, the practicalities of achieving the more audacious goals of the Semantic Web are problematic at best, and in no way represent the driving sentiment of the enterprise engineer, let alone the humble minute-man web designer.  The proclamations of gold at the end of the rainbow by the scientific community are reminiscent of the fruitless promises of the Artificial Intelligence crowd during the latter part of the 20th century.[15]  While proven useful within highly centralised sections of the e-science community, adoption in any decentralized sense, outside groups which are highly organized, has been paltry.  Either the goals of the Semantic Web must be brought down from the stratosphere, or bridges must be built to the ivory tower.  The community could wait for automation techniques to reach a level comparable to artificial intelligence; otherwise, more realistic goals must be set.

Top: Core SW technologies are entertaining an ever more narrow section of the development community.  Is interest dwindling?

Above: New technologies fail to catch on.

Source: google

Simple social classification techniques have already achieved much more in the wild than the Semantic Web, and have low cognitive costs for the taxonomist.  Microformats show great promise and demonstrate that the community is ready to embrace small evolutionary changes as the tools (authoring software, browsers, etc.) converge on developers’ needs.  Copy-cat initiatives such as RDFa, while promising on their own merits, represent defiance by the Semantic Web community and an unshakable determination to go down their own path rather than accept the shortcomings of upcoming formats and work collaboratively with those communities to build bridge technologies (although the GRDDL project is particularly interesting in this respect,[3] it is far removed from being a simple solution to the problem of translating semantically enriched XHTML into RDF).  As a result, RDFa is undoubtedly going to hamper the progress of microformats, and vice versa, in the upcoming years.  Natural selection will dictate which is fittest for purpose.

Works Cited

  1. Antoniou, G., & Harmelen, F. v. (2004). A Semantic Web primer. Cambridge, Massachusetts: The MIT Press.
  2. Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May). The Semantic Web. Scientific American , p.34-43.
  3. Shadbolt, N., Hall, W., & Berners-Lee, T. (2006). The Semantic Web Revisited. IEEE Intelligent Systems, 96-101.
  4. Herman, I. (2007). Semantic Web Adoption. CSWS2007 (pp. 1-47). Beijing, China: W3C.
  5. Herman, I. (2008, January 15). SPARQL is a Recommendation. Retrieved April 2008, from Semantic Web Activity News: http://www.w3.org/blog/SW/2008/01/15/sparql_is_a_recommendation
  6. Berners-Lee, T. (1990, May). The original Proposal of the WWW, HTMLized. Retrieved April 2008, from http://www.w3.org/History/1989/proposal.html
  7. Berners-Lee, T. (2006, July 28). The IBM Developerworks Podcast. (S. Laningham, Interviewer)
  8. The Web 2.0 Conference. (2004, June). Web 2.0 Conference. Retrieved April 2008, from http://web.archive.org/web/20040602111547/http://web2con.com/
  9. O'Reilly, T. (2005, August 09). What is Web 2.0. Retrieved April 2008, from O'Reilly Net: http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
  10. O'Reilly, T. (2004, June). The Architecture of Participation. Retrieved April 2008, from The O'Reilly Network: http://www.oreillynet.com/pub/a/oreilly/tim/articles/architecture_of_participation.html
  11. Smith, G. (2004, August). Folksonomy: social classification. Retrieved April 2008, from atomiq: http://atomiq.org/archives/2004/08/folksonomy_social_classification.html
  12. Shirky, C. (2005). Shirky: Ontology is Overrated. Retrieved April 2008, from Shirky: http://shirky.com/writings/ontology_overrated.html
  13. Goble, C., & Roure, D. D. The Grid: An Application of the Semantic Web. Manchester, England: The Department of Computer Science, The University of Manchester.
  14. Updegrove, A. (2005, June). An Interview With Tim Berners-Lee. Retrieved April 2008, from http://www.consortiuminfo.org/bulletins/semanticweb.php
  15. Shirky, C. (2007, November 7). The Semantic Web, Syllogism, and worldview. Retrieved April 2008, from Clay Shirky's writings about the internet: http://www.shirky.com/writings/semantic_syllogism.html
  16. W3C. Cool URIs don't change. Retrieved April 2008, from W3C: http://www.w3.org/Provider/Style/URI
  17. Swartz, A. (2002, January 5). The Semantic Web In Breadth. Retrieved April 2008, from Logic Error: http://logicerror.com/semanticWeb-long
  18. Mathes, A. (2004, December). Folksonomies - Cooperative Classification and Communication Through Shared Metadata. Retrieved April 2008, from http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
  19. http://del.icio.us/tag/
  20. Shirky, C. (2005, January 24). Tags != folksonomies && Tags != Flat name spaces. Retrieved April 2008, from Corante: http://many.corante.com/archives/2005/01/24/tags_folksonomies_tags_flat_name_spaces.php
  21. Angeletou, S., Sabou, M., & Motta, E. (2007). Bridging the Gap Between Folksonomies and the Semantic Web. Milton Keynes, United Kingdom: The Open University.
  22. Jones, R. (2007, August 27). Audio Fingerprinting for Clean Metadata. Retrieved April 2008, from Last.fm: http://blog.last.fm/2007/08/29/audio-fingerprinting-for-clean-metadata
  23. Gruber, T. (2006). Where the Social Web Meets the Semantic Web. The 5th International Semantic Web Conference. Athens, GA, USA: ISWC.
  24. Khare, R., & Çelik, T. (2006). Microformats: A Pragmatic Path to the Semantic Web. CommerceNet Labs Technical Report 06-01.
  25. Microsoft Corp. (n.d.). Can I send and receive vCard contacts? Retrieved April 2008, from Windows Vista Help: http://windowshelp.microsoft.com/Windows/en-US/Help/ec129e8d-9a0f-40d7-bbc0-9faade057a1c1033.mspx
  26. Apple Inc. (n.d.). Importing and exporting vCards. Retrieved April 2008, from Apple Docs: http://docs.info.apple.com/article.html?path=AddressBook/4.0/en/ad995.html
  27. Mozilla Wiki. (2007, May). Microformats. Retrieved April 2008, from The Mozilla Wiki: http://wiki.mozilla.org/Microformats
  28. Reimer, J. (2007, May). Microsoft drops hints about Internet Explorer 8. Retrieved April 2008, from ArsTechnica: http://arstechnica.com/news.ars/post/20070502-microsoft-drops-hints-about-internet-explorer-8.html
  29. microformats.org/about/
Author
Ray
Published
Tue. 30 December, 2008 14:20
filed under
WWW Tech