XML 2007 Conference
Marriott Copley Place
Boston, Massachusetts, USA
3-5 December 2007

Metadata Mining: Automated Semantic Classification for Service Repositories

Joshua Fox (IBM)
XML in the Enterprise (Suffolk)
Chair: David Orchard (BEA Systems)

Service repositories hold out the promise of re-usability across the enterprise and beyond. Yet when a repository begins accumulating large amounts of service information, it too often falls victim to its own success. A developer who wants to consume a specific service cannot realistically sift through thousands of WSDL, XSD, and other metadata items. All too often, the result is that new code is written unnecessarily.

Classification using taxonomies or ontologies is the preferred solution. With a good semantic classification, a potential consumer can easily search for a service that meets their business requirements, even where the terminologies and structures are not yet standardized. Yet classifying masses of metadata by hand is time-consuming and error-prone, and therefore impractical.

Data mining techniques help answer this challenge. Significant technical improvements in the last decade have helped data mining score impressive successes in areas as disparate as purchase recommendations and homeland security. Data mining discovers relationships, patterns, and trends, and often does so better than human analysts can. Mining of XML data sets (most service metadata is itself expressed in XML) has also seen tremendous improvements in performance and accuracy.

Since metadata is just another type of data, applying data mining to metadata is technically straightforward. This presentation will illustrate how to use standard tools to automatically propose mappings from very large sets of WSDL and XSD into ontological categories, simplifying further work for repository managers.
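As a rough illustration of what "proposing mappings" from schema metadata to categories can look like, here is a minimal sketch using only the Python standard library. It is not the method from the presentation: it swaps in simple keyword matching where a real system would use a trained data mining model. The category names, their term lists, the sample schema, and the function names are all hypothetical and invented for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"
NAMED_TAGS = {XSD_NS + "element", XSD_NS + "complexType", XSD_NS + "attribute"}

# Hypothetical taxonomy: category -> indicative terms (invented for illustration).
CATEGORIES = {
    "Customer": {"customer", "account", "contact", "address"},
    "Billing": {"invoice", "payment", "amount", "currency"},
}


def split_identifier(name):
    """Split camelCase or snake_case identifiers into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def schema_tokens(xsd_text):
    """Count the tokens found in the names declared by an XSD document."""
    root = ET.fromstring(xsd_text)
    tokens = Counter()
    for node in root.iter():
        name = node.get("name")
        if node.tag in NAMED_TAGS and name:
            tokens.update(split_identifier(name))
    return tokens


def propose_categories(xsd_text):
    """Return (category, score) pairs, best match first, omitting zero scores."""
    tokens = schema_tokens(xsd_text)
    scores = {
        cat: sum(tokens[term] for term in terms)
        for cat, terms in CATEGORIES.items()
    }
    hits = [(cat, score) for cat, score in scores.items() if score > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample schema, used only to exercise the sketch.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="CustomerAddress" type="xs:string"/>
  <xs:element name="invoiceAmount" type="xs:decimal"/>
  <xs:element name="paymentCurrency" type="xs:string"/>
</xs:schema>
"""

if __name__ == "__main__":
    for category, score in propose_categories(SAMPLE_XSD):
        print(category, score)
```

The output is a ranked list of proposals rather than a final assignment, which matches the idea of simplifying further work for repository managers instead of replacing their judgment.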

We will demonstrate this in the context of real-life service re-use scenarios. We will combine a practical guide to interfacing with service repositories with a non-specialist overview of XML data mining techniques.


Joshua Fox

IBM

Joshua Fox is Project Lead and Chief Technologist for the Metadata Analytics project at IBM, in which he researches and develops innovative solutions for analyzing and classifying disorganized SOA metadata. Previously, he was Chief Architect of Unicorn Solutions (acquired by IBM), an early leader in semantic information management software.

Fox has also served as Principal Architect and Director at Mercury Interactive (acquired by HP) and as Senior Software Architect at VocalTec.

In addition to giving several presentations at the XML Conference, Fox has spoken at many conferences, including JavaOne and WebServices Edge, and has published in Dr. Dobb’s Journal, XML Journal, WebServices Journal, and others in the fields of software, metadata, and enterprise ontology. (See http://www.joshuafox.com for details.)

He received his PhD from Harvard University and his BA summa cum laude from Brandeis University.


Premiere sponsor

Microsoft Interoperability

Platinum sponsors

JustSystems
DataDirect
IBM

Gold sponsors

Intel
Antenna House

Produced by

IDEAlliance

Event sponsor

RSuite CMS

Co-hosts

OASIS
Philly XML
XML Guild