One of the stumbling blocks to realizing the semantic web is that most information available on web pages today is designed for consumption by humans, not machines. Inconsistencies in content and markup from one page to another impede machine-to-machine communication. This poses some interesting automation challenges as service providers attempt to expose what is needed by both organic and inorganic consumers. The semantic web will rely on information exchange enabled by tagging web pages with metadata that is understandable to machines, but unobtrusive to humans. Machines require consistency, too, for interpreting data for which many different formats would otherwise be possible (such as names, addresses or dates).
For example, contemporary search engines can’t easily answer a request such as “Give me a list of dentists in a particular geographical area with a list of comments from their patients.” This is not because the question is particularly complex, but because of the myriad ways medical practitioner names, specialties, locations, and patient feedback might be represented and tagged on various websites. Patients’ comments about doctors, for instance, are more likely to be found in blogs and other free-text postings (human-readable) than in standard survey forms with discrete, controlled sets of questions and answers (machine-processable). These distributed information tokens will need to be tagged with metadata understandable to machines before the data can be aggregated. But consistently tagging the entire web is a daunting task! A lightweight solution has emerged from the realm of social networking: microformats.
Microformats are small pieces of information that can be injected into web pages using available standards. They are machine-consumable, and expose common information such as events, locations, and points of contact with minimal disruption to humans. Information tagged in this way can then be harvested and employed in “mashups” (content-combining applications built from multiple sources) to answer complex questions such as the example above, or simply to aggregate information in ways unanticipated by the original content provider.
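As a rough illustration of how such tagging might be harvested, the sketch below marks up a contact with hCard-style class names (`vcard`, `fn`, `role`, `locality`, `region`) and extracts the fields using only the Python standard library. The sample markup, the practitioner, the field selection in `WANTED`, and the helper names `HCardParser` and `extract_hcards` are all hypothetical and chosen for illustration; a production harvester would more likely rely on a dedicated microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: ordinary HTML whose class attributes
# carry hCard-style property names. Humans see only the text.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Dr. Jane Doe</span>,
  <span class="role">Dentist</span>
  <div class="adr">
    <span class="locality">Norfolk</span>,
    <span class="region">VA</span>
  </div>
</div>
"""

# Illustrative subset of properties to harvest (an assumption, not a full list).
WANTED = {"fn", "role", "locality", "region"}
# Elements with no closing tag; skipped so they do not disturb depth tracking.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collects WANTED properties from each element tagged class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per vcard found
        self._depth = 0       # nesting depth inside the current vcard (0 = outside)
        self._field = None    # property whose text is currently being read
        self._current = None  # dict being filled for the current vcard

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
            hits = WANTED.intersection(classes)
            if hits:
                self._field = sorted(hits)[0]
        elif "vcard" in classes:
            self._depth = 1
            self._current = {}

    def handle_endtag(self, tag):
        if tag in VOID or not self._depth:
            return
        self._field = None
        self._depth -= 1
        if self._depth == 0:  # the vcard element itself just closed
            self.cards.append(self._current)
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current[self._field] = self._current.get(self._field, "") + text


def extract_hcards(html):
    """Return a list of dicts, one per vcard found in the given HTML string."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    for card in extract_hcards(SAMPLE_HTML):
        print(card)
```

Because the metadata lives in class attributes, the page renders the same for human readers, while a mashup can collect records like these from many sites and filter them by specialty and locality. The sketch assumes well-nested markup and does not handle nested vcards.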
In this presentation, we will share some ways we have been leveraging microformats with mashups to satisfy military information exchange requirements. These are important and timely solutions, since the U.S. Department of Defense (DoD) is attempting to deploy search and sharing capabilities for 21st-century warfighters similar to those they enjoy in their homes on the public internet. At the same time, DoD and others are hoping to fulfill the long-anticipated promise of software and services that can be sewn together and tapped on demand (a.k.a. the “wild, wild web”).
Foundational to our work is a body of legacy military information exchange standards that have been on a migration path to web-based technologies since the 1990s. We will show how the insertion of microformats and mashup technology can help to “pave even more cowpaths” for our stakeholders, and cost-effectively move DoD closer to an information sharing infrastructure envisioned as a semantic web for warfighters.
Our work in progress is highly significant because it draws on software and technologies that are freely available and widely used in the public domain, yet it targets an environment that has historically featured information sharing paradigms very different from those of the internet. Not only does the warfighting environment have more rigorous security constraints than the public web, but many of its clients have not yet completely embraced the “need to share” philosophy and practices that are fundamental to the bottom-up success of the internet. We recommend some ways ahead despite these fundamental technical and cultural hurdles.
Finally, the paper proposes some ways our work can be generalized to other domains. The challenges inherent in bootstrapping legacy information and process migration through web-based approaches are not unique to the world of military information exchange, so our findings and recommendations are relevant to the XML community at large.
Dr. Mary Ann Malloy is a Lead Information Systems Engineer for The MITRE Corporation. She is an internationally recognized expert in tactical messaging standards and information interoperability. Her research interests include XML technologies, business process/rules management, and visualization. Dr. Malloy supports net-centric interoperability initiatives for various Department of Defense stakeholder communities, including the Defense Information Systems Agency and U.S. Joint Forces Command. She leads MITRE efforts to engage in mentoring and collaborative research with local industry and academia in the Hampton Roads region. She has published nearly 40 technical papers, most recently “Generational Challenges to the Netcentric Future” and “Wicked Project Management.”