5. Wide-Area Information Servers
Users on different platforms can access personal, company, and published information from one interface. The information can be anything: text, pictures, voice, or formatted documents. Since a single computer-to-computer protocol is used, information can be stored anywhere on different types of machines. Anyone can use this system since it uses natural language questions to find relevant documents. Relevant documents can be fed back to a server to refine the search. This avoids complicated query languages and vendor-specific systems. Successful searches can be automatically run to alert the user when new information becomes available.
The servers take a user's question and do their best to find relevant documents. At this point the servers do not "understand" the user's English-language question; rather, they try to find documents that contain those words and phrases and rank them based on heuristics. The user interfaces (clients) talk to the servers using an extension to a standard protocol, Z39.50. Using a public standard allows vendors to compete with each other while bypassing the usual proprietary-protocol period that slows development. Thinking Machines is giving away an implementation of this standard to help vendors develop clients and servers.
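The word-matching and relevance-feedback behaviour described above can be sketched roughly as follows. This is only an illustrative sketch, not the WAIS implementation: the function names (tokenize, rank_documents, refine_with_feedback) and the scoring heuristic (counting occurrences of question words) are assumptions made for the example, and real servers use more elaborate heuristics.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(question, documents):
    """Return (doc_id, score) pairs, best first.

    Assumed heuristic: a document's score is the number of times any
    word of the question occurs in it. Documents scoring zero are dropped.
    """
    query_words = set(tokenize(question))
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in query_words)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def refine_with_feedback(question, relevant_text):
    """Relevance feedback (assumed form): add a relevant document's words to the question."""
    return question + " " + relevant_text
```

Feeding a relevant document back simply enlarges the set of words to match, so documents that share vocabulary with it move up the ranking. Because no stop-word list is used here, common words in the question also contribute to the score.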
There is a mailing list that has weekly postings on progress and new releases: wais-interest@think.com, and another for general WAIS matters: wais-discussion@think.com
The World-Wide Web (WWW) project is based on the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups. Originally aimed at the High Energy Physics community, it has spread to other areas and attracted much interest in user support, resource discovery and collaborative work areas.
Reader View
The WWW world consists of documents, and links. Indexes are special documents which, rather than being read, may be searched. The result of such a search is another ("virtual") document containing links to the documents found. A simple protocol ("HTTP") is used to allow a browser program to request a keyword search by a remote information server.
The web contains documents in many formats. Those documents which are hypertext (real or virtual) contain links to other documents, or to places within documents. All documents, whether real, virtual or indexes, look similar to the reader and are contained within the same addressing scheme.
To follow a link, a reader clicks with a mouse (or types in a number if he or she has no mouse). To search an index, a reader gives keywords (or other search criteria). These are the only operations necessary to access the entire world of data.
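The two reader operations, searching an index and following a link from the resulting virtual document, can be sketched as below. This is a sketch under stated assumptions: the in-memory index (a mapping from keyword to a set of document addresses), the function names, and the TITLE and A HREF tags are all illustrative choices, not the markup or server logic of any particular implementation. Each link is numbered so that a reader without a mouse could select it by typing the number.

```python
from html import escape


def search_index(index, keywords):
    """Return the sorted addresses that appear under every given keyword.

    index is a hypothetical mapping: keyword -> set of document addresses.
    """
    matches = None
    for word in keywords:
        addresses = index.get(word.lower(), set())
        matches = set(addresses) if matches is None else matches & addresses
    return sorted(matches or [])


def virtual_document(keywords, addresses):
    """Build a hypertext result page whose numbered links point at the hits."""
    lines = ["<TITLE>Search results for %s</TITLE>" % escape(" ".join(keywords))]
    for number, address in enumerate(addresses, start=1):
        lines.append(
            '<A HREF="%s">%s</A> [%d]'
            % (escape(address, quote=True), escape(address), number)
        )
    return "\n".join(lines)
```

The result of a search is itself just another document, so the browser needs no special machinery to display it: it renders the links and lets the reader follow them like any others.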
Information Provider View
The WWW browsers can access many existing data systems via existing protocols (FTP, NNTP) or via HTTP and a gateway. In this way, the critical mass of data is quickly exceeded, and increasing use of the system by readers and by information suppliers reinforce each other.
Making a web is as simple as writing a few SGML files which point to your existing data. Making it public involves running the FTP or HTTP daemon, and making at least one link into your web from another. In fact, any file available by anonymous FTP can be immediately linked into a web. The very small start-up effort is designed to allow small contributions.
At the other end of the scale, large information providers may provide an HTTP server with full text or keyword indexing. This may allow access to a large existing database without changing the way that database is managed. Such gateways have already been made into Digital's VMS/Help, Technical University of Graz's "Hyper-G", and Thinking Machines' WAIS systems.
The WWW model overcomes the frustrating incompatibilities of data format between suppliers and readers by allowing negotiation of format between a smart browser and a smart server. This should provide a basis for extension into multimedia, and allow those who share application standards to make full use of them across the web.
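One simple way to picture format negotiation is for the browser to list the formats it can handle in order of preference and for the server to pick the first one it can supply. The sketch below assumes exactly that scheme; the function name negotiate_format and the use of MIME-style format names are illustrative assumptions, and the actual protocol may negotiate differently.

```python
def negotiate_format(browser_accepts, server_offers):
    """Pick the format the browser prefers most among those the server can supply.

    browser_accepts: list of format names, most preferred first.
    server_offers: collection of format names the server can produce.
    Returns the chosen format name, or None if there is no common format.
    """
    available = set(server_offers)
    for fmt in browser_accepts:
        if fmt in available:
            return fmt
    return None
```

For example, a browser that prefers PostScript but also accepts hypertext and plain text, talking to a server that offers only hypertext and plain text, would receive hypertext under this scheme. A simple browser and an elaborate one can therefore use the same server without either needing to know the other's capabilities in advance.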
The protocol is currently being implemented to add multimedia facilities. Existing standards are used wherever possible, notably in the use of SGML for hypertext format, MIME registration for multimedia representations, and internet-style telnet basis for the search/retrieve protocol.
"World-Wide Web: An Information Infrastructure for High-Energy Physics", T. J. Berners-Lee et. al., CERN, Presented at "Artificial Intelligence and Software Engineering for High Energy Physics" in La Londe, France, January 1992. Proceedings to be published by World Scientific, Singapore, ed. D Perret-Gallix.
The above papers and other information may be available by FTP at the same place, or via the Web itself.