Where World Wide Web Went Wrong
Andrew Pam, Xanadu Australia, P.O. Box 477 Blackburn VIC 3130
Email: avatar@aus.xanadu.com
URL: http://www.glasswings.com.au/attendants.html
Abstract
We are all aware by now of the phenomenal success of the WWW. In this
paper I would instead like to examine some of the limitations inherent in
the current WWW design and implementations and some possible solutions.
- Lack of transparent support for mirroring
- Lack of an underlying distributed file system
- Lack of bivisibility and bifollowability
- Lack of versioning and alternates
- Limited support for metadata
- Limited support for Computer Mediated Communication
- Cyberspace/"Hyperspace" as a pervasive user interface metaphor
- Limited support for transclusions
- Transcopyright - the Xanadu solution for business on the Net
- New financial instruments for the new media
Keywords: Docuverse, Hyper-G, Hypermedia, Information Systems, Transclusion, Web, WWW, Xanadu
Introduction
Everyone's opinions are a product of their experiences, so I would like
to start by explaining a little bit about my background in the field of
hypermedia. The words "hypertext" and "hypermedia" were coined by my
friend Ted Nelson in a paper to the ACM 20th national conference in
1965, before I was even born! Although I had come across occasional
articles Ted had written for Creative Computing magazine, my
first exposure to his legendary Xanadu project did not occur until 1987
when I purchased the Microsoft Press second edition of his classic book
Computer Lib / Dream Machines [Nel87], which outlined his idea
of a "docuverse" or universal library of multimedia documents.
As an avid science fiction reader, my imagination had already been
captured by this idea of a universally accessible computer storage and
retrieval system as presented in the 1975 novel Imperial Earth
by Arthur C. Clarke [Cla75]. But here was someone actually involved in
trying to create such a system. I immediately sent off a US$100
donation to Project Xanadu to reserve a Xanadu account name, and also
purchased the 1988 edition of Ted's self-published book Literary
Machines [Nel88] and the Technical Overview video
describing the Xanadu project in detail.
At this time I was already heavily involved in online communications,
both professionally (designing and implementing a communications
protocol for caravan park bookings across Australia, and later
integrating it with the CICS mainframe-based booking system at the RACV)
and personally, running a computer bulletin board system which
eventually grew to a network of bulletin boards spanning New South
Wales, Victoria and South Australia.
In 1989 I met Katherine Phelps, my partner, who incidentally is also
giving a paper at this conference. She is a writer and publisher and I
suppose it was inevitable that we should combine our interests and go
into online publishing. After publishing traditional paper books, and
failing to convince banks of the prospects of a magazine on floppy
disks, we decided to try publishing on the Internet instead.
Although we were on the Xanadu beta team, the software was in the
process of a complete rewrite commenced in 1988 and was not at a usable
stage. Our intention was always to upgrade to Xanalogical
(Xanadu-capable) software, but we decided to start with Gopher and World
Wide Web (which, with the advent of Mosaic, had just started to become
popular). Tim Berners-Lee had been aware of the Xanadu ideas when
designing WWW, and he incorporated Ted's basic 1965 concept of
hyperlinks, though not the later refinements. Autodesk were funding
the Xanadu research project during this period but in 1993 I heard the
news that they had dropped all of their research projects not directly
connected with their core business of Computer Aided Design. I
immediately contacted Ted and made arrangements to visit him in San
Francisco.
As a consequence of those meetings Katherine and I officially became
Xanadu Australia, the first licensees of the Xanadu technology. We
organised Ted's speaking tour of Melbourne and Sydney in early 1994
and then began organising the necessary support and facilities to set
up our own research and commercial online publishing ventures.
In addition to my original computer programming and consultancy
business under the name of Serious Cybernetics, I am presently also a
partner in Glass Wings and in Xanadu Australia and a system
administrator for CinEmedia, a project of the State Film Centre of
Victoria which houses the new SFCV / RMIT Annexe where Katherine is
presently undertaking her PhD in Animation and Interactive Multimedia.
I have spent the last few years examining as many Internet resource
discovery, information delivery and computer-mediated communication
applications as possible on as many platforms as possible, with
particular interest in Hyper-G, a hypermedia system from the Graz
University of Technology which is directly inspired by Ted's vision as
expressed in Literary Machines.
So to the subject matter of this paper. I have had the opportunity to
examine and compare much of the software presently used on the Internet,
I have a good background in hypermedia designs and concepts, and I felt
that with the prevalence of hype about the World Wide Web it would be
valuable to take a critical stance and look at what I believe to be its
major deficiencies, together with various initiatives and possible
solutions to overcome them.
Scalability
This is one of the most fundamental design issues for any network
information retrieval (NIR) system, since thanks to the phenomenal
growth of the Internet there are already millions of people using such
software and millions more coming online over the next few years. Ted's
goal for Xanadu from the outset was to support hundreds of millions of
users all over the world (and possibly in orbit) by the year 2020, a
goal which he called the "2020 vision".
This took up much of the design team's time in the 1960s and 1970s, when
it was by no means clear how this could be accommodated. Today we have,
if not the answers, some pretty good working models. Unfortunately, WWW
isn't really one of them; it suffers from a crucial flaw, in that any
given piece of information (document) is served from a single location.
This is the cause of many problems. A single point of failure results in
many documents being periodically (and sometimes permanently) inaccessible
due to network or system problems. This is probably one of the most
frustrating features of the WWW, since it makes the system as a whole
unreliable. One of the key requirements of any NIR tool is reliable
access to documents on request, since users expect to be able to obtain
information whenever they require it; dependable retrieval also obviates
the need to store local copies as insurance against future failures.
Bandwidth issues
Apart from the reliability issues, another major problem is the requirement
for sufficient bandwidth to deliver information to everyone who requests it.
In the centralised WWW design, this is one of the most difficult parts of
operating a popular site, particularly in countries where the network
infrastructure to support large numbers of simultaneous requests from all
over the world is prohibitively expensive or simply not available.
Even in cases where fairly substantial bandwidth has been provisioned
for a WWW server, however, there can still be occasions on which there
is a sudden peak in demand. These are called "flash crowds" after a
1973 Larry Niven science fiction story describing how the advent of
teleportation would cause sudden influxes of thousands of people to
locations where anything of interest seemed to be occurring, often
leading to unexpected riots. This issue was raised by Jeff Duntemann in
his "END." column in the June/July issue of PC Techniques
[Dun95], although he had no answer to the problem, and also discussed by
Frank Kappe in a recent paper about Hyper-G [Kap95].
Caching and mirroring
Some WWW servers have attempted to address these problems by providing a
proxy caching facility, where frequently requested documents only have
to be retrieved once by any given site. Unfortunately, this still
requires users to actively choose to use the proxy caching service,
which many do not; it suffers from cache consistency problems,
especially with frequently updated documents; and it does not solve the
flash crowding problem because the onset of the phenomenon will still
cause thousands of proxy cache servers to request the same documents
from the original site of publication. Additionally, this obscures the
usage statistics which are of special importance to commercial sites.
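To make the consistency problem concrete, the following sketch (in
Python, purely as illustration; the function and cache layout are my own
invention, not those of any particular proxy) shows the conditional
retrieval by which a cache can revalidate a stored document against the
original site:

    import http.client
    from email.utils import formatdate

    def revalidate(host, path, cached_body, cached_time):
        # Conditional GET: ask the origin server to send the body only
        # if the document has changed since the time we cached it.
        conn = http.client.HTTPConnection(host)
        conn.request("GET", path, headers={
            "If-Modified-Since": formatdate(cached_time, usegmt=True)})
        resp = conn.getresponse()
        body = resp.read()
        if resp.status == 304:   # Not Modified: the cached copy is current
            return cached_body
        return body              # otherwise replace the cached copy

Even with such revalidation, every cache must still consult the original
server, so the flash crowd problem and the loss of usage statistics
remain.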
A better approach would be to automatically make copies or "mirrors"
of popular documents on other servers. Unfortunately there is currently
no mechanism for indicating that such mirrors exist or where they might
be located short of manually placing hyperlinks on a page to list them!
This is primarily caused by a confusion between resource identifiers and
locators. The Xanadu designs have always started from the premise that
every document in the system needs to have its own unique identifier,
regardless of the location where it is stored. The Hyper-G system uses
globally unique object identifiers for the same reasons. However,
documents in the WWW are identified by a Uniform Resource Locator (URL),
which describes only the document's location. Since a URL does not
uniquely identify the document, it provides no mechanism to determine
where other copies of the same document might be located.
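The following sketch illustrates the distinction. It assumes a
hypothetical registry mapping each globally unique identifier to every
known copy; all the identifiers and URLs below are invented, and no such
registry exists in the current WWW, which is precisely the deficiency at
issue:

    # Resolving names to locations is the missing service.
    MIRROR_REGISTRY = {
        "urn:doc:example-1995-001": [
            "http://www.example.com.au/docs/paper.html",
            "http://mirror.example.edu/docs/paper.html",
        ],
    }

    def locate(urn):
        # Resolve a unique identifier to candidate locations; a client
        # may then try each in turn, or simply race them.
        urls = MIRROR_REGISTRY.get(urn)
        if not urls:
            raise KeyError("unknown document: " + urn)
        return urls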
An Internet Engineering Task Force (IETF) Working Group was formed to
address proposals for Uniform Resource Identifiers (URIs) and Uniform
Resource Names (URNs) and how they might be resolved to a list of URLs
for the identified document. However, the group was unable to reach
consensus and was closed, with new working groups to be formed to
address specific issues. I believe that in any case, while retrofitting
global identifiers and resolution systems onto the existing locator-based
WWW is a worthy cause, it will require efforts of considerable magnitude.
It will probably be far simpler to move towards what the Hyper-G team
call "second-generation" NIR tools such as Hyper-G itself, which are
fundamentally based on unique identifiers yet remain fully backwards
compatible with existing WWW clients.
I am not sure if I can whole-heartedly adopt the term
"second-generation", since Xanadu was the forerunner of the present
"first-generation" WWW and already possessed the attributes ascribed to
second-generation systems. However, since the Xanadu system has never
been widely deployed for public use it can perhaps be considered on its
own merits as a separate strand of NIR research.
My own suggestion for a quick work-around to provide transparent
mirroring capabilities to the WWW is to add a simple enhancement to the
Domain Name System (DNS) name resolution procedure used by WWW clients.
I propose that clients should check for multiple address records ("A
records") and record which addresses respond fastest. This simple
change would allow server administrators to designate official mirror
sites by simply adding additional A records to the DNS entry for their
server. This technique is already in use by Mail Transfer Agents (MTAs)
such as sendmail used to transfer Internet email and could surely be
adopted for use with WWW.
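A minimal sketch of the client side of this proposal, written in Python
for illustration (the helper name is mine), times a connection to every
address record and remembers the quickest:

    import socket
    import time

    def fastest_address(hostname, port=80):
        # getaddrinfo returns one entry per address record in the DNS,
        # so an administrator adds mirrors simply by adding A records.
        addresses = {info[4][0] for info in socket.getaddrinfo(
            hostname, port, socket.AF_INET, socket.SOCK_STREAM)}
        timings = {}
        for address in addresses:
            started = time.monotonic()
            try:
                with socket.create_connection((address, port), timeout=5):
                    timings[address] = time.monotonic() - started
            except OSError:
                pass   # an unreachable mirror is simply skipped
        if not timings:
            raise OSError("no address for %s responded" % hostname)
        return min(timings, key=timings.get)

A client could cache these timings so that subsequent requests go
straight to the best-performing mirror.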
Distributed file systems
The best solution to these problems is to move to a distributed file
system (DFS). Apart from the original Xanadu work on a log-based DFS,
this has recently become an area of great research interest. In a 1994
paper "Xanalogy: The State of the Art" [Pam94] I list Prospero, AFS,
Mungi, Sprite, Plan 9, DASH, GAFFES and DFS925.
While it is possible to use a DFS with the WWW by means of a "file:"
URL, this requires that clients have access to the same DFS, which is
unfortunately rarely the case except within individual organisations.
The problem arises because the DFS is accessed directly by the WWW
client software; it can be solved by implementing the DFS within the
server instead, the approach taken by Hyper-G.
Symmetry
Another significant problem with the current WWW implementation is the
design of the hyperlinks. They are embedded within the documents
themselves and are unidirectional and univisible. That is, they can
only be followed in one direction and can only be seen from the
originating end. This makes link maintenance a nightmare, compounded
by the lack of unique document identifiers which significantly increases
the frequency with which destination documents change their URL.
Bivisible and bifollowable links have been part of the Xanadu design
for a long time, but HyperTed (created by Dr. Adrian Vanzyl of Monash
University Medical Informatics) and Hyper-G are the first products I
have seen to implement them elsewhere. Naturally they cannot be stored
within the document itself (difficult in any case for other media types
such as sound, graphics and video) because any given document could
easily become the target for any number of links which might entirely
outweigh and obscure the document's actual contents! They must
therefore fall into the realm of externally stored metadata, which is
exactly how the Xanadu and Hyper-G systems treat them. Hyper-G actually
creates embedded links from this metadata on the fly when an HTML
document is retrieved.
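A sketch of the idea, with a deliberately simplified schema of my own
(the real Xanadu and Hyper-G structures are far richer): because each
link is an external record naming both endpoints, it can be seen and
followed from either end.

    # Links as first-class external records rather than embedded markup.
    links = [
        {"from": "docA", "to": "docB", "type": "reference"},
        {"from": "docC", "to": "docB", "type": "annotation"},
    ]

    def links_from(doc):
        return [l for l in links if l["from"] == doc]

    def links_to(doc):
        # The reverse query is impossible with embedded one-way links:
        # nothing in docB itself records that docA and docC point to it.
        return [l for l in links if l["to"] == doc]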
Versions and Alternates
Many documents evolve over time, either through revision or occasionally
branching into alternative versions of the same document. An extremely
useful facility barely supported by current software (and here I include
stand-alone desktop applications as well as NIR tools) would be to
provide a mechanism for maintaining multiple versions of the same
document, preferably without duplicating the storage required for
unaltered material. Version control systems do exist, but largely for
use by computer programmers as source code management tools rather than
as part of a general-purpose filesystem. This is one part of the Xanadu
vision yet to come to fruition.
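One simple way such a facility might avoid duplicating unaltered
material, sketched here in Python with invented names (the actual Xanadu
designs are considerably more refined), is to make every version a list
of content-addressed chunks, so that revisions and alternates share
whatever material they have in common:

    import hashlib

    store = {}   # chunk digest -> chunk text, stored once however reused

    def save_version(chunks):
        ids = []
        for chunk in chunks:
            key = hashlib.sha1(chunk.encode()).hexdigest()
            store.setdefault(key, chunk)
            ids.append(key)
        return ids   # a version is just this list of references

    def load_version(ids):
        return "".join(store[key] for key in ids)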
Historical context
A related problem is that not only are earlier versions of documents
usually superseded by revised versions, thus making the original version
inaccessible, but often documents are removed from circulation entirely,
perhaps because a WWW server has ceased operating or simply because
space is no longer available for those documents (a problem common to
periodicals). This makes it impossible to access them for future
reference and works against the hyperlinking facility of NIR tools.
This issue of permanent archival of electronic documents is very
important and is being addressed by many library and archival
organisations. It is also important that information should be
published using data formats which are open standards and easily
amenable to format conversion in future as standards change. This is
one of the motivations of SGML and one of the benefits of SGML-based
document formats such as HTML (native to WWW) and HTF (native to Hyper-G).
Document inter-comparison
Ted believes that the ability to compare documents for their
similarities and differences is one of the most important tools that
computers can offer us. Unfortunately, many of his early designs such
as Qframes (where adjacent window borders indicated correspondences) and
lines drawn between screen windows are yet to be widely implemented.
This is probably largely due to the dominance of the Xerox PARC
windowing model and the prevalence of "user interface police" requiring
that window boundaries are sacrosanct and inviolable. However, there is
apparently an OS/2 program called PMDIFF that implements the latter
comparison facility. I am not currently aware of any NIR tools that
provide this sort of function.
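The raw machinery for finding correspondences is nevertheless readily
available; the sketch below uses Python's standard difflib (as an
illustration only, not any existing NIR tool) to compute the matching
spans that a browser could then render as lines or highlighted
correspondences between two windows:

    import difflib

    def correspondences(text_a, text_b):
        # Return (position in A, position in B, length) for each block
        # of lines the two documents have in common.
        a, b = text_a.splitlines(), text_b.splitlines()
        matcher = difflib.SequenceMatcher(None, a, b)
        return [(m.a, m.b, m.size)
                for m in matcher.get_matching_blocks() if m.size]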
Metadata
Another issue of concern to librarians is the storage and interpretation
of document metadata: information about the document itself, such as
its authorship, copyright status, date of publication and so forth. The
IETF URI-WG proposed the creation of Uniform Resource Characteristics
(URCs) to accommodate this information. Metadata can also address the
social and political problem of censorship currently prominent in public
discussion of the Internet. Systems such as SurfWatch are ineffective
because they discriminate for or against documents on the basis of
keywords in the URL or title, which may not accurately reflect the
content of the document (for example, "The Sex Life of Plants" or
"Physical Education for Girls"). Furthermore, the selection process is
performed by the company rather than by each individual viewer according
to their own views and preferences.
The Interpedia project, which aims to create a new encyclopaedia
freely available over the Internet, has proposed a Seal of Approval
(SOAP) concept which would permit any person or organisation to annotate
documents to indicate their approval or disapproval of the material.
Each document could bear any number of SOAPs, allowing a broad range of
opinions about each document to be expressed (for example, by different
religious bodies or national censorship boards). SOAPs could be
implemented using public annotations.
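A sketch of how SOAPs might work in practice (all names and verdicts
here are invented): the seals live apart from the documents they
describe, and each viewer decides which issuers to heed rather than
delegating that choice to a single company.

    soaps = [
        {"document": "docX", "issuer": "Review Board A",
         "verdict": "approved"},
        {"document": "docX", "issuer": "Review Board B",
         "verdict": "objectionable"},
    ]

    def acceptable(document, trusted_issuers):
        # Filter according to the viewer's own choice of issuers.
        verdicts = [s["verdict"] for s in soaps
                    if s["document"] == document
                    and s["issuer"] in trusted_issuers]
        return "objectionable" not in verdicts

    # A viewer who trusts only "Review Board A" still sees docX:
    print(acceptable("docX", {"Review Board A"}))   # True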
Live interaction
The services presently
available on the Internet can be classified into two major categories:
NIR tools and real-time communication tools. While the WWW was designed
as a NIR tool, there have been several initiatives to support real-time
communication facilities. WebChat provides a real-time multi-user
communication facility using the WWW. Sensemedia, provisionally
licensed as Xanadu America, have created a MOO (a multi-user textual
virtual environment) which also acts as a WWW server. They call this
system the "WOO". Waxweb is another similar system which additionally
incorporates Virtual Reality Markup Language (VRML) into the MOO server.
Ubique's Sesame and Hyper-G also provide mechanisms for users to
communicate with each other while browsing the web.
Spatial dimensions
Speaking of Virtual Reality, one idea especially popular since William
Gibson [Gib84] gave us the term "cyberspace" is to change the way we
interact with information on our computer screens from a flat
two-dimensional "desktop metaphor" to a three-dimensional world. There
has been some work on representing WWW documents and hyperlinks in 3D,
but this task is made considerably easier by the Hyper-G architecture of
external link metadata, as demonstrated by the Hyper-G "Information
Landscape" [AKM95].
Transclusions
"Transclusion" is a term introduced by Ted Nelson to define virtual
inclusion, the process of including something by reference rather than
by copying. This is fundamental to the Xanadu designs; originally
transclusions were implemented using hyperlinks, but it was later
discovered that in fact hyperlinks could be implemented using
transclusions! Transclusions permit storage efficiency for multiple
reasonably similar documents, such as those generated by versions and
alternates as discussed above.
WWW currently permits images to be transcluded using the
<IMG> tag, but strangely does not support any other media types.
Some support for text transclusion has been added in the form of a
"server side include" facility in some WWW servers, but this is a
work-around with limited use.
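The essence of transclusion can be sketched in a few lines (the
reference format below is invented for illustration): the quoting
document stores only a pointer to the source and a span within it, and
the material is fetched from the original at viewing time rather than
copied.

    documents = {"origin": "Hypertext means non-sequential writing."}

    def render(parts):
        out = []
        for part in parts:
            if isinstance(part, str):
                out.append(part)         # ordinary local text
            else:
                doc, start, end = part   # a transclusion reference
                out.append(documents[doc][start:end])
        return "".join(out)

    # The quotation below is never copied into the quoting document:
    quoting = ['As Ted puts it, "', ("origin", 0, 38), '".']
    print(render(quoting))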
Transclusions also highlight some of the intriguing new legal issues
raised by hypermedia technology. If someone takes a copy of an image
and places it on their WWW server without permission, this is clearly a
breach of copyright. However, if they merely transclude the image, it
is still being retrieved directly from the original site but is now
being displayed in a completely new context, which probably does not
breach copyright law but may raise "droit moral" (moral rights) issues.
This is another reason why links (here including transclusions) need
to be bivisible and bifollowable, not only for maintenance reasons as
discussed above but also to permit creators to monitor the context in
which their material appears and the uses to which it is being put.
Transcopyright
This leads me directly to Transcopyright, the Xanadu solution for
business on the Net. Ted Nelson has proposed a new copyright doctrine
called "Transcopyright" [Nel95] in the same way that Bob Wallace created
"Shareware". Fundamentally, the proposal is that copyright holders
choosing to publish on a hypermedia system supporting bivisible and
bifollowable transclusions must, under the transcopyright doctrine,
explicitly grant permission for anyone to transclude and thus reuse their
material in any way and in any context so long as it is purchased or
obtained, as directed by the rightsholder, by each recipient.
Naturally, using the material in any other medium falls outside the
terms of the doctrine and is subject to separate agreement.
If a mechanism is in place to permit the system to charge for
documents, this would permit copyright holders to be assured of their
requested royalties on every use of their information, whether direct or
by transclusion. Partial use of documents could be paid pro-rata.
Considering the popularity of clip-art, musical "sampling" and collage
art, this could rapidly become the ideal market for information of all
kinds, especially entertainment content.
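As a toy example of such pro-rata payment (the rate and units are
invented): if a rightsholder asks one cent per kilobyte, a reader who
transcludes 300 bytes owes the corresponding fraction of a cent, exactly
the kind of sum considered under "Money" below.

    def royalty(bytes_used, rate_cents_per_kb):
        # Pro-rata charge for partial use of a document.
        return bytes_used / 1024.0 * rate_cents_per_kb

    print(round(royalty(300, 1.0), 3))   # about 0.293 of a cent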
Money
The major remaining issue to be resolved before this can become a
reality is the difficulty of currency conversion when transactions are
carried out in a global market, especially transactions for very small
sums (possibly as little as fractions of a cent!). There are already a
number of systems for exchanging money on the Net, principally Digicash
and systems based on traditional credit cards. However, Digicash
emulates real cash so closely that it suffers from all the same
drawbacks (having to have the right change in your electronic wallet!)
and the credit card systems are no use to people who don't have a credit
card. None of these systems are well suited to very small transactions.
Katherine Phelps discusses our thoughts on possible solutions
in her paper "You Think This Is a Revolution -- You Ain't Seen Nothing
Yet." [Phe95]
Conclusion
Despite the various problems and limitations of WWW outlined in this
paper, it has clearly been of tremendous benefit to the way information
is stored and transmitted in our society. I look forward to
participating in the further evolution of these tools as they continue
to change the way we entertain ourselves and do business.
Further information
- Xanadu
- http://www.aus.xanadu.com/xanadu/ or http://www.xanadu.net/xanadu/
- Hyper-G
- http://hyperg.iicm.tu-graz.ac.at/ or http://hmu1.cs.auckland.ac.nz/
- The IETF URI-WG
- http://www.ics.uci.edu/pub/ietf/uri/
- The Interpedia project
- gopher://twinbrook.cis.uab.edu/1interped.70
- WebChat
- http://www.irsociety.com/webchat.html
- Sensemedia
- http://www.sensemedia.net/papers/
- Waxweb
- http://bug.village.virginia.edu/
- VRML
- http://vrml.wired.com/
- Sesame
- http://www.ubique.com/
- Digicash
- http://www.digicash.com/
References
- [AKM95]
- Keith Andrews, Frank Kappe, and Hermann Maurer, "The Hyper-G Network
Information System", J.UCS, vol. 1, no. 4, 28 April 1995 (also available at
ftp://iicm.tu-graz.ac.at/pub/Hyper-G/papers/dms94.ps)
- [Cla75]
- Arthur C. Clarke, "Imperial Earth", London: Gollancz, 1975
- [Dun95]
- Jeff Duntemann, "Corri, the Comet, and the Child-Proof Cap", PC Techniques,
June/July 1995, p. 112
- [Gib84]
- William Gibson, "Neuromancer", New York: Ace, 1984
- [Kap95]
- Frank Kappe, "A Scalable Architecture for Maintaining Referential Integrity
in Distributed Information Systems", J.UCS, vol. 1, no. 2, 28 February 1995
(also available at ftp://iicm.tu-graz.ac.at/pub/Hyper-G/papers/p-flood.ps)
- [Nel87]
- Theodor Holm Nelson, "Computer Lib / Dream Machines", Redmond: Microsoft
Press, 1987
- [Nel88]
- Theodor Holm Nelson, "Literary Machines 88.1", self-published, 1988
- [Nel95]
- Theodor Holm Nelson, "Transcopyright: Pre-Permission for Virtual
Republishing", forthcoming in Communications of the ACM
- [Pam94]
- Andrew Pam, "Xanalogy: The State of the Art", privately circulated, 1994
- [Phe95]
- Katherine Phelps, "You Think This Is a Revolution -- You Ain't Seen Nothing
Yet", forthcoming in proceedings of the 1995 Asia-Pacific WWW Conference