"David G. Durand" wrote:CurrentPacker and a number of other
undefined globals were Fluid
variables. Fluids (in case you're not an old LISPer) are dynamically
> scoped globals. I'm not sure why a special mechanism like this was
needed in Smalltalk, where globals _are_ all fluid, but I have a
pretty good idea for two reasosn:
As I remember they are needed on the C++ side, remember the
{declaration} stuff
is ignored by smalltalk.
1. They are part of the garbage collector's start-set for a
collection sweep.
2. And of course, these variables were required to be persistent,
missing in the changesets we have. One thing that I would love to see
is a complete FileOut of the whole Xanadu Image, as I could then use
Smalltalk-based tools to select out more of those methods, and I
could also examine the way the globals were actually set up.
>
Coming in the tarfile,if it ever gets out on this flakey line.
> At this point, my real issues around the Ent have to do with:
The details of the canopy (where things seemed to change several
times during the course of the project)
They SURE did, seemingly every week.
The details of balancing, and to what extent balancing the O- and
H-trees is independent.
The dynamic ballancing was a convienience, for the rabid implementation, and
it was convienently forgotten by the design/bozos that it broke disk reading
severely, by meaning that every read had to be rewritten, completely breaking
the model for write once media.
Whether it's possible to have a purely binary H-tree implementation.
How the generalizations to arbitrary addressing schemes worked (It
almost seems that they treated some of them as ordered dictionaries
based on Hash values for objects).
I also would be very happy to understand the overview of how the
Green implementation works. Jeff's work looks rather nice so far (for
C++). In particular, orgls seem to be different in Green and Gold.
And I don't understand the purpose of the Spanfilade -- It's a 2-d
enfilade, mapping points in some pair of coordinate spaces into some
other coordinate space(s).
the spanf map maps ffom spans to the docs they are part of ,or map to.
"David G. Durand" wrote:I'm not even worried about the splay versus
incremental balancing,
just whether one can independently rebalance H-trees when a new
O-tree is added, as long chains of versions can make for long, linear
H-trees, if I'm not mistaken.
Here I'm not sure, but I'd be surprised if things had to get linear,
why couldn't it be reballanced? Long chans of versions sould result in a tree
with the versions at the bottom of the tree.
> I also would be very happy to understand the overview of how the
Green implementation works. Jeff's work looks rather nice so far (for
C++). In particular, orgls seem to be different in Green and Gold.
Right since Gold is MANY generations of design from Green, there are few
direct identities. Mostly anything that would limit performance to
square root
is avoided in Gold (note that poom and span have square root worst case).
And I don't understand the purpose of the Spanfilade -- It's a 2-d
enfilade, mapping points in some pair of coordinate spaces into some
other coordinate space(s).
the spanf map maps ffom spans to the docs they are part of ,or map to.
I asked a whole bunch of questions (that I'm leaving in case I'm
confused), but is this the basic sketch of the Green Architecture?
Begin sketch:
The Granfilade stores all the text in the system under private
addresses. As an efficiency measure, text inserted into the
granfilade is inserted adjacent to the text that it's near. All text
in the granfilade belongs to an existing document.
Somewhere, perhaps in the granfilade (using a distinguished range of
addresses?) there's a global map that maps Xanadu server, account,
document, etc. addresses down to a single poomfilade per document.
The addresses map directly to the granfilade via the addresses.
The poomfilade is edited just like any other enfilade, but it doesn't
contain characters, but rather contains offsets and lengths of items
in the granfilade (stored as granfilade addresses).
The spanfilade maintains a global inverse map to all of the
poomfilades, maitaining an association between ranges in the
granfilade to specific document ranges in the poomfilades.
When new text is inserted, it's added at an appropriate spot in the
granfilade, and a new (offset,length) pair of granfilade tumblers is
added to the poomfilade. Simultaneously, the spanfilade is updated to
reflect that the new range in the granfilade maps to a specific range
in the relevant poomfilade.
When text is transcluded from one edition into another, a new entry
is made in the destination poomfilade to indicate that a certain set
of granfilade addresses is now included in that poomfilade, and a set
of new mappings is added to the spanfilade, reflecting that those
ranges of granfilade addresses also map to certain poomfilade
addresses in the destination poomfilade.
To backfollow from a document address, you start at its poom, and
harvest a set of granfilade ranges, then you pump them into the
spanfilade, and harvest a set of document ranges that point into
other poomfilades.
Links may just be special records stored in the poomfilades, since
the same backfollow approach works for them.
end of David's guess.
Right, note that there isn't currently enough info stored to remove a document
or edit a link, as the poom entries can't be found. This should be
simple to fix.
Some random thoughts on the above.
The above is actually very similar to the stuff that Steve DeRose and
I explained to you at Hypertext '89. In particular, I think we had
already independently derived the model-T enfilade, and
structure-sharing versioning strategy. We were sticking Markup nodes
in there, ala SGML, but that was more of a religious difference than
a data structure one.
Really, I'm surprised, I would have thought I would remember so close a
similarity. We consider the i<->v mapping as fundamental, did you have
anything like that? Is there any paper on that, so I can compare, it might
be useful to see what the differences were, it could give some insight.
The problem of fragmenting end-sets is a real one in all of these
schemes, as a linear number of edits can create a linear number of
items that need to be returned in these intermediate queries, which
can make the run-time worse than log in some cases.
Well yes but,
recall that we mostly will want to retireve small pieces of the document, some
small multiple of display sizes, and recall that much (often all) of
the structure
and text will just get cached in ram, the problem is an in CPU problem and
not a big "read it in form disk and take forever" kind of thing.
Here are the random queries I started before coming up with the design above:
Is this spanfilade map, just a just mapping from a (start, end) pair
of global xanadu addresses to a range of granfilade addresses, or
does it map to document addresses?
grandfilade addresses == i (iinvarient) addresses
Yes the spanf maps to the i addresses,
Does the granfilade use tumblers only to allow the creation of new,
linearly ordered insertion points between existing granfilade
addresses? couldn't the granfilade be a linear, append-only
text-space? I guess the granfilade does allow for changes to
non-frozen versions to vanish silently as their versions are edited.
More than that the tumblers allow locality and authorship info to be incoded.
Note that Gold hasn't yet decided on a particular address encoding!
[BTW if you're jut generating insertion points you can create a
simpler address space based on paths in an infinite binary tree. The
resulting numbers turn out to be isomorphic to the rational numbers.]
note that tumblers (Transfilite nUMBERS) are a LARGER set than
rationals. They
are isomorphic to the transfinite ordinals.
The POOMfilade seems more obscure to me. Does it map from Document
addresses (global Udanax addresses) into the granfilade, or the
spanfilade.
Poom maps i<->v
spanf maps i<->i
so given an i (or granf address) the poom gives a v (document address)
given an ispan the spanf gives the i addresses in the documents
(including links) that map
to that ispan.
Links seem like a natural for the spanfilade, but they seem like they
ought to be stored in document addresses, not granfilade addresses.
Ha, that would mean as the document changed the links would point to
different stuff.
For example showing links as caps for easy ascii representation
this is A text with TWO spans linked.
now if the addresses were to the v (document addresses) and we
edited by adding 6xes and a space
xxxxx thIs is a tEXT with two spans linked.
Make more sense now?
To: roger@xxxxxxxxxx
From: "David G. Durand" <david@xxxxxxxxxxxxxxxxxxx>
Subject: Re: I need a place to put ....
Cc:
Bcc:
X-Attachments:
"David G. Durand" wrote:I'm not even worried about the splay versus
incremental balancing,
just whether one can independently rebalance H-trees when a new
O-tree is added, as long chains of versions can make for long, linear
H-trees, if I'm not mistaken.
Here I'm not sure, but I'd be surprised if things had to get linear,
why couldn't it be reballanced? Long chans of versions sould
result in a tree
with the versions at the bottom of the tree.
Right. As long as you can rebalance in H. If I had 3 concentrated
hours to analyze this, I could probably write down the invariants,
and see if they can be maintained. I've been assuming all along that
I can rotate in H (only) or in O (only), and that that's enough.
Right since Gold is MANY generations of design from Green, there are few
direct identities. Mostly anything that would limit performance to
square root
is avoided in Gold (note that poom and span have square root worst case).
You can do span (range) queries in log N + S time (log of number of
spans, plus the size of the result set). Of course that's a
different algorithm.
Somewhere, perhaps in the granfilade (using a distinguished range of
addresses?) there's a global map that maps Xanadu server, account,
document, etc. addresses down to a single poomfilade per document.
The addresses map directly to the granfilade via the addresses.
So some granfilade addresses are externally visible ones (V-space
addresses that MIRV out via the Poomfilades), and some are internal
I-space addresses.
The poomfilade is edited just like any other enfilade, but it doesn't
contain characters, but rather contains offsets and lengths of items
in the granfilade (stored as granfilade addresses).
The spanfilade maintains a global inverse map to all of the
poomfilades, maitaining an association between ranges in the
granfilade to specific document ranges in the poomfilades.
When new text is inserted, it's added at an appropriate spot in the
granfilade, and a new (offset,length) pair of granfilade tumblers is
added to the poomfilade. Simultaneously, the spanfilade is updated to
reflect that the new range in the granfilade maps to a specific range
in the relevant poomfilade.
When text is transcluded from one edition into another, a new entry
is made in the destination poomfilade to indicate that a certain set
of granfilade addresses is now included in that poomfilade, and a set
of new mappings is added to the spanfilade, reflecting that those
ranges of granfilade addresses also map to certain poomfilade
addresses in the destination poomfilade.
To backfollow from a document address, you start at its poom, and
harvest a set of granfilade ranges, then you pump them into the
spanfilade, and harvest a set of document ranges that point into
other poomfilades.
Links may just be special records stored in the poomfilades, since
the same backfollow approach works for them.
end of David's guess.
Right, note that there isn't currently enough info stored to remove
a document
or edit a link, as the poom entries can't be found. This should be
simple to fix.
Some random thoughts on the above.
The above is actually very similar to the stuff that Steve DeRose and
I explained to you at Hypertext '89. In particular, I think we had
already independently derived the model-T enfilade, and
structure-sharing versioning strategy. We were sticking Markup nodes
in there, ala SGML, but that was more of a religious difference than
a data structure one.
Really, I'm surprised, I would have thought I would remember so close a
similarity. We consider the i<->v mapping as fundamental, did you have
anything like that? Is there any paper on that, so I can compare, it might
be useful to see what the differences were, it could give some insight.
Nah, we never wrote it down intelligibly. We never looked at
addressing as central in the same way, and concentrated on structure
sharing. We had a more basic structure: a family of trees, sharing
many nodes in common, plus an external structure for tracking link
endpoints to node-IDs. That's more basic because you don't have an
easy way to find all sharers of a datum. But we did have the idea of
accumulating widths down the tree, and external indices from spans
to spans.
As I said, similar, not the same.
I came to look at addressing as much more critical later on, and
it's the key to my dissertation work, but by then I was more
interested in arbitrary combinations of changes and undo, coupled
with the ability to make edit-stable pointers into an alternative
version space.
I devoted a lot less time to algorithms, because it's less important now.
The problem of fragmenting end-sets is a real one in all of these
schemes, as a linear number of edits can create a linear number of
items that need to be returned in these intermediate queries, which
can make the run-time worse than log in some cases.
Well yes but,
recall that we mostly will want to retireve small pieces of the
document, some
small multiple of display sizes, and recall that much (often all)
of the structure
and text will just get cached in ram, the problem is an in CPU problem and
not a big "read it in form disk and take forever" kind of thing.
CPU problems are not problems at all anymore. At least not for most
data structures. There's a lot of cache-hit questions though around
disk traffic.
Here are the random queries I started before coming up with the
design above:
Does the granfilade use tumblers only to allow the creation of new,
linearly ordered insertion points between existing granfilade
addresses? couldn't the granfilade be a linear, append-only
text-space? I guess the granfilade does allow for changes to
non-frozen versions to vanish silently as their versions are edited.
More than that the tumblers allow locality and authorship info to be incoded.
Note that Gold hasn't yet decided on a particular address encoding!
But does the granfilade include server/node/account/info in
i-addresses? I guess that makes sense, as the bytes are perpetually
"owned" by the home document.
I tend to think of owning the address space that you publish as
opposed to the bytes at the end, but that reflects my own mental
drift, not your model. That makes things clearer.
[BTW if you're jut generating insertion points you can create a
simpler address space based on paths in an infinite binary tree. The
resulting numbers turn out to be isomorphic to the rational numbers.]
note that tumblers (Transfilite nUMBERS) are a LARGER set than
rationals. They
are isomorphic to the transfinite ordinals.
Sure, you have a lot more split points available. That's why
accounts/nodes etc. can split more easily. Just for a text stream,
though, the rationals are sufficient.
The POOMfilade seems more obscure to me. Does it map from Document
addresses (global Udanax addresses) into the granfilade, or the
spanfilade.
Poom maps i<->v
spanf maps i<->i
so given an i (or granf address) the poom gives a v (document address)
given an ispan the spanf gives the i addresses in the documents
(including links) that map
to that ispan.
I guess I still have the poom backwards. How do I go from a
v-address (e.g. from the front end) to an i-address?
Links seem like a natural for the spanfilade, but they seem like they
ought to be stored in document addresses, not granfilade addresses.
Ha, that would mean as the document changed the links would point
to different stuff.
yeah, this is a perspective difference. I've spent a decade and a
half dealing with editing problems, and from that perspective you're
primarily focusing on a small number of "current states" and
unwinding the history to determine the original contexts, and do
document comparisons.
For example showing links as caps for easy ascii representation
this is A text with TWO spans linked.
now if the addresses were to the v (document addresses) and we
edited by adding 6xes and a space
xxxxx thIs is a tEXT with two spans linked.
Make more sense now?
Oh yeah. I actually figured out this mistake before writing my
design note/query. How do you record the originating document
context where a link was made? I guess by the link's home address.
When you make a new rev of a document you also create new links
(though structure is shared by transclusion). So I guess that tells
you what poomfilade to use to turn the link spans back into v-spans
for a particular document.
Are you saying that there's no way to find the POOMs pointing to a
given i-span?
PS. Can I post this thread to the udanax list?
-- David