[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]

more Green information



I've been talking with Roger about the internals of Green, and thought that the information might be of interest.

Herewith, a few messages in a discussion of details.

Once I feel I've got these straight I'll try to summarize, but that takes time that I will have to find.


At 9:21 AM -0400 10/18/00, David G. Durand wrote:

At 6:12 AM -0500 10/18/00, Jeff Rush wrote:
Tonight I was decoding Abraham and trying to find the CurrentPacker
class, with no luck.
I was seeking the low-level disk I/O, trying to build up a layered
architecture diagram
like the traditional 7-level ISO/OSI diagram.  What's on top and what's
on bottom,
generally speaking re software modules (*NOT* datastructures as I know
Ents have no top/bottom
per se).  CurrentPacker is not included in the coordspace.st and top.st
files I received.
I'll look for it in what you upload tonight.


CurrentPacker and a number of other undefined globals were Fluid variables. Fluids (in case you're not an old LISPer) are dynamically scoped globals. I'm not sure why a special mechanism like this was needed in Smalltalk, where globals _are_ all fluid, but I have a pretty good idea for two reasosn:

1. They are part of the garbage collector's start-set for a collection sweep.

2. And of course, these variables were required to be persistent, and so the variables needed to have an explicit storage model, and there needed to be overridable methods to detect each access and update event.

Each Fluid also had an Emulsion, which was a persistent storage Area in which a set of FLuids could be stored. There was a global variable (CurrentEmulsion/GlobalEmulsion/CurrentGlobalEmulsion? I forget) that stored the default Emulsion for new globals.

I wasn't able to figure out how to correctly initialize these things, I think because several global methods for Object and Class are missing in the changesets we have. One thing that I would love to see is a complete FileOut of the whole Xanadu Image, as I could then use Smalltalk-based tools to select out more of those methods, and I could also examine the way the globals were actually set up.

At this point, my real issues around the Ent have to do with:

The details of the canopy (where things seemed to change several times during the course of the project)

The details of balancing, and to what extent balancing the O- and H-trees is independent.

Whether it's possible to have a purely binary H-tree implementation.

How the generalizations to arbitrary addressing schemes worked (It almost seems that they treated some of them as ordered dictionaries based on Hash values for objects).

I also would be very happy to understand the overview of how the Green implementation works. Jeff's work looks rather nice so far (for C++). In particular, orgls seem to be different in Green and Gold. And I don't understand the purpose of the Spanfilade -- It's a 2-d enfilade, mapping points in some pair of coordinate spaces into some other coordinate space(s).

At 10:33 AM -0700 10/18/00, roger gregory wrote:

"David G. Durand" wrote:CurrentPacker and a number of other undefined globals were Fluid

 variables. Fluids (in case you're not an old LISPer) are dynamically
 > scoped globals. I'm not sure why a special mechanism like this was
 needed in Smalltalk, where globals _are_ all fluid, but I have a
 pretty good idea for two reasosn:


As I remember they are needed on the C++ side, remember the {declaration} stuff
is ignored by smalltalk.


1. They are part of the garbage collector's start-set for a collection sweep.

 2. And of course, these variables were required to be persistent,
 missing in the changesets we have. One thing that I would love to see
 is a complete FileOut of the whole Xanadu Image, as I could then use
 Smalltalk-based tools to select out more of those methods, and I
 could also examine the way the globals were actually set up.
 >

Coming in the tarfile,if it ever gets out on this flakey line.


 > At this point, my real issues around the Ent have to do with:

 The details of the canopy (where things seemed to change several
 times during the course of the project)


They SURE did, seemingly every week.


 The details of balancing, and to what extent balancing the O- and
 H-trees is independent.


The dynamic ballancing was a convienience, for the rabid implementation, and
it was convienently forgotten by the design/bozos that it broke disk reading
severely, by meaning that every read had to be rewritten, completely breaking
the model for write once media.


 Whether it's possible to have a purely binary H-tree implementation.

 How the generalizations to arbitrary addressing schemes worked (It
 almost seems that they treated some of them as ordered dictionaries
 based on Hash values for objects).

 I also would be very happy to understand the overview of how the
 Green implementation works. Jeff's work looks rather nice so far (for
 C++). In particular, orgls seem to be different in Green and Gold.
 And I don't understand the purpose of the Spanfilade -- It's a 2-d
 enfilade, mapping points in some pair of coordinate spaces into some
 other coordinate space(s).


the spanf map maps ffom spans to the docs they are part of ,or map to.

At 5:09 PM -0400 10/18/00, David G. Durand wrote:


 > The details of balancing, and to what extent balancing the O- and
 H-trees is independent.


The dynamic ballancing was a convienience, for the rabid implementation, and
it was convienently forgotten by the design/bozos that it broke disk reading
severely, by meaning that every read had to be rewritten, completely breaking
the model for write once media.

I'm not even worried about the splay versus incremental balancing, just whether one can independently rebalance H-trees when a new O-tree is added, as long chains of versions can make for long, linear H-trees, if I'm not mistaken.
 > I also would be very happy to understand the overview of how the
 Green implementation works. Jeff's work looks rather nice so far (for
 C++). In particular, orgls seem to be different in Green and Gold.
 And I don't understand the purpose of the Spanfilade -- It's a 2-d
 enfilade, mapping points in some pair of coordinate spaces into some
 other coordinate space(s).


the spanf map maps ffom spans to the docs they are part of ,or map to.

I asked a whole bunch of questions (that I'm leaving in case I'm confused), but is this the basic sketch of the Green Architecture?

Begin sketch:

The Granfilade stores all the text in the system under private addresses. As an efficiency measure, text inserted into the granfilade is inserted adjacent to the text that it's near. All text in the granfilade belongs to an existing document.

Somewhere, perhaps in the granfilade (using a distinguished range of addresses?) there's a global map that maps Xanadu server, account, document, etc. addresses down to a single poomfilade per document. The poomfilade is edited just like any other enfilade, but it doesn't contain characters, but rather contains offsets and lengths of items in the granfilade (stored as granfilade addresses).

The spanfilade maintains a global inverse map to all of the poomfilades, maitaining an association between ranges in the granfilade to specific document ranges in the poomfilades.

When new text is inserted, it's added at an appropriate spot in the granfilade, and a new (offset,length) pair of granfilade tumblers is added to the poomfilade. Simultaneously, the spanfilade is updated to reflect that the new range in the granfilade maps to a specific range in the relevant poomfilade.

When text is transcluded from one edition into another, a new entry is made in the destination poomfilade to indicate that a certain set of granfilade addresses is now included in that poomfilade, and a set of new mappings is added to the spanfilade, reflecting that those ranges of granfilade addresses also map to certain poomfilade addresses in the destination poomfilade.

To backfollow from a document address, you start at its poom, and harvest a set of granfilade ranges, then you pump them into the spanfilade, and harvest a set of document ranges that point into other poomfilades.

Links may just be special records stored in the poomfilades, since the same backfollow approach works for them.

end of David's guess.

Some random thoughts on the above.

The above is actually very similar to the stuff that Steve DeRose and I explained to you at Hypertext '89. In particular, I think we had already independently derived the model-T enfilade, and structure-sharing versioning strategy. We were sticking Markup nodes in there, ala SGML, but that was more of a religious difference than a data structure one.

The problem of fragmenting end-sets is a real one in all of these schemes, as a linear number of edits can create a linear number of items that need to be returned in these intermediate queries, which can make the run-time worse than log in some cases.

Here are the random queries I started before coming up with the design above:

Is this spanfilade map, just a just mapping from a (start, end) pair of global xanadu addresses to a range of granfilade addresses, or does it map to document addresses?

Does the granfilade use tumblers only to allow the creation of new, linearly ordered insertion points between existing granfilade addresses? couldn't the granfilade be a linear, append-only text-space? I guess the granfilade does allow for changes to non-frozen versions to vanish silently as their versions are edited.

[BTW if you're jut generating insertion points you can create a simpler address space based on paths in an infinite binary tree. The resulting numbers turn out to be isomorphic to the rational numbers.]

The POOMfilade seems more obscure to me. Does it map from Document addresses (global Udanax addresses) into the granfilade, or the spanfilade.

Links seem like a natural for the spanfilade, but they seem like they ought to be stored in document addresses, not granfilade addresses.

  -- David

At 10:49 PM -0700 10/18/00, roger gregory wrote:

"David G. Durand" wrote:I'm not even worried about the splay versus incremental balancing,

just whether one can independently rebalance H-trees when a new
O-tree is added, as long chains of versions can make for long, linear
H-trees, if I'm not mistaken.

Here I'm not sure, but I'd be surprised if things had to get linear,
why couldn't it be reballanced?  Long chans of versions sould result in a tree
with the versions at the bottom of the tree.


  > I also would be very happy to understand the overview of how the
  Green implementation works. Jeff's work looks rather nice so far (for
  C++). In particular, orgls seem to be different in Green and Gold.

Right since Gold is MANY generations of design from Green, there are few
direct identities. Mostly anything that would limit performance to square root
is avoided in Gold (note that poom and span have square root worst case).


  And I don't understand the purpose of the Spanfilade -- It's a 2-d
  enfilade, mapping points in some pair of coordinate spaces into some
  other coordinate space(s).


the spanf map maps ffom spans to the docs they are part of ,or map to.

I asked a whole bunch of questions (that I'm leaving in case I'm
confused), but is this the basic sketch of the Green Architecture?

Begin sketch:

The Granfilade stores all the text in the system under private
addresses. As an efficiency measure, text inserted into the
granfilade is inserted adjacent to the text that it's near. All text
in the granfilade belongs to an existing document.

Somewhere, perhaps in the granfilade (using a distinguished range of
addresses?) there's a global map that maps Xanadu server, account,
document, etc. addresses down to a single poomfilade per document.

The addresses map directly to the granfilade via the addresses.


The poomfilade is edited just like any other enfilade, but it doesn't
contain characters, but rather contains offsets and lengths of items
in the granfilade (stored as granfilade addresses).

The spanfilade maintains a global inverse map to all of the
poomfilades, maitaining an association between ranges in the
granfilade to specific document ranges in the poomfilades.

When new text is inserted, it's added at an appropriate spot in the
granfilade, and a new (offset,length) pair of granfilade tumblers is
added to the poomfilade. Simultaneously, the spanfilade is updated to
reflect that the new range in the granfilade maps to a specific range
in the relevant poomfilade.

When text is transcluded from one edition into another, a new entry
is made in the destination poomfilade to indicate that a certain set
of granfilade addresses is now included in that poomfilade, and a set
of new mappings is added to the spanfilade, reflecting that those
ranges of granfilade addresses also map to certain poomfilade
addresses in the destination poomfilade.

To backfollow from a document address, you start at its poom, and
harvest a set of granfilade ranges, then you pump them into the
spanfilade, and harvest a set of document ranges that point into
other poomfilades.

Links may just be special records stored in the poomfilades, since
the same backfollow approach works for them.

end of David's guess.


Right, note that there isn't currently enough info stored to remove a document
or edit a link, as the poom entries can't be found. This should be simple to fix.


Some random thoughts on the above.

The above is actually very similar to the stuff that Steve DeRose and
I explained to you at Hypertext '89. In particular, I think we had
already independently derived the model-T enfilade, and
structure-sharing versioning strategy. We were sticking Markup nodes
in there, ala SGML, but that was more of a religious difference than
a data structure one.


Really, I'm surprised, I would have thought I would remember so close a
similarity.  We consider the i<->v mapping as fundamental, did you have
anything like that? Is there any paper on that, so I can compare, it might
be useful to see what the differences were, it could give some insight.


The problem of fragmenting end-sets is a real one in all of these
schemes, as a linear number of edits can create a linear number of
items that need to be returned in these intermediate queries, which
can make the run-time worse than log in some cases.


Well yes but,
recall that we mostly will want to retireve small pieces of the document, some
small multiple of display sizes, and recall that much (often all) of the structure
and text will just get cached in ram, the problem is an in CPU problem and
not a big "read it in form disk and take forever" kind of thing.


Here are the random queries I started before coming up with the design above:

Is this spanfilade map, just a just mapping from a (start, end) pair
of global xanadu addresses to a range of granfilade addresses, or
does it map to document addresses?


grandfilade addresses == i (iinvarient) addresses
Yes the spanf maps to the i addresses,


Does the granfilade use tumblers only to allow the creation of new,
linearly ordered insertion points between existing granfilade
addresses? couldn't the granfilade be a linear, append-only
text-space? I guess the granfilade does allow for changes to
non-frozen versions to vanish silently as their versions are edited.


More than that the tumblers allow locality and authorship info to be incoded.
Note that Gold hasn't yet decided on a particular address encoding!


[BTW if you're jut generating insertion points you can create a
simpler address space based on paths in an infinite binary tree. The
resulting numbers turn out to be isomorphic to the rational numbers.]


note that tumblers (Transfilite nUMBERS) are a LARGER set than rationals. They
are isomorphic to the transfinite ordinals.



The POOMfilade seems more obscure to me. Does it map from Document
addresses (global Udanax addresses) into the granfilade, or the
spanfilade.


Poom maps  i<->v
spanf maps i<->i
so given an i (or granf address) the poom gives a v  (document address)
given an ispan the spanf gives the i addresses in the documents (including links) that map
to that ispan.


Links seem like a natural for the spanfilade, but they seem like they
ought to be stored in document addresses, not granfilade addresses.


Ha, that would mean as the document changed the links would point to different stuff.

For example showing links as caps for easy ascii representation

this is A text with TWO spans linked.
now if the addresses were to the v (document addresses) and we edited by adding 6xes and a space
xxxxx thIs is a tEXT with two spans linked.
Make more sense now?


At 9:46 AM -0400 10/19/00, David G. Durand wrote:
To: roger@xxxxxxxxxx
From: "David G. Durand" <david@xxxxxxxxxxxxxxxxxxx>
Subject: Re: I need a place to put ....
Cc:
Bcc:
X-Attachments:

"David G. Durand" wrote:I'm not even worried about the splay versus incremental balancing,

just whether one can independently rebalance H-trees when a new
O-tree is added, as long chains of versions can make for long, linear
H-trees, if I'm not mistaken.

Here I'm not sure, but I'd be surprised if things had to get linear,
why couldn't it be reballanced? Long chans of versions sould result in a tree
with the versions at the bottom of the tree.

Right. As long as you can rebalance in H. If I had 3 concentrated hours to analyze this, I could probably write down the invariants, and see if they can be maintained. I've been assuming all along that I can rotate in H (only) or in O (only), and that that's enough.


Right since Gold is MANY generations of design from Green, there are few
direct identities. Mostly anything that would limit performance to square root
is avoided in Gold (note that poom and span have square root worst case).

You can do span (range) queries in log N + S time (log of number of spans, plus the size of the result set). Of course that's a different algorithm.


Somewhere, perhaps in the granfilade (using a distinguished range of
addresses?) there's a global map that maps Xanadu server, account,
document, etc. addresses down to a single poomfilade per document.

The addresses map directly to the granfilade via the addresses.

So some granfilade addresses are externally visible ones (V-space addresses that MIRV out via the Poomfilades), and some are internal I-space addresses.


The poomfilade is edited just like any other enfilade, but it doesn't
contain characters, but rather contains offsets and lengths of items
in the granfilade (stored as granfilade addresses).

The spanfilade maintains a global inverse map to all of the
poomfilades, maitaining an association between ranges in the
granfilade to specific document ranges in the poomfilades.

When new text is inserted, it's added at an appropriate spot in the
granfilade, and a new (offset,length) pair of granfilade tumblers is
added to the poomfilade. Simultaneously, the spanfilade is updated to
reflect that the new range in the granfilade maps to a specific range
in the relevant poomfilade.

When text is transcluded from one edition into another, a new entry
is made in the destination poomfilade to indicate that a certain set
of granfilade addresses is now included in that poomfilade, and a set
of new mappings is added to the spanfilade, reflecting that those
ranges of granfilade addresses also map to certain poomfilade
addresses in the destination poomfilade.

To backfollow from a document address, you start at its poom, and
harvest a set of granfilade ranges, then you pump them into the
spanfilade, and harvest a set of document ranges that point into
other poomfilades.

Links may just be special records stored in the poomfilades, since
the same backfollow approach works for them.

end of David's guess.


Right, note that there isn't currently enough info stored to remove a document or edit a link, as the poom entries can't be found. This should be simple to fix.


Some random thoughts on the above.

The above is actually very similar to the stuff that Steve DeRose and
I explained to you at Hypertext '89. In particular, I think we had
already independently derived the model-T enfilade, and
structure-sharing versioning strategy. We were sticking Markup nodes
in there, ala SGML, but that was more of a religious difference than
a data structure one.


Really, I'm surprised, I would have thought I would remember so close a
similarity.  We consider the i<->v mapping as fundamental, did you have
anything like that? Is there any paper on that, so I can compare, it might
be useful to see what the differences were, it could give some insight.

Nah, we never wrote it down intelligibly. We never looked at addressing as central in the same way, and concentrated on structure sharing. We had a more basic structure: a family of trees, sharing many nodes in common, plus an external structure for tracking link endpoints to node-IDs. That's more basic because you don't have an easy way to find all sharers of a datum. But we did have the idea of accumulating widths down the tree, and external indices from spans to spans.

As I said, similar, not the same.

I came to look at addressing as much more critical later on, and it's the key to my dissertation work, but by then I was more interested in arbitrary combinations of changes and undo, coupled with the ability to make edit-stable pointers into an alternative version space.

I devoted a lot less time to algorithms, because it's less important now.

The problem of fragmenting end-sets is a real one in all of these
schemes, as a linear number of edits can create a linear number of
items that need to be returned in these intermediate queries, which
can make the run-time worse than log in some cases.


Well yes but,
recall that we mostly will want to retireve small pieces of the document, some small multiple of display sizes, and recall that much (often all) of the structure
and text will just get cached in ram, the problem is an in CPU problem and
not a big "read it in form disk and take forever" kind of thing.

CPU problems are not problems at all anymore. At least not for most data structures. There's a lot of cache-hit questions though around disk traffic.

Here are the random queries I started before coming up with the design above:
Does the granfilade use tumblers only to allow the creation of new,
linearly ordered insertion points between existing granfilade
addresses? couldn't the granfilade be a linear, append-only
text-space? I guess the granfilade does allow for changes to
non-frozen versions to vanish silently as their versions are edited.


More than that the tumblers allow locality and authorship info to be incoded.
Note that Gold hasn't yet decided on a particular address encoding!

But does the granfilade include server/node/account/info in i-addresses? I guess that makes sense, as the bytes are perpetually "owned" by the home document.

I tend to think of owning the address space that you publish as opposed to the bytes at the end, but that reflects my own mental drift, not your model. That makes things clearer.


[BTW if you're jut generating insertion points you can create a
simpler address space based on paths in an infinite binary tree. The
resulting numbers turn out to be isomorphic to the rational numbers.]


note that tumblers (Transfilite nUMBERS) are a LARGER set than rationals. They
are isomorphic to the transfinite ordinals.

Sure, you have a lot more split points available. That's why accounts/nodes etc. can split more easily. Just for a text stream, though, the rationals are sufficient.


The POOMfilade seems more obscure to me. Does it map from Document
addresses (global Udanax addresses) into the granfilade, or the
spanfilade.


Poom maps  i<->v
spanf maps i<->i
so given an i (or granf address) the poom gives a v  (document address)
given an ispan the spanf gives the i addresses in the documents (including links) that map
to that ispan.

I guess I still have the poom backwards. How do I go from a v-address (e.g. from the front end) to an i-address?

Links seem like a natural for the spanfilade, but they seem like they
ought to be stored in document addresses, not granfilade addresses.


Ha, that would mean as the document changed the links would point to different stuff.

yeah, this is a perspective difference. I've spent a decade and a half dealing with editing problems, and from that perspective you're primarily focusing on a small number of "current states" and unwinding the history to determine the original contexts, and do document comparisons.

For example showing links as caps for easy ascii representation

this is A text with TWO spans linked.
now if the addresses were to the v (document addresses) and we edited by adding 6xes and a space
xxxxx thIs is a tEXT with two spans linked.
Make more sense now?

Oh yeah. I actually figured out this mistake before writing my design note/query. How do you record the originating document context where a link was made? I guess by the link's home address. When you make a new rev of a document you also create new links (though structure is shared by transclusion). So I guess that tells you what poomfilade to use to turn the link spans back into v-spans for a particular document.

Are you saying that there's no way to find the POOMs pointing to a given i-span?

PS. Can I post this thread to the udanax list?

  -- David

--
-------------------------------
David Durand
david@xxxxxxxxxxxxxxxxxxx
Chief Technical Officer
Dynamic Diagrams
http://www.dynamicDiagrams.com/