[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]

more Green information

To: udanax@xxxxxxxxxx
Subject: more Green information
From: "David G. Durand" <david@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 19 Oct 2000 23:37:05 -0400

I've been talking with Roger about the internals of Green, andthought that the information might be of interest.


Herewith, a few messages in a discussion of details.

Once I feel I've got these straight I'll try to summarize, but thattakes time that I will have to find.



At 9:21 AM -0400 10/18/00, David G. Durand wrote:

At 6:12 AM -0500 10/18/00, Jeff Rush wrote:
Tonight I was decoding Abraham and trying to find the CurrentPacker
class, with no luck.
I was seeking the low-level disk I/O, trying to build up a layered
architecture diagram
like the traditional 7-level ISO/OSI diagram.  What's on top and what's
on bottom,
generally speaking re software modules (*NOT* datastructures as I know
Ents have no top/bottom
per se).  CurrentPacker is not included in the coordspace.st and top.st
files I received.
I'll look for it in what you upload tonight.
CurrentPacker and a number of other undefined globals were Fluidvariables. Fluids (in case you're not an old LISPer) are dynamicallyscoped globals. I'm not sure why a special mechanism like this wasneeded in Smalltalk, where globals _are_ all fluid, but I have apretty good idea for two reasosn:
1. They are part of the garbage collector's start-set for a collection sweep.
2. And of course, these variables were required to be persistent,and so the variables needed to have an explicit storage model, andthere needed to be overridable methods to detect each access andupdate event.
Each Fluid also had an Emulsion, which was a persistent storage Areain which a set of FLuids could be stored. There was a globalvariable (CurrentEmulsion/GlobalEmulsion/CurrentGlobalEmulsion? Iforget) that stored the default Emulsion for new globals.
I wasn't able to figure out how to correctly initialize thesethings, I think because several global methods for Object and Classare missing in the changesets we have. One thing that I would loveto see is a complete FileOut of the whole Xanadu Image, as I couldthen use Smalltalk-based tools to select out more of those methods,and I could also examine the way the globals were actually set up.
At this point, my real issues around the Ent have to do with:
The details of the canopy (where things seemed to change severaltimes during the course of the project)
The details of balancing, and to what extent balancing the O- andH-trees is independent.
Whether it's possible to have a purely binary H-tree implementation.
How the generalizations to arbitrary addressing schemes worked (Italmost seems that they treated some of them as ordered dictionariesbased on Hash values for objects).
I also would be very happy to understand the overview of how theGreen implementation works. Jeff's work looks rather nice so far(for C++). In particular, orgls seem to be different in Green andGold. And I don't understand the purpose of the Spanfilade -- It's a2-d enfilade, mapping points in some pair of coordinate spaces intosome other coordinate space(s).


At 10:33 AM -0700 10/18/00, roger gregory wrote:

"David G. Durand" wrote:CurrentPacker and a number of otherundefined globals were Fluid

 variables. Fluids (in case you're not an old LISPer) are dynamically

 > scoped globals. I'm not sure why a special mechanism like this was

 needed in Smalltalk, where globals _are_ all fluid, but I have a
 pretty good idea for two reasosn:

As I remember they are needed on the C++ side, remember the{declaration} stuff

is ignored by smalltalk.

1. They are part of the garbage collector's start-set for acollection sweep.


 2. And of course, these variables were required to be persistent,
 missing in the changesets we have. One thing that I would love to see
 is a complete FileOut of the whole Xanadu Image, as I could then use
 Smalltalk-based tools to select out more of those methods, and I
 could also examine the way the globals were actually set up.

 >

Coming in the tarfile,if it ever gets out on this flakey line.

 > At this point, my real issues around the Ent have to do with:


 The details of the canopy (where things seemed to change several
 times during the course of the project)


They SURE did, seemingly every week.


 The details of balancing, and to what extent balancing the O- and
 H-trees is independent.


The dynamic ballancing was a convienience, for the rabid implementation, and
it was convienently forgotten by the design/bozos that it broke disk reading
severely, by meaning that every read had to be rewritten, completely breaking
the model for write once media.


 Whether it's possible to have a purely binary H-tree implementation.

 How the generalizations to arbitrary addressing schemes worked (It
 almost seems that they treated some of them as ordered dictionaries
 based on Hash values for objects).

 I also would be very happy to understand the overview of how the
 Green implementation works. Jeff's work looks rather nice so far (for
 C++). In particular, orgls seem to be different in Green and Gold.
 And I don't understand the purpose of the Spanfilade -- It's a 2-d
 enfilade, mapping points in some pair of coordinate spaces into some
 other coordinate space(s).


the spanf map maps ffom spans to the docs they are part of ,or map to.


At 5:09 PM -0400 10/18/00, David G. Durand wrote:

 > The details of balancing, and to what extent balancing the O- and
 H-trees is independent.
The dynamic ballancing was a convienience, for the rabid implementation, and
it was convienently forgotten by the design/bozos that it broke disk reading
severely, by meaning that every read had to be rewritten, completely breaking
the model for write once media.
I'm not even worried about the splay versus incremental balancing,just whether one can independently rebalance H-trees when a newO-tree is added, as long chains of versions can make for long,linear H-trees, if I'm not mistaken.
 > I also would be very happy to understand the overview of how the
 Green implementation works. Jeff's work looks rather nice so far (for
 C++). In particular, orgls seem to be different in Green and Gold.
 And I don't understand the purpose of the Spanfilade -- It's a 2-d
 enfilade, mapping points in some pair of coordinate spaces into some
 other coordinate space(s).
the spanf map maps ffom spans to the docs they are part of ,or map to.
I asked a whole bunch of questions (that I'm leaving in case I'mconfused), but is this the basic sketch of the Green Architecture?
Begin sketch:
The Granfilade stores all the text in the system under privateaddresses. As an efficiency measure, text inserted into thegranfilade is inserted adjacent to the text that it's near. All textin the granfilade belongs to an existing document.
Somewhere, perhaps in the granfilade (using a distinguished range ofaddresses?) there's a global map that maps Xanadu server, account,document, etc. addresses down to a single poomfilade per document.The poomfilade is edited just like any other enfilade, but itdoesn't contain characters, but rather contains offsets and lengthsof items in the granfilade (stored as granfilade addresses).
The spanfilade maintains a global inverse map to all of thepoomfilades, maitaining an association between ranges in thegranfilade to specific document ranges in the poomfilades.
When new text is inserted, it's added at an appropriate spot in thegranfilade, and a new (offset,length) pair of granfilade tumblers isadded to the poomfilade. Simultaneously, the spanfilade is updatedto reflect that the new range in the granfilade maps to a specificrange in the relevant poomfilade.
When text is transcluded from one edition into another, a new entryis made in the destination poomfilade to indicate that a certain setof granfilade addresses is now included in that poomfilade, and aset of new mappings is added to the spanfilade, reflecting thatthose ranges of granfilade addresses also map to certain poomfiladeaddresses in the destination poomfilade.
To backfollow from a document address, you start at its poom, andharvest a set of granfilade ranges, then you pump them into thespanfilade, and harvest a set of document ranges that point intoother poomfilades.
Links may just be special records stored in the poomfilades, sincethe same backfollow approach works for them.
end of David's guess.

Some random thoughts on the above.
The above is actually very similar to the stuff that Steve DeRoseand I explained to you at Hypertext '89. In particular, I think wehad already independently derived the model-T enfilade, andstructure-sharing versioning strategy. We were sticking Markup nodesin there, ala SGML, but that was more of a religious difference thana data structure one.
The problem of fragmenting end-sets is a real one in all of theseschemes, as a linear number of edits can create a linear number ofitems that need to be returned in these intermediate queries, whichcan make the run-time worse than log in some cases.
Here are the random queries I started before coming up with the design above:
Is this spanfilade map, just a just mapping from a (start, end) pairof global xanadu addresses to a range of granfilade addresses, ordoes it map to document addresses?
Does the granfilade use tumblers only to allow the creation of new,linearly ordered insertion points between existing granfiladeaddresses? couldn't the granfilade be a linear, append-onlytext-space? I guess the granfilade does allow for changes tonon-frozen versions to vanish silently as their versions are edited.
[BTW if you're jut generating insertion points you can create asimpler address space based on paths in an infinite binary tree. Theresulting numbers turn out to be isomorphic to the rational numbers.]
The POOMfilade seems more obscure to me. Does it map from Documentaddresses (global Udanax addresses) into the granfilade, or thespanfilade.
Links seem like a natural for the spanfilade, but they seem likethey ought to be stored in document addresses, not granfiladeaddresses.
  -- David


At 10:49 PM -0700 10/18/00, roger gregory wrote:

"David G. Durand" wrote:I'm not even worried about the splay versusincremental balancing,

just whether one can independently rebalance H-trees when a new
O-tree is added, as long chains of versions can make for long, linear
H-trees, if I'm not mistaken.

Here I'm not sure, but I'd be surprised if things had to get linear,
why couldn't it be reballanced?  Long chans of versions sould result in a tree
with the versions at the bottom of the tree.

  > I also would be very happy to understand the overview of how the

  Green implementation works. Jeff's work looks rather nice so far (for
  C++). In particular, orgls seem to be different in Green and Gold.

Right since Gold is MANY generations of design from Green, there are few

direct identities. Mostly anything that would limit performance tosquare root

is avoided in Gold (note that poom and span have square root worst case).

  And I don't understand the purpose of the Spanfilade -- It's a 2-d
  enfilade, mapping points in some pair of coordinate spaces into some
  other coordinate space(s).


the spanf map maps ffom spans to the docs they are part of ,or map to.


I asked a whole bunch of questions (that I'm leaving in case I'm
confused), but is this the basic sketch of the Green Architecture?

Begin sketch:

The Granfilade stores all the text in the system under private
addresses. As an efficiency measure, text inserted into the
granfilade is inserted adjacent to the text that it's near. All text
in the granfilade belongs to an existing document.

Somewhere, perhaps in the granfilade (using a distinguished range of
addresses?) there's a global map that maps Xanadu server, account,
document, etc. addresses down to a single poomfilade per document.

The addresses map directly to the granfilade via the addresses.


The poomfilade is edited just like any other enfilade, but it doesn't
contain characters, but rather contains offsets and lengths of items
in the granfilade (stored as granfilade addresses).

The spanfilade maintains a global inverse map to all of the
poomfilades, maitaining an association between ranges in the
granfilade to specific document ranges in the poomfilades.

When new text is inserted, it's added at an appropriate spot in the
granfilade, and a new (offset,length) pair of granfilade tumblers is
added to the poomfilade. Simultaneously, the spanfilade is updated to
reflect that the new range in the granfilade maps to a specific range
in the relevant poomfilade.

When text is transcluded from one edition into another, a new entry
is made in the destination poomfilade to indicate that a certain set
of granfilade addresses is now included in that poomfilade, and a set
of new mappings is added to the spanfilade, reflecting that those
ranges of granfilade addresses also map to certain poomfilade
addresses in the destination poomfilade.

To backfollow from a document address, you start at its poom, and
harvest a set of granfilade ranges, then you pump them into the
spanfilade, and harvest a set of document ranges that point into
other poomfilades.

Links may just be special records stored in the poomfilades, since
the same backfollow approach works for them.

end of David's guess.

Right, note that there isn't currently enough info stored to remove a document

or edit a link, as the poom entries can't be found. This should besimple to fix.


Some random thoughts on the above.

The above is actually very similar to the stuff that Steve DeRose and
I explained to you at Hypertext '89. In particular, I think we had
already independently derived the model-T enfilade, and
structure-sharing versioning strategy. We were sticking Markup nodes
in there, ala SGML, but that was more of a religious difference than
a data structure one.

Really, I'm surprised, I would have thought I would remember so close a
similarity.  We consider the i<->v mapping as fundamental, did you have
anything like that? Is there any paper on that, so I can compare, it might
be useful to see what the differences were, it could give some insight.


The problem of fragmenting end-sets is a real one in all of these
schemes, as a linear number of edits can create a linear number of
items that need to be returned in these intermediate queries, which
can make the run-time worse than log in some cases.

Well yes but,
recall that we mostly will want to retireve small pieces of the document, some

small multiple of display sizes, and recall that much (often all) ofthe structure

and text will just get cached in ram, the problem is an in CPU problem and
not a big "read it in form disk and take forever" kind of thing.


Here are the random queries I started before coming up with the design above:

Is this spanfilade map, just a just mapping from a (start, end) pair
of global xanadu addresses to a range of granfilade addresses, or
does it map to document addresses?

grandfilade addresses == i (iinvarient) addresses
Yes the spanf maps to the i addresses,


Does the granfilade use tumblers only to allow the creation of new,
linearly ordered insertion points between existing granfilade
addresses? couldn't the granfilade be a linear, append-only
text-space? I guess the granfilade does allow for changes to
non-frozen versions to vanish silently as their versions are edited.

More than that the tumblers allow locality and authorship info to be incoded.
Note that Gold hasn't yet decided on a particular address encoding!


[BTW if you're jut generating insertion points you can create a
simpler address space based on paths in an infinite binary tree. The
resulting numbers turn out to be isomorphic to the rational numbers.]

note that tumblers (Transfilite nUMBERS) are a LARGER set thanrationals. They

are isomorphic to the transfinite ordinals.


The POOMfilade seems more obscure to me. Does it map from Document
addresses (global Udanax addresses) into the granfilade, or the
spanfilade.

Poom maps  i<->v
spanf maps i<->i
so given an i (or granf address) the poom gives a v  (document address)

given an ispan the spanf gives the i addresses in the documents(including links) that map

to that ispan.


Links seem like a natural for the spanfilade, but they seem like they
ought to be stored in document addresses, not granfilade addresses.

Ha, that would mean as the document changed the links would point todifferent stuff.


For example showing links as caps for easy ascii representation

this is A text with TWO spans linked.

now if the addresses were to the v (document addresses) and weedited by adding 6xes and a space

xxxxx thIs is a tEXT with two spans linked.
Make more sense now?



At 9:46 AM -0400 10/19/00, David G. Durand wrote:

To: roger@xxxxxxxxxx
From: "David G. Durand" <david@xxxxxxxxxxxxxxxxxxx>
Subject: Re: I need a place to put ....
Cc:
Bcc:
X-Attachments:
"David G. Durand" wrote:I'm not even worried about the splay versusincremental balancing,
just whether one can independently rebalance H-trees when a new
O-tree is added, as long chains of versions can make for long, linear
H-trees, if I'm not mistaken.
Here I'm not sure, but I'd be surprised if things had to get linear,
why couldn't it be reballanced? Long chans of versions souldresult in a tree
with the versions at the bottom of the tree.
Right. As long as you can rebalance in H. If I had 3 concentratedhours to analyze this, I could probably write down the invariants,and see if they can be maintained. I've been assuming all along thatI can rotate in H (only) or in O (only), and that that's enough.
Right since Gold is MANY generations of design from Green, there are few
direct identities. Mostly anything that would limit performance tosquare root
is avoided in Gold (note that poom and span have square root worst case).
You can do span (range) queries in log N + S time (log of number ofspans, plus the size of the result set). Of course that's adifferent algorithm.
Somewhere, perhaps in the granfilade (using a distinguished range of
addresses?) there's a global map that maps Xanadu server, account,
document, etc. addresses down to a single poomfilade per document.
The addresses map directly to the granfilade via the addresses.
So some granfilade addresses are externally visible ones (V-spaceaddresses that MIRV out via the Poomfilades), and some are internalI-space addresses.
The poomfilade is edited just like any other enfilade, but it doesn't
contain characters, but rather contains offsets and lengths of items
in the granfilade (stored as granfilade addresses).

The spanfilade maintains a global inverse map to all of the
poomfilades, maitaining an association between ranges in the
granfilade to specific document ranges in the poomfilades.

When new text is inserted, it's added at an appropriate spot in the
granfilade, and a new (offset,length) pair of granfilade tumblers is
added to the poomfilade. Simultaneously, the spanfilade is updated to
reflect that the new range in the granfilade maps to a specific range
in the relevant poomfilade.

When text is transcluded from one edition into another, a new entry
is made in the destination poomfilade to indicate that a certain set
of granfilade addresses is now included in that poomfilade, and a set
of new mappings is added to the spanfilade, reflecting that those
ranges of granfilade addresses also map to certain poomfilade
addresses in the destination poomfilade.

To backfollow from a document address, you start at its poom, and
harvest a set of granfilade ranges, then you pump them into the
spanfilade, and harvest a set of document ranges that point into
other poomfilades.

Links may just be special records stored in the poomfilades, since
the same backfollow approach works for them.

end of David's guess.
Right, note that there isn't currently enough info stored to removea documentor edit a link, as the poom entries can't be found. This should besimple to fix.
Some random thoughts on the above.

The above is actually very similar to the stuff that Steve DeRose and
I explained to you at Hypertext '89. In particular, I think we had
already independently derived the model-T enfilade, and
structure-sharing versioning strategy. We were sticking Markup nodes
in there, ala SGML, but that was more of a religious difference than
a data structure one.
Really, I'm surprised, I would have thought I would remember so close a
similarity.  We consider the i<->v mapping as fundamental, did you have
anything like that? Is there any paper on that, so I can compare, it might
be useful to see what the differences were, it could give some insight.
Nah, we never wrote it down intelligibly. We never looked ataddressing as central in the same way, and concentrated on structuresharing. We had a more basic structure: a family of trees, sharingmany nodes in common, plus an external structure for tracking linkendpoints to node-IDs. That's more basic because you don't have aneasy way to find all sharers of a datum. But we did have the idea ofaccumulating widths down the tree, and external indices from spansto spans.
As I said, similar, not the same.
I came to look at addressing as much more critical later on, andit's the key to my dissertation work, but by then I was moreinterested in arbitrary combinations of changes and undo, coupledwith the ability to make edit-stable pointers into an alternativeversion space.
I devoted a lot less time to algorithms, because it's less important now.
The problem of fragmenting end-sets is a real one in all of these
schemes, as a linear number of edits can create a linear number of
items that need to be returned in these intermediate queries, which
can make the run-time worse than log in some cases.
Well yes but,
recall that we mostly will want to retireve small pieces of thedocument, somesmall multiple of display sizes, and recall that much (often all)of the structure
and text will just get cached in ram, the problem is an in CPU problem and
not a big "read it in form disk and take forever" kind of thing.
CPU problems are not problems at all anymore. At least not for mostdata structures. There's a lot of cache-hit questions though arounddisk traffic.
Here are the random queries I started before coming up with thedesign above:
Does the granfilade use tumblers only to allow the creation of new,
linearly ordered insertion points between existing granfilade
addresses? couldn't the granfilade be a linear, append-only
text-space? I guess the granfilade does allow for changes to
non-frozen versions to vanish silently as their versions are edited.
More than that the tumblers allow locality and authorship info to be incoded.
Note that Gold hasn't yet decided on a particular address encoding!
But does the granfilade include server/node/account/info ini-addresses? I guess that makes sense, as the bytes are perpetually"owned" by the home document.
I tend to think of owning the address space that you publish asopposed to the bytes at the end, but that reflects my own mentaldrift, not your model. That makes things clearer.
[BTW if you're jut generating insertion points you can create a
simpler address space based on paths in an infinite binary tree. The
resulting numbers turn out to be isomorphic to the rational numbers.]
note that tumblers (Transfilite nUMBERS) are a LARGER set thanrationals. They
are isomorphic to the transfinite ordinals.
Sure, you have a lot more split points available. That's whyaccounts/nodes etc. can split more easily. Just for a text stream,though, the rationals are sufficient.
The POOMfilade seems more obscure to me. Does it map from Document
addresses (global Udanax addresses) into the granfilade, or the
spanfilade.
Poom maps  i<->v
spanf maps i<->i
so given an i (or granf address) the poom gives a v  (document address)
given an ispan the spanf gives the i addresses in the documents(including links) that map
to that ispan.
I guess I still have the poom backwards. How do I go from av-address (e.g. from the front end) to an i-address?
Links seem like a natural for the spanfilade, but they seem like they
ought to be stored in document addresses, not granfilade addresses.
Ha, that would mean as the document changed the links would pointto different stuff.
yeah, this is a perspective difference. I've spent a decade and ahalf dealing with editing problems, and from that perspective you'reprimarily focusing on a small number of "current states" andunwinding the history to determine the original contexts, and dodocument comparisons.
For example showing links as caps for easy ascii representation

this is A text with TWO spans linked.
now if the addresses were to the v (document addresses) and weedited by adding 6xes and a space
xxxxx thIs is a tEXT with two spans linked.
Make more sense now?
Oh yeah. I actually figured out this mistake before writing mydesign note/query. How do you record the originating documentcontext where a link was made? I guess by the link's home address.When you make a new rev of a document you also create new links(though structure is shared by transclusion). So I guess that tellsyou what poomfilade to use to turn the link spans back into v-spansfor a particular document.
Are you saying that there's no way to find the POOMs pointing to agiven i-span?
PS. Can I post this thread to the udanax list?

  -- David


--
-------------------------------
David Durand
david@xxxxxxxxxxxxxxxxxxx
Chief Technical Officer
Dynamic Diagrams
http://www.dynamicDiagrams.com/

Prev by Date: Questions
Next by Date: Last item in Green thread
Previous by thread: Questions
Next by thread: Last item in Green thread
Index(es):