More Udanax code questions

To: udanax@xxxxxxxxxx
Subject: More Udanax code questions
From: "David G. Durand" <david@xxxxxxxxxxxxxxxxxxx>
Date: Fri, 25 Feb 2000 12:36:12 -0500
In-reply-to: <v04210103b4d9c5ad526d@[216.207.71.175]>
References: <v04210103b4d9c5ad526d@[216.207.71.175]>

It's perhaps silly to reply to your own posting, but I realized that there are a bunch of questions that I could have asked that I didn't, and Roger's mail is a sign that we might be able to get some answers. So I'll add new questions to my old questions. Anyone else who's been going through the code is welcome to answer as well, of course.




At 12:16 PM -0500 2/23/00, David G. Durand wrote:

Here's what I've learned and some of the questions that I still have.

My own examination shows that the fundamental "model-T" enfilade appears to be a balanced tree of totally ordered content addresses. The simplest approach to managing such a data structure for multiple versions, with data sharing, is to use a technique that I've always called "copy-to-root". The udanax code instead inserted some sort of node representing a permutation or alteration in place at every ancestor where the child relations change for a given version. I believe that these "node-level diffs" are called "orgls", but can't quite tell.
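For readers who have not met "copy-to-root" (often called path copying), here is a minimal sketch. It uses a plain, unbalanced binary search tree rather than the n-ary balanced tree in the udanax code, and the names `Node`, `insert` and `lookup` are made up for illustration. Each edit copies only the nodes on the path from the changed leaf up to the root; everything else is shared between versions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node (hypothetical; rebalancing is omitted)."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return the root of a new version; the old root remains valid."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        # Copy this node, share the untouched right subtree.
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        # Copy this node, share the untouched left subtree.
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: int) -> Optional[str]:
    """Ordinary search; works on the root of any version."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


v1 = None
for k, text in [(50, "a"), (30, "b"), (70, "c")]:
    v1 = insert(v1, k, text)
v2 = insert(v1, 20, "d")  # new version; v1 is unchanged
```

Here `v2.right is v1.right` holds, so the untouched subtree is shared, while the nodes for keys 50 and 30 are copied. The cost per edit is one copied node per level of the tree.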

It's also frequently necessary to split nodes where a rearrangement occurs, as a precondition to creating an orgl that rearranged children.

Questions: The tree seems to be n-ary, and height-balanced (the path from root to leaves is constant length). What are the arity conditions on nodes, and what are the performance implications of the implementation of orgls at each node? How is a particular version used as a key to select the correct orgl at a node?

The granfilade seems to store the I-stream in a more-or-less vanilla Model-T enfilade, with mappings to actual data items. Are links stored in here, or only data?

Link management is based on a "2-D" enfilade which seems to manage starting points and widths of linked areas, with linked lists of pointers to the original data in the I-stream.


The 2-D enfilade is the "spanfilade", right?

It seems like the start and end spans of link-regions are stored here in terms of v-stream addresses. Do these map also to I-stream addresses or is that a separate step?

One possible design would be to store each linkend region once in the spanfilade as a v-stream-vstream mapping, and then use the separate, per-document v-stream maps to find the actual i-stream addresses of the linked data.

Alternatively, you could store I-stream addresses directly here, and link them directly to the corresponding link-descriptions, with the associated v-stream addresses. In this case, a link end would be expanded to a list of I-stream ranges, which are added to the tree, and then stored with references to the link itself. Thus the spanfilade would consist of a node for each of the smallest spans that have identical links overlapping them, and would point to lists of those links. There should be some structure-sharing tricks here to keep from duplicating the list elements when a span must be split. This still seems like it would have to be linear in the number of I-stream regions covered by a link endpoint, though.
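To make the second design concrete, here is a rough sketch of the "smallest spans with identical links" idea. It uses a flat sorted list instead of a tree, integer addresses, and invented names (`SpanIndex`, `add_link_end`, `links_at`); it is not taken from the udanax code. When a span is split, both halves keep pointing at the same immutable set of links, which is one simple form of the structure sharing mentioned above.

```python
from bisect import bisect_right


class SpanIndex:
    """Hypothetical index from I-stream addresses to sets of link ids."""

    def __init__(self):
        self.starts = [0]           # sorted span start addresses
        self.links = [frozenset()]  # links[i] covers [starts[i], starts[i+1])

    def _split(self, addr):
        """Ensure a span starts exactly at addr; return its index."""
        i = bisect_right(self.starts, addr) - 1
        if self.starts[i] != addr:
            self.starts.insert(i + 1, addr)
            # Both halves share the same frozenset object; nothing is copied.
            self.links.insert(i + 1, self.links[i])
            i += 1
        return i

    def add_link_end(self, link_id, start, width):
        """Record that link_id covers [start, start + width), width > 0."""
        lo = self._split(start)
        hi = self._split(start + width)
        for i in range(lo, hi):
            self.links[i] = self.links[i] | {link_id}

    def links_at(self, addr):
        """Return the set of links overlapping a single address."""
        return self.links[bisect_right(self.starts, addr) - 1]


idx = SpanIndex()
idx.add_link_end("A", 10, 10)  # covers 10..19
idx.add_link_end("B", 15, 10)  # covers 15..24
```

After these two calls the index holds the spans starting at 0, 10, 15, 20 and 25, and `idx.links_at(17)` contains both "A" and "B". Adding a link end still touches every span it covers, which is the linear cost noted above.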

In the 2-D enfilade, how does the second dimension affect the indexing? Is it purely a secondary key (as the code appears) or is there some trickery that makes this more flexible? How and why are orgls used in the 2-D enfilade? Is this merely a map from v-stream addresses to i-stream addresses? Does it also include special items mapping from

In general, how does one find the set of V-stream addresses corresponding to an I-stream address?
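To pin down what is being asked, here is a naive sketch of the inverse lookup for one document. It assumes a document's V-stream is a list of spans, each mapping a run of V addresses onto a run of I addresses; the `Span` type and `i_to_v` function are invented for illustration and say nothing about how the udanax code actually does this.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical V-to-I span: V addresses [v_start, v_start + width)
    show I addresses [i_start, i_start + width)."""
    v_start: int
    i_start: int
    width: int


def i_to_v(spans: List[Span], i_addr: int) -> List[int]:
    """Return every V address in this document that shows i_addr."""
    return [
        s.v_start + (i_addr - s.i_start)
        for s in spans
        if s.i_start <= i_addr < s.i_start + s.width
    ]


# The same I-stream run (100..104) appears twice in this document.
doc = [Span(0, 100, 5), Span(5, 200, 3), Span(8, 100, 5)]
hits = i_to_v(doc, 102)
```

Here `hits` is `[2, 10]`. The scan is linear in the number of spans and covers only one document; the real question is what index avoids that scan across all documents.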

The loaf/crum management is a basic explicitly managed virtual memory, with in-memory use-counts managed in the code.

I don't really have any questions about this, other than how crums are assigned to loaves. This is an area where I'd be tempted to maybe learn a few lessons, but mostly skip over, on the assumption that current, if sub-optimal, memory management facilities might suffice. Making this more transparent via an object model with explicit management of references would not be a bad idea.

The Smalltalk code is more scrutable, but also more ambitious. It seems to contain a certain amount of dead code, as well as many classes for process management and the like that are not so closely related to the fundamental algorithms (at least the pure data management ones).

As Roger says, there's much good functionality here, but the implementation may be too clumsy for contemporary use... On the other hand, the pure-Smalltalk version might run well enough on current hardware to be useful, at least on a small scale.

In the generalized code, the sequence is replaced by an arbitrary address space, which can be any algebraic structure (set of items with significant relations between them). Contents are associated directly (by identity) with elements of the address space by a mapping. Operations are represented by transformations (or complete replacement, or incremental replacement?) of the address->data mappings. This means that n-d (and indeed arbitrary) topologies can be handled uniformly within a single structure. I assume that this structure is the fabled Ent.

A key question in this is how to efficiently represent the incremental modifications of an arbitrary map so that the runtime of applying the composed map is not linear in the number of modifications.


How are partial maps represented efficiently?

I don't quite understand how WIDativity and DSPativity are used to enable the efficient creation of compositional maps for this. I get lost in the code, but the Ent seems to be a hierarchical representation of a series of maps. The top level represents large scale rearrangements, and yields a precise state when composed with the smaller scale maps whose results form its input. Such a composition is a way to retrieve a single state of the base address space. However, there must be an efficient way to select from alternative maps at each level, so that a particular state can be selected.
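One possible reading, offered as an assumption rather than a description of the code: a "dsp" is a displacement stored relative to the parent and composed on the way down, while a "wid" is a width summarized from the children on the way up. The sketch below shows only that generic technique, with invented names (`Crum` here is just a label for a tree node), a 1-D integer address space, and non-negative child displacements.

```python
class Crum:
    """Hypothetical node: dsp is relative to the parent, wid covers the subtree."""

    def __init__(self, dsp, children=None, data=None):
        self.dsp = dsp
        self.children = children or []
        self.data = data  # leaf payload, one item per address
        if self.children:
            # Width is summarized upward from the children.
            self.wid = max(c.dsp + c.wid for c in self.children)
        else:
            self.wid = len(data)


def retrieve(crum, addr, base=0):
    """Compose displacements downward and return the item at addr, or None."""
    origin = base + crum.dsp
    if not (origin <= addr < origin + crum.wid):
        return None  # the width lets us prune this whole subtree
    if not crum.children:
        return crum.data[addr - origin]
    for child in crum.children:
        hit = retrieve(child, addr, origin)
        if hit is not None:
            return hit
    return None


left = Crum(0, data="abc")
right = Crum(3, data="de")
root = Crum(0, children=[left, right])
before = retrieve(root, 4)

root.dsp = 100  # move the whole subtree by changing one number
after = retrieve(root, 104)
```

In this sketch a large-scale rearrangement is a change to a single displacement near the root, and a lookup composes one displacement per level, so the cost follows the height of the tree rather than the number of edits. Whether this is what the Ent does is exactly the open question.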

So how do you select an appropriate map at a given level? One imagines a hybrid tree (much like a k-D tree) where alternate levels select on:

a) an address key (as input to a partial address->data map). This might be one way to break up a mapping into "geographical" regions of the address space.

b) A version or configuration key (to select the correct address->data map). Each version would create new choice points among variations of the map, and these would probably also be represented by a balanced structure of some sort.

This might be a big tree, with hashtables at each node, with elements of the hashtable being alternative maps for that node, keyed off of the version identifier of a given state (a Bert, right?).

The version structure itself (a Dagwood?) is a version tree containing some sort of keys to be used in partial map selection. Is this right?
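A small sketch of the guess above, to make the question concrete. It assumes each node keeps a hashtable of alternatives keyed by version id, and that a version without its own entry inherits from its parent in the version tree. The names `VersionedNode` and `parent_of` are invented, and nothing here is taken from the Smalltalk code.

```python
class VersionedNode:
    """Hypothetical node holding alternative values keyed by version id."""

    def __init__(self):
        self.variants = {}  # version id -> value (payload or child map)

    def set(self, version, value):
        self.variants[version] = value

    def get(self, version, parent_of):
        """Walk up the version tree until some ancestor has an entry."""
        while version is not None:
            if version in self.variants:
                return self.variants[version]
            version = parent_of.get(version)
        return None


# Version tree: v2 and v3 are both derived from v1.
parent_of = {"v2": "v1", "v3": "v1"}

node = VersionedNode()
node.set("v1", "old map")
node.set("v3", "new map")  # only v3 changed this node

seen_by_v2 = node.get("v2", parent_of)
seen_by_v3 = node.get("v3", parent_of)
```

Here `seen_by_v2` is "old map" and `seen_by_v3` is "new map". The lookup cost grows with the depth of the version tree, so a real design would presumably need something better than a walk through the ancestors.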


   -- David
--
_________________________________________
David Durand              dgd@xxxxxxxxx  \  david@xxxxxxxxxxxxxxxxxxx
http://cs-people.bu.edu//dgd/             \  Director of Development
    Graduate Student no more!              \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
                                             \__________________________
