[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]

Re: Presentation on Xanalogical Data Structures

On Wed, 2004-07-07 at 00:02, Jeff Rush wrote:
> On Tue, 2004-07-06 at 11:32, Steve Witham wrote:
> > The stuff you said about the different large components (block server,
> > Mozilla pluggin, etc.), the different
> > small components (tumblers, blocking, enfilades), how they fit
> > together, and how (briefly) they differ from the Green implementation
> > and Ted's Silverstands vision, are good stuff and help to motivate
> > the lower-level stuff.  It's good to have an idea of the uses of
> > something and also an alternative design, when you're looking at
> > a design.
> > 
> > When you say disk blocking... are you talking about the B-tree
> > stuff?  Or, if you let someone else's library handle blocking,
> > does that mean you are just using binary trees, and letting it
> > group some of the binary nodes into blocks?
> Yes, in Green they implemented their own object persistence mechanism,
> with binary nodes (one datum per node with sibling links) arranged into
> an N-way tree (multiple nodes per level).  They then worked hard to get
> those related nodes to pack together into fixed-size disk blocks.  And
> then they designed a caching system below that to hold the disk blocks
> as they came in from disk.
> I've usually seen it as an N-way tree the way a B-tree works; each node
> holds some number of keys/datums, and the nodes -are- the disk blocks as
> well.  The extra layer of tracking nodes (crums) and the blocks (loafs)
> they belong in seems awkward, although it does provide an extra layer of
> virtuality if you haven't finalized your tree structures.
> A difference then is when the WID operator is invoked on a layer of the
> tree; in their approach you have to navigate over a linked mesh of nodes
> to sum their WID attributes; in a B-tree you only have to iterate over
> an array of WID attributes.  The traversal of linked nodes can have
> performance issues in a virtual memory environment that arrays don't
> have.
Well yes if you let your virtual memory get bigger than your real memory
and then follow lots of pointers.  We don't do either. Note that the
numbers of nodes tend to stay really small like 5 or 7.

> In Python I'm storing N tuples (key -> datum) per tuple (enfilade node),
> and making them all immutable.  The nodes are variable length on disk. 
> It also means as you change the trees you generate more nodes and
> consume more disk space.  But disk space is cheap, immutability solves a

Since modern machines can generate intermediate data structures at a
Gigabyte/second, you really don't want to save all that to disk.  That
was the major purpose of the cashing and garbage collection in Green.

> lot of locking issues, and the code is easier to understand.  Fixed
> length blocks are preferred when you're rewriting in-place; variable
> length blocks are good for retaining old versions.
of course you can always fix what needs fixing later, more readable code
lets you do that, but then the only trouble I have reading green's code,
is emotional.  Modifying green is another matter. 

> > I guess what I'm saying is that to me, the choice of binary
> > trees vs. 2-3 trees vs. trees with wide branching, and whether
> > & how the tree is balanced, is one set of issues,
> > and putting the nodes onto disk blocks is another,
> > and I'm not sure whether you're grouping both under "disk blocking."
Every thing is deeply intertwigled.  Try it your way, if it needs fixing
lets argue about that when we have measurements.  I can argue for the
way we did it, but the trade offs may be different now that an 80
megabyte disk doesn't cost $10,000 anymore.  But perhaps they don't
change as much as you might think

> Ah, so you would agree with the Green approach of separating the choice
> of tree structure from disk storage?
> > Anyway, you should make a bunch of slides, then do an audio recording
> > of yourself practicing presenting the slides (if not a recording
> > of your performance at the conference).  Then you can post pdf's of
> > the slides and nice compact mp3 of the audio... right?
> That's an idea.  I've never done such but perhaps I will. ;-)
> -Jeff