[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]

Re: NewsSpeak



> I believe the extensible hashing algorithms in the literature (the
> ones Mr. Hill is researching) are constant time to access & constant
> time to store a new record (or overwrite an existing one).  And this
> constant time averages much closer to one disk seek than two.

Perfect hash is "choose a hash function that has no collisions at all".

My impression is that the extensible hash schemes are very close, but
not quite a cigar.  They seem to be a very good job of getting most of
the benefits of perfect hash by:
 - Using a very-good-but-not-perfect hash function, allowing collisions
   now and then (constant probability?), and.
 - Doing the massive rearrange incrementally, smearing the cost out over
   the insertions.  (Getting it to average constant?)
Perhaps they're O(1) given non-pathological data, but I suspect they're
just a tad higher than that.  More collisions from graniness?  More data
to move around when you extend?  I'll have to look more closely.

> Btw, I also liked the NewsSpeak alot.  I'd do Esther Dyson very
> differently if I had it to do over again.

Thanks.  Also thanks to everyone else who sent me a nice comment.
(BTW, it was Esther's reaction that prompted the composition.)

	cheers
	michael