Diagonal vectors for BSP trees



I've been dealing with high-dimensional spaces and especially nearest
neighbor searching using kd-trees for about 2 months now. Kd-trees are
very compact in storage since the direction of the splitting plane can
be described by a single integer number. On the other hand, describing
an arbitrary direction for a general BSP tree takes O(n) amount of
floating point numbers.

I was reading the paper "Refinements to nearest-neighbor searching in k
-dimensional trees" by Robert Sproull:

http://www.springerlink.com/content/j6h2x56111290g18/

He mentions that using planes of arbitrary direction speeds up nearest
neighbors searching. However, it should be noted that back then (1991)
the incremental computation of _exact_ distances to kd-tree nodes while
searching nearest neighbors was not known (or at least not well known)
as introduced by "Algorithms for Fast Vector Quantization", by Sunil
Arya and David Mount:

http://www.cs.ust.hk/faculty/arya/pub/DCC.pdf

This improvement can make the advantage of arbitrary directed splitting
planes less important. Also, it can be hard to guarantee fat nodes with
arbitrary splitting planes, which is essential for nearest neighbor
searching performance:

"It's okay to be skinny, if your friends are fat", Maneewongvatana and
David Mount:

http://www.cs.umd.edu/~mount/Papers/cgc99-smpack.pdf

"An Optimal Algorithm for Approximate Nearest in Fixed Dimensions",
Sunil Arya et al.:

http://www.cs.umd.edu/~mount/Papers/dist.pdf

Given arbitrary directions for splitting planes, it is hard to choose a
convenient direction to split the point set. Finding the direction of
maximum variance can be done by eigendecomposing the covariance matrix,
in time (I think) O(n^2 m + n^3) for n dimensions and m points. This is
restrictive for higher dimensions.

To get the best of both worlds, I came up with the idea of using vectors
of the form {-1, 0, 1}^n instead. I call them diagonal vectors. Such
vectors can be packed into compact storage, taking only 2bits per
dimension and dot products can be carried out without multiplication. I
proved that such vectors approximate any vector direction in any
dimension with a maximum angular error of 22.5 degrees. Given an
arbitrary vector in R^n, I have an algorithm to produce the closest
diagonal vector (in angle) in O(n lg n) time and O(n) storage.

What would really make diagonal vectors interesting was if we could
construct an algorithm to find the direction of maximum variance when
restricted to the set of diagonal vectors, and do this much faster than
with arbitrary directions. And here's where I need help. I have a greedy
algorithm candidate that takes O(nm + n log n) time and O(n + m)
storage. However, I don't know if it produces the correct answer or
not!:) It is the traditional problem with greedy algorithms: I should
show that the optimum local choices lead to the global maximum.

You can help by:
1) Proving that my algorithm works or not.
2) Giving a fast algorithm to compute the diagonal vector of greatest
variance.
3) Proving that there does not exist an algorithm to compute the answer
faster than with the arbirary directions.
4) Referring to publications which have given thought to this problem
before and especially using the diagonal vectors in BSP trees (I haven't
found any) .

Here's the paper that contains my analysis and the algorithms:

http://kaba.hilvi.org/News/diagonal.pdf

My greedy algorithm candidate is given in section 5. Note it is still a
draft and probably will contain errors (some positions which are marked
with question marks mean I am taking an educated guess).

Thanks for reading this far:)

--
http://kaba.hilvi.org
.



Relevant Pages

  • Re: [newbie] Best way to search for binary data
    ... [string searching ... ... > searching random bytes. ... algorithms are usually preferred to the Knuth-Morris-Pratt algorithm. ... void init_search(const char *string) ...
    (microsoft.public.vc.language)
  • Re: "Algorithms" in Molecular Biology?
    ... provided now includes its own refutation, continuously evading any test to ... for refuting my point is to provide a definition of Algorithm which all ... Is searching not an answer in and of itself. ...
    (sci.bio.evolution)
  • Re: Algorythms
    ... > the algorithm you're using is just as good or better than something ... searching for something time-tested and already used, ... store data: ... Also, which is the best method of comparing strings, I know IF-s are slower ...
    (borland.public.delphi.thirdpartytools.general)
  • Re: Efficient way to search for overlapping circles of varying radii?
    ... radius of p1 is less than the radius of p2, ... LARGER circle, as long as that larger circle has not already been removed. ... For all we can know your existing algorithm, ... sorting and searching in datasets of more than 1 dimension. ...
    (comp.graphics.algorithms)
  • Re: Menu Tooltips.
    ... After much searching, the nearest I can get to /tooltips_enabled ... user level settings for all the Gnome stuff. ...
    (Ubuntu)