Re: APL and (very) large arrays



Randy Macdonald wrote:
"Stefano \"WildHeart\" Lanzavecchia" <wildstf@xxxxxxxxxxx> wrote in
news:dq2gee$aqps$1@xxxxxxxxxxxxxxxxx:



2) The two biggest elements of a vector (for instance to build the
  associated Huffman code. Surely there must be a better way than
  sorting the whole vector and then making a 2 take ?

As far as I know, there's no better way. You can avoid the finally
indexing by doing the 2 take of the index vector, but I cannot really
see how to improve the algorithm given the overhead of the
interpreter. Of course some cleverer guy will prove me wrong
straight-away. I mean: you could always max1<-ceiling/vec
max2<-ceiling/vec without max1
but you would still have to traverse the original vector three times
and make one almost full one copy of it so, unless you timed it
against your favourite implementation, you wouldn't know if that would
be faster than the gradedown for any size of vec.


Channeling Bill Nye*, I just did some timing tests, comparing the (a) 2 take sort vs. (b) biggest,biggest without biggest. (b) won. Finding the biggest, even twice, seems faster than doing a complete sort.

Obviously. Sort is at best O(n log2(n)) depending on the sorting algorithm used, whereas biggest is O(n). Regardless of the constants of proportionality, asymptotically b must win out. It is also easy to do in one pass in a scalar language with at most two comparisons per element, or more efficiently with a total of no more than n+log2(n)-1 comparisons. Whether or not these methods would pay off in APL would require an experiment.

As to the OP's question about primes, there are advanced efforts to compute pi(n) for very large values of n (up to 10*23), but the only elementary method is to find all primes <= n and count them. The sieve of Eratosthenes is the simplest method for finding all primes <= n. With careful implementation, values of n up to 1000000000 could probably be done in a 250MB workspace. By representing only the odd values the sieve would require 62.5MB. The algorithm would need to construct a 500000000 bit vector (another 62.5MB) and and it with the sieve on the order of 3000 times. This would take quite a while. At the end, counting the one bits remaining in the sieve (+/) and adding 1 (for 2, which is not represented) would complete the task. This assumes that the interpreter can do +/ on a bit vector without converting the whole thing to an integer vector.

There is also a considerably more complicated linear sieve algorithm but it would require 4GB of storage for its main array and is built upon accessing and updating individual array elements in a set of nested loops, which is not APL's forte.
.




Relevant Pages

  • Re: Prime Number Question
    ... were discovered a simply algorithm for generating all prime numbers ... directly, in batches, with each batch being used ... the current best factoring method, the number field sieve, only becomes ... characteristic of primes ...
    (comp.unix.bsd.openbsd.misc)
  • Re: simple algorithm for finding primes
    ... Well it includes the Sieve of Eratosthenes within the source. ... Your algorithm is a fine, simplistic algorithm, however it really only works ... because you determine the entire sequence of primes one after another. ... for longer and longer sequence of primes it will get slower and slower and ...
    (comp.lang.c)
  • Re: Prime Number Question
    ... were discovered a simply algorithm for generating all prime numbers ... directly, in batches, with each batch being used ... the current best factoring method, the number field sieve, only becomes ... characteristic of primes ...
    (comp.unix.bsd.openbsd.misc)
  • Re: Prime Number Question
    ... were discovered a simply algorithm for generating all prime numbers ... directly, in batches, with each batch being used ... the current best factoring method, the number field sieve, only becomes ... characteristic of primes ...
    (comp.unix.bsd.openbsd.misc)
  • Re: Prime Number Question
    ... were discovered a simply algorithm for generating all prime numbers ... directly, in batches, with each batch being used ... the current best factoring method, the number field sieve, only becomes ... However, given efficient prime factorisation, yes, quite a few crypto ...
    (comp.unix.bsd.openbsd.misc)