Re: Memory-mapped persistent hash?



From: "Tim Pease" <tim.pease@xxxxxxxxx>

You could do a mmap solution. Modify Hash such that []= does a
Marshal.dump of your object, stores the object into the mmap, and then
that memory location is stored in the Hash instead of the object.

[] must also be modified to take the memory location, Marshal.load the
object from the mmap, and then return the object.
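A rough sketch of that idea, for illustration only. Ruby's standard library has no mmap binding, so an ordinary read/write file stands in for the mapped region here; the class name FileBackedHash and its layout are made up for this example. It wraps a Hash rather than modifying Hash itself, and the in-memory index is not persisted, so this only shows the []= / [] round trip through Marshal.

```ruby
require "tempfile"

# Hypothetical sketch: the Hash in RAM holds only [offset, length] pairs;
# the marshaled objects live in a backing store (a plain file here,
# standing in for an mmap'd region).
class FileBackedHash
  def initialize(io)
    @io = io       # any seekable read/write IO in binary mode
    @index = {}    # key => [offset, length] within the backing store
  end

  # Marshal the value, append it to the store, remember where it went.
  def []=(key, value)
    bytes = Marshal.dump(value)
    @io.seek(0, IO::SEEK_END)
    offset = @io.pos
    @io.write(bytes)
    @index[key] = [offset, bytes.bytesize]
    value
  end

  # Look up the location, read the bytes back, and unmarshal them.
  def [](key)
    offset, length = @index[key]
    return nil unless offset
    @io.seek(offset)
    Marshal.load(@io.read(length))
  end
end

store = Tempfile.new("fbh")
store.binmode
h = FileBackedHash.new(store)
h[:a] = { "name" => "x", "tags" => [1, 2, 3] }
h[:a] = "replaced"   # the old bytes remain in the file as garbage
```

Note that overwriting a key simply appends and leaks the old region, which is exactly the memory-management problem described next.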

The hard part of this is doing the memory management of the mmap --
when an object is deleted from the hash, removing it from the mmap;
consolidating unused mmap regions; etc. All the standard MMU stuff
you normally don't have to deal with in Ruby.

It would be much easier to implement if all the objects being stored
in the Hash were guaranteed to be the same size. Then you would just
need a free/allocated array to keep track of what can go where in the
mmap.
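The fixed-size case might look something like the following. Again this is only a sketch under assumptions: a binary String stands in for the mmap region, the names SlotStore, store, fetch and delete are invented for the example, and each slot carries a 4-byte length prefix so that objects smaller than the slot can be read back cleanly.

```ruby
# Hypothetical sketch: fixed-size slots plus a free/allocated array.
class SlotStore
  HEADER = 4  # bytes used for the length prefix in each slot

  def initialize(slot_size, slot_count)
    @slot_size = slot_size
    @buffer = ("\0" * (slot_size * slot_count)).b  # stand-in for the mmap
    @used = Array.new(slot_count, false)           # true = allocated
  end

  # Marshal obj into the first free slot and return the slot number.
  def store(obj)
    bytes = Marshal.dump(obj)
    if bytes.bytesize > @slot_size - HEADER
      raise ArgumentError, "object too large for slot"
    end
    slot = @used.index(false)
    raise "store is full" if slot.nil?
    @used[slot] = true
    record = [bytes.bytesize].pack("N") + bytes
    @buffer[slot * @slot_size, record.bytesize] = record
    slot
  end

  # Read the length prefix, then unmarshal that many bytes.
  def fetch(slot)
    raise ArgumentError, "slot #{slot} is free" unless @used[slot]
    base = slot * @slot_size
    length = @buffer[base, HEADER].unpack1("N")
    Marshal.load(@buffer[base + HEADER, length])
  end

  # Freeing is just flipping the flag; no consolidation is ever needed.
  def delete(slot)
    @used[slot] = false
  end
end

slots = SlotStore.new(64, 4)
a = slots.store([1, 2, 3])
b = slots.store("hello")
slots.delete(a)
c = slots.store({ x: 1 })   # reuses the slot freed by deleting a
```

Because every slot is the same size, a freed slot can always be reused as-is, so there is no fragmentation to clean up.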

Agreed. But ironically, what gets me is that with a modern
VMM, this is exactly what is already going on with Ruby's hash
in memory, except that the backing store is the system swap
file, and so is not persistent.

In principle, I just want to change the backing store to a
memory mapped file, instead. :-)

I've wondered what would happen if one took a nice malloc
implementation, made it operate inside a heap that was
memory-mapped onto a file, and then took something like the
STL hash_map (or ruby's hash) and wired it to the malloc
allocating from the memory-mapped file.

Intuitively, it seems it would have no choice but to perform
fantastically, as long as the whole file could be mapped
into memory.

However, once the file size exceeded available memory, I
can imagine that it might degrade to sub-optimal performance.

Along these lines, I've also wondered if one could get a
ruby application to persist similarly (in principle!)
by wiring ruby's memory allocation functions to a malloc
that was allocating from a memory-mapped file. Of course
the tricky part would be dealing with all the objects
containing system resources that can't be persisted, such
as file handles, etc. Probably a nightmare in
practical terms, unless the language and its libraries
were designed that way from the start...

Ah, well. In talking about this, it seems there are really
two scenarios for memory-mapped persistent hashes: one when
all the pages can fit in RAM, and the other when the file size
would greatly exceed available RAM (and even worse, when
the file size exceeds the maximum contiguous block that
can even be mapped into the process address space at all).

Hmm...


Regards,

Bill


