Re: Tracking down a garbage collection problem



Wincent Colaiuta wrote:
I'm trying to work out ways to reduce the memory use of one of my
projects, but I don't know what methods are available to the Ruby
programmer for profiling memory use and tracking down garbage
collection problems.

The short version:

I have a project where processing a file can consume dozens of
megabytes of memory; if I process many files in a single run then
total memory usage can reach hundreds of megabytes or more than a gig.
I would expect garbage collection to kick in along the way but it
doesn't seem to be happening, memory usage grows and grows, and I
don't know where to start zeroing in on the problem.

The long version:

I've written an object-oriented templating system[1] that incorporates
a memoizing packrat parser. As each file is parsed the parser
"memoizes" the partial results for speed. In a lengthy file the size
of the memoizing cache can grow quite large (dozens of megabytes). But
I would expect the entire contents of the memoization cache to get
garbage collected when I move on to the next file; the cache itself
has definitely fallen out of scope by that time. But garbage
collection doesn't seem to be happening, as memory use grows linearly as
I batch process input files.

As this is a largish, complicated project I don't even know where to
start investigating this. So really, I am looking for general
information on techniques for measuring and exploring memory use and
garbage collection in Ruby.

Thanks in advance for the advice!
Wincent

[1] http://walrus.wincent.com/

Well ... where to begin?? :)

1. First of all, get the notion that "premature optimization is the root
of all evil" out of your head. The only sense in which that maxim is
valid is when the word "premature" is strongly emphasized. Part of the
practice of software engineering, and what separates software
engineering from "mere coding", is knowing what the algorithms of choice
are for the problem you are trying to solve -- and their resource
requirements -- and using them. Dijkstra may have said the premature
optimization thing, but it's obviously been taken out of the context of
his *massive* output of practical computer science and software
engineering teachings. Read *everything* he wrote!

2. In general, to reduce memory usage, you must do one or both of two
things: recompute things rather than storing them in memory, or write
things explicitly out to "backing store" and read them back in.
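
To make point 2 concrete, here is a minimal Ruby sketch of both options. Everything in it is hypothetical and for illustration only: expensive_result stands in for whatever your parser memoizes, and spill/reload are made-up helper names. It assumes the data can be serialized with Marshal, which is not true of every object (Procs and IO objects, for example).

```ruby
require 'tmpdir'

# Hypothetical expensive computation standing in for a memoized parse result.
def expensive_result(n)
  (1..n).map { |i| i * i }
end

# Option 1: recompute on demand, so nothing is held in memory between uses.
def sum_recomputed(n)
  expensive_result(n).sum
end

# Option 2: write the object out to backing store (a file here)...
def spill(obj, path)
  File.binwrite(path, Marshal.dump(obj))
  path
end

# ...and read it back in only when it is needed again.
def reload(path)
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = spill(expensive_result(1_000), File.join(dir, 'result.bin'))
  # Between spill and reload, no in-memory reference to the array remains.
  puts reload(path).sum == sum_recomputed(1_000)
end
```

Which option wins depends on whether recomputing costs more than the disk round trip, so it is worth timing both on real input.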

3. Languages without explicit object destructors need to be fixed,
including Ruby. :) However, part of software engineering in the absence
of them is to make sure there are no references to objects you no longer
want, and then explicitly call the garbage collector. I do a lot of
coding in R, which is a dynamic, garbage collected language for
scientific and statistical computing. I've got a 1 GB workstation, and
still I have "normal sized problems" that can overflow memory. A simple
delete of unused objects (R has "rm", which will delete an object from
the workspace) followed by a call to the garbage collector usually gets
me going again.
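
In Ruby, the same "drop the references, then collect" routine from point 3 might look roughly like the sketch below. MemoCache, process_file and live_instances are hypothetical names invented for this example, not anything from Walrus. GC.start and ObjectSpace.each_object are core Ruby, so nothing needs to be required.

```ruby
# Hypothetical stand-in for a per-file memoization cache.
class MemoCache
  def initialize
    @entries = {}
  end

  def store(key, value)
    @entries[key] = value
  end
end

# Count the instances of a class that are currently on the heap.
def live_instances(klass)
  ObjectSpace.each_object(klass).count
end

def process_file(name)
  cache = MemoCache.new
  1_000.times { |i| cache.store([name, i], "partial result #{i}") }
  nil # return nothing, so the cache does not escape to the caller
end

%w[a.tmpl b.tmpl c.tmpl].each do |file|
  process_file(file)
  GC.start # explicitly request a collection between files
  puts "#{file}: #{live_instances(MemoCache)} MemoCache objects still on the heap"
end
```

If the count climbs by one per file, something is still holding a reference to each cache (a constant, a class-level variable, a closure, a long-lived collection). As far as I know the standard interpreter scans the stack conservatively, so an object or two can survive a collection by accident; the thing to watch for is steady growth, not the exact number.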

4. Relational databases are your friend. They are designed and optimized
for dealing with large and complicated datasets, and object-relational
mappings like ActiveRecord and Og (Object Graph) exist in Ruby to make
working with them as simple as possible. How do you define "large"? For
a single-user system like a laptop or workstation, figure you have
something like half of the installed RAM to run your applications. At
least on Linux workstations, things like I/O buffers will take up the
other half. If you're only running this one application, anything bigger
than half of your installed RAM is too big and ought to be redesigned to
use a database.
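
The half-of-RAM rule of thumb in point 4 can be checked with a few lines of Ruby. This is only a sketch: installed_ram_bytes and needs_database? are hypothetical helper names, and the parsing assumes the Linux /proc/meminfo format, where the MemTotal line is reported in kB.

```ruby
# Extract installed RAM in bytes from the text of /proc/meminfo.
# Returns nil if no MemTotal line is found.
def installed_ram_bytes(meminfo_text)
  match = meminfo_text.match(/^MemTotal:\s+(\d+)\s+kB/)
  match && match[1].to_i * 1024
end

# Rule of thumb from the text: more than half of installed RAM is "too big".
def needs_database?(dataset_bytes, ram_bytes)
  dataset_bytes > ram_bytes / 2
end

meminfo = '/proc/meminfo'
if File.readable?(meminfo) # only present on Linux
  ram = installed_ram_bytes(File.read(meminfo))
  puts "Working-set budget: about #{ram / 2} bytes" if ram
end
```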

--
M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P)
http://borasky-research.net/

If God had meant for carrots to be eaten cooked, He would have given rabbits fire.

