Re: LONG: Variable internals and copy-on-write




WARNING: Very long post

Enclosed is a MEX source file that demonstrates some of the internal
memory structure of Matlab variables. This file blatantly ignores the
Matlab API and pokes around data structures directly! This might
destroy your data, explode your computer, etc. But I don't think it
will :) Seriously, though, save data in open programs before trying
this.

The rest of this post is a detailed description of the storage scheme
for Matlab variables, with an emphasis on the method Matlab uses to
implement "copy-on-write". Familiarity with C MEX is assumed.

The information presented here is based only on guesswork. It is not
endorsed or verified by The MathWorks, and I make no claims about its
accuracy. I and my employer disclaim all liability and responsibility
for anything that occurs due to use of this information. Use with
caution!

The following is a more detailed version of the mxArray structure than
the one found in matrix.h. In that file, some of these fields are
grouped together and named "reserved" or some such.

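struct mxArray_tag {
    char name[mxMAXNAM];       /* variable name */
    int class;                 /* class code (not the same as mxClassID) */
    int vartype;               /* normal, persistent, global, ... */
    mxArray *crosslink;        /* another header sharing this data, or 0 */
    int number_of_dims;
    int nelements_allocated;   /* nonzero for sparse arrays */
    int dataflags;             /* logical/scalar flags, user bits */
    int rowdim;                /* for N-D arrays: pointer to the dims */
    int coldim;
    union {
        struct {
            void *pdata;       /* real data; mxArray* list for cells */
            void *pimag_data;  /* imaginary data; field names for structs */
            void *irptr;       /* sparse row indices */
            void *jcptr;       /* sparse column pointers */
            int reserved;      /* unknown */
            int nfields;       /* number of struct fields */
        } number_array;
    } data;
};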

In general this speaks for itself; read through the source of the
attached MEX file for the details of each field. However, the following
special cases need some extra explanation:

Cell arrays:

The header of a cell array looks identical to the header for a numeric
real array (except for the class, of course!). The data for a cell
array, however, consists of an array of pointers to more mxArray
structures, which are the contents of the cell array. This is the
intuitive arrangement if you consider that cell arrays are containers
for all other Matlab array types, including more cell arrays. (See
figures below.)
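To make the cell layout concrete, here is a small standalone C sketch.
It does not touch real mxArrays: toy_array, toy_get_cell and
toy_count_numbers are hypothetical names for a stripped-down header
that keeps only a class tag, the dimensions and pdata. For a cell,
pdata is treated as an array of header pointers, so descending into
nested cells is plain recursion. NULL slots are treated as empty,
matching the "(nil)" case that headerdump prints. (In a real MEX file,
mxGetCell is the supported way to fetch an element.)

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an mxArray header.
   Only the fields needed to illustrate cell storage are kept. */
typedef enum { TOY_DOUBLE, TOY_CELL } toy_class;

typedef struct toy_array {
    toy_class cls;
    int rowdim, coldim;
    void *pdata;   /* double* for numeric data, struct toy_array** for cells */
} toy_array;

/* Fetch element i (linear index) of a cell: the data block is simply
   an array of pointers to more headers. */
toy_array *toy_get_cell(const toy_array *cell, int i)
{
    return ((toy_array **)cell->pdata)[i];
}

/* Count the numeric elements reachable from a, descending into nested
   cells; this works because a cell element is just another header. */
long toy_count_numbers(const toy_array *a)
{
    long total = 0;
    int i, n;

    if (a == NULL)
        return 0;               /* unassigned cell slots are NULL */
    n = a->rowdim * a->coldim;
    if (a->cls != TOY_CELL)
        return n;
    for (i = 0; i < n; i++)
        total += toy_count_numbers(toy_get_cell(a, i));
    return total;
}
```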

Struct arrays:

Structs are stored exactly like cells, except for the field names. The
number of fields is stored in the field nfields, and the imaginary data
pointer (pimag_data) is a (char *) that points to a list of field
names. The first 32 bytes hold the first field name (null-terminated),
the second 32 bytes hold the second field name, and so on. The values
of the fields are stored like a cell array, but with the fields
interleaved. That is, for a two-element structure with 3 fields, the 3
pointers for the fields of the first element are stored first, followed
by 3 more pointers for the second element.
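The field-name list and the interleaved value pointers can be modeled
with plain index arithmetic. This sketch assumes the 32-byte name slots
described above; toy_field_name, toy_field_index and toy_value_slot are
hypothetical helpers, not part of the MEX API (in a real MEX file,
mxGetFieldNameByNumber and mxGetFieldByNumber are the supported way to
get at this information).

```c
#include <stddef.h>
#include <string.h>

/* Width of one field-name slot, per the description above (assumption). */
#define TOY_FIELD_SLOT 32

/* Return the name of field f from the packed name list, or NULL. */
const char *toy_field_name(const char *names, int nfields, int f)
{
    if (f < 0 || f >= nfields)
        return NULL;
    return names + (size_t)f * TOY_FIELD_SLOT;
}

/* Look up a field by name; returns its index, or -1 if absent. */
int toy_field_index(const char *names, int nfields, const char *name)
{
    int f;
    for (f = 0; f < nfields; f++)
        if (strncmp(names + (size_t)f * TOY_FIELD_SLOT, name,
                    TOY_FIELD_SLOT) == 0)
            return f;
    return -1;
}

/* Slot of (element e, field f) in the interleaved pointer array: all
   nfields pointers of element 0, then all of element 1, and so on. */
size_t toy_value_slot(int e, int f, int nfields)
{
    return (size_t)e * nfields + f;
}
```

For the two-element, three-field example in the text, the second
element's first field lands in slot 3.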

Cross links:

This is the most interesting part of the whole thing! You might have
heard before that Matlab implements a "copy-on-write" algorithm. This
means that if you copy an array, only the header is copied; the data
itself is shared between the two arrays. The first time one of the
arrays is modified, Matlab first copies all the data.

Matlab uses what I call "cross links" to implement this. When you copy
an array (using b=a, for instance), Matlab creates a new header and
sets its data pointer (pdata) to point to the same data as the source
array. It then sets the crosslink field of the new header to the
address of the old header, and vice versa. Then, any time an array is
written to, its crosslink field is checked. If it is non-zero, a copy
of the data array is made, both crosslink fields are zeroed, and the
modification continues.

For more than one copy, a circular list is used: the crosslink fields
of all the headers sharing the data form a ring.


-----------------------------------------
a = [35.7 100.2 1.2e7];

mxArray a
    pdata -------------> 35.7  100.2  1.2e7
    crosslink = 0


-----------------------------------------
b = a;

mxArray a
    pdata -------------> 35.7  100.2  1.2e7
    crosslink = &b  ^
                    |
mxArray b           |
    pdata ----------+
    crosslink = &a

-----------------------------------------
a(1) = 1;

mxArray a
    pdata -------------> 1     100.2  1.2e7
    crosslink = 0


mxArray b
    pdata -------------> 35.7  100.2  1.2e7
    crosslink = 0

-----------------------------------------
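The crosslink bookkeeping can be simulated in a few dozen lines of
standalone C. This is a sketch under stated assumptions, not Matlab's
actual code: toy_array, toy_copy, toy_prepare_write and toy_set are
hypothetical, the ring is singly linked through crosslink, and on a
write the array being written gets the fresh copy of the data (as in
the a(1) = 1 figure) and is unlinked from the ring. Freeing data is not
modeled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy header: only what copy-on-write needs. */
typedef struct toy_array {
    struct toy_array *crosslink;  /* next header sharing pdata, or NULL */
    size_t n;
    double *pdata;
} toy_array;

/* b = a: new header, same data, spliced into a's circular list. */
toy_array *toy_copy(toy_array *a)
{
    toy_array *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->n = a->n;
    b->pdata = a->pdata;            /* data is shared, not copied */
    if (a->crosslink == NULL) {     /* first copy: link the pair */
        a->crosslink = b;
        b->crosslink = a;
    } else {                        /* later copies: grow the ring */
        b->crosslink = a->crosslink;
        a->crosslink = b;
    }
    return b;
}

/* Call before any write to x: if x shares its data, give x a private
   copy and remove it from the ring. Returns 0 on success. */
int toy_prepare_write(toy_array *x)
{
    toy_array *prev;
    double *fresh;

    if (x->crosslink == NULL)
        return 0;                   /* sole owner: write in place */
    fresh = malloc(x->n * sizeof *fresh);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, x->pdata, x->n * sizeof *fresh);
    x->pdata = fresh;

    /* Unlink x: walk the ring to find its predecessor. */
    for (prev = x->crosslink; prev->crosslink != x; prev = prev->crosslink)
        ;
    prev->crosslink = x->crosslink;
    if (prev->crosslink == prev)    /* only one sharer left */
        prev->crosslink = NULL;
    x->crosslink = NULL;
    return 0;
}

/* x(i) = v with copy-on-write (0-based i). */
int toy_set(toy_array *x, size_t i, double v)
{
    if (toy_prepare_write(x) != 0)
        return -1;
    x->pdata[i] = v;
    return 0;
}
```

With only two sharers the ring degenerates into the pair of mutual
links shown in the figures, so the same code covers both cases.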


Cross links and cell arrays:

When copying a cell array, the same thing happens. A new header is
created, but the data array (in this case an array of mxArray*) is
shared. The first time a cell element is modified, this array is
copied. However, don't forget that each element itself points to
another mxArray, so each of those mxArray headers is copied as well.
But ONLY the headers! The data for each of the cell elements is again
crosslinked between the two cell arrays.

-----------------------------------------

mxArray a                mxArray    mxArray    mxArray
    pdata -------------> pdata      pdata      pdata
    crosslink = &b  ^      |          |          |
                    |      v          v          v
mxArray b           |    [1 2]     'hello'   [100x100 double]
    pdata ----------+
    crosslink = &a

-----------------------------------------

a{2}(1) = 'j';

mxArray a                mxArray    mxArray    mxArray
    pdata -------------> pdata      pdata      pdata
    crosslink = 0          |          |          |
                           v          v          v
                         [1 2]     'jello'   [100x100 double]
                           ^                     ^
                           |       'hello'       |
                           |          ^          |
mxArray b                pdata      pdata      pdata
    pdata -------------> mxArray    mxArray    mxArray
    crosslink = 0

(Note: the mxArrays that share pdata, here the ones pointing to [1 2]
and to [100x100 double], have crosslinks between them.)
-----------------------------------------
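The two-level copy for cells can be sketched the same way. Again these
are hypothetical toy types and names (toy_copy, toy_set_in_cell), the
elements are double arrays rather than strings, and only pairwise
crosslinks are modeled (no rings). The assumption, as in the numeric
case, is that the array being written receives the new pointer array
and new element headers; each new header shares its data with, and is
crosslinked to, the old one, so only the element that is actually
written gets its own data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy header; pairwise crosslinks only. */
typedef struct toy_array {
    struct toy_array *crosslink;  /* partner sharing pdata, or NULL */
    int is_cell;                  /* 1: pdata is toy_array**, 0: double* */
    size_t n;                     /* number of elements */
    void *pdata;
} toy_array;

/* b = a: new header sharing a's data, crosslinked to a.
   Assumes a is not already crosslinked (no rings in this sketch). */
toy_array *toy_copy(toy_array *a)
{
    toy_array *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    *b = *a;                      /* same class, size and pdata */
    a->crosslink = b;
    b->crosslink = a;
    return b;
}

/* Break the link between x and its partner. */
static void toy_unlink(toy_array *x)
{
    x->crosslink->crosslink = NULL;
    x->crosslink = NULL;
}

/* Before writing into cell x: give x its own pointer array, filled with
   NEW element headers that share (and are crosslinked to) the old data. */
static int toy_unshare_cell(toy_array *x)
{
    toy_array **old = x->pdata, **fresh;
    size_t i;

    if (x->crosslink == NULL)
        return 0;
    fresh = malloc(x->n * sizeof *fresh);
    if (fresh == NULL)
        return -1;
    for (i = 0; i < x->n; i++) {
        fresh[i] = NULL;
        if (old[i] != NULL && (fresh[i] = toy_copy(old[i])) == NULL)
            return -1;            /* sketch: partial allocations leak */
    }
    x->pdata = fresh;
    toy_unlink(x);
    return 0;
}

/* Before writing into numeric x: copy its data if it is shared. */
static int toy_unshare_numeric(toy_array *x)
{
    double *fresh;

    if (x->crosslink == NULL)
        return 0;
    fresh = malloc(x->n * sizeof *fresh);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, x->pdata, x->n * sizeof *fresh);
    x->pdata = fresh;
    toy_unlink(x);
    return 0;
}

/* x{k}(i) = v with two-level copy-on-write (0-based k and i). */
int toy_set_in_cell(toy_array *x, size_t k, size_t i, double v)
{
    toy_array *elem;

    if (toy_unshare_cell(x) != 0)
        return -1;
    elem = ((toy_array **)x->pdata)[k];
    if (elem == NULL || toy_unshare_numeric(elem) != 0)
        return -1;
    ((double *)elem->pdata)[i] = v;
    return 0;
}
```

After x{k}(i) = v, the untouched elements still share their data, and
only element k pays for a real copy.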

About the program:

Read the comments at the top of the file! Compile it with a working
mex installation using

    mex headerdump.c

Run it with one argument: any Matlab variable of any type, including
temporary variables (as in headerdump([1])). It will display some goop
extracted from the header, including any crosslinks. To see a
crosslink, make a copy of a variable and run it on that.

For cell arrays, headerdump will print a list of all the elements (up
to 30), including crosslinks, if any.


Hope this is useful to someone!

-Peter Boettcher


Sorry, MIME is not working for me. The two files are included below as
inline plain text; cut and paste them into an editor.


----begin headerdump.c----
/* headerdump.c: MEX file to show the internals of an mxArray header.

   WARNING! WARNING! WARNING! This program blatantly abuses the Matlab
   API! It pokes around in private data structures without remorse!
   The structures may change from version to version (or even from
   platform to platform, for all we know).

   That said, this program only looks at memory, so even if you get a
   Matlab segfault, nothing should be corrupted. Please let me know,
   though, if you do segfault, with instructions on how to reproduce it.

   This works on Matlab 5.3, and from looking at the header files in
   R12, the structure is the same. No guarantees, though!

   The author and his employer disclaim all liability for any damage
   caused by this program. For experimental use only!


   Author: Peter Boettcher <boettcher@xxxxxxxxxx>
   Last modified: <Fri Dec 22 14:42:20 2000 by pwb> */

#include "mex.h"
#include "mxinternals.h"

/* Print a one-liner describing an mxArray, including any crosslink */
void briefdump(mxArray *in)
{
    int *tmp;
    int i;

    printf("(%s) ", mxGetClassName(in));
    printf("Address: %p", (void *)in);
    if(in->crosslink)
        printf(" <linked to %p>", (void *)in->crosslink);

    if(in->number_of_dims == 2) {
        printf(" [%i %i]\n", in->rowdim, in->coldim);
    } else {
        /* For N-D arrays, rowdim holds a pointer to the dimensions array
           (only meaningful where int and pointers have the same size) */
        printf(" [");
        tmp = (int *)(in->rowdim);
        for(i=0; i<in->number_of_dims; i++) {
            printf("%i ", tmp[i]);
        }
        printf("]\n");
    }
}

/* Print a detailed description of an mxArray */
void dumpMxArray(mxArray *in)
{
    int *tmp;
    int i;
    int numel;

    printf("Name: %.32s\n", in->name);
    printf("Address: %p", (void *)in);
    /* Crosslink means two or more variables point to the same data. The
       link allows Matlab to copy the array and update the affected
       variables if someone wants to modify an element */
    if(in->crosslink)
        printf(" <Crosslinked to %p: %.32s>\n", (void *)in->crosslink,
               in->crosslink->name);
    else
        printf("\n");

    /* This field has a unique value for each class, but is not the same
       as the class ID */
    printf("Related to classID? %i (true: %i)\n",
           in->class, (int)mxGetClassID(in));

    printf("Variable type: ");
    switch(in->vartype) {
    case MXVARNORMAL:
        printf("Normal\n");
        break;
    case MXVARPERSIST:
        printf("Persistent\n");
        break;
    case MXVARGLOBAL:
        printf("Global\n");
        break;
    case MXVARSUBEL:
        printf("Subelement of a cell or struct\n");
        break;
    case MXVARTEMP:
        printf("Temporary\n");
        break;
    default:
        printf("Unknown variable type: %i (Please email boettcher@xxxxxxxxxx)\n",
               in->vartype);
    }

    printf("Data Flags: Logical %i DblScalar %i (other: %x)\n",
           (in->dataflags&MXLOGICALMASK)!=0,
           (in->dataflags&MXSCALARMASK)!=0,
           (in->dataflags&0x00fffffc));

    /* The top byte of dataflags holds user bits */
    if(in->dataflags & 0xff000000)
        printf("User Data: %x\n", (in->dataflags&0xff000000)>>24);


    printf("\nDimensions (%i): ", in->number_of_dims);
    if(in->number_of_dims == 2) {
        printf("[%i %i]\n", in->rowdim, in->coldim);
        numel = in->rowdim * in->coldim;
    } else { /* multidimensional: rowdim holds a pointer to the dims */
        numel = 1;
        printf("[");
        tmp = (int *)(in->rowdim);
        for(i=0; i<in->number_of_dims; i++) {
            numel *= tmp[i];
            printf("%i ", tmp[i]);
        }
        printf("]\n");
    }

    printf("Real data: %p", in->data.number_array.pdata);
    if(in->dataflags & MXSCALARMASK)
        printf(" [%g]", *((double *)in->data.number_array.pdata));
    printf("\n");

    if(in->data.number_array.pimag_data) { /* complex */
        printf("Imag data: %p", in->data.number_array.pimag_data);
        if(in->dataflags & MXSCALARMASK)
            printf(" [%g]", *((double *)in->data.number_array.pimag_data));
        printf("\n");
    }


    if(in->nelements_allocated) { /* sparse */
        printf("Sparse matrix: Nelements: %i\n", in->nelements_allocated);
        /* irptr holds row indices (ir), jcptr holds column pointers (jc) */
        printf("Row index ptr (ir): %p\nColumn ptr (jc): %p\n",
               in->data.number_array.irptr,
               in->data.number_array.jcptr);
    }

    if(in->data.number_array.reserved) /* what's this? */
        printf("Unknown field: %x (Please email boettcher@xxxxxxxxxx)\n",
               in->data.number_array.reserved);

    if(in->class == 6) { /* struct */
        printf("Number of struct fields: %i\n", in->data.number_array.nfields);
        numel *= in->data.number_array.nfields;
        /* Field names are packed into 32-byte, null-terminated slots */
        for(i=0; i<in->data.number_array.nfields; i++)
            printf(" %.32s\n", (char *)in->data.number_array.pimag_data + i*32);
        printf("\n");
    }
    printf("\n");

    /* Structs are stored like cells, using mxArray pointers in the real
       data spot. They are stored in "field major" order, meaning the
       pointers to the values of the fields of the first struct element
       are stored in order, then the columns of the struct array, then
       the rows. */
    /* Print a one-liner for each element of the cell (class 5) or
       struct (class 6), up to a maximum of 30 */
    if(in->class == 6 || in->class == 5) {
        mxArray **elems = (mxArray **)in->data.number_array.pdata;
        for(i=0; i<((numel < 30) ? numel : 30); i++)
            if(elems[i])
                briefdump(elems[i]);
            else
                printf("(nil)\n");
    }
}

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[])
{
    if(nrhs < 1)
        mexErrMsgTxt("One input required.");

    /* Cast away const: the dump functions only read the header */
    dumpMxArray((mxArray *)prhs[0]);
}

---end headerdump.c---

---begin mxinternals.h---
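#ifndef MXINTERNALS_H
#define MXINTERNALS_H 1

/* Include after mex.h, which supplies mxArray and mxMAXNAM */

/* Values of the vartype field */
#define MXVARNORMAL  (0)
#define MXVARPERSIST (1)
#define MXVARGLOBAL  (2)
#define MXVARSUBEL   (3)
#define MXVARTEMP    (4)

/* Bits in the dataflags field */
#define MXLOGICALMASK 0x02
#define MXSCALARMASK  0x01

/* Blatant disregard for MATLAB API. This allows you to poke around an
   mxArray structure directly. Use with extreme caution! May not be
   portable across platforms or versions, may cause Matlab segfaults,
   may cause your computer to explode and eat all your data... */
struct mxArray_tag {
    char name[mxMAXNAM];       /* variable name */
    int class;                 /* class code (not the same as mxClassID) */
    int vartype;               /* one of the MXVAR* values above */
    mxArray *crosslink;        /* another header sharing this data, or 0 */
    int number_of_dims;
    int nelements_allocated;   /* nonzero for sparse arrays */
    int dataflags;             /* MX*MASK bits, user bits in top byte */
    int rowdim;                /* for N-D arrays: pointer to the dims */
    int coldim;
    union {
        struct {
            void *pdata;       /* real data; mxArray* list for cells */
            void *pimag_data;  /* imaginary data; field names for structs */
            void *irptr;       /* sparse row indices */
            void *jcptr;       /* sparse column pointers */
            int reserved;      /* unknown */
            int nfields;       /* number of struct fields */
        } number_array;
    } data;
};

#endif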
---end mxinternals.h---



This is very interesting stuff, thank you very much for posting it!


