Re: correlating a sample = ordered vector with a database of such vectors, prioritization of results, statistics



Dear Ray,
thank you for your reply.

Replication:
The drug comes from a single vial, and the drug is stable. There is some
time between the 1st and 2nd replicate, which means that the cells
used are not exactly the same in terms of their "inner state" (energy
state, gene expression, ...), which to a certain degree influences their
response to the drug. So the heterogeneity of the cell culture is the
main source of variability between the replicates.

I think I see the point you made about the p-values, but I still find
it difficult to accept (a matter of inner reconciliation; it may well
be just a matter of time until my brain decides that I agree :-)). I
know that there are permutation tests, which I thought were suited to
estimating whether a correlation could arise by chance "from within
the sample". Moreover, I can think beyond those 60 cell lines: there
are hundreds of others somewhere, and I expect that when I treat them
with two drugs that are well correlated across the panel, their
effects will be correlated again (on the reasonable assumption that
shared resistance to a drug is caused by a shared property, e.g. high
activity of an enzyme breaking down the drug).
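
A minimal sketch of the kind of permutation test mentioned above, in
plain Python. It shuffles one profile across the cell lines and counts
how often the shuffled correlation is at least as extreme as the
observed one. The function names and the choice of Pearson correlation
are illustrative assumptions. Note that the shuffle treats the cell
lines as exchangeable, so it does not by itself answer the objection
that the 60 lines are a fixed panel.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y.

    Hypothetical helper: y is shuffled across cell lines n_perm times.
    The +1 in numerator and denominator keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up activities for 10 cell lines, for illustration only.
my_drug = [float(i) for i in range(10)]
db_drug = [2.0 * v + 1.0 for v in my_drug]
p_value = permutation_pvalue(my_drug, db_drug)
```

With real data, x would be the averaged (-log10) profile of the tested
drug and y one database profile over the same cell lines.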

In fact I wanted to use p-values as a means of eliminating the effects
of some outlying points on the correlation (high correlations, low
p-values), and I hoped to use a beta-uniform mixture
(http://bioinformatics.oxfordjournals.org/cgi/reprint/19/10/1236)
for the cutoff choice. Are those uses unjustified? If so, what would
be the right method of compound prioritization?

Should I average the replicates and order the correlated compounds by
their correlation coefficients? Should the cell lines where the
replicates obviously disagree be removed (there is a case where I see
a 15% rise in correlation after removing just 1 point)? How should I
combine the results for 2 or more "mydrugs" that I suppose share a
common action?
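
The options asked about above can be sketched roughly as follows:
average the two replicates, rank the database compounds by their
correlation with the averaged profile, and check how much each single
cell line shifts a given correlation. All names are hypothetical, and
Pearson correlation is just one possible choice. This is a sketch of
the question, not a recommendation to drop points.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_replicates(rep1, rep2):
    """Per-cell-line mean of two replicate profiles."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]


def rank_compounds(profile, database):
    """Sort database compounds by correlation with profile, highest first.

    database is assumed to map compound name -> list of activities,
    in the same cell-line order as profile.
    """
    scores = {name: pearson(profile, vec) for name, vec in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def leave_one_out_shift(x, y):
    """Change in correlation when each cell line is removed in turn."""
    full = pearson(x, y)
    shifts = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        shifts.append(pearson(x_rest, y_rest) - full)
    return shifts


# Made-up numbers, for illustration only.
profile = average_replicates([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
ranking = rank_compounds(profile, {"a": [1.0, 2.0, 3.0, 4.0],
                                   "b": [4.0, 3.0, 2.0, 1.0]})
shifts = leave_one_out_shift([1.0, 2.0, 3.0, 4.0, 5.0],
                             [1.0, 2.0, 3.0, 4.0, -10.0])
```

A large entry in shifts flags a cell line whose removal changes the
correlation a lot, like the single point described above.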

Thank you for your insights.
Jiri Voller






On Jan 16, 9:36 am, Ray Koopman <koop...@xxxxxx> wrote:
On Jan 15, 9:32 am, karpatov <jirivol...@xxxxxxxxx> wrote:

[...]
What I did :
- converted activities into (- log10) scale (not normalised)

Fine. I was going to suggest logging the data. The base doesn't
matter -- whatever you're most comfortable with. I can only
speculate as to what "normalised" might mean in this context.

- calculated correlation coefficients between my profiles and the
profiles in the database (tried Pearson, Spearman, Kendall)

You might also try some kind of intraclass correlation (ICC),
but if you're not familiar with ICCs, put this suggestion on hold
until other problems are settled.

- calculated p-values; null hypothesis: correlation is 0
(cor.test in R)

As I explained in my December 4, 2007, response to your previous
question, the usual p-value calculations do not apply, because the
60 cell lines are fixed, as opposed to being randomly sampled from
a population. The parameter you want to estimate is what the
correlation over those 60 cell lines would be if you could measure
activity errorlessly. The problem is that any correlation you get
will be affected by measurement error. You need to adjust the
correlation for measurement error, and to estimate how accurate
that adjustment has been.
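
One classical way to make such an adjustment is the correction for
attenuation: divide the observed correlation by the square root of the
product of the two reliabilities. The sketch below is only an
illustration under stated assumptions, and not necessarily the
adjustment meant above. It assumes the reliability of a single
measurement is estimated by the correlation between the two
replicates, that the replicates are averaged (hence the Spearman-Brown
step with k=2), and that the database profile's reliability is either
known or taken as 1. The function names are hypothetical.

```python
def spearman_brown(r_single, k=2):
    """Reliability of the mean of k replicates, given the reliability
    of a single measurement (here: the replicate-replicate correlation)."""
    return k * r_single / (1 + (k - 1) * r_single)


def disattenuate(r_observed, rel_x, rel_y=1.0):
    """Correlation corrected for attenuation due to measurement error.

    rel_x and rel_y are the reliabilities of the two profiles;
    rel_y=1.0 assumes the second profile is error-free.
    """
    return r_observed / (rel_x * rel_y) ** 0.5


# Made-up numbers: replicates correlate 0.6, observed correlation is 0.6.
rel_mean = spearman_brown(0.6)
r_corrected = disattenuate(0.6, rel_mean)
```

The corrected value can exceed 1 when the reliabilities are poorly
estimated, which is one reason the accuracy of the adjustment also
needs to be estimated.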

You do not say how the two replicates were obtained or used. Were
there two batches of each drug (or something analogous to batches),
with each batch being split into 60 samples, and each sample being
tested on a cell line? Or was there only one batch of each drug,
with each batch being split into 120 samples, and 2 samples being
tested on a cell line? Statistically, the difference between those
two examples is that the first one gives matched data and the second
does not. In either case, the best simple estimate of the correlation
will probably be obtained by using the average of the two measures
of activity, rather than by combining the correlations, but the
matched-unmatched distinction will be relevant when estimating the
accuracy of the correlation.

[...]

