Re: Help please!!! Two stats questions from a biologist...
- From: Ray Koopman <koopman@xxxxxx>
- Date: Mon, 25 Jun 2007 11:14:34 -0700
On Jun 24, 7:22 pm, fil...@xxxxxxxxx wrote:
Hey guys, a biology student (me :) ) needs some help from you maths
people.
Question 1:
I'd like to have an expression (as elegant as possible) for P(x=k),
the prob. of having k positive results for a series of n Bernoulli
experiments. But that's a binomial distribution! Hold on a sec... I
need a formula for P(x=k) in the case where each experiment has a
different probability p(i), i=1,...,n. (So in this case nCk, p**k,
(1-p)**(n-k), where p is constant, don't apply.) All the expressions I
wrote range from incorrect to awfully inelegant, so perhaps you guys
can come up with a nice one. I know it could be hard to write a complex
expression here, so LaTeX code is just fine. Any ideas?
Question 2:
How to compare the compositional dissimilarity between two proteins?
Each composition can be represented by a 20-component vector, where
the components are the respective fractions for the 20 amino acids, so
of course the components add up to 1. I just learned that Euclidean
distance does a terrible job at this, since it can show as "close" two
compositional profiles that are in fact negatively correlated.
Instead, Pearson's distance (1- r) and Spearman's (1-rho) seem to be
ok for the task. But, in your opinion, which one makes more
statistical sense for this purpose? I'd like the correlation measure
to be as sensitive as possible. So... Pearson's or Spearman's to
compare protein compositions??
Thanks in advance, guys. I'm sure you can help me with this.
Felipe.
Answer 1:
It's called the compound binomial distribution (also known as the
Poisson binomial distribution), and there is no simple closed-form
expression for P(x = k). It is the sum, over all n_choose_k binary
vectors (y_1,...,y_n) for which sum y_i = k, of
prod_i p_i^y_i (1-p_i)^(1-y_i).
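
A minimal Python sketch of that sum, in case it helps to see it
numerically. The function names (pmf_bruteforce, pmf_all) are made up
for illustration. The first function is the sum exactly as written
above. The second is not from the description above: it is a standard
alternative that adds one experiment at a time, which gives the same
numbers without enumerating all n_choose_k vectors.

```python
from itertools import combinations
from math import prod


def pmf_bruteforce(p, k):
    """P(x = k) as the literal sum over all binary vectors with k ones."""
    n = len(p)
    total = 0.0
    for successes in combinations(range(n), k):
        chosen = set(successes)  # indices i with y_i = 1
        total += prod(p[i] if i in chosen else 1 - p[i] for i in range(n))
    return total


def pmf_all(p):
    """Return [P(x=0), ..., P(x=n)], adding one experiment at a time."""
    dist = [1.0]  # with zero experiments, P(x = 0) = 1
    for p_i in p:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p_i)  # experiment i fails
            new[k + 1] += mass * p_i    # experiment i succeeds
        dist = new
    return dist


if __name__ == "__main__":
    p = [0.1, 0.5, 0.9]  # illustrative probabilities only
    print(pmf_all(p))
    print([pmf_bruteforce(p, k) for k in range(len(p) + 1)])
```

The brute-force version grows as n_choose_k terms, so it is only
practical for small n; the incremental version needs on the order of
n^2 multiplications. When all p_i are equal, both reduce to the
ordinary binomial probabilities.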
Answer 2:
There are many dissimilarity functions that are suitable for such
data. Try Hellinger distance = sqrt(sum (sqrt(p_i)-sqrt(q_i))^2 ),
where sum p_i = sum q_i = 1.
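
A short Python sketch of the Hellinger distance as defined above, with
no normalizing constant. The function name hellinger and the input
checks are assumptions added for illustration; p and q are the
composition vectors (20 amino-acid fractions each in the protein case).

```python
from math import sqrt


def hellinger(p, q):
    """sqrt(sum (sqrt(p_i) - sqrt(q_i))^2) for two composition vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of components")
    if any(x < 0 for x in p) or any(x < 0 for x in q):
        raise ValueError("components must be non-negative fractions")
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Toy 3-component compositions, not real protein data.
    print(hellinger([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))
    print(hellinger([0.2, 0.3, 0.5], [0.5, 0.3, 0.2]))
```

In this form the distance is 0 for identical compositions and reaches
sqrt(2) when the two compositions share no components; dividing by
sqrt(2) rescales it to the range 0 to 1.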