Re: Regression with an EM algorithm



On 7 juil, 16:35, souzi HEC <souzi....@xxxxxxxxx> wrote:
Hi,
I’m new in Matlab software programming and I need a little help to
implement an EM algorithm…

My name is Souad Romdhan, and I am a student in master "Finance and
Actuarial studies" at Institute of High Commercial Studies.

I am currently conducting a research project on the Modeling Loss
Index Triggers to price Cat Bonds: Application of the risk of
hurricanes in USA.

I need to solve with  MATLAB  (especially with EM algorithm) this
specific problem below:
I have two data samples. The first sample contains 66 depths for
historical hurricanes happened in Florida between 1935 - 2008. The
second sample contains only 48 losses associated to these hurricanes.
This second data is missing 19 values of losses.
Since 48 out 66 observations have information about hurricanes losses,
it is necessary to treat the missing data of losses for a further
analysis. So, I chose to make a regression with EM-algorithm:

1-      The log (hurricanes losses) are directly proportional to the
depths of the hurricanes. Besides, statistically we observe a
relationship between the two vectors (depths of hurricanes and log
(losses)). So the approach is to estimate the missing losses by means
of the linear regression model  x= α+ β yi + εi, (Linear Model)
where  y (n x 1) is a vector of observations on the response variable
(depth) and  x (n x p) is the p explanatory variables (=log(loss))..

2-      This approach that will deal with the missing data is the
regression with expectation –Maximum algorithm (EM)
The expectation step:
This algorithm consists of omitting the cases with missing data and
running a regression on what remains.  The regression coefficient will
be used to estimate the missing data.
The maximization step:
After this estimation step, a new regression will be done over the
complete data (including estimated values). With the new regression
coefficients, the missing data is re-estimated. This process will
continue until the estimates are adjusted to give model sampling
error, i.e. it will not be a longer noticeable change.

My problem is that estimates values are not being adjusted after one
step… If I take difference of these two estimated Beta it will be
zero... (So the loop is stopped, because the new beta minus old beta
is zero...). I attached the code with Data - maybe you can try to do
something with this...
Please help me to correct this algorithm with MATLAB Software.
Excuse me for mistakes because I’m not fluent at English writing
Thank you in advance to consider my request.
I look forward to hearing from you in the near future.
Best and kind regards
Souad
Below we find the algorithm and the data

The attached algorithm is:

% load Hurr DATA from Excel file
DATA = xlsread('C:Project\Hurr DATA.xls');
x=DATA(:,3);
y=log(DATA(:,5));
% take observed data only
yobs = y(not(isnan(y)));
ymis = y(isnan(y));
xobs = x(not(isnan(y)));
xmis = x(isnan(y));
% E-STEP
% estimate alpha and beta coefficients
Sxy = (xobs-mean(xobs))'*(yobs-mean(yobs));
Sxx = (xobs-mean(xobs))'*(xobs-mean(xobs));
Syy = (yobs-mean(yobs))'*(yobs-mean(yobs));
b = Sxy / Sxx
a = mean(yobs) - b * mean(xobs)
% calculate r square statistic
rs = Sxy^2 / (Sxx * Syy)
% plot regression line
xmin = min(xobs); xmax= max(xobs);
ymin = a + b*xmin; ymax = a + b * xmax;
plot([xmin xmax], [ymin ymax]); hold on;
% plot observed values
plot(xobs,yobs,'.');xlabel('depth'); ylabel('log(loss)');
% plot estimated values
yhatmis = a + b * xmis;
plot(xmis, yhatmis,'r.'); hold on

% plot 'new' data
figure;
xnew = [xobs; xmis(1:end)];
ynew = [yobs; yhatmis(1:end)];
plot(xnew,ynew,'g.')
% M-STEP
% fit regression to new data
Sxy = (xnew-mean(xnew))'*(ynew-mean(ynew));
Sxx = (xnew-mean(xnew))'*(xnew-mean(xnew));
Syy = (ynew-mean(ynew))'*(ynew-mean(ynew));
b = Sxy / Sxx
a = mean(ynew) - b * mean(xnew)
rs = Sxy^2 / (Sxx * Syy)

n = 200; i = 1;
tol = 1; err = tol + 1;
while and(i<n, err > tol)
    Sxy = (xnew-mean(xnew))'*(ynew-mean(ynew));
    Sxx = (xnew-mean(xnew))'*(xnew-mean(xnew));
    Syy = (ynew-mean(ynew))'*(ynew-mean(ynew));
    bnew = Sxy / Sxx;
    anew = mean(ynew) - b * mean(xnew);
    err = (bnew - b)^2+(anew - a)^2;
    b = bnew; a = anew;
    rs = Sxy^2 / (Sxx * Syy);
    %update dataset
    yhatmis = a + b * xmis;
    xnew = [xobs; xmis];
    ynew = [yobs; yhatmis];
    i = i + 1;
end

My data is

Depht of Hurricanes             Losses of Hurricanes
Complete Data           Incomplete Data

897             NAN
973             NAN
964             NAN
985             NAN
975             NAN
962             NAN
985             NAN
951             NAN
980             NAN
940             NAN
974             NAN
963             NAN
975             NAN
954             NAN
958             NAN
955             10450000
985             NAN
975             NAN
930             55000000
981             NAN
968              67 200 000
966             6000000
974             2000000
948             28000000
982             5400000
983             NAN
977             2580000
909             1000000
980             2562500
955             56410000
970             65300000
964             79969000
1002            5000000
987             100000000
959             NAN
942             NAN
971              7 000 000
967             53864000
993             NAN
950              30 000 000
983              5 000 000
922             15000000000
930             15000000
941             8000000
925             60000000
973             350000000
942             1360000000
973             20000000
987             NAN
964             340000000
956             50000000
987              100 000 000
940              115 000 000
963              35 000 000
932             7430000000
960             4275000000
946             4300000000
950             3000000000
946             855000000
917             707000000
924             33500000
922             10300000000
970             195000000


It's interessting!
.



Relevant Pages

  • Regression with an EM algorithm
    ... historical hurricanes happened in Florida between 1935 - 2008. ... This second data is missing 19 values of losses. ... it is necessary to treat the missing data of losses for a further ... I chose to make a regression with EM-algorithm: ...
    (comp.soft-sys.matlab)
  • Re: Regression analysis -- how-to?
    ... > were it not for the fact that, the last time I asked an algorithm ... Do you want to learn about regression? ... > (linear, quadratic, logarithmic, exponential, power and inverse) and at ... > least two types of linear correlation coefficient (rank and product ...
    (comp.lang.pascal.delphi.misc)
  • Re: theta = (XX)^-1*Xy
    ... This is better-known as the shrinkage estimator. ... I should have mentioned Ridge Regression also, ... -- Reef Fish Bob. ... But there is a lot of "contaiminated and missing data" ...
    (sci.stat.math)
  • Re: results too good?
    ... Artificial neural networks are not a random process. ... Use 7-fold cross-validation where regression MSE or classification ... Repeating this whole algorithm repeatedly ... increased averaging smoothen my line even more? ...
    (comp.soft-sys.matlab)
  • Re: Calculating Expected Value With Multiple Variables Regression
    ... > It appears this group is full of smart statisticians so maybe I can get ... I am pretty clueless when it comes to stats, ... some of the deals having missing data for some of the attributes. ... You could try logistic regression for the binary outcome ...
    (sci.stat.consult)

Loading