Regression with an EM algorithm



Hi,
I’m new in Matlab software programming and I need a little help to
implement an EM algorithm…

My name is Souad Romdhan, and I am a student in master "Finance and
Actuarial studies" at Institute of High Commercial Studies.

I am currently conducting a research project on the Modeling Loss
Index Triggers to price Cat Bonds: Application of the risk of
hurricanes in USA.

I need to solve with MATLAB (especially with EM algorithm) this
specific problem below:
I have two data samples. The first sample contains 66 depths for
historical hurricanes happened in Florida between 1935 - 2008. The
second sample contains only 48 losses associated to these hurricanes.
This second data is missing 19 values of losses.
Since 48 out 66 observations have information about hurricanes losses,
it is necessary to treat the missing data of losses for a further
analysis. So, I chose to make a regression with EM-algorithm:

1- The log (hurricanes losses) are directly proportional to the
depths of the hurricanes. Besides, statistically we observe a
relationship between the two vectors (depths of hurricanes and log
(losses)). So the approach is to estimate the missing losses by means
of the linear regression model x= α+ β yi + εi, (Linear Model)
where y (n x 1) is a vector of observations on the response variable
(depth) and x (n x p) is the p explanatory variables (=log(loss)).

2- This approach that will deal with the missing data is the
regression with expectation –Maximum algorithm (EM)
The expectation step:
This algorithm consists of omitting the cases with missing data and
running a regression on what remains. The regression coefficient will
be used to estimate the missing data.
The maximization step:
After this estimation step, a new regression will be done over the
complete data (including estimated values). With the new regression
coefficients, the missing data is re-estimated. This process will
continue until the estimates are adjusted to give model sampling
error, i.e. it will not be a longer noticeable change.

My problem is that estimates values are not being adjusted after one
step… If I take difference of these two estimated Beta it will be
zero... (So the loop is stopped, because the new beta minus old beta
is zero...). I attached the code with Data - maybe you can try to do
something with this...
Please help me to correct this algorithm with MATLAB Software.
Excuse me for mistakes because I’m not fluent at English writing
Thank you in advance to consider my request.
I look forward to hearing from you in the near future.
Best and kind regards
Souad
Below we find the algorithm and the data

The attached algorithm is:

% load Hurr DATA from Excel file
DATA = xlsread('C:Project\Hurr DATA.xls');
x=DATA(:,3);
y=log(DATA(:,5));
% take observed data only
yobs = y(not(isnan(y)));
ymis = y(isnan(y));
xobs = x(not(isnan(y)));
xmis = x(isnan(y));
% E-STEP
% estimate alpha and beta coefficients
Sxy = (xobs-mean(xobs))'*(yobs-mean(yobs));
Sxx = (xobs-mean(xobs))'*(xobs-mean(xobs));
Syy = (yobs-mean(yobs))'*(yobs-mean(yobs));
b = Sxy / Sxx
a = mean(yobs) - b * mean(xobs)
% calculate r square statistic
rs = Sxy^2 / (Sxx * Syy)
% plot regression line
xmin = min(xobs); xmax= max(xobs);
ymin = a + b*xmin; ymax = a + b * xmax;
plot([xmin xmax], [ymin ymax]); hold on;
% plot observed values
plot(xobs,yobs,'.');xlabel('depth'); ylabel('log(loss)');
% plot estimated values
yhatmis = a + b * xmis;
plot(xmis, yhatmis,'r.'); hold on


% plot 'new' data
figure;
xnew = [xobs; xmis(1:end)];
ynew = [yobs; yhatmis(1:end)];
plot(xnew,ynew,'g.')
% M-STEP
% fit regression to new data
Sxy = (xnew-mean(xnew))'*(ynew-mean(ynew));
Sxx = (xnew-mean(xnew))'*(xnew-mean(xnew));
Syy = (ynew-mean(ynew))'*(ynew-mean(ynew));
b = Sxy / Sxx
a = mean(ynew) - b * mean(xnew)
rs = Sxy^2 / (Sxx * Syy)

n = 200; i = 1;
tol = 1; err = tol + 1;
while and(i<n, err > tol)
Sxy = (xnew-mean(xnew))'*(ynew-mean(ynew));
Sxx = (xnew-mean(xnew))'*(xnew-mean(xnew));
Syy = (ynew-mean(ynew))'*(ynew-mean(ynew));
bnew = Sxy / Sxx;
anew = mean(ynew) - b * mean(xnew);
err = (bnew - b)^2+(anew - a)^2;
b = bnew; a = anew;
rs = Sxy^2 / (Sxx * Syy);
%update dataset
yhatmis = a + b * xmis;
xnew = [xobs; xmis];
ynew = [yobs; yhatmis];
i = i + 1;
end



My data is

Depht of Hurricanes Losses of Hurricanes
Complete Data Incomplete Data

897 NAN
973 NAN
964 NAN
985 NAN
975 NAN
962 NAN
985 NAN
951 NAN
980 NAN
940 NAN
974 NAN
963 NAN
975 NAN
954 NAN
958 NAN
955 10450000
985 NAN
975 NAN
930 55000000
981 NAN
968 67 200 000
966 6000000
974 2000000
948 28000000
982 5400000
983 NAN
977 2580000
909 1000000
980 2562500
955 56410000
970 65300000
964 79969000
1002 5000000
987 100000000
959 NAN
942 NAN
971 7 000 000
967 53864000
993 NAN
950 30 000 000
983 5 000 000
922 15000000000
930 15000000
941 8000000
925 60000000
973 350000000
942 1360000000
973 20000000
987 NAN
964 340000000
956 50000000
987 100 000 000
940 115 000 000
963 35 000 000
932 7430000000
960 4275000000
946 4300000000
950 3000000000
946 855000000
917 707000000
924 33500000
922 10300000000
970 195000000

.



Relevant Pages

  • Re: Regression with an EM algorithm
    ... I need to solve with  MATLAB  (especially with EM algorithm) this ... historical hurricanes happened in Florida between 1935 - 2008. ... it is necessary to treat the missing data of losses for a further ... I chose to make a regression with EM-algorithm: ...
    (comp.soft-sys.matlab)
  • Re: theta = (XX)^-1*Xy
    ... This is better-known as the shrinkage estimator. ... I should have mentioned Ridge Regression also, ... -- Reef Fish Bob. ... But there is a lot of "contaiminated and missing data" ...
    (sci.stat.math)
  • Re: Calculating Expected Value With Multiple Variables Regression
    ... > It appears this group is full of smart statisticians so maybe I can get ... I am pretty clueless when it comes to stats, ... some of the deals having missing data for some of the attributes. ... You could try logistic regression for the binary outcome ...
    (sci.stat.consult)
  • Re: theta = (XX)^-1*Xy
    ... I should have mentioned Ridge Regression also, ... of getting the "wrong sign" for some of the regression coefficients, ... But there is a lot of "contaiminated and missing data" ...
    (sci.stat.math)
  • Re: regression question
    ... small, weak, slow-moving hurricanes that impact areas with ... relatively low wealth produce relative low damage ... approaching the problem using regression challenging. ... the wind-radii was ...
    (sci.stat.math)

Loading