Re: Q(λ)-learning algorithm question



Why do you think it is difficult to make it work with function
approximation?

The approach is exactly as you wrote: the maximum value for a
state-action pair will be the value of the luckiest agent.
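
A rough illustration of why taking the maximum over agents tends to
overestimate. This is not the CQ(λ) update itself, only a toy sketch:
the function names, the Gaussian noise model, and all parameter values
are assumptions made for the example. Each agent's estimate is unbiased
on its own, yet the mean of the maximum moves above the true value as
the number of agents grows.

```python
import random


def max_over_agents(true_q, noise_std, n_agents, rng):
    """Largest of n_agents noisy, individually unbiased estimates of true_q."""
    return max(rng.gauss(true_q, noise_std) for _ in range(n_agents))


def mean_max_estimate(true_q=0.0, noise_std=1.0, n_agents=5,
                      n_trials=20000, seed=0):
    """Average the 'luckiest agent' estimate over many independent trials."""
    rng = random.Random(seed)
    total = sum(max_over_agents(true_q, noise_std, n_agents, rng)
                for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    # Hypothetical setup: the true value is 0 and each estimate has unit noise.
    for n in (1, 2, 5, 10):
        print(n, round(mean_max_estimate(n_agents=n), 3))
```

With a single agent the average stays near the true value; with more
agents it drifts upward, which is the bias in question.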

You are also right that it is a biased overestimate. In simulation
CQ(λ) did converge, but in real experiments with a mobile robot it did
not converge over various ranges of discount factors and
eligibility-trace parameters (though it still improved performance
compared with standard Q(λ)). Another possible reason it did not
converge is the low number of learning episodes (50) I ran. Do you have
any idea how I should analyze CQ(λ), or any other RL-based algorithm,
mathematically? I mean, how to describe the algorithm more
scientifically and define it more rigorously than in the paper, for
example in terms of convergence or superiority. How can I demonstrate
the advantages or disadvantages of an algorithm mathematically? How can
I prove convergence or divergence? How can I show whether it is better
or worse than other algorithms? Tough questions...

I read both papers, yours and Kretchmar's. They are very relevant to
my research and I'll cite them. Still, I need more time to go through
them seriously; I hope to do so by Tuesday this week.

Thanks,

Uri.
