Re: Q(λ)-learning algorithm question
- From: kartoun@xxxxxxxxx
- Date: 15 Jan 2006 11:46:39 -0800
Why do you think it is difficult to make it work with function
approximation?
The approach is exactly as you wrote: the maximum value for a
state-action pair will be the value of the luckiest agent.
You are also right that it is a biased overestimate. In simulation,
CQ(λ) did converge, but in real experiments with a mobile robot it
did not converge for various ranges of discount factors and eligibility
traces (although it still improved performance compared with
standard Q(λ)). Another reason it may not have converged is the low
number of learning episodes (50) I performed. Do you have an idea how
I should analyze CQ(λ), or any other RL-based algorithm,
mathematically? I mean, how to describe the algorithm more
scientifically and define it mathematically much better than it is
described in the paper, for example in terms of convergence or
superiority. How can I demonstrate the advantages or disadvantages of
an algorithm mathematically? How can I prove convergence or divergence?
How can I show whether it is better or worse than other algorithms?
Tough questions...
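To make the overestimation point concrete, here is a minimal sketch of
why taking the value of the "luckiest agent" is biased upward. It is not
the CQ(λ) algorithm itself. It assumes (my assumption, for illustration
only) that each agent's estimate of one state-action value is the true
value plus independent zero-mean Gaussian noise. The function names and
parameters are hypothetical.

```python
import random
import statistics


def max_of_noisy_estimates(true_value, n_agents, noise_std, rng):
    """Return the largest of n_agents noisy, individually unbiased estimates."""
    estimates = [true_value + rng.gauss(0.0, noise_std) for _ in range(n_agents)]
    return max(estimates)


def mean_max(true_value, n_agents, noise_std, trials, seed=0):
    """Average the 'luckiest agent' value over many independent trials."""
    rng = random.Random(seed)
    return statistics.fmean(
        max_of_noisy_estimates(true_value, n_agents, noise_std, rng)
        for _ in range(trials)
    )


# Assumed illustration values: true Q-value 1.0, noise std 0.5.
single_agent = mean_max(1.0, n_agents=1, noise_std=0.5, trials=20000)
five_agents = mean_max(1.0, n_agents=5, noise_std=0.5, trials=20000)

print("mean estimate, 1 agent :", round(single_agent, 3))
print("mean of max, 5 agents  :", round(five_agents, 3))
```

With one agent the average stays close to the true value, while the
average of the maximum over five agents sits clearly above it, even
though every individual estimate is unbiased. Under these assumptions
the gap grows with the number of agents and with the noise level.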
I read both papers - yours and Kretchmar's. They are very relevant to
my research and I'll cite them. I still need more time to study them
seriously; I hope to do so by Tuesday this week.
Thanks,
Uri.