Re: Q(λ)-learning algorithm question
- From: Rémi <no.spam@please>
- Date: Tue, 17 Jan 2006 14:32:38 +0100
kartoun@xxxxxxxxx wrote:
Me too, I don't know how to do it (for now). We can collaborate scientifically if you'd like.
About the above two papers I promised to review:
1) R. Matthew Kretchmar. Parallel Reinforcement Learning. The 6th World Conference on Systemics, Cybernetics, and Informatics. 2002.
Kretchmar investigates the problem of multiple reinforcement learning agents attempting to learn the value function of a particular task in parallel for the n-armed bandit task. A parallel reinforcement learning solution is suggested to overcome the problem of statistic overwhelming by an agent’s information that is correspondingly has a larger accumulated experience than the other agents. To overcome this problem each agent keeps track of two sets of parameters: (i) one set for the actual independently experienced trials of a particular agent, and (ii) an additional set for combined trials among all other agents. The agents share accumulated experience by keeping separate parameters for their own independent experience and the combined experience of all other agents. In addition, the agents can compute an estimate (general Q value which is a weighted combination of a particular agent with all the other agents) based upon the global experience. This estimate is computed from a weighted average of the agent’s own independent experience and the accumulated experience of all other agents. Results show that as more as you add agents, learning is accelerated because there is a larger pool of accumulated experience upon which to base future estimates. The experiment with 10 parallel agents (the highest number of agents in Kretchmar's experiments) learns the fastest.
2) Your paper: Odalric-Ambrym Maillard, Rémi Coulom, and Philippe Preux. Parallelization of the TD(λ) Learning Algorithm.
Odalric-Ambrym et al. present a method to parallelize the TD(λ) algorithm for reducing computation times for various learning tasks. An extensition to Kretchmar's is presented in which agents share their experience by averaging their value functions.
More precisely, the weights of the parametric approximators are averaged, which is not equivalent to averaging the value functions, since the value does not depend linearly on weights in feedforward neural networks.
This is done for multi-state episodic tasks, using the TD(λ) algorithm and generalizing function approximators. Experiments using the same TD(λ) algorithm conducted on three kinds of problems: (i) the pendulum problem; (ii) the cart-pole problem and (iii) the swimmer problem, show that using more than an average seven processes reveals no speeding up in computation times.
Some remarks (in my opinion): 1) You had to mention neural networks in the abstract. 2) "Nevertheless, for this method to be successful, we assume that the weight changes between two cycles differs little from what we would have with the sequential method, and this does not hold if P or G is too big.". Why? Please define "too big". 3) What is a small G? How small?
Numerical values are provided on the plots. In general, what is big and small depends on the problem and approximation architecture.
4) Why more than 7 processes don't speed up computation times? Is there any reasonable explanation for that? It's task specific I believe - but you tried it for 3 different tasks, didn't you?
There are two kind of losses: losses because of time lost in communication (processes are running on different machines that communicate with MPI over an ethernet network), and losses because running trials in parallel is not equivalent to running them sequentially. Numerical values are different for the 3 tasks.
Odalric wrote a longer report in French, that I can e-mail to you if you are interested.
5) Where was it published? 6) At what year?
_Seventh European Workshop on Reinforcement Learning_, Naples, Oct 2005
Thanks,
Uri.
Thanks for your feedback,
Rémi .
- Follow-Ups:
- Re: Q(λ)-learning algorithm question
- From: kartoun
- Re: Q(λ)-learning algorithm question
- References:
- Q(λ)-learning algorithm question
- From: kartoun
- Re: Q(λ)-learning algorithm question
- From: Rémi
- Re: Q(λ)-learning algorithm question
- From: kartoun
- Re: Q(λ)-learning algorithm question
- From: Rémi
- Re: Q(λ)-learning algorithm question
- From: kartoun
- Re: Q(λ)-learning algorithm question
- From: Rémi
- Re: Q(λ)-learning algorithm question
- From: kartoun
- Re: Q(λ)-learning algorithm question
- From: Rémi
- Re: Q(λ)-learning algorithm question
- From: kartoun
- Q(λ)-learning algorithm question
- Prev by Date: A GAME A.I CAN'T WIN OFF WORLD!!! STARGATE!!
- Next by Date: Re: Q(λ)-learning algorithm question
- Previous by thread: Re: Q(λ)-learning algorithm question
- Next by thread: Re: Q(λ)-learning algorithm question
- Index(es):