CACM


The per-action evaluations for CACM-RL and CACM-RL* are shown in (7) and (8). In (8) the transition likelihood is taken into account, according to the overlapping shaded regions in Fig. 6. As can be noted in Table II, the action with the highest reward in CACM-RL is a2, but in CACM-RL* it is a1. This is because, although a2 transits faster, the a1 transition obtains a greater reward. There are several methods for evaluating the transition probability of each (state, action) pair. One approach distributes a uniform sample inside the cell along each dimension, but the Monte Carlo method is more appropriate for systems with a large number of dimensions, since it saves computational resources and time at the expense of accuracy. As a matter of fact, evaluating 100 points per dimension in a 4D problem requires 100^4 (that is, 10^8) sample evaluations per state-action pair. The implementation of CACM-RL* consists of adding to CACM-RL the analysis beyond the center-of-cell functionality into the reward functions, as can be seen in the algorithm described in Table I. Lines 11 and 22 represent the rewa...
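The two sampling strategies mentioned above (a uniform grid per dimension versus Monte Carlo) can be contrasted with a minimal Python sketch. It is not the algorithm of Table I: the unit cell size, the additive `step` dynamics, the example action, and all function names are assumptions made purely for illustration. The sketch estimates, for one (state, action) pair, the fraction of start points inside the cell that land in each destination cell.

```python
import itertools
import random
from collections import Counter

CELL_SIZE = 1.0  # assumed width of every cell along each dimension


def cell_of(point):
    """Map a continuous point to the integer index of the cell containing it."""
    return tuple(int(x // CELL_SIZE) for x in point)


def step(point, action):
    """Hypothetical dynamics: the action is a displacement added to the state."""
    return tuple(x + a for x, a in zip(point, action))


def transition_probs(samples, action):
    """Fraction of sampled start points that land in each destination cell."""
    counts = Counter(cell_of(step(p, action)) for p in samples)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def grid_samples(cell, per_dim):
    """Uniform grid: per_dim midpoints along each dimension (per_dim ** D points)."""
    axes = [
        [(c + (i + 0.5) / per_dim) * CELL_SIZE for i in range(per_dim)]
        for c in cell
    ]
    return list(itertools.product(*axes))


def monte_carlo_samples(cell, n, rng):
    """Monte Carlo: n points drawn uniformly at random inside the cell."""
    return [
        tuple((c + rng.random()) * CELL_SIZE for c in cell)
        for _ in range(n)
    ]


cell = (0, 0, 0, 0)             # a 4D cell, as in the 4D example in the text
action = (0.25, 0.0, 0.0, 0.0)  # hypothetical action
grid = grid_samples(cell, per_dim=4)  # 4 ** 4 = 256 points
mc = monte_carlo_samples(cell, 256, random.Random(0))
grid_probs = transition_probs(grid, action)
mc_probs = transition_probs(mc, action)
```

The grid cost grows as `per_dim ** D`, which is what makes 100 points per dimension impractical in 4D, whereas the Monte Carlo sample size is chosen independently of the number of dimensions, at the price of a noisier estimate.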
