
You searched for the phrase "Markov decision processes" by criterion: Subject


Displaying 1-6 of 6
Title:
Decision problem for infinite duration semi-Markov process
Authors:
Grabski, F.
Subjects:
reliability
semi-Markov decision processes
optimization
Howard algorithm
linear programming
Publisher:
Uniwersytet Morski w Gdyni. Polskie Towarzystwo Bezpieczeństwa i Niezawodności
Links:
https://bibliotekanauki.pl/articles/2069519.pdf
Description:
The paper presents basic concepts and some results of the theory of semi-Markov decision processes. The optimization problem for the infinite duration SM process is considered. The Howard algorithm, which enables finding the optimal stationary strategy, is also discussed. The algorithm is applied to a decision problem concerning a two-component renewable series system. It is also shown that this algorithm is equivalent to a certain linear programming problem.
Content provider:
Biblioteka Nauki
Article
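The Howard algorithm named in the abstract alternates exact policy evaluation with greedy policy improvement until the policy stabilizes. A minimal sketch for a finite discounted MDP (a simplification of the paper's semi-Markov setting; the data layout `P[a]`, `R[a]` is an illustrative assumption, not taken from the paper):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Howard-style policy iteration for a finite discounted MDP.

    P[a][s, s'] -- transition probabilities under action a
    R[a][s]     -- expected one-step reward in state s under action a
    """
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy action in every state
        q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```

The exact linear solve in the evaluation step is what distinguishes Howard-style policy iteration from plain value iteration.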
Title:
Decision problem for a finite states change of semi-Markov process
Authors:
Grabski, F.
Subjects:
reliability
semi-Markov decision processes
optimization
Howard algorithm
linear programming
Publisher:
Uniwersytet Morski w Gdyni. Polskie Towarzystwo Bezpieczeństwa i Niezawodności
Links:
https://bibliotekanauki.pl/articles/2069327.pdf
Description:
The paper presents basic concepts and some results of the theory of semi-Markov decision processes. An algorithm for optimizing an SM decision process with a finite number of state changes is discussed. The algorithm is based on a dynamic programming method. To illustrate it, an SM decision model for a maintenance operation is shown.
Content provider:
Biblioteka Nauki
Article
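With a finite number of state changes, the dynamic-programming method referenced here amounts to backward induction over decision epochs. A hedged sketch for a plain finite-horizon MDP (a stand-in for the paper's semi-Markov model; the `P[a]`/`R[a]` arrays are assumed, not the paper's):

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Finite-horizon dynamic programming (backward induction).

    Returns the optimal value from each state and the
    epoch-dependent decision rules, working back from a
    zero terminal value.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)                 # terminal value
    policies = []
    for _ in range(horizon):
        q = np.array([R[a] + P[a] @ v for a in range(n_actions)])
        policies.append(q.argmax(axis=0))  # best action per state at this epoch
        v = q.max(axis=0)
    policies.reverse()                     # policies[t] is the rule at epoch t
    return v, policies
```

Unlike the infinite-horizon case, the optimal policy here may differ from epoch to epoch, which is why a list of decision rules is returned.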
Title:
Improving modified policy iteration for probabilistic model checking
Authors:
Mohagheghi, Mohammadsadegh
Karimpour, Jaber
Isazadeh, Ayaz
Subjects:
probabilistic model checking
Markov decision processes
modified policy iteration
probabilistic reachability
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Links:
https://bibliotekanauki.pl/articles/27312850.pdf
Description:
Opis:
Along with their modified versions, value iteration and policy iteration are well-known algorithms for the probabilistic model checking of Markov decision processes. One challenge with these methods is that they are time-consuming in most cases. Several techniques have been proposed to improve the performance of iterative methods for probabilistic model checking; however, the running times of these techniques depend on the graphical structure of the utilized model. In some cases, their performance can be worse than the performance of standard methods. In this paper, we propose two new heuristics for accelerating the modified policy iteration method. We first define a criterion for the usefulness of the computations of each iteration of this method. The first contribution of our work is to develop and use a criterion to reduce the number of iterations in modified policy iteration. As the second contribution, we propose a new approach for identifying useless updates in each iteration. This method reduces the running time of the computations by avoiding the useless updates of states. The proposed heuristics have been implemented in the PRISM model checker and applied to several standard case studies. We compare the running time of our heuristics with the running times of previous standard and improved methods. Our experimental results show that our techniques yield a significant speed-up.
Content provider:
Biblioteka Nauki
Article
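Modified policy iteration, the method this paper accelerates, replaces the exact policy-evaluation solve of Howard's algorithm with a fixed number m of approximate backups under the current policy. A minimal sketch on a generic discounted-reward MDP (probabilistic model checkers such as PRISM target reachability probabilities; this reward formulation is an assumption for illustration):

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, tol=1e-8, max_iter=1000):
    """Modified policy iteration: greedy improvement plus m partial
    evaluation sweeps instead of an exact linear solve."""
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Greedy improvement (one value-iteration backup)
        q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        policy = q.argmax(axis=0)
        v_new = q.max(axis=0)
        # m extra backups under the fixed greedy policy
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        for _ in range(m):
            v_new = r_pi + gamma * P_pi @ v_new
        if np.max(np.abs(v_new - v)) < tol:
            return policy, v_new
        v = v_new
    return policy, v
```

Setting m = 0 recovers value iteration; letting m grow approaches full policy iteration, which is exactly the trade-off the paper's heuristics tune.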
Title:
Solving Markov decision processes by d-graph algorithms
Authors:
Kátai, Z.
Subjects:
Markov decision processes
dynamic programming
graph representation
graph algorithms
optimization problems
Publisher:
Polska Akademia Nauk. Instytut Badań Systemowych PAN
Links:
https://bibliotekanauki.pl/articles/205688.pdf
Description:
Opis:
Markov decision processes (MDPs) provide a mathematical model for sequential decision-making (sMDP/dMDP: stochastic/deterministic MDP). We introduce the concept of the generalized dMDP (g-dMDP), where each action may result in more than one next (parallel or clone) state. The common tools for representing dMDPs are digraphs, but these are inadequate for sMDPs and g-dMDPs. We introduce d-graphs as general tools to represent all of the above-mentioned processes (stationary versions). We also present a combined d-graph algorithm that implements dynamic programming strategies to find optimal policies for the finite/infinite-horizon versions of these Markov processes. (The preliminary version of this paper was presented at the MACRo 2011 conference.)
Content provider:
Biblioteka Nauki
Article
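For the deterministic case, the digraph representation mentioned in the abstract already suffices: each outgoing edge of a state corresponds to one action and carries its reward. A small sketch of finding an optimal stationary policy on such a digraph by dynamic programming (discounted criterion; the paper's d-graph machinery for stochastic and generalized processes is not reproduced here, and the edge-list format is an assumption):

```python
# Deterministic MDP as a digraph: edges[s] lists (next_state, reward)
# pairs, one per available action; infinite-horizon discounted criterion.
def solve_dmdp(edges, gamma=0.9, tol=1e-10, max_iter=10000):
    n = len(edges)
    v = [0.0] * n
    for _ in range(max_iter):
        # Bellman backup: best edge out of each state
        v_new = [max(r + gamma * v[t] for t, r in edges[s]) for s in range(n)]
        if max(abs(a - b) for a, b in zip(v, v_new)) < tol:
            break
        v = v_new
    # Optimal stationary policy: the greedy successor of each state
    policy = [max(edges[s], key=lambda e: e[1] + gamma * v[e[0]])[0]
              for s in range(n)]
    return v, policy
```

On the two-state example in the test, staying in state 1 (reward 2 per step) yields 2/(1 - 0.9) = 20, so the optimal policy moves to state 1 and stays there.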
Title:
Optimal stationary policies in risk-sensitive dynamic programs with finite state space and nonnegative rewards
Authors:
Cavazos-Cadena, Rolando
Montes-de-Oca, Raúl
Subjects:
unichain property
Markov decision processes
risk-sensitive optimality equation
risk-sensitive expected total-reward criterion
Publisher:
Polska Akademia Nauk. Instytut Matematyczny PAN
Links:
https://bibliotekanauki.pl/articles/1208177.pdf
Description:
Opis:
This work concerns controlled Markov chains with finite state space and nonnegative rewards; it is assumed that the controller has a constant risk-sensitivity, and that the performance of a control policy is measured by a risk-sensitive expected total-reward criterion. The existence of optimal stationary policies is studied within this context, and the main result establishes the optimality of a stationary policy achieving the supremum in the corresponding optimality equation, whenever the associated Markov chain has a unique positive recurrent class. Two explicit examples are provided to show that, if such an additional condition fails, an optimal stationary policy cannot be generally guaranteed. The results of this note, which consider both the risk-seeking and the risk-averse cases, answer an extended version of a question recently posed in Puterman (1994).
Content provider:
Biblioteka Nauki
Article
Title:
Monte Carlo method in Reinforcement Learning
Authors:
Gromna, Martyna
Description:
The aim of the work is to solve the Markov decision problem (MDP). When the dynamics of the MDP are known, we use dynamic programming: the policy iteration algorithm and the value iteration algorithm. When the MDP dynamics are not known, we use Monte Carlo methods. Part of the work is also devoted to the Q-learning algorithm; we present its operation by solving the taxi-v2 problem in Python.
Content provider:
Repozytorium Uniwersytetu Jagiellońskiego
Other
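The Q-learning algorithm mentioned in the abstract learns action values directly from sampled transitions, without knowing the MDP dynamics. A minimal tabular sketch on a toy chain environment (the `step` interface and the chain itself are illustrative assumptions; the thesis uses the taxi-v2 environment):

```python
import random

def q_learning(step, n_states, n_actions, episodes=2000,
               alpha=0.1, gamma=0.95, epsilon=0.1, horizon=50):
    """Tabular Q-learning; `step(s, a) -> (next_state, reward, done)`."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0                                          # episodes start in state 0
        for _ in range(horizon):
            if random.random() < epsilon:              # epsilon-greedy exploration
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])      # temporal-difference update
            s = s2
            if done:
                break
    return Q

# Toy chain: action 1 moves right and reaching state 3 pays 1; action 0 stays.
def chain_step(s, a):
    if a == 1:
        return (s + 1, 1.0, True) if s + 1 == 3 else (s + 1, 0.0, False)
    return s, 0.0, False
```

After training on the chain, the greedy policy prefers moving right in every non-terminal state, since only the terminal state pays a reward.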