Vol. 7, No. 1, 1983-01, pp. 1-8
Continuous-Time Markov Decision Processes
- Authors:
許世雄; 任眉眉
- Author affiliation:
Institute of Applied Mathematics, National Tsing Hua University
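- Chinese abstract:
This paper discusses the problem of controlling pure jump Markov processes. We assume a general state space and a finite action space, and discuss the existence of a stationary optimal policy in both the discounted and the undiscounted case. When rewards are discounted, the policy iteration algorithm yields the optimal value but not an optimal policy. We illustrate the numerical computation of the policy iteration algorithm with two examples.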
- English abstract:
In this paper, continuous-time Markov decision processes with general state spaces and finite action spaces are considered. We discuss the problem of controlling pure jump processes in the discounted reward case as well as the average reward case. We show the existence of a stationary optimal policy under suitable conditions and use the policy iteration method to find the optimal value for the discounted reward case. Unlike in the discrete state space case, the method can be used only to approximate the optimal value, not the optimal policy. To conclude, we present two examples that illustrate the numerical computations of the iteration method.
- Chinese keywords:
--
- English keywords:
--