Python Reinforcement Learning Practice: Implementing a Lunar Lander with the PPO Algorithm in PyTorch

Contents
  • PPO algorithm
  • Actor-critic algorithm
  • Gym
  • LunarLander-v2
  • Implementing the lunar lander with PPO
    • PPO

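Overview

Starting today we open a new chapter and will learn (or get dragged into) reinforcement learning together. In reinforcement learning, an agent observes its environment and chooses actions so as to maximize its future return.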


Types of reinforcement learning algorithms


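On-policy vs. off-policy:

  • On-policy: the training data comes from the current agent continually interacting with the environment
  • Off-policy: the agent being trained and the agent interacting with the environment are not the same agent; in other words, someone else interacts with the environment and supplies my training data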

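PPO algorithm

PPO (proximal policy optimization) is an on-policy algorithm. It updates the policy in small steps over mini-batches, which addresses the problem that learning becomes difficult when the new policy drifts too far from the old one during training. In the implementation later in this article, the size of each step is limited by clipping the probability ratio between the new and old policies (the eps_clip parameter).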
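To make the idea of limiting the policy change concrete, here is a minimal pure-Python sketch of the clipped term that the PPO code in this article computes with torch.clamp and torch.min. The function name clipped_surrogate and the sample numbers are illustrative assumptions, not part of the original code; the default eps_clip of 0.2 matches the hyperparameter used in the training script.

```python
def clipped_surrogate(ratio, advantage, eps_clip=0.2):
    """Return min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A) for one sample."""
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    return min(ratio * advantage, clipped_ratio * advantage)


# ratio = pi_new(a|s) / pi_old(a|s); a positive advantage means the action
# turned out better than the critic expected
print(clipped_surrogate(1.5, 2.0))   # ratio capped at 1.2, so the term is 1.2 * 2.0
print(clipped_surrogate(0.5, -1.0))  # ratio floored at 0.8, so the term is 0.8 * -1.0
print(clipped_surrogate(1.1, 2.0))   # inside the clip range, so the term is 1.1 * 2.0
```

Once the ratio leaves the range [1 - eps_clip, 1 + eps_clip] in the direction that would increase the objective, the term stops growing, so a single update has no incentive to push the new policy further away from the old one.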


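Actor-critic algorithm

An actor-critic algorithm has two parts. The first is the policy function, the actor, which generates actions and interacts with the environment. The second is the value function, the critic, which evaluates how well the actor is doing.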
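The division of labor can be sketched without any deep-learning library. In the snippet below the actor turns raw scores into action probabilities and samples from them, while the critic's value estimate is compared with the observed return. All function names and numbers are hypothetical and chosen only for illustration; the PyTorch implementation in this article uses neural networks for both parts.

```python
import math
import random


def softmax(logits):
    """Turn raw actor scores into action probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Actor side: draw an action index according to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def advantage(observed_return, critic_value):
    """Critic side: how much better the outcome was than the critic predicted."""
    return observed_return - critic_value


rng = random.Random(0)
probs = softmax([0.1, 0.5, -0.3, 1.2])  # hypothetical scores for 4 actions
action = sample_action(probs, rng)
print(probs, action, advantage(observed_return=12.0, critic_value=10.0))
```

A positive advantage tells the actor to make the sampled action more likely; a negative one tells it to make the action less likely.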


Gym

Gym is a package that comes up constantly in reinforcement learning; it collects many game environments. Below we use LunarLander-v2 to build an automated "Apollo moon landing".


Installation:

  pip install gym  

If you run into this error:

  AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'

Solution:

  pip install gym[box2d]  

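LunarLander-v2

LunarLander-v2 simulates a lunar lander. The landing pad is at coordinates (0, 0), and the coordinates are the first two numbers of the state vector. Moving from the top of the screen to the landing pad and reaching zero speed earns roughly 100 to 140 points. The episode ends when the lander crashes or comes to rest, which gives an additional -100 or +100 points respectively. Each leg touching the ground is worth +10 points, firing the main engine costs 0.3 points per frame, and the environment counts as solved at 200 points.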
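For reading the raw state vector, a small helper that names its eight entries can be handy. The layout used below (position, velocity, angle, angular velocity, and two leg-contact flags) follows the Gym documentation for this environment; the field names and the function describe_observation are assumptions made for this sketch.

```python
FIELD_NAMES = (
    "x", "y", "vx", "vy", "angle", "angular_velocity",
    "leg1_contact", "leg2_contact",
)


def describe_observation(obs):
    """Map the 8 numbers of a LunarLander-v2 observation to readable names."""
    if len(obs) != len(FIELD_NAMES):
        raise ValueError("expected 8 values, got {}".format(len(obs)))
    named = dict(zip(FIELD_NAMES, (float(v) for v in obs)))
    # The two contact flags are stored as 0.0 / 1.0 in the state vector
    named["leg1_contact"] = named["leg1_contact"] == 1.0
    named["leg2_contact"] = named["leg2_contact"] == 1.0
    return named


# One observation taken from the sample run shown in this article
sample = [0.57930076, -0.09440845, 1.4345247, -0.693939,
          -2.0783656, -5.4039164, 1.0, 0.0]
print(describe_observation(sample))
```

In this sample the lander is below the pad height (y is negative), rotating quickly, and has one leg on the ground, which is consistent with the crash penalty reported for that step.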


Launching the lander

Code:

  import gym

  # Written against the classic Gym API (gym < 0.26), where step() returns four values

  # Create the environment
  env = gym.make("LunarLander-v2")

  # Reset the environment
  env.reset()

  # Run
  for i in range(180):
      # Render the environment
      env.render()

      # Take a random action
      observation, reward, done, info = env.step(env.action_space.sample())

      if i % 10 == 0:
          # Debug output
          print("Observation:", observation)
          print("Reward:", reward)

Output:

Observation: [ 0.00861025 1.4061487 0.42930993 -0.11858992 -0.00789343 -0.05729095
0. 0. ]
Reward: 0.4097546298543773
Observation: [ 0.04917412 1.3876126 0.41002613 -0.13066985 -0.06578191 -0.12604967
0. 0. ]
Reward: -1.0858669952763478
Observation: [ 0.08917055 1.3429415 0.43598312 -0.2890789 -0.17471936 -0.23913136
0. 0. ]
Reward: -2.9339827504803666
Observation: [ 0.1326253 1.2450166 0.44708318 -0.5567949 -0.32039645 -0.28250334
0. 0. ]
Reward: -2.2779730990326357
Observation: [ 0.18323365 1.1110108 0.615291 -0.61922276 -0.43743232 -0.2921057
0. 0. ]
Reward: -3.107298313736037
Observation: [ 0.24544087 0.94960684 0.66677517 -0.7835077 -0.5929364 -0.2968613
0. 0. ]
Reward: -0.5472611013563438
Observation: [ 0.3148238 0.75122666 0.7238519 -0.98458177 -0.72915816 -0.26130882
0. 0. ]
Reward: -2.5665300894414416
Observation: [ 0.38628978 0.49828076 0.74157137 -1.2624744 -0.85754734 -0.37227553
0. 0. ]
Reward: -3.2562193227533087
Observation: [ 0.46820658 0.18855602 0.92624503 -1.4677961 -1.08614 -0.4508995
0. 0. ]
Reward: -4.017106927961208
Observation: [ 0.57930076 -0.09440845 1.4345247 -0.693939 -2.0783656 -5.4039164
1. 0. ]
Reward: -100
Observation: [ 0.7383894 -0.08930686 1.4662493 -0.13461255 -3.653495 -3.109081
0. 0. ]
Reward: -100
Observation: [ 0.859124 -0.08471288 0.9377837 0.21408719 -3.8998525 0.10151418
0. 0. ]
Reward: -100
Observation: [ 9.3801367e-01 -4.6761338e-02 6.5999150e-01 1.4583524e-01
-3.9281998e+00 -4.7179851e-06 0.0000000e+00 1.0000000e+00]
Reward: -100
Observation: [ 0.9879366 -0.04012476 0.33624884 0.08859511 -4.253908 -1.0233303
0. 0. ]
Reward: -100
Observation: [ 1.0056045 -0.03840658 0.0733737 0.01812508 -4.6796274 -0.6103991
0. 0. ]
Reward: -100
Observation: [ 1.0112988 -0.03921754 0.07890484 -0.00624387 -4.845023 -0.17111658
0. 0. ]
Reward: -100
Observation: [ 1.0234139 -0.04488504 0.15701209 -0.0331554 -4.829875 0.07602684
0. 0. ]
Reward: -100
Observation: [ 1.0306002e+00 -4.8987642e-02 -1.1189224e-02 8.7506004e-04
-4.8712435e+00 -1.5446089e-01 0.0000000e+00 0.0000000e+00]
Reward: -100

Implementing the lunar lander with PPO

PPO

  import torch
  import torch.nn as nn
  from torch.distributions import Categorical

  # Use GPU acceleration when available
  device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  print(device)


  class Memory:
      def __init__(self):
          """Initialize the rollout buffers."""
          self.actions = []  # actions (4 possible)
          self.states = []  # states, 8 numbers each
          self.logprobs = []  # log-probabilities of the chosen actions
          self.rewards = []  # rewards
          self.is_terminals = []  # whether the episode ended

      def clear_memory(self):
          """Clear the memory."""
          del self.actions[:]
          del self.states[:]
          del self.logprobs[:]
          del self.rewards[:]
          del self.is_terminals[:]


  class ActorCritic(nn.Module):
      def __init__(self, state_dim, action_dim, n_latent_var):
          super(ActorCritic, self).__init__()

          # Actor
          self.action_layer = nn.Sequential(
              # [b, 8] => [b, 64]
              nn.Linear(state_dim, n_latent_var),
              nn.Tanh(),  # activation

              # [b, 64] => [b, 64]
              nn.Linear(n_latent_var, n_latent_var),
              nn.Tanh(),  # activation

              # [b, 64] => [b, 4]
              nn.Linear(n_latent_var, action_dim),
              nn.Softmax(dim=-1)
          )

          # Critic
          self.value_layer = nn.Sequential(
              # [b, 8] => [b, 64]
              nn.Linear(state_dim, n_latent_var),
              nn.Tanh(),  # activation

              # [b, 64] => [b, 64]
              nn.Linear(n_latent_var, n_latent_var),
              nn.Tanh(),

              # [b, 64] => [b, 1]
              nn.Linear(n_latent_var, 1)
          )

      def forward(self):
          """Forward pass; replaced by act()."""

          raise NotImplementedError

      def act(self, state, memory):
          """Choose an action for one state."""

          # Convert to a tensor
          state = torch.from_numpy(state).float().to(device)

          # Probabilities of the 4 actions
          action_probs = self.action_layer(state)

          # Sample the action from the categorical distribution
          dist = Categorical(action_probs)
          action = dist.sample()

          # Store in memory
          memory.states.append(state)
          memory.actions.append(action)
          memory.logprobs.append(dist.log_prob(action))

          # Return the action
          return action.item()

      def evaluate(self, state, action):
          """
          Evaluate a batch of stored transitions.
          :param state: states, in batches of 2000, shape [2000, 8]
          :param action: actions, in batches of 2000, shape [2000]
          :return: action log-probabilities, state values, distribution entropy
          """

          # Action probabilities
          action_probs = self.action_layer(state)
          dist = Categorical(action_probs)  # wrap as a categorical distribution

          # Log-probability of the actions that were taken
          action_logprobs = dist.log_prob(action)

          # Entropy
          dist_entropy = dist.entropy()

          # Critic
          state_value = self.value_layer(state)
          state_value = torch.squeeze(state_value)  # [2000, 1] => [2000]

          # Return action log-probabilities, critic values, entropy
          return action_logprobs, state_value, dist_entropy


  class PPO:
      def __init__(self, state_dim, action_dim, n_latent_var, lr, betas, gamma, k_epochs, eps_clip):
          self.lr = lr  # learning rate
          self.betas = betas  # Adam betas
          self.gamma = gamma  # discount factor
          self.eps_clip = eps_clip  # clip range for the probability ratio
          self.k_epochs = k_epochs  # optimization epochs per update

          # Initialize the policies
          self.policy = ActorCritic(state_dim, action_dim, n_latent_var).to(device)
          self.policy_old = ActorCritic(state_dim, action_dim, n_latent_var).to(device)
          self.policy_old.load_state_dict(self.policy.state_dict())

          self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=lr, betas=betas)  # optimizer
          self.mseloss = nn.MSELoss()  # loss function for the critic

      def update(self, memory):
          """Update the policy parameters."""

          # Monte Carlo estimate of the returns
          rewards = []
          discounted_reward = 0
          for reward, is_terminal in zip(reversed(memory.rewards), reversed(memory.is_terminals)):
              # Episode boundary
              if is_terminal:
                  discounted_reward = 0

              # Discounted return: current reward + gamma * return of the following step
              discounted_reward = reward + (self.gamma * discounted_reward)

              # Insert at the front
              rewards.insert(0, discounted_reward)

          # Normalize the returns
          rewards = torch.tensor(rewards, dtype=torch.float32).to(device)
          rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-5)

          # Convert lists to tensors
          old_states = torch.stack(memory.states).to(device).detach()
          old_actions = torch.stack(memory.actions).to(device).detach()
          old_logprobs = torch.stack(memory.logprobs).to(device).detach()

          # Optimize for k epochs:
          for _ in range(self.k_epochs):
              # Evaluate
              logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)

              # Probability ratios pi_new / pi_old
              ratios = torch.exp(logprobs - old_logprobs.detach())

              # Loss
              advantages = rewards - state_values.detach()
              surr1 = ratios * advantages
              surr2 = torch.clamp(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages
              loss = -torch.min(surr1, surr2) + 0.5 * self.mseloss(state_values, rewards) - 0.01 * dist_entropy

              # Zero the gradients
              self.optimizer.zero_grad()

              # Backpropagate
              loss.mean().backward()

              # Apply the gradient step
              self.optimizer.step()

          # Copy the new weights into the old policy
          self.policy_old.load_state_dict(self.policy.state_dict())
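The return computation at the start of update is easier to follow on a tiny example. The following pure-Python sketch mirrors that loop and the normalization step; the helper names discounted_returns and normalize, the sample rewards, and the choice of gamma = 0.5 (picked so the numbers are easy to check by hand) are assumptions for illustration only.

```python
def discounted_returns(rewards, is_terminals, gamma=0.99):
    """Compute the discounted return for every step, restarting at episode ends."""
    returns = []
    running = 0.0
    for reward, is_terminal in zip(reversed(rewards), reversed(is_terminals)):
        if is_terminal:
            running = 0.0  # rewards after a terminal step belong to the next episode
        running = reward + gamma * running
        returns.insert(0, running)
    return returns


def normalize(values, eps=1e-5):
    """Shift to zero mean and scale by the sample standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return [(v - mean) / (var ** 0.5 + eps) for v in values]


# Two short episodes stored back to back (hypothetical rewards)
rewards = [1.0, 1.0, 1.0, 2.0, 2.0]
is_terminals = [False, False, True, False, True]
returns = discounted_returns(rewards, is_terminals, gamma=0.5)
print(returns)  # [1.75, 1.5, 1.0, 3.0, 2.0]
print(normalize(returns))
```

Because the buffer is walked backwards, the reset at a terminal flag keeps the rewards of the second episode from leaking into the returns of the first.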

main

  import gym
  import torch
  from ppo import Memory, PPO

  # Written against the classic Gym API (gym < 0.26): reset() returns the
  # observation and step() returns four values

  ############## Hyperparameters ##############
  env_name = "LunarLander-v2"  # environment name
  env = gym.make(env_name)
  state_dim = 8  # state dimension
  action_dim = 4  # action dimension
  render = False  # visualize
  solved_reward = 230  # stop condition (average reward >= 230)
  log_interval = 20  # print avg reward in the interval
  max_episodes = 50000  # maximum number of training episodes
  max_timesteps = 300  # maximum steps per episode
  n_latent_var = 64  # width of the fully connected hidden layers
  update_timestep = 2000  # update the policy every 2000 steps
  lr = 0.002  # learning rate
  betas = (0.9, 0.999)  # Adam betas
  gamma = 0.99  # discount factor
  k_epochs = 4  # optimization epochs per policy update
  eps_clip = 0.2  # PPO clip range


  #############################################

  def main():
      # Instantiate
      memory = Memory()
      ppo = PPO(state_dim, action_dim, n_latent_var, lr, betas, gamma, k_epochs, eps_clip)

      # Running totals
      total_reward = 0
      total_length = 0
      timestep = 0

      # Training
      for i_episode in range(1, max_episodes + 1):

          # Reset the environment (start a new episode)
          state = env.reset()

          # Step through the episode
          for t in range(max_timesteps):
              timestep += 1

              # Choose an action with the old policy
              action = ppo.policy_old.act(state, memory)

              # Act: get (new state, reward, done flag, extra debug info)
              state, reward, done, _ = env.step(action)

              # Store reward and done flag in memory
              memory.rewards.append(reward)
              memory.is_terminals.append(done)

              # Update the policy
              if timestep % update_timestep == 0:
                  ppo.update(memory)

                  # Clear the memory
                  memory.clear_memory()

                  # Reset the step counter
                  timestep = 0

              # Accumulate
              total_reward += reward

              # Visualize
              if render:
                  env.render()

              # Leave the loop when the episode ends
              if done:
                  break

          # Episode length
          total_length += t

          # Stop once the reward accumulated since the last log reaches
          # log_interval * solved_reward (an average of 230 per episode)
          if total_reward >= (log_interval * solved_reward):
              print("########## solved! ##########")

              # Save the model
              torch.save(ppo.policy.state_dict(), './ppo_{}.pth'.format(env_name))

              # Leave the training loop
              break

          # Log every 20 episodes
          if i_episode % log_interval == 0:

              # Average length and reward over the last 20 episodes
              avg_length = int(total_length / log_interval)
              running_reward = int(total_reward / log_interval)

              # Debug output
              print('episode {} \t avg length: {} \t average_reward: {}'.format(i_episode, avg_length, running_reward))

              # Reset the totals
              total_reward = 0
              total_length = 0


  if __name__ == '__main__':
      main()

Output

episode 20 avg length: 93 reward: -243
episode 40 avg length: 92 reward: -172
episode 60 avg length: 79 reward: -192
episode 80 avg length: 85 reward: -164
episode 100 avg length: 90 reward: -179
episode 120 avg length: 100 reward: -201
episode 140 avg length: 91 reward: -175
episode 160 avg length: 101 reward: -141
episode 180 avg length: 86 reward: -153
episode 200 avg length: 93 reward: -189
episode 220 avg length: 96 reward: -221
episode 240 avg length: 105 reward: -140
episode 260 avg length: 94 reward: -121
episode 280 avg length: 91 reward: -131
episode 300 avg length: 91 reward: -122
episode 320 avg length: 90 reward: -113
episode 340 avg length: 100 reward: -110
episode 360 avg length: 110 reward: -92
episode 380 avg length: 110 reward: -75
episode 400 avg length: 119 reward: -76
episode 420 avg length: 162 reward: -77
episode 440 avg length: 194 reward: -91
episode 460 avg length: 144 reward: -28
episode 480 avg length: 192 reward: -8
episode 500 avg length: 244 reward: -25
episode 520 avg length: 239 reward: -1
episode 540 avg length: 269 reward: 21
episode 560 avg length: 289 reward: 27
episode 580 avg length: 270 reward: 65
episode 600 avg length: 264 reward: 86
episode 620 avg length: 256 reward: 66
episode 640 avg length: 278 reward: 75
episode 660 avg length: 235 reward: 11
episode 680 avg length: 244 reward: 84
episode 700 avg length: 253 reward: 73
episode 720 avg length: 292 reward: 63
episode 740 avg length: 293 reward: 104
episode 760 avg length: 279 reward: 109
episode 780 avg length: 246 reward: 86
episode 800 avg length: 260 reward: 124
episode 820 avg length: 276 reward: 131
episode 840 avg length: 269 reward: 121
episode 860 avg length: 194 reward: 67
episode 880 avg length: 241 reward: 94
episode 900 avg length: 259 reward: 98
episode 920 avg length: 211 reward: 83
episode 940 avg length: 260 reward: 105
episode 960 avg length: 194 reward: 65
episode 980 avg length: 202 reward: 68
episode 1000 avg length: 243 reward: 79
episode 1020 avg length: 260 reward: 66
episode 1040 avg length: 289 reward: 117
episode 1060 avg length: 252 reward: 94
episode 1080 avg length: 262 reward: 114
episode 1100 avg length: 272 reward: 112
episode 1120 avg length: 263 reward: 97
episode 1140 avg length: 256 reward: 93
episode 1160 avg length: 274 reward: 120
episode 1180 avg length: 256 reward: 117
episode 1200 avg length: 241 reward: 105
episode 1220 avg length: 238 reward: 103
episode 1240 avg length: 267 reward: 121
episode 1260 avg length: 283 reward: 124
episode 1280 avg length: 299 reward: 149
episode 1300 avg length: 281 reward: 126
episode 1320 avg length: 266 reward: 102
episode 1340 avg length: 282 reward: 128
episode 1360 avg length: 275 reward: 114
episode 1380 avg length: 285 reward: 105
episode 1400 avg length: 294 reward: 123
episode 1420 avg length: 293 reward: 132
episode 1440 avg length: 248 reward: 85
episode 1460 avg length: 281 reward: 115
episode 1480 avg length: 291 reward: 152
episode 1500 avg length: 279 reward: 130
episode 1520 avg length: 267 reward: 103
episode 1540 avg length: 270 reward: 137
episode 1560 avg length: 269 reward: 120
episode 1580 avg length: 260 reward: 113
episode 1600 avg length: 282 reward: 147
episode 1620 avg length: 259 reward: 125
episode 1640 avg length: 240 reward: 90
episode 1660 avg length: 284 reward: 125
episode 1680 avg length: 282 reward: 123
episode 1700 avg length: 274 reward: 123
episode 1720 avg length: 273 reward: 130
episode 1740 avg length: 260 reward: 117
episode 1760 avg length: 243 reward: 106
episode 1780 avg length: 241 reward: 90
episode 1800 avg length: 290 reward: 144
episode 1820 avg length: 258 reward: 131
episode 1840 avg length: 283 reward: 142
episode 1860 avg length: 262 reward: 100
episode 1880 avg length: 273 reward: 132
episode 1900 avg length: 255 reward: 92
episode 1920 avg length: 251 reward: 117
episode 1940 avg length: 220 reward: 103
episode 1960 avg length: 221 reward: 111
episode 1980 avg length: 205 reward: 83
episode 2000 avg length: 227 reward: 102
episode 2020 avg length: 251 reward: 123
episode 2040 avg length: 227 reward: 100
episode 2060 avg length: 255 reward: 135
episode 2080 avg length: 273 reward: 136
episode 2100 avg length: 256 reward: 126
episode 2120 avg length: 273 reward: 141
episode 2140 avg length: 280 reward: 109
episode 2160 avg length: 266 reward: 112
episode 2180 avg length: 249 reward: 88
episode 2200 avg length: 247 reward: 119
episode 2220 avg length: 270 reward: 143
episode 2240 avg length: 257 reward: 65
episode 2260 avg length: 250 reward: 30
episode 2280 avg length: 261 reward: 112
episode 2300 avg length: 270 reward: 139
episode 2320 avg length: 275 reward: 128
episode 2340 avg length: 290 reward: 149
episode 2360 avg length: 269 reward: 139
episode 2380 avg length: 272 reward: 137
episode 2400 avg length: 232 reward: 105
episode 2420 avg length: 242 reward: 127
episode 2440 avg length: 241 reward: 134
episode 2460 avg length: 249 reward: 113
episode 2480 avg length: 287 reward: 154
episode 2500 avg length: 289 reward: 149
episode 2520 avg length: 258 reward: 129
episode 2540 avg length: 250 reward: 101
episode 2560 avg length: 287 reward: 158
episode 2580 avg length: 271 reward: 145
episode 2600 avg length: 253 reward: 120
episode 2620 avg length: 255 reward: 127
episode 2640 avg length: 254 reward: 122
episode 2660 avg length: 238 reward: 123
episode 2680 avg length: 243 reward: 115
episode 2700 avg length: 241 reward: 93
episode 2720 avg length: 232 reward: 90
episode 2740 avg length: 215 reward: 83
episode 2760 avg length: 241 reward: 112
episode 2780 avg length: 273 reward: 129
episode 2800 avg length: 269 reward: 133
episode 2820 avg length: 246 reward: 91
episode 2840 avg length: 261 reward: 130
episode 2860 avg length: 261 reward: 136
episode 2880 avg length: 289 reward: 128
episode 2900 avg length: 271 reward: 131
episode 2920 avg length: 277 reward: 145
episode 2940 avg length: 251 reward: 117
episode 2960 avg length: 253 reward: 120
episode 2980 avg length: 270 reward: 133
episode 3000 avg length: 240 reward: 85
episode 3020 avg length: 284 reward: 141
episode 3040 avg length: 255 reward: 117
episode 3060 avg length: 299 reward: 134
episode 3080 avg length: 263 reward: 122
episode 3100 avg length: 259 reward: 126
episode 3120 avg length: 270 reward: 125
episode 3140 avg length: 299 reward: 150
episode 3160 avg length: 256 reward: 116
episode 3180 avg length: 264 reward: 124
episode 3200 avg length: 271 reward: 128
episode 3220 avg length: 259 reward: 122
episode 3240 avg length: 261 reward: 125
episode 3260 avg length: 271 reward: 129
episode 3280 avg length: 242 reward: 126
episode 3300 avg length: 218 reward: 93
episode 3320 avg length: 230 reward: 116
episode 3340 avg length: 223 reward: 109
episode 3360 avg length: 249 reward: 122
episode 3380 avg length: 224 reward: 104
episode 3400 avg length: 261 reward: 131
episode 3420 avg length: 280 reward: 140
episode 3440 avg length: 264 reward: 125
episode 3460 avg length: 247 reward: 105
episode 3480 avg length: 276 reward: 141
episode 3500 avg length: 282 reward: 149
episode 3520 avg length: 282 reward: 141
episode 3540 avg length: 290 reward: 152
episode 3560 avg length: 282 reward: 141
episode 3580 avg length: 291 reward: 151
episode 3600 avg length: 289 reward: 166
episode 3620 avg length: 266 reward: 142
episode 3640 avg length: 277 reward: 91
episode 3660 avg length: 272 reward: 114
episode 3680 avg length: 281 reward: 159
episode 3700 avg length: 287 reward: 160
episode 3720 avg length: 254 reward: 78
episode 3740 avg length: 296 reward: 174
episode 3760 avg length: 267 reward: 124
episode 3780 avg length: 273 reward: 148
episode 3800 avg length: 275 reward: 147
episode 3820 avg length: 276 reward: 145
episode 3840 avg length: 283 reward: 151
episode 3860 avg length: 275 reward: 142
episode 3880 avg length: 290 reward: 142
episode 3900 avg length: 290 reward: 154
episode 3920 avg length: 283 reward: 141
episode 3940 avg length: 273 reward: 145
episode 3960 avg length: 290 reward: 161
episode 3980 avg length: 268 reward: 145
episode 4000 avg length: 270 reward: 142
episode 4020 avg length: 283 reward: 156
episode 4040 avg length: 283 reward: 149
episode 4060 avg length: 299 reward: 172
episode 4080 avg length: 292 reward: 158
episode 4100 avg length: 274 reward: 143
episode 4120 avg length: 299 reward: 163
episode 4140 avg length: 290 reward: 153
episode 4160 avg length: 299 reward: 165
episode 4180 avg length: 290 reward: 160
episode 4200 avg length: 299 reward: 157
episode 4220 avg length: 299 reward: 171
episode 4240 avg length: 271 reward: 148
episode 4260 avg length: 265 reward: 139
episode 4280 avg length: 258 reward: 137
episode 4300 avg length: 280 reward: 137
episode 4320 avg length: 262 reward: 133
episode 4340 avg length: 255 reward: 110
episode 4360 avg length: 275 reward: 134
episode 4380 avg length: 282 reward: 154
episode 4400 avg length: 264 reward: 128
episode 4420 avg length: 299 reward: 150
episode 4440 avg length: 275 reward: 151
episode 4460 avg length: 257 reward: 116
episode 4480 avg length: 256 reward: 104
episode 4500 avg length: 263 reward: 134
episode 4520 avg length: 299 reward: 164
episode 4540 avg length: 265 reward: 137
episode 4560 avg length: 265 reward: 147
episode 4580 avg length: 283 reward: 138
episode 4600 avg length: 299 reward: 152
episode 4620 avg length: 281 reward: 154
episode 4640 avg length: 289 reward: 161
episode 4660 avg length: 264 reward: 143
episode 4680 avg length: 285 reward: 138
episode 4700 avg length: 291 reward: 143
episode 4720 avg length: 280 reward: 154
episode 4740 avg length: 284 reward: 125
episode 4760 avg length: 296 reward: 136
episode 4780 avg length: 254 reward: 127
episode 4800 avg length: 281 reward: 147
episode 4820 avg length: 282 reward: 143
episode 4840 avg length: 243 reward: 119
episode 4860 avg length: 280 reward: 139
episode 4880 avg length: 270 reward: 137
episode 4900 avg length: 278 reward: 150
episode 4920 avg length: 203 reward: 83
episode 4940 avg length: 272 reward: 153
episode 4960 avg length: 289 reward: 151
episode 4980 avg length: 289 reward: 157
episode 5000 avg length: 299 reward: 168
episode 5020 avg length: 292 reward: 136
episode 5040 avg length: 290 reward: 158
episode 5060 avg length: 286 reward: 157
episode 5080 avg length: 282 reward: 154
episode 5100 avg length: 278 reward: 121
episode 5120 avg length: 291 reward: 138
episode 5140 avg length: 297 reward: 143
episode 5160 avg length: 290 reward: 165
episode 5180 avg length: 290 reward: 157
episode 5200 avg length: 276 reward: 150
episode 5220 avg length: 278 reward: 149
episode 5240 avg length: 287 reward: 153
episode 5260 avg length: 274 reward: 145
episode 5280 avg length: 299 reward: 176
episode 5300 avg length: 299 reward: 173
episode 5320 avg length: 299 reward: 164
episode 5340 avg length: 271 reward: 157
episode 5360 avg length: 299 reward: 180
episode 5380 avg length: 279 reward: 156
episode 5400 avg length: 268 reward: 133
episode 5420 avg length: 279 reward: 136
episode 5440 avg length: 278 reward: 130
episode 5460 avg length: 268 reward: 137
episode 5480 avg length: 273 reward: 152
episode 5500 avg length: 299 reward: 168
episode 5520 avg length: 266 reward: 95
episode 5540 avg length: 294 reward: 146
episode 5560 avg length: 289 reward: 165
episode 5580 avg length: 288 reward: 139
episode 5600 avg length: 299 reward: 174
episode 5620 avg length: 291 reward: 168
episode 5640 avg length: 281 reward: 147
episode 5660 avg length: 270 reward: 126
episode 5680 avg length: 263 reward: 153
episode 5700 avg length: 283 reward: 161
episode 5720 avg length: 271 reward: 154
episode 5740 avg length: 281 reward: 154
episode 5760 avg length: 281 reward: 144
episode 5780 avg length: 272 reward: 145
episode 5800 avg length: 275 reward: 128
episode 5820 avg length: 290 reward: 159
episode 5840 avg length: 274 reward: 142
episode 5860 avg length: 243 reward: 122
episode 5880 avg length: 236 reward: 124
episode 5900 avg length: 255 reward: 139
episode 5920 avg length: 288 reward: 140
episode 5940 avg length: 271 reward: 140
episode 5960 avg length: 254 reward: 108
episode 5980 avg length: 299 reward: 149
episode 6000 avg length: 289 reward: 149
episode 6020 avg length: 258 reward: 109
episode 6040 avg length: 289 reward: 129
episode 6060 avg length: 238 reward: 94
episode 6080 avg length: 270 reward: 87
episode 6100 avg length: 268 reward: 96
episode 6120 avg length: 279 reward: 142
episode 6140 avg length: 233 reward: 112
episode 6160 avg length: 268 reward: 142
episode 6180 avg length: 260 reward: 133
episode 6200 avg length: 210 reward: 109
episode 6220 avg length: 248 reward: 111
episode 6240 avg length: 229 reward: 92
episode 6260 avg length: 210 reward: 98
episode 6280 avg length: 218 reward: 102
episode 6300 avg length: 225 reward: 117
episode 6320 avg length: 235 reward: 112
episode 6340 avg length: 259 reward: 124
episode 6360 avg length: 252 reward: 113
episode 6380 avg length: 239 reward: 119
episode 6400 avg length: 242 reward: 95
episode 6420 avg length: 249 reward: 111
episode 6440 avg length: 257 reward: 136
episode 6460 avg length: 259 reward: 123
episode 6480 avg length: 259 reward: 112
episode 6500 avg length: 259 reward: 129
episode 6520 avg length: 215 reward: 101
episode 6540 avg length: 249 reward: 137
episode 6560 avg length: 245 reward: 121
episode 6580 avg length: 259 reward: 127
episode 6600 avg length: 267 reward: 142
episode 6620 avg length: 257 reward: 86
episode 6640 avg length: 278 reward: 141
episode 6660 avg length: 255 reward: 92
episode 6680 avg length: 289 reward: 145
episode 6700 avg length: 259 reward: 133
episode 6720 avg length: 247 reward: 116
episode 6740 avg length: 243 reward: 56
episode 6760 avg length: 274 reward: 114
episode 6780 avg length: 279 reward: 133
episode 6800 avg length: 269 reward: 152
episode 6820 avg length: 252 reward: 105
episode 6840 avg length: 254 reward: 123
episode 6860 avg length: 253 reward: 98
episode 6880 avg length: 273 reward: 132
episode 6900 avg length: 249 reward: 108
episode 6920 avg length: 248 reward: 84
episode 6940 avg length: 250 reward: 107
episode 6960 avg length: 279 reward: 99
episode 6980 avg length: 279 reward: 140
episode 7000 avg length: 270 reward: 105
episode 7020 avg length: 250 reward: 109
episode 7040 avg length: 202 reward: 87
episode 7060 avg length: 188 reward: 56
episode 7080 avg length: 229 reward: 93
episode 7100 avg length: 248 reward: 105
episode 7120 avg length: 218 reward: 105
episode 7140 avg length: 213 reward: 77
episode 7160 avg length: 279 reward: 128
episode 7180 avg length: 247 reward: 110
episode 7200 avg length: 269 reward: 124
episode 7220 avg length: 217 reward: 64
episode 7240 avg length: 258 reward: 140
episode 7260 avg length: 279 reward: 116
episode 7280 avg length: 244 reward: 97
episode 7300 avg length: 245 reward: 104
episode 7320 avg length: 213 reward: 81
episode 7340 avg length: 268 reward: 126
episode 7360 avg length: 277 reward: 124
episode 7380 avg length: 251 reward: 122
episode 7400 avg length: 234 reward: 108
episode 7420 avg length: 267 reward: 127
episode 7440 avg length: 218 reward: 89
episode 7460 avg length: 199 reward: 80
episode 7480 avg length: 154 reward: 55
episode 7500 avg length: 228 reward: 114
episode 7520 avg length: 197 reward: 49
episode 7540 avg length: 147 reward: 59
episode 7560 avg length: 139 reward: 49
episode 7580 avg length: 181 reward: 74
episode 7600 avg length: 191 reward: 61
episode 7620 avg length: 176 reward: 78
episode 7640 avg length: 160 reward: 35
episode 7660 avg length: 159 reward: 50
episode 7680 avg length: 143 reward: 68
episode 7700 avg length: 227 reward: 103
episode 7720 avg length: 192 reward: 59
episode 7740 avg length: 248 reward: 118
episode 7760 avg length: 250 reward: 128
episode 7780 avg length: 261 reward: 110
episode 7800 avg length: 279 reward: 157
episode 7820 avg length: 249 reward: 153
episode 7840 avg length: 212 reward: 78
episode 7860 avg length: 249 reward: 144
episode 7880 avg length: 257 reward: 107
episode 7900 avg length: 271 reward: 136
episode 7920 avg length: 244 reward: 129
episode 7940 avg length: 262 reward: 145
episode 7960 avg length: 224 reward: 94
episode 7980 avg length: 247 reward: 110
episode 8000 avg length: 190 reward: 81
episode 8020 avg length: 157 reward: 67
episode 8040 avg length: 171 reward: 67
episode 8060 avg length: 203 reward: 96
episode 8080 avg length: 225 reward: 87
episode 8100 avg length: 166 reward: 84
episode 8120 avg length: 196 reward: 82
episode 8140 avg length: 249 reward: 120
episode 8160 avg length: 216 reward: 112
episode 8180 avg length: 178 reward: 97
episode 8200 avg length: 221 reward: 120
episode 8220 avg length: 265 reward: 122
episode 8240 avg length: 240 reward: 125
episode 8260 avg length: 266 reward: 146
episode 8280 avg length: 253 reward: 116
episode 8300 avg length: 233 reward: 129
episode 8320 avg length: 260 reward: 126
episode 8340 avg length: 264 reward: 138
episode 8360 avg length: 196 reward: 88
episode 8380 avg length: 189 reward: 60
episode 8400 avg length: 227 reward: 66
episode 8420 avg length: 257 reward: 114
episode 8440 avg length: 254 reward: 99
episode 8460 avg length: 268 reward: 127
episode 8480 avg length: 263 reward: 131
episode 8500 avg length: 246 reward: 107
episode 8520 avg length: 281 reward: 127
episode 8540 avg length: 273 reward: 146
episode 8560 avg length: 290 reward: 124
episode 8580 avg length: 261 reward: 103
episode 8600 avg length: 294 reward: 140
episode 8620 avg length: 236 reward: 110
episode 8640 avg length: 261 reward: 125
episode 8660 avg length: 284 reward: 108
episode 8680 avg length: 278 reward: 141
episode 8700 avg length: 256 reward: 124
episode 8720 avg length: 245 reward: 95
episode 8740 avg length: 258 reward: 136
episode 8760 avg length: 289 reward: 147
episode 8780 avg length: 229 reward: 98
episode 8800 avg length: 277 reward: 138
episode 8820 avg length: 237 reward: 129
episode 8840 avg length: 276 reward: 141
episode 8860 avg length: 224 reward: 102
episode 8880 avg length: 220 reward: 108
episode 8900 avg length: 277 reward: 137
episode 8920 avg length: 259 reward: 120
episode 8940 avg length: 242 reward: 124
episode 8960 avg length: 275 reward: 119
episode 8980 avg length: 256 reward: 140
episode 9000 avg length: 263 reward: 110
episode 9020 avg length: 247 reward: 101
episode 9040 avg length: 251 reward: 99
episode 9060 avg length: 266 reward: 128
episode 9080 avg length: 247 reward: 119
episode 9100 avg length: 227 reward: 95
episode 9120 avg length: 242 reward: 95
episode 9140 avg length: 234 reward: 120
episode 9160 avg length: 271 reward: 145
episode 9180 avg length: 234 reward: 106
episode 9200 avg length: 230 reward: 102
episode 9220 avg length: 217 reward: 111
episode 9240 avg length: 182 reward: 68
episode 9260 avg length: 225 reward: 111
episode 9280 avg length: 224 reward: 110
episode 9300 avg length: 195 reward: 97
episode 9320 avg length: 245 reward: 110
episode 9340 avg length: 249 reward: 87
episode 9360 avg length: 238 reward: 105
episode 9380 avg length: 231 reward: 83
episode 9400 avg length: 245 reward: 60
episode 9420 avg length: 251 reward: 81
episode 9440 avg length: 218 reward: 86
episode 9460 avg length: 177 reward: 62
episode 9480 avg length: 212 reward: 64
episode 9500 avg length: 213 reward: 96
episode 9520 avg length: 267 reward: 121
episode 9540 avg length: 195 reward: 89
episode 9560 avg length: 259 reward: 140
episode 9580 avg length: 246 reward: 116
episode 9600 avg length: 266 reward: 122
episode 9620 avg length: 255 reward: 104
episode 9640 avg length: 203 reward: 116
episode 9660 avg length: 239 reward: 117
episode 9680 avg length: 239 reward: 118
episode 9700 avg length: 254 reward: 137
episode 9720 avg length: 269 reward: 144
episode 9740 avg length: 274 reward: 136
episode 9760 avg length: 259 reward: 123
episode 9780 avg length: 230 reward: 102
episode 9800 avg length: 268 reward: 139
episode 9820 avg length: 258 reward: 120
episode 9840 avg length: 271 reward: 111
episode 9860 avg length: 260 reward: 130
episode 9880 avg length: 280 reward: 135
episode 9900 avg length: 269 reward: 126
episode 9920 avg length: 290 reward: 159
episode 9940 avg length: 286 reward: 129
episode 9960 avg length: 259 reward: 117
episode 9980 avg length: 299 reward: 139
episode 10000 avg length: 298 reward: 141
episode 10020 avg length: 294 reward: 115
episode 10040 avg length: 284 reward: 117
episode 10060 avg length: 299 reward: 156
episode 10080 avg length: 290 reward: 145
episode 10100 avg length: 280 reward: 151
episode 10120 avg length: 299 reward: 163
episode 10140 avg length: 290 reward: 151
episode 10160 avg length: 269 reward: 133
episode 10180 avg length: 259 reward: 134
episode 10200 avg length: 272 reward: 137
episode 10220 avg length: 260 reward: 121
episode 10240 avg length: 259 reward: 103
episode 10260 avg length: 260 reward: 126
episode 10280 avg length: 279 reward: 150
episode 10300 avg length: 268 reward: 128
episode 10320 avg length: 261 reward: 140
episode 10340 avg length: 243 reward: 111
episode 10360 avg length: 236 reward: 113
episode 10380 avg length: 219 reward: 112
episode 10400 avg length: 267 reward: 140
episode 10420 avg length: 279 reward: 146
episode 10440 avg length: 285 reward: 137
episode 10460 avg length: 255 reward: 107
episode 10480 avg length: 249 reward: 115
episode 10500 avg length: 241 reward: 106
episode 10520 avg length: 219 reward: 102
episode 10540 avg length: 200 reward: 52
episode 10560 avg length: 267 reward: 124
episode 10580 avg length: 235 reward: 111
episode 10600 avg length: 223 reward: 86
episode 10620 avg length: 220 reward: 90
episode 10640 avg length: 269 reward: 145
episode 10660 avg length: 255 reward: 133
episode 10680 avg length: 277 reward: 130
episode 10700 avg length: 280 reward: 142
episode 10720 avg length: 278 reward: 128
episode 10740 avg length: 260 reward: 90
episode 10760 avg length: 288 reward: 145
episode 10780 avg length: 238 reward: 94
episode 10800 avg length: 278 reward: 136
episode 10820 avg length: 288 reward: 150
episode 10840 avg length: 280 reward: 148
episode 10860 avg length: 240 reward: 117
episode 10880 avg length: 257 reward: 124
episode 10900 avg length: 261 reward: 130
episode 10920 avg length: 229 reward: 115
episode 10940 avg length: 259 reward: 144
episode 10960 avg length: 238 reward: 138
episode 10980 avg length: 230 reward: 112
episode 11000 avg length: 254 reward: 126
episode 11020 avg length: 281 reward: 141
episode 11040 avg length: 270 reward: 120
episode 11060 avg length: 297 reward: 174
episode 11080 avg length: 261 reward: 138
episode 11100 avg length: 259 reward: 125
episode 11120 avg length: 292 reward: 173
episode 11140 avg length: 275 reward: 146
episode 11160 avg length: 299 reward: 165
episode 11180 avg length: 299 reward: 175
episode 11200 avg length: 289 reward: 161
episode 11220 avg length: 299 reward: 166
episode 11240 avg length: 278 reward: 160
episode 11260 avg length: 290 reward: 142
episode 11280 avg length: 299 reward: 164
episode 11300 avg length: 279 reward: 155
episode 11320 avg length: 299 reward: 178
episode 11340 avg length: 299 reward: 150
episode 11360 avg length: 265 reward: 110
episode 11380 avg length: 288 reward: 156
episode 11400 avg length: 278 reward: 146
episode 11420 avg length: 268 reward: 141
episode 11440 avg length: 291 reward: 130
episode 11460 avg length: 299 reward: 161
episode 11480 avg length: 284 reward: 142
episode 11500 avg length: 262 reward: 132
episode 11520 avg length: 287 reward: 149
episode 11540 avg length: 288 reward: 150
episode 11560 avg length: 288 reward: 157
episode 11580 avg length: 288 reward: 156
episode 11600 avg length: 284 reward: 133
episode 11620 avg length: 287 reward: 152
episode 11640 avg length: 249 reward: 130
episode 11660 avg length: 240 reward: 106
episode 11680 avg length: 271 reward: 131
episode 11700 avg length: 271 reward: 117
episode 11720 avg length: 286 reward: 143
episode 11740 avg length: 293 reward: 150
episode 11760 avg length: 289 reward: 155
episode 11780 avg length: 290 reward: 137
episode 11800 avg length: 289 reward: 133
episode 11820 avg length: 273 reward: 121
episode 11840 avg length: 274 reward: 109
episode 11860 avg length: 261 reward: 147
episode 11880 avg length: 210 reward: 114
episode 11900 avg length: 245 reward: 143
episode 11920 avg length: 210 reward: 115
episode 11940 avg length: 218 reward: 102
episode 11960 avg length: 214 reward: 102
episode 11980 avg length: 269 reward: 133
episode 12000 avg length: 262 reward: 144
episode 12020 avg length: 235 reward: 131
episode 12040 avg length: 253 reward: 149
episode 12060 avg length: 227 reward: 120
episode 12080 avg length: 202 reward: 98
episode 12100 avg length: 240 reward: 117
episode 12120 avg length: 231 reward: 108
episode 12140 avg length: 230 reward: 122
episode 12160 avg length: 228 reward: 108
episode 12180 avg length: 233 reward: 96
episode 12200 avg length: 252 reward: 123
episode 12220 avg length: 272 reward: 154
episode 12240 avg length: 251 reward: 122
episode 12260 avg length: 273 reward: 147
episode 12280 avg length: 239 reward: 111
episode 12300 avg length: 287 reward: 126
episode 12320 avg length: 278 reward: 121
episode 12340 avg length: 258 reward: 120
episode 12360 avg length: 265 reward: 104
episode 12380 avg length: 279 reward: 118
episode 12400 avg length: 254 reward: 72
episode 12420 avg length: 187 reward: 74
episode 12440 avg length: 244 reward: 90
episode 12460 avg length: 228 reward: 116
episode 12480 avg length: 258 reward: 125
episode 12500 avg length: 247 reward: 118
episode 12520 avg length: 244 reward: 101
episode 12540 avg length: 267 reward: 135
episode 12560 avg length: 253 reward: 99
episode 12580 avg length: 285 reward: 135
episode 12600 avg length: 259 reward: 113
episode 12620 avg length: 256 reward: 108
episode 12640 avg length: 238 reward: 114
episode 12660 avg length: 265 reward: 128
episode 12680 avg length: 289 reward: 145
episode 12700 avg length: 287 reward: 147
episode 12720 avg length: 283 reward: 139
episode 12740 avg length: 255 reward: 108
episode 12760 avg length: 299 reward: 150
episode 12780 avg length: 277 reward: 138
episode 12800 avg length: 290 reward: 151
episode 12820 avg length: 284 reward: 159
episode 12840 avg length: 299 reward: 150
episode 12860 avg length: 289 reward: 146
episode 12880 avg length: 299 reward: 158
episode 12900 avg length: 299 reward: 144
episode 12920 avg length: 279 reward: 129
episode 12940 avg length: 282 reward: 132
episode 12960 avg length: 280 reward: 132
episode 12980 avg length: 278 reward: 108
episode 13000 avg length: 284 reward: 136
episode 13020 avg length: 289 reward: 128
episode 13040 avg length: 291 reward: 149
episode 13060 avg length: 299 reward: 140
episode 13080 avg length: 292 reward: 141
episode 13100 avg length: 290 reward: 139
episode 13120 avg length: 299 reward: 139
episode 13140 avg length: 291 reward: 151
episode 13160 avg length: 291 reward: 141
episode 13180 avg length: 299 reward: 169
episode 13200 avg length: 299 reward: 162
episode 13220 avg length: 299 reward: 170
episode 13240 avg length: 299 reward: 170
episode 13260 avg length: 299 reward: 155
episode 13280 avg length: 299 reward: 153
episode 13300 avg length: 299 reward: 163
episode 13320 avg length: 281 reward: 131
episode 13340 avg length: 289 reward: 153
episode 13360 avg length: 285 reward: 133
episode 13380 avg length: 280 reward: 134
episode 13400 avg length: 282 reward: 134
episode 13420 avg length: 268 reward: 114
episode 13440 avg length: 290 reward: 142
episode 13460 avg length: 270 reward: 145
episode 13480 avg length: 257 reward: 127
episode 13500 avg length: 272 reward: 139
episode 13520 avg length: 270 reward: 129
episode 13540 avg length: 279 reward: 149
episode 13560 avg length: 269 reward: 95
episode 13580 avg length: 270 reward: 113
episode 13600 avg length: 258 reward: 125
episode 13620 avg length: 217 reward: 88
episode 13640 avg length: 157 reward: 59
episode 13660 avg length: 132 reward: 41
episode 13680 avg length: 220 reward: 92
episode 13700 avg length: 241 reward: 109
episode 13720 avg length: 252 reward: 127
episode 13740 avg length: 253 reward: 104
episode 13760 avg length: 269 reward: 128
episode 13780 avg length: 230 reward: 96
episode 13800 avg length: 258 reward: 127
episode 13820 avg length: 290 reward: 151
episode 13840 avg length: 299 reward: 135
episode 13860 avg length: 280 reward: 111
episode 13880 avg length: 268 reward: 124
episode 13900 avg length: 255 reward: 93
episode 13920 avg length: 258 reward: 128
episode 13940 avg length: 244 reward: 127
episode 13960 avg length: 238 reward: 117
episode 13980 avg length: 237 reward: 104
episode 14000 avg length: 251 reward: 123
episode 14020 avg length: 267 reward: 114
episode 14040 avg length: 271 reward: 109
episode 14060 avg length: 247 reward: 117
episode 14080 avg length: 282 reward: 129
episode 14100 avg length: 266 reward: 144
episode 14120 avg length: 256 reward: 132
episode 14140 avg length: 267 reward: 140
episode 14160 avg length: 289 reward: 149
episode 14180 avg length: 262 reward: 95
episode 14200 avg length: 278 reward: 128
episode 14220 avg length: 279 reward: 136
episode 14240 avg length: 249 reward: 105
episode 14260 avg length: 235 reward: 112
episode 14280 avg length: 273 reward: 131
episode 14300 avg length: 278 reward: 130
episode 14320 avg length: 259 reward: 123
episode 14340 avg length: 234 reward: 78
episode 14360 avg length: 268 reward: 125
episode 14380 avg length: 294 reward: 153
episode 14400 avg length: 299 reward: 150
episode 14420 avg length: 278 reward: 129
episode 14440 avg length: 297 reward: 155
episode 14460 avg length: 247 reward: 106
episode 14480 avg length: 289 reward: 154
episode 14500 avg length: 270 reward: 133
episode 14520 avg length: 259 reward: 133
episode 14540 avg length: 280 reward: 151
episode 14560 avg length: 268 reward: 129
episode 14580 avg length: 299 reward: 159
episode 14600 avg length: 279 reward: 131
episode 14620 avg length: 242 reward: 100
episode 14640 avg length: 236 reward: 114
episode 14660 avg length: 253 reward: 132
episode 14680 avg length: 272 reward: 134
episode 14700 avg length: 297 reward: 175
episode 14720 avg length: 278 reward: 148
episode 14740 avg length: 289 reward: 154
episode 14760 avg length: 288 reward: 148
episode 14780 avg length: 278 reward: 140
episode 14800 avg length: 266 reward: 128
episode 14820 avg length: 288 reward: 161
episode 14840 avg length: 278 reward: 145
episode 14860 avg length: 290 reward: 161
episode 14880 avg length: 279 reward: 139
episode 14900 avg length: 284 reward: 155
episode 14920 avg length: 245 reward: 136
episode 14940 avg length: 269 reward: 137
episode 14960 avg length: 262 reward: 146
episode 14980 avg length: 299 reward: 154
episode 15000 avg length: 273 reward: 172
episode 15020 avg length: 278 reward: 142
episode 15040 avg length: 277 reward: 150
episode 15060 avg length: 232 reward: 119
episode 15080 avg length: 280 reward: 141
episode 15100 avg length: 260 reward: 137
episode 15120 avg length: 285 reward: 167
episode 15140 avg length: 280 reward: 149
episode 15160 avg length: 237 reward: 118
episode 15180 avg length: 223 reward: 111
episode 15200 avg length: 243 reward: 134
episode 15220 avg length: 269 reward: 138
episode 15240 avg length: 251 reward: 127
episode 15260 avg length: 289 reward: 157
episode 15280 avg length: 229 reward: 107
episode 15300 avg length: 277 reward: 143
episode 15320 avg length: 288 reward: 154
episode 15340 avg length: 289 reward: 149
episode 15360 avg length: 288 reward: 145
episode 15380 avg length: 260 reward: 134
episode 15400 avg length: 246 reward: 126
episode 15420 avg length: 244 reward: 132
episode 15440 avg length: 272 reward: 129
episode 15460 avg length: 267 reward: 134
episode 15480 avg length: 263 reward: 135
episode 15500 avg length: 280 reward: 141
episode 15520 avg length: 254 reward: 126
episode 15540 avg length: 275 reward: 133
episode 15560 avg length: 271 reward: 120
episode 15580 avg length: 270 reward: 130
episode 15600 avg length: 299 reward: 144
episode 15620 avg length: 254 reward: 88
episode 15640 avg length: 271 reward: 126
episode 15660 avg length: 289 reward: 153
episode 15680 avg length: 231 reward: 104
episode 15700 avg length: 227 reward: 127
episode 15720 avg length: 174 reward: 82
episode 15740 avg length: 214 reward: 92
episode 15760 avg length: 190 reward: 89
episode 15780 avg length: 159 reward: 49
episode 15800 avg length: 222 reward: 100
episode 15820 avg length: 269 reward: 133
episode 15840 avg length: 243 reward: 100
episode 15860 avg length: 191 reward: 68
episode 15880 avg length: 221 reward: 86
episode 15900 avg length: 206 reward: 109
episode 15920 avg length: 228 reward: 89
episode 15940 avg length: 250 reward: 108
episode 15960 avg length: 229 reward: 110
episode 15980 avg length: 263 reward: 139
episode 16000 avg length: 250 reward: 125
episode 16020 avg length: 270 reward: 140
episode 16040 avg length: 251 reward: 131
episode 16060 avg length: 258 reward: 124
episode 16080 avg length: 268 reward: 130
episode 16100 avg length: 263 reward: 125
episode 16120 avg length: 280 reward: 150
episode 16140 avg length: 267 reward: 132
episode 16160 avg length: 284 reward: 137
episode 16180 avg length: 275 reward: 128
episode 16200 avg length: 269 reward: 132
episode 16220 avg length: 280 reward: 132
episode 16240 avg length: 279 reward: 145
episode 16260 avg length: 299 reward: 152
episode 16280 avg length: 238 reward: 112
episode 16300 avg length: 284 reward: 159
episode 16320 avg length: 280 reward: 136
episode 16340 avg length: 271 reward: 120
episode 16360 avg length: 281 reward: 139
episode 16380 avg length: 267 reward: 141
episode 16400 avg length: 299 reward: 164
episode 16420 avg length: 239 reward: 113
episode 16440 avg length: 276 reward: 143
episode 16460 avg length: 268 reward: 144
episode 16480 avg length: 269 reward: 134
episode 16500 avg length: 273 reward: 148
episode 16520 avg length: 247 reward: 97
episode 16540 avg length: 266 reward: 129
episode 16560 avg length: 267 reward: 119
episode 16580 avg length: 270 reward: 124
episode 16600 avg length: 262 reward: 101
episode 16620 avg length: 257 reward: 121
episode 16640 avg length: 233 reward: 99
episode 16660 avg length: 268 reward: 114
episode 16680 avg length: 261 reward: 126
episode 16700 avg length: 278 reward: 143
episode 16720 avg length: 278 reward: 117
episode 16740 avg length: 266 reward: 135
episode 16760 avg length: 282 reward: 140
episode 16780 avg length: 299 reward: 154
episode 16800 avg length: 279 reward: 144
episode 16820 avg length: 281 reward: 124
episode 16840 avg length: 280 reward: 132
episode 16860 avg length: 278 reward: 148
episode 16880 avg length: 280 reward: 113
episode 16900 avg length: 268 reward: 133
episode 16920 avg length: 291 reward: 147
episode 16940 avg length: 274 reward: 150
episode 16960 avg length: 281 reward: 137
episode 16980 avg length: 251 reward: 126
episode 17000 avg length: 261 reward: 135
episode 17020 avg length: 267 reward: 105
episode 17040 avg length: 274 reward: 176
episode 17060 avg length: 262 reward: 131
episode 17080 avg length: 186 reward: 184
episode 17100 avg length: 225 reward: 150
episode 17120 avg length: 201 reward: 218
episode 17140 avg length: 211 reward: 220
episode 17160 avg length: 221 reward: 218
episode 17180 avg length: 232 reward: 210
episode 17200 avg length: 216 reward: 220
episode 17220 avg length: 226 reward: 203
episode 17240 avg length: 198 reward: 170
episode 17260 avg length: 196 reward: 222
episode 17280 avg length: 214 reward: 196
episode 17300 avg length: 229 reward: 205
episode 17320 avg length: 183 reward: 192
episode 17340 avg length: 212 reward: 186
episode 17360 avg length: 192 reward: 164
########## solved! ##########
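
The per-interval rewards above are noisy: they dip to 41 around episode 13660 before climbing past 200 near episode 17120. A small helper for reading such a log is sketched below. It is not part of the original training script: the names `parse_log` and `moving_average` are hypothetical, and the only assumption is the line format shown in the output above.

```python
import re

# Matches lines such as "episode 17360 avg length: 192 reward: 164".
# The pattern is inferred from the log above, not taken from the training code.
LOG_LINE = re.compile(r"episode (\d+) avg length: (\d+) reward: (-?\d+)")


def parse_log(text):
    """Return a list of (episode, avg_length, reward) tuples found in the log text."""
    rows = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:  # skip non-matching lines such as the "solved!" banner
            rows.append(tuple(int(group) for group in match.groups()))
    return rows


def moving_average(values, window):
    """Trailing moving average; early entries average over fewer values."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


sample = """episode 17320 avg length: 183 reward: 192
episode 17340 avg length: 212 reward: 186
episode 17360 avg length: 192 reward: 164
########## solved! ##########"""

rows = parse_log(sample)
rewards = [reward for _, _, reward in rows]
smoothed = moving_average(rewards, window=2)
print(rows)
print(smoothed)
```

Running it over the full log with a larger window (for example 10 entries, i.e. 200 episodes) makes the overall upward trend easier to see than the raw per-interval numbers.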

