---
library_name: numpy
tags:
- Taxi-v3
- reinforcement-learning
- q-learning
- custom-implementation
model-index:
- name: Q-Learning
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Taxi-v3
      type: Taxi-v3
    metrics:
    - type: mean_reward
      name: mean_reward
      value: 7.92 +/- 2.60
      verified: false
---
# Q-Learning Agent for Taxi-v3
This is a tabular Q-Learning agent trained on the Taxi-v3 environment.
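In the tabular setting the agent stores one value per (state, action) pair. Taxi-v3 has 500 discrete states and 6 discrete actions, so the Q-table is a 500 × 6 array. The sketch below mirrors the state encoding described in the Gymnasium Taxi documentation (taxi row, taxi column, passenger location, destination); the helpers `encode_state` and `make_qtable` are hypothetical illustrations, not code from this repository.

```python
import numpy as np

# Taxi-v3 dimensions: 5x5 grid, 5 passenger locations
# (4 colored stops + "in taxi"), 4 destinations, 6 actions.
N_ROWS, N_COLS, N_PASS_LOCS, N_DESTS, N_ACTIONS = 5, 5, 5, 4, 6
N_STATES = N_ROWS * N_COLS * N_PASS_LOCS * N_DESTS  # 500


def encode_state(taxi_row, taxi_col, pass_loc, dest_idx):
    """Hypothetical helper mirroring Gymnasium's Taxi state encoding."""
    return ((taxi_row * N_COLS + taxi_col) * N_PASS_LOCS + pass_loc) * N_DESTS + dest_idx


def make_qtable():
    """One row per state, one column per action, initialised to zero."""
    return np.zeros((N_STATES, N_ACTIONS))


qtable = make_qtable()
s = encode_state(taxi_row=2, taxi_col=3, pass_loc=1, dest_idx=0)
print(qtable.shape, s)  # (500, 6) 264
```

Because every state maps to a single row index, acting greedily is just an `argmax` over one row of this array.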
## Developer
Vishand S (@Vishand03)
## Frameworks
- Python
- NumPy
- Gymnasium
## Training Details
- Algorithm: Q-Learning
- Episodes: 2,000,000
- Max Steps per Episode: 200
- Learning rate (α): 0.1
- Discount factor (γ): 0.99
- Exploration: Epsilon-greedy
- Epsilon decay: 0.0005
- Mean Reward: ~7.92 ± 2.60
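The hyperparameters above plug into the standard Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ · max_a' Q(s', a') - Q(s, a)], with actions chosen epsilon-greedily. The sketch below is illustrative only: the exponential form of the epsilon schedule and the `MAX_EPSILON`/`MIN_EPSILON` bounds are assumptions (the card only states a decay value of 0.0005), and the function names are hypothetical.

```python
import numpy as np

ALPHA = 0.1          # learning rate (from the card)
GAMMA = 0.99         # discount factor (from the card)
DECAY_RATE = 0.0005  # epsilon decay (from the card)
MAX_EPSILON = 1.0    # assumption: not stated in the card
MIN_EPSILON = 0.05   # assumption: not stated in the card


def epsilon_at(episode):
    """Assumed exponential decay from MAX_EPSILON towards MIN_EPSILON."""
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * np.exp(-DECAY_RATE * episode)


def epsilon_greedy(qtable, state, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(qtable.shape[1]))
    return int(np.argmax(qtable[state]))


def q_update(qtable, state, action, reward, next_state, terminated):
    """In-place tabular Q-learning update for one transition."""
    # Do not bootstrap from the next state when the episode has terminated.
    best_next = 0.0 if terminated else np.max(qtable[next_state])
    td_target = reward + GAMMA * best_next
    qtable[state, action] += ALPHA * (td_target - qtable[state, action])


q = np.zeros((500, 6))
q_update(q, state=0, action=1, reward=-1, next_state=5, terminated=False)
print(q[0, 1])  # -0.1
```

In a training loop, `epsilon_at(episode)` would be evaluated once per episode, and `epsilon_greedy` plus `q_update` once per environment step.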
## Usage
```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download

# -------------------------
# Load pretrained model
# -------------------------
model_file = hf_hub_download(repo_id="Vishand03/q-Taxi-v3", filename="q-learning.pkl")
with open(model_file, "rb") as f:
    model = pickle.load(f)

env = gym.make(model["env_id"])

# -------------------------
# Evaluate agent
# -------------------------
def greedy_policy(Qtable, state):
    """Return the action with the highest Q-value for `state`."""
    return int(np.argmax(Qtable[state]))

total_rewards = []
for _ in range(model["n_eval_episodes"]):
    state, _ = env.reset()
    done = False
    episode_reward = 0
    while not done:
        action = greedy_policy(model["qtable"], state)
        state, reward, terminated, truncated, _ = env.step(action)
        episode_reward += reward
        # The episode ends on a successful drop-off (terminated)
        # or when the step limit is reached (truncated).
        done = terminated or truncated
    total_rewards.append(episode_reward)

env.close()

mean_reward = np.mean(total_rewards)
std_reward = np.std(total_rewards)
print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```
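The evaluation script reads only three keys from the pickle: `env_id`, `qtable` and `n_eval_episodes`. If you want to package your own Q-table so the same script can load it, a minimal sketch follows. It assumes the file is a plain dictionary; the `save_model` helper and the default of 100 evaluation episodes are hypothetical, and the published `q-learning.pkl` may store additional keys.

```python
import os
import pickle
import tempfile

import numpy as np


def save_model(path, qtable, env_id="Taxi-v3", n_eval_episodes=100):
    """Hypothetical helper: write a dict with the keys the loader reads."""
    model = {
        "env_id": env_id,
        "qtable": qtable,
        "n_eval_episodes": n_eval_episodes,  # assumed default, not from the card
    }
    with open(path, "wb") as f:
        pickle.dump(model, f)


# Round-trip check in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "q-learning.pkl")
    save_model(path, np.zeros((500, 6)))
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded["env_id"], loaded["qtable"].shape, loaded["n_eval_episodes"])
# Taxi-v3 (500, 6) 100
```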