metadata
library_name: numpy
tags:
  - Taxi-v3
  - reinforcement-learning
  - q-learning
  - custom-implementation
model-index:
  - name: Q-Learning
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: Taxi-v3
          type: Taxi-v3
        metrics:
          - type: mean_reward
            name: mean_reward
            value: 7.92 +/- 2.60
            verified: false

🚖 Q-Learning Agent for Taxi-v3

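This is a tabular Q-Learning agent trained on the Taxi-v3 environment. The learned policy is stored as a Q-table indexed by state and action, and the agent acts by picking the highest-valued action for the current state.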
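To make the tabular approach concrete, here is a minimal sketch of how a Taxi-v3 state index maps to a Q-table row. It assumes the state encoding described in the Gymnasium documentation (taxi row, taxi column, passenger location, destination). The helper names `encode`, `decode`, and `empty_qtable` are illustrative and are not part of this repository.

```python
# Taxi-v3 layout as documented by Gymnasium: a 5x5 grid, 5 passenger
# locations (4 landmarks + "in taxi"), and 4 possible destinations.
N_STATES = 5 * 5 * 5 * 4  # 500 discrete states
N_ACTIONS = 6             # south, north, east, west, pickup, drop-off


def encode(taxi_row, taxi_col, passenger, destination):
    """Pack the four state components into a single integer index."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger) * 4 + destination


def decode(state):
    """Inverse of encode(): recover the four components from an index."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger, destination


def empty_qtable():
    """Illustrative helper: a zero-initialised table, one row per state."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]


print(decode(encode(3, 1, 2, 0)))  # (3, 1, 2, 0)
```

Each row of the table holds one value per action, so a lookup such as `qtable[state][action]` is all the agent needs at inference time.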

Developer

Vishand S (@Vishand03)

Frameworks

  • Python
  • NumPy
  • Gymnasium

Training Details

  • Algorithm: Q-Learning
  • Episodes: 2,000,000
  • Max Steps per Episode: 200
  • Learning rate (α): 0.1
  • Discount factor (γ): 0.99
  • Exploration: Epsilon-greedy
  • Epsilon decay: 0.0005
  • Mean Reward: ~7.92 ± 2.60
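The hyperparameters above plug into the standard tabular Q-learning update. The following is a rough sketch rather than the actual training script. The exponential form of the epsilon schedule and the `MAX_EPSILON` / `MIN_EPSILON` values are assumptions, since the card only states a decay of 0.0005. Zeroing the bootstrap term on terminal transitions is likewise a common choice, not something confirmed here. The Q-table is a plain nested list so the sketch has no dependencies.

```python
import math
import random

ALPHA = 0.1          # learning rate, from the training details
GAMMA = 0.99         # discount factor, from the training details
DECAY_RATE = 0.0005  # epsilon decay, from the training details
MAX_EPSILON = 1.0    # assumption: not stated in this card
MIN_EPSILON = 0.05   # assumption: not stated in this card


def epsilon_at(episode):
    """Exploration rate for a given episode (assumed exponential decay)."""
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * math.exp(-DECAY_RATE * episode)


def epsilon_greedy(qtable, state, epsilon, n_actions, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    row = qtable[state]
    return max(range(n_actions), key=lambda a: row[a])


def q_update(qtable, state, action, reward, next_state, terminated):
    """One Q-learning step: move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    # No bootstrapping from a terminal state (assumed convention).
    best_next = 0.0 if terminated else max(qtable[next_state])
    target = reward + GAMMA * best_next
    qtable[state][action] += ALPHA * (target - qtable[state][action])
```

Inside a training loop, `epsilon_at(episode)` would be evaluated once per episode, and `epsilon_greedy` followed by `q_update` would be called on every step until the episode terminates or reaches the 200-step limit.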

🛠 Usage

import gymnasium as gym
import pickle
from huggingface_hub import hf_hub_download

# -------------------------
# Load pretrained model
# -------------------------
model_file = hf_hub_download("Vishand03/q-Taxi-v3", "q-learning.pkl")
with open(model_file, "rb") as f:
    model = pickle.load(f)

env = gym.make(model["env_id"])

# -------------------------
# Evaluate agent
# -------------------------
def greedy_policy(Qtable, state):
    return max(range(len(Qtable[state])), key=lambda a: Qtable[state][a])

total_rewards = []
for _ in range(model["n_eval_episodes"]):
    state, _ = env.reset()
    done = False
    episode_reward = 0
    while not done:
        action = greedy_policy(model["qtable"], state)
        state, reward, terminated, truncated, _ = env.step(action)
        episode_reward += reward
        done = terminated or truncated
    total_rewards.append(episode_reward)

mean_reward = sum(total_rewards) / len(total_rewards)
print(f"Mean Reward: {mean_reward:.2f}")
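The card reports the score as mean ± standard deviation, while the loop above prints only the mean. One possible way to get both numbers from `total_rewards` is sketched below. The `summarize_rewards` helper is hypothetical, the sample rewards are made up purely to show the call, and the use of the population standard deviation is an assumption about how the reported figure was computed.

```python
from statistics import fmean, pstdev


def summarize_rewards(total_rewards):
    """Return (mean, population standard deviation) of episode rewards."""
    return fmean(total_rewards), pstdev(total_rewards)


# Made-up rewards for illustration only; pass the real total_rewards list instead.
mean_reward, std_reward = summarize_rewards([8, 10, 6, 8])
print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
# prints: Mean Reward: 8.00 +/- 1.41
```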