---
library_name: numpy
tags:
- Taxi-v3
- reinforcement-learning
- q-learning
- custom-implementation
model-index:
- name: Q-Learning
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Taxi-v3
      type: Taxi-v3
    metrics:
    - type: mean_reward
      name: mean_reward
      value: 7.92 +/- 2.60
      verified: false
---

# 🚖 Q-Learning Agent for Taxi-v3

This is a trained **Q-Learning agent** for the **Taxi-v3** environment using a **tabular approach**.

## Developer
**Vishand S (@Vishand03)**

## Frameworks
- Python
- NumPy
- Gymnasium

## Training Details
- Algorithm: Q-Learning
- Episodes: 2,000,000
- Max Steps per Episode: 200
- Learning rate (α): 0.1
- Discount factor (γ): 0.99
- Exploration: Epsilon-greedy
- Epsilon decay: 0.0005
- Mean Reward: ~7.92 ± 2.60

---

## 🛠 Usage

```python
import gymnasium as gym
import pickle
from huggingface_hub import hf_hub_download

# -------------------------
# Load pretrained model
# -------------------------
model_file = hf_hub_download("Vishand03/q-Taxi-v3", "q-learning.pkl")
with open(model_file, "rb") as f:
    model = pickle.load(f)

env = gym.make(model["env_id"])

# -------------------------
# Evaluate agent
# -------------------------
def greedy_policy(Qtable, state):
    return max(range(len(Qtable[state])), key=lambda a: Qtable[state][a])

total_rewards = []
for _ in range(model["n_eval_episodes"]):
    state, _ = env.reset()
    done = False
    episode_reward = 0
    while not done:
        action = greedy_policy(model["qtable"], state)
        state, reward, terminated, truncated, _ = env.step(action)
        episode_reward += reward
        done = terminated or truncated
    total_rewards.append(episode_reward)

mean_reward = sum(total_rewards) / len(total_rewards)
print(f"Mean Reward: {mean_reward:.2f}")
```
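
For readers who want to see how the hyperparameters under Training Details fit together, here is a minimal sketch of a tabular Q-Learning step. It is not the training script used for this model. The values of α, γ, and the decay rate come from this card. The exponential form of the epsilon schedule, the `MAX_EPSILON` and `MIN_EPSILON` values, and the function names `epsilon_at`, `epsilon_greedy`, and `q_update` are assumptions made for illustration.

```python
import numpy as np

ALPHA = 0.1          # learning rate (from Training Details)
GAMMA = 0.99         # discount factor (from Training Details)
DECAY_RATE = 0.0005  # epsilon decay (from Training Details)

# Assumed values: this card does not state the epsilon bounds.
MAX_EPSILON = 1.0
MIN_EPSILON = 0.05


def epsilon_at(episode):
    """Assumed exponential decay schedule for the exploration rate."""
    return MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * np.exp(-DECAY_RATE * episode)


def epsilon_greedy(qtable, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(qtable.shape[1]))
    return int(np.argmax(qtable[state]))


def q_update(qtable, state, action, reward, next_state, terminated):
    """Apply one Q-Learning update in place.

    Q(s, a) <- Q(s, a) + alpha * (target - Q(s, a)),
    where target = r + gamma * max_a' Q(s', a') for non-terminal s'.
    """
    # Do not bootstrap from a terminal state.
    target = reward if terminated else reward + GAMMA * np.max(qtable[next_state])
    qtable[state, action] += ALPHA * (target - qtable[state, action])


# Taxi-v3 has 500 discrete states and 6 discrete actions.
qtable = np.zeros((500, 6))
rng = np.random.default_rng(0)

# Hypothetical transition, for illustration only.
action = epsilon_greedy(qtable, state=0, epsilon=epsilon_at(0), rng=rng)
q_update(qtable, state=0, action=action, reward=-1, next_state=2, terminated=False)
```

In a full training loop, `epsilon_at(episode)` would be evaluated once per episode and `q_update` called after every `env.step`, so exploration shrinks as the table fills in.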