--- title: Hermes Agent emoji: 🕊️ colorFrom: blue colorTo: green sdk: gradio sdk_version: 4.36.1 python_version: 3.11 app_file: app.py pinned: false --- # Hermes Agent ## Project Overview Hermes Agent is a helpful, fast, and practical multi-purpose AI assistant designed to answer questions, reason through tasks, plan, summarize, provide coding help, offer research-style explanations, and break down complex tasks. This project provides a deployable version of Hermes Agent on Hugging Face Spaces with a Gradio web interface. ## Recommended Architecture The Hermes Agent is built using a straightforward architecture optimized for deployment on Hugging Face Spaces, particularly with a focus on free or low-cost CPU-based environments. - **Frontend**: Gradio is chosen for its simplicity and direct integration with Python backends, making it ideal for rapid prototyping and deployment on Hugging Face Spaces. It provides an interactive web UI with a chatbot interface, text input, and conversation history. - **Backend**: A pure Python backend leverages the `transformers` library from Hugging Face. This allows for easy integration of pre-trained language models. - **Model**: For initial deployment and to ensure compatibility with CPU-only Spaces, `distilgpt2` is selected. This model is lightweight and suitable for basic text generation tasks. While `distilgpt2` is not as powerful as larger models, it serves as a functional starting point. For more advanced capabilities, a GPU-enabled Space with a larger model like `HuggingFaceH4/zephyr-7b-beta` or `mistralai/Mistral-7B-Instruct-v0.2` would be recommended. - **Deployment Target**: Hugging Face Spaces provides a robust and easy-to-use platform for hosting ML demos and applications. It handles environment setup based on `requirements.txt` and automatically runs `app.py`. This architecture prioritizes ease of deployment, cost-effectiveness, and maintainability, while providing a clear path for future upgrades to more powerful models and hardware. ## File Tree ``` hermes-agent/ ├── app.py ├── requirements.txt └── README.md ``` ## Full Code for Each File ### `app.py` ```python import gradio as gr from transformers import pipeline # 1. System Prompt/Personality for Hermes Agent HERMES_SYSTEM_PROMPT = """You are Hermes Agent, a helpful, fast, and practical multi-purpose assistant. You are professional, calm, and user-friendly. You can answer questions, reason through tasks step-by-step, plan, summarize, provide coding help, offer research-style explanations, and break down complex tasks. Do not pretend to have abilities you do not possess. Always strive for clarity and conciseness.""" # 2. Model Integration # Using a small, CPU-friendly model for demonstration on Hugging Face Spaces free tier. # For better performance and more complex tasks, a larger model with GPU would be recommended. # Example: 'distilgpt2' is a good starting point for CPU inference. # For more capable models, consider 'HuggingFaceH4/zephyr-7b-beta' or 'mistralai/Mistral-7B-Instruct-v0.2' # which would require a GPU-enabled Space. try: # Initialize the pipeline for text generation # Using 'text-generation' task with a pre-trained model # Setting trust_remote_code=True might be necessary for some models, but generally avoid if not explicitly needed. generator = pipeline('text-generation', model='distilgpt2') except Exception as e: print(f"Error loading model: {e}") generator = None # 3. Chatbot Logic def predict(message, history): if generator is None: return "Error: Model could not be loaded. Please check the backend logs." # Format conversation history for the model # For distilgpt2, a simple concatenation is sufficient. # For more advanced models, a specific chat template might be required. conversation = HERMES_SYSTEM_PROMPT + "\n\n" for human, agent in history: conversation += f"User: {human}\nHermes Agent: {agent}\n" conversation += f"User: {message}\nHermes Agent:" try: # Generate response # max_new_tokens controls the length of the generated response # num_return_sequences=1 to get a single best response # truncation=True to handle long inputs gracefully response = generator(conversation, max_new_tokens=150, num_return_sequences=1, truncation=True) generated_text = response[0]['generated_text'] # Extract only the agent's response, removing the prompt and user's input # This is a simple heuristic and might need refinement for complex models/prompts agent_response_start = generated_text.rfind("Hermes Agent:") if agent_response_start != -1: agent_response = generated_text[agent_response_start + len("Hermes Agent:"):].strip() else: agent_response = generated_text.strip() # Fallback if marker not found # Clean up any potential incomplete sentences or model artifacts # For distilgpt2, it often generates incomplete sentences, so we might need to truncate at the last punctuation. last_punctuation = max(agent_response.rfind('.'), agent_response.rfind('?'), agent_response.rfind('!')) if last_punctuation != -1: agent_response = agent_response[:last_punctuation + 1] return agent_response except Exception as e: return f"An error occurred during model inference: {e}" # 4. Gradio Web UI with gr.Blocks() as demo: gr.Markdown("# Hermes Agent") gr.Markdown(""" Hermes Agent is a helpful, fast, and practical multi-purpose AI assistant. It can answer questions, reason through tasks, plan, summarize, and provide coding help. """) chatbot = gr.Chatbot(height=400) msg = gr.Textbox(label="Your Message", placeholder="Type your message here...") clear = gr.Button("Clear") msg.submit(predict, [msg, chatbot], [msg, chatbot]) clear.click(lambda: None, None, [msg, chatbot], queue=False) # Launch the Gradio app # The share=True option creates a public link, useful for testing, but should be False for deployment on Spaces. # For Hugging Face Spaces, the app runs automatically when app.py is present. if __name__ == "__main__": demo.launch(debug=True) # debug=True for local development, set to False for production ``` ### `requirements.txt` ``` gradio transformers torch ``` ## Hugging Face Setup Steps To deploy your Hermes Agent on Hugging Face Spaces, follow these steps: 1. **Create a new Space**: Go to [Hugging Face Spaces](https://huggingface.co/spaces) and click on "Create new Space". 2. **Choose Space details**: * **Owner**: Your Hugging Face username or organization. * **Space name**: `hermes-agent` (or a name of your choice). * **License**: Choose an appropriate license (e.g., MIT). * **SDK**: Select `Gradio`. * **Space hardware**: For the `distilgpt2` model, you can start with a "CPU Basic" or "CPU Upgrade" instance. If you plan to upgrade to larger models, you will need a GPU instance (e.g., "GPU Small"). * **Visibility**: Choose "Public" or "Private" as per your preference. 3. **Create Space**: Click "Create Space". 4. **Upload Files**: You will be redirected to your new Space's page. You can upload files in two ways: * **Web Interface**: Click on the "Files" tab, then "Add file" to upload `app.py`, `requirements.txt`, and `README.md`. * **Git Clone (Recommended)**: a. Clone your Space's Git repository to your local machine: ```bash git clone https://huggingface.co/spaces//hermes-agent cd hermes-agent ``` b. Copy the `app.py`, `requirements.txt`, and `README.md` files into this directory. c. Commit and push the files: ```bash git add . git commit -m "Initial commit of Hermes Agent" git push ``` 5. **Monitor Deployment**: Once the files are uploaded, Hugging Face Spaces will automatically detect `app.py` and `requirements.txt`, install dependencies, and launch your Gradio application. You can monitor the build logs in the "Logs" tab of your Space. 6. **Access the App**: Once the build is successful, your Hermes Agent will be live and accessible from the "App" tab of your Space. ## Local Testing Steps To test the Hermes Agent locally before deploying to Hugging Face Spaces: 1. **Clone the repository (if applicable)**: If you've already set up a Git repository for your Space, clone it. ```bash git clone https://huggingface.co/spaces//hermes-agent cd hermes-agent ``` Otherwise, create a new directory and place `app.py` and `requirements.txt` inside it. 2. **Create a Python Virtual Environment (Recommended)**: ```bash python3 -m venv venv source venv/bin/activate # On Windows: .\venv\Scripts\activate ``` 3. **Install Dependencies**: ```bash pip install -r requirements.txt ``` 4. **Run the Gradio App**: ```bash python app.py ``` 5. **Access the UI**: Open your web browser and navigate to the local URL provided by Gradio (usually `http://127.0.0.1:7860`). ## Deployment Checklist Before finalizing your deployment, consider the following: * [x] `app.py` is present and correctly configured for Gradio. * [x] `requirements.txt` lists all necessary Python packages (`gradio`, `transformers`, `torch`). * [x] `README.md` provides clear instructions and project overview. * [x] Model choice (`distilgpt2`) is compatible with the selected Hugging Face Space hardware (CPU Basic/Upgrade). * [x] System prompt (`HERMES_SYSTEM_PROMPT`) defines the agent's personality and behavior. * [x] Error handling is implemented for model loading and inference. * [x] Gradio UI includes a chatbot, text input, conversation history, title, and description. * [ ] (Optional) Environment variables for API keys or tokens are configured securely on Hugging Face Spaces if using external APIs (not applicable for `distilgpt2`). ## Possible Upgrades * **More Powerful LLM**: For enhanced performance and more sophisticated responses, consider upgrading to a larger model. This would likely require a GPU-enabled Hugging Face Space. Examples include `HuggingFaceH4/zephyr-7b-beta`, `mistralai/Mistral-7B-Instruct-v0.2`, or other models available on Hugging Face. * **Hugging Face Inference API**: Instead of loading the model directly into the Space, you could use the Hugging Face Inference API for very large models. This would involve making API calls to a hosted model endpoint, potentially reducing the resource requirements of your Space. * **Advanced Prompt Engineering**: Implement more sophisticated prompt engineering techniques, such as few-shot prompting or chain-of-thought reasoning, to improve the agent's ability to reason and generate coherent responses. * **Tool Use/Function Calling**: Integrate external tools or APIs that the Hermes Agent can call to perform specific actions (e.g., search the web, access databases, perform calculations). This would significantly expand its capabilities. * **Persistent Conversation History**: For a more seamless user experience, implement a way to store and retrieve conversation history across sessions (e.g., using a simple database or file storage). * **Customizable Agent Persona**: Allow users to customize the Hermes Agent's system prompt or personality through the UI. * **Streamlit UI**: While Gradio is excellent, exploring Streamlit as an alternative UI framework could offer different customization options and development workflows.