--- library_name: transformers license: apache-2.0 language: - it - en - es - fr - de pipeline_tag: text-generation tags: - RAG - function-calling - structured-generation - enterprise - italian --- # ÆRA-4B
[🚀 **Try Demo**](https://aera.andemili.com/) | [💻 **GitHub Examples**](https://github.com/andemilisrl/aera)
## Overview ÆRA is a specialized 4 billion parameter language model developed by [AND EMILI](https://www.andemili.com/) as an enterprise-focused foundation for building intelligent agents and automation pipelines. Unlike general-purpose conversational models, ÆRA is intentionally designed with a narrow, practical focus on context-based reasoning and structured outputs. ## Key Capabilities ### 🇮🇹 Native Italian Language Support ÆRA excels at understanding and generating Italian text, making it ideal for Italian-speaking enterprises and applications. ### 📄 Context-Only Responses ÆRA is trained to rely exclusively on provided context rather than internal knowledge. When asked questions without relevant context, it will respond honestly: > "Currently I don't have access to information about the actors who played Dr. Who. Feel free to share content and I will analyze it and tell you what I can infer from it." This behavior ensures reliability and reduces hallucination in enterprise applications. ### 🔧 Structured Output Generation - **JSON Generation**: Reliably produces well-formed JSON outputs - **Entity Extraction**: Identifies and extracts entities from provided text - **Classification**: Categorizes content based on given criteria - **Sentiment Analysis**: Analyzes emotional tone in context ### 🛠️ Function Calling Native support for tool use and function calling, enabling seamless integration into agentic workflows and automation pipelines. ## Design Philosophy ÆRA is not intended to be a general-knowledge assistant like ChatGPT. Instead, it serves as a lightweight, efficient starting point for enterprises exploring: - **Retrieval Augmented Generation (RAG)** implementations - **Document analysis** and information extraction - **Automated workflows** with structured outputs - **Multi-agent systems** requiring reliable, predictable behavior ## Use Cases This model is ideal for companies looking to: - Test the viability of RAG systems for their specific needs - Build proof-of-concepts for document processing pipelines - Implement lightweight automation without cloud dependencies - Evaluate whether LLM-based solutions fit their requirements If initial tests with ÆRA prove successful, organizations can then invest in developing more specialized, powerful models tailored to their specific domain needs. ## Technical Details - **Parameters**: 4 billion - **Training**: Post-trained on synthetic data focused on structured reasoning and Italian language tasks - **Deployment**: Optimized for local deployment on standard hardware - **Privacy**: Runs entirely on-premises with no external API calls ## Precision & Memory - Recommended: GPU with bfloat16 or float16. - If you don’t set `torch_dtype`, many setups will load float32 on CPU → higher RAM usage and slower inference. - If you don’t pass `device_map="auto"`, the model may not use your GPU. - Best practice: load on GPU with `torch_dtype=torch.bfloat16` (or `torch.float16`) and `device_map="auto"`. Total runtime memory is higher than weights alone due to buffers and KV-cache and scales with context length and batch size. ### GGUF weights for local runtimes [GGUF 4-bit weights](https://huggingface.co/and-emili/aera-4b-GGUF) are available for local runners like LM Studio, Ollama, and llama.cpp. ## Getting Started ### Using Pipeline (Simplest) ```python from transformers import pipeline import torch pipe = pipeline( "text-generation", model="and-emili/aera-4b", model_kwargs={ "torch_dtype": torch.bfloat16, # or torch.float16 if preferred "low_cpu_mem_usage": True, "device_map": "auto", }, ) messages = [{"role": "user", "content": "Chi sei?"}] answer = pipe(messages)[0]['generated_text'][-1]['content'] print(answer) # Output: 'Ciao! Mi chiamo ÆRA, un assistente virtuale sviluppato da AND EMILI.' ``` ### Direct Model Loading ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch tokenizer = AutoTokenizer.from_pretrained("and-emili/aera-4b", use_fast=True) model = AutoModelForCausalLM.from_pretrained( "and-emili/aera-4b", torch_dtype=torch.bfloat16, # or torch.float16 device_map="auto", low_cpu_mem_usage=True, ) messages = [ {"role": "user", "content": "Chi è L'attuale presidente della Repubblica Italiana?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=400) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)) # Output: 'Al momento non ho informazioni aggiornate sull'attuale presidente della Repubblica Italiana. # Se hai un testo o dei dati specifici che vuoi condividere, posso aiutarti a estrarre questa informazione.' ``` ### RAG-Style Context Analysis ```python from transformers import pipeline import torch pipe = pipeline( "text-generation", model="and-emili/aera-4b", model_kwargs={ "torch_dtype": torch.bfloat16, "low_cpu_mem_usage": True, "device_map": "auto", }, ) # Document/context document = """ Il nuovo prodotto XYZ-3000 è stato lanciato nel 2024 con un prezzo di €1,299. Include 3 anni di garanzia e supporto tecnico gratuito. Il prodotto pesa 2.5kg ed è disponibile in tre colori: nero, argento e blu. La batteria dura 48 ore con uso normale. """ messages = [ {"role": "system", "content": document}, {"role": "user", "content": "Quanto costa il prodotto e quali colori sono disponibili?"} ] response = pipe(messages, max_new_tokens=100, temperature=0.3)[0]['generated_text'][-1]['content'] print(response) # Output: "Il prodotto XYZ-3000 costa €1,299 e è disponibile in tre colori: nero, argento e blu." ``` ## OpenAI-Compatible API (via VLLM) For production deployments, ÆRA supports OpenAI-compatible endpoints through VLLM, enabling structured output with Pydantic schemas: ```python from openai import OpenAI from pydantic import BaseModel, Field from typing import Optional, List client = OpenAI( api_key="your-key", base_url="https://your-vllm-endpoint/v1", ) # Complex structured output for meeting analysis class ActionItem(BaseModel): azione: str = Field(description="Descrizione dell'azione da intraprendere") responsabile: Optional[str] = Field(description="Persona responsabile") scadenza: Optional[str] = Field(description="Data di scadenza") priorita: str = Field(description="Priorità: alta, media, bassa") class MeetingSummary(BaseModel): riassunto: str = Field(description="Riassunto generale della riunione") decisioni_prese: List[str] = Field(description="Lista delle decisioni prese") azioni_da_intraprendere: List[ActionItem] = Field(description="Azioni specifiche da intraprendere") partecipanti: List[str] = Field(default=[], description="Lista dei partecipanti") prossima_riunione: Optional[str] = Field(description="Data della prossima riunione se menzionata") # Real meeting notes to analyze meeting_notes = """ Riunione del 15 giugno 2024 - Team Marketing Presenti: Laura Bianchi (Marketing Manager), Marco Verdi (Social Media), Sara Neri (Grafica) Discusso nuovo piano marketing Q3: - Approvato budget €15.000 per campagna social media - Laura coordinerà con agenzia esterna per video promozionali - Marco deve preparare content calendar entro 30 giugno - Sara creerà mockup nuova brochure entro 25 giugno - Decidere fornitori stampa entro luglio - Prossimo meeting: 29 giugno ore 14:00 Priorità alta: lancio campagna entro 15 luglio Marco deve anche analizzare performance attuali social """ completion = client.beta.chat.completions.parse( model="and-emili/aera-4b", messages=[ {"role": "system", "content": "Sei un assistente esperto che riassume riunioni aziendali italiane."}, {"role": "user", "content": f"Analizza e riassumi questi appunti:\n\n{meeting_notes}"} ], response_format=MeetingSummary, temperature=0.5 ) result = completion.choices[0].message.parsed print(f"RIASSUNTO: {result.riassunto}\n") print(f"DECISIONI PRESE: {', '.join(result.decisioni_prese)}\n") print("AZIONI DA INTRAPRENDERE:") for action in result.azioni_da_intraprendere: print(f"- {action.azione}") if action.responsabile: print(f" Responsabile: {action.responsabile}") print(f" Priorità: {action.priorita}") # Customer Support Automation with Escalation Logic class CustomerResponse(BaseModel): risposta: str = Field(description="Risposta professionale al cliente") categoria_richiesta: str = Field(description="Categoria: spedizione, reso, pagamento, etc.") livello_urgenza: str = Field(description="Urgenza: basso, medio, alto") azioni_suggerite: List[str] = Field(description="Azioni che il cliente può intraprendere") escalation_richiesta: bool = Field(description="Se necessita escalation a operatore umano") inquiry = "URGENTE! Il mio ordine per il matrimonio di domani non è ancora arrivato! Avevo pagato la spedizione express!" completion = client.beta.chat.completions.parse( model="and-emili/aera-4b", messages=[ {"role": "system", "content": "Sei un assistente clienti professionale per e-commerce."}, {"role": "user", "content": inquiry} ], response_format=CustomerResponse, temperature=0.5 ) response = completion.choices[0].message.parsed print(f"Urgenza: {response.livello_urgenza}") # "alto" print(f"Escalation: {response.escalation_richiesta}") # True print(f"Risposta: {response.risposta}") ``` ### Advanced Use Cases For more complex examples including: - Customer support automation - Meeting notes summarization - Contract information extraction Check the examples in our [GitHub repository](https://github.com/andemilisrl/aera). ## Limitations - Does not provide information beyond what's in the given context - Not suitable for open-ended creative tasks or general knowledge queries - Optimized for Italian; performance may vary in other languages - Designed for specific enterprise use cases, not general conversation ## About AND EMILI [AND EMILI](https://www.andemili.com/) specializes in developing practical AI solutions for enterprise automation and intelligence augmentation. --- **License**: Apache 2.0