Spaces: minhvtt/Aus_F

minhvtt committed on Jan 31

Commit 0668f26 (verified) · 1 parent: 28607ce

Upload 3 files

Files changed (3)

README.md +142 -92
app.py +125 -398
requirements.txt +4 -36

README.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
-title: Aus F
-emoji: 👁
 colorFrom: indigo
 colorTo: pink
 sdk: gradio
@@ -9,138 +9,188 @@ app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-# Audience Segmentation AI System
-An AI-powered customer segmentation and sentiment analysis system for an event management platform.
-## Features
-### 1. Audience Segmentation
-- **Automatic clustering** based on ticket-purchasing behavior (RFM analysis)
-- **Preference-based classification** by event category
-- **Automatic naming** of each segment in Vietnamese
-- **Automatic marketing email generation** for each customer group
-### 2. Sentiment Analysis
-- **Sentiment classification** of comments (Positive/Negative/Neutral)
-- **Uses PhoBERT**, an NLP model specialized for Vietnamese
-- **Automatic keyword extraction** from feedback
-### 3. Automatic Insight Generation (Generative AI)
-- **Top 5 issues** to improve
-- **Improvement suggestions** for each issue
-- **NPS score prediction** based on the tone of comments
-- **Uses Vistral-7B-Chat**, an advanced LLM for Vietnamese
-## Directory structure
-```
-AudienceSegmentation/
-├── models/                      # MongoDB data models
-│   ├── segmentation_models.py   # Audience segment models
-│   └── sentiment_models.py      # Sentiment analysis models
-├── services/                    # Business logic
-│   ├── data_aggregation.py      # MongoDB aggregation pipelines
-│   ├── segmentation_service.py  # K-Means clustering
-│   ├── sentiment_service.py     # PhoBERT sentiment analysis
-│   └── genai_service.py         # Vistral-7B content generation
-├── config.py                    # Configuration
-├── database.py                  # MongoDB connection manager
-├── main.py                      # Main orchestration script
-├── requirements.txt             # Python dependencies
-└── .env.example                 # Environment variables template
-```
-## Installation
-### 1. Clone repository
-```bash
-cd AudienceSegmentation
 ```
-### 2. Create a virtual environment
-```bash
-python -m venv venv
-source venv/bin/activate  # Linux/Mac
-# or
-venv\Scripts\activate     # Windows
 ```
-### 3. Install dependencies
 ```bash
 pip install -r requirements.txt
 ```
-### 4. Download Vistral-7B-Chat
 ```bash
-# Download the GGUF model from Hugging Face (recommended for CPU)
-mkdir -p models/vistral-7b-chat
-# Download from: https://huggingface.co/Vistral/Vistral-7B-Chat-GGUF
 ```
-### 5. Configure the environment
-```bash
-cp .env.example .env
-# Edit .env with your MongoDB connection details
-```
-## Usage
-### Run the full pipeline
-```bash
-python main.py --task all
 ```
-### Run audience segmentation only
-```bash
-python main.py --task segmentation
 ```
-### Run sentiment analysis only
 ```bash
-python main.py --task sentiment
 ```
-### Generate email content only
 ```bash
-python main.py --task email
 ```
-### Generate insights for a specific event
 ```bash
-python main.py --task insights --event-code <event_id>
 ```
-## Technical architecture
-### MongoDB Aggregation Framework
-The system uses MongoDB aggregation to:
-- **Compute RFM** (Recency, Frequency, Monetary) directly in the database
-- **Count the event categories** each user is interested in
-- **Filter unprocessed data** to avoid duplicates
-- **Minimize network transfer** by sending only the final results
-### AI Models
-#### 1. Segmentation: scikit-learn K-Means
-- **Input**: Feature vector [R, F, M, Category1, Category2, ...]
-- **Output**: Cluster labels + Confidence scores
-- **Number of clusters**: 5 (configurable)
-#### 2. Sentiment: wonrax/phobert-base-vietnamese-sentiment
-- **Model**: PhoBERT fine-tuned for Vietnamese
-- **Output**: Positive/Negative/Neutral + Confidence
-- **Batch size**: 32
-## MongoDB collections
-### Output Collections (New)
-- `AudienceSegment` - Audience segments
-- `UserSegmentAssignment` - User-to-segment assignments
-- `SentimentAnalysisResult` - Sentiment analysis results
-- `EventInsightReport` - Insight reports for events

 ---
+title: Real Estate Formatter
+emoji: 🏠
 colorFrom: indigo
 colorTo: pink
 sdk: gradio
 pinned: false
 ---
+# 🏠 Real Estate Description Formatter
+An API that uses lightweight AI models from HuggingFace to convert "messy" (unstructured) real estate descriptions into clean HTML with inline CSS.
+## 🎯 Features
+- **Automatic formatting**: converts messy real estate descriptions into styled HTML
+- **Content preserved**: the original text is not changed; only HTML/CSS is added
+- **Lightweight, fast AI**: uses small instruction-tuned language models (7B params by default, with 3B-class alternatives)
+- **Free & Open-source**: completely free to use
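The "content preserved" guarantee above depends entirely on the model following its prompt, so a caller may want to verify it. The following is a minimal sketch, not part of this commit: `visible_text` and `missing_tokens` are hypothetical helpers that compare the word tokens of the original description with the visible text of the returned HTML, using only the standard library.

```python
import re
import unicodedata
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; tags and attributes (e.g. title tooltips) are ignored."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the text a reader would see, with text nodes joined by spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def _tokens(text: str) -> Counter:
    # NFC keeps Vietnamese letters with diacritics as single word characters.
    normalized = unicodedata.normalize("NFC", text).casefold()
    return Counter(re.findall(r"\w+", normalized))


def missing_tokens(original: str, formatted_html: str) -> list:
    """Word tokens present in the original but absent from the HTML's visible text."""
    missing = _tokens(original) - _tokens(visible_text(formatted_html))
    return sorted(missing.elements())
```

An empty result suggests nothing was dropped. The check is deliberately coarse: it ignores word order and punctuation, and it reports a false positive when the model wraps only part of a word in a tag (for example `4<span>Pn</span>` for `4Pn`).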
+## 🤖 AI models used
+The project supports capable, free models from HuggingFace:
+- **Qwen2.5-7B-Instruct** (default) - 7B params, Vietnamese support, instruction-tuned
+- **Mistral-7B-Instruct-v0.3** - 7B params, outperforms Llama 2 13B
+- **Gemma-2-9B-IT** - 9B params, Google, chat-optimized
+### Lighter models (to save resources):
+- **Qwen2.5-3B-Instruct** - 3B params, well balanced
+- **Phi-4-mini-instruct** - 3.8B params, Microsoft, strong reasoning
+You can change the model with the `MODEL_NAME` environment variable.
+## 📋 Example
+**Input (messy form):**
+```
+NHÀ 2 TẦNG HẺM OTO LIÊN HOA VĨNH NGỌC - TÂY NHA TRANG- Diện tích 92m² full ONT- Hướng Đông Bắc- Pháp lý sổ hồng hoàn công. Hẻm betong rộng 3m- Kết cấu 1 trệt 1 lầu với 4Pn, 2wc, bếp, p.khách, p.thờ, sân để xe ô tô... Full nội thất như hình..- Khu dân cư đông đúc, cách Bệnh viện Sài Gòn Nha Trang 450m, gần siêu thị Big C Go 700m.G.iá bán 4tỷ5 ( thương lượng )0905 124 ***
 ```
+**Output (styled HTML):**
+```html
+<div class="property-card" style="background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%); border-radius: 12px; padding: 24px; box-shadow: 0 4px 6px rgba(0,0,0,0.1);">
+  <h1 class="title" style="color: #2c3e50; font-size: 24px; text-transform: uppercase; margin-bottom: 16px;">
+    NHÀ 2 TẦNG HẺM OTO LIÊN HOA VĨNH NGỌC - TÂY NHA TRANG
+  </h1>
+  <div class="specs" style="display: flex; gap: 10px; flex-wrap: wrap; margin-bottom: 16px;">
+    <span style="background: #3498db; color: white; padding: 6px 12px; border-radius: 20px; font-size: 14px;">
+      92m²
+    </span>
+    <span style="background: #2ecc71; color: white; padding: 6px 12px; border-radius: 20px; font-size: 14px;">
+      Đông Bắc
+    </span>
+    <span style="background: #9b59b6; color: white; padding: 6px 12px; border-radius: 20px; font-size: 14px;">
+      Sổ hồng hoàn công
+    </span>
+  </div>
+  <div class="description" style="background: white; padding: 16px; border-radius: 8px; margin-bottom: 16px;">
+    <p style="margin-bottom: 8px;">Hẻm betong rộng 3m</p>
+    <p style="margin-bottom: 8px;">Kết cấu: 1 trệt 1 lầu</p>
+    <p style="margin-bottom: 8px;">4 <span title="Phòng ngủ">Pn</span>, 2 <span title="WC">wc</span>, bếp, phòng khách, phòng thờ, sân để xe ô tô</p>
+    <p>Full nội thất</p>
+  </div>
+  <div class="location" style="background: #ecf0f1; padding: 16px; border-radius: 8px; margin-bottom: 16px;">
+    <p style="margin-bottom: 8px;"><strong>Khu dân cư đông đúc</strong></p>
+    <p style="margin-bottom: 8px;">Cách Bệnh viện Sài Gòn Nha Trang 450m</p>
+    <p>Gần siêu thị Big C Go 700m</p>
+  </div>
+  <div class="price" style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); color: white; padding: 16px; border-radius: 8px; text-align: center; margin-bottom: 16px;">
+    <p style="font-size: 28px; font-weight: bold; margin: 0;">4 tỷ 500 triệu</p>
+    <p style="font-size: 14px; margin: 4px 0 0 0;">(Thương lượng)</p>
+  </div>
+  <div class="contact" style="text-align: center;">
+    <p style="font-size: 18px; font-weight: bold; color: #2c3e50;">0905 124 ***</p>
+  </div>
+</div>
 ```
+## 🚀 Installation & running
+### 1. Install dependencies
 ```bash
 pip install -r requirements.txt
 ```
+### 2. Run the server
 ```bash
+python app.py
 ```
+The server will run at: `http://0.0.0.0:7860`
+### 3. Use the API
+**Endpoint:** `POST /format`
+**Request:**
+```json
+{
+  "description": "NHÀ 2 TẦNG HẺM OTO LIÊN HOA VĨNH NGỌC..."
+}
 ```
+**Response:**
+```json
+{
+  "original": "NHÀ 2 TẦNG HẺM OTO...",
+  "formatted_html": "<div class=\"property-card\">...</div>",
+  "success": true,
+  "error": null
+}
 ```
+### 4. Test with cURL
 ```bash
+curl -X POST "http://localhost:7860/format" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "description": "NHÀ 2 TẦNG HẺM OTO LIÊN HOA VĨNH NGỌC - TÂY NHA TRANG- Diện tích 92m² full ONT- Hướng Đông Bắc- Pháp lý sổ hồng hoàn công..."
+  }'
 ```
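For callers who prefer Python to cURL, the same request can be sketched with the standard library alone. The URL assumes the local server from step 2, and `build_format_request` and `format_description` are illustrative names, not part of this commit. The response fields (`success`, `error`, `formatted_html`) follow the response schema documented in the README.

```python
import json
import urllib.request

# Assumption: the API is running locally on the port shown in the README.
FORMAT_URL = "http://localhost:7860/format"


def build_format_request(description: str, url: str = FORMAT_URL) -> urllib.request.Request:
    """Build the POST /format request with a JSON body."""
    payload = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_description(description: str, url: str = FORMAT_URL, timeout: float = 120.0) -> str:
    """Send the description and return the formatted HTML, raising on failure."""
    request = build_format_request(description, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # The endpoint reports model errors in the body rather than via HTTP status.
    if not body.get("success"):
        raise RuntimeError(body.get("error") or "formatting failed")
    return body["formatted_html"]
```

Because the endpoint returns `success: false` with HTTP 200 when the model call fails, the client checks the `success` flag rather than relying on the status code.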
+## ⚙️ Configuration
+### Change the AI model
+Set the `MODEL_NAME` environment variable:
 ```bash
+# Default model: Qwen2.5-7B-Instruct (recommended for Vietnamese)
+python app.py
+# Use Mistral-7B (capable, fast)
+export MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.3"
+python app.py
+# Use Gemma-2-9B (Google)
+export MODEL_NAME="google/gemma-2-9b-it"
+python app.py
+# Use a lighter model if needed (3B params)
+export MODEL_NAME="Qwen/Qwen2.5-3B-Instruct"
+python app.py
 ```
+### Use a HuggingFace token (for private models)
 ```bash
+export HF_TOKEN="your_huggingface_token"
+python app.py
 ```
+## 📚 API Documentation
+Once the server is running, open:
+- Swagger UI: `http://localhost:7860/docs`
+- ReDoc: `http://localhost:7860/redoc`
+## 🎨 Prompt Engineering
+The prompt is designed in detail to:
+- ✅ Keep the original text intact
+- ✅ Parse and identify the components (title, specs, price, contact...)
+- ✅ Add inline CSS with modern colors
+- ✅ Handle abbreviations with tooltips
+- ✅ Output plain HTML, without markdown
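Chat models sometimes wrap their answer in a Markdown code fence even when told not to, so the "plain HTML" goal usually needs a post-processing step. The sketch below shows one way to do it. `strip_code_fence` is a hypothetical helper, not the code in this commit: it removes a single outer fence, with or without a language tag, and leaves any backticks inside the HTML untouched.

```python
import re

# One outer fence: optional language tag, then the body, then the closing fence.
_FENCE_RE = re.compile(
    r"^\s*```[a-zA-Z0-9_-]*[ \t]*\r?\n(.*?)\r?\n?```\s*$",
    re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the text inside a single outer Markdown fence, or the text itself."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()
```

Unfenced output passes through unchanged apart from whitespace trimming, so the helper is safe to apply to every response.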
+## 🌐 Deploy to HuggingFace Spaces
+1. Create a new Space on HuggingFace
+2. Choose the SDK: **Gradio** or **Docker**
+3. Upload the code and wait for the build
+4. Open the Space URL
+## 📖 References
+**AI Models:**
+- [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) - Default model, supports Vietnamese
+- [10 Best Open-Source LLM Models 2025](https://huggingface.co/blog/daya-shankar/open-source-llms)
+- [Open LLM Leaderboard](https://huggingface.co/collections/open-llm-leaderboard/open-llm-leaderboard-best-models)
+- [HuggingFace Text Generation Models](https://huggingface.co/models?pipeline_tag=text-generation)
+- [Best Open Source LLMs of 2025](https://klu.ai/blog/open-source-llm-models)
+## 📝 License
+MIT License - Free to use
+## 👨‍💻 Author
+Created with ❤️ using Claude Code

app.py CHANGED Viewed

@@ -1,32 +1,20 @@
 """
-FastAPI Application for Event-Centric Audience Segmentation AI
-Author: AI Generated
-Created: 2025-11-24 (Refactored)
-Purpose: REST API with event-based endpoints
 """
-from fastapi import FastAPI, HTTPException, BackgroundTasks, status, Query
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel
-from typing import List, Dict, Optional, Any
-from datetime import datetime
-from bson import ObjectId
-# Import services
-from services.segmentation_service import SegmentationService
-from services.sentiment_service import SentimentAnalysisService
-from services.genai_service import GenerativeAIService
-from database import db
-from config import settings
 # FastAPI app
 app = FastAPI(
-    title="Audience Segmentation AI - Event-Centric",
-    description="REST API for per-event audience analysis",
-    version="2.0.0",
-    docs_url="/api/docs",
-    redoc_url="/api/redoc"
 )
 # CORS
@@ -38,402 +26,141 @@ app.add_middleware(
     allow_headers=["*"],
 )
-# Helper
-def serialize_doc(doc: Dict) -> Optional[Dict]:
-    """Convert MongoDB document to JSON-serializable dict"""
-    if doc is None:
-        return None
-    if '_id' in doc:
-        doc['id'] = str(doc.pop('_id'))
-    # Handle nested ObjectIds and lists
-    for key, value in list(doc.items()):
-        if isinstance(value, ObjectId):
-            doc[key] = str(value)
-        elif isinstance(value, list):
-            doc[key] = [str(v) if isinstance(v, ObjectId) else v for v in value]
-        elif isinstance(value, dict):
-            doc[key] = serialize_doc(value)
-    return doc
-# ===== HEALTH =====
-@app.get("/health", tags=["System"])
-async def health_check():
-    """Health check"""
-    try:
-        db.client.server_info()
-        return {
-            "status": "healthy",
-            "timestamp": datetime.utcnow(),
-            "database": "connected"
-        }
-    except Exception as e:
-        raise HTTPException(
-            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
-            detail=f"Unhealthy: {str(e)}"
-        )
-# ===== EVENT ANALYSIS =====
-@app.post("/api/events/{event_code}/analyze", tags=["Event Analysis"])
-async def analyze_event(event_code: str, background_tasks: BackgroundTasks):
-    """Run full AI pipeline for an event"""
-    def run_pipeline():
-        # Step 1: Segmentation
-        seg_service = SegmentationService(event_code)
-        seg_service.run_segmentation()
-        # Step 2: Sentiment
-        sent_service = SentimentAnalysisService(event_code)
-        sent_service.analyze_event_comments()
-        # Step 3: Email generation
-        genai_service = GenerativeAIService(event_code)
-        genai_service.generate_emails_for_all_segments()
-        # Step 4: Insights
-        genai_service.update_sentiment_summary_with_insights()
-    background_tasks.add_task(run_pipeline)
-    return {
-        "status": "started",
-        "message": f"Analysis pipeline started for event {event_code}"
-    }
-@app.get("/api/events/{event_code}/dashboard", tags=["Event Analysis"])
-async def get_event_dashboard(event_code: str):
-    """Get complete dashboard for Event Owner"""
-    # Get segments
-    segments = list(db.event_audience_segments.find({"event_code": event_code}))
-    # Get sentiment summary
-    sentiment_summary = db.event_sentiment_summary.find_one({"event_code": event_code})
-    return {
-        "event_code": event_code,
-        "segments": [serialize_doc(s) for s in segments],
-        "sentiment_summary": serialize_doc(sentiment_summary) if sentiment_summary else None
-    }
-# ===== SEGMENTATION =====
-@app.post("/api/events/{event_code}/segmentation/run", tags=["Segmentation"])
-async def run_event_segmentation(
-    event_code: str,
-    background_tasks: BackgroundTasks,
-    n_clusters: int = Query(default=5, ge=2, le=10)
-):
-    """Run segmentation for an event"""
-    def run_task():
-        service = SegmentationService(event_code, n_clusters=n_clusters)
-        service.run_segmentation()
-    background_tasks.add_task(run_task)
-    return {
-        "status": "started",
-        "message": f"Segmentation started for event {event_code}",
-        "event_code": event_code
-    }
-@app.get("/api/events/{event_code}/segments", tags=["Segmentation"])
-async def get_event_segments(
-    event_code: str,
-    status_filter: Optional[str] = Query(default=None, description="Filter by Draft, Approved, Sent")
-):
-    """Get all segments for an event"""
-    query = {"event_code": event_code}
-    if status_filter:
-        query["marketing_content.status"] = status_filter
-    segments = list(db.event_audience_segments.find(query))
-    return [serialize_doc(s) for s in segments]
-@app.get("/api/events/{event_code}/segments/{segment_id}", tags=["Segmentation"])
-async def get_segment_detail(event_code: str, segment_id: str):
-    """Get specific segment details"""
-    segment = db.event_audience_segments.find_one({
-        "_id": ObjectId(segment_id),
-        "event_code": event_code
-    })
-    if not segment:
-        raise HTTPException(status_code=404, detail="Segment not found")
-    return serialize_doc(segment)
-@app.get("/api/events/{event_code}/segments/{segment_id}/users", tags=["Segmentation"])
-async def get_segment_users(
-    event_code: str,
-    segment_id: str,
-    skip: int = 0,
-    limit: int = 100
-):
-    """Get users in a segment with details"""
-    segment = db.event_audience_segments.find_one({
-        "_id": ObjectId(segment_id),
-        "event_code": event_code
-    })
-    if not segment:
-        raise HTTPException(status_code=404, detail="Segment not found")
-    user_ids = segment.get('user_ids', [])
-    total_users = len(user_ids)
-    # Paginate
-    paginated_ids = user_ids[skip:skip + limit]
-    # Get user details
-    users = list(db.users.find({
-        "_id": {"$in": paginated_ids}
-    }))
-    # Enrich with stats (optional)
-    enriched_users = []
-    for user in users:
-        enriched_users.append({
-            "user_id": str(user['_id']),
-            "email": user.get('email'),
-            "full_name": f"{user.get('FirstName', '')} {user.get('LastName', '')}".strip()
-        })
-    return {
-        "segment_id": segment_id,
-        "total_users": total_users,
-        "users": enriched_users
-    }
-# ===== APPROVAL WORKFLOW =====
-@app.post("/api/events/{event_code}/segments/{segment_id}/approve", tags=["Approval"])
-async def approve_segment(
-    event_code: str,
-    segment_id: str,
-    approved_by: Optional[str] = None,
-    modified_subject: Optional[str] = None,
-    modified_body: Optional[str] = None
-):
-    """Event Owner approves marketing content"""
-    segment = db.event_audience_segments.find_one({
-        "_id": ObjectId(segment_id),
-        "event_code": event_code
-    })
-    if not segment:
-        raise HTTPException(status_code=404, detail="Segment not found")
-    # Update fields
-    update = {
-        "marketing_content.status": "Approved",
-        "marketing_content.approved_at": datetime.utcnow(),
-        "marketing_content.approved_by": approved_by,
-        "last_updated": datetime.utcnow()
-    }
-    if modified_subject:
-        update["marketing_content.email_subject"] = modified_subject
-    if modified_body:
-        update["marketing_content.email_body"] = modified_body
-    db.event_audience_segments.update_one(
-        {"_id": ObjectId(segment_id)},
-        {"$set": update}
-    )
-    updated_segment = db.event_audience_segments.find_one({"_id": ObjectId(segment_id)})
-    return {
-        "status": "success",
-        "message": "Segment approved",
-        "segment_id": segment_id,
-        "marketing_content": updated_segment.get('marketing_content')
-    }
-@app.post("/api/events/{event_code}/segments/{segment_id}/send-email", tags=["Approval"])
-async def send_segment_email(
-    event_code: str,
-    segment_id: str,
-    send_immediately: bool = True
-):
-    """Send approved marketing email"""
-    segment = db.event_audience_segments.find_one({
-        "_id": ObjectId(segment_id),
-        "event_code": event_code
-    })
-    if not segment:
-        raise HTTPException(status_code=404, detail="Segment not found")
-    marketing_content = segment.get('marketing_content', {})
-    if marketing_content.get('status') != "Approved":
-        raise HTTPException(status_code=400, detail="Segment not approved yet")
-    # TODO: Integrate with email service (SendGrid, AWS SES, etc.)
-    # For now, just mark as sent
-    db.event_audience_segments.update_one(
-        {"_id": ObjectId(segment_id)},
-        {"$set": {
-            "marketing_content.status": "Sent",
-            "last_updated": datetime.utcnow()
-        }}
-    )
-    return {
-        "status": "success",
-        "message": f"Email sent to {segment.get('user_count', 0)} users",
-        "segment_id": segment_id,
-        "emails_sent": segment.get('user_count', 0),
-        "emails_failed": 0
-    }
-# ===== SENTIMENT =====
-@app.post("/api/events/{event_code}/sentiment/analyze", tags=["Sentiment"])
-async def analyze_event_sentiment(event_code: str, background_tasks: BackgroundTasks):
-    """Analyze sentiment for event comments"""
-    def run_task():
-        service = SentimentAnalysisService(event_code)
-        service.analyze_event_comments()
-    background_tasks.add_task(run_task)
-    return {
-        "status": "started",
-        "message": f"Sentiment analysis started for event {event_code}"
-    }
-@app.get("/api/events/{event_code}/sentiment/summary", tags=["Sentiment"])
-async def get_sentiment_summary(event_code: str):
-    """Get sentiment summary for an event"""
-    summary = db.event_sentiment_summary.find_one({"event_code": event_code})
-    if not summary:
-        raise HTTPException(status_code=404, detail="No sentiment data for this event")
-    return serialize_doc(summary)
-@app.get("/api/events/{event_code}/sentiment/results", tags=["Sentiment"])
-async def get_sentiment_results(
-    event_code: str,
-    sentiment_label: Optional[str] = None,
-    skip: int = 0,
-    limit: int = 100
-):
-    """Get detailed sentiment results"""
-    query = {"event_code": event_code}
-    if sentiment_label:
-        query["sentiment_label"] = sentiment_label
-    total = db.sentiment_results.count_documents(query)
-    results = list(
-        db.sentiment_results.find(query)
-        .sort("analyzed_at", -1)
-        .skip(skip)
-        .limit(limit)
-    )
     return {
-        "total": total,
-        "results": [serialize_doc(r) for r in results]
     }
-# ===== GENAI =====
-@app.post("/api/events/{event_code}/genai/generate-emails", tags=["GenAI"])
-async def generate_event_emails(event_code: str, background_tasks: BackgroundTasks):
-    """Generate marketing emails for all segments"""
-    def run_task():
-        service = GenerativeAIService(event_code)
-        service.generate_emails_for_all_segments()
-    background_tasks.add_task(run_task)
-    return {
-        "status": "started",
-        "message": "Email generation started"
-    }
-@app.post("/api/events/{event_code}/genai/generate-insights", tags=["GenAI"])
-async def generate_event_insights(event_code: str, background_tasks: BackgroundTasks):
-    """Generate AI insights from negative feedback"""
-    def run_task():
-        service = GenerativeAIService(event_code)
-        service.update_sentiment_summary_with_insights()
-    background_tasks.add_task(run_task)
-    return {
-        "status": "started",
-        "message": "Insight generation started"
     }
-# ===== MONITORING =====
-@app.get("/api/monitoring/pipelines/{pipeline}/metrics", tags=["Monitoring"])
-async def get_pipeline_metrics(
-    pipeline: str,
-    event_code: Optional[str] = None,
-    days: int = 7
-):
-    """Get performance metrics"""
-    # TODO: Implement based on monitoring.py
-    return {
-        "pipeline": pipeline,
-        "event_code": event_code,
-        "message": "Metrics endpoint - implement as needed"
-    }
-# ===== ADMIN =====
-@app.post("/api/admin/indexes/create", tags=["Admin"])
-async def create_indexes():
-    """Create MongoDB indexes"""
-    from scripts.create_indexes import create_all_indexes
-    try:
-        create_all_indexes()
-        return {"status": "success", "message": "Indexes created"}
     except Exception as e:
-        raise HTTPException(status_code=500, detail=str(e))
-# ===== ROOT =====
-@app.get("/")
-async def root():
-    """API root"""
     return {
-        "name": "Audience Segmentation AI - Event-Centric",
-        "version": "2.0.0",
-        "docs": "/api/docs",
-        "health": "/health"
     }

 """
+Real Estate Description Formatter API
+Uses lightweight AI models from HuggingFace to format real estate descriptions
 """
+from fastapi import FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel
+from typing import Optional
+import os
+from huggingface_hub import InferenceClient
 # FastAPI app
 app = FastAPI(
+    title="Real Estate Description Formatter",
+    description="API to format real estate descriptions with AI",
+    version="1.0.0"
 )
 # CORS
     allow_headers=["*"],
 )
+# Initialize HuggingFace Inference Client
+# Default model: Qwen2.5-7B-Instruct (7B params, supports Vietnamese)
+# Alternatives: mistralai/Mistral-7B-Instruct-v0.3, google/gemma-2-9b-it
+MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
+HF_TOKEN = os.getenv("HF_TOKEN", None)  # Optional: for private models or faster inference
+client = InferenceClient(model=MODEL_NAME, token=HF_TOKEN)
+# Request/Response Models
+class RealEstateInput(BaseModel):
+    description: str
+class RealEstateOutput(BaseModel):
+    original: str
+    formatted_html: str
+    success: bool
+    error: Optional[str] = None
+# Detailed prompt for formatting real estate descriptions
+SYSTEM_PROMPT = """Bạn là một chuyên gia định dạng nội dung bất động sản chuyên nghiệp.
+NHIỆM VỤ: Chuyển đổi mô tả bất động sản "xấu" (không có cấu trúc, viết tắt, thiếu dấu câu) thành HTML được format đẹp mắt với CSS inline.
+QUY TẮC QUAN TRỌNG:
+1. KHÔNG ĐƯỢC thay đổi nội dung text gốc - CHỈ thêm HTML tags và CSS styling
+2. PHẢI giữ nguyên tất cả thông tin: giá, diện tích, số phòng, địa chỉ, số điện thoại
+3. PHẢI phân tích và nhận diện các thành phần:
+   - Tiêu đề (tên BDS, loại nhà)
+   - Thông tin kỹ thuật (diện tích, hướng, pháp lý)
+   - Mô tả chi tiết (kết cấu, nội thất)
+   - Vị trí (địa chỉ, khu vực, tiện ích xung quanh)
+   - Giá bán
+   - Liên hệ (SĐT, tên người bán)
+   - ... các thành phần khác
+4. SỬ DỤNG CẤU TRÚC HTML:
+   - <div class="property-card"> cho toàn bộ nội dung
+   - <h1 class="title"> cho tiêu đề chính
+   - <div class="specs"> cho thông số kỹ thuật (badge style)
+   - <div class="description"> cho mô tả chi tiết
+   - <div class="location"> cho thông tin vị trí
+   - <div class="price"> cho giá (highlight, font lớn)
+   - <div class="contact"> cho thông tin liên hệ
+5. CSS INLINE STYLING - Màu sắc hiện đại, dễ đọc:
+   - Property card: background, border radius, shadow
+   - Title: màu đậm, font-size lớn, text-transform uppercase
+   - Specs: badge style với background color khác nhau
+   - Price: màu phù hợp, không quá chói mắt, cũng không quá trầm, nổi bật, font-weight bold, font-size lớn
+   - Icons: không sử dụng icon gì khác.
+6. XỬ LÝ VIẾT TẮT:
+   - Giữ nguyên viết tắt nhưng thêm tooltip/title
+   - Ví dụ: <span title="Phòng ngủ">Pn</span>, <span title="Phòng vệ sinh">wc</span>
+OUTPUT FORMAT: Chỉ trả về HTML thuần, KHÔNG kèm markdown hoặc giải thích.
+"""
+USER_PROMPT_TEMPLATE = """Hãy format mô tả bất động sản sau thành HTML đẹp với CSS inline:
+{description}
+Nhớ: GIỮ NGUYÊN nội dung text, CHỈ thêm HTML/CSS để trình bày đẹp hơn."""
+@app.get("/")
+async def root():
+    """API root"""
     return {
+        "name": "Real Estate Description Formatter",
+        "version": "1.0.0",
+        "model": MODEL_NAME,
+        "endpoint": "/format"
     }
+@app.post("/format", response_model=RealEstateOutput)
+async def format_description(input_data: RealEstateInput):
+    """
+    Format real estate description with AI
+    Example input:
+    {
+        "description": "NHÀ 2 TẦNG HẺM OTO LIÊN HOA VĨNH NGỌC - TÂY NHA TRANG- Diện tích 92m² full ONT- Hướng Đông Bắc- Pháp lý sổ hồng hoàn công..."
     }
+    """
+    try:
+        # Prepare messages for chat model
+        messages = [
+            {
+                "role": "system",
+                "content": SYSTEM_PROMPT
+            },
+            {
+                "role": "user",
+                "content": USER_PROMPT_TEMPLATE.format(description=input_data.description)
+            }
+        ]
+        # Call HuggingFace Inference API
+        response = client.chat_completion(
+            messages=messages,
+            max_tokens=2000,
+            temperature=0.3,  # Low temperature for consistent formatting
+        )
+        # Extract formatted HTML
+        formatted_html = response.choices[0].message.content.strip()
+        # Clean up markdown if model added it
+        if formatted_html.startswith("```"):
+            formatted_html = formatted_html.removeprefix("```html").removeprefix("```").removesuffix("```").strip()
+        return RealEstateOutput(
+            original=input_data.description,
+            formatted_html=formatted_html,
+            success=True
+        )
     except Exception as e:
+        return RealEstateOutput(
+            original=input_data.description,
+            formatted_html="",
+            success=False,
+            error=str(e)
+        )
+@app.get("/health")
+async def health_check():
+    """Health check endpoint"""
     return {
+        "status": "healthy",
+        "model": MODEL_NAME,
+        "service": "Real Estate Formatter"
     }
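The README starts the service with `python app.py`, but the file as committed only defines `app` and never starts a server. The sketch below shows one possible entry point, under stated assumptions: the `HOST` and `PORT` environment variable names are not in the original, and `server_options` and `main` are illustrative helpers. `uvicorn` is already pinned in requirements.txt.

```python
import os


def server_options(env=None) -> dict:
    """Resolve bind address and port, defaulting to the values in the README."""
    env = os.environ if env is None else env
    return {
        "host": env.get("HOST", "0.0.0.0"),   # assumption: HOST is a made-up variable name
        "port": int(env.get("PORT", "7860")),  # 7860 is the port the README documents
    }


def main() -> None:
    # Imported lazily so this module can be imported without starting a server.
    import uvicorn

    # "app:app" means the `app` object inside app.py.
    uvicorn.run("app:app", **server_options())


# At the bottom of app.py this would be followed by:
# if __name__ == "__main__":
#     main()
```

Running `uvicorn app:app --host 0.0.0.0 --port 7860` from the shell would be an equivalent alternative that needs no code change.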

requirements.txt CHANGED Viewed

@@ -1,36 +1,4 @@
-# FastAPI Backend Requirements
-# Updated for November 2025
-# Web Framework
-fastapi==0.121.3
-uvicorn[standard]==0.38.0
-python-multipart==0.0.20
-# Database
-pymongo==4.15.4
-motor==3.7.0
-# Data Validation
-pydantic==2.10.4
-pydantic-settings==2.12.0
-# Data Processing
-pandas
-numpy
-scikit-learn
-# NLP & AI
-transformers
-torch
-tokenizers
-# Vietnamese NLP
-pyvi==0.1.1
-# Utilities
-python-dotenv==1.0.1
-tqdm==4.67.1
-# Security
-python-jose[cryptography]==3.4.0
-passlib[bcrypt]==1.7.4

+fastapi==0.115.0
+uvicorn[standard]==0.32.1
+pydantic==2.10.3
+huggingface-hub==0.26.5