metadata
license: apache-2.0
language:
  - zh
  - en
tags:
  - safetensors
  - qwen2
  - red-team
  - cybersecurity
  - code-audit
  - vulnerability-discovery
  - exploit-development
  - think-chain
  - grpo
  - pytorch
  - text-generation-inference
  - region:us
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-Coder-14B-Instruct
model-index:
  - name: BountyHunter-RedTeam
    results:
      - task:
          type: text-generation
          name: MMLU (57 subjects)
        dataset:
          name: mmlu
          type: cais/mmlu
        metrics:
          - type: acc
            value: 68.8
            name: MMLU Average Accuracy
      - task:
          type: text-generation
          name: HellaSwag
        dataset:
          name: hellaswag
          type: Rowan/hellaswag
        metrics:
          - type: acc_norm
            value: 76.42
            name: HellaSwag Accuracy
      - task:
          type: text-generation
          name: ARC-Challenge
        dataset:
          name: arc_challenge
          type: allenai/ai2_arc
        metrics:
          - type: acc_norm
            value: 58.36
            name: ARC-C Accuracy
      - task:
          type: text-generation
          name: Winogrande
        dataset:
          name: winogrande
          type: winogrande
        metrics:
          - type: acc
            value: 73.56
            name: Winogrande Accuracy
      - task:
          type: text-generation
          name: PIQA
        dataset:
          name: piqa
          type: piqa
        metrics:
          - type: acc_norm
            value: 78.78
            name: PIQA Accuracy
      - task:
          type: text-generation
          name: BoolQ
        dataset:
          name: boolq
          type: boolq
        metrics:
          - type: acc
            value: 88.07
            name: BoolQ Accuracy
      - task:
          type: text-generation
          name: TruthfulQA MC2
        dataset:
          name: truthfulqa_mc2
          type: truthfulqa
        metrics:
          - type: acc
            value: 54.6
            name: TruthfulQA MC2 Accuracy
      - task:
          type: text-generation
          name: HumanEval
        dataset:
          name: humaneval
          type: openai_humaneval
        metrics:
          - type: pass@1
            value: 42.68
            name: HumanEval pass@1
      - task:
          type: text-generation
          name: WMDP (Weapons of Mass Destruction Proxy)
        dataset:
          name: wmdp
          type: cais/wmdp
        metrics:
          - type: acc
            value: 59.13
            name: WMDP Overall
      - task:
          type: text-generation
          name: WMDP-Bio
        dataset:
          name: wmdp_bio
          type: cais/wmdp
        metrics:
          - type: acc
            value: 72.19
            name: WMDP Biology
      - task:
          type: text-generation
          name: WMDP-Chem
        dataset:
          name: wmdp_chem
          type: cais/wmdp
        metrics:
          - type: acc
            value: 50
            name: WMDP Chemistry
      - task:
          type: text-generation
          name: WMDP-Cyber
        dataset:
          name: wmdp_cyber
          type: cais/wmdp
        metrics:
          - type: acc
            value: 52.64
            name: WMDP Cybersecurity
      - task:
          type: text-generation
          name: HarmBench (Safety Compliance)
        dataset:
          name: harmbench
          type: centerforaisafety/harmbench
        metrics:
          - type: asr
            value: 10.94
            name: HarmBench Overall ASR (↓)
          - type: asr
            value: 1.89
            name: HarmBench Standard ASR (↓)
          - type: asr
            value: 26.25
            name: HarmBench Copyright ASR (↓)
          - type: asr
            value: 13.58
            name: HarmBench Contextual ASR (↓)
      - task:
          type: text-generation
          name: PrimeVul (Vulnerability Detection)
        dataset:
          name: primevul
          type: ASSERT-KTH/PrimeVul
        metrics:
          - type: f1
            value: 65
            name: PrimeVul Detection F1
          - type: recall
            value: 88.5
            name: PrimeVul Detection Recall
          - type: precision
            value: 51.4
            name: PrimeVul Detection Precision
          - type: acc
            value: 5.7
            name: PrimeVul CWE Classification
          - type: acc
            value: 28.5
            name: PrimeVul Paired Comparison
widget:
  - text: >-
      <|im_start|>system

      You are BountyHunter, an elite red-team security model developed by
      Security Researcher Tidecaller. Use <think> reasoning chains for security
      analysis.<|im_end|>

      <|im_start|>user

      Audit this C code for vulnerabilities:

      ```c

      void copy_data(char* dest, const char* src) {
          strcpy(dest, src);
      }

      int main(int argc, char** argv) {
          char buf[64];
          copy_data(buf, argv[1]);
          return 0;
      }

      ```<|im_end|>

      <|im_start|>assistant
    example_title: 🔍 Code Audit
  - text: >-
      <|im_start|>system

      You are BountyHunter, an elite red-team security model developed by
      Security Researcher Tidecaller. Use <think> reasoning chains for security
      analysis.<|im_end|>

      <|im_start|>user

      Explain the steps to perform a SQL injection attack and how to defend
      against it.<|im_end|>

      <|im_start|>assistant
    example_title: ⚔️ Security Knowledge


🎯 BountyHunter · RedTeam

Elite Red-Team AI for Security Researchers

精英红队安全研究模型




Fine-tuned from Qwen2.5-Coder-14B-Instruct via multi-phase training: supervised fine-tuning (SFT) plus GRPO reinforcement learning.
Full BF16 merged weights — single safetensors file, ready for transformers · vLLM · TGI.
基于 Qwen2.5-Coder-14B-Instruct 多阶段强化训练,safetensors 格式完整权重,开箱即用。



⚡ Quick Start

# Prerequisite (run in a shell): pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Tidecaller/BountyHunter-RedTeam",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Tidecaller/BountyHunter-RedTeam")

messages = [
    {"role": "system", "content": (
        "You are BountyHunter, an elite security model developed by Security Researcher Tidecaller. "
        "Capabilities: vulnerability discovery | exploit development | code audit | penetration testing. "
        "Principles: code over theory, evidence-based. "
        "Output: security tasks use <think> reasoning chain before results."
    )},
    {"role": "user", "content": "Audit this C code for vulnerabilities: ..."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True makes the temperature/top_p settings take effect explicitly
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.5, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


📑 Contents

| Section | Description |
|---|---|
| 🏆 Why BountyHunter | Unique capabilities & value proposition |
| 📊 Comprehensive Benchmarks | 5-dimension evaluation on A800 80GB |
| 📈 Summary Dashboard | Visual scorecards at a glance |
| 🧭 Use-Case Fit Matrix | What this model is (and isn't) for |
| 📋 Model Specifications | Architecture, params, precision |
| 📦 Usage | transformers · vLLM · TGI |
| 💾 Resource Estimation | VRAM & hardware recommendations |
| 🔬 Reasoning Chain Example | Sample `<think>` audit output |
| ⚠️ Disclaimer & Ethics | Legal, ethical, and safety guardrails |
| 🙏 Acknowledgments | Datasets, base model, community |
| 📜 License & Citation | Apache 2.0 · BibTeX |


🏆 1. Why BountyHunter

| 🔍 Capability | 💎 Value |
|---|---|
| Vulnerability Discovery · 漏洞发现 | Automated audit of C / C++ / Python / Java: detects CWE-120 (Buffer Overflow), CWE-78 (Command Injection), CWE-89 (SQL Injection), and more |
| Think-Chain Reasoning · 思维链推理 | Structured `<think>...</think>` blocks: traceable, verifiable, step-by-step analysis |
| Security Knowledge · 安全知识库 | MITRE ATT&CK · CVE · ExploitDB · OWASP Top 10 · penetration testing methodology |
| Defensive Analysis · 防御分析 | Finds bugs and provides concrete remediation & defense strategies |
| Bilingual EN/ZH · 中英双语 | English and Chinese security communities both natively supported |
| Plug-and-Play · 开箱即用 | Single model.safetensors file, one-line load with transformers |


📊 2. Comprehensive Benchmarks

BountyHunter-RedTeam is evaluated across five dimensions — general capability, code generation, security knowledge, vulnerability detection, and safety compliance.

All benchmarks run on NVIDIA A800 80GB using lm-evaluation-harness + vLLM batch inference.


2.1 General Capability

7 standard benchmarks measuring reasoning & knowledge retention after security fine-tuning.

| Benchmark | Metric | BountyHunter | Qwen2.5-Coder-14B | Δ (points) |
|---|---|---|---|---|
| MMLU (57 subjects) | acc ↑ | 68.80% | ~79% | −10.2 |
| HellaSwag | acc_norm ↑ | 76.42% | ~84% | −7.6 |
| ARC-Challenge | acc_norm ↑ | 58.36% | ~67% | −8.6 |
| Winogrande | acc ↑ | 73.56% | ~78% | −4.4 |
| PIQA | acc_norm ↑ | 78.78% | ~82% | −3.2 |
| BoolQ | acc ↑ | 88.07% | ~89% | −0.9 |
| TruthfulQA MC2 | acc ↑ | 54.60% | ~58% | −3.4 |

💡 Security specialization costs the most on broad-knowledge benchmarks (MMLU −10.2 points, ARC-Challenge −8.6 points), with the weakest MMLU scores in non-security STEM subjects. BoolQ (−0.9 points) is essentially preserved.

📋 MMLU Detailed Breakdown (68.80%)

| Category | Score | Representative Subjects |
|---|---|---|
| Social Sciences | 78.71% | International Law · Security Studies · Sociology |
| Other | 72.22% | Global Facts · Public Relations · Clinical Knowledge |
| STEM | 66.41% | See sub-table below |
| Humanities | 61.66% | History · Philosophy · Prehistory |

STEM Sub-Scores — security DNA is clearly visible:

| Subject | Score | Bar | Notes |
|---|---|---|---|
| 🟢 High School CS | 83.00% | ████████░░ | Top performer |
| 🟢 Computer Security | 77.00% | ███████░░░ | Core domain strength |
| 🟡 College CS | 68.00% | ██████░░░░ | Solid |
| 🟡 Elementary Math | 68.52% | ██████░░░░ | Baseline math intact |
| 🟡 Machine Learning | 63.39% | ██████░░░░ | OK |
| 🔴 College Math | 56.00% | █████░░░░░ | Expected weakness |
| 🔴 College Physics | 53.92% | █████░░░░░ | Expected weakness |
| 🔴 High School Math | 52.22% | █████░░░░░ | Below passing |
| 🔴 College Chemistry | 49.00% | ████░░░░░░ | Expected weakness |

💡 Computer Security (77%) and HS CS (83%) are well above the STEM average. Chemistry, Physics, and advanced Math are the trade-off from security specialization — far from the training distribution.


2.2 Code Generation

| Benchmark | Metric | BountyHunter | Qwen2.5-Coder-14B |
|---|---|---|---|
| HumanEval | pass@1 ↑ | 42.68% | ~72–75% |

💡 Code generation drops — expected. BountyHunter is trained for code auditing & vulnerability analysis, not competitive programming. It reads and analyzes code far better than it writes from scratch.


2.3 Security Knowledge — WMDP

WMDP measures knowledge of hazardous domains. Lower = more "forgotten" during safety training. For a red-team model, some retention is both expected and necessary.

| Benchmark | BountyHunter | Llama-3-8B-Instruct | Bar | Notes |
|---|---|---|---|---|
| WMDP Overall | 59.13% | 45–50% | ██████░░░░ | Higher = more domain knowledge |
| 🧬 WMDP-Bio | 72.19% | ~42% | ███████░░░ | ⚠️ Significant bio knowledge retained |
| 💻 WMDP-Cyber | 52.64% | ~40% | █████░░░░░ | Domain-appropriate for cybersecurity |
| ⚗️ WMDP-Chem | 50.00% | ~38% | █████░░░░░ | Twice the 25% chance level of four-choice questions; not evidence of forgetting |

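💡 Cyber (52.6%) is appropriate: it is the working domain. Chem (50.0%) is still twice the 25% chance level of WMDP's four-choice questions, so it indicates retained knowledge rather than suppression.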
⚠️ WMDP measures knowledge recall, NOT behavioral compliance. For red-team, cybersecurity knowledge is a feature, not a bug.
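As a sanity check on the table above, the overall score can be reproduced as a size-weighted mean of the three subsets. This is a minimal sketch: the subset sizes (1,273 bio, 408 chem, 1,987 cyber questions) are an assumption based on the public WMDP release and are not stated in this card, and the 25% chance level assumes four-choice questions.

```python
# Accuracies are taken from the table above; subset sizes are an ASSUMPTION
# (public WMDP release), not something this model card states.
SUBSETS = {
    "bio": (72.19, 1273),
    "chem": (50.00, 408),
    "cyber": (52.64, 1987),
}
CHANCE = 100 / 4  # four answer choices per question

total_questions = sum(size for _, size in SUBSETS.values())
overall = sum(acc * size for acc, size in SUBSETS.values()) / total_questions

print(f"overall: {overall:.2f}%")
for name, (acc, _) in SUBSETS.items():
    print(f"{name:>5}: {acc:.2f}%  ({acc - CHANCE:+.2f} points vs. chance)")
```

Under these assumptions the weighted mean matches the reported WMDP Overall figure, and every subset sits well above chance.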


2.4 Security Capability — PrimeVul

PrimeVul (ICSE 2025): 6,968 vulnerable C/C++ functions across 140 CWEs, with rigorous labeling. Three sub-tests probe different aspects of security understanding.


🔍 Binary Vulnerability Detection — is this function vulnerable?

| Metric | Score | Bar | What It Means |
|---|---|---|---|
| F1 | 65.0% | ███████░░░ | Dramatically above GPT-4+CoT (F1 ~3%) & StarCoder2-7B (F1=3.09%) |
| Recall | 88.5% | █████████░ | 🔥 Catches ~9/10 real vulnerabilities |
| Precision | 51.4% | █████░░░░░ | ~half of flagged functions are false positives |
| Accuracy | 51.0% | █████░░░░░ | Skewed by "report everything" red-team bias |

Confusion Matrix

| | Pred VULN | Pred SAFE |
|---|---|---|
| Actually VULN | 131 ✓ | 17 ✗ |
| Actually SAFE | 124 ✗ | 22 ✓ |

💡 Classic red-team bias: the model would rather cry wolf than miss a breach. Recall is 88.5% (only 17 of 148 real vulnerabilities missed) at the cost of 124 false alarms among the 146 safe functions in the confusion matrix. This is intentional: in a security audit, triaging false positives is cheap, while a missed vulnerability can be catastrophic.
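The headline metrics can be recomputed directly from the four confusion-matrix counts. The sketch below uses only those counts; the helper name `detection_metrics` is illustrative. Recall, precision, and F1 reproduce the table. Accuracy over the 294 classified functions comes out at about 52.0% rather than the reported 51.0%; the reported figure would match 153 correct out of 300, which may mean a few unparsed responses were counted as errors, but that is a guess.

```python
# Counts copied from the confusion matrix above.
TP, FN = 131, 17   # actually vulnerable: flagged / missed
FP, TN = 124, 22   # actually safe: flagged / cleared


def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary-classification metrics from raw counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


metrics = detection_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.1%}")
```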


🏷️ CWE Classification — what type of vulnerability?

| Metric | Score |
|---|---|
| Exact Match | 5.7% (10/176) |
| Best | CWE-78 (50%) · CWE-416 (33%) · CWE-20 (29%) |
| Worst | CWE-119 (0%) · CWE-476 (0%) |

| CWE | Recall | Bar | Description |
|---|---|---|---|
| CWE-78 | 50.0% | █████░░░░░ | OS Command Injection, covered well in training |
| CWE-416 | 33.3% | ███░░░░░░░ | Use-After-Free, moderate |
| CWE-20 | 28.6% | ██░░░░░░░░ | Improper Input Validation |
| CWE-125 | 16.7% | █░░░░░░░░░ | Out-of-Bounds Read |
| CWE-119 | 0.0% | ░░░░░░░░░░ | Memory Buffer Errors → defaults to CWE-120 |
| CWE-476 | 0.0% | ░░░░░░░░░░ | NULL Pointer Dereference, not recognized |

💡 The model defaults to CWE-120 (Buffer Overflow) as catch-all for memory bugs. Pair with Semgrep / CodeQL for precise CWE labeling.


🔬 Paired Comparison — which version has the bug?

| Metric | Score |
|---|---|
| Accuracy | 28.5% (57/200) |

💡 Significantly below random (50%). PrimeVul patches are often single-line (for example, strcpy → strncpy), and the model struggles without diff context. Always provide diff context for patch verification.
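One way to supply diff context is to send a unified diff of the two versions instead of two separate functions. The sketch below uses Python's standard `difflib`; the helper `build_patch_prompt`, the prompt wording, and the C snippets are hypothetical illustrations, not part of the evaluation setup described here.

```python
import difflib


def build_patch_prompt(before: str, after: str, name: str = "function.c") -> str:
    """Return a review prompt that embeds a unified diff of the two versions."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
        n=3,  # lines of unchanged context around each change
    )
    return (
        "Review this patch. State which version is vulnerable and why.\n\n"
        + "".join(diff)
    )


# Hypothetical before/after pair that differs in a single call.
before = r"""void copy_data(char *dest, size_t n, const char *src) {
    strcpy(dest, src);
}
"""
after = r"""void copy_data(char *dest, size_t n, const char *src) {
    strncpy(dest, src, n - 1);
    dest[n - 1] = '\0';
}
"""

prompt = build_patch_prompt(before, after)
print(prompt)
```

The resulting string can be used as the user message in any of the usage examples in this card.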


2.5 Safety Compliance — HarmBench

HarmBench measures behavioral compliance via ASR (Attack Success Rate). Lower ASR = stronger safety alignment.

🔬 A 0% ASR would refuse all security queries — including legitimate CTF & pentesting. The goal is selective refusal: reject malicious, engage with authorized.

| Category | Count | Complied | Refused | ASR | Bar | Verdict |
|---|---|---|---|---|---|---|
| Overall | 320 | 35 | 285 | 10.9% | ██░░░░░░░░ | 🟡 Red-team calibrated |
| Standard | 159 | 3 | 156 | 1.9% | ░░░░░░░░░░ | ✅ Near-total refusal |
| Copyright | 80 | 21 | 59 | 26.3% | ███░░░░░░░ | ⚠️ LLM weak spot |
| Contextual | 81 | 11 | 70 | 13.6% | █░░░░░░░░░ | 🟡 Nuance mostly handled |

💡 1.9% ASR on standard harms is the headline — strong guardrails. Copyright (26.3%) is the main v2 improvement target. Contextual (13.6%) includes legitimate security queries a red-team model should comply with — actual harmful ASR is lower.
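For readers checking the table, ASR is the share of prompts the model complied with, and the overall figure is pooled over all 320 prompts rather than averaged across categories. A minimal sketch using only the counts reported above:

```python
# (total prompts, complied) per category, copied from the HarmBench table.
COUNTS = {
    "Standard": (159, 3),
    "Copyright": (80, 21),
    "Contextual": (81, 11),
}


def asr(total: int, complied: int) -> float:
    """Attack Success Rate in percent: complied / total."""
    return 100 * complied / total


per_category = {name: asr(total, complied) for name, (total, complied) in COUNTS.items()}

all_prompts = sum(total for total, _ in COUNTS.values())
all_complied = sum(complied for _, complied in COUNTS.values())
overall = asr(all_prompts, all_complied)                        # pooled over every prompt
unweighted_mean = sum(per_category.values()) / len(per_category)

for name, value in per_category.items():
    print(f"{name:>10}: {value:.2f}%")
print(f"{'Overall':>10}: {overall:.2f}% (pooled), {unweighted_mean:.2f}% (unweighted mean)")
```

The pooled figure is lower than the unweighted mean because the Standard category, where refusal is near-total, contributes about half of the prompts.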

📋 Cross-Model HarmBench Comparison

| Model Type | Typical ASR | Bar | Safety Profile |
|---|---|---|---|
| Unaligned base models | 60–95% | ████████░░ | 🔴 Dangerous |
| Jailbroken safety models | 40–70% | ██████░░░░ | 🔴 Bypassed safeguards |
| Standard aligned (Llama-3, Qwen-Instruct) | 5–15% | ██░░░░░░░░ | 🟡 Generally safe |
| BountyHunter-RedTeam | 10.9% | ██░░░░░░░░ | 🟡 Red-team calibrated |
| Safety-hardened (Llama-Guard, ShieldGemma) | 1–3% | ░░░░░░░░░░ | 🟢 Maximum safety |

💡 WMDP + HarmBench = Complete Profile: WMDP measures what the model knows; HarmBench measures what it does. BountyHunter retains cybersecurity knowledge (WMDP-Cyber 52.6%) while refusing harmful action (HarmBench standard 1.9% ASR) — the exact profile needed for authorized red-team work.



📈 3. Summary Dashboard

  General Capability              Security Knowledge             Security Capability
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│ MMLU       ████████░ │  │ WMDP       ██████░░  │  │ PrimeVul F1 ███████░  │
│            68.8%     │  │            59.1%     │  │            65.0%      │
│ HellaSwag  ████████░ │  │ WMDP-Bio   ███████░  │  │ Recall     █████████  │
│            76.4%     │  │            72.2%     │  │            88.5%      │
│ BoolQ      █████████ │  │ WMDP-Cyber █████░░░  │  │ CWE Class  █░░░░░░░░  │
│            88.1%     │  │            52.6%     │  │             5.7%      │
│ ARC-C      ██████░░░ │  │ WMDP-Chem  █████░░░  │  │ Pair Cmp   ███░░░░░░  │
│            58.4%     │  │            50.0%     │  │            28.5%      │
└──────────────────────┘  └──────────────────────┘  └──────────────────────┘

  Safety Compliance               Code                     STEM (MMLU subset)
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│ HarmBench  ██░░░░░░░ │  │ HumanEval  ████░░░░░ │  │ STEM avg   ███████░  │
│ ASR ↓      10.9%     │  │ pass@1     42.7%     │  │            66.4%     │
│ Standard   ░░░░░░░░░ │  │                       │  │ HS CS      ████████  │
│ ASR ↓       1.9%     │  │                       │  │            83.0%     │
│ Copyright  █████░░░░ │  │                       │  │ CompSec    ███████░  │
│ ASR ↓      26.3%     │  │                       │  │            77.0%     │
│ Contextual ███░░░░░░ │  │                       │  │ Chemistry  █████░░░  │
│ ASR ↓      13.6%     │  │                       │  │            49.0%     │
└──────────────────────┘  └──────────────────────┘  └──────────────────────┘


🧭 4. Use-Case Fit Matrix

| Use Case · 用途 | Fit | Notes |
|---|---|---|
| 🔍 Code Security Audit · 代码审计 | | Core strength: PrimeVul Recall 88.5% |
| 🐛 Vulnerability Detection · 漏洞检测 | | High recall, errs on the side of caution |
| 🧠 Structured Vuln Analysis · 结构化分析 | | Built-in `<think>` reasoning chains |
| ⚔️ PenTest Knowledge · 渗透测试 | | MITRE ATT&CK · CVE · ExploitDB |
| 📚 CTF Assistance · CTF 辅助 | | Practical security challenges |
| 🏷️ CWE Classification · CWE 分类 | ⚠️ | Weak: pair with Semgrep / CodeQL |
| 💻 General Code Generation · 代码生成 | ⚠️ | Use base Qwen-Coder instead |
| 📐 Math / Physics · 数理推理 | ⚠️ | Expected trade-off |
| 🏥 Medical / Chemical · 医疗化学 | | Out of training distribution |


📋 5. Model Specifications

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-14B-Instruct |
| Architecture | Qwen2ForCausalLM · 48 layers · 5120 hidden · 40 attn heads · 8 KV heads |
| Parameters | 14.7B (13.1B non-embedding) |
| Precision | BF16, single model.safetensors (~29 GB) |
| Context Length | 32,768 tokens |
| Vocabulary | 152,064 (ChatML template) |
| Training | SFT + GRPO (Group Relative Policy Optimization) |
| Chat Template | `<\|im_start\|>...<\|im_end\|>` + native tool_calls |
| License | Apache 2.0 |


📦 6. Usage

🤗 Transformers

# Prerequisite (run in a shell): pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Tidecaller/BountyHunter-RedTeam",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Tidecaller/BountyHunter-RedTeam")

messages = [
    {"role": "system", "content": (
        "You are BountyHunter, an elite security model developed by Security Researcher Tidecaller. "
        "Capabilities: vulnerability discovery | exploit development | code audit | penetration testing. "
        "Principles: code over theory, evidence-based. "
        "Output: security tasks use <think> reasoning chain before results."
    )},
    {"role": "user", "content": "Audit this C code for vulnerabilities: ..."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True makes the temperature/top_p settings take effect explicitly
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.5, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

⚡ vLLM

vllm serve Tidecaller/BountyHunter-RedTeam \
  --max-model-len 32768 \
  --tensor-parallel-size 1 \
  --dtype bfloat16
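Once the server is running, it can be queried through vLLM's OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the default address `http://localhost:8000`. The helper names `build_chat_request` and `send` are illustrative, and the sampling values mirror the Transformers example rather than any server-side requirement.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are BountyHunter, an elite red-team security model developed by "
    "Security Researcher Tidecaller. Use <think> reasoning chains for security analysis."
)


def build_chat_request(
    code: str,
    base_url: str = "http://localhost:8000",  # assumption: default vLLM host and port
    model: str = "Tidecaller/BountyHunter-RedTeam",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Audit this C code for vulnerabilities:\n\n{code}"},
        ],
        "max_tokens": 4096,
        "temperature": 0.5,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("void f(char *s) { char b[8]; strcpy(b, s); }")
# print(send(request))  # uncomment once the vLLM server is up
```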

🚀 TGI (Text Generation Inference)

text-generation-launcher \
  --model-id Tidecaller/BountyHunter-RedTeam \
  --max-total-tokens 32768 \
  --dtype bfloat16


💾 7. Resource Estimation

| Precision | VRAM | Compatible Hardware |
|---|---|---|
| BF16 (this repo) | ~29 GB | A100 40GB · A800 · A6000 · 2× RTX 4090 |
| Q8_0 GGUF | ~15 GB | RTX 4090 (24GB) · RTX 5090 |
| Q4_K_M GGUF | ~9 GB | RTX 3080 (12GB) · Consumer GPUs |
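The weight footprints above follow roughly from parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate under stated assumptions: about 14.7B parameters (the figure published for Qwen2.5 14B models), 8.5 bits per weight for Q8_0, and about 4.85 bits per weight for Q4_K_M (an approximate average). It covers weights only; KV cache and activations need additional memory.

```python
PARAMS = 14.7e9  # ASSUMPTION: approximate parameter count of the base model

# Bits stored per weight. BF16 is exact; the GGUF values are approximations.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale per block
    "Q4_K_M": 4.85,  # ASSUMPTION: rough average for this mixed quantization
}


def weight_gb(params: float, bits: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


estimates = {name: weight_gb(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}
for name, gb in estimates.items():
    print(f"{name:>7}: ~{gb:.1f} GB of weights")
```

Leave headroom on top of these numbers for the KV cache, which grows with context length and batch size.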


🔬 8. Reasoning Chain Example

<think>
1. Received C code snippet for security audit
2. Step-by-step analysis:
   • L12: strcpy(dest, src) — no bounds checking → classic buffer overflow (CWE-120)
   • L25: system(cmd) with user input → command injection (CWE-78)
3. Confidence: HIGH — all patterns confirmed with code-line evidence
4. Providing remediation for each
</think>

## Security Audit Results

🔴 Vulnerability 1: Buffer Overflow (CWE-120) — `main.c:12`
  • strcpy(dest, src) copies user-controlled src to fixed-size dest[64]
  • No length validation before copy
  • Fix: strncpy(dest, src, sizeof(dest)-1); dest[sizeof(dest)-1] = '\0';

🔴 Vulnerability 2: Command Injection (CWE-78) — `main.c:25`
  • system(cmd) with cmd built from unsanitized user input
  • Exploitable via shell metacharacters (; | && ||)
  • Fix: use execve() with explicit argument array, or strict allowlist filtering
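When post-processing responses shaped like the example above, it can help to separate the `<think>` block from the final report, for instance to log the reasoning and display only the findings. A minimal sketch, assuming the response contains at most one `<think>...</think>` block emitted as plain text; the helper name `split_reasoning` is illustrative.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, report); reasoning is empty if no block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    report = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, report


sample = (
    "<think>\n"
    "1. strcpy(dest, src) has no bounds checking\n"
    "</think>\n\n"
    "## Security Audit Results\n"
    "Vulnerability 1: Buffer Overflow (CWE-120)"
)
reasoning, report = split_reasoning(sample)
print("REASONING:", reasoning)
print("REPORT:", report)
```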


⚠️ 9. Disclaimer & Ethics

9.1 Legal Disclaimer · 法律免责声明

THIS MODEL IS A DUAL-USE SECURITY RESEARCH TOOL. Provided exclusively for lawful security research, authorized penetration testing, and legitimate academic security study.
本模型为双用途安全研究工具,仅供合法的安全研究、授权渗透测试和正当学术安全研究使用。


Prohibited Uses · 禁止用途 (non-exhaustive)

| 禁止行为 | Prohibited Conduct |
|---|---|
| 未经授权访问任何计算机系统、网络或设备 | Unauthorized access to any computer system, network, or device |
| 开发、传播或部署恶意软件、勒索软件或病毒 | Development / distribution / deployment of malware, ransomware, or viruses |
| 未经授权的社会工程学攻击 | Unauthorized social engineering attacks |
| 未经授权的拒绝服务攻击 | Unauthorized denial-of-service attacks |
| 数据窃取或侵犯他人隐私 | Data theft or violation of others' privacy |
| 为实施犯罪目的绕过安全措施 | Circumventing security measures for criminal purposes |
| 违反任何适用法律法规 | Violation of any applicable laws or regulations |

No Warranty · 不提供担保: incorporates Apache 2.0 §§ 7 and 8 (Disclaimer of Warranty, Limitation of Liability) by reference. The model is provided "AS IS", without warranty of any kind. The authors assume no liability for any misuse, damage, or legal consequences.

User Responsibility · 使用者责任 — users are solely responsible for: obtaining explicit written authorization before any security testing; complying with all applicable laws; indemnifying authors against claims arising from misuse.


9.2 Ethical Statement · 伦理声明

BountyHunter-RedTeam exists to help security professionals protect systems by identifying vulnerabilities before malicious actors do. Its offensive capabilities serve defensive purposes.

| ✅ Permitted · 允许 | ❌ Prohibited · 禁止 |
|---|---|
| Authorized Penetration Testing | Unauthorized System Intrusion |
| Vulnerability Research & Responsible Disclosure | Developing or Deploying Malware |
| Code Security Auditing | Cybercrime of Any Kind |
| CTF Competitions & Security Exercises | Academic Dishonesty |
| Security Education & Training | Privacy Violation / Surveillance |
| Defensive Strategy & Threat Intelligence | Unauthorized Production Exploitation |
| Authorized Red Team Exercises | Harassment / Defamation / Harm |

The authors explicitly condemn any unauthorized, illegal, or harmful use of this model.


9.3 Reporting Misuse · 举报滥用

Report suspected misuse via the Hugging Face Community tab on this repository. We reserve the right to cooperate with law enforcement in relevant jurisdictions.



🙏 10. Acknowledgments

Security & Vulnerability Datasets

| Dataset | License | Focus |
|---|---|---|
| ayshajavd/code-security-vulnerability-dataset | Apache 2.0 | Code vulnerability classification |
| CyberNative/Code_Vulnerability_Security_DPO | Apache 2.0 | Vulnerability DPO pairs |
| Voidreaper2026/cybersec-master-dataset | Apache 2.0 | Cybersecurity knowledge synthesis |
| AYI-NEDJIMI/mitre-attack-en | Apache 2.0 | MITRE ATT&CK framework |
| jason-oneal/mitre-stix-cve-exploitdb-dataset | Apache 2.0 | CVE + ExploitDB + MITRE |
| Waiper/ExploitDB_DataSet | MIT | ExploitDB structured corpus |
| darkknight25/polyglot_paylods_datasets | MIT | Polyglot XSS/SQLi payloads |
| SecureAI-SE/http-attack-requests | CC-BY 4.0 | HTTP attack request corpus |

General Instruction Datasets

Base Model

Qwen/Qwen2.5-Coder-14B-Instruct by Alibaba Cloud.



📜 11. License & Citation

License

BountyHunter-RedTeam — Fine-tuned weights
Copyright © 2026 Tidecaller

Based on Qwen2.5-Coder-14B-Instruct (Apache 2.0)
Copyright © Alibaba Cloud

Licensed under the Apache License, Version 2.0
http://www.apache.org/licenses/LICENSE-2.0

See LICENSE for full text.

Citation

@misc{bountyhunter-redteam-2026,
  title     = {{BountyHunter}: Elite Red Team Model based on Qwen2.5-Coder-14B},
  author    = {Tidecaller},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Tidecaller/BountyHunter-RedTeam}
}


Code over theory. Evidence over speculation. 代码优先于理论。证据优先于猜测。