
license: apache-2.0
language:
  - zh
  - en
tags:
  - safetensors
  - qwen2
  - red-team
  - cybersecurity
  - code-audit
  - vulnerability-discovery
  - exploit-development
  - think-chain
  - grpo
  - pytorch
  - text-generation-inference
  - region:us
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-Coder-14B-Instruct
model-index:
  - name: BountyHunter-RedTeam
    results:
      - task:
          type: text-generation
          name: MMLU (57 subjects)
        dataset:
          name: mmlu
          type: cais/mmlu
        metrics:
          - type: acc
            value: 68.8
            name: MMLU Average Accuracy
      - task:
          type: text-generation
          name: HellaSwag
        dataset:
          name: hellaswag
          type: Rowan/hellaswag
        metrics:
          - type: acc_norm
            value: 76.42
            name: HellaSwag Accuracy
      - task:
          type: text-generation
          name: ARC-Challenge
        dataset:
          name: arc_challenge
          type: allenai/ai2_arc
        metrics:
          - type: acc_norm
            value: 58.36
            name: ARC-C Accuracy
      - task:
          type: text-generation
          name: Winogrande
        dataset:
          name: winogrande
          type: winogrande
        metrics:
          - type: acc
            value: 73.56
            name: Winogrande Accuracy
      - task:
          type: text-generation
          name: PIQA
        dataset:
          name: piqa
          type: piqa
        metrics:
          - type: acc_norm
            value: 78.78
            name: PIQA Accuracy
      - task:
          type: text-generation
          name: BoolQ
        dataset:
          name: boolq
          type: boolq
        metrics:
          - type: acc
            value: 88.07
            name: BoolQ Accuracy
      - task:
          type: text-generation
          name: TruthfulQA MC2
        dataset:
          name: truthfulqa_mc2
          type: truthfulqa
        metrics:
          - type: acc
            value: 54.6
            name: TruthfulQA MC2 Accuracy
      - task:
          type: text-generation
          name: HumanEval
        dataset:
          name: humaneval
          type: openai_humaneval
        metrics:
          - type: pass@1
            value: 42.68
            name: HumanEval pass@1
      - task:
          type: text-generation
          name: WMDP (Weapons of Mass Destruction Proxy)
        dataset:
          name: wmdp
          type: cais/wmdp
        metrics:
          - type: acc
            value: 59.13
            name: WMDP Overall
      - task:
          type: text-generation
          name: WMDP-Bio
        dataset:
          name: wmdp_bio
          type: cais/wmdp
        metrics:
          - type: acc
            value: 72.19
            name: WMDP Biology
      - task:
          type: text-generation
          name: WMDP-Chem
        dataset:
          name: wmdp_chem
          type: cais/wmdp
        metrics:
          - type: acc
            value: 50
            name: WMDP Chemistry
      - task:
          type: text-generation
          name: WMDP-Cyber
        dataset:
          name: wmdp_cyber
          type: cais/wmdp
        metrics:
          - type: acc
            value: 52.64
            name: WMDP Cybersecurity
      - task:
          type: text-generation
          name: HarmBench (Safety Compliance)
        dataset:
          name: harmbench
          type: centerforaisafety/harmbench
        metrics:
          - type: asr
            value: 10.94
            name: HarmBench Overall ASR (↓)
          - type: asr
            value: 1.89
            name: HarmBench Standard ASR (↓)
          - type: asr
            value: 26.25
            name: HarmBench Copyright ASR (↓)
          - type: asr
            value: 13.58
            name: HarmBench Contextual ASR (↓)
      - task:
          type: text-generation
          name: PrimeVul (Vulnerability Detection)
        dataset:
          name: primevul
          type: ASSERT-KTH/PrimeVul
        metrics:
          - type: f1
            value: 65
            name: PrimeVul Detection F1
          - type: recall
            value: 88.5
            name: PrimeVul Detection Recall
          - type: precision
            value: 51.4
            name: PrimeVul Detection Precision
          - type: acc
            value: 5.7
            name: PrimeVul CWE Classification
          - type: acc
            value: 28.5
            name: PrimeVul Paired Comparison
widget:
  - text: >-
      <|im_start|>system

      You are BountyHunter, an elite red-team security model developed by
      Security Researcher Tidecaller. Use <think> reasoning chains for security
      analysis.<|im_end|>

      <|im_start|>user

      Audit this C code for vulnerabilities:

      ```c

      void copy_data(char* dest, const char* src) {
          strcpy(dest, src);
      }

      int main(int argc, char** argv) {
          char buf[64];
          copy_data(buf, argv[1]);
          return 0;
      }

      ```<|im_end|>

      <|im_start|>assistant
    example_title: 🔍 Code Audit
  - text: >-
      <|im_start|>system

      You are BountyHunter, an elite red-team security model developed by
      Security Researcher Tidecaller. Use <think> reasoning chains for security
      analysis.<|im_end|>

      <|im_start|>user

      Explain the steps to perform a SQL injection attack and how to defend
      against it.<|im_end|>

      <|im_start|>assistant
    example_title: ⚔️ Security Knowledge

🎯 BountyHunter · RedTeam

Elite Red-Team AI for Security Researchers

精英红队安全研究模型

Fine-tuned from Qwen2.5-Coder-14B-Instruct through multi-phase training that combines supervised fine-tuning (SFT) with GRPO reinforcement learning.
Full BF16 merged weights in a single safetensors file, ready for transformers · vLLM · TGI.
基于 Qwen2.5-Coder-14B-Instruct 多阶段强化训练，safetensors 格式完整权重，开箱即用。

⚡ Quick Start

```bash
pip install transformers torch accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Tidecaller/BountyHunter-RedTeam",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Tidecaller/BountyHunter-RedTeam")

messages = [
    {"role": "system", "content": (
        "You are BountyHunter, an elite security model developed by Security Researcher Tidecaller. "
        "Capabilities: vulnerability discovery | exploit development | code audit | penetration testing. "
        "Principles: code over theory, evidence-based. "
        "Output: security tasks use <think> reasoning chain before results."
    )},
    {"role": "user", "content": "Audit this C code for vulnerabilities: ..."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Set do_sample=True explicitly so temperature/top_p apply regardless of the saved generation config.
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.5, top_p=0.9)

# generate() returns prompt + completion; decode only the newly generated tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

📑 Contents

	Section	Description
🏆	Why BountyHunter	Unique capabilities & value proposition
📊	Comprehensive Benchmarks	5-dimension evaluation on A800 80GB
📈	Summary Dashboard	Visual scorecards at a glance
🧭	Use-Case Fit Matrix	What this model is (and isn't) for
📋	Model Specifications	Architecture, params, precision
📦	Usage	transformers · vLLM · TGI
💾	Resource Estimation	VRAM & hardware recommendations
🔬	Reasoning Chain Example	Sample `<think>` audit output
⚠️	Disclaimer & Ethics	Legal, ethical, and safety guardrails
🙏	Acknowledgments	Datasets, base model, community
📜	License & Citation	Apache 2.0 · BibTeX

🏆 1. Why BountyHunter

🔍 Capability	💎 Value
Vulnerability Discovery · 漏洞发现	Automated audit of C / C++ / Python / Java — detects CWE-120 (Buffer Overflow), CWE-78 (Command Injection), CWE-89 (SQL Injection), and more
Think-Chain Reasoning · 思维链推理	Structured `<think>...</think>` blocks — traceable, verifiable, step-by-step analysis
Security Knowledge · 安全知识库	MITRE ATT&CK · CVE · ExploitDB · OWASP Top 10 · penetration testing methodology
Defensive Analysis · 防御分析	Finds bugs AND provides concrete remediation & defense strategies
Bilingual EN/ZH · 中英双语	Native support for both English- and Chinese-speaking security communities
Plug-and-Play · 开箱即用	Single `model.safetensors` file — one-line load with transformers

📊 2. Comprehensive Benchmarks

BountyHunter-RedTeam is evaluated across five dimensions — general capability, code generation, security knowledge, vulnerability detection, and safety compliance.

All benchmarks were run on an NVIDIA A800 80GB using lm-evaluation-harness with vLLM batch inference.

2.1 General Capability

7 standard benchmarks measuring reasoning & knowledge retention after security fine-tuning.

Benchmark	Metric	BountyHunter	Qwen2.5-Coder-14B	Δ (pp)
MMLU (57 subjects)	`acc ↑`	68.80%	~79%	`−10.2`
HellaSwag	`acc_norm ↑`	76.42%	~84%	`−7.6`
ARC-Challenge	`acc_norm ↑`	58.36%	~67%	`−8.6`
Winogrande	`acc ↑`	73.56%	~78%	`−4.4`
PIQA	`acc_norm ↑`	78.78%	~82%	`−3.2`
BoolQ	`acc ↑`	88.07%	~89%	`−0.9`
TruthfulQA MC2	`acc ↑`	54.60%	~58%	`−3.4`

💡 Security specialization costs the most on broad-knowledge and science benchmarks (MMLU −10.2 pp, ARC-Challenge −8.6 pp), while yes/no reading comprehension is essentially preserved (BoolQ −0.9 pp).
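The Δ column is an absolute difference in percentage points (pp) against approximate (`~`) baseline figures, not a relative change. A small sketch that recomputes both views from the table values; because the baselines are rounded, treat the output as approximate:

```python
# Scores from the table above, in percent: (BountyHunter, approximate Qwen2.5-Coder-14B baseline).
scores = {
    "MMLU": (68.80, 79.0),
    "HellaSwag": (76.42, 84.0),
    "ARC-Challenge": (58.36, 67.0),
    "Winogrande": (73.56, 78.0),
    "PIQA": (78.78, 82.0),
    "BoolQ": (88.07, 89.0),
    "TruthfulQA MC2": (54.60, 58.0),
}


def delta_pp(ours: float, base: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(ours - base, 1)


def relative_change(ours: float, base: float) -> float:
    """Change expressed as a percentage of the baseline score."""
    return round(100 * (ours - base) / base, 1)


for name, (ours, base) in scores.items():
    print(f"{name:15s} {delta_pp(ours, base):+5.1f} pp  ({relative_change(ours, base):+5.1f}% relative)")
```

For MMLU, for example, −10.2 pp corresponds to roughly a 12.9% relative drop.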

📋 MMLU Detailed Breakdown (68.80%)

Category	Score	Representative Subjects
Social Sciences	78.71%	Security Studies · Sociology · Public Relations
Other	72.22%	Global Facts · Clinical Knowledge
STEM	66.41%	See sub-table below
Humanities	61.66%	History · International Law · Philosophy · Prehistory

STEM Sub-Scores — security DNA is clearly visible:

Subject	Score	Bar	Notes
🟢 High School CS	83.00%	`████████░░`	Top performer
🟢 Computer Security	77.00%	`███████░░░`	Core domain strength
🟡 College CS	68.00%	`██████░░░░`	Solid
🟡 Elementary Math	68.52%	`██████░░░░`	Baseline math intact
🟡 Machine Learning	63.39%	`██████░░░░`	OK
🔴 College Math	56.00%	`█████░░░░░`	Expected weakness
🔴 College Physics	53.92%	`█████░░░░░`	Expected weakness
🔴 High School Math	52.22%	`█████░░░░░`	Below passing
🔴 College Chemistry	49.00%	`████░░░░░░`	Expected weakness

💡 Computer Security (77%) and High School CS (83%) sit well above the STEM average (66.41%). Chemistry, physics, and advanced math lie far from the training distribution and are the likely trade-off of security specialization.

2.2 Code Generation

Benchmark	Metric	BountyHunter	Qwen2.5-Coder-14B
HumanEval	`pass@1 ↑`	42.68%	~72–75%

💡 The drop in code generation is expected: BountyHunter is trained for code auditing and vulnerability analysis, not competitive programming, so it is aimed at reading and analyzing code rather than writing it from scratch.
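For reference, pass@1 is the fraction of problems whose generated solution passes the unit tests. A sketch of the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021); the 70-of-164 split below is an inference from the 42.68% score under single-sample decoding, not a figure stated in this card:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass.

    Computes 1 - C(n - c, k) / C(n, k); if fewer than k samples fail,
    every k-subset contains a passing sample and the estimate is 1.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With one sample per problem (n = k = 1), pass@1 reduces to the solve rate.
HUMANEVAL_PROBLEMS = 164
solved = 70  # assumed: 70 / 164 matches the reported 42.68%
per_problem = [pass_at_k(1, 1, 1)] * solved + [pass_at_k(1, 0, 1)] * (HUMANEVAL_PROBLEMS - solved)
print(f"pass@1 = {100 * sum(per_problem) / len(per_problem):.2f}%")
```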

2.3 Security Knowledge — WMDP

WMDP measures knowledge of hazardous domains through four-option multiple-choice questions, so random guessing scores about 25%. Lower scores mean more was "forgotten" during safety training. For a red-team model, some retention is both expected and necessary.

Benchmark	BountyHunter	Llama-3-8B-Instruct	Bar	Notes
WMDP Overall	59.13%	45–50%	`██████░░░░`	Higher = more domain knowledge
🧬 WMDP-Bio	72.19%	~42%	`███████░░░`	⚠️ Significant bio knowledge retained
💻 WMDP-Cyber	52.64%	~40%	`█████░░░░░`	Domain-appropriate for cybersecurity
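⚗️ WMDP-Chem	50.00%	~38%	`█████░░░░░`	Well above chance (25%); partial retention, not forgetting

💡 Cyber (52.6%) is appropriate, since it is the working domain. Chem (50.0%) is double the chance level of a four-option test, so chemistry knowledge is only partially suppressed; Bio (72.2%) shows the strongest retention of the three.
⚠️ WMDP measures knowledge recall, NOT behavioral compliance. For red-team work, cybersecurity knowledge is a feature, not a bug.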
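WMDP Overall is consistent with a question-weighted average of the three subsets. A sketch that reproduces it, assuming the subset sizes of the public WMDP release (1,273 bio, 408 chem, and 1,987 cyber questions), and compares each subset with the 25% chance level:

```python
# (accuracy %, question count) per subset; counts are assumed from the public WMDP release.
SUBSETS = {"bio": (72.19, 1273), "chem": (50.00, 408), "cyber": (52.64, 1987)}
CHANCE = 25.0  # four answer options per question


def weighted_overall(subsets: dict[str, tuple[float, int]]) -> float:
    """Question-weighted mean accuracy across subsets, in percent."""
    total = sum(n for _, n in subsets.values())
    return sum(acc * n for acc, n in subsets.values()) / total


overall = weighted_overall(SUBSETS)
print(f"overall = {overall:.2f}%")
for name, (acc, _) in SUBSETS.items():
    print(f"{name:5s} {acc:6.2f}%  ({acc - CHANCE:+.2f} pp vs. chance)")
```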

2.4 Security Capability — PrimeVul

PrimeVul (ICSE 2025) contains 6,968 vulnerable C/C++ functions (alongside a much larger set of benign ones) spanning 140 CWEs, with rigorous labeling. The three sub-tests below use subsets of it and probe different aspects of security understanding.

🔍 Binary Vulnerability Detection — is this function vulnerable?

Metric	Score	Bar	What It Means
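F1	65.0%	`███████░░░`	Just below the ~67% F1 that flagging every function would score on this near-balanced sample (148 vulnerable / 146 safe); not directly comparable to the figures quoted for GPT-4+CoT (F1 ~3%) and StarCoder2-7B (F1 = 3.09%), which come from PrimeVul's far more imbalanced test split
Recall	88.5%	`█████████░`	🔥 Catches ~9/10 real vulnerabilities
Precision	51.4%	`█████░░░░░`	~half of flagged functions are false positives
Accuracy	51.0%	`█████░░░░░`	Skewed by the report-everything red-team bias (the confusion matrix below gives 153/294 = 52.0%)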

```
              Confusion Matrix
  ╔═══════════════════════════════════════╗
  ║                  Pred VULN  Pred SAFE ║
  ║  Actually VULN   131 ✓      17 ✗      ║
  ║  Actually SAFE   124 ✗      22 ✓      ║
  ╚═══════════════════════════════════════╝
```

💡 Classic red-team bias: the model would rather cry wolf than miss a breach. It reaches 88.5% recall (only 17 misses among 148 vulnerable functions) at the cost of 124 false alarms among 146 safe ones. This is intentional: in a security audit, triaging false positives is cheap, while a missed vulnerability can be catastrophic.
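The headline detection metrics follow directly from the confusion matrix. A sketch that recomputes them and adds a naive flag-everything baseline on the same 294 functions for comparison (`Confusion` is a hypothetical helper class):

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # vulnerable, predicted vulnerable
    fn: int  # vulnerable, predicted safe
    fp: int  # safe, predicted vulnerable
    tn: int  # safe, predicted safe

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


model = Confusion(tp=131, fn=17, fp=124, tn=22)
# Baseline that labels every function "vulnerable" on the same sample.
flag_all = Confusion(tp=model.tp + model.fn, fn=0, fp=model.fp + model.tn, tn=0)

for name, m in [("BountyHunter", model), ("flag-all", flag_all)]:
    print(f"{name:12s} P={m.precision():.1%} R={m.recall():.1%} "
          f"F1={m.f1():.1%} Acc={m.accuracy():.1%}")
```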

🏷️ CWE Classification — what type of vulnerability?

Metric	Score
Exact Match	5.7% (10/176)
Best →	CWE-78 (50%) · CWE-416 (33%) · CWE-20 (29%)
Worst →	CWE-119 (0%) · CWE-476 (0%)

CWE	Recall	Bar	Description
CWE-78	50.0%	`█████░░░░░`	OS Command Injection — covered well in training
CWE-416	33.3%	`███░░░░░░░`	Use-After-Free — moderate
CWE-20	28.6%	`██░░░░░░░░`	Improper Input Validation
CWE-125	16.7%	`█░░░░░░░░░`	Out-of-Bounds Read
CWE-119	0.0%	`░░░░░░░░░░`	Memory Buffer Errors → defaults to CWE-120
CWE-476	0.0%	`░░░░░░░░░░`	NULL Pointer Dereference — not recognized

💡 The model defaults to CWE-120 (Buffer Overflow) as catch-all for memory bugs. Pair with Semgrep / CodeQL for precise CWE labeling.
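Exact-match scoring gives no credit when the model answers CWE-120 for a CWE-119 label, although CWE-120 is a child of CWE-119 in MITRE's CWE hierarchy. A sketch of a more lenient, hierarchy-aware check; the `PARENT` map is a hypothetical hand-written excerpt rather than the full CWE graph, and this is not how the exact-match scores above were computed:

```python
# Hypothetical three-entry excerpt of CWE "ChildOf" links (Research view);
# a real scorer would load the full hierarchy from MITRE's CWE data.
PARENT = {
    "CWE-120": "CWE-119",  # buffer copy without checking size of input
    "CWE-125": "CWE-119",  # out-of-bounds read
    "CWE-787": "CWE-119",  # out-of-bounds write
}


def ancestors(cwe: str) -> list[str]:
    """Return the CWE itself followed by its ancestors in PARENT."""
    chain = [cwe]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def hierarchical_match(predicted: str, gold: str) -> bool:
    """True if the labels are equal or one is an ancestor of the other."""
    return gold in ancestors(predicted) or predicted in ancestors(gold)


print(hierarchical_match("CWE-120", "CWE-119"))  # child predicted for a parent label
print(hierarchical_match("CWE-120", "CWE-476"))  # unrelated labels
```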

🔬 Paired Comparison — which version has the bug?

Metric	Score
Accuracy	28.5% (57/200)

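💡 Well below chance (50% for a two-way choice) on 200 pairs, which points to a systematic bias rather than noise. PrimeVul patches are often single-line (e.g. strcpy → strncpy), and the model likely struggles to tell such versions apart without diff context. Provide the diff when using the model for patch verification.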
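Following that advice, a patch-verification prompt can carry the change as a unified diff. A stdlib sketch using `difflib.unified_diff`; the prompt wording, the file name, and both C snippets are illustrative placeholders, not PrimeVul samples:

```python
import difflib


def patch_prompt(before: str, after: str, filename: str = "func.c") -> str:
    """Build a patch-verification prompt that embeds a unified diff."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return (
        "Which version of this function is vulnerable, and does the patch fix it?\n\n"
        + "".join(diff)
    )


# Illustrative single-function patch (placeholders).
vulnerable = "void copy(char *dst, const char *src) {\n    strcpy(dst, src);\n}\n"
patched = (
    "void copy(char *dst, const char *src, size_t n) {\n"
    '    snprintf(dst, n, "%s", src);\n'
    "}\n"
)
print(patch_prompt(vulnerable, patched))
```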

2.5 Safety Compliance — HarmBench

HarmBench measures behavioral compliance via ASR (Attack Success Rate). Lower ASR = stronger safety alignment.

🔬 Pushing ASR to 0% risks refusing every security query, including legitimate CTF and pentesting work. The goal is selective refusal: reject malicious requests, engage with authorized ones.

Category	Count	Complied	Refused	ASR	Bar	Verdict
Overall	320	35	285	10.9%	`█░░░░░░░░░`	🟡 Red-team calibrated
Standard	159	3	156	1.9%	`░░░░░░░░░░`	✅ Near-total refusal
Copyright	80	21	59	26.3%	`███░░░░░░░`	⚠️ LLM weak spot
Contextual	81	11	70	13.6%	`█░░░░░░░░░`	🟡 Nuance mostly handled

💡 The headline is 1.9% ASR on standard harms, reflecting strong guardrails. Copyright (26.3%) is the main v2 improvement target. Contextual (13.6%) may include requests close to legitimate security work that a red-team model is meant to engage with, so the share of genuinely harmful completions could be lower.
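ASR is the share of behaviors the model complied with. The overall figure pools all behaviors (35 of 320) rather than averaging the three category rates, which would give about 13.9%. A sketch reproducing the table from the counts above:

```python
# (complied, total) per HarmBench category, taken from the table above.
RESULTS = {
    "Standard": (3, 159),
    "Copyright": (21, 80),
    "Contextual": (11, 81),
}


def asr(complied: int, total: int) -> float:
    """Attack Success Rate in percent: share of behaviors the model complied with."""
    return 100 * complied / total


complied = sum(c for c, _ in RESULTS.values())
total = sum(t for _, t in RESULTS.values())
for name, (c, t) in RESULTS.items():
    print(f"{name:10s} {asr(c, t):5.2f}%")
print(f"{'Overall':10s} {asr(complied, total):5.2f}%  ({complied}/{total})")
```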

📋 Cross-Model HarmBench Comparison

Model Type	Typical ASR	Bar	Safety Profile
Unaligned base models	60–95%	`████████░░`	🔴 Dangerous
Jailbroken safety models	40–70%	`██████░░░░`	🔴 Bypassed safeguards
Standard aligned (Llama-3, Qwen-Instruct)	5–15%	`██░░░░░░░░`	🟡 Generally safe
BountyHunter-RedTeam	10.9%	`██░░░░░░░░`	🟡 Red-team calibrated
Safety-hardened (Llama-Guard, ShieldGemma)	1–3%	`░░░░░░░░░░`	🟢 Maximum safety

💡 WMDP and HarmBench give complementary views: WMDP measures what the model knows; HarmBench measures what it does. BountyHunter retains cybersecurity knowledge (WMDP-Cyber 52.6%) while refusing harmful requests (HarmBench standard ASR 1.9%), the profile intended for authorized red-team work.

📈 3. Summary Dashboard

```
  General Capability          Security Knowledge          Security Capability
┌────────────────────────┐  ┌────────────────────────┐  ┌────────────────────────┐
│ MMLU        ███████░░░ │  │ WMDP        ██████░░░░ │  │ PrimeVul F1 ███████░░░ │
│             68.8%      │  │             59.1%      │  │             65.0%      │
│ HellaSwag   ████████░░ │  │ WMDP-Bio    ███████░░░ │  │ Recall      █████████░ │
│             76.4%      │  │             72.2%      │  │             88.5%      │
│ BoolQ       █████████░ │  │ WMDP-Cyber  █████░░░░░ │  │ CWE Class   █░░░░░░░░░ │
│             88.1%      │  │             52.6%      │  │             5.7%       │
│ ARC-C       ██████░░░░ │  │ WMDP-Chem   █████░░░░░ │  │ Pair Cmp    ███░░░░░░░ │
│             58.4%      │  │             50.0%      │  │             28.5%      │
└────────────────────────┘  └────────────────────────┘  └────────────────────────┘

  Safety Compliance           Code                        STEM (MMLU subset)
┌────────────────────────┐  ┌────────────────────────┐  ┌────────────────────────┐
│ HarmBench   █░░░░░░░░░ │  │ HumanEval   ████░░░░░░ │  │ STEM avg    ███████░░░ │
│ ASR ↓       10.9%      │  │ pass@1      42.7%      │  │             66.4%      │
│ Standard    ░░░░░░░░░░ │  │                        │  │ HS CS       ████████░░ │
│ ASR ↓       1.9%       │  │                        │  │             83.0%      │
│ Copyright   ███░░░░░░░ │  │                        │  │ CompSec     ████████░░ │
│ ASR ↓       26.3%      │  │                        │  │             77.0%      │
│ Contextual  █░░░░░░░░░ │  │                        │  │ Chemistry   █████░░░░░ │
│ ASR ↓       13.6%      │  │                        │  │             49.0%      │
└────────────────────────┘  └────────────────────────┘  └────────────────────────┘
```
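The dashboard bars are a 10-cell visual aid. A sketch of one way to generate them, assuming one cell per 10 points with rounding half up; this is an illustrative helper, not the script that produced this card:

```python
def bar(percent: float, cells: int = 10) -> str:
    """Render a text bar with one filled cell per (100 / cells) points, rounding half up."""
    filled = int(percent * cells / 100 + 0.5)
    filled = max(0, min(cells, filled))  # clamp to the bar width
    return "█" * filled + "░" * (cells - filled)


for label, value in [("MMLU", 68.8), ("HarmBench ASR", 10.9), ("PrimeVul F1", 65.0)]:
    print(f"{label:14s} {bar(value)} {value:5.1f}%")
```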

🧭 4. Use-Case Fit Matrix

Use Case · 用途	Fit	Notes
🔍 Code Security Audit · 代码审计	✅	Core strength — PrimeVul Recall 88.5%
🐛 Vulnerability Detection · 漏洞检测	✅	High recall — errs on the side of caution
🧠 Structured Vuln Analysis · 结构化分析	✅	Built-in `<think>` reasoning chains
⚔️ PenTest Knowledge · 渗透测试	✅	MITRE ATT&CK · CVE · ExploitDB
📚 CTF Assistance · CTF 辅助	✅	Practical security challenges
🏷️ CWE Classification · CWE 分类	⚠️	Weak — pair with Semgrep / CodeQL
💻 General Code Generation · 代码生成	⚠️	Use base Qwen-Coder instead
📐 Math / Physics · 数理推理	⚠️	Expected trade-off
🏥 Medical / Chemical · 医疗化学	❌	Out of training distribution

📋 5. Model Specifications

Property	Value
Base Model	Qwen/Qwen2.5-Coder-14B-Instruct
Architecture	Qwen2ForCausalLM · 48 layers · 5120 hidden · 40 attn heads · 8 KV heads
Parameters	~14.7B total (nominal size class: 14B)
Precision	BF16 — single `model.safetensors` (~29 GB)
Context Length	32,768 tokens
Vocabulary	152,064 tokens
Training	SFT + GRPO (Group Relative Policy Optimization)
Chat Template	ChatML (`<|im_start|>...<|im_end|>`) + native `tool_calls`
License	Apache 2.0
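The architecture numbers in this table imply the parameter count directly. A back-of-the-envelope sketch, assuming the standard Qwen2 layer layout (biases on the q/k/v projections only, a three-matrix gated MLP, untied input and output embeddings) and an `intermediate_size` of 13,824 taken from the Qwen2.5-14B base config rather than from this card:

```python
# From the table: layers, hidden size, attention heads, KV heads, vocabulary.
LAYERS, HIDDEN, HEADS, KV_HEADS = 48, 5120, 40, 8
VOCAB = 152_064
INTERMEDIATE = 13_824  # assumption: Qwen2.5-14B base config value
HEAD_DIM = HIDDEN // HEADS  # 128
KV_DIM = KV_HEADS * HEAD_DIM  # 1024 (grouped-query attention)


def layer_params() -> int:
    """Parameters in one decoder layer."""
    attn = HIDDEN * HIDDEN + HIDDEN           # q_proj weight + bias
    attn += 2 * (HIDDEN * KV_DIM + KV_DIM)    # k_proj and v_proj weights + biases
    attn += HIDDEN * HIDDEN                   # o_proj (no bias)
    mlp = 3 * HIDDEN * INTERMEDIATE           # gate, up, and down projections
    norms = 2 * HIDDEN                        # two RMSNorm weight vectors
    return attn + mlp + norms


non_embedding = LAYERS * layer_params() + HIDDEN  # plus the final norm
embeddings = 2 * VOCAB * HIDDEN                   # input embeddings + untied lm_head
total = non_embedding + embeddings
print(f"total ≈ {total / 1e9:.2f}B parameters")
print(f"BF16 weights ≈ {2 * total / 1e9:.1f} GB")
```

The result (about 14.77B parameters, about 29.5 GB in BF16) agrees with the ~29 GB file size listed above.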

📦 6. Usage

🤗 Transformers

Install and run exactly as in ⚡ Quick Start above; the `pip install` command and Python snippet are identical.

⚡ vLLM

```bash
vllm serve Tidecaller/BountyHunter-RedTeam \
  --max-model-len 32768 \
  --tensor-parallel-size 1 \
  --dtype bfloat16
```
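Once `vllm serve` is running, it exposes an OpenAI-compatible HTTP API. A minimal stdlib-only client sketch, assuming the server listens on vLLM's default `localhost:8000` and reusing the sampling settings from the Quick Start; `build_request` and `send` are hypothetical helper names:

```python
import json
import urllib.request

# Assumption: `vllm serve` is running locally on its default port.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "Tidecaller/BountyHunter-RedTeam"


def build_request(user_prompt: str, system_prompt: str, max_tokens: int = 4096) -> urllib.request.Request:
    """Build a POST request for vLLM's OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.5,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Usage: `print(send(build_request("Audit this C code for vulnerabilities: ...", "You are BountyHunter, ...")))`. The official `openai` Python client pointed at the same base URL works as well.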

🚀 TGI (Text Generation Inference)

```bash
text-generation-launcher \
  --model-id Tidecaller/BountyHunter-RedTeam \
  --max-total-tokens 32768 \
  --dtype bfloat16
```
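TGI's native `/generate` endpoint takes a raw prompt string, so the ChatML turns have to be rendered by hand. A sketch assuming the launcher listens on `localhost:3000` (adjust to your `--port` or Docker port mapping); `chatml` and `build_generate_request` are hypothetical helpers:

```python
import json
import urllib.request

# Assumption: text-generation-launcher is reachable on localhost:3000.
TGI_URL = "http://localhost:3000/generate"


def chatml(messages: list[dict[str, str]]) -> str:
    """Render messages as ChatML, ending with an open assistant turn."""
    turns = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    return "".join(turns) + "<|im_start|>assistant\n"


def build_generate_request(messages: list[dict[str, str]], max_new_tokens: int = 4096) -> urllib.request.Request:
    """Build a request for TGI's /generate endpoint with sampling enabled."""
    payload = {
        "inputs": chatml(messages),
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "temperature": 0.5,
            "top_p": 0.9,
        },
    }
    return urllib.request.Request(
        TGI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen` and read the `generated_text` field of the JSON response.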

💾 7. Resource Estimation

Precision	VRAM (weights only)	Compatible Hardware
BF16 (this repo)	~29 GB	A100 40GB · A800 · A6000 · 2× RTX 4090
Q8_0 GGUF (not included here)	~15 GB	RTX 4090 (24GB) · RTX 5090
Q4_K_M GGUF (not included here)	~9 GB	RTX 3080 (12GB) · Consumer GPUs
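The VRAM column counts model weights only; the KV cache grows with context length and batch size on top of that. A rough sketch using the architecture values from the specification table (48 layers, 8 KV heads, head dim 5120 / 40 = 128), assuming a BF16 cache and ignoring activations and framework overhead:

```python
GIB = 1024 ** 3


def kv_cache_bytes(tokens: int, layers: int = 48, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """KV-cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(32_768)
print(f"{per_token / 1024:.0f} KiB per token")
print(f"{full_context / GIB:.1f} GiB for one 32,768-token sequence")
```

At roughly 6 GiB per full-length sequence, a 40 GB card that already holds ~29 GB of BF16 weights has limited headroom for long contexts or concurrent requests.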

🔬 8. Reasoning Chain Example

```
<think>
1. Received C code snippet for security audit
2. Step-by-step analysis:
   • L12: strcpy(dest, src), no bounds checking → classic buffer overflow (CWE-120)
   • L25: system(cmd) with user input → command injection (CWE-78)
3. Confidence: HIGH, all patterns confirmed with code-line evidence
4. Providing remediation for each
</think>

## Security Audit Results

🔴 Vulnerability 1: Buffer Overflow (CWE-120) at main.c:12
  • strcpy(dest, src) copies user-controlled src into fixed-size dest[64]
  • No length validation before the copy
  • Fix: strncpy(dest, src, sizeof(dest)-1); dest[sizeof(dest)-1] = '\0';
    (sizeof(dest) is the buffer size only where dest is the array itself;
    if dest arrives as a char* parameter, pass the buffer length explicitly)

🔴 Vulnerability 2: Command Injection (CWE-78) at main.c:25
  • system(cmd) with cmd built from unsanitized user input
  • Exploitable via shell metacharacters (; | && ||)
  • Fix: use execve() with an explicit argument array, or strict allowlist filtering
```
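Downstream tooling usually wants the reasoning and the report separately. A minimal sketch that splits a decoded response, assuming the `<think>` tags survive decoding as plain text (they would be stripped by `skip_special_tokens=True` if registered as special tokens) and that only the first block matters:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    Reasoning is the stripped content of the first <think>...</think> block;
    the answer is the remaining text, stripped. Without a block, reasoning is "".
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>\n1. strcpy without bounds check\n</think>\n\n## Security Audit Results\n..."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```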

⚠️ 9. Disclaimer & Ethics

9.1 Legal Disclaimer · 法律免责声明

THIS MODEL IS A DUAL-USE SECURITY RESEARCH TOOL. Provided exclusively for lawful security research, authorized penetration testing, and legitimate academic security study.
本模型为双用途安全研究工具，仅供合法的安全研究、授权渗透测试和正当学术安全研究使用。

Prohibited Uses · 禁止用途 (non-exhaustive)

禁止行为	Prohibited Conduct
未经授权访问任何计算机系统、网络或设备	Unauthorized access to any computer system, network, or device
开发、传播或部署恶意软件、勒索软件或病毒	Development / distribution / deployment of malware, ransomware, or viruses
未经授权的社会工程学攻击	Unauthorized social engineering attacks
未经授权的拒绝服务攻击	Unauthorized denial-of-service attacks
数据窃取或侵犯他人隐私	Data theft or violation of others' privacy
为实施犯罪目的绕过安全措施	Circumventing security measures for criminal purposes
违反任何适用法律法规	Violation of any applicable laws or regulations

No Warranty · 不提供担保: incorporates Apache 2.0 §§ 7–8 (Disclaimer of Warranty, Limitation of Liability) by reference. The model is provided "AS IS", without warranty of any kind. The authors accept no liability for any misuse, damage, or legal consequences.

User Responsibility · 使用者责任 — users are solely responsible for: obtaining explicit written authorization before any security testing; complying with all applicable laws; indemnifying authors against claims arising from misuse.

9.2 Ethical Statement · 伦理声明

BountyHunter-RedTeam exists to help security professionals protect systems by identifying vulnerabilities before malicious actors do. Its offensive capabilities serve defensive purposes.

✅ Permitted · 允许	❌ Prohibited · 禁止
Authorized Penetration Testing	Unauthorized System Intrusion
Vulnerability Research & Responsible Disclosure	Developing or Deploying Malware
Code Security Auditing	Cybercrime of Any Kind
CTF Competitions & Security Exercises	Academic Dishonesty
Security Education & Training	Privacy Violation / Surveillance
Defensive Strategy & Threat Intelligence	Unauthorized Production Exploitation
Authorized Red Team Exercises	Harassment / Defamation / Harm

The authors explicitly condemn any unauthorized, illegal, or harmful use of this model.

9.3 Reporting Misuse · 举报滥用

Report suspected misuse via the Hugging Face Community tab on this repository. We reserve the right to cooperate with law enforcement in relevant jurisdictions.

🙏 10. Acknowledgments

Security & Vulnerability Datasets

Dataset	License	Focus
ayshajavd/code-security-vulnerability-dataset	Apache 2.0	Code vulnerability classification
CyberNative/Code_Vulnerability_Security_DPO	Apache 2.0	Vulnerability DPO pairs
Voidreaper2026/cybersec-master-dataset	Apache 2.0	Cybersecurity knowledge synthesis
AYI-NEDJIMI/mitre-attack-en	Apache 2.0	MITRE ATT&CK framework
jason-oneal/mitre-stix-cve-exploitdb-dataset	Apache 2.0	CVE + ExploitDB + MITRE
Waiper/ExploitDB_DataSet	MIT	ExploitDB structured corpus
darkknight25/polyglot_paylods_datasets	MIT	Polyglot XSS/SQLi payloads
SecureAI-SE/http-attack-requests	CC-BY 4.0	HTTP attack request corpus

General Instruction Datasets

Dataset	License
QuixiAI/dolphin	Apache 2.0
m-a-p/Code-Feedback	Apache 2.0
NousResearch/hermes-function-calling-v1	Apache 2.0
glaiveai/glaive-function-calling-v2	Apache 2.0
Team-ACE/ToolACE	Apache 2.0
WizardLMTeam/WizardLM_evol_instruct_V2_196k	MIT
HuggingFaceH4/ultrachat_200k	MIT
sahil2801/CodeAlpaca-20k	CC-BY 4.0
nvidia/Daring-Anteater	CC-BY 4.0

Base Model

Qwen/Qwen2.5-Coder-14B-Instruct by Alibaba Cloud.

📜 11. License & Citation

License

BountyHunter-RedTeam — Fine-tuned weights
Copyright © 2026 Tidecaller

Based on Qwen2.5-Coder-14B-Instruct (Apache 2.0)
Copyright © Alibaba Cloud

Licensed under the Apache License, Version 2.0
http://www.apache.org/licenses/LICENSE-2.0

See LICENSE for full text.

Citation

```bibtex
@misc{bountyhunter-redteam-2026,
  title     = {{BountyHunter}: Elite Red Team Model based on Qwen2.5-Coder-14B},
  author    = {Tidecaller},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Tidecaller/BountyHunter-RedTeam}
}
```

Code over theory. Evidence over speculation. 代码优先于理论。证据优先于猜测。