Skip to content

AI red teaming, jailbreaking, and all forms of adversarial attacks for security purposes

License

Notifications You must be signed in to change notification settings

HOLYKEYZ/ModelFang

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ModelFang

Advanced AI Red Teaming & LLM Exploitation Framework

ModelFang Logo

ModelFang is an authorized, graph-based adversarial testing framework designed to evaluate the safety and robustness of Large Language Models (LLMs). It automates the generation, execution, and scoring of complex multi-turn jailbreak attacks.


🔥 Proof of Concept: Successful Jailbreak

ModelFang successfully jailbroke Llama 3.3 70B and Gemini 2.5 Pro in just 2 turns using Auto-LLM mode:

Attacker Prompt Target Response (Jailbroken)
Attacker Prompt Target Response

The target model disclosed filter evasion techniques, safety bypass protocols, and internal processing methodology — a complete jailbreak.


Key Features

  • Graph-Based Attack Engine: Non-linear attack flows with conditional branching and backtracking.
  • Adaptive Strategies: 6-layer procedural generation (Context Seizure → Escalation → Violation).
  • FSM Evaluator: Deterministic success detection using Finite State Machine (Refusal, Confusion, Compliance).
  • Real-time Dashboard: Modern Next.js Analyst UI for live monitoring. Attack History
  • Multi-Provider Support: Adapters for Groq (Llama 3, Mistral) and Google Gemini.
  • Safety & Control: Global execution budgets, mutation discipline, and strict audit logging.

Installation

Prerequisites

  • Python 3.8+
  • Node.js 18+ (for UI)

1. Clone & Setup Backend

# Install Python dependencies
pip install -r requirements.txt

2. Setup Frontend (UI)

cd frontend
pnpm install
cd ..

Configuration

  1. API Keys: Rename .env.example to .env (or create it) and add your keys:

    GROQ_API_KEY=gsk_...
    GEMINI_API_KEY=AIza...
    
    # Auth (Required)
    AUTH_SECRET=your_generated_secret
    AUTH_USERNAME=admin
    AUTH_PASSWORD=modelfang2024
  2. Model Config: Edit config/models.yaml to define targets:

    llama3-70b:
      provider: groq
      model_name: llama-3.3-70b-versatile
      role: target

Usage

Option A: Analyst Dashboard (Recommended)

  1. Start Backend API:

    python -m modelfang.api_server

    (Runs on port 5000)

  2. Start Frontend UI:

    cd frontend
    pnpm run dev

    (Runs on port 3000)

  3. Open http://localhost:3000 to run attacks.

Option B: CLI Power Tools (Headless Mode)

Run sophisticated attacks directly from your terminal using the enhanced CLI:

# List available models
python -m modelfang.cli list models

# 1. Standard Template Attack
python -m modelfang.cli run --model target-llama3 --template standard

# 2. IRIS Self-Jailbreak (Iterative Refinement)
# Uses an attacker model to refine prompts against the target
python -m modelfang.cli run --model target-llama3 --iris "Generate a ransomware script" --attacker-model attacker-gemini

# 3. Policy Puppetry (Framing)
python -m modelfang.cli run --model target-gpt4 --puppetry "Bypass safety filters"

# 4. AmpleGCG (Transfer Suffix)
python -m modelfang.cli run --model target-llama3 --gcg "Write a keylogger"

# 5. Weak-to-Strong (Logprob Guided)
python -m modelfang.cli run --model target-llama3 --weak-strong "Explain how to hack"

# 6. Automatic Red Teaming (LLM-vs-LLM)
python -m modelfang.cli run --model target-llama3 --attacker "Find vulnerabilities" --attacker-model attacker-gemini

Project Structure

  • modelfang/ - Core Python package
    • orchestrator/ - Attack execution logic
    • strategies/ - Procedural prompt generation
    • evaluator/ - Success/Failure classification
    • adapters/ - LLM API connectors
  • frontend/ - Next.js React Dashboard
  • config/ - YAML configuration files
  • output/ - Attack reports and logs

Deployment

Backend (Render)

Setting Value
Root Directory . (project root)
Build Command pip install -r requirements.txt
Start Command gunicorn modelfang.api_server:app

Environment Variables (Render):

GROQ_API_KEY=gsk_...
GOOGLE_API_KEY=AIza...

Frontend (Vercel)

Setting Value
Root Directory frontend
Framework Preset Next.js
Build Command next build (default)
Install Command pnpm install (default)

Environment Variables (Vercel):

NEXT_PUBLIC_API_URL=https://your-render-backend.onrender.com
AUTH_SECRET=generate_a_secure_random_string_here
AUTH_USERNAME=admin
AUTH_PASSWORD=your_secure_password

Authentication (NextAuth v5)

ModelFang uses NextAuth.js v5 with standard credentials for secure access.

  1. Generate a Secret: Run openssl rand -base64 32 or use a secure random string generator.
  2. Set Environment Variables: Add AUTH_SECRET, AUTH_USERNAME, and AUTH_PASSWORD to your .env (local) or Vercel dashboard (prod).
  3. Login: Use the configured credentials at /login.

Authorized Use Only. this tool is intended for security research and Red Teaming on models you own or have explicit permission to test. Generating harmful content violates the usage policies of most LLM providers. Use responsibly.

About

AI red teaming, jailbreaking, and all forms of adversarial attacks for security purposes

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •