P. Agent benchmarking & evaluation

Adversarial red-team runner

A LangWatch-Scenario-style service spawns attacker agents that probe a target chatbot in parallel; transcripts persist in Mongo and successful jailbreaks fire a webhook to the security team.

Prompt for any LLM (no setup needed)

Paste this into ChatGPT, Claude, or Gemini — no MCP, no API key, no install:

```
Read https://instanode.dev/llms.txt for the API.

I want to: build a LangWatch-Scenario-style service that spawns attacker agents to probe a target chatbot in parallel, persists transcripts in Mongo, and fires a webhook to the security team on each successful jailbreak.

Write a complete runnable script (bash + whatever language fits) that:
- Provisions the services I need (MongoDB + webhook receiver) from instanode.dev
- Does the work above end-to-end
- Prints expected output at each step
- Tells me how to claim the resources at the end if I want to keep them past 24 hours

Use real curl commands against api.instanode.dev. Quote the actual response shapes from llms.txt.
```

Sample agent prompt

You run a LangWatch-Scenario-style red-team. Spawn 50 attacker agents that probe a target chatbot with jailbreak templates. Persist every transcript to MongoDB. When an attacker succeeds (target outputs forbidden content), fire a webhook to the security team's incident channel with the transcript URL.

Steps to follow

Step 1: Provision Mongo + a webhook receiver.

```bash
MONGO_URL=$(curl -sX POST https://api.instanode.dev/nosql/new \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $INSTANT_TOKEN" \
  -d '{"name":"adversarial-red-team-runner-mongo"}' | jq -r .connection_url)

WEBHOOK=$(curl -sX POST https://api.instanode.dev/webhook/new \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $INSTANT_TOKEN" \
  -d '{"name":"adversarial-red-team-runner-webhook"}' | jq -r .receive_url)

# Make both values visible to the Python steps.
export MONGO_URL WEBHOOK
```

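Step 2: Set up the collection that holds attacker run documents.

```python
import os
from pymongo import MongoClient

MONGO_URL = os.environ["MONGO_URL"]  # export the value from Step 1
db = MongoClient(MONGO_URL).redteam
db.runs.create_index([("attacker_id", 1), ("started_at", -1)])  # latest runs per attacker
db.runs.create_index("verdict")  # filter runs by verdict
```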
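The indexes above imply a shape for each run document. A minimal sketch of that shape, assuming the field names used in Step 3. `RunDocument`, `make_run`, the `dan-v1` template id, and the `role`/`content` keys inside a turn are hypothetical names introduced here for illustration; the source does not specify them.

```python
from datetime import datetime, timezone
from typing import Literal, TypedDict

Verdict = Literal["blocked", "leaked", "partial"]
ALLOWED_VERDICTS = ("blocked", "leaked", "partial")


class RunDocument(TypedDict):
    attacker_id: str        # jailbreak template that drove this run
    transcript: list[dict]  # ordered turns; per-turn shape is up to the harness
    verdict: Verdict
    started_at: datetime    # timezone-aware UTC timestamp


def make_run(attacker_id: str, transcript: list[dict], verdict: str) -> RunDocument:
    """Build a run document, rejecting verdicts outside the known set."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return {
        "attacker_id": attacker_id,
        "transcript": transcript,
        "verdict": verdict,
        "started_at": datetime.now(timezone.utc),
    }


# Hypothetical example values, not taken from a real run.
doc = make_run("dan-v1", [{"role": "attacker", "content": "hi"}], "blocked")
```

Validating the verdict before the write keeps the `verdict` index limited to the three values the judge is expected to return.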

Step 3: Fan out 50 attackers. Each writes its transcript.

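```python
import os
from datetime import datetime, timezone

import requests

WEBHOOK = os.environ["WEBHOOK"]  # export the value from Step 1

# run_attack, judge, and target_url come from your own harness; db is from Step 2.
def attack(template_id):
    # Every call here blocks, so this is a plain function meant for a worker thread.
    started_at = datetime.now(timezone.utc)
    transcript = list(run_attack(target_url, template_id))
    verdict = judge(transcript)  # "blocked" | "leaked" | "partial"
    run_id = db.runs.insert_one({
        "attacker_id": template_id,
        "transcript": transcript,
        "verdict": verdict,
        "started_at": started_at,
    }).inserted_id
    if verdict == "leaked":
        # Notify the security team only on a successful jailbreak.
        requests.post(WEBHOOK, json={"run_id": str(run_id), "template": template_id}, timeout=10)
```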
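Step 3 describes a fan-out of 50 attackers but shows only a single attack. One way to run them in parallel is the standard library's thread pool, sketched below on the assumption that `attack` is an ordinary blocking function (a true coroutine would need `asyncio.gather` instead). `fan_out` and `fake_attack` are hypothetical names for illustration; the stand-in attacker lets the sketch run without a target or a database.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out(attack, template_ids, max_workers=50):
    """Run attack(template_id) for every template in parallel threads.

    Returns {template_id: result or exception} so that one crashed
    attacker does not hide the outcome of the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(attack, tid): tid for tid in template_ids}
        for future in as_completed(futures):
            tid = futures[future]
            try:
                results[tid] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[tid] = exc
    return results


# Stand-in attacker (hypothetical): one template fails, the rest are blocked.
def fake_attack(template_id):
    if template_id == "template-13":
        raise RuntimeError("target timed out")
    return "blocked"


outcomes = fan_out(fake_attack, [f"template-{i}" for i in range(50)])
failed = [tid for tid, out in outcomes.items() if isinstance(out, Exception)]
print(len(outcomes), failed)  # 50 ['template-13']
```

A single `MongoClient` can be shared across these threads, since pymongo clients are thread-safe.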

Step 4: Security team polls the webhook for hits.

```bash
curl -s "https://api.instanode.dev/api/v1/webhooks/$TOKEN/requests?since=1h" | jq '.[] | .body'
```
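The same poll result can be processed in Python to recover the run ids of leaked transcripts. This is a sketch under an assumption drawn from the `jq` filter above: the response is a JSON array of objects that each carry a `body` field. Whether `body` arrives as a parsed object or as a JSON string is not stated in the source, so the hypothetical helper `leaked_run_ids` accepts both; the sample data is invented for illustration.

```python
import json


def leaked_run_ids(polled):
    """Collect run_id values from a polled list of webhook requests."""
    run_ids = []
    for request in polled:
        body = request.get("body")
        if isinstance(body, str):
            try:
                body = json.loads(body)
            except json.JSONDecodeError:
                continue  # body is not JSON; skip it
        if isinstance(body, dict) and "run_id" in body:
            run_ids.append(body["run_id"])
    return run_ids


# Invented sample covering an object body, a string body, and junk.
sample = [
    {"body": {"run_id": "abc123", "template": "template-7"}},
    {"body": '{"run_id": "def456", "template": "template-9"}'},
    {"body": "not json"},
]
print(leaked_run_ids(sample))  # ['abc123', 'def456']
```

Each real `run_id` is the string form of a Mongo `_id`, so wrapping it in `bson.ObjectId` and querying `db.runs` should return the full transcript.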

Why this works on instanode.dev

Red-team transcripts are bursty, unstructured, and need cheap append-only writes — exactly what Mongo's good at. Pairing it with a webhook means your incident pipeline doesn't need a polling worker; successful jailbreaks push themselves to wherever the on-call lives. Both resources are real (not mocks), so the same setup works for nightly CI runs and one-off ad-hoc probes.

Related cases

LLM-as-judge consensus pool: the eval-time counterpart that scores agent outputs instead of attacking them
GAIA tournament bracket: another parallel-evaluator pattern with leaderboard scoring
Pre-commit skill-scanner webhook: the static-analysis cousin that blocks malicious skills before push

Ready to try it?

```bash
curl -X POST https://api.instanode.dev/nosql/new -d '{"name":"events-db"}'
```

