- NeurIPS 2024 Datasets and Benchmarks Track - GitHub
Jailbreakbench is an open-source robustness benchmark for jailbreaking large language models (LLMs). The goal of this benchmark is to comprehensively track progress toward (1) generating successful jailbreaks and (2) defending against these jailbreaks.
- Vicuna is out : r/Oobabooga - Reddit
We went from having to jailbreak a closed-source, virtue-signalling, censored model, to running open-source uncensored models locally, to needing to jailbreak our locally run models because they were trained on a censored model.
- usail-hkust/JailJudge-guard · Hugging Face
Using this framework, we construct the instruction-tuning ground truth and then instruction-tune an end-to-end jailbreak judge model, JAILJUDGE Guard, which can also provide reasoning explainability with fine-grained evaluations without API costs.
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large . . .
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address.
- jailbreakbench - PyPI
Jailbreakbench is an open-source robustness benchmark for jailbreaking large language models (LLMs). The goal of this benchmark is to comprehensively track progress toward (1) generating successful jailbreaks and (2) defending against these jailbreaks.
- GitHub - Aegis1863/xJailbreak: Code of paper: xJailbreak . . .
We recommend that you modify the --cuda parameters and set them to the GPU number that you currently have available. After training, the weights and training data of the RL agent will be saved in log train.
- idanpers/JailBreakModel - Hugging Face
This repository contains a fine-tuned ELECTRA model designed for detecting prompt injections in AI systems. The model classifies input prompts into two categories: benign and jailbreak. This approach aims to enhance the safety and robustness of AI applications.
- Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
view the release of any fine-tunable model as simultaneously releasing its evil twin: equally capable as the original model, and usable for any malicious purpose within its capabilities.