- CLEVER: A Curated Benchmark for Formally Verified Code Generation
TL;DR: We introduce CLEVER, a hand-curated benchmark for verified code generation in Lean. It requires full formal specs and proofs. No few-shot method solves all stages, making it a strong testbed for synthesis and formal reasoning.
- Clever: A Curated Benchmark for Formally Verified Code Generation
CLEVER: Curated Lean Verified Code Generation Benchmark. … microkernel. However, writing formal specifications and correctness proofs for software systems can take tremendous effort; for example, the development of seL4 was reported to take 20+ person-years. These costs are a key impediment to the broad deployment of ITP-based formal …
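To make concrete what a "full formal spec and proof" involves, here is a minimal Lean 4 sketch. The task and the names `myMax`, `MaxSpec`, and `myMax_correct` are hypothetical illustrations, not problems or formats taken from CLEVER; the sketch only shows the three pieces such a task pairs together: an implementation, a specification stated as a proposition, and a theorem proving the implementation meets it.

```lean
-- Hypothetical task (not from CLEVER): return the larger of two naturals.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- Specification: the result bounds both inputs and equals one of them.
def MaxSpec (a b r : Nat) : Prop :=
  a ≤ r ∧ b ≤ r ∧ (r = a ∨ r = b)

-- Correctness proof: the implementation satisfies the specification.
theorem myMax_correct (a b : Nat) : MaxSpec a b (myMax a b) := by
  unfold MaxSpec myMax
  by_cases h : a ≤ b
  · rw [if_pos h]
    exact ⟨h, Nat.le_refl b, Or.inr rfl⟩
  · rw [if_neg h]
    exact ⟨Nat.le_refl a, by omega, Or.inl rfl⟩
```

Even for a one-line function, the specification and proof are longer than the code itself, which hints at why full verification of real systems is so costly.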
- Submissions - OpenReview
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers. Lorenzo Pacchiardi, Marko Tesic, Lucy G Cheke, Jose Hernandez-Orallo. 27 Sept 2024 (modified: 05 Feb 2025)
- Counterfactual Debiasing for Fact Verification - OpenReview
… namely CLEVER, which is augmentation-free and mitigates biases at the inference stage. Specifically, we train a claim-evidence fusion model and a claim-only model independently. Then, we obtain the final prediction by subtracting the output of the claim-only model from the output of the claim-evidence fusion model, …
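The inference-time subtraction described above can be sketched as follows. This is a minimal illustration, assuming both models emit per-label logits over a shared label set; the snippet does not say whether raw logits or probabilities are subtracted, and the function names, labels, and numbers here are hypothetical.

```python
from typing import List, Sequence, Tuple


def counterfactual_debias(
    fusion_logits: Sequence[float], claim_only_logits: Sequence[float]
) -> List[float]:
    """Subtract the claim-only model's output from the fusion model's output, per label."""
    if len(fusion_logits) != len(claim_only_logits):
        raise ValueError("both models must score the same label set")
    return [f - c for f, c in zip(fusion_logits, claim_only_logits)]


def predict(
    labels: Sequence[str],
    fusion_logits: Sequence[float],
    claim_only_logits: Sequence[float],
) -> Tuple[str, List[float]]:
    """Return the label with the highest debiased score, plus all debiased scores."""
    debiased = counterfactual_debias(fusion_logits, claim_only_logits)
    best = max(range(len(labels)), key=lambda i: debiased[i])
    return labels[best], debiased


# Hypothetical scores for one claim (illustrative numbers only).
labels = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]
fusion = [2.0, 1.5, 0.1]      # claim-evidence fusion model
claim_only = [1.8, 0.2, 0.0]  # claim-only model: captures claim-side bias
label, scores = predict(labels, fusion, claim_only)
print(label)  # REFUTES
```

Here the fusion model alone would pick SUPPORTS, but most of that score is already predicted from the claim text alone; subtracting the claim-only output removes that bias and shifts the prediction to REFUTES.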
- STAIR: Improving Safety Alignment with Introspective Reasoning
This is where safety alignment comes in. One common approach is training models to refuse unsafe queries, but this strategy can be vulnerable to clever prompts, often referred to as jailbreak attacks, which can trick the AI into providing harmful responses.
- Super Deep Contrastive Information Bottleneck for Multi . . . - OpenReview
Super Deep Contrastive Information Bottleneck for Multi-modal Clustering. Zhengzheng Lou, Ke Zhang, Yucong Wu, Shizhe Hu
- Multimodal Composition Example Mining for Composed Query Image . . .
The composed query image retrieval task aims to retrieve a target image from a database using a query that combines two modalities: a reference image and a sentence stating which details of the reference image should be modified and replaced with new elements.
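As a toy illustration of the task setup (not of the example-mining method in this paper), the sketch below composes a query from a reference-image embedding and a modification-text embedding and ranks database images by cosine similarity. The additive fusion is an assumed simple baseline, and the 3-dimensional embeddings and image names are made up; real systems use learned encoders and learned fusion.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(ref_image_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Assumed baseline fusion: add the two embeddings, then normalize."""
    return l2_normalize([a + b for a, b in zip(ref_image_emb, text_emb)])


def retrieve(
    ref_image_emb: Sequence[float],
    text_emb: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[str]:
    """Rank database images by cosine similarity to the composed query."""
    query = compose_query(ref_image_emb, text_emb)
    scores = {
        name: sum(q * e for q, e in zip(query, l2_normalize(emb)))
        for name, emb in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy embedding axes (hypothetical): [dress, red, blue]
database = {
    "red_dress": [1.0, 1.0, 0.0],
    "blue_dress": [1.0, 0.0, 1.0],
    "blue_shirt": [0.0, 0.0, 1.0],
}
reference = [1.0, 1.0, 0.0]   # reference image: a red dress
edit_text = [0.0, -1.0, 1.0]  # "make it blue": remove red, add blue
print(retrieve(reference, edit_text, database, k=2))  # ['blue_dress', 'blue_shirt']
```

The composed query keeps the "dress" content of the reference image while the text swaps the color, so the blue dress outranks both the original red dress and the blue shirt.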