[2404.01475] Are large language models superhuman chemists? Large language models (LLMs) have gained widespread interest due to their ability to process human language and perform tasks on which they have not been explicitly trained. However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here, we introduce "ChemBench," an automated …
AI4Chem ChemBench4K · Datasets at Hugging Face: ChemBench4K is a large-scale chemistry competency evaluation benchmark for language models, which includes nine core chemistry tasks and 4,100 high-quality single-choice questions and answers.
ChemBench: ChemBench is a cutting-edge framework to evaluate the chemical knowledge and reasoning capabilities of large language models (LLMs). While LLMs excel in general domains, their chemistry expertise remains underexplored. ChemBench fills this gap with 2,700+ curated question-answer pairs across diverse chemistry topics, plus advanced features such as visual LLM support, batched inference, and refusal detection.
How-To Guides - ChemBench: ChemBench enables benchmarking across a wide range of API-based models. To use these models, you'll need to configure the appropriate API keys. ChemBench uses LiteLLM; read the LiteLLM guidelines for naming the models (in general, the format followed is provider/model-name, for example openai/gpt-4o).
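As a concrete sketch of the conventions above (assuming the LiteLLM provider/model-name format and that keys are read from the usual provider environment variables; the key value shown is a placeholder, not a real credential):

```python
import os

# API keys are typically supplied via environment variables (variable name
# assumed per common LiteLLM/OpenAI convention; check the LiteLLM docs
# for your provider). The value below is a placeholder only.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")

# LiteLLM model identifiers follow "provider/model-name":
model = "openai/gpt-4o"
provider, model_name = model.split("/", 1)
print(provider, model_name)  # prints "openai gpt-4o"
```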
[ChemBench] A New Benchmark In Chemistry! - AI-SCHOLAR: This paper proposes a new benchmarking framework, ChemBench, which reveals the limitations of current state-of-the-art models. ChemBench consists of 7,059 question-answer pairs collected from a variety of sources, covering the majority of undergraduate and graduate chemistry curricula.
chembench · PyPI: ChemBench was developed as a comprehensive benchmarking suite for the performance of LLMs in chemistry (see our paper) but has since been extended to support multimodal models as well (see our paper). ChemBench is designed to be modular and extensible, allowing users to easily add new datasets, models, and evaluation metrics.
Evaluating chemical reasoning capabilities of LLMs with ChemBench: A benchmark built for chemical depth. Existing datasets typically focus on narrow tasks or general knowledge checks. ChemBench stands apart by curating over 2,700 question-answer pairs spanning analytical, inorganic, organic, and physical chemistry. Each question is tagged by difficulty (undergraduate to graduate level) and skill type (knowledge, reasoning, calculation, or intuition) …
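The tagging scheme described above can be pictured as a small data structure. The field names and the example item below are illustrative assumptions, not ChemBench's actual schema:

```python
from dataclasses import dataclass

# Hypothetical representation of a tagged benchmark item; field names
# are assumptions for illustration, not ChemBench's real data model.
@dataclass
class BenchmarkQuestion:
    question: str
    answer: str
    topic: str       # e.g. "analytical", "inorganic", "organic", "physical"
    difficulty: str  # "undergraduate" or "graduate"
    skill: str       # "knowledge", "reasoning", "calculation", or "intuition"

# Example item (content invented for illustration)
q = BenchmarkQuestion(
    question="What is the hybridization of the carbon atoms in ethyne?",
    answer="sp",
    topic="organic",
    difficulty="undergraduate",
    skill="knowledge",
)
print(q.skill)  # prints "knowledge"
```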
ChemBench - lamalab-org.github.io: ChemBench is a Python package for building and running benchmarks of large language models and multimodal models such as vision-language models. ChemBench was developed as a comprehensive benchmarking suite for the performance of LLMs in chemistry (see our paper) but has since been extended to support multimodal models as well (see our paper).
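As one way to picture the modular, extensible design described above (pluggable datasets, models, and metrics), here is a minimal plugin-registry sketch; `register_metric` and `METRICS` are hypothetical names invented for illustration, not ChemBench's actual API:

```python
from typing import Callable, Dict

# Hypothetical registry mapping metric names to scoring functions.
METRICS: Dict[str, Callable[[str, str], float]] = {}

def register_metric(name: str):
    """Decorator that registers a scoring function under a given name,
    so new evaluation metrics can be plugged in without touching core code."""
    def decorator(fn: Callable[[str, str], float]):
        METRICS[name] = fn
        return fn
    return decorator

@register_metric("exact_match")
def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 if the stripped prediction equals the reference, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0

score = METRICS["exact_match"]("sp", "sp")
print(score)  # prints 1.0
```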