AI Idea Bench 2025

AI Research Idea Generation Benchmark

Benchmark

We construct the AI Idea Bench 2025 dataset, comprising 3,495 influential target papers from AI-related conferences along with their corresponding motivating papers, to systematically evaluate the effectiveness of idea generation methods.

Evaluation Framework

We propose an evaluation framework that aligns generated research ideas with the content of ground-truth papers, while simultaneously assessing their merits and drawbacks based on other reference material.

Comprehensive Experiments

We conducted comprehensive experiments, using our dataset and evaluation framework, to show how effectively various idea generation methods produce innovative research ideas in the AI domain.

Evaluation Framework

We introduce a novel evaluation framework for assessing the quality and relevance of AI research ideas based on historical patterns of scientific development.

Figure 2: Overall pipeline of AI Idea Bench 2025. First, we decompose and summarize the motivation, experimental steps, topic, and the inspiration papers from the target paper. Then, we extract the motivation and experimental steps from the inspiration papers, and generate a cluster of ideas in combination with the topic of the target paper. Finally, we compare the idea-generation methods in six evaluations: idea multiple-choice evaluation, idea-to-idea matching, idea-to-topic matching, idea competition among baselines, novelty assessment, and feasibility assessment.
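
As a rough illustration of the pipeline in Figure 2, the fields extracted from each target paper and the six evaluations could be organized as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `TargetPaperRecord`, `InspirationPaper`, `EVALUATIONS`, and `multiple_choice_accuracy` are hypothetical, and scoring the multiple-choice evaluation as plain accuracy is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# The six evaluations named in the Figure 2 caption.
EVALUATIONS = (
    "idea multiple-choice evaluation",
    "idea-to-idea matching",
    "idea-to-topic matching",
    "idea competition among baselines",
    "novelty assessment",
    "feasibility assessment",
)


@dataclass
class InspirationPaper:
    """Hypothetical container for what is extracted from an inspiration paper."""
    title: str
    motivation: str
    experimental_steps: List[str] = field(default_factory=list)


@dataclass
class TargetPaperRecord:
    """Hypothetical container for what is extracted from a target paper."""
    topic: str
    motivation: str
    experimental_steps: List[str]
    inspiration_papers: List[InspirationPaper] = field(default_factory=list)


def multiple_choice_accuracy(predicted: List[int], correct: List[int]) -> float:
    """Fraction of questions where the chosen option matches the correct one.

    Assumption: the multiple-choice evaluation is scored as plain accuracy.
    """
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        return 0.0
    hits = sum(p == c for p, c in zip(predicted, correct))
    return hits / len(correct)
```

Keeping the target-paper fields and the inspiration-paper fields in separate records mirrors the two extraction stages in the caption: the target paper supplies the topic and ground truth, while the inspiration papers supply the input to idea generation.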

BibTeX

@article{qiu2025ai,
    title={AI Idea Bench 2025: AI Research Idea Generation Benchmark},
    author={Qiu, Yansheng and Zhang, Haoquan and Xu, Zhaopan and Li, Ming and Song, Diping and Wang, Zheng and Zhang, Kaipeng},
    journal={arXiv preprint arXiv:2504.14191},
    year={2025}
    }
