We construct the AI Idea Bench 2025 dataset, comprising 3,495 influential target papers in AI-related conferences along with their corresponding motivating papers, to systematically evaluate the effectiveness of idea generation methods.
We propose an evaluation framework that aligns generated research ideas with the content of ground-truth papers, while simultaneously assessing their merits and drawbacks based on other reference material.
We conduct comprehensive experiments, leveraging our dataset and evaluation framework, to showcase the effectiveness of various idea generation methods in producing innovative research ideas in the AI domain.
Yansheng Qiu1, Haoquan Zhang2, Zhaopan Xu3, Ming Li2, Diping Song2, Zheng Wang1*, Kaipeng Zhang2*
* Equal corresponding authors
1 Wuhan University
2 Shanghai Artificial Intelligence Laboratory
3 Harbin Institute of Technology
April 18th, 2025
We introduce a novel evaluation framework for assessing the quality and relevance of AI research ideas based on historical patterns of scientific development.
Figure 2: Overall pipeline of AI Idea Bench 2025. First, we decompose and summarize the motivation, experimental steps, topic, and the inspiration papers from the target paper. Then, we extract the motivation and experimental steps from the inspiration papers, and generate a cluster of ideas in combination with the topic of the target paper. Finally, we compare the idea-generation methods in six evaluations: idea multiple-choice evaluation, idea-to-idea matching, idea-to-topic matching, idea competition among baselines, novelty assessment, and feasibility assessment.
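The pipeline in the caption can be sketched as a small program. This is a minimal, hypothetical illustration of the described flow (decompose the target paper, extract from inspiration papers, generate an idea cluster, then run the six evaluations); all function and field names are assumptions for illustration, not the benchmark's actual API.

```python
# Hypothetical sketch of the AI Idea Bench 2025 pipeline described in the
# caption above. Names and data here are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class TargetPaper:
    topic: str                  # topic summarized from the target paper
    motivation: str             # motivation decomposed from the target paper
    experimental_steps: list    # experimental steps of the target paper
    inspiration_ids: list       # IDs of the inspiration papers

def generate_idea_cluster(topic, inspirations):
    """Combine motivations and experimental steps extracted from the
    inspiration papers with the target paper's topic to form ideas."""
    return [f"{topic}: build on '{motivation}' via {steps}"
            for (motivation, steps) in inspirations]

# The six evaluations named in the caption, as placeholder scorers.
EVALUATIONS = [
    "idea_multiple_choice",
    "idea_to_idea_matching",
    "idea_to_topic_matching",
    "idea_competition",
    "novelty",
    "feasibility",
]

def evaluate(ideas):
    # Each evaluation would score the generated cluster against the
    # ground-truth paper; here we only record how many ideas each saw.
    return {name: len(ideas) for name in EVALUATIONS}

target = TargetPaper(
    topic="multimodal reasoning",
    motivation="bridge vision and language understanding",
    experimental_steps=["pretrain", "finetune", "evaluate"],
    inspiration_ids=["P1", "P2"],
)
inspirations = [("scaling laws", "larger pretraining"),
                ("instruction tuning", "task-specific finetuning")]

ideas = generate_idea_cluster(target.topic, inspirations)
scores = evaluate(ideas)
print(len(ideas), len(scores))  # prints "2 6": 2 ideas, 6 evaluation scores
```

The sketch only mirrors the caption's three stages; the actual benchmark presumably performs each step with an LLM rather than string templates.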
@article{qiu2025ai,
  title={AI Idea Bench 2025: AI Research Idea Generation Benchmark},
  author={Qiu, Yansheng and Zhang, Haoquan and Xu, Zhaopan and Li, Ming and Song, Diping and Wang, Zheng and Zhang, Kaipeng},
  journal={arXiv preprint arXiv:2504.14191},
  year={2025}
}