Open-ended model evaluation & benchmarking
- Prabhu, Ameya, et al. “Efficient Lifelong Model Evaluation in an Era of Rapid Progress.” NeurIPS 2024.
- Ghosh, Adhiraj, et al. “Democratizing Evaluation with Infinity-Benchmarks: Sample-Level Heterogeneous Testing Over Arbitrary Capabilities.” arXiv.
- Kümmerer, Matthias, et al. “Saliency Beyond Datasets: Overcoming Dataset Biases.” (Coming Soon)
- Udandarao, Vishaal, et al. “No ‘zero-shot’ without exponential data: Pretraining concept frequency determines multimodal model performance.” NeurIPS 2024.
- Berens, Philipp, et al. “Community-based benchmarking improves spike rate inference from two-photon calcium imaging data.” PLoS Computational Biology 14.5 (2018): e1006157.