Common flaws in running human evaluation experiments in NLP. C Thomson, E Reiter, A Belz. Computational Linguistics 50 (2), 795-805, 2024.