Strong claim from a recent article on using AI to evaluate teaching: "AI evaluation removes human bias in teaching assessment" sciencedirect.com/science/arti
Somewhat ironically, I asked AI to evaluate this claim. Gemini 2.5 called it an overstatement, but said AI may serve as a "complementary component within a comprehensive, multi-method evaluation framework." docs.google.com/document/d/1V0
ChatGPT 4.1 called the claim "misleading in its absoluteness" drive.google.com/file/d/1p_69v

@dougholton It's interesting that they only compared AI evaluation to student evaluation, concluding "Our analysis shows that AI-based assessments strongly correlate with student perceptions, validating their role as an effective complementary evaluation tool."

How does that validate it as a complementary tool? The challenge is choosing what standard to compare against. Ideally, you would compare it to actual teaching effectiveness, but do we have that? Do we even know how to measure teaching effectiveness?
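A toy simulation (mine, not from the article; the variable names and effect sizes are made-up assumptions) shows why correlation with student evaluations is a weak validation standard: if both the AI scores and the student ratings mostly track a shared surface factor, they can agree strongly with each other while neither tracks true effectiveness.

```python
# Hypothetical sketch: two measures can correlate strongly with each other
# while both track the criterion we care about only weakly.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# True teaching effectiveness: the unobserved criterion we'd like to validate against.
effectiveness = rng.normal(size=n)

# A shared surface factor (e.g., likability, presentation polish) that is
# only weakly related to true effectiveness. The 0.2 weight is an assumption.
surface = 0.2 * effectiveness + rng.normal(size=n)

# Suppose both student evaluations and AI scores lean heavily on that surface factor.
student_eval = 0.9 * surface + 0.3 * rng.normal(size=n)
ai_score = 0.9 * surface + 0.3 * rng.normal(size=n)

print("corr(AI, student evals): %.2f" % np.corrcoef(ai_score, student_eval)[0, 1])
print("corr(AI, effectiveness): %.2f" % np.corrcoef(ai_score, effectiveness)[0, 1])
# Typical output: the first correlation is ~0.9, the second ~0.2, so strong
# agreement between the two measures says little about validity.
```

This is only an illustration of the criterion problem, not a claim about what the article's data actually look like.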