Dan Goodman
@neuralreckoning@neuromatch.social · 2 months ago
@tomstafford

OK, time for a Mastodon peer review of this preprint!

It's interesting to see this paper lay out its assumptions so clearly. For me, the paper starts from a big one that I think is fundamentally wrong: that proposals have an intrinsic merit which reviewers assess with some error. The accuracy measure in the paper is essentially the probability that the proposals ranked as fundable according to the noisy reviews match those that would be fundable according to the intrinsic merit. This feels to me like an argument for the current system based on the assumption that the current system is fundamentally correct.
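
To make that concrete, here's a rough simulation of the kind of setup I mean. All the specifics are my own invented assumptions, not the paper's: Gaussian merit, Gaussian reviewer noise, fund the top k by mean score, and measure how much the funded set overlaps with the truly top-k set.

```python
import numpy as np

rng = np.random.default_rng(0)

def funding_accuracy(n_proposals=100, n_funded=20, n_reviewers=3,
                     noise_sd=1.0, n_trials=2000):
    """Average overlap between the set funded on noisy mean scores and
    the set that would be funded on 'intrinsic merit' alone."""
    overlap = 0.0
    for _ in range(n_trials):
        merit = rng.normal(size=n_proposals)          # assumed intrinsic merit
        scores = merit + rng.normal(scale=noise_sd,
                                    size=(n_reviewers, n_proposals))
        true_top = set(np.argsort(merit)[-n_funded:])
        funded = set(np.argsort(scores.mean(axis=0))[-n_funded:])
        overlap += len(funded & true_top) / n_funded
    return overlap / n_trials

print(funding_accuracy(noise_sd=0.5))   # low reviewer noise
print(funding_accuracy(noise_sd=2.0))   # high reviewer noise
```

The catch, of course, is that "accuracy" in this sense only means something if the merit column in the simulation corresponds to anything real.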

Let's say the goal is a portfolio of funding decisions that maximises the chance of a set of important discoveries. For proposals to have an intrinsic, linearly ordered merit, it would have to be possible to predict, from a specification of a set of experiments but without actually carrying them out, the probability of making an important discovery. But this can't be true, or there would be no reason to actually do the experiments.

There could be some 'signal' in the sense that some proposals might be based on clear factual errors that the authors were not aware of. This would suggest a threshold funding model: as long as proposals meet a minimum threshold score from reviewers assessing methodological soundness, they are given an equal chance of funding. In this model, once you're over the threshold, the reviewer score is ONLY bias, because there is no signal above the threshold. This model would likely lead to very different conclusions than the model in this paper. The authors describe it in a footnote as being "too nihilistic" to consider.
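
For comparison, here's roughly what I mean by that model, again in my own toy formulation: the only real signal is whether a proposal is methodologically sound, and above the threshold funding is a lottery.

```python
import numpy as np

rng = np.random.default_rng(1)

def threshold_lottery(n_proposals=100, n_funded=20, n_reviewers=3,
                      noise_sd=1.0, threshold=0.5):
    """Toy threshold model: the only 'signal' is methodological soundness.
    Proposals whose mean soundness score clears the threshold enter a
    lottery; within that pool, ranking by score would just rank by noise."""
    sound = (rng.random(n_proposals) > 0.2).astype(float)   # assumed: ~80% sound
    scores = sound + rng.normal(scale=noise_sd,
                                size=(n_reviewers, n_proposals))
    eligible = np.where(scores.mean(axis=0) > threshold)[0]
    return rng.choice(eligible, size=min(n_funded, len(eligible)),
                      replace=False)

print(threshold_lottery())
```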

Another alternative 'signal' would be the probability of producing results that the scientific community would find interesting. There is likely a valid case that review scores measure this, to some extent. However, I'm not sure that a decision process based on ranking these probabilities would lead to good outcomes, even if there were no noise. Take an extreme case where you have enough funding for 5 proposals, and you get 5 identical proposals that if successful would make (the same) discovery that 50% of the field would be interested in, and 5 different proposals each of which would make a discovery that 10% of the field would be interested in. Do you want to fund 5 copies of the same proposal, or would you instead maybe fund 2 copies of the same proposal (in case one fails for some reason) and 3 different proposals? If you were funding based on intrinsic merit rank, you'd fund 5 copies of the same proposal.
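
Just to put rough numbers on that example: assume (my assumption, not anything in the paper) that each funded project succeeds independently with probability p, and score a portfolio by the expected fraction of the field that gets a discovery it cares about.

```python
def coverage_five_identical(p, interest=0.5, copies=5):
    # one discovery, interesting to 50% of the field, made if any copy succeeds
    return interest * (1 - (1 - p) ** copies)

def coverage_mixed(p, interest_big=0.5, copies=2, n_small=3, interest_small=0.1):
    # 2 copies of the big proposal plus 3 independent smaller ones
    return interest_big * (1 - (1 - p) ** copies) + n_small * interest_small * p

for p in (0.2, 0.5, 0.8):
    print(p, round(coverage_five_identical(p), 3), round(coverage_mixed(p), 3))
```

With these made-up numbers, the mixed portfolio comes out ahead once success is reasonably likely, which a pure merit ranking can't capture at all.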

I personally think it's likely that this is actually what funding bodies are doing.

Going back to the question of bias, there's an interesting interaction here. If the intrinsic 'merit' you want to reward is based on what the field would find interesting, then 'bias' is no longer statistically independent of 'merit'. In other words, by giving a high merit to things that more people think are interesting, you are inherently saying that if minority groups are interested in something else, it's because what they are interested in is objectively worse. That builds a biased assumption into the model.

A very interesting suggestion that has been made (and indeed I think tried in some places) is to rank proposals at least partly based on reviewer variance rather than mean. In other words, the assumption is that if a bunch of reviewers disagree violently about a proposal, it's likely to be more important than if they all agree. I suspect this approach is also too simplistic, but it seems likely to me that there's an element of this that is right. The assumptions and modelling framework of this paper rule out the possibility of this being a good procedure. This is another indication to me that the proposed model is not a good one.
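
As a toy version of that idea (my own formulation, not what any funder actually does), you could rank by a blend of mean score and reviewer disagreement:

```python
import numpy as np

def rank_proposals(scores, disagreement_weight=0.5):
    """Rank proposals by a blend of mean reviewer score and reviewer
    disagreement (standard deviation). A weight of 0 recovers the usual
    mean-score ranking; larger weights reward divisive proposals."""
    scores = np.asarray(scores, dtype=float)   # shape: (n_reviewers, n_proposals)
    blended = ((1 - disagreement_weight) * scores.mean(axis=0)
               + disagreement_weight * scores.std(axis=0))
    return np.argsort(blended)[::-1]           # best first

# columns: consensus-good, consensus-middling, divisive
print(rank_proposals([[5, 3, 1],
                      [5, 3, 5],
                      [5, 3, 1]]))   # the divisive proposal beats the middling one
```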

So, I'm happy this paper is out there making these assumptions explicit, but I think those assumptions are very problematic, likely wrong, and I hope that UKRI do not base their policies on its conclusions.

#metascience
