Update. In response to this problem (previous post, this thread), some publishers are desk-rejecting papers that rely on open health datasets. The problem is not the quality of the data but the absence of additional work to validate the findings.
Two reports:
1. "Journals and publishers crack down on research from open health data sets," Science, Oct 8, 2025.
https://www.science.org/content/article/journals-and-publishers-crack-down-research-open-health-data-sets
2. "AI: Journals are automatically rejecting public health dataset papers to combat paper mills," BMJ, Oct 15, 2025.
https://www.bmj.com/content/391/bmj.r2170
(#paywalled)
Update. Here's how #arXiv is dealing with a similar problem in computer science.
https://blog.arxiv.org/2025/10/31/attention-authors-updated-practice-for-review-articles-and-position-papers-in-arxiv-cs-category/
"Before being considered for submission to arXiv’s #CS category, review articles and position papers must now be accepted at a journal or a conference and complete successful peer review…In the past few years, arXiv has been flooded with papers. Generative #AI / #LLMs have added to this flood by making papers – especially papers not introducing new research results – fast and easy to write. While categories across arXiv have all seen a major increase in submissions, it’s particularly pronounced in arXiv’s CS category."