the reaction to the arXiv slop penalty is the gift
"So this means you expect every author to check every citation and make sure that every citation is real and accurate" GOOD LORD WHAT ARE YOU EVEN DOING IN THERE THEN
@davidgerard And I don't think these are even necessarily sloppers!
Academia has had a citation issue for a long time. People include citations without verifying them because they're just copying a citation from another paper.
Of course, if this policy also helps fix *that* problem, that's great!
People should be grateful. The policy affects the whole field equally, so everyone can slow down and do the citations right without worrying about losing competitive edge.
@varx these individuals do turn out to be sloppers tho
@davidgerard Haha, amazing.
@davidgerard As a student working on my dissertation (but procrastinating on Mastodon instead): are these people even real?
@kraks i hope these guys thoroughly cure you of any impostor syndrome
@davidgerard They're trying to shut arXiv down. Pump in slop, discredit, destroy. Bye-bye info.
@davidgerard so, out of interest, what if one of the citations is a genuine paper, but *that* paper's citations are slop?
@fishidwardrobe you're asking me? I dunno, write to arXiv and ask them?
@davidgerard i'm just speculating without any expectation of an answer from anyone. sorry if that wasn't clear.
@fishidwardrobe sorry 😄 i just had one guy yesterday furiously demand that i explain precisely how arxiv would deal with his hypothetical edge case and he went off when I said "i dunno, ask them?" and demanded MY answer like, the fuck?
@davidgerard how can you cite something you can’t read or understand?
@davidgerard Judging by the article in the link, it looks like someone is getting himself into a pickle. If what I am quoting is in a foreign language, I just translate it. I don't have the nerve to cite something that I have not read, and it kinda sends a shiver down my spine that someone has the audacity to put in writing some ridiculous arguments to justify the practice.
@davidgerard Back when I was in research, we just copied the citations from other papers 😜
But on a more serious note, I have discovered several assumptions that have been repeated from a single paper without anyone checking what it actually says... some going back as far as the 1800s.
@olivia @davidgerard Adding to the list of obfuscations of women in science: the very first known chemist was a woman from Mesopotamia. But she created a recipe for perfume, so that makes it "not real chemistry", I guess.
@AimeeMaroux @davidgerard you might find this analysis useful too especially given AI context
@AimeeMaroux @davidgerard Copying citations from other papers is definitely rampant, and yes, that is the problem: often they don't say the things people attribute to them. I've learned a ton by just making sure I've read everything I've cited!
@AimeeMaroux @davidgerard This Jim Miller guy is a tenured economist at Smith with a PhD from Chicago?
@grvsmth @AimeeMaroux these people are the cure for impostor syndrome
@davidgerard @AimeeMaroux This may not be an earth-shaking revelation, but reading Student's original paper and explanations of the context he was working in helped me to understand why we test for statistical significance - it's all about saving us the labor of collecting larger samples!
https://grieve-smith.com/blog/2014/01/how-big-a-sample-do-you-need/
@davidgerard What can The Onion do with this? They seem to be literally objecting to the requirement of reading.
@davidgerard "collective punishment is not the answer" how is someone individually being banned for submitting a dubious paper "collective punishment"?
@rnd COLLECTIVE PUNISHMENT of the FIELD of MACHINE LEARNING (which lets you slop all over the damn place)
@davidgerard You mean we have to do the shit we should've already been doing?
@davidgerard As a teacher, researcher, and editor, the outrage at the idea that an author needs to—as a bare fucking minimum—actually know that what they are citing EXISTS makes me want to walk into the sea. Then again, what doesn't make me want to walk into the sea these days?
@davidgerard
I've been terrified of academic publishing for my entire career, thinking I'm not good enough. But it turns out the expectations are in the toilet. What a fucking waste of a decade.
@Rhodium103 luca ambrogioni: the cure for academic imposter syndrome
@davidgerard Seems I have to change the "Science works" in my profile to "Science used to work".
@davidgerard Ha, this is like the nonsense about producing X thousand lines of code to make a claim for quality.
Neither "AI", its makers, nor the troglodytes celebrating it understand the point of well considered research and output. Whether that be in code or academic writing.
@davidgerard when you ask llm boosters to mean things instead of just say them
@davidgerard You'd think people wouldn't be so upset. I thought the entire idea of a citation was, "I read this thing and am using their ideas in furtherance of my own." They aren't there as an appeal to authority. They actually serve a purpose.
@trashpanda look you'll be left behind with that sort of thinking
@davidgerard …wat? I mean, yes? I do expect paper authors to check all the references in their papers or trust sufficiently that the other authors will do so? Why would you coauthor a paper with someone you don’t trust to do rigorous science?
@wordshaper @davidgerard Former academic here (last saw the inside of a lab in 1999).
The arXiv ban on LLM-generated citations is not even as restrictive as some people are thinking.
The arXiv ban is on *hallucinated* citations. Ones that an LLM made up and which don't exist anywhere. It isn't based on whether the citation is relevant to the submission, or on whether every author in the author list read the cited work and understood it. It's only for things that can be checked fairly automatically...
...as you, a prospective or actual scientist, should be doing *anyway* as part of writing the fscking paper. Even if you don't understand the subfield that led your co-author to include Citation X, you can -- and should -- still either ask said co-author to show you the reference, or do a quick literature search to verify it exists.
@dpnash @wordshaper @davidgerard
I have been in situations where I contributed early and ended up as a co-author a significant time later after having to withdraw due to sustained illness, without further input into the final paper. My contribution was recognized, but I was not involved in the final draft. It's not reasonable for me to ask my co-authors to throw away their work or delay publication indefinitely because I can't review the citations list. I trust the people I worked with (after all, they agreed that I should continue to receive credit for my contribution). But things change, and trust can be misplaced.
It's not always reasonable to assume every person involved in writing a paper is going to have an equal opportunity to check it before submission.
There should be a process to let people fix mistakes before a blanket ban like this is applied. There should be recognition that different people contribute different things to a paper, and that the world is more complex than this simple rule assumes. There should be an appeals process.
And if the person reporting it is an author, the penalty applies to them as well. The incentive structure this establishes is entirely counterproductive.
@Robotistry @dpnash @wordshaper you're presently theorycrafting in a vacuum based on a headline. I urge you to write to the arXiv and ask them, rather than continue posting ex culo in my mentions.