#Tag · bonfire.cafe

Can we trust general-purpose LLMs for rigorous academic work? According to @SarahOberbichler from the #DHlab_IEG, research integrity demands specialized AI models, not general-purpose LLMs, when AI is used as an analysis tool.

Join her upcoming talk "Argument Mining in News Media: Tailoring Models and Methods for Responsible Application" on Nov 26th at CESR (University of Tours) or online
➡️ https://prima.hypotheses.org/3181

#DigitalHumanities #NLP #ArgumentMining #ResearchIntegrity #AI #LLMs #Histodons #DH

Infoflyer for the announced event. Left: Pictures of Newspaper articles. Right: Text with announcement. The text reads: "Humanites Numeriques Tourangelles, Mercredi 26 novembre 2025, 17h-19h - Tours, CESR - Salle Rapin et en visioconférence. #9 Argument Mining in News Media: Tailoring Models and Methods for Responsible Application, Sarah Oberbichler - Digital Humanities Lab, Leibniz Institute of European History."

Andreas Wagner boosted

DH Lab IEG

@DHLab_IEG@fedihum.org · 2 weeks ago

Doctoral candidates, take note: Are you unsure whether topic modeling is the right choice for your computational analyses? Or would network analysis be the better option (or not)? And what alternatives are there? On 17 November (4:00-5:30 pm), you will have the opportunity to discuss these questions with our colleague Cindarella Petz ( @cprog7) at the HERMES network meeting (online).

Info / registration: https://hermes-hub.de/aktuelles/events/netzwerktreffen-2025-11-17.html

#DHLab_IEG #HERMES #DigitalHumanities #NLP #TextMining #NetzwerkAnalyse #HNR

DH Lab IEG

@DHLab_IEG@fedihum.org · 5 days ago

#DigitalHumanities #NLP #ArgumentMining #ResearchIntegrity #AI #LLMs #Histodons #DH

Hacker News

@h4ckernews@mastodon.social · last week

Tiny Diffusion – A character-level text diffusion model from scratch

https://github.com/nathan-barry/tiny-diffusion

#HackerNews #TinyDiffusion #TextModel #DiffusionAI #MachineLearning #NLP

Hacker News

@h4ckernews@mastodon.social · 3 weeks ago

Word2vec-style vector arithmetic on docs embeddings

https://technicalwriting.dev/embeddings/arithmetic/index.html

#HackerNews #Word2vec-style #vector #arithmetic #on #docs #embeddings #Word2vec #vectorarithmetic #docsembeddings #NLP #MachineLearning

UKP Lab

@UKPLab@sigmoid.social · last month

🔗 Learn more:
• Official website → https://sbert.net/
• Original paper → https://aclanthology.org/D19-1410.pdf
• GitHub repository → https://github.com/UKPLab/sentence-transformers

📰 Read the full announcements:
TU Darmstadt Press Release
→ https://www.tu-darmstadt.de/universitaet/aktuelles_meldungen/einzelansicht_528832.de.jsp

Hugging Face Blog Post
→ https://huggingface.co/blog/sentence-transformers-joins-hf

(2/2)

#UKPLab #HuggingFace #SentenceTransformers #NLP #AIresearch #OpenSource 🚀

Sentence Transformers is joining Hugging Face!

TU Darmstadt

How AI understands whole sentences

The Ubiquitous Knowledge Processing (UKP) Lab at the Technical University of Darmstadt has officially transferred the maintenance and further development of Sentence Transformers, one of the world's most widely used open-source libraries for semantic embeddings, to the Hugging Face platform. The open-source software emerged from research at the UKP Lab in 2019 and has since established itself as one of the most important resources for artificial intelligence (AI) in natural language processing (NLP). It makes it possible to represent whole sentences in such a way that computers can capture and compare their meaning.

GitHub

GitHub - huggingface/sentence-transformers: State-of-the-Art Text Embeddings

Kathy Reid

@KathyReid@aus.social · 2 months ago

🚨 #NLP SHARED TASK 🚨

Use Mozilla Common Voice Spontaneous #Speech datasets to train #ASR #SpeechRecognition models that work for conversational speech on 21 under-represented languages.

📆 Dataset release 1 Dec
📆 Submissions 8 Dec
💰 $USD 11k prize pool !!!

Boosts appreciated ❤️

https://community.mozilladatacollective.com/shared-task-mozilla-common-voice-spontaneous-speech-asr?utm_source=mastodon&utm_campaign=kathysharedtask

Mozilla Data Collective

Shared Task: Mozilla Common Voice Spontaneous Speech ASR

Overview: Automatic speech recognition (ASR) has come a long way – but most systems are still trained on polished, read-aloud speech. So we set out to build a model that can handle the messy, beautiful reality of spontaneous responses and languages long ignored by mainstream tech. We’re raising the standards

Ulrike Hahn boosted

UKP Lab

@UKPLab@sigmoid.social · 4 months ago

🤔 What is #NLP research 𝘳𝘦𝘢𝘭𝘭𝘺 about?
We analyzed 29k+ papers to find out! 📚🔍

📌 Our NLPContributions dataset, from the ACL Anthology, reveals what authors actually contribute—artifacts, insights, and more.

📈 Trends show a swing back towards language & society. Curious where you fit in?

🎁 Tools, data, and analysis await you:

📄 Paper: https://arxiv.org/abs/2409.19505
🌐Project: https://ukplab.github.io/acl25-nlp-contributions/
💻 Code: https://github.com/UKPLab/acl25-nlp-contributions
💾 Data: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4678

(1/🧵)

Diagram showing a taxonomy of NLP contributions, divided into two main categories:

Artifact Contributions: Method/Model, Dataset/Corpus, Task Definition

Knowledge Contributions: Model Analysis, Database Properties, Task Insights, Linguistic Findings, Societal Insights

The structure is visualized as a hierarchical tree with "NLP Contributions" at the top, branching into the two categories, each followed by their respective subtypes represented with icons. All items are grayed out except for the category labels, which are highlighted in blue.

Ulrike Hahn boosted

Miguel Afonso Caetano

@remixtures@tldr.nettime.org · 7 months ago

"Asking scientists to identify a paradigm shift, especially in real time, can be tricky. After all, truly ground-shifting updates in knowledge may take decades to unfold. But you don’t necessarily have to invoke the P-word to acknowledge that one field in particular — natural language processing, or NLP — has changed. A lot.

The goal of natural language processing is right there on the tin: making the unruliness of human language (the “natural” part) tractable by computers (the “processing” part). A blend of engineering and science that dates back to the 1940s, NLP gave Stephen Hawking a voice, Siri a brain and social media companies another way to target us with ads. It was also ground zero for the emergence of large language models — a technology that NLP helped to invent but whose explosive growth and transformative power still managed to take many people in the field entirely by surprise.

To put it another way: In 2019, Quanta reported on a then-groundbreaking NLP system called BERT without once using the phrase “large language model.” A mere five and a half years later, LLMs are everywhere, igniting discovery, disruption and debate in whatever scientific community they touch. But the one they touched first — for better, worse and everything in between — was natural language processing. What did that impact feel like to the people experiencing it firsthand?

Quanta interviewed 19 current and former NLP researchers to tell that story. From experts to students, tenured academics to startup founders, they describe a series of moments — dawning realizations, elated encounters and at least one “existential crisis” — that changed their world. And ours."

https://www.quantamagazine.org/when-chatgpt-broke-an-entire-field-an-oral-history-20250430/

#AI #GenerativeAI #ChatGPT #NLP #OralHistory #LLMs #Chatbots

pettter

@pettter@social.accum.se · 10 months ago

I might as well do another #introduction specifically for the #academic side of this here fediverse:

Coming from #theoreticalCS (with applications in #NLP) to doing #digitalhumanities (computational #musicology), I've now landed in #ResponsibleAI. Specifically, I'm interested in exploring #AntiCapitalistAI, both sharpening existing critiques of current AI practice by confronting capital and exploring inherent politics of technologies, and finding better ones for a socialist world.