⚠️ 𝗖𝗮𝗻 𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴 𝗹𝗲𝗮𝗱 𝘁𝗼 𝘂𝗻𝘄𝗮𝗻𝘁𝗲𝗱 𝗯𝗲𝗵𝗮𝘃𝗶𝗼𝘂𝗿𝘀 𝗶𝗻 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝗼𝗱𝗲𝗹𝘀?
A recent paper in Nature suggests that even small amounts of targeted fine-tuning data can trigger unexpected and problematic behaviour that generalises well beyond the original fine-tuning task.
(1/🧵)