• @Septimaeus@infosec.pub
    English
    6 days ago

    There was a comment yesterday that offered a simpler, verifiable explanation than the headline’s conclusion.

    The papers were published by Iranian researchers, and in Farsi “scanning” (روبشی) and “vegetative” (رويشی) differ by only one character (ب and یـ). Those two letters also happen to be adjacent on the keyboard.

    In other words, there’s some evidence that this is a typo or mistranslation that has been reused among non-native speakers, as opposed to a hallucination (though it technically could be both, assuming they used an LM with the original error in its corpus).

    So while this is a lesson in careful proofreading, if it’s just a technical term that was mistranslated and reused, I wouldn’t necessarily consider it plagiarism, and probably wouldn’t blame it entirely on AI either.

    There’s enough evidence that AI slop is infecting scientific publishing without manufacturing examples.

    • @bitcrafter@programming.dev
      English
      6 days ago

      A couple of decades ago I got really confused because I found a lot of papers referring to “comer” cubes, but could not find an actual definition. Eventually I figured out that these were actually “corner” cubes, but somewhere a transcription error occurred that merged the r and n into an m, and this error kept getting propagated because people were just copying and pasting.

      • @Septimaeus@infosec.pub
        English
        6 days ago

        That’s an apt example from English, especially given the visual similarity of the error.

        It’s the kind of error we would expect AI to be especially resilient against, since the phrase “corner cube” probably appears many times in the training dataset.

        Likewise, scanning electron microscopes are common instruments in many schools and commercial labs, so an AI writing tool would likely infer that a correction is needed, given how similar the two terms are.

        Transcription errors by human authors, however, have been dutifully copied into future works since we began writing stuff down.

    • @catloaf@lemm.ee
      English
      6 days ago

      Yes. Between that and some bad OCR that failed to recognize text laid out in columns, and so read these words from separate columns as a single phrase, it makes sense that the error would be replicated in machine translations.