Plagiarism: Copy, Paste, Thesaurus?

By Neuroskeptic | February 7, 2015 9:04 am

I’m a regular reader of Jeffrey Beall’s invaluable Scholarly OA blog. Earlier this week Beall blogged about a dubious-looking new ‘predatory’ journal called International Journal Online of Humanities (IJOHMN). I took a look and noticed that one of their papers is called Leaders Produce Teamwork Organizations.

That’s an odd title. The prose is even odder. Here’s the start of the article:

Wisdom perpetuates the legend of modernism as a private act, a spark of originality imminent, an Aha! Instant in the brain of a mastermind. People in fact favor to consider in the rough individuality of detection, possibly since they hardly ever get to see the sausage-making process behind every get through modernism.

Three decades of investigate has obviously exposed that modernism is most often a group attempt. Thomas Edison, for example, is remembered as almost certainly the most American discoverer of the untimely 20th century. From his productive intelligence came the brightest bulb and the turntable, along with additional than a thousand further untested inventions over a sixty-year vocation. However, he only just worked by yourself.

I wondered if this text was plagiarized. I Googled several fragments of it, but found no hits. However, on a hunch I tried searching for the “greatest American inventor”, which I suspected was the meaning of “most American discoverer”. I quickly found this article (part of a book called Collective Genius) and the mystery was solved: the IJOHMN paper appears to be a direct copy of the book extract, with various words replaced with synonyms, presumably with the help of a thesaurus. Here’s the corresponding text from the book:

Lore perpetuates the myth of innovation as a solitary act, a flash of creative insight, an Aha! moment in the mind of a genius. People apparently prefer to believe in the rugged individualism of discovery, perhaps because they rarely get to see the sausage-making process behind every breakthrough innovation.

Three decades of research has clearly revealed that innovation is most often a group effort. Thomas Edison, for example, is remembered as probably the greatest American inventor of the early twentieth century. From his fertile mind came the light bulb and the phonograph, along with more than a thousand other patented inventions over a sixty-year career. But he hardly worked alone.

I’d never heard of this kind of plagiarism before, and I was quite proud of my “discovery”. However, it turns out that I wasn’t the first person to come across it. The problem even has a name, Rogeting (after Roget’s Thesaurus). British lecturer Chris Sadler coined the term after discovering the ruse in some student essays.

Rogeting would probably fool any common plagiarism detection software, but done sloppily (as in the IJOHMN paper) it produces very strange prose. Many synonyms just don’t make sense out of context. For instance, while “modernism” might mean the same thing as “innovation” in the context of art history, in other situations it makes no sense at all to switch them.
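A minimal Python sketch of the context-blind, word-for-word substitution described above. The synonym table is hypothetical: it was reconstructed by hand from four swaps visible in the IJOHMN opening, not taken from any real thesaurus or spinning tool.

```python
import re

# Hypothetical synonym table, rebuilt by hand from word swaps visible in the
# IJOHMN passage; a real spinner would draw on a full thesaurus.
SYNONYMS = {
    "lore": "wisdom",
    "myth": "legend",
    "innovation": "modernism",
    "solitary": "private",
}

def spin(text: str, synonyms: dict[str, str]) -> str:
    """Replace every word found in `synonyms`, ignoring context entirely."""
    def swap(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word
        # Keep a leading capital so sentence starts still look normal.
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, text)

print(spin("Lore perpetuates the myth of innovation as a solitary act", SYNONYMS))
# → Wisdom perpetuates the legend of modernism as a private act
```

Because each swap ignores the surrounding words, the table turns “innovation” into “modernism” everywhere. That is exactly the kind of out-of-context substitution that gives the IJOHMN text away.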

I wonder, however, whether a careful plagiarist could Roget a text without making it look stupid. I decided to have a go myself:

Lore maintains the legend of invention as a lonely endeavor, a spark of creative revelation, a Eureka! event in the psyche of a genius. Humans, it seems, want to believe in the harsh individualism of innovation, maybe because they seldom get the chance to witness the sausage-making labor underlying each landmark discovery.

This took me a couple of minutes. I did all the replacements manually, without using a thesaurus. The result is certainly less elegant than the original, but it’s much better than the IJOHMN version. My conclusion is that it would be extremely difficult to detect Rogeting, so long as it were done right. In fact, it would be disturbingly easy to produce seemingly original texts in this way.
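Common plagiarism checkers look for runs of identical consecutive words. As a rough illustration only (this is not how Turnitin or any particular product works), the Python sketch below counts the word n-grams that the book passage and the manual Roget above have in common, assuming simple lowercase word tokens.

```python
import re

BOOK = ("Lore perpetuates the myth of innovation as a solitary act, a flash of creative "
        "insight, an Aha! moment in the mind of a genius. People apparently prefer to "
        "believe in the rugged individualism of discovery, perhaps because they rarely "
        "get to see the sausage-making process behind every breakthrough innovation.")
ROGETED = ("Lore maintains the legend of invention as a lonely endeavor, a spark of "
           "creative revelation, a Eureka! event in the psyche of a genius. Humans, it "
           "seems, want to believe in the harsh individualism of innovation, maybe because "
           "they seldom get the chance to witness the sausage-making labor underlying each "
           "landmark discovery.")

def words(text: str) -> list[str]:
    # Lowercase word tokens; hyphenated words such as "sausage-making" stay whole.
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())

def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # Every run of n consecutive tokens.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_ngrams(a: str, b: str, n: int) -> set[tuple[str, ...]]:
    # Word runs of length n that appear in both texts.
    return ngrams(words(a), n) & ngrams(words(b), n)

for n in (3, 4, 5):
    print(n, sorted(" ".join(g) for g in shared_ngrams(BOOK, ROGETED, n)))
# 3 ['believe in the', 'of a genius', 'to believe in']
# 4 ['to believe in the']
# 5 []
```

By this crude measure, only stock phrases such as “of a genius” and “to believe in the” survive the rewording, and no five-word run is shared at all.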

CATEGORIZED UNDER: papers, select, Top Posts, Uncategorized
  • https://plus.google.com/u/0/101046916407340625977/posts Rolf Degen

    You can now even find countless offers of specialized software on the web, called “rewriters” or “article spinners”, that do the whole job for you, substituting synonyms for the original words and applying several cover-up tactics. Some even have large databases and do translations and re-translations.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      That’s disturbing – maybe the article I discovered was produced by such software.

      • http://www.facebook.com/felonious.grammar Felonious Grammar

        My first thought was that it was perhaps written in Sanskrit then translated into Russian, the Russian translated into Korean, and the Korean translated into English, or some other random combination.

        If I were bored enough, I might make those translations just to see if it ended up more awkward than the original.

        Great detective work! It’s sad that it’s so common for college students to cheat like this. It’s also sad that paper was published anywhere.

        But what’s most sad to me is that the humanities are being underfunded, and diminished academically. Undergrad Liberal Arts programs and classes seem to demand too little of students these days.

      • Pedro Paulo Jr.

        Some knowledge of Python+NLTK will do the trick.

  • Dr Pete Etchells

    Nice find. I’ve definitely seen an increase in this sort of behaviour in student essays over the past few years – in some cases they think it’s enough to avoid being flagged by Turnitin (essay plagiarism checker). Usually it’s not, as it’s only key words that are Rogeted, so the rest of the sentence gets flagged. That, and as you say, it makes for nonsensical prose – it takes time and effort to make the plagiarised version make sense, which sort of defeats the purpose of plagiarising in the first place. At any rate, it’s very frustrating to have to deal with.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      Ah. Well I’m glad to hear that Turnitin is able to spot cases of this.

      But still I worry that it might be fooled by a comprehensive Roget-ization, i.e. replacement of the majority of words, like I did in this post.

      It might be possible to detect it by searching for “thesaurus matches” rather than literal matches, using a database of original texts. But those searches would be orders of magnitude slower than normal searches…

      • EJ

        He said Turnitin can’t detect this.

        • Benjamin Edge

          He said students think it is enough to defeat Turnitin, but that they are usually wrong.

          • EJ

            Oops!

  • Alessandra Rampinini

    Officially disturbed.

  • Gabriel Finkelstein

    I’m a proud colleague of Jeff Beall, a happy follower of you, and a miserable victim of exactly this.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      What a coincidence… because you see…

      I’m an honorable comrade of Jeff Beall, a joyful disciple of you, plus a melancholic sufferer of just that.

  • Guest

    My US English teacher told me that plagiarism consisted of three or more consecutive words without quotes, and the popular plagiarism detectors usually look for five consecutive words. By what I was taught, and by the detectors’ standard, this doesn’t seem like plagiarism to me. Maybe people are taught different definitions of plagiarism?

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      I disagree with your English teacher – there’s no exact definition of plagiarism. Certainly not “three consecutive words”.

      Take for example my plagiarism of the “Lore perpetuates the myth…” passage. I totally plagiarized it: I copied, pasted, and made a series of trivial changes.

      But it doesn’t contain any three consecutive words that are the same.

      • Guest

        So what constitutes trivial?

        • Anonymouse

          Asking for a definition of the colloquial use of a term like “trivial” is pointless. #wittgenstein

          However, in this context, what counts as plagiarism is not the form (such as using the exact same words as an uncited source) but the meaning. The whole point is that someone else’s intellectual achievement isn’t passed off as one’s own.

          I can rephrase the complete first volume of Harry Potter and not ever use the exact three or five word sequences that JKR uses (at least not above the chance level, if I want it to sound natural in any way), while keeping the plot, even the characters’ names, the exact same. That’s plagiarism.

        • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

          In this case, the changes I made to the text were trivial because each change was very simple: I swapped a word for a synonym.

          I made many such swaps but each one was trivial, and each of the swaps was made in isolation (I didn’t choose a particular synonym in order to fit better with another synonym I chose later).

          So my mock-plagiarism consisted of a series of trivial changes, which are no more than the sum of their parts.

          • Indigo Rhythms

            Ok, thanks for the clarification. I still find the concept of plagiarism fuzzy and to a degree arbitrary. How different does a product (a song, piece of writing, artwork) have to be in order to be original? That could be debated endlessly.

            According to this site: http://www.plagiarism.org/plagiarism-101/what-is-plagiarism/

            “Re-creating a visual work in the same medium. (for example: shooting a photograph that uses the same composition and subject matter as someone else’s photograph)” This seems too extreme to me. If someone can prove that the “plagiarism” affected their income, then maybe.

      • http://www.facebook.com/felonious.grammar Felonious Grammar

        This paper is pure quackery and plagiarism, but I think not plagiarizing when dealing with complex concepts might require specific training/skills in writing and a lot of experience in order to produce fluid and original writing.

        I’ve been working on that for my own edification. Right now I’m reading “Evolution in Four Dimensions” and doing my best to take notes in my own words as a process of learning and just to learn to express what I’m learning without either plagiarizing or copying a lot of text. I’m guessing that a lot of writers just don’t know how to do this. I don’t.

        • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

          It is difficult. I would say that the trick (albeit a difficult one) is to always make your notes as concise as possible. So if you’re making notes on a whole page of text, you need to condense that into a short paragraph. If you can do that, you’ll have made the material your own (and you can then expand the summary into a long text in your own words).

          • feloniousgrammar

            THNX.

    • 7eggert

      Now the phrase “three or more” may never be used again without quoting your teacher.

      >>There are usually “three or more” [Indi’s teacher] players playing a Scotland Yard game<<

    • Benjamin Edge

      At the technical college where I taught, we considered even taking an idea from someone else without proper attribution as plagiarism, whether any of the same words appeared consecutively or not.

  • 7eggert

    Use word clouds – translate each word into a generic word before comparing two texts. Maybe even sort the words in each sentence alphabetically.

    Downside: You can’t just google, you have to build your own database, and sometimes you’ll have to rebuild it.
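    A minimal Python sketch of this idea: map each word to a generic form, sort the words of each sentence, then compare. The synonym table is hypothetical and written for this example only (it simply undoes the swaps in the manual Roget in the post), and the Jaccard overlap score is an added choice for rating near matches, not part of the suggestion itself.

```python
import re

# Hypothetical synonym table written for this example only: each word is
# mapped to one generic stand-in. A real system would need a full thesaurus.
CANONICAL = {
    "maintains": "perpetuates", "legend": "myth", "invention": "innovation",
    "lonely": "solitary", "endeavor": "act", "spark": "flash",
    "revelation": "insight", "eureka": "aha", "event": "moment",
    "psyche": "mind", "an": "a",
}

def fingerprint(sentence: str, canonical: dict[str, str]) -> list[str]:
    """Replace each word by its generic form, then sort so word order is ignored."""
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", sentence.lower())
    return sorted(canonical.get(w, w) for w in words)

def jaccard(a: list[str], b: list[str]) -> float:
    """Share of distinct words the two fingerprints have in common (1.0 = same set)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

original = ("Lore perpetuates the myth of innovation as a solitary act, a flash of "
            "creative insight, an Aha! moment in the mind of a genius.")
rogeted = ("Lore maintains the legend of invention as a lonely endeavor, a spark of "
           "creative revelation, a Eureka! event in the psyche of a genius.")

# Without the table, only shared words such as "the", "of" and "genius" overlap.
print(round(jaccard(fingerprint(original, {}), fingerprint(rogeted, {})), 2))  # 0.28
# With it, the two sentences reduce to the same sorted word list.
print(fingerprint(original, CANONICAL) == fingerprint(rogeted, CANONICAL))  # True
```

    The exact match comes for free here because the table was built for these two sentences. With a real thesaurus, where one word belongs to several overlapping senses, choosing the generic form is the hard part.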

    • Anonymouse

      This kind of processing will find only what’s in your database, and only to the degree that you processed it right. In other words: it’s completely unfeasible unless you *are* Google.

  • Colleen101

    This is the form of plagiarism used by Matthew C. Whitaker across his publications. The University of Nebraska Press, his customary outlet, actually responded to complaints that Whitaker’s book, Peace Be Still, was plagiarized in this manner by saying that he’d run the text through plagiarism software twice, presumably changing words each time. Examples of this can be found at a website called Cabinet of Plagiarism, Exhibits A, B, and C. If a scholarly press defends this kind of thing, it’s hard to see what purpose peer review serves, as opposed to just plagiarism software and a Roget’s.

    • http://blogs.discovermagazine.com/neuroskeptic/ Neuroskeptic

      See this IHE article for an overview on the Whitaker case.

      • Colleen101

        Thanks for that reference! I suppose I can understand his university’s reluctance to act. But I don’t understand UNP’s defense of the book, which, to link up to one of your conversations below, is essentially patch-written. Patch writing is a term of art for taking notes on an unfamiliar discipline (if you’re an academic) or on simply difficult material (if you’re a student), then writing something entirely dependent on the original text, with some word changes. The term comes out of pedagogy, and some in the field of “teaching and learning” say it’s an acceptable stage of learning, not simply plagiarism. But that argument has been picked up by some, including, I’d say, UNP implicitly, to claim that credentialed scholars writing within their own fields, in pieces put out by professional and even university presses, should be able to patch write, since it’s not really plagiarism. I just don’t see the long-term sense in implying that scholars and scholarly presses have no real expertise. But I suppose the fuzziness of “plagiarism”, combined with the absolute shame attached to it, leads people to find any way possible to distinguish what they or their authors have done from the term.

  • http://www.mazepath.com/uncleal/qz4.htm Uncle Al

    Heteronormatism problematizes homosocial othering.

  • Krista

    It seems like the amount of time it takes to do Rogeting properly (manually) would negate the time saved by plagiarizing in the first place.

    • Benjamin Edge

      Unless you have a hard time coming up with original ideas. Which a lot of today’s students seem to do.

  • Gabriella

    It’s plagiarism, but that seems beside the point when it’s totally incoherent. Can’t a paper get an F for content and style any more?

    I realize that this appeared in a “scholarly” journal, but most comments address the possible use of this technique in student papers. The fact that a journal would publish this is even more disturbing than that a student would plagiarize in this way. I get all sorts of spam from bogus journals, and I wonder what their business plan is: How does anyone fall for the ruse? How do they make a profit?

  • Pingback: Fate passare, per favore? - Ocasapiens - Blog - Repubblica.it

Neuroskeptic

No brain. No gain.

About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.

@Neuro_Skeptic on Twitter
