[R] 'R' Software Output Plagiarism

Tue Sep 22 22:18:43 CEST 2015

Isn't plagiarism detection based on overlaps in sentence structure?
That way, it would still catch plagiarism if someone simply did a
find-and-replace on a few words. But it would also flag regressions
that share the same output format.
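
To make that concrete, here is a rough sketch in R. It is not how Urkund
works (I don't know their algorithm); it only assumes a crude line-by-line
match, and the helper names (summary_lines, normalize, line_overlap) are
made up for illustration. It compares the printed summaries of two
unrelated regressions on built-in datasets, with and without the numbers
masked out.

```r
# Illustration only: this is NOT Urkund's algorithm, just a crude
# line-matching score. Function names below are made up for this sketch.

# Matches integers, decimals and scientific notation such as 1.62e-05.
number_pattern <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"

# Printed summary of a fitted model as a character vector, one element per line.
summary_lines <- function(fit) capture.output(print(summary(fit)))

# Collapse whitespace and drop empty lines; optionally replace numbers by "N".
normalize <- function(lines, mask = FALSE) {
  if (mask) lines <- gsub(number_pattern, "N", lines)
  lines <- gsub("[[:space:]]+", " ", trimws(lines))
  lines[nzchar(lines)]
}

# Fraction of lines in `a` that also occur verbatim in `b`.
line_overlap <- function(a, b) mean(a %in% b)

# Two unrelated regressions on datasets that ship with R.
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

out1 <- summary_lines(fit1)
out2 <- summary_lines(fit2)

overlap_raw    <- line_overlap(normalize(out1), normalize(out2))
overlap_masked <- line_overlap(normalize(out1, mask = TRUE),
                               normalize(out2, mask = TRUE))

cat(sprintf("overlap with numbers kept:   %.2f\n", overlap_raw))
cat(sprintf("overlap with numbers masked: %.2f\n", overlap_masked))
```

The masked score should come out clearly higher than the raw one, because
once the numbers are gone it is mostly the call and the coefficient rows
that still differ between the two summaries.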

How long was the original thesis?  If 25% of it was all regression
output, that sounds like a lot of regressions.

On Tue, Sep 22, 2015 at 4:06 PM, peter dalgaard <pdalgd at gmail.com> wrote:
> Marc,
>
> I don't think copyright/intellectual property issues factor into this. Urkund and similar tools are, to my knowledge, entirely about plagiarism. So the issue would seem to be that the R output is considered identical or nearly identical to R output in other published or otherwise submitted material.
>
> What puzzles me (apart from how a document can be deemed 32% plagiarized when the matches are confined to 25% of the text) is whether the matching includes the numbers and variable names. If those are somehow factored out, then any R regression could be pretty much identical to any other R regression. However, two analyses with similar variable names could arise if they are based on the same cookbook recipe, and analyses with similar numerical output could come from analyzing the same standard data. Such situations would not necessarily be considered plagiarism. (I mean: if you claim that you are analyzing data from experiments that you yourself have performed, and your numbers are exactly identical to something that has been previously published, then it would be suspect. If you analyze something from public sources, someone else might well have done the same thing.)
>
> Like John Kane, I think it is necessary to know exactly what sources the text is claimed to be plagiarized from and/or which parts of the text are being matched by Urkund. If it turns out that Urkund is generating false positives, then this needs to be pointed out to them and to the people basing decisions on it.
>
> -pd
>
>> On 22 Sep 2015, at 18:24 , Marc Schwartz <marc_schwartz at me.com> wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer....and that I am not speaking on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied and pasted verbatim into your thesis, constitutes the use of copyrighted output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or by other parties for CRAN packages), although the source code underlying R is, along with that of other copyright owners as apropos. There is some case law to support the notion that the output alone is not protected in a similar manner, but that may be country-specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there is any guidance provided for students regarding the crediting of software used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>  -- Clifford Stoll
>>>
>>>
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>> <oliver.barrett at skema.edu> wrote:
>>>>
>>>> Dear 'R' community support,
>>>>
>>>>
>>>> I am a student at Skema business school and I have recently submitted my MSc thesis/dissertation. This has been passed on to an external plagiarism service provider, Urkund, who have scanned my document and returned a plagiarism report to my professor having detected 32% plagiarism.
>>>>
>>>>
>>>> I have contacted Urkund regarding this issue having committed no such plagiarism and they have told me that all the plagiarism detected in my document comes from the last 25% which consists only of 'R' regressions like the one I have pasted below:
>>>>
>>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>>   Fed.t.4., data = OLS_CAR, x = TRUE)
>>>>
>>>> Residuals:
>>>>     Min        1Q    Median        3Q       Max
>>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>>>
>>>> Coefficients:
>>>>            Estimate Std. Error t value Pr(>|t|)
>>>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>>>> Fed         -0.121595   0.165359  -0.735   0.4627
>>>> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
>>>> Fed.t.2.     0.026529   0.143648   0.185   0.8536
>>>> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
>>>> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
>>>> ---
>>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>>
>>>> Residual standard error: 0.0293 on 304 degrees of freedom
>>>> (20 observations deleted due to missingness)
>>>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>>>> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>>>>
>>>> I have produced all of these regressions myself and pasted them directly from the 'R' software package. My regression methodology is entirely my own, along with the sourcing and preparation of the data used to produce these statistics.
>>>>
>>>> I would be very grateful if you could provide me with some clarity as to why this output from 'R' is reading as plagiarism.
>>>>
>>>> I would like to thank you in advance,
>>>>
>>>> Kind regards,
>>>>
>>>> Oliver Barrett
>>>> (+44) 7341 834 217
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com