[R] Amazing AI

Mon Dec 19 09:05:36 CET 2022

Hi Boris,
I think these are good questions.
Some initial reactions:
1. with better tools available to the students, you can cover more material
at a faster pace 🙂
2. For years, it has been possible for students to find "answers" online
(e.g. Google search). Most programmers would regard this as an essential
part of their work - looking to overcome some obstacle via searching on
StackOverflow, Google, R-Help, etc.* I consider the ability to do such
searches something worth teaching. In this spirit, you could teach
effective ways to use ChatGPT.
3. Finally, as you observed, some of the ChatGPT responses have
bugs/deficiencies. For example, the Codex code does *not* remove
duplicates. It requires knowledge and skill of the material you are
teaching, for the students to be able to spot and fix the bugs. In which
case, what's your problem? 😁
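
To make point 3 concrete, here is a minimal sketch on a small made-up data
frame (the values echo the ChatGPT sample data quoted below; the object names
are mine). It shows why the %in% unique(...) filter in the Codex code keeps
every row, and two possible readings of "remove duplicates". Which reading
Boris intended is an assumption on my part.

```r
# Hypothetical toy data; "Alice" appears twice.
d <- data.frame(
  names = c("Alice", "Bob", "Charlie", "Alice", "David", "Eve"),
  age   = c(23, 32, 29, 23, 25, 27)
)

# Codex approach: every name is, by construction, a member of the set of
# unique names, so the filter is TRUE for all rows and nothing is removed.
codex_keep <- d$names %in% unique(d$names)
nrow(d[codex_keep, ])

# Reading 1: keep the first occurrence of each name (the ChatGPT approach).
first_only <- d[!duplicated(d$names), ]

# Reading 2: drop every row whose name occurs more than once.
dup_any    <- duplicated(d$names) | duplicated(d$names, fromLast = TRUE)
no_repeats <- d[!dup_any, ]
```

Spotting the difference between these three results is exactly the kind of
check that requires knowing the material.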

HTH,
Eric

* and as a shout-out to Bert, they should also read the documentation -
e.g. ?subset
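
As a small illustration of what ?subset describes, here is a sketch on
made-up data (the data frame and its values are hypothetical). Note that the
help page itself warns that subset() is a convenience function intended for
interactive use, and recommends "[" for programming.

```r
# Hypothetical toy data, for illustration only.
d <- data.frame(
  names  = c("Alice", "Bob", "Alice"),
  age    = c(23, 32, 23),
  weight = c(120, 180, 120)
)

# subset() evaluates its arguments inside the data frame:
# the second argument picks rows, select= picks columns.
s <- subset(d, !duplicated(names), select = c(names, age))
print(s)
```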

On Mon, Dec 19, 2022 at 12:47 AM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:

> It would help students formulate a plan for coding. Successful students
> will be able to give good directions that the AI can turn into good code.
> This skill is essential no matter who writes the program.
> In more advanced classes I might collect some data sets designed to cause
> the AI problems. Another option is to make tests where students have to
> write code on paper or multiple guess where students must choose between
> similar code snips.
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Boris Steipe
> Sent: Sunday, December 18, 2022 2:01 PM
> To: r-help using r-project.org
> Subject: [R] Amazing AI
>
> Technically not a help question. But crucial to be aware of, especially
> for those of us in academia, or otherwise teaching R. I am not aware of a
> suitable alternate forum. If this does not interest you, please simply
> ignore - I already know that this may be somewhat OT.
>
> Thanks.
> ------------------------------------------------------
>
> You have very likely heard of ChatGPT, the conversation interface on top
> of the GPT-3 large language model, and that it can generate code. I thought
> it doesn't do R - I was wrong. Here is a little experiment:
>
> MY PROMPT:
> ==========
> Please write me some R code that will read a tsv file, remove all rows
> that have duplicates in the column "names", and store the median values of
> the remaining rows in a variable called "Medians". Of course, that
> calculation should only be done with columns that are numeric. Print the
> last variable in order from largest to smallest.
>
>
> AI RESPONSE:
> ============
>
> Here is some R code that should accomplish what you've described:
>
> (Note: the code is formatted in a little terminal window, complete with a
> button to "copy code" at the top.)
>
> ----
> # Read the tsv file
> data <- read.table("file.tsv", header = TRUE, sep = "\t")
>
> # Remove rows with duplicates in the "names" column
> data <- data[!duplicated(data$names), ]
>
> # Select only the numeric columns
> num_columns <- sapply(data, is.numeric)
> numeric_data <- data[, num_columns]
>
> # Calculate the medians for each column
> Medians <- apply(numeric_data, 1, median)
>
> # Sort Medians in descending order
> Medians <- sort(Medians, decreasing = TRUE)
>
> # Print Medians
> print(Medians)
> ----
>
> This code assumes that the tsv file is called "file.tsv" and is stored in
> the current working directory. It also assumes that the first row of the
> file contains the column names.
>
>
>
> MY PROMPT:
> ==========
> Can you provide me with a bit of sample data that I can use to test this
> code?
>
> AI RESPONSE:
> ============
> Certainly! Here is some sample data that you can use to test the code I
> provided:
>
> ----
> name    age     height  weight
> Alice   23      5.6     120
> Bob     32      6.2     180
> Charlie 29      5.9     160
> Alice   23      5.6     120
> David   25      5.8     140
> Eve     27      5.5     110
> ----
>
> You can save this data as a tsv file and use it to test the code.
>
>
> ------------------------------------------------------
>
> Notes:
> 1) The code, as written, does not work with the test data. The reason is
> that the test data specifies a column "name", but the code assumes "names"
> (which I specified in the instructions). Once this is fixed in the test
> data, the code executes.
>
> 2) The instructions are not accurately translated: I had asked for row
> medians. But changing the margin in the apply() statement from 2 to 1 does
> the correct thing.
>
> 3) Note that the test data contains both numeric and non-numeric columns.
> Also, the name "Alice" is duplicated in the test data, which is subtle, and
> the right thing to do.
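
A minimal sketch of the two apply() margins on the sample data above,
assuming the header has already been changed from "name" to "names" as in
note 1. The object names (txt, dat, num) are mine. Margin 1 gives one median
per row, margin 2 one median per column.

```r
# Sample data from the ChatGPT response, with the header changed to "names".
txt <- "names\tage\theight\tweight
Alice\t23\t5.6\t120
Bob\t32\t6.2\t180
Charlie\t29\t5.9\t160
Alice\t23\t5.6\t120
David\t25\t5.8\t140
Eve\t27\t5.5\t110"

dat <- read.table(text = txt, header = TRUE, sep = "\t")

# Keep the first occurrence of each name, then only the numeric columns.
dat <- dat[!duplicated(dat$names), ]
num <- dat[, sapply(dat, is.numeric)]

row_medians <- apply(num, 1, median)  # margin 1: one value per row
col_medians <- apply(num, 2, median)  # margin 2: one value per column

print(sort(row_medians, decreasing = TRUE))
print(col_medians)
```

Running both margins side by side on a small test set makes it easy to see
which one matches the prompt.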
>
>
> ------------------------------------------------------
>
> A similar experiment, run on the beta-site of OpenAI's Codex
> code-completion tool gives the following code (complete with the comments):
>
> ----
> # Read in the data
> data <- read.table("mydata.tsv", sep = "\t", header = TRUE)
>
> # Remove all rows that have duplicates in the column "names"
> unique_names <- unique(data$names)
> data_unique <- data[data$names %in% unique_names,]
>
> # Calculate medians
> Medians <- sapply(data_unique[, sapply(data_unique, is.numeric)], median,
> na.rm = TRUE)
>
> # Print last variable in order from largest to smallest
> rev(sort(Medians[length(Medians)]))
> ----
>
> Note that the strategy is quite different (e.g. using %in%, not
> duplicated() ), the interpretation of "last variable" is technically
> correct but not what I had in mind (ChatGPT got that right though).
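
The two readings of "last variable" can be sketched as follows. The Medians
vector here is a hypothetical stand-in with made-up values, and the names
last_element and whole_vector are mine.

```r
# Hypothetical named vector of column medians, for illustration only.
Medians <- c(age = 27, height = 5.8, weight = 140)

# Codex reading: take the last element of Medians, then sort that single value.
last_element <- rev(sort(Medians[length(Medians)]))

# ChatGPT reading: "the last variable" is Medians itself, sorted as a whole.
whole_vector <- sort(Medians, decreasing = TRUE)

print(last_element)
print(whole_vector)
```

The first result has length one, so sorting it is a no-op; the second orders
all three medians.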
>
>
> Changing my prompts slightly resulted in it going for a dplyr solution
> instead, complete with %>% idioms etc ... again, syntactically correct but
> not giving me the fully correct results.
>
> ------------------------------------------------------
>
> Bottom line: The AI's ability to translate natural language instructions
> into code is astounding. Errors the AI makes are subtle and probably not
> easy to fix if you don't already know what you are doing. But the way that
> this can be "confidently incorrect" and plausible makes it nearly
> impossible to detect unless you actually run the code (you may have noticed
> that when you read the code).
>
> Will our students use it? Absolutely.
>
> Will they successfully cheat with it? That depends on the assignment. We
> probably need to _encourage_ them to use it rather than sanction - but
> require them to attribute the AI, document prompts, and identify their own,
> additional contributions.
>
> Will it help them learn? When you are aware of the issues, it may be quite
> useful. It may be especially useful to teach them to specify their code
> carefully and completely, and to ask questions in the right way. Test cases
> are crucial.
>
> How will it affect what we do as instructors? I don't know. Really.
>
> And the future? I am not pleased to extrapolate to a job market in which
> they compete with knowledge workers who work 24/7 without benefits,
> vacation pay, or even a salary. They'll need to rethink the value of their
> investment in an academic education. We'll need to rethink what we do to
> provide value above and beyond what AIs can do. (Nb. all of the arguments
> I hear about why humans will always be better etc. are easily debunked, but
> that's even more OT :-)
>
> --------------------------------------------------------
>
> If you have thoughts to share on how your institution is thinking about
> academic integrity in this situation, or creative ideas on how to integrate
> this into teaching, I'd love to hear from you.
>
>
> All the best!
> Boris
>
>
> --
> Boris Steipe MD, PhD
> University of Toronto
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
