[R-SIG-Mac] M3 not working with torch
Gilberto Camara
g||berto@c@m@r@ @end|ng |rom |npe@br
Mon May 20 14:48:59 CEST 2024
Dear R-SIG-Mac members,
I bought a new MacBook Air with the M3 chip, which has an 8-core CPU, a 10-core GPU, and 16 GB of unified memory. My R `torch` apps are crashing. I have assembled a minimal working example (MWE) that runs fine on other Mac architectures, including a MacBook Air M1 and a Mac mini, all with the same OS (macOS Sonoma 14.5). The MWE follows:
```{r}
# ==== MWE
# Download the training samples
rds_file <- "https://raw.githubusercontent.com/e-sensing/sitsdata/master/inst/extdata/torch/train_samples.rds?raw=true"
dest_file <- file.path(tempdir(), "train_samples.rds")
download.file(rds_file,
              destfile = dest_file,
              method = "curl")
train_samples <- readRDS(dest_file)
# Sample labels
labels <- c("Cerrado", "Forest", "Pasture", "Soy_Corn")
# Create numeric labels vector
code_labels <- seq_along(labels)
names(code_labels) <- labels
# Split the data into training and validation data sets
# Create partitions for different splits of the input data
frac <- 0.2
train_samples <- dplyr::group_by(train_samples, .data[["label"]])
test_samples <- train_samples |>
  dplyr::slice_sample(prop = frac) |>
  dplyr::ungroup()
# Remove the lines used for validation
sel <- !train_samples[["sample_id"]] %in% test_samples[["sample_id"]]
train_samples <- train_samples[sel, ]
# Shuffle the data
train_samples <- train_samples[sample(nrow(train_samples), nrow(train_samples)), ]
test_samples <- test_samples[sample(nrow(test_samples), nrow(test_samples)), ]
# Organize data for model training (drop the first two, non-feature columns)
train_x <- as.matrix(train_samples[, -(1:2)])
train_y <- unname(code_labels[train_samples[["label"]]])
# Create the test data
test_x <- as.matrix(test_samples[, -(1:2)])
test_y <- unname(code_labels[test_samples[["label"]]])
# Set torch seed
torch::torch_manual_seed(sample.int(10^5, 1))
# Avoid a global variable for 'self'
self <- NULL
# function to create a simple sequential NN module
.torch_linear_relu_dropout <- torch::nn_module(
  classname = "torch_linear_relu_dropout",
  initialize = function(input_dim,
                        output_dim,
                        dropout_rate) {
    self$block <- torch::nn_sequential(
      torch::nn_linear(input_dim, output_dim),
      torch::nn_relu(),
      torch::nn_dropout(dropout_rate)
    )
  },
  forward = function(x) {
    self$block(x)
  }
)
# Define the MLP architecture
mlp_model <- torch::nn_module(
  initialize = function(num_pred, layers, dropout_rates, y_dim) {
    tensors <- list()
    # input layer
    tensors[[1]] <- .torch_linear_relu_dropout(
      input_dim = num_pred,
      output_dim = 512,
      dropout_rate = 0.40
    )
    # output layer
    tensors[[length(tensors) + 1]] <-
      torch::nn_linear(layers[length(layers)], y_dim)
    # add softmax tensor
    tensors[[length(tensors) + 1]] <- torch::nn_softmax(dim = 2)
    # create a sequential module that calls the layers in the same order
    self$model <- torch::nn_sequential(!!!tensors)
  },
  forward = function(x) {
    self$model(x)
  }
)
# Train the model using luz
torch_model <- luz::setup(
  module = mlp_model,
  loss = torch::nn_cross_entropy_loss(),
  metrics = list(luz::luz_metric_accuracy()),
  optimizer = torch::optim_adamw
)
torch_model <- luz::set_hparams(
  torch_model,
  num_pred = ncol(train_x),
  layers = 512,
  dropout_rates = 0.3,
  y_dim = length(code_labels)
)
torch_model <- luz::set_opt_hparams(
  torch_model,
  lr = 0.001,
  eps = 1e-08,
  weight_decay = 1.0e-06
)
torch_model <- luz::fit(
  torch_model,
  data = list(train_x, train_y),
  epochs = 100,
  valid_data = list(test_x, test_y),
  callbacks = list(luz::luz_callback_early_stopping(
    patience = 20,
    min_delta = 0.01
  )),
  verbose = TRUE
)
```
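For readers who want to check the data preparation independently of `dplyr`, the steps in the MWE (a stratified 20% hold-out drawn within each label, integer codes for the labels, and a feature matrix without the first two columns) can be sketched in base R. The toy data frame and its feature columns `b1` and `b2` are hypothetical stand-ins for the real samples, and rounding each group's hold-out size down with `floor()` is an assumption of this sketch, not necessarily what `dplyr::slice_sample()` does.

```r
# Hypothetical toy data: sample_id and label first, then two made-up features
set.seed(123)
samples <- data.frame(
  sample_id = 1:40,
  label     = rep(c("Cerrado", "Forest", "Pasture", "Soy_Corn"), each = 10),
  b1        = runif(40),
  b2        = runif(40)
)
frac <- 0.2

# Stratified hold-out: draw floor(frac * n) sample_ids within each label group
test_ids <- unlist(lapply(
  split(samples$sample_id, samples$label),
  function(ids) ids[sample.int(length(ids), floor(frac * length(ids)))]
))
test_samples  <- samples[samples$sample_id %in% test_ids, ]
train_samples <- samples[!samples$sample_id %in% test_ids, ]

# Integer codes 1..4 for the labels, looked up by name
labels <- c("Cerrado", "Forest", "Pasture", "Soy_Corn")
code_labels <- setNames(seq_along(labels), labels)
train_y <- unname(code_labels[train_samples$label])

# Feature matrix: drop the first two (non-feature) columns
train_x <- as.matrix(train_samples[, -(1:2)])
```

Here each of the four labels contributes 2 of its 10 rows to the test set, and the remaining 32 rows form the training set.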
The error occurs in `luz::fit()`. In RStudio, the session hangs and RStudio then asks to restart R. When I run R from the terminal, the output is:
```
*** caught bus error ***
address 0x16daa0000, cause 'invalid alignment'
*** caught segfault ***
address 0x9, cause 'invalid permissions'
zsh: segmentation fault R
```
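Since the crash happens inside `luz::fit()`, one way to narrow it down (a suggestion, not a confirmed diagnosis) is to train a comparable MLP for a few steps with plain `torch`, without `luz`, on random data. The sizes (256 samples, 20 features, 4 classes) and the seeds are arbitrary assumptions. If this loop also crashes on the M3, the problem more likely lies in the `torch`/libtorch binaries than in `luz`.

```r
library(torch)

# Arbitrary, hypothetical sizes: 256 random samples, 20 features, 4 classes
set.seed(42)
torch_manual_seed(42)
n_samples  <- 256
n_features <- 20
n_classes  <- 4
x <- torch_randn(n_samples, n_features)
# nn_cross_entropy_loss() in R torch expects 1-based class indices as a long tensor
y <- torch_tensor(sample.int(n_classes, n_samples, replace = TRUE),
                  dtype = torch_long())

# Same hidden-layer size and dropout as the MWE; the last layer returns logits
# because nn_cross_entropy_loss() applies log-softmax internally
model <- nn_sequential(
  nn_linear(n_features, 512),
  nn_relu(),
  nn_dropout(0.4),
  nn_linear(512, n_classes)
)
optimizer <- optim_adamw(model$parameters, lr = 0.001,
                         eps = 1e-08, weight_decay = 1e-06)
loss_fn <- nn_cross_entropy_loss()

# A few full-batch training steps on the CPU (the default device)
for (step in 1:5) {
  optimizer$zero_grad()
  loss <- loss_fn(model(x), y)
  loss$backward()
  optimizer$step()
  cat("step", step, "loss", loss$item(), "\n")
}
final_loss <- loss$item()
```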
The `sessionInfo()` output is as follows:
```
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Sao_Paulo
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] crayon_1.5.2 vctrs_0.6.5 cli_3.6.2 zeallot_0.1.0
[5] rlang_1.1.3 processx_3.8.4 generics_0.1.3 torch_0.12.0.9000
[9] coro_1.0.4 glue_1.7.0 bit_4.0.5 prettyunits_1.2.0
[13] luz_0.4.0 ps_1.7.6 hms_1.1.3 fansi_1.0.6
[17] tibble_3.2.1 progress_1.2.3 lifecycle_1.0.4 compiler_4.4.0
[21] dplyr_1.1.4 fs_1.6.4 Rcpp_1.0.12 pkgconfig_2.0.3
[25] rstudioapi_0.16.0 R6_2.5.1 tidyselect_1.2.1 utf8_1.2.4
[29] pillar_1.9.0 callr_3.7.6 magrittr_2.0.3 tools_4.4.0
[33] bit64_4.0.5
```
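`sessionInfo()` does not show whether `torch`'s libtorch backend is installed or which accelerators it detects. A short snippet such as the one below could complement a report like this one; it uses only base R plus `torch::torch_is_installed()`, `torch::backends_mps_is_available()` and `torch::cuda_is_available()`, and the name `platform_info` is merely illustrative.

```r
# Basic platform details (base R only); platform_info is an illustrative name
platform_info <- c(
  r_version = R.version.string,
  arch      = R.version$arch,
  machine   = Sys.info()[["machine"]],
  os        = Sys.info()[["sysname"]]
)
print(platform_info)

# torch-specific details, only if the package and its libtorch backend are installed
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  print(packageVersion("torch"))
  print(c(
    mps_available  = torch::backends_mps_is_available(),
    cuda_available = torch::cuda_is_available()
  ))
}
```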
Any clues will be most appreciated.
Thanks
Gilberto
============================
Prof Dr Gilberto Camara
Senior Researcher
National Institute for Space Research (INPE), Brazil
https://gilbertocamara.org/
=============================