Channel: Recent Questions - Stack Overflow

Turn dataframe of events into sparse matrix

I have a dataset called transactions representing shopping carts that I've gotten into the following format:

        member  Date        V1
    1   1000    15-03-2015  sausage,whole milk,semi-finished bread,yogurt
    2   1000    24-06-2014  whole milk,pastry,salty snack
    3   1000    24-07-2015  canned beer,misc. beverages
    4   1001    25-11-2015  sausage,hygiene articles
    5   1001    27-05-2015  soda,pickled vegetables
    6   1001    02-05-2015  frankfurter,curd
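To make the example reproducible, the sample above can be typed in directly as an R data frame. This is a hand-entered sketch of the table; storing `member` as a number and `Date` as a plain string is an assumption, since the question does not say how those columns are stored.

```r
# Hand-entered copy of the six sample rows shown above.
transactions <- data.frame(
  member = c(1000, 1000, 1000, 1001, 1001, 1001),
  Date   = c("15-03-2015", "24-06-2014", "24-07-2015",
             "25-11-2015", "27-05-2015", "02-05-2015"),
  V1     = c("sausage,whole milk,semi-finished bread,yogurt",
             "whole milk,pastry,salty snack",
             "canned beer,misc. beverages",
             "sausage,hygiene articles",
             "soda,pickled vegetables",
             "frankfurter,curd"),
  stringsAsFactors = FALSE  # keep V1 as character so strsplit() works on R < 4.0 too
)
str(transactions)
```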

What I need is something that looks like canonical sparse-matrix cart data, with one row per cart and one logical column per item:

    cart sausage whole milk bread yogurt frankfurter  # many more cols
    1    TRUE    TRUE       TRUE  TRUE   FALSE
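One way to get this shape without a row-by-row loop, sketched in base R: split every cart's item string once, collect the vocabulary of distinct items, and test each cart against that vocabulary with `%in%`. It assumes items are separated by plain commas (no quoting) and reuses the `<Date>_<member>` cart ID scheme from the question's loop; the names `items`, `all_items` and `carts` are illustrative, and only the first three sample rows are included so the block runs on its own.

```r
# Excerpt of the question's data (first three carts).
transactions <- data.frame(
  member = c(1000, 1000, 1000),
  Date   = c("15-03-2015", "24-06-2014", "24-07-2015"),
  V1     = c("sausage,whole milk,semi-finished bread,yogurt",
             "whole milk,pastry,salty snack",
             "canned beer,misc. beverages"),
  stringsAsFactors = FALSE
)

# Split every cart's item string once; trimws() guards against "a, b" spacing.
items <- lapply(strsplit(transactions$V1, ","), trimws)

# Vocabulary of distinct items, in order of first appearance.
all_items <- unique(unlist(items))

# For each cart, test every vocabulary item for membership in that cart.
# vapply() returns one column per cart, so transpose to get one row per cart.
carts <- t(vapply(items, function(x) all_items %in% x,
                  logical(length(all_items))))
colnames(carts) <- all_items

# Attach the cart ID; check.names = FALSE keeps names like "whole milk" intact.
txn_df <- data.frame(cart = paste(transactions$Date, transactions$member, sep = "_"),
                     carts, check.names = FALSE)
txn_df[, c("cart", "sausage", "whole milk", "yogurt")]
```

`vapply()` is used instead of `sapply()` so the result is guaranteed to be a logical matrix with exactly one entry per vocabulary item.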

After a few hours of struggling, I'm currently doing this in a very non-R way. My data frame is called transactions and holds all of my 'shopping events' in the format shown in the first block above.

    ll <- unique(unlist(strsplit(paste0(transactions$V1, collapse=","), ',')))
    txn_df <- data.frame()
    txn_df[c(ll, "cart")] <- list(character(0))

    build_carts <- function(row){
      xs <- sapply(strsplit(row$V1, ","), trimws)  # first `strsplit` by comma, then trim whitespace
      tmp <- data.frame(matrix(nrow=1, ncol = length(txn_df))) # new dataframe
      names(tmp) <- names(txn_df) # copy columns
      tmp$cart <- paste(row$Date, row$member, sep="_") # make a new cart ID
      # set present items to TRUE
      for (i in 1:length(xs)) {
        tmp[,which(colnames(tmp)==xs[i])] = TRUE
      }
      tmp <- replace(tmp, is.na(tmp), FALSE) # all other items false
      txn_df <<- rbind(txn_df, tmp) # copy to parent DF
    }

    res <- by(transactions, seq_len(nrow(transactions)), build_carts)
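Most of the time in the loop above likely goes into `rbind()`: each call copies every row accumulated so far, so total work grows roughly quadratically with the number of carts, and building a one-row data frame per cart adds further overhead. A sketch that keeps the same idea but allocates the result once and sets all TRUE cells in a single indexed assignment (the names `m`, `row_idx` and `col_idx` are illustrative; only the `V1` column is needed):

```r
transactions <- data.frame(
  V1 = c("sausage,whole milk,semi-finished bread,yogurt",
         "whole milk,pastry,salty snack",
         "canned beer,misc. beverages"),
  stringsAsFactors = FALSE
)

items     <- lapply(strsplit(transactions$V1, ","), trimws)
all_items <- unique(unlist(items))

# Preallocate the whole result once (all FALSE) instead of growing it with rbind().
m <- matrix(FALSE, nrow = length(items), ncol = length(all_items),
            dimnames = list(NULL, all_items))

# Coordinates of every (cart, item) occurrence.
row_idx <- rep(seq_along(items), lengths(items))  # cart number, repeated once per item
col_idx <- match(unlist(items), all_items)        # column of each item in the vocabulary

# A two-column index matrix sets all TRUE cells in one vectorised assignment.
m[cbind(row_idx, col_idx)] <- TRUE
m
```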

This works, but as you'd imagine it is very, very slow. Is there a way to speed it up without going too deep into the tidyverse? Ideally the code would be at least partly legible to a tidyverse novice (for didactic purposes).
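Since the title asks for a sparse matrix, here is a sketch using the `Matrix` package (a recommended package that ships with standard R installations, so no tidyverse is involved), which stores only the TRUE cells. It assumes the same comma-separated `V1` column; the per-cart `unique()` is a precaution in case an item is listed twice in one cart, and the name `sp` is illustrative.

```r
library(Matrix)

transactions <- data.frame(
  V1 = c("sausage,whole milk,semi-finished bread,yogurt",
         "whole milk,pastry,salty snack",
         "canned beer,misc. beverages"),
  stringsAsFactors = FALSE
)

# unique() per cart so a repeated item yields a single (cart, item) entry.
items     <- lapply(strsplit(transactions$V1, ","), function(x) unique(trimws(x)))
all_items <- unique(unlist(items))

sp <- sparseMatrix(
  i = rep(seq_along(items), lengths(items)),  # row = cart
  j = match(unlist(items), all_items),        # column = item
  x = TRUE,                                   # recycled: every stored entry is TRUE
  dims = c(length(items), length(all_items)),
  dimnames = list(NULL, all_items)
)
sp
```

Here `i` and `j` are simply the row and column of each TRUE cell, which keeps the code close to the definition of the matrix; `as.matrix(sp)` converts it back to an ordinary logical matrix if needed.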

