
What to do if distinct() command is removing unique project IDs that it shouldn't


I ran distinct() on my dataset to remove duplicated respondents, but it also removes unique project IDs that I need. It is fine for project IDs to be duplicated, but RESPNO should not be duplicated.

I have a dataset of aid projects that I have merged with survey data. I use the following line of code to merge the two datasets:

Full_dataset <- merge(AidData, OpinionData, by = "Recipient") 

This produces far too many observations, and I notice that the respondent IDs from the survey data are duplicated.
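The row inflation described above can be reproduced on a small scale. The following is a minimal sketch, assuming two tiny hypothetical data frames (aid and opinion, made-up stand-ins for AidData and OpinionData, whose real contents are not shown in this question) in which Recipient is repeated on both sides:

```r
# Hypothetical stand-ins for AidData and OpinionData (not the real data)
aid <- data.frame(
  ID        = 1:3,
  Recipient = c("Mali", "Mali", "Peru")
)
opinion <- data.frame(
  RESPNO    = 1:4,
  Recipient = c("Mali", "Mali", "Peru", "Peru")
)

# merge() pairs every aid row with every opinion row sharing the same Recipient
merged <- merge(aid, opinion, by = "Recipient")

nrow(merged)                       # 6: Mali gives 2 x 2 rows, Peru gives 1 x 2 rows
sum(duplicated(merged$RESPNO))     # 2: respondents 1 and 2 each appear twice
```

In this toy case each respondent is repeated once for every project in the same country, which would be consistent with the duplicated RESPNO values seen after the real merge.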

Here is an example dataset that mimics the data frame after the merge. It contains 250 unique project IDs (ID) and 46,860 unique respondent IDs (RESPNO). I use distinct() from the dplyr package to filter down to the unique RESPNO values.

library(dplyr)  # provides %>% and distinct()

set.seed(42)

# Number of rows
n_rows <- 386378

Full_dataset <- data.frame(
  "ID" = rep(1:250, length.out = n_rows, each = ceiling(n_rows/250)),
  "RESPNO" = rep(1:46860, length.out = n_rows),
  "Recipient" = sample(c("Angola", "Benin", "Peru", "UK", "South Africa", "Congo", "Mali", "India", "Greece"), n_rows, replace = TRUE),
  "Mitigation" = runif(n_rows, 0, 100),
  "Adaptation" = runif(n_rows, 0, 100),
  "Fossil_Fuel" = runif(n_rows, 0, 100)
)

# Keep the first row for each RESPNO
Full_dataset <- Full_dataset %>% distinct(RESPNO, .keep_all = TRUE)

I use the following dplyr code to check that I have 250 unique IDs:

result <- Full_dataset %>%
  group_by(ID) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  arrange(desc(count))
result

Yet when I use distinct() on the full dataset, I am left with only 31 project IDs even though I know I should have 250. I don't understand why this is happening or how to fix it.

Full_dataset <- Full_dataset %>% distinct(RESPNO, .keep_all = TRUE)

How can I use distinct() to remove duplicated respondents (RESPNO) while keeping the correct number of unique project IDs?
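A scaled-down sketch of the behaviour in question may help frame it. It assumes a hypothetical toy data frame (toy, with 4 projects and 6 respondents instead of 250 and 46,860) built the same way as the example above, and relies on distinct() keeping the first row it encounters for each value. Whether a respondent should be unique overall or only unique within each project is an assumption the data alone does not settle, so the second call is one possible reading rather than a confirmed fix:

```r
library(dplyr)

# Hypothetical toy data: project IDs in contiguous blocks, RESPNO cycling
toy <- data.frame(
  ID     = rep(1:4, each = 6),
  RESPNO = rep(1:6, length.out = 24)
)
n_distinct(toy$ID)        # 4 projects before de-duplicating

# distinct() keeps only the first row for each RESPNO; here those rows
# all sit in the first block, so only that block's ID survives
dedup <- toy %>% distinct(RESPNO, .keep_all = TRUE)
nrow(dedup)               # 6
n_distinct(dedup$ID)      # 1

# Assumption: one row per respondent *within* each project is acceptable
by_pair <- toy %>% distinct(ID, RESPNO, .keep_all = TRUE)
nrow(by_pair)             # 24
n_distinct(by_pair$ID)    # 4
```

In the full example the same pattern would apply: the first 46,860 rows span only the first 31 blocks of about 1,546 rows each, which matches the 31 project IDs left after distinct(RESPNO, .keep_all = TRUE).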

