Removing incomplete cases from the output of tidyr's gather() - r

https://stackoverflow.com//questions/25045301

21-12-2019

Question

I have some messy data in a data frame that looks like this.

The "team" column holds the names of some football teams. name1 to name3 are variables listing the different names used to refer to the team in the first column.

               team             name1        name2      name3
1      Loughborough      Loughborough                        
2        Luton Town        Luton Town        Luton           
3      Macclesfield      Macclesfield                        
4  Maidstone United  Maidstone United                        
5   Manchester City   Manchester City     Man City           
6 Manchester United Manchester United Newton Heath Man United
7    Mansfield Town    Mansfield Town    Mansfield           
8      Merthyr Town      Merthyr Town

My goal is to get the data into 2 columns holding the team-name1, team-name2 and team-name3 pairings. I only want to keep the pairings where there is actually data in name1, name2 or name3.

To do this, I am trying tidyr's gather():

temp <- dat %>% gather(key, value, 2:4) 
temp$key<-NULL
temp

This gives the following output:

                team             value
1       Loughborough      Loughborough
2         Luton Town        Luton Town
3       Macclesfield      Macclesfield
4   Maidstone United  Maidstone United
5    Manchester City   Manchester City
6  Manchester United Manchester United
7     Mansfield Town    Mansfield Town
8       Merthyr Town      Merthyr Town
9       Loughborough                  
10        Luton Town             Luton
11      Macclesfield                  
12  Maidstone United                  
13   Manchester City          Man City
14 Manchester United      Newton Heath
15    Mansfield Town         Mansfield
16      Merthyr Town                  
17      Loughborough                  
18        Luton Town                  
19      Macclesfield                  
20  Maidstone United                  
21   Manchester City                  
22 Manchester United        Man United
23    Mansfield Town                  
24      Merthyr Town

I tried to remove the incomplete cases (for example rows 20, 21, 23 and 24, but not 22) using:

temp[complete.cases(temp),]

This did not work, because the apparently empty observations in value contain the empty string "" rather than NA. I guess this is how gather() returns missing data? I also tried converting temp$value to a factor, but that did not work either.
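A minimal base R sketch of why complete.cases() keeps every row here. The data frame temp_small is a hypothetical miniature of the gathered output, not the original data; it only assumes that blanks are stored as "".

```r
# Hypothetical miniature of the gathered output: "" marks a blank name.
temp_small <- data.frame(
  team  = c("Luton Town", "Macclesfield", "Manchester United"),
  value = c("Luton", "", "Man United"),
  stringsAsFactors = FALSE
)

# An empty string is an ordinary character value, not a missing one.
is.na(temp_small$value)

# complete.cases() only looks for NA, so every row counts as complete.
complete.cases(temp_small)
kept <- temp_small[complete.cases(temp_small), ]
nrow(kept)

# Converting to a factor does not help: "" simply becomes one of the levels.
levels(factor(temp_small$value))
```

Nothing is dropped because no value is actually NA; the blanks have to be either filtered out directly or converted to NA first.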

I would like to hear how to get rid of the incomplete cases.

Sample data ...

dat<-structure(list(team = structure(1:8, .Label = c("Loughborough", 
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City", 
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"), 
    name1 = structure(1:8, .Label = c("Loughborough", "Luton Town", 
    "Macclesfield", "Maidstone United", "Manchester City", "Manchester United", 
    "Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L, 
    2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City", 
    "Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team", 
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")

Solution

You can also add filter (to remove the blanks) and select (to remove the key column) from the dplyr package and get everything in one go:

temp <- dat %>% 
  gather(key, value, 2:4) %>% 
  filter(value != "") %>%
  select(-key)

#                 team             value
# 1       Loughborough      Loughborough
# 2         Luton Town        Luton Town
# 3       Macclesfield      Macclesfield
# 4   Maidstone United  Maidstone United
# 5    Manchester City   Manchester City
# 6  Manchester United Manchester United
# 7     Mansfield Town    Mansfield Town
# 8       Merthyr Town      Merthyr Town
# 9         Luton Town             Luton
# 10   Manchester City          Man City
# 11 Manchester United      Newton Heath
# 12    Mansfield Town         Mansfield
# 13 Manchester United        Man United
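For readers on a newer tidyr, the same idea can be sketched with pivot_longer(), which supersedes gather(). This assumes tidyr 1.0.0 or later and is not part of the original answer; dat_chr is a hypothetical character-only subset of the sample data.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the sample data, stored as character columns.
dat_chr <- data.frame(
  team  = c("Luton Town", "Macclesfield", "Manchester United"),
  name1 = c("Luton Town", "Macclesfield", "Manchester United"),
  name2 = c("Luton", "", "Newton Heath"),
  name3 = c("", "", "Man United"),
  stringsAsFactors = FALSE
)

long <- dat_chr %>%
  # Stack name1..name3 into key/value pairs, one row per team-name pairing.
  pivot_longer(cols = name1:name3, names_to = "key", values_to = "value") %>%
  # Drop the pairings whose name is blank.
  filter(value != "") %>%
  # The key column is no longer needed.
  select(-key)

long
```

Note that pivot_longer() returns the rows grouped by team rather than by source column, so the row order differs from the gather() output above.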

Other suggestions

Are you looking for temp[temp$value != '', ]? gather() is not to blame for the empty strings: your original data already contained them. You can replace them with NA first, and then use the na.rm argument of gather():

dat[dat==''] <- NA
temp <- dat %>% gather(key, value, 2:4, na.rm=TRUE) 
temp$key<-NULL
temp
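The same replacement can also be applied after reshaping, which makes the complete.cases() attempt from the question work. This is a base R sketch on a hypothetical miniature data frame, long_small, that stands in for the gathered output.

```r
# Hypothetical miniature of the gathered output, with "" for blank names.
long_small <- data.frame(
  team  = c("Luton Town", "Macclesfield", "Manchester United"),
  value = c("Luton", "", "Man United"),
  stringsAsFactors = FALSE
)

# Turn the empty strings into real missing values.
long_small$value[long_small$value == ""] <- NA

# Now complete.cases() can see the blanks and drops those rows.
cleaned <- long_small[complete.cases(long_small), ]
cleaned
```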

A similar approach, but making use of na.omit:

dat %>% 
  gather(key, value, -team) %>% 
  select(-key) %>%
  mutate(value = ifelse(value == "", NA, value)) %>%
  na.omit %>%
  arrange(team)
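As a variation on the ifelse() step, dplyr's na_if() converts a given value to NA directly. The sketch below assumes the value column is character; long_demo is a hypothetical stand-in for the gathered data.

```r
library(dplyr)

# Hypothetical stand-in for the gathered data, with "" for blank names.
long_demo <- data.frame(
  team  = c("Luton Town", "Macclesfield", "Manchester United"),
  value = c("Luton", "", "Man United"),
  stringsAsFactors = FALSE
)

result <- long_demo %>%
  mutate(value = na_if(value, "")) %>%  # "" becomes NA, other values are kept
  na.omit() %>%                         # drop the rows that now contain NA
  arrange(team)

result
```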

Licensed under: CC-BY-SA with attribution
