RHadoop key with multiple components

https://stackoverflow.com/questions/17457645

02-06-2022

Question

I'm stuck trying to build a keyval pair when the key has multiple components.

Say every key consists of 3 string components, for example {"I" "like" "Lucy"} or {"You" "hate" "Jimmy"}.

The combination of these 3 strings forms a unique key, and what I want from the MapReduce job is the number of records for each key, e.g. for {"I" "like" "Lucy"} or {"You" "hate" "Jimmy"}.
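To make the goal concrete, here is a minimal local sketch in plain R (no Hadoop, no rmr2) of the count being asked for. It assumes the records arrive as rows of a data frame with hypothetical columns `who`, `how` and `whom`, and it emulates a word-count-style job: the "map" step emits each composite key with the value 1, and the "reduce" step sums the values per key. The `|` separator is an assumption and must not occur inside any component.

```r
# Hypothetical input: one record per row, three key components per record
records <- data.frame(
  who  = c("I", "You", "I"),
  how  = c("like", "hate", "like"),
  whom = c("Lucy", "Jimmy", "Lucy"),
  stringsAsFactors = FALSE
)

# "Map": build one composite string key per record, paired with the value 1
keys   <- paste(records$who, records$how, records$whom, sep = "|")
values <- rep(1L, length(keys))

# "Reduce": sum the values that share the same key
counts <- tapply(values, keys, sum)
print(counts)
```

In an actual rmr2 job the map function would emit these string keys (for example through `keyval()`), and Hadoop would do the grouping that `tapply()` performs here.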

The question is: what kind of structure should I use for a 3-string key?

If I use list as key:

LST1<-list(who="I", how="like", whom="Lucy")
LST2<-list(who="I", how="like", whom="Lucy")

LST1 and LST2 are supposed to represent the same key, but the problem is that they are different objects, so a list can't be used as a key.

If I use vector as key:

v1<-c("I","like","lucy")
v2<-c("I","like","Jimmy")

R compares the two vectors element by element and returns a vector of logical values, in this case {TRUE, TRUE, FALSE}, instead of a single TRUE or FALSE.
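This behaviour is easy to reproduce. A short base-R sketch contrasting the vectorised `==` with `identical()`, which compares the whole objects and returns a single logical:

```r
v1 <- c("I", "like", "lucy")
v2 <- c("I", "like", "Jimmy")

# `==` is vectorised: one logical per position
elementwise <- v1 == v2          # TRUE TRUE FALSE

# identical() compares the whole vectors and returns one TRUE/FALSE
same_key <- identical(v1, v2)    # FALSE

# all() collapses the element-wise result into one value,
# but is only meaningful when both vectors have the same length
all(v1 == v2)                    # FALSE
```

This only explains the comparison semantics; it does not by itself change how a MapReduce framework groups vector keys.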

Any suggestions? What kind of structure can I use? Or is there a trick to handle this?

I know how to deal with this in Java, but I need a solution in R. The 3-string case is just an example; the components can be of any type, such as numeric, string, char, etc.

Solution

How about concatenating the vector of strings into a single string and using that as the key?

For example,

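v1 <- c("I", "like", "lucy")
v2 <- c("I", "like", "Jimmy")
# `collapse` joins the elements of one vector into a single string;
# `sep` only separates multiple arguments, so paste(v1, sep = " ")
# would return the 3-element vector unchanged
s1 <- paste(v1, collapse = " ")   # "I like lucy"
s2 <- paste(v2, collapse = " ")   # "I like Jimmy"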
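The question also mentions components that are not all strings. A minimal sketch of the same idea under stated assumptions: the hypothetical helper `make_key()` coerces each single-valued component to character and joins them with a separator (a tab here) that is assumed never to appear in the data. Equal components then always produce equal key strings, whether they come from separate values or from a list such as LST1.

```r
# Hypothetical helper: collapse mixed-type key components into one string.
# Assumes each component is a single value and none contains the separator.
make_key <- function(..., sep = "\t") {
  parts <- list(...)
  paste(vapply(parts, as.character, character(1)), collapse = sep)
}

k1 <- make_key("I", "like", "Lucy")
k2 <- make_key("I", "like", "Lucy")
k3 <- make_key("You", 42L, 3.5, TRUE)

identical(k1, k2)   # TRUE: equal components give equal keys

# Keys built from list elements work the same way
LST1 <- list(who = "I", how = "like", whom = "Lucy")
k4 <- do.call(make_key, unname(LST1))

# Splitting the key recovers the components, now as character strings
parts3 <- strsplit(k3, "\t", fixed = TRUE)[[1]]
```

One caveat of this design: doubles are converted with limited precision by `as.character()`, so two very close numeric values could end up with the same key; if that matters, format numbers explicitly before building the key.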

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow