Numeric comparison operators and wildcards in Gremlin
Question
Is there a way to search a manual index in Neo4j using a numeric comparison operator (>=, <=, <, >, ...)? The Gremlin index examples I have found all search for one specific property value.
Say I have around 10M relationships of two types, both of which store a numeric value in a property called 'property': a double in the first type, an int in the second.
gremlin> g.e(123).getProperty('property')
==> 1.57479276459179
Now, if I knew the exact property value, which is a log-based p-value of type double, I could easily locate the matching edges with
gremlin> g.idx('index_e_ASSOC').get('property', 1.57479276459179)
==> e[2421730][31493-ASSOCIATION->53378]
==> e[4885094][53378-ASSOCIATION->31493]
==> e[866409][37891-ASSOCIATION->6292]
==> e[123][6292-ASSOCIATION->37891]
Instead I want to do a range search on 'property', for example to find all edges where 'property' >= 0 && 'property' <= 1.6. Is this something that can be done with Gremlin? The Gremlin users discussion group suggests that even a wildcard search against a fulltext Lucene index is a bit of a hack, and the Neo4j API doesn't help.
Edit: Found another question like this on Stack Overflow (titled "Range queries in Neo4j using Lucene query syntax"; new users can only post a maximum of two hyperlinks), which led me to the Neo4j documentation. I recreated the index using ValueContext for numeric values. Following an example from the neo4j discussion group (title: combine numericRange query with relationship query), I can run a query like
start a=node(123)
match a-[rel]-(b)
where type(rel) = "ASSOCIATION" AND rel.`property` > 1.0 AND rel.`property` < 2.0
RETURN b
LIMIT 20;
which uses the range search. What's the syntax for Gremlin? It should be something like
g.idx('index_e_ASSOC')[[property: Neo4jTokens.QUERY_HEADER + "[1.0 TO 2.0]"]].count()
This is syntactically correct, but count() returns 0 even though there are edges whose property falls within that range.
Solution
You can use the Gremlin filter step on all the edges, but this scans every edge in the graph instead of using the index:
g.E.filter{it.property >= 0 && it.property <= 1.6}
See https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps
If the index index_e_ASSOC contains only a subset of all the edges, you could use a wildcard query to pull just the indexed edges and then filter those, which narrows the set that has to be scanned:
start = g.idx('index_e_ASSOC')[['property': Neo4jTokens.QUERY_HEADER + "*"]]
start.filter{it.property >= 0 && it.property <= 1.6}
Note that Neo4jTokens.QUERY_HEADER resolves to "%query%", so you could also write it like this:
start = g.idx('index_e_ASSOC')[['property': "%query%" + "*"]]
start.filter{it.property >= 0 && it.property <= 1.6}
Other tips
Probably your best bet is to use the Neo4j API through Groovy, much like http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-send-an-arbitrary-groovy-script---lucene-sorting
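As a rough sketch of what that might look like from the Gremlin console: this assumes g is a Blueprints Neo4jGraph (so getRawGraph() hands back the underlying Neo4j database) and that 'property' was indexed numerically with ValueContext, as described in the question's edit. QueryContext.numericRange comes from Neo4j's Lucene index package. It is untested against the asker's data, so treat it as a starting point rather than a confirmed answer.

```groovy
import org.neo4j.index.lucene.QueryContext

// Assumption: g is a Neo4jGraph; getRawGraph() exposes the native Neo4j API.
rawIndex = g.getRawGraph().index().forRelationships('index_e_ASSOC')

// Assumption: values were added with ValueContext(...).indexNumeric().
// The bounds are passed as doubles (note the d suffix) to match the indexed type.
hits = rawIndex.query(QueryContext.numericRange('property', 0.0d, 1.6d))

try {
    // Each hit is a native Neo4j Relationship, not a Blueprints edge.
    for (rel in hits) {
        println "${rel.getId()}: ${rel.getProperty('property')}"
    }
} finally {
    // IndexHits hold Lucene resources, so close them when done.
    hits.close()
}
```

The d suffix matters in Groovy because a bare 1.6 is a BigDecimal, and as far as I know numericRange picks the numeric type of the query from its arguments. For the relationship type that stores an int, the bounds would presumably need to be ints instead.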