Numeric comparison operators and wildcards in Gremlin

https://stackoverflow.com/questions/11351844

19-06-2021

Question

Is there a way to search a manual index in Neo4j using a numeric comparison operator (>=, <=, <, >, ...)? The Gremlin index examples all seem to search for one specific property value.

Say I have around 10M relationships of two types, both with a numeric value in a property called 'property': a double in the first type, an int in the second.

gremlin> g.e(123).getProperty('property')
==> 1.57479276459179

Now, if I knew the exact property value, which is a log-based p-value of type double, I could easily locate the matching edges with

gremlin> g.idx('index_e_ASSOC').get('property', 1.57479276459179)
==> e[2421730][31493-ASSOCIATION->53378]
==> e[4885094][53378-ASSOCIATION->31493]
==> e[866409][37891-ASSOCIATION->6292]
==> e[123][6292-ASSOCIATION->37891]

Instead I want to do a range search on 'property', for example to find all edges with 'property' >= 0 && 'property' <= 1.6. Can this be done with Gremlin? The Gremlin users discussion group suggests that even a wildcard search against a fulltext Lucene index is a bit of a hack, and the Neo4j API doesn't help.

Edit: I found another question like this on Stack Overflow (titled "Range queries in Neo4j using Lucene query syntax"; new users can only post a maximum of two hyperlinks), which led to the Neo4j documentation. I recreated the index using ValueContext for numeric values. Following an example found in the neo4j discussion group (title: combine numericRange query with relationship query), I can run a query like

start a=node(123)
match a-[rel]-(b)
where type(rel) = "ASSOCIATION" AND rel.`property` > 1.0 AND rel.`property` < 2.0
RETURN b
LIMIT 20;

which uses the range search. What's the syntax for Gremlin? It should be something like

g.idx('index_e_ASSOC')[[property: Neo4jTokens.QUERY_HEADER + "[1.0 TO 2.0]"]].count()

This is syntactically correct, but count() yields 0 even though there are edges whose 'property' lies within that range.
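One plausible explanation, stated as an assumption and not verified against this index: a range written as Lucene query text, such as [1.0 TO 2.0], is compared term by term as strings, while values indexed through ValueContext are stored in a numeric encoding, so the two may never match. The sketch below is plain Java with no Neo4j or Lucene involved (the class and method names are made up for illustration); it only shows how a string range and a numeric range can disagree on the same values.

```java
class StringVersusNumericRange {

    // Inclusive range test on the text form, the way a range over string terms behaves.
    static boolean inStringRange(String value, String lo, String hi) {
        return value.compareTo(lo) >= 0 && value.compareTo(hi) <= 0;
    }

    // Inclusive range test on the parsed number.
    static boolean inNumericRange(String value, double lo, double hi) {
        double v = Double.parseDouble(value);
        return v >= lo && v <= hi;
    }

    public static void main(String[] args) {
        // 1.57479276459179 comes from the question; the other values are invented.
        String[] values = {"1.57479276459179", "10.5", "1", "2.0"};
        for (String v : values) {
            System.out.println(v
                    + "  string range: " + inStringRange(v, "1.0", "2.0")
                    + "  numeric range: " + inNumericRange(v, 1.0, 2.0));
        }
    }
}
```

Here "10.5" falls inside the string range but outside the numeric one, and "1" does the opposite, which is why a numeric range usually has to be issued as a numeric query and not as query text.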

Solution

You can use the Gremlin filter step on all the edges, but this does a table scan:

g.E.filter{it.property >= 0 && it.property <= 1.6}

See https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps
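To make the cost of the table scan concrete, here is a minimal sketch in plain Java. It is not Gremlin or the Neo4j API: a TreeMap stands in for a sorted numeric index, and all class and method names are hypothetical. The scan tests every edge, as filter{...} does, while the sorted structure jumps straight to the requested range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class ScanVersusRangeLookup {

    // Full scan: test every edge's value, which is what filter{...} over g.E amounts to.
    static List<Long> scan(Map<Long, Double> propertyByEdgeId, double lo, double hi) {
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<Long, Double> e : propertyByEdgeId.entrySet()) {
            double v = e.getValue();
            if (v >= lo && v <= hi) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Stand-in for a numeric index: property value -> ids of the edges holding it, kept sorted.
    static NavigableMap<Double, List<Long>> buildIndex(Map<Long, Double> propertyByEdgeId) {
        NavigableMap<Double, List<Long>> index = new TreeMap<>();
        for (Map.Entry<Long, Double> e : propertyByEdgeId.entrySet()) {
            index.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return index;
    }

    // Range lookup: only the entries between lo and hi (inclusive) are visited.
    static List<Long> rangeLookup(NavigableMap<Double, List<Long>> index, double lo, double hi) {
        List<Long> hits = new ArrayList<>();
        for (List<Long> ids : index.subMap(lo, true, hi, true).values()) {
            hits.addAll(ids);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Edge 123 and its value come from the question; the other entries are invented.
        Map<Long, Double> propertyByEdgeId = new HashMap<>();
        propertyByEdgeId.put(123L, 1.57479276459179);
        propertyByEdgeId.put(7L, 2.5);
        propertyByEdgeId.put(8L, -0.3);

        System.out.println(scan(propertyByEdgeId, 0.0, 1.6));
        System.out.println(rangeLookup(buildIndex(propertyByEdgeId), 0.0, 1.6));
    }
}
```

Both calls return the same edges; the difference is only in how many entries are touched, which matters at the scale of 10M relationships.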

If the index index_e_ASSOC contains only a subset of all the edges, you could use a wildcard query to pull just the indexed edges and then filter those:

start = g.idx('index_e_ASSOC')[['property': Neo4jTokens.QUERY_HEADER + "*"]]
start.filter{it.property >= 0 && it.property <= 1.6}

Note that Neo4jTokens.QUERY_HEADER resolves to "%query%" so you could also write it like this:

start = g.idx('index_e_ASSOC')[['property': "%query%" + "*"]]
start.filter{it.property >= 0 && it.property <= 1.6}

Other tips

Your best bet is probably to use the Neo4j API through Groovy, much like the example at http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-send-an-arbitrary-groovy-script---lucene-sorting.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow