Question

I'm writing some code (just for fun so far) in Python that will store some data on every point in a 3d space. I'm basically after a 3d matrix object that stores arbitary objects that will allow me to do some advanced selections, like:

  • Get the point where x=1,y=2,z=3.
  • Getting all points where y=2.
  • Getting all points within 3 units of position x=1,y=2,z=3.
  • Getting all points where point.getType() == "Foo"

In all of the above, I'd need to end up with some sort of output that would give me the original position in the space, and the data stored at that point.

Apparently numpy can do what I want, but it seems highly optimised for scientific computing and working out how to get the data like I want above has so far eluded me.

Is there a better alternative or should I return to banging my head on the numpy wall? :)

EDIT: some more info the first three answers made me realise I should include: I'm not worried about performance, this is purely a proof-of-concept where I'd prefer clean code to good performance. I will also have data for every point in the given 3d space, so I guess a Spare Matrix is bad?

Was it helpful?

Solution

Here's another common approach

class Point( object ):
    def __init__( self, x, y, z, data ):
        self.x, self.y, self.z = x, y, z
        self.data = data
    def distFrom( self, x, y, z )
        return math.sqrt( (self.x-x)**2 + (self.y-y)**2 + (self.z-z)**2 )

database = [ Point(x,y,z,data), Point(x,y,z,data), ... ]

Let's look at your use cases.

Get the point where x=1,y=2,z=3.

[ p for p in database if (p.x, p.y, p.z) == ( 1, 2, 3 ) ]

Getting all points where y=2.

[ p for p in database if p.y == 2 ]

Getting all points within 3 units of position x=1,y=2,z=3.

[ p for p in database if p.distFrom( 1, 2, 3 ) <= 3.0 ]

Getting all points where point.getType() == "Foo"

[ p for p in database if type(p.data) == Foo ]

OTHER TIPS

Well ... If you expect to really fill that space, then you're probably best off with a densely packed matrix-like structure, basically voxels.

If you don't expect to fill it, look into something a bit more optimized. I would start by looking at octrees, which are often used for things like this.

One advantage of numpy is that it is blazingly fast, e.g. calculating the pagerank of a 8000x8000 adjacency matrix takes milliseconds. Even though numpy.ndarray will only accept numbers, you can store number/id-object mappings in an external hash-table i.e. dictionary (which in again is a highly optimized datastructure).

The slicing would be as easy as list slicing in python:

>>> from numpy import arange

>>> the_matrix = arange(64).reshape(4, 4, 4)
>>> print the_matrix[0][1][2]
    6
>>> print the_matrix[0][1]
    [4 5 6 7]
>>> print the_matrix[0]
    [[ 0  1  2  3]
    [ 4  5  6  7]
    [ 8  9 10 11]
    [12 13 14 15]]

If you wrap some of your desired functions (distances) around some core matrix and a id-object-mapping hash, you could have your application running within a short period of time.

Good luck!

Here's an approach that may work.

Each point is a 4-tuple (x,y,z,data), and your database looks like this:

database = [ (x,y,z,data), (x,y,z,data), ... ]

Let's look at your use cases.

Get the point where x=1,y=2,z=3.

[ (x,y,z,data) for x,y,z,data in database if (x,y,z) == (1,2,3) ]

Getting all points where y=2.

[ (x,y,z,data) for x,y,z,data in database if y == 2 ]

Getting all points within 3 units of position x=1,y=2,z=3.

[ (x,y,z,data) for x,y,z,data in database if math.sqrt((x-1)**2+(y-2)**2+(z-3)**2)<=3.0 ]

Getting all points where point.getType() == "Foo"

[ (x,y,z,data) for x,y,z,data in database if type(data) == Foo ]

You can do the first 2 queries with slicing in numpy :

a = numpy.zeros((4, 4, 4))
a[1, 2, 3] # The point at x=1,y=2,z=3
a[:, 2, :] # All points where y=2

For the third one if you mean "getting all points within a sphere of radius 3 and centered at x=1,y=2,z=3", you will have to write a custom function to do that ; if you want a cube you can proceed with slicing, e.g.:

a[1:3, 1:3, 1:3] # The 2x2x2 array sliced from the center of 'a'

For the fourth query if the only data stored in your array is the cells type, you could encode it as integers:

FOO = 1
BAR = 2
a = numpy.zeros((4, 4, 4), dtype="i")
a[1, 2, 3] = FOO
a[3, 2, 1] = BAR
def filter(a, type_code):
    coords = []
    for z in range(4):
        for y in range(4):
            for x in range(4):
                if a[x, y, z] == type_code:
                    coords.append((x, y, z))
    return coords
filter(a, FOO) # => [(1, 2, 3)]

numpy looks like the good tool for doing what you want, as the arrays will be smaller in memory, easily accessible in C (or even better, cython !) and extended slicing syntax will avoid you writing code.

Using a dictionary with x,y,z tuples as keys is another solution, if you want a relatively simple solution with the standard library.

import math

#use indexing to get point at (1,2,3): points[(1,2,3)]
get_points(points, x=None, y=None, x=None):
    """returns dict of all points with given x,y and/or z values.  Call using keywords (eg get_points(points, x=3)"""
    filteredPoints = points.items()
    if x:
        filteredPoints = [p for p in filteredPoints if p[0][0] == x]
    if y:
        filteredPoints = [p for p in filteredPoints if p[0][1] == y]
    if z:
        filteredPoints = [p for p in filteredPoints if p[0][0] == x]
    return dict(filteredPoints)

get_point_with_type(points, type_):
    """returns dict of points with point.getType() == type_"""
    filteredPoints = points.items()
    return dict((position,data) for position,data in points.iterItems() if data.getType == type_)

get_points_in_radius(points,x,y,z,r):
    """Returns a dict of points within radius r of point (x,y,z)"""
    def get_dist(x1,y1,z1,x2,y2,z3):
        return math.sqrt((x1-x2)*(x1-x2)+(y1-y2)*(y1-y2)+(z1-z2)*(z1-z2))
    return dict((position,data) for position,data in points.iterItems() if get_dist(x,y,z, *position) <= r))

And due to python referencing, you can alter "points" in the returned dictionaries, and have the original points change as well (I think).

When to use Binary Space Partitioning, Quadtree, Octree?

3d array imo is worthless. Especially if your world is dynamic. You should decide between BSP, Quadtree or Octtree. BSP would do just fine. Since your world is in 3d, you need planes when splitting the bsp, not lines.

Cheers !

Edit

I will also have data for every point in the given 3d space, so I guess a Spare Matrix is bad?

I guess this is alright if always know how large you data set is and that it never changes, i.e. if more points are added to it that in turn are out of bound. You would have to resize the 3d array in that case.

It depends upon the precise configuration of your system, but from the example you give you are using integers and discrete points, so it would probably be appropriate to consider Sparse Matrix data structures.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top