What you describe looks very similar to http://en.wikipedia.org/wiki/Single-linkage_clustering. This article includes a pointer to SLINK, which is optimal in the sense that it receives a matrix of distances between points of size N^2 and works in time O(N^2). Another attraction of the method is that, if you can compute distances on demand, you only need O(N) storage.
However, if your distances have some structure behind them, so that not all patterns of distance are possible, you can do better. This problem can also be stated as finding a minimum spanning tree on N points. If the points live in a euclidean space - so the distance between X and Y is ((X0-Y0)^2 + (X1-Y1)^2 + ...) you can take account of this to reduce the cost further - see http://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree.
You should be aware that there are a large number of different sorts of clustering, with different statistical pluses and minuses, and different amounts of time required for their construction. If you are not actually sure which one you want you will find an introduction at http://en.wikipedia.org/wiki/Category:Data_clustering_algorithms