Question

I am working on a regression problem, where the goal is to estimate historic traffic volumes throughout a transportation network. I have traffic counters at 100 locations, so a model can learn the relation between traffic volumes and a number of explanatory variables (e.g., speeds, road characteristics, weather). Afterwards, I can apply the model to estimate historic traffic volumes in places where I don't have traffic counters.

My neural network works reasonably well, but I am wondering if there are machine learning models that could explicitly account for the topology of my road network and the fact that traffic on neighboring road links is highly correlated. I could add "traffic volume at the closest traffic counter" as an input variable to my ANN, but I am wondering if there is a more intelligent approach.

In this regard, I came across Bayesian networks, which can account for network topology and correlation. However, they seem applicable to cases where we have sensors at 100 locations and want to predict the traffic state (at those same 100 locations) at a future time point. In contrast, I have measurements at 100 locations and am looking to estimate traffic at a different location at the same time point.

Any suggestion is much appreciated!


Solution

Coming from the related field of measuring and predicting network security, I'd strongly suggest trying time-series forecasting. I assume your data is time-stamped (network congestion values sampled at some interval; if not, skip to the second idea).

1st idea: I'd borrow from time series the concept of flattening the 100 measurements into one datum. So instead of:

[t1+delta1, location1, measurement1]

[t1+delta2, location2, measurement2]

[t1+delta3, location3, measurement3]

fold into:

[t1-bucketed, loc1, meas1, loc2, meas2, loc3, meas3, ...]

This would help the model "grasp" the relation between the different measurements, with emphasis on the time axis.
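As a minimal sketch of this folding step, assuming a pandas long-format table with hypothetical column names `time`, `location`, and `volume`:

```python
import pandas as pd

# Hypothetical long-format readings: one row per (time, location) pair
readings = pd.DataFrame({
    "time":     ["08:00", "08:00", "08:00", "08:15", "08:15", "08:15"],
    "location": ["loc1",  "loc2",  "loc3",  "loc1",  "loc2",  "loc3"],
    "volume":   [120,      85,     200,     130,      90,     210],
})

# Fold the per-location rows into one wide row per time bucket:
# each row becomes [time, loc1-volume, loc2-volume, loc3-volume]
wide = readings.pivot(index="time", columns="location", values="volume")
```

Each row of `wide` is now one datum covering all counters at that time bucket, which is the shape most time-series models expect.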

2nd idea: Flatten into each measurement row the topologically closest measurements (or even all 100 neighbors):

[meas, topol-1-meas, topol-1-dist, topol-1-other, topol-2-meas, topol-2-dist, topol-2-other, ...]

This would help the model "grasp" the relation between a specific measurement and its neighborhood measurements, with emphasis on the topology features of each measurement.
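A sketch of building such a row with networkx, where "closest" means fewest road links away; the toy graph, `volumes` dict, and `neighbor_features` helper are all hypothetical:

```python
import networkx as nx

# Hypothetical road graph: nodes are counter locations, edges are road links
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])

volumes = {"A": 120, "B": 85, "C": 200, "D": 150}  # measured volumes

def neighbor_features(node, k=2):
    """Flatten the k topologically closest counters into one feature row:
    [own-meas, nbr1-meas, nbr1-dist, nbr2-meas, nbr2-dist, ...]"""
    dists = nx.single_source_shortest_path_length(G, node)
    # sort the other counters by link distance, keep the k nearest
    nearest = sorted((d, n) for n, d in dists.items() if n != node)[:k]
    row = [volumes[node]]
    for d, n in nearest:
        row += [volumes[n], d]
    return row

print(neighbor_features("A"))  # [120, 85, 1, 150, 1]
```

The same row could carry extra per-neighbor attributes (the "topol-n-other" slots above), e.g. road class or speed limit of the connecting link.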

Please let us know if that helped :)

OTHER TIPS

A simple approach would be to use k-nearest neighbors, where the distance metric in your case is "the number of road links away." The technique is described in chapters 2 and 13 of The Elements of Statistical Learning. Basically, it takes the average traffic volume of the k nearest traffic counters. There is essentially no training involved beyond cross-validation to find the optimal k. The trade-off is that it is computationally heavy at query time (when you want to make a prediction).
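Sketched with networkx as the graph store instead of a graph database; the road graph, counter locations, and `knn_estimate` helper are illustrative assumptions:

```python
import networkx as nx

# Hypothetical road graph; traffic counters sit at a subset of nodes
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
counter_volumes = {"A": 120, "C": 200, "E": 150}  # measured nodes only

def knn_estimate(target, k=2):
    """Estimate the volume at an unmeasured node as the mean over its
    k nearest counters, with distance = number of road links away."""
    dists = nx.single_source_shortest_path_length(G, target)
    nearest = sorted((dists[n], n) for n in counter_volumes if n != target)[:k]
    return sum(counter_volumes[n] for _, n in nearest) / k

# B is 1 link from A and C, so with k=2: (120 + 200) / 2
print(knn_estimate("B"))  # 160.0
```

The per-query shortest-path search is where the query-time cost mentioned above comes from; precomputing pairwise link distances trades memory for speed.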

Side note: If you aren't already, I highly recommend using a database like Neo4j to make querying the link distance much easier. You could probably code the KNN "model" yourself using just 3-4 lines of code if you use a graph database.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange