Finding the optimal root

https://softwareengineering.stackexchange.com/questions/289852

09-10-2020
|

题

I am trying to solve a quora hackathon challenge question. The question descrpition is as follows :

For the purposes of this problem, suppose Quora has N questions, and question i (1≤i≤N) takes Ti time to read. There exists exactly one path from any question to another, and related questions form undirected pairs between themselves. In other words, the graph of related questions is a tree.

Each time Steve reads a question, he will see a list of related questions and navigate to one that he hasn't read yet at random. Steve will stop reading once there are no unread related questions.

Which question should we show first to Steve so that we minimize his total expected reading time? It is guaranteed that there is one unique question that is optimal.

I figured out that the question simply boils down to finding the root from which the average of root to leaves cost is minimum. I originally thought of finding the average cost by taking every node in the tree as a root and then calcualting the average cost for each root. But this would take O(n^2). Since n=10^5, I don't think this will cut it. Any suggestion/hints on solving this in O(n).

The question can be foudn here : https://www.hackerrank.com/contests/quora-haqathon/challenges/relatedquestions

解决方案

Your solution seems to try to tackle average reading time of nodes and all paths from them. This will, of course, work, provided it's properly implemented.

The problem with this approach is that the quantities you calculate are not reusable. The average reading time of a node depends on how you got to it. Hence the quadratic behaviour of your naive solution. The first step to try to optimize it is to try and find reusable results of calculations.

One such potential result is the cost of taking a link or what's the expected reading time after having gone from question A to question B? . What's the difference between a 'node' and a 'link'? A 'link' has a both a destination and an origin. Since there is exactly 1 path between any 2 nodes and you cannot repeat nodes in your reading, once you take a link, any node that's reachable through the origin is now inaccessible. Hence it does not matter what path you took before the link. Hence this quantity is reusable.

Next we need to answer 2 questions:

Does it help?
How quickly can we calculate this quantity?

First question is easy - if we have the costs of taking links, then it's extremely easy to find the average reading times when given a start node - it's just that node time + average of outgoing links. Hence given these we can get the answer in a single linear lookup.

Second question is slightly more difficult - how do we calculate the costs of taking links in a (relatively) efficient manner?

It's fairly simple if you think about it.

The cost of taking a leaf node is just the expected read time of the leaf.
The cost of taking a link to a node is the node reading time plus the average across all outgoing links from that node except the reverse link.

That's a method. How good is it? It's actually linear! If we take the given tree (and take any node as root), we can work out the costs of taking downward links in a simple linear sweep (since we cannot take an upward link after taking a downward one). After we have all costs of downward links, we can make a second linear sweep from top to bottom to calculate upward links.

Hence given that you can get connections of a node in constant time, the entire thing is just 3 linear sweeps!

Since there is apparently some confusion as to how this works, a step by step of the example input on the website:

We have 5 nodes connected in a chain with weights 2, 2, 1, 2, 2:

O - O - O - O - O
2   2   1   2   2

Take, say, the second node as the root for the procedure. A second variant on the right will have the third node as the root for comparison:

  O            O
 / \          / \
O   O        O   O
    |        |   |
    O        O   O
    |        
    O

Compute the cost of taking downward links (on the right of the link):

   O              O
 /2  \5         /4  \4
O     O        O     O
      |4       |2    |2
      O        O     O
      |2
      O

Now compute upward links (left of link):

   O              O
7/2 4\5        5/4 5\4
O     O        O     O
     5|4      7|2   7|2
      O        O     O
     7|2
      O

Note that the results are the same! (As in link costs, not the trees).

Now generate average reading time from this data (node time + left link + right link):

Node 1: 2 + 0   + 7   = 9  
Node 2: 2 + 2/2 + 5/2 = 5.5  
Node 3: 1 + 4/2 + 4/2 = 5  
Node 4: 2 + 5/2 + 2/2 = 5.5  
Node 5: 2 + 7   + 0   = 9

Hence the node with minimum expected reading time is Node 3!

许可以下： CC-BY-SA 和归因

不隶属于 softwareengineering.stackexchange