How do I create a list of timedeltas in python?
문제
I've been searching through this website and have seen multiple references to time deltas, but haven't quite found what I'm looking for.
Basically, I have a list of messages that are received by a comms server and I want to calcuate the latency time between each message out and in. It looks like this:
161336.934072 - TMsg out: [O] enter order. RefID [123] OrdID [4568]
161336.934159 - TMsg in: [A] accepted. ordID [456] RefNumber [123]
Mixed in with these messages are other messages as well, however, I only want to capture the difference between the Out messages and in messages with the same RefID.
So far, to sort out from the main log which messages are Tmessages I've been doing this, but it's really inefficient. I don't need to be making new files everytime.:
big_file = open('C:/Users/kdalton/Documents/Minicomm.txt', 'r')
small_file1 = open('small_file1.txt', 'w')
for line in big_file:
if 'T' in line: small_file1.write(line)
big_file.close()
small_file1.close()
How do I calculate the time deltas between the two messages and sort out these messages from the main log?
해결책
This generator function returns a tuple containing the id and the difference in timestamps between the out and in messages. (If you want to do something more complex with the time difference, check out datetime.timedelta
). Note that this assumes out messages always appear before in messages.
def get_time_deltas(infile):
entries = (line.split() for line in open(INFILE, "r"))
ts = {}
for e in entries:
if len(e) == 11 and " ".join(e[2:5]) == "TMsg out: [O]":
ts[e[8]] = e[0] # store timestamp for id
elif len(e) == 10 and " ".join(e[2:5]) == "TMsg in: [A]":
in_ts, ref_id = e[0], e[9]
# Raises KeyError if out msg not seen yet. Handle if required.
out_ts = ts.pop(ref_id) # get ts for this id
yield (ref_id[1:-1], float(in_ts) - float(out_ts))
You can now get a list out of it:
>>> INFILE = 'C:/Users/kdalton/Documents/Minicomm.txt'
>>> list(get_time_deltas(INFILE))
[('123', 8.699999307282269e-05), ('1233', 0.00028700000257231295)]
Or write it to a file:
>>> with open("out.txt", "w") as outfile:
... for id, td in get_time_deltas(INFILE):
... outfile.write("Msg %s took %f seconds\n", (id, td))
Or chain it into a more complex workflow.
Update:
(in response to looking at the actual data)
Try this instead:
def get_time_deltas(infile):
entries = (line.split() for line in open(INFILE, "r"))
ts = {}
for e in entries:
if " ".join(e[2:5]) == "OuchMsg out: [O]":
ts[e[8]] = e[0] # store timestamp for id
elif " ".join(e[2:5]) == "OuchMsg in: [A]":
in_ts, ref_id = e[0], e[7]
out_ts = ts.pop(ref_id, None) # get ts for this id
# TODO: handle case where out_ts = None (no id found)
yield (ref_id[1:-1], float(in_ts) - float(out_ts))
INFILE = 'C:/Users/kdalton/Documents/Minicomm.txt'
print list(get_time_deltas(INFILE))
Changes in this version:
- the number of fields is not as stated in the sample input posted in question. Removed check based on entry number
ordID
forin
messages is the one that matchesrefID
in theout
messages- used
OuchMsg
instead ofTMsg
Update 2
To get an average of the deltas:
deltas = [d for _, d in get_time_deltas(INFILE)]
average = sum(deltas) / len(deltas)
Or, if you have previously generated a list containing all the data, we can reuse it instead of reparsing the file:
data = list(get_time_deltas(INFILE))
# .. use data for something some operation ...
# calculate average using the list
average = sum(d for _, d in data) / len(data)
다른 팁
First of all, don't write out the raw log lines. Secondly use a dict.
tdeltas = {} # this is an empty dict
if "T" in line:
get Refid number
if Refid in tedeltas:
tdeltas[Refid] = timestamp - tdeltas[Refid]
else:
tdeltas[Refid] = timestamp
Then at the end, convert to a list and print
allRefids = sorted(tdeltas.keys())
for k in allRefids:
print k+": "+tdeltas[k]+" secs"
You may want to convert your dates into time
objects from the datetime
module and then use timedelta objects to store in the dict. Probably not worth it for this task but it is worthwhile to learn how to use the datetime module.
Also, I have glossed over parsing the Refid from the input string, and the possible issue of converting the times from string to float and back.
Actually, just storing deltas will cause confusion if you ever have a Refid that is not accepted. If I were doing this for real, I would store a tuple in the value with the start datetime, end datetime and the delta. For a new record it would look like this: (161336.934072,0,0)
and after the acceptance was detected it would look like this: (161336.934072,161336.934159,.000087)
. If the logging activity was continuous, say a global ecommerce site running 24x7, then I would periodically scan the dict for any entries with a non-zero delta, report them, and delete them. Then I would take the remaining values, sort them on the start datetime, then report and delete any where the start datetime is too old because that indicates failed transactions that will never complete.
Also, in a real ecommerce site, I might consider using something like Redis or Memcache as an external dict so that reporting and maintenance can be done by another server/application.