Can we use HDFS and big data Analytics for processing huge log files being processed through some application on some central server?

datascience.stackexchange https://datascience.stackexchange.com/questions/6139

  •  16-10-2019

Question

Detailed Question Explanation:

Suppose our application X processes huge logs (sizes varying from MBs to GBs) and produces insight results from these logs (not social-data logs or security logs).

These logs come in a format, say log.y, with different varieties, and we use C++ as the engine to process them. (It generates important insights about the data, but the logs need to be processed by our application X only, and we don't want to change the core processing of application X.)

If this processing happens on a single server, it under- or over-utilizes resources (that much I already know).

If we use cloud computing for this processing, we get that processing power with optimal usage.

How could big data analytics help in this particular sort of usage? Any help or suggestions would be deeply appreciated.

Was it helpful?

Solution

If the size of your logs is still growing, then a distributed data system is definitely the right way to go. I have been running Mesos in production for almost a year now, and it solved exactly the problem you describe: processing on a single server under- or over-utilizing resources.

I would probably look into a stack like this:

  • Mesos as your fault-tolerant, elastic cluster manager
  • Spark or a Hadoop-based solution for log processing, storing the output in a distributed file system such as HDFS
  • Have your application consume the data stored in HDFS as the final step
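To make the second step concrete, here is a minimal sketch of the kind of map/reduce-style aggregation that Spark (or a Hadoop job) would distribute across the cluster, written in plain Python so the idea stands alone. The log format, field positions, and sample lines are assumptions for illustration, not your application's actual log.y format:

```python
from collections import Counter

# Hypothetical sample of log lines; the "timestamp LEVEL message" layout
# is an assumption made for this sketch.
log_lines = [
    "2019-10-16 10:00:01 ERROR payment timeout",
    "2019-10-16 10:00:02 INFO request served",
    "2019-10-16 10:00:03 WARN slow query",
    "2019-10-16 10:00:04 ERROR payment timeout",
]

def severity(line: str) -> str:
    # Map step: extract the log level (third whitespace-separated field
    # in this assumed format).
    return line.split()[2]

# Reduce step: count occurrences of each level. In Spark this would be
# a map followed by reduceByKey, partitioned across HDFS blocks.
counts = Counter(severity(line) for line in log_lines)
print(dict(counts))
```

In a real deployment, each worker would run this map/reduce logic over its local HDFS blocks of the log files, and the combined counts (or richer insights) would be written back to HDFS for application X to consume. Since your core engine is C++, you could also keep it unchanged and invoke it per-partition via Hadoop Streaming or Spark's pipe mechanism, treating it as a black box.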
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange