문제

I'm creating a multi-threaded application in Linux. here is the scenario:

Suppose I am having x instance of a class BloomFilter and I have some y GB of data(greater than memory available). I need to test membership for this y GB of data in each of the bloom filter instance. It is pretty much clear that parallel programming will help to speed up the task moreover since I am only reading the data so it can be shared across all processes or threads.

Now I am confused about which one to use Cilk, Cilk++ or OpenMP(which one is better). Also I am confused about which one to go for Multithreading or Multiprocessing

도움이 되었습니까?

해결책

Cilk Plus is the current implementation of Cilk by Intel. They both are multithreaded environment, i.e., multiple threads are spawned during execution.

If you are new to parallel programming probably OpenMP is better for you since it allows an easier parallelization of already developed sequential code. Do you already have a sequential version of your code?

OpenMP uses pragma to instruct the compiler which portions of the code has to run in parallel. If I understand your problem correctly you probably need something like this:

   #pragma omp parallel for firstprivate(array_of_bloom_filters)
   for i in DATA:
      check(i,array_of_bloom_filters);

the instances of different bloom filters are replicated in every thread in order to avoid contention while data is shared among thread.

update: The paper actually consider an application which is very unbalanced, i.e., different taks (allocated on different thread) may incur in very different workload. Citing the paper that you mentioned "a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies". Consider that in order to balance computation among threads it is necessary to reduce the task size and therefore increase the time spent in synchronizations. In other words, good load balancing comes always at a cost. The description of your problem is not very detailed but it seems to me that the problem you have is quite balanced. If this is not the case then go for Cilk, its work stealing approach its probably the best solution for unbalanced workloads.

다른 팁

At the time this was posted, Intel was putting a lot of effort into boosting Cilk(tm) Plus; more recently, some effort has been diverted toward OpenMP 4.0. It's difficult in general to contrast OpenMP with Cilk(tm) Plus.
If it's not possible to distribute work evenly across threads, one would likely set schedule(runtime) in an OpenMP version, and then at run time try various values of environment variable, such as OMP_SCHEDULE=guided, OMP_SCHEDULE=dynamic,2 or OMP_SCHEDULE=auto. Those are the closest OpenMP analogies to the way Cilk(tm) Plus work stealing works. Some sparse matrix functions in Intel MKL library do actually scan the job first and determine how much to allocate to each thread so as to balance work. For this method to be useful, the time spent in serial scanning and allocating has to be of lower order than the time spent in parallel work. Work-stealing, or dynamic scheduling, may lose much of the potential advantage of OpenMP in promoting cache locality by pinning threads with cache locality e.g. by OMP_PROC_BIND=close. Poor cache locality becomes a bigger issue on a NUMA architecture where it may lead to significant time spent on remote memory access. Both OpenMP and Cilk(tm) Plus have facilities for switching between serial and parallel execution.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top