С#:Реализация решета Аткина

https://stackoverflow.com/questions/1569127

21-09-2019
|

Вопрос

Мне было интересно, есть ли у кого-нибудь здесь хорошая реализация Решета Аткина, которой они хотели бы поделиться.

Я пытаюсь реализовать это, но не могу полностью осмыслить это.Вот что у меня есть на данный момент.

public class Atkin : IEnumerable<ulong>
{
    private readonly List<ulong> primes;
    private readonly ulong limit;

    public Atkin(ulong limit)
    {
        this.limit = limit;
        primes = new List<ulong>();
    }

    private void FindPrimes()
    {
        var isPrime = new bool[limit + 1];
        var sqrt = Math.Sqrt(limit);

        for (ulong x = 1; x <= sqrt; x++)
            for (ulong y = 1; y <= sqrt; y++)
            {
                var n = 4*x*x + y*y;
                if (n <= limit && (n % 12 == 1 || n % 12 == 5))
                    isPrime[n] ^= true;

                n = 3*x*x + y*y;
                if (n <= limit && n % 12 == 7)
                    isPrime[n] ^= true;

                n = 3*x*x - y*y;
                if (x > y && n <= limit && n % 12 == 11)
                    isPrime[n] ^= true;
            }

        for (ulong n = 5; n <= sqrt; n++)
            if (isPrime[n])
                for (ulong k = n*n; k <= limit; k *= k)
                    isPrime[k] = false;

        primes.Add(2);
        primes.Add(3);
        for (ulong n = 5; n <= limit; n++)
            if (isPrime[n])
                primes.Add(n);
    }


    public IEnumerator<ulong> GetEnumerator()
    {
        if (!primes.Any())
            FindPrimes();


        foreach (var p in primes)
            yield return p;
    }


    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Я почти только что попытался «перевести» псевдокод, указанный в Википедия, но он работает неправильно.Так что либо я что-то не так понял, либо просто сделал что-то не так.Или, скорее всего, и то, и другое...

Имейте список первых 500 простых чисел, которые я использую в качестве теста, и моя реализация терпит неудачу под номером 40 (или 41?).

Значения различаются по индексу [40]
Ожидал:179
Но было:175

Сможете ли вы найти мою ошибку, есть ли у вас реализация, которой вы могли бы поделиться, или и то, и другое?

Точный тест, который я использую, выглядит так:

public abstract class AtkinTests
{
    [Test]
    public void GetEnumerator_FirstFiveHundredNumbers_AreCorrect()
    {
        var sequence = new Atkin(2000000);
        var actual = sequence.Take(500).ToArray();
        var expected = First500;

        CollectionAssert.AreEqual(expected, actual);
    }

    private static readonly ulong[] First500 = new ulong[]
        {
            2, 3, 5, 7, 11, 13, 17, ...
        };
}

Решение

Этот код:

for (ulong k = n*n; k <= limit; k *= k)
  isPrime[k] = false;

не похоже на точный перевод этого псевдокода:

is_prime(k) ← false, k ∈ {n², 2n², 3n², ..., limit}

Ваш код выглядит так, как будто он будет работать n * n, n ^ 4, n ^ 8 и т. д.то естькаждый раз возводить в квадрат вместо того, чтобы каждый раз прибавлять n-квадрат.Попробуй это:

ulong nSquared = n * n;
for (ulong k = nSquared; k <= limit; k += nSquared)
  isPrime[k] = false;

Другие советы

А последний ответ от Аарона Мугатройда из переведенного исходного кода Python для Решета Аткина (SoA) не так уж и плохо, но его можно улучшить в нескольких отношениях, поскольку в нем отсутствуют некоторые важные оптимизации, а именно:

В его ответе не используется полная по модулю 60 оригинальная версия Sieve Аткина и Бернштейна, а скорее слегка улучшенный вариант псевдокода из статьи в Википедии, поэтому используется коэффициент 0,36 от числового диапазона сита в сочетании с операциями переключения/отбраковки;мой код ниже использует достаточно эффективный псевдокод нестраничного сегмента согласно мои комментарии в ответе, комментирующем Решето Аткина который использует коэффициент, примерно в 0,26 раз превышающий числовой диапазон, чтобы уменьшить объем выполняемой работы примерно в две седьмых раза.
Его код уменьшает размер буфера, используя только представления нечетных чисел, тогда как мой код дополнительно упаковывает бит, чтобы исключить любое представление чисел, делящихся на три и пять, а также чисел, делящихся на два, подразумеваемых «только шансы»;это снижает требования к памяти еще почти вдвое (до 8/15) и помогает лучше использовать кэш-память ЦП для дальнейшего увеличения скорости за счет уменьшения среднего времени доступа к памяти.
Мой код подсчитывает количество простых чисел, используя метод быстрого подсчета всплывающих таблиц Look Up Table (LUT), который почти не требует времени для подсчета по сравнению с примерно одной секундой при использовании побитового метода, который он использует;однако в этом примере кода даже это небольшое время вычитается из временной части кода.
Наконец, мой код оптимизирует операции манипулирования битами для минимального количества кода на внутренний цикл.Например, он не использует непрерывный сдвиг вправо на единицу для создания индекса нечетного представления и фактически вообще немного сдвигает бит, записывая все внутренние циклы как постоянные по модулю (равные позиции бита) операции.Кроме того, переведенный код Аарона весьма неэффективен в операциях, так как, например, при отбраковке простых квадратов он добавляет квадрат простого числа к индексу, а затем проверяет наличие нечетного результата, а не просто добавляет два раза больше квадрата, чтобы не требовать проверки. ;тогда даже проверка становится лишней, сдвигая число вправо на единицу (делая на два) перед выполнением операции отсечения во внутреннем цикле, как это происходит для всех циклов.Этот неэффективный код не будет сильно влиять на время выполнения для больших диапазонов с использованием метода «большого сито-буферного массива», поскольку большая часть времени на операцию используется для доступа к оперативной памяти (около 37 тактовых циклов ЦП или более для диапазон в один миллиард), но время выполнения будет намного медленнее, чем необходимо для меньших диапазонов, которые помещаются в кэш ЦП;другими словами, он устанавливает слишком высокий нижний предел скорости выполнения каждой операции.

Код выглядит следующим образом:

//Sieve of Atkin based on full non page segmented modulo 60 implementation...

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace NonPagedSoA {
  //implements the non-paged Sieve of Atkin (full modulo 60 version)...
  class SoA : IEnumerable<ulong> {
    private ushort[] buf = null;
    private long cnt = 0;
    private long opcnt = 0;
    private static byte[] modPRMS = { 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59, 61 };
    private static ushort[] modLUT;
    private static byte[] cntLUT;
    //initialize the private LUT's...
    static SoA() {
      modLUT = new ushort[60];
      for (int i = 0, m = 0; i < modLUT.Length; ++i) {
        if ((i & 1) != 0 || (i + 7) % 3 == 0 || (i + 7) % 5 == 0) modLUT[i] = 0;
        else modLUT[i] = (ushort)(1 << (m++));
      }
      cntLUT = new byte[65536];
      for (int i = 0; i < cntLUT.Length; ++i) {
        var c = 0;
        for (int j = i; j > 0; j >>= 1) c += j & 1;
        cntLUT[i] = (byte)c;
      }
    }
    //initialization and all the work producing the prime bit array done in the constructor...
    public SoA(ulong range) {
      this.opcnt = 0;
      if (range < 7) {
        if (range > 1) {
          cnt = 1;
          if (range > 2) this.cnt += (long)(range - 1) / 2;
        }
        this.buf = new ushort[0];
      }
      else {
        this.cnt = 3;
        var nrng = range - 7; var lmtw = nrng / 60;
        //initialize sufficient wheels to non-prime
        this.buf = new ushort[lmtw + 1];

        //Put in candidate primes:
        //for the 4 * x ^ 2 + y ^ 2 quadratic solution toggles - all x odd y...
        ulong n = 6; // equivalent to 13 - 7 = 6...
        for (uint x = 1, y = 3; n <= nrng; n += (x << 3) + 4, ++x, y = 1) {
          var cb = n; if (x <= 1) n -= 8; //cancel the effect of skipping the first one...
          for (uint i = 0; i < 15 && cb <= range; cb += (y << 2) + 4, y += 2, ++i) {
            var cbd = cb / 60; var cm = modLUT[cb % 60];
            if (cm != 0)
              for (uint c = (uint)cbd, my = y + 15; c < buf.Length; c += my, my += 30) {
                buf[c] ^= cm; // ++this.opcnt;
              }
          }
        }
        //for the 3 * x ^ 2 + y ^ 2 quadratic solution toggles - x odd y even...
        n = 0; // equivalent to 7 - 7 = 0...
        for (uint x = 1, y = 2; n <= nrng; n += ((x + x + x) << 2) + 12, x += 2, y = 2) {
          var cb = n;
          for (var i = 0; i < 15 && cb <= range; cb += (y << 2) + 4, y += 2, ++i) {
            var cbd = cb / 60; var cm = modLUT[cb % 60];
            if (cm != 0)
              for (uint c = (uint)cbd, my = y + 15; c < buf.Length; c += my, my += 30) {
                buf[c] ^= cm; // ++this.opcnt;
              }
          }
        }
        //for the 3 * x ^ 2 - y ^ 2 quadratic solution toggles all x and opposite y = x - 1...
        n = 4; // equivalent to 11 - 7 = 4...
        for (uint x = 2, y = x - 1; n <= nrng; n += (ulong)(x << 2) + 4, y = x, ++x) {
          var cb = n; int i = 0;
          for ( ; y > 1 && i < 15 && cb <= nrng; cb += (ulong)(y << 2) - 4, y -= 2, ++i) {
            var cbd = cb / 60; var cm = modLUT[cb % 60];
            if (cm != 0) {
              uint c = (uint)cbd, my = y;
              for ( ; my >= 30 && c < buf.Length; c += my - 15, my -= 30) {
                buf[c] ^= cm; // ++this.opcnt;
              }
              if (my > 0 && c < buf.Length) { buf[c] ^= cm; /* ++this.opcnt; */ }
            }
          }
          if (y == 1 && i < 15) {
            var cbd = cb / 60; var cm = modLUT[cb % 60];
            if ((cm & 0x4822) != 0 && cbd < (ulong)buf.Length) { buf[cbd] ^= cm; /* ++this.opcnt; */ }
          }
        }

        //Eliminate squares of base primes, only for those on the wheel:
        for (uint i = 0, w = 0, pd = 0, pn = 0, msk = 1; w < this.buf.Length ; ++i) {
          uint p = pd + modPRMS[pn];
          ulong sqr = (ulong)p * (ulong)p; //to handle ranges above UInt32.MaxValue
          if (sqr > range) break;
          if ((this.buf[w] & msk) != 0) { //found base prime, square free it...
            ulong s = sqr - 7;
            for (int j = 0; s <= nrng && j < modPRMS.Length; s = sqr * modPRMS[j] - 7, ++j) {
              var cd = s / 60; var cm = (ushort)(modLUT[s % 60] ^ 0xFFFF);
              //may need ulong loop index for ranges larger than two billion
              //but buf length only good to about 2^31 * 60 = 120 million anyway,
              //even with large array setting and half that with 32-bit...
              for (ulong c = cd; c < (ulong)this.buf.Length; c += sqr) {
                this.buf[c] &= cm; // ++this.opcnt;
              }
            }
          }
          if (msk >= 0x8000) { msk = 1; pn = 0; ++w; pd += 60; }
          else { msk <<= 1; ++pn; }
        }

        //clear any overflow primes in the excess space in the last wheel/word:
        var ndx = nrng % 60; //clear any primes beyond the range
        for (; modLUT[ndx] == 0; --ndx) ;
        this.buf[lmtw] &= (ushort)((modLUT[ndx] << 1) - 1);
      }
    }

    //uses a fast pop count Look Up Table to return the total number of primes...
    public long Count {
      get {
        long cnt = this.cnt;
        for (int i = 0; i < this.buf.Length; ++i) cnt += cntLUT[this.buf[i]];
        return cnt;
      }
    }

    //returns the number of toggle/cull operations used to sieve the prime bit array...
    public long Ops {
      get {
        return this.opcnt;
      }
    }

    //generate the enumeration of primes...
    public IEnumerator<ulong> GetEnumerator() {
      yield return 2; yield return 3; yield return 5;
      ulong pd = 0;
      for (uint i = 0, w = 0, pn = 0, msk = 1; w < this.buf.Length; ++i) {
        if ((this.buf[w] & msk) != 0) //found a prime bit...
          yield return pd + modPRMS[pn]; //add it to the list
        if (msk >= 0x8000) { msk = 1; pn = 0; ++w; pd += 60; }
        else { msk <<= 1; ++pn; }
      }
    }

    //required for the above enumeration...
    IEnumerator IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
    }
  }

  class Program {
    static void Main(string[] args) {
      Console.WriteLine("This program calculates primes by a simple full version of the Sieve of Atkin.\r\n");

      const ulong n = 1000000000;

      var elpsd = -DateTime.Now.Ticks;

      var gen = new SoA(n);

      elpsd += DateTime.Now.Ticks;

      Console.WriteLine("{0} primes found to {1} using {2} operations in {3} milliseconds.", gen.Count, n, gen.Ops, elpsd / 10000);

      //Output prime list for testing...
      //Console.WriteLine();
      //foreach (var p in gen) {
      //  Console.Write(p + " ");
      //}
      //Console.WriteLine();

//Test options showing what one can do with the enumeration, although more slowly...
//      Console.WriteLine("\r\nThere are {0} primes with the last one {1} and the sum {2}.",gen.Count(),gen.Last(),gen.Sum(x => (long)x));

      Console.Write("\r\nPress any key to exit:");
      Console.ReadKey(true);
      Console.WriteLine();
    }
  }
}

Этот код работает примерно в два раза быстрее, чем код Аарона (около 2,7 секунды при использовании 64-битного или 32-битного режима на i7-2700K (3,5 ГГц) с буфером около 16,5 мегабайт и около 0,258 миллиарда комбинированных операций переключения/отбраковки простых квадратов. (что можно показать, раскомментировав операторы «++this.opcnt») для диапазона сита в один миллиард, по сравнению с 5,4/6,2 секунды (32-бит/64-бит) для его кода без времени счета и почти вдвое больше использования памяти с использованием около 0,359 миллиарда комбинированных операций переключения/отбраковки для просеивания до одного миллиарда.

Хотя это быстрее, чем его самая оптимизированная наивная реализация невыгружаемого Решета Эратосфена (SoE), использующая только шансы, это не делает Решето Аткина быстрее Решета Эратосфена., Если применить к SoE методы, аналогичные использованным в приведенной выше реализации SoA, плюс использовать максимальную факторизацию колес, SoE будет примерно с такой же скоростью.

Анализ: Хотя количество операций для полностью оптимизированного SoE примерно такое же, как количество операций для SoA для диапазона сита в один миллиард, основным узким местом для этих невыгружаемых реализаций является доступ к памяти, когда размер ситового буфера превышает процессор. размеры кэша (32 килобайта кэша L1 при доступе за один такт, 256 килобайт кэша L2 при времени доступа около четырех тактов и 8 мегабайт кэша L3 при времени доступа около 20 тактов для моего i7), после чего доступ к памяти может превышать сто тактов циклы.

Теперь оба имеют примерно восьмикратное улучшение скорости доступа к памяти, когда алгоритмы адаптируются к сегментации страниц, чтобы можно было фильтровать диапазоны, которые в противном случае не поместились бы в доступную память.Тем не менее, SoE продолжает превосходить SoA, поскольку диапазон сита начинает становиться очень большим из-за трудностей с реализацией части алгоритма «без простых квадратов» из-за огромных успехов в отбраковке сканирований, которые быстро растут во многие сотни раз. размер страничных буферов.Кроме того, что, возможно, более серьезно, требуется очень много памяти и/или вычислительных ресурсов для вычисления новой начальной точки для каждого значения «x» относительно значения «y» в самом низком представлении каждого страничного буфера для дальнейшего довольно долгого времени. большая потеря эффективности страничного SoA по сравнению с SoE по мере увеличения диапазона.

РЕДАКТИРОВАТЬ_ДОБАВИТЬ: SoE только для шансов, используемый Аароном Мургатройдом, использует около 1,026 миллиарда операций отбора для диапазона сита в один миллиард, то есть примерно в четыре раза больше операций, чем SoA, и, следовательно, должен работать примерно в четыре раза медленнее, но SoA даже в том виде, в котором он реализован здесь, имеет более сложный внутренний цикл и особенно из-за гораздо более высокой доли отбракованных SoE только для шансов имеют гораздо более короткий шаг при сканировании отбраковки, чем шаги SoA, наивный SoE только для шансов имеет гораздо лучшее среднее время доступа к памяти, несмотря на решетчатого буфера, значительно превышающего размеры кэша ЦП (лучшее использование ассоциативности кэша).Это объясняет, почему вышеупомянутый SoA лишь примерно в два раза быстрее, чем SoE, основанный только на шансах, хотя теоретически кажется, что он выполняет только четверть работы.

Если бы кто-то использовал аналогичный алгоритм с использованием внутренних циклов с постоянным по модулю, как и в приведенном выше SoA, и реализовал бы ту же факторизацию колеса 2/3/5, SoE сократил бы количество операций отбраковки примерно до 0,405 миллиарда операций, то есть только примерно на 50% больше. операций, чем SoA, и теоретически будет работать немного медленнее, чем SoA, но может работать примерно с той же скоростью, поскольку шаги отбраковки все еще немного меньше, чем для SoA, в среднем для этого «наивного» использования большого буфера памяти.Увеличение факторизации колеса до колеса 2/3/5/7 означает, что операции отсеивания SoE уменьшаются примерно до 0,314 для диапазона отбраковки в один миллиард, и может заставить эту версию SoE работать примерно с той же скоростью для этого алгоритма.

Дальнейшее использование факторизации колеса может быть сделано путем предварительного отбора сито (копирование по шаблону) для простых множителей 2/3/5/7/11/13/17/19 практически без затрат времени выполнения, чтобы уменьшить общее количество операций отбраковки примерно до 0,251 миллиарда для диапазона сита в один миллиард, и SoE будет работать быстрее или примерно с той же скоростью, чем даже эта оптимизированная версия SoA, даже для этих версий с большим буфером памяти, при этом SoE все еще имеет много меньшая сложность кода, чем указано выше.

Таким образом, можно видеть, что количество операций для SoE может быть значительно сокращено по сравнению с простой версией или версией факторизации только для четных шансов или 2/3/5 колесной факторизации, так что количество операций будет примерно таким же, как для SoA, в то время как в то же время время на операцию на самом деле может быть меньше как за счет менее сложных внутренних циклов, так и за счет более эффективного доступа к памяти. END_EDIT_ADD

EDIT_ADD2: Здесь я добавляю код для SoE, используя аналогичный метод постоянного положения по модулю/биту для самых внутренних циклов, как и для SoA выше, в соответствии с псевдокодом. далее ответ, как указано выше.Код немного менее сложен, чем приведенный выше SoA, несмотря на то, что применяется высокая факторизация колес и предварительная отбраковка, так что общее количество операций отбраковки на самом деле меньше, чем комбинированные операции переключения/отбраковки для SoA до диапазона просеивания. около двух миллиардов.Код следующий:

EDIT_FINAL Исправлен код ниже и комментарии, связанные с ним. END_EDIT_FINAL

//Sieve of Eratosthenes based on maximum wheel factorization and pre-culling implementation...

using System;
using System.Collections;
using System.Collections.Generic;

namespace NonPagedSoE {
  //implements the non-paged Sieve of Eratosthenes (full modulo 210 version with preculling)...
  class SoE : IEnumerable<ulong> {
    private ushort[] buf = null;
    private long cnt = 0;
    private long opcnt = 0;
    private static byte[] basePRMS = { 2, 3, 5, 7, 11, 13, 17, 19 };
    private static byte[] modPRMS = { 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, //positions + 23
                                      97, 101, 103, 107, 109, 113, 121, 127, 131, 137, 139, 143, 149, 151, 157, 163,
                                      167, 169, 173, 179, 181 ,187 ,191 ,193, 197, 199, 209, 211, 221, 223, 227, 229 };
    private static byte[] gapsPRMS = { 6, 2, 6, 4, 2, 4, 6, 6, 2, 6, 4, 2, 6, 4, 6, 8,
                                       4, 2, 4, 2, 4, 8, 6, 4, 6, 2, 4, 6, 2, 6, 6, 4,
                                       2, 4, 6, 2, 6, 4, 2, 4, 2, 10, 2, 10, 2, 4, 2, 4 };
    private static ulong[] modLUT;
    private static byte[] cntLUT;
    //initialize the private LUT's...
    static SoE() {
      modLUT = new ulong[210];
      for (int i = 0, m = 0; i < modLUT.Length; ++i) {
        if ((i & 1) != 0 || (i + 23) % 3 == 0 || (i + 23) % 5 == 0 || (i + 23) % 7 == 0) modLUT[i] = 0;
        else modLUT[i] = 1UL << (m++);
      }
      cntLUT = new byte[65536];
      for (int i = 0; i < cntLUT.Length; ++i) {
        var c = 0;
        for (int j = i ^ 0xFFFF; j > 0; j >>= 1) c += j & 1; //reverse logic; 0 is prime; 1 is composite
        cntLUT[i] = (byte)c;
      }
    }
    //initialization and all the work producing the prime bit array done in the constructor...
    public SoE(ulong range) {
      this.opcnt = 0;
      if (range < 23) {
        if (range > 1) {
          for (int i = 0; i < modPRMS.Length; ++i) if (modPRMS[i] <= range) this.cnt++; else break;
        }
        this.buf = new ushort[0];
      }
      else {
        this.cnt = 8;
        var nrng = range - 23; var lmtw = nrng / 210; var lmtwt3 = lmtw * 3; 
        //initialize sufficient wheels to prime
        this.buf = new ushort[lmtwt3 + 3]; //initial state of all zero's is all potential prime.

        //initialize array to account for preculling the primes of 11, 13, 17, and 19;
        //(2, 3, 5, and 7 already eliminated by the bit packing to residues).
        for (int pn = modPRMS.Length - 4; pn < modPRMS.Length; ++pn) {
          uint p = modPRMS[pn] - 210u; ulong pt3 = p * 3;
          ulong s = p * p - 23;
          ulong xrng = Math.Min(9699709, nrng); // only do for the repeating master pattern size
          ulong nwrds = (ulong)Math.Min(138567, this.buf.Length);
          for (int j = 0; s <= xrng && j < modPRMS.Length; s += p * gapsPRMS[(pn + j++) % 48]) {
            var sm = modLUT[s % 210];
            var si = (sm < (1UL << 16)) ? 0UL : ((sm < (1UL << 32)) ? 1UL : 2UL);
            var cd = s / 210 * 3 + si; var cm = (ushort)(sm >> (int)(si << 4));
            for (ulong c = cd; c < nwrds; c += pt3) { //tight culling loop for size of master pattern
              this.buf[c] |= cm; // ++this.opcnt; //reverse logic; mark composites with ones.
            }
          }
        }
        //Now copy the master pattern so it repeats across the main buffer, allow for overflow...
        for (long i = 138567; i < this.buf.Length; i += 138567)
          if (i + 138567 <= this.buf.Length)
            Array.Copy(this.buf, 0, this.buf, i, 138567);
          else Array.Copy(this.buf, 0, this.buf, i, this.buf.Length - i);

        //Eliminate all composites which are factors of base primes, only for those on the wheel:
        for (uint i = 0, w = 0, wi = 0, pd = 0, pn = 0, msk = 1; w < this.buf.Length; ++i) {
          uint p = pd + modPRMS[pn];
          ulong sqr = (ulong)p * (ulong)p;
          if (sqr > range) break;
          if ((this.buf[w] & msk) == 0) { //found base prime, mark its composites...
            ulong s = sqr - 23; ulong pt3 = p * 3;
            for (int j = 0; s <= nrng && j < modPRMS.Length; s += p * gapsPRMS[(pn + j++) % 48]) {
              var sm = modLUT[s % 210];
              var si = (sm < (1UL << 16)) ? 0UL : ((sm < (1UL << 32)) ? 1UL : 2UL);
              var cd = s / 210 * 3 + si; var cm = (ushort)(sm >> (int)(si << 4));
              for (ulong c = cd; c < (ulong)this.buf.Length; c += pt3) { //tight culling loop
                this.buf[c] |= cm; // ++this.opcnt; //reverse logic; mark composites with ones.
              }
            }
          }
          ++pn;
          if (msk >= 0x8000) { msk = 1; ++w; ++wi; if (wi == 3) { wi = 0; pn = 0; pd += 210; } }
          else msk <<= 1;
        }

        //clear any overflow primes in the excess space in the last wheel/word:
        var ndx = nrng % 210; //clear any primes beyond the range
        for (; modLUT[ndx] == 0; --ndx) ;
        var cmsk = (~(modLUT[ndx] - 1)) << 1; //force all bits above to be composite ones.
        this.buf[lmtwt3++] |= (ushort)cmsk;
        this.buf[lmtwt3++] |= (ushort)(cmsk >> 16);
        this.buf[lmtwt3] |= (ushort)(cmsk >> 32);
      }
    }

    //uses a fast pop count Look Up Table to return the total number of primes...
    public long Count {
      get {
        long cnt = this.cnt;
        for (int i = 0; i < this.buf.Length; ++i) cnt += cntLUT[this.buf[i]];
        return cnt;
      }
    }

    //returns the number of cull operations used to sieve the prime bit array...
    public long Ops {
      get {
        return this.opcnt;
      }
    }

    //generate the enumeration of primes...
    public IEnumerator<ulong> GetEnumerator() {
      yield return 2; yield return 3; yield return 5; yield return 7;
      yield return 11; yield return 13; yield return 17; yield return 19;
      ulong pd = 0;
      for (uint i = 0, w = 0, wi = 0, pn = 0, msk = 1; w < this.buf.Length; ++i) {
        if ((this.buf[w] & msk) == 0) //found a prime bit...
          yield return pd + modPRMS[pn];
        ++pn;
        if (msk >= 0x8000) { msk = 1; ++w; ++wi; if (wi == 3) { wi = 0; pn = 0; pd += 210; } }
        else msk <<= 1;
      }
    }

    //required for the above enumeration...
    IEnumerator IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
    }
  }

  class Program {
    static void Main(string[] args) {
      Console.WriteLine("This program calculates primes by a simple maximually wheel factorized version of the Sieve of Eratosthenes.\r\n");

      const ulong n = 1000000000;

      var elpsd = -DateTime.Now.Ticks;

      var gen = new SoE(n);

      elpsd += DateTime.Now.Ticks;

      Console.WriteLine("{0} primes found to {1} using {2} operations in {3} milliseconds.", gen.Count, n, gen.Ops, elpsd / 10000);

//      Console.WriteLine();
//      foreach (var p in gen) {
//        Console.Write(p + " ");
//      }
//      Console.WriteLine();

      //      Console.WriteLine("\r\nThere are {0} primes with the last one {1} and the sum {2}.",gen.Count(),gen.Last(),gen.Sum(x => (long)x));

      Console.Write("\r\nPress any key to exit:");
      Console.ReadKey(true);
      Console.WriteLine();
    }
  }
}

Этот код фактически работает на несколько процентов быстрее, чем приведенный выше SoA, как и должно быть, поскольку операций немного меньше, а основным узким местом для этого большого размера массива в диапазоне в миллиард является время доступа к памяти, составляющее от 40 до более 100 тактовых циклов ЦП. в зависимости от характеристик процессора и памяти;это означает, что оптимизация кода (кроме уменьшения общего количества операций) неэффективна, поскольку большая часть времени тратится на ожидание доступа к памяти.В любом случае, использование огромного буфера памяти — не самый эффективный способ просеивания больших диапазонов: улучшение SoE примерно в восемь раз при использовании сегментации страниц с той же максимальной факторизацией колеса (что также открывает путь для многопроцессорность).

Именно при реализации сегментации страниц и многопроцессорной обработки SoA действительно недостаточен для диапазонов, намного превышающих четыре миллиарда, по сравнению с SoE, поскольку любые преимущества из-за уменьшенной асимптотической сложности SoA быстро съедаются факторами накладных расходов на обработку страниц, связанными с бесплатная обработка простых квадратов и вычисление гораздо большего количества начальных адресов страниц;альтернативно, это можно преодолеть, храня маркеры в оперативной памяти, что приводит к огромным затратам на потребление памяти и дальнейшей неэффективности доступа к этим структурам хранилища маркеров. END_EDIT_ADD2

Короче говоря, SoA на самом деле не является практичным ситом по сравнению с SoE, полностью факторизованным по колесам, поскольку, как только выигрыш в асимптотической сложности начинает приближать его по производительности к полностью оптимизированному SoE, он начинает терять эффективность из-за подробности практической реализации, касающиеся относительного времени доступа к памяти и сложностей сегментации страниц, а также, как правило, более сложных и трудных для написания.На мой взгляд, по сравнению с SoE, это скорее интересная интеллектуальная концепция и умственное упражнение, чем практическое сито.

Когда-нибудь я адаптирую эти методы к многопоточному сегментированному решету Эратосфена, чтобы оно работало на C# примерно так же быстро, как "primegen" реализация SoA Аткина и Бернстайна на C, и выбью его из воды для больших диапазонов. более четырех миллиардов даже в однопоточном режиме, с дополнительным увеличением скорости примерно до четырех при многопоточности на моем i7 (восемь ядер, включая Hyper Threading).

Вот другая реализация.Он использует BitArray для экономии памяти.А Параллельно.Для требуется .NET Framework 4.

static List<int> FindPrimesBySieveOfAtkins(int max)
{
//  var isPrime = new BitArray((int)max+1, false); 
//  Can't use BitArray because of threading issues.
    var isPrime = new bool[max + 1];
    var sqrt = (int)Math.Sqrt(max);

    Parallel.For(1, sqrt, x =>
    {
        var xx = x * x;
        for (int y = 1; y <= sqrt; y++)
        {
            var yy = y * y;
            var n = 4 * xx + yy;
            if (n <= max && (n % 12 == 1 || n % 12 == 5))
                isPrime[n] ^= true;

            n = 3 * xx + yy;
            if (n <= max && n % 12 == 7)
                isPrime[n] ^= true;

            n = 3 * xx - yy;
            if (x > y && n <= max && n % 12 == 11)
                isPrime[n] ^= true;
        }
    });

    var primes = new List<int>() { 2, 3 };
    for (int n = 5; n <= sqrt; n++)
    {
        if (isPrime[n])
        {
            primes.Add(n);
            int nn = n * n;
            for (int k = nn; k <= max; k += nn)
                isPrime[k] = false;
        }
    }

    for (int n = sqrt + 1; n <= max; n++)
        if (isPrime[n])
            primes.Add(n);

    return primes;
}

Вот более быстрая реализация решета Аткина, я украл алгоритм из этого скрипта Python здесь (я не беру на себя ответственность за этот алгоритм):

http://programmingpraxis.com/2010/02/19/sieve-of-atkin-improved/

using System;
using System.Collections;
using System.Collections.Generic;

namespace PrimeGenerator
{
    // The block element type for the bit array, 
    // use any unsigned value. WARNING: UInt64 is 
    // slower even on x64 architectures.
    using BitArrayType = System.UInt32;

    // This should never be any bigger than 256 bits - leave as is.
    using BitsPerBlockType = System.Byte;

    // The prime data type, this can be any unsigned value, the limit
    // of this type determines the limit of Prime value that can be
    // found. WARNING: UInt64 is slower even on x64 architectures.
    using PrimeType = System.Int32;

    /// <summary>
    /// Calculates prime number using the Sieve of Eratosthenes method.
    /// </summary>
    /// <example>
    /// <code>
    ///     var lpPrimes = new Eratosthenes(1e7);
    ///     foreach (UInt32 luiPrime in lpPrimes)
    ///         Console.WriteLine(luiPrime);
    /// </example>
    public class Atkin : IEnumerable<PrimeType>
    {
        #region Constants

        /// <summary>
        /// Constant for number of bits per block, calculated based on size of BitArrayType.
        /// </summary>
        const BitsPerBlockType cbBitsPerBlock = sizeof(BitArrayType) * 8;

        #endregion

        #region Protected Locals

        /// <summary>
        /// The limit for the maximum prime value to find.
        /// </summary>
        protected readonly PrimeType mpLimit;

        /// <summary>
        /// The number of primes calculated or null if not calculated yet.
        /// </summary>
        protected PrimeType? mpCount = null;

        /// <summary>
        /// The current bit array where a set bit means
        /// the odd value at that location has been determined
        /// to not be prime.
        /// </summary>
        protected BitArrayType[] mbaOddPrime;

        #endregion

        #region Initialisation

        /// <summary>
        /// Create Sieve of Atkin generator.
        /// </summary>
        /// <param name="limit">The limit for the maximum prime value to find.</param>
        public Atkin(PrimeType limit)
        {
            // Check limit range
            if (limit > PrimeType.MaxValue - (PrimeType)Math.Sqrt(PrimeType.MaxValue))
                throw new ArgumentOutOfRangeException();

            mpLimit = limit;

            FindPrimes();
        }

        #endregion

        #region Private Methods

        /// <summary>
        /// Finds the prime number within range.
        /// </summary>
        private unsafe void FindPrimes()
        {
            // Allocate bit array.
            mbaOddPrime = new BitArrayType[(((mpLimit >> 1) + 1) / cbBitsPerBlock) + 1];

            PrimeType lpYLimit, lpN, lpXX3, lpXX4, lpDXX, lpDN, lpDXX4, lpXX, lpX, lpYY, lpMinY, lpS, lpK;

            fixed (BitArrayType* lpbOddPrime = &mbaOddPrime[0])
            {
                // n = 3x^2 + y^2 section
                lpXX3 = 3;
                for (lpDXX = 0; lpDXX < 12 * SQRT((mpLimit - 1) / 3); lpDXX += 24)
                {
                    lpXX3 += lpDXX;
                    lpYLimit = (12 * SQRT(mpLimit - lpXX3)) - 36;
                    lpN = lpXX3 + 16;

                    for (lpDN = -12; lpDN < lpYLimit + 1; lpDN += 72)
                    {
                        lpN += lpDN;
                        lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= 
                            (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock));
                    }

                    lpN = lpXX3 + 4;
                    for (lpDN = 12; lpDN < lpYLimit + 1; lpDN += 72)
                    {
                        lpN += lpDN;
                        lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= 
                            (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock));
                    }
                }

                //    # n = 4x^2 + y^2 section
                lpXX4 = 0;
                for (lpDXX4 = 4; lpDXX4 < 8 * SQRT((mpLimit - 1) / 4) + 4; lpDXX4 += 8)
                {
                    lpXX4 += lpDXX4;
                    lpN = lpXX4 + 1;

                    if ((lpXX4 % 3) != 0)
                    {
                        for (lpDN = 0; lpDN < (4 * SQRT(mpLimit - lpXX4)) - 3; lpDN += 8)
                        {
                            lpN += lpDN;
                            lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= 
                                (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock));
                        }
                    }
                    else
                    {
                        lpYLimit = (12 * SQRT(mpLimit - lpXX4)) - 36;
                        lpN = lpXX4 + 25;

                        for (lpDN = -24; lpDN < lpYLimit + 1; lpDN += 72)
                        {
                            lpN += lpDN;
                            lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= 
                                (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock));
                        }

                        lpN = lpXX4 + 1;
                        for (lpDN = 24; lpDN < lpYLimit + 1; lpDN += 72)
                        {
                            lpN += lpDN;
                            lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= 
                                (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock));
                        }
                    }
                }

                //    # n = 3x^2 - y^2 section
                lpXX = 1;
                for (lpX = 3; lpX < SQRT(mpLimit / 2) + 1; lpX += 2)
                {
                    lpXX += 4 * lpX - 4;
                    lpN = 3 * lpXX;

                    if (lpN > mpLimit)
                    {
                        lpMinY = ((SQRT(lpN - mpLimit) >> 2) << 2);
                        lpYY = lpMinY * lpMinY;
                        lpN -= lpYY;
                        lpS = 4 * lpMinY + 4;
                    }
                    else
                        lpS = 4;

                    for (lpDN = lpS; lpDN < 4 * lpX; lpDN += 8)
                    {
                        lpN -= lpDN;
                        if (lpN <= mpLimit && lpN % 12 == 11)
                            lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= 
                                (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock));
                    }
                }

                // xx = 0
                lpXX = 0;
                for (lpX = 2; lpX < SQRT(mpLimit / 2) + 1; lpX += 2)
                {
                    lpXX += 4*lpX - 4;
                    lpN = 3*lpXX;

                    if (lpN > mpLimit)
                    {
                        lpMinY = ((SQRT(lpN - mpLimit) >> 2) << 2) - 1;
                        lpYY = lpMinY * lpMinY;
                        lpN -= lpYY;
                        lpS = 4*lpMinY + 4;
                    }
                    else
                    {
                        lpN -= 1;
                        lpS = 0;
                    }

                    for (lpDN = lpS; lpDN < 4 * lpX; lpDN += 8)
                    {
                        lpN -= lpDN;
                        if (lpN <= mpLimit && lpN % 12 == 11)
                            lpbOddPrime[(lpN>>1) / cbBitsPerBlock] ^= 
                                (BitArrayType)((BitArrayType)1 << (int)((lpN>>1) % cbBitsPerBlock));
                    }
                }

                // # eliminate squares
                for (lpN = 5; lpN < SQRT(mpLimit) + 1; lpN += 2)
                    if ((lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] & ((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock))) != 0)
                        for (lpK = lpN * lpN; lpK < mpLimit; lpK += lpN * lpN)
                            if ((lpK & 1) == 1)
                                lpbOddPrime[(lpK >> 1) / cbBitsPerBlock] &=
                                    (BitArrayType)~((BitArrayType)1 << (int)((lpK >> 1) % cbBitsPerBlock));
            }
        }

        /// <summary>
        /// Calculates the truncated square root for a number.
        /// </summary>
        /// <param name="value">The value to get the square root for.</param>
        /// <returns>The truncated sqrt of the value.</returns>
        private unsafe PrimeType SQRT(PrimeType value)
        {
            return (PrimeType)Math.Sqrt(value);
        }

        /// <summary>
        /// Gets a bit value by index.
        /// </summary>
        /// <param name="bits">The blocks containing the bits.</param>
        /// <param name="index">The index of the bit.</param>
        /// <returns>True if bit is set, false if cleared.</returns>
        private bool GetBitSafe(BitArrayType[] bits, PrimeType index)
        {
            if ((index & 1) == 1)
                return (bits[(index >> 1) / cbBitsPerBlock] & ((BitArrayType)1 << (int)((index >> 1) % cbBitsPerBlock))) != 0;
            else
                return false;
        }

        #endregion

        #region Public Properties

        /// <summary>
        /// Get the limit for the maximum prime value to find.
        /// </summary>
        public PrimeType Limit
        {
            get
            {
                return mpLimit;
            }
        }

        /// <summary>
        /// Returns the number of primes found in the range.
        /// </summary>
        public PrimeType Count
        {
            get
            {
                if (!mpCount.HasValue)
                {
                    PrimeType lpCount = 0;
                    foreach (PrimeType liPrime in this) lpCount++;
                    mpCount = lpCount;
                }

                return mpCount.Value;
            }
        }

        /// <summary>
        /// Determines if a value in range is prime or not.
        /// </summary>
        /// <param name="test">The value to test for primality.</param>
        /// <returns>True if the value is prime, false otherwise.</returns>
        public bool this[PrimeType test]
        {
            get
            {
                if (test > mpLimit) throw new ArgumentOutOfRangeException();
                if (test <= 1) return false;
                if (test == 2) return true;
                if ((test & 1) == 0) return false;
                return !GetBitSafe(mbaOddPrime, test >> 1);
            }
        }

        #endregion

        #region Public Methods

        /// <summary>
        /// Gets the enumerator for the primes.
        /// </summary>
        /// <returns>The enumerator of the primes.</returns>
        public IEnumerator<PrimeType> GetEnumerator()
        {
            //    return [2,3] + filter(primes.__getitem__, xrange(5,limit,2))

            // Two & Three always prime.
            yield return 2;
            yield return 3;

            // Start at first block, third MSB (5).
            int liBlock = 0;
            byte lbBit = 2;
            BitArrayType lbaCurrent = mbaOddPrime[0] >> lbBit;

            // For each value in range stepping in incrments of two for odd values.
            for (PrimeType lpN = 5; lpN <= mpLimit; lpN += 2)
            {
                // If current bit not set then value is prime.
                if ((lbaCurrent & 1) == 1)
                    yield return lpN;

                // Move to NSB.
                lbaCurrent >>= 1;

                // Increment bit value. 
                lbBit++;

                // If block is finished.
                if (lbBit == cbBitsPerBlock) 
                {
                    lbBit = 0;
                    lbaCurrent = mbaOddPrime[++liBlock];

                    //// Move to first bit of next block skipping full blocks.
                    while (lbaCurrent == 0)
                    {
                        lpN += ((PrimeType)cbBitsPerBlock) << 1;
                        if (lpN <= mpLimit)
                            lbaCurrent = mbaOddPrime[++liBlock];
                        else
                            break;
                    }
                }
            }
        }

        #endregion

        #region IEnumerable<PrimeType> Implementation

        /// <summary>
        /// Gets the enumerator for the primes.
        /// </summary>
        /// <returns></returns>
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        #endregion
    }
}

По скорости он близок к моей самой оптимизированной версии Решета Эратосфена, но все же медленнее примерно на 20%, его можно найти здесь:

https://stackoverflow.com/a/9700790/738380

Вот мой, он использует класс CompartmentalizedParallel, который позволяет вам выполнять параллельные циклы for, но контролировать количество потоков, чтобы индексы были сгруппированы.Однако из-за проблем с потоками вам необходимо либо блокировать BitArray каждый раз, когда он изменяется, либо создавать отдельный BitArray для каждого потока, а затем выполнять XOR их вместе в конце. Первый вариант был довольно медленным из-за количества блокировок, второй вариант мне показался более быстрым!

using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace PrimeGenerator
{
    public class Atkin : Primes
    {
        protected BitArray mbaPrimes;
        protected bool mbThreaded = true;

        public Atkin(int limit)
            : this(limit, true)
        {
        }

        public Atkin(int limit, bool threaded)
            : base(limit)
        {
            mbThreaded = threaded;
            if (mbaPrimes == null) FindPrimes();
        }

        public bool Threaded
        {
            get
            {
                return mbThreaded;
            }
        }

        public override IEnumerator<int> GetEnumerator()
        {
            yield return 2;
            yield return 3;
            for (int lsN = 5; lsN <= msLimit; lsN += 2)
                if (mbaPrimes[lsN]) yield return lsN;
        }

        private void FindPrimes()
        {
            mbaPrimes = new BitArray(msLimit + 1, false);

            int lsSQRT = (int)Math.Sqrt(msLimit);

            int[] lsSquares = new int[lsSQRT + 1];
            for (int lsN = 0; lsN <= lsSQRT; lsN++)
                lsSquares[lsN] = lsN * lsN;

            if (Threaded)
            {
                CompartmentalisedParallel.For<BitArray>(
                    1, lsSQRT + 1, new ParallelOptions(),
                    (start, finish) => { return new BitArray(msLimit + 1, false); },
                    (lsX, lsState, lbaLocal) =>
                    {
                        int lsX2 = lsSquares[lsX];

                        for (int lsY = 1; lsY <= lsSQRT; lsY++)
                        {
                            int lsY2 = lsSquares[lsY];

                            int lsN = 4 * lsX2 + lsY2;
                            if (lsN <= msLimit && (lsN % 12 == 1 || lsN % 12 == 5))
                                lbaLocal[lsN] ^= true;

                            lsN -= lsX2;
                            if (lsN <= msLimit && lsN % 12 == 7)
                                lbaLocal[lsN] ^= true;

                            if (lsX > lsY)
                            {
                                lsN -= lsY2 * 2;
                                if (lsN <= msLimit && lsN % 12 == 11)
                                    lbaLocal[lsN] ^= true;
                            }
                        }

                        return lbaLocal;
                    },
                    (lbaResult, start, finish) =>
                    {
                        lock (mbaPrimes) 
                            mbaPrimes.Xor(lbaResult);
                    },
                    -1
                );
            }
            else
            {
                for (int lsX = 1; lsX <= lsSQRT; lsX++)
                {
                    int lsX2 = lsSquares[lsX];

                    for (int lsY = 1; lsY <= lsSQRT; lsY++)
                    {
                        int lsY2 = lsSquares[lsY];

                        int lsN = 4 * lsX2 + lsY2;
                        if (lsN <= msLimit && (lsN % 12 == 1 || lsN % 12 == 5))
                            mbaPrimes[lsN] ^= true;

                        lsN -= lsX2;
                        if (lsN <= msLimit && lsN % 12 == 7)
                            mbaPrimes[lsN] ^= true;

                        if (lsX > lsY)
                        {
                            lsN -= lsY2 * 2;
                            if (lsN <= msLimit && lsN % 12 == 11)
                                mbaPrimes[lsN] ^= true;
                        }
                    }
                }
            }

            for (int lsN = 5; lsN < lsSQRT; lsN += 2)
                if (mbaPrimes[lsN])
                {
                    var lsS = lsSquares[lsN];
                    for (int lsK = lsS; lsK <= msLimit; lsK += lsS)
                        mbaPrimes[lsK] = false;
                }
        }
    }
}

И класс CompartmentalizedParallel:

using System;
using System.Threading.Tasks;

namespace PrimeGenerator
{
    public static class CompartmentalisedParallel
    {
        #region Int

        private static int[] CalculateCompartments(int startInclusive, int endExclusive, ref int threads)
        {
            if (threads == 0) threads = 1;
            if (threads == -1) threads = Environment.ProcessorCount;
            if (threads > endExclusive - startInclusive) threads = endExclusive - startInclusive;

            int[] liThreadIndexes = new int[threads + 1];
            liThreadIndexes[threads] = endExclusive - 1;
            int liIndexesPerThread = (endExclusive - startInclusive) / threads;
            for (int liCount = 0; liCount < threads; liCount++)
                liThreadIndexes[liCount] = liCount * liIndexesPerThread;

            return liThreadIndexes;
        }

        public static void For<TLocal>(
            int startInclusive, int endExclusive,
            ParallelOptions parallelOptions,
            Func<int, int, TLocal> localInit,
            Func<int, ParallelLoopState, TLocal, TLocal> body,
            Action<TLocal, int, int> localFinally,
            int threads)
        {
            int[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads);

            if (threads > 1)
                Parallel.For(
                    0, threads, parallelOptions,
                    (liThread, lsState) =>
                    {
                        TLocal llLocal = localInit(liThreadIndexes[liThread], liThreadIndexes[liThread + 1]);

                        for (int liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++)
                            body(liCounter, lsState, llLocal);

                        localFinally(llLocal, liThreadIndexes[liThread], liThreadIndexes[liThread + 1]);
                    }
                );
            else
            {
                TLocal llLocal = localInit(startInclusive, endExclusive);
                for (int liCounter = startInclusive; liCounter < endExclusive; liCounter++)
                    body(liCounter, null, llLocal);
                localFinally(llLocal, startInclusive, endExclusive);
            }
        }

        public static void For(
            int startInclusive, int endExclusive,
            ParallelOptions parallelOptions,
            Action<int, ParallelLoopState> body,
            int threads)
        {
            int[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads);

            if (threads > 1)
                Parallel.For(
                    0, threads, parallelOptions,
                    (liThread, lsState) =>
                    {
                        for (int liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++)
                            body(liCounter, lsState);
                    }
                );
            else
                for (int liCounter = startInclusive; liCounter < endExclusive; liCounter++)
                    body(liCounter, null);
        }

        public static void For(
            int startInclusive, int endExclusive,
            ParallelOptions parallelOptions,
            Action<int> body,
            int threads)
        {
            int[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads);

            if (threads > 1)
                Parallel.For(
                    0, threads, parallelOptions,
                    (liThread) =>
                    {
                        for (int liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++)
                            body(liCounter);
                    }
                );
            else
                for (int liCounter = startInclusive; liCounter < endExclusive; liCounter++)
                    body(liCounter);
        }

        public static void For(
            int startInclusive, int endExclusive,
            Action<int, ParallelLoopState> body,
            int threads)
        {
            For(startInclusive, endExclusive, new ParallelOptions(), body, threads);
        }

        public static void For(
            int startInclusive, int endExclusive,
            Action<int> body,
            int threads)
        {
            For(startInclusive, endExclusive, new ParallelOptions(), body, threads);
        }

        public static void For<TLocal>(
            int startInclusive, int endExclusive,
            Func<int, int, TLocal> localInit,
            Func<int, ParallelLoopState, TLocal, TLocal> body,
            Action<TLocal, int, int> localFinally,
            int threads)
        {
            For<TLocal>(startInclusive, endExclusive, new ParallelOptions(), localInit, body, localFinally, threads);
        }

        #endregion

        #region Long

        private static long[] CalculateCompartments(long startInclusive, long endExclusive, ref long threads)
        {
            if (threads == 0) threads = 1;
            if (threads == -1) threads = Environment.ProcessorCount;
            if (threads > endExclusive - startInclusive) threads = endExclusive - startInclusive;

            long[] liThreadIndexes = new long[threads + 1];
            liThreadIndexes[threads] = endExclusive - 1;
            long liIndexesPerThread = (endExclusive - startInclusive) / threads;
            for (long liCount = 0; liCount < threads; liCount++)
                liThreadIndexes[liCount] = liCount * liIndexesPerThread;

            return liThreadIndexes;
        }

        public static void For<TLocal>(
            long startInclusive, long endExclusive,
            ParallelOptions parallelOptions,
            Func<long, long, TLocal> localInit,
            Func<long, ParallelLoopState, TLocal, TLocal> body,
            Action<TLocal, long, long> localFinally,
            long threads)
        {
            long[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads);

            if (threads > 1)
                Parallel.For(
                    0, threads, parallelOptions,
                    (liThread, lsState) =>
                    {
                        TLocal llLocal = localInit(liThreadIndexes[liThread], liThreadIndexes[liThread + 1]);

                        for (long liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++)
                            body(liCounter, lsState, llLocal);

                        localFinally(llLocal, liThreadIndexes[liThread], liThreadIndexes[liThread + 1]);
                    }
                );
            else
            {
                TLocal llLocal = localInit(startInclusive, endExclusive);
                for (long liCounter = startInclusive; liCounter < endExclusive; liCounter++)
                    body(liCounter, null, llLocal);
                localFinally(llLocal, startInclusive, endExclusive);
            }
        }

        public static void For(
            long startInclusive, long endExclusive,
            ParallelOptions parallelOptions,
            Action<long, ParallelLoopState> body,
            long threads)
        {
            long[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads);

            if (threads > 1)
                Parallel.For(
                    0, threads, parallelOptions,
                    (liThread, lsState) =>
                    {
                        for (long liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++)
                            body(liCounter, lsState);
                    }
                );
            else
                for (long liCounter = startInclusive; liCounter < endExclusive; liCounter++)
                    body(liCounter, null);
        }

        public static void For(
            long startInclusive, long endExclusive,
            ParallelOptions parallelOptions,
            Action<long> body,
            long threads)
        {
            long[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads);

            if (threads > 1)
                Parallel.For(
                    0, threads, parallelOptions,
                    (liThread) =>
                    {
                        for (long liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++)
                            body(liCounter);
                    }
                );
            else
                for (long liCounter = startInclusive; liCounter < endExclusive; liCounter++)
                    body(liCounter);
        }

        public static void For(
            long startInclusive, long endExclusive,
            Action<long, ParallelLoopState> body,
            long threads)
        {
            For(startInclusive, endExclusive, new ParallelOptions(), body, threads);
        }

        public static void For(
            long startInclusive, long endExclusive,
            Action<long> body,
            long threads)
        {
            For(startInclusive, endExclusive, new ParallelOptions(), body, threads);
        }

        public static void For<TLocal>(
            long startInclusive, long endExclusive,
            Func<long, long, TLocal> localInit,
            Func<long, ParallelLoopState, TLocal, TLocal> body,
            Action<TLocal, long, long> localFinally,
            long threads)
        {
            For<TLocal>(startInclusive, endExclusive, new ParallelOptions(), localInit, body, localFinally, threads);
        }

        #endregion
    }
}

Базовый класс простых чисел:

using System.Collections;
using System.Collections.Generic;

namespace PrimeGenerator
{
    public abstract class Primes : IEnumerable<int>
    {
        protected readonly int msLimit;

        public Primes(int limit)
        {
            msLimit = limit;
        }

        public int Limit
        {
            get
            {
                return msLimit;
            }
        }

        public int Count
        {
            get
            {
                int liCount = 0;
                foreach (int liPrime in this)
                    liCount++;
                return liCount;
            }
        }

        public abstract IEnumerator<int> GetEnumerator();

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

Используйте его, выполнив следующие действия:

    var lpPrimes = new Atkin(count, true);
    Console.WriteLine(lpPrimes.Count);
    Console.WriteLine(s.ElapsedMilliseconds);

Однако я обнаружил, что Эратосфен быстрее во всех случаях, даже если четырехъядерный процессор Аткина работает в многопоточном режиме:

using System;
using System.Collections;
using System.Collections.Generic;

namespace PrimeGenerator
{
    public class Eratosthenes : Primes
    {
        protected BitArray mbaOddEliminated;

        public Eratosthenes(int limit)
            : base(limit)
        {
            if (mbaOddEliminated == null) FindPrimes();
        }

        public override IEnumerator<int> GetEnumerator()
        {
            yield return 2;
            for (int lsN = 3; lsN <= msLimit; lsN+=2)
                if (!mbaOddEliminated[lsN>>1]) yield return lsN;
        }

        private void FindPrimes()
        {
            mbaOddEliminated = new BitArray((msLimit>>1) + 1);
            int lsSQRT = (int)Math.Sqrt(msLimit);
            for (int lsN = 3; lsN < lsSQRT + 1; lsN += 2)
                if (!mbaOddEliminated[lsN>>1])
                    for (int lsM = lsN*lsN; lsM <= msLimit; lsM += lsN<<1)
                        mbaOddEliminated[lsM>>1] = true;
        }
    }
}

Если вам удастся заставить Аткина работать быстрее, дайте мне знать!

Вот улучшение Решета Эратосфена с использованием специальных FixBitArrays и небезопасного кода для получения результатов по скорости, это примерно на 225% быстрее, чем мой предыдущий алгоритм Эратосфена, и класс является автономным (он не многопоточный - Эратосфен не совместим с многопоточностью), На процессоре AMD Phenom II X4 965 я могу вычислить простые числа до предела 1 000 000 000 за 9 261 мс:

using System;
using System.Collections;
using System.Collections.Generic;

namespace PrimeGenerator
{
    // The block element type for the bit array, 
    // use any unsigned value. WARNING: UInt64 is 
    // slower even on x64 architectures.
    using BitArrayType = System.UInt32;

    // This should never be any bigger than 256 bits - leave as is.
    using BitsPerBlockType = System.Byte;

    // The prime data type, this can be any unsigned value, the limit
    // of this type determines the limit of Prime value that can be
    // found. WARNING: UInt64 is slower even on x64 architectures.
    using PrimeType = System.UInt32;

    /// <summary>
    /// Calculates prime number using the Sieve of Eratosthenes method.
    /// </summary>
    /// <example>
    /// <code>
    ///     var lpPrimes = new Eratosthenes(1e7);
    ///     foreach (UInt32 luiPrime in lpPrimes)
    ///         Console.WriteLine(luiPrime);
    /// </example>
    public class Eratosthenes : IEnumerable<PrimeType>
    {
        #region Constants

        /// <summary>
        /// Constant for number of bits per block, calculated based on size of BitArrayType.
        /// </summary>
        const BitsPerBlockType cbBitsPerBlock = sizeof(BitArrayType) * 8;

        #endregion

        #region Protected Locals

        /// <summary>
        /// The limit for the maximum prime value to find.
        /// </summary>
        protected readonly PrimeType mpLimit;

        /// <summary>
        /// The current bit array where a set bit means
        /// the odd value at that location has been determined
        /// to not be prime.
        /// </summary>
        protected BitArrayType[] mbaOddNotPrime;

        #endregion

        #region Initialisation

        /// <summary>
        /// Create Sieve of Eratosthenes generator.
        /// </summary>
        /// <param name="limit">The limit for the maximum prime value to find.</param>
        public Eratosthenes(PrimeType limit)
        {
            // Check limit range
            if (limit > PrimeType.MaxValue - (PrimeType)Math.Sqrt(PrimeType.MaxValue))
                throw new ArgumentOutOfRangeException();

            mpLimit = limit;

            FindPrimes();
        }

        #endregion

        #region Private Methods

        /// <summary>
        /// Finds the prime number within range.
        /// </summary>
        private unsafe void FindPrimes()
        {
            // Allocate bit array.
            mbaOddNotPrime = new BitArrayType[(((mpLimit >> 1) + 1) / cbBitsPerBlock) + 1];

            // Cache Sqrt of limit.
            PrimeType lpSQRT = (PrimeType)Math.Sqrt(mpLimit);

            // Fix the bit array for pointer access
            fixed (BitArrayType* lpbOddNotPrime = &mbaOddNotPrime[0])
                // Scan primes up to lpSQRT
                for (PrimeType lpN = 3; lpN <= lpSQRT; lpN += 2)
                    // If the current bit value for index lpN is cleared (prime)
                    if (
                            (
                                lpbOddNotPrime[(lpN >> 1) / cbBitsPerBlock] & 
                                ((BitArrayType)1 << (BitsPerBlockType)((lpN >> 1) % cbBitsPerBlock))
                            ) == 0
                        )
                        // Leave it cleared (prime) and mark all multiples of lpN*2 from lpN*lpN as not prime
                        for (PrimeType lpM = lpN * lpN; lpM <= mpLimit; lpM += lpN << 1)
                            // Set as not prime
                            lpbOddNotPrime[(lpM >> 1) / cbBitsPerBlock] |= 
                                (BitArrayType)((BitArrayType)1 << (BitsPerBlockType)((lpM >> 1) % cbBitsPerBlock));
        }

        /// <summary>
        /// Gets a bit value by index.
        /// </summary>
        /// <param name="bits">The blocks containing the bits.</param>
        /// <param name="index">The index of the bit.</param>
        /// <returns>True if bit is set, false if cleared.</returns>
        private bool GetBitSafe(BitArrayType[] bits, PrimeType index)
        {
            return (bits[index / cbBitsPerBlock] & ((BitArrayType)1 << (BitsPerBlockType)(index % cbBitsPerBlock))) != 0;
        }

        #endregion

        #region Public Properties

        /// <summary>
        /// Get the limit for the maximum prime value to find.
        /// </summary>
        public PrimeType Limit
        {
            get
            {
                return mpLimit;
            }
        }

        /// <summary>
        /// Returns the number of primes found in the range.
        /// </summary>
        public PrimeType Count
        {
            get
            {
                PrimeType lptCount = 0;
                foreach (PrimeType liPrime in this)
                    lptCount++;
                return lptCount;
            }
        }

        /// <summary>
        /// Determines if a value in range is prime or not.
        /// </summary>
        /// <param name="test">The value to test for primality.</param>
        /// <returns>True if the value is prime, false otherwise.</returns>
        public bool this[PrimeType test]
        {
            get
            {
                if (test > mpLimit) throw new ArgumentOutOfRangeException();
                if (test <= 1) return false;
                if (test == 2) return true;
                if ((test & 1) == 0) return false;
                return !GetBitSafe(mbaOddNotPrime, test >> 1);
            }
        }

        #endregion

        #region Public Methods

        /// <summary>
        /// Gets the enumerator for the primes.
        /// </summary>
        /// <returns>The enumerator of the primes.</returns>
        public IEnumerator<PrimeType> GetEnumerator()
        {
            // Two always prime.
            yield return 2;

            // Start at first block, second MSB.
            int liBlock = 0;
            byte lbBit = 1;
            BitArrayType lbaCurrent = mbaOddNotPrime[0] >> 1;

            // For each value in range stepping in incrments of two for odd values.
            for (PrimeType lpN = 3; lpN <= mpLimit; lpN += 2)
            {
                // If current bit not set then value is prime.
                if ((lbaCurrent & 1) == 0)
                    yield return lpN;

                // Move to NSB.
                lbaCurrent >>= 1;

                // Increment bit value.
                lbBit++;

                // If block is finished.
                if (lbBit == cbBitsPerBlock) 
                {
                    // Move to first bit of next block.
                    lbBit = 0;
                    liBlock++;
                    lbaCurrent = mbaOddNotPrime[liBlock];
                }
            }
        }

        #endregion

        #region IEnumerable<PrimeType> Implementation

        /// <summary>
        /// Gets the enumerator for the primes.
        /// </summary>
        /// <returns>The enumerator for the prime numbers.</returns>
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        #endregion
    }
}

Простые числа найдены в числе 1 000 000 000:50 847 534 за 9 261 мс

Лицензировано под: CC-BY-SA с атрибуция

Не связан с StackOverflow