How to extend a binary search iterator to consume multiple targets
20-09-2019
Question
I have a function, binary_range_search, that is called like this:
my $brs_iterator = binary_range_search(
    target => $range,     # e.g. [1, 200]
    search => $ranges,    # e.g. [ {start => 1,   end => 1000},
);                        #        {start => 500, end => 1500} ]
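$brs_iterator->() will iterate over all the elements of @$ranges that $range overlaps.
I would like to extend binary_range_search so that it can take multiple ranges as its target, e.g.: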
target => $target_ranges # eg. [ [1, 200], [50, 300], ... ]
search => $search_ranges # as above
So, when the search on $range->[0] is exhausted, it should move on to $range->[1], and so on. Here is the function in question, in its original form:
sub binary_range_search {
    my %options = @_;
    my $range  = $options{target} || return;
    my $ranges = $options{search} || return;
    my ( $low, $high ) = ( 0, @{$ranges} - 1 );
    while ( $low <= $high ) {
        my $try = int( ( $low + $high ) / 2 );
        $low  = $try + 1, next if $ranges->[$try]{end} < $range->[0];
        $high = $try - 1, next if $ranges->[$try]{start} > $range->[1];
        my ( $down, $up ) = ($try) x 2;
        my %seen = ();
        my $brs_iterator = sub {
            if (    $ranges->[ $up + 1 ]{end} >= $range->[0]
                and $ranges->[ $up + 1 ]{start} <= $range->[1]
                and !exists $seen{ $up + 1 } )
            {
                $seen{ $up + 1 } = undef;
                return $ranges->[ ++$up ];
            }
            # Make sure a left neighbour exists before inspecting it.
            elsif ( $down > 0
                and $ranges->[ $down - 1 ]{end} >= $range->[0]
                and $ranges->[ $down - 1 ]{start} <= $range->[1]
                and !exists $seen{ $down - 1 } )
            {
                $seen{ $down - 1 } = undef;
                return $ranges->[ --$down ];
            }
            elsif ( !exists $seen{$try} ) {
                $seen{$try} = undef;
                return $ranges->[$try];
            }
            else {
                return;
            }
        };
        return $brs_iterator;
    }
    return sub { };
}
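It is a standard binary search strategy until it finds an overlapping range. It then moves to the right, exhausts that side, moves to the left, exhausts that side, and finally gives up. Ideally, I think, it should then shift to the next target range and redo the search (perhaps via recursion?). My problem is that I am not sure how to make that work with the iterator construction.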
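Solution
I just wrapped your iterator generation in a for loop and built up an array of iterator functions.
Depending on context, I return either a master iterator or a list of iterator functions. I wasn't sure which you wanted.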
use strict;
use warnings;
my $t = [ [1,200], [400,900] ];
my @r = (
    { start => 1,   end => 100 },
    { start => 2,   end => 500 },
    { start => 204, end => 500 },
    { start => 208, end => 500 },
    { start => 215, end => 1000 },
    { start => 150, end => 1000 },
    { start => 500, end => 1100 },
);
# Get a master iterator that will process each iterator in turn.
my $brs_iterator = binary_range_search(
    targets => $t,
    search  => \@r,
);
# Get an array of iterators
my @brs_iterator = binary_range_search(
    targets => $t,
    search  => \@r,
);
sub binary_range_search {
    my %options = @_;
    my $targets = $options{targets} || return;
    my $ranges  = $options{search}  || return;
    my @iterators;
    TARGET:
    for my $target ( @$targets ) {
        my ( $low, $high ) = ( 0, $#{$ranges} );
        RANGE_CHECK:
        while ( $low <= $high ) {
            my $try = int( ( $low + $high ) / 2 );
            # Remove non-overlapping ranges
            $low = $try + 1, next RANGE_CHECK
                if $ranges->[$try]{end} < $target->[0];
            $high = $try - 1, next RANGE_CHECK
                if $ranges->[$try]{start} > $target->[1];
            my ( $down, $up ) = ($try) x 2;
            my %seen = ();
            my $brs_iterator = sub {
                # The exists test keeps us from running off the end of the array.
                if (    exists $ranges->[ $up + 1 ]
                    and $ranges->[ $up + 1 ]{end} >= $target->[0]
                    and $ranges->[ $up + 1 ]{start} <= $target->[1]
                    and !exists $seen{ $up + 1 } )
                {
                    $seen{ $up + 1 } = undef;
                    return $ranges->[ ++$up ];
                }
                # Make sure a left neighbour exists before inspecting it.
                elsif ( $down > 0
                    and $ranges->[ $down - 1 ]{end} >= $target->[0]
                    and $ranges->[ $down - 1 ]{start} <= $target->[1]
                    and !exists $seen{ $down - 1 } )
                {
                    $seen{ $down - 1 } = undef;
                    return $ranges->[ --$down ];
                }
                elsif ( !exists $seen{$try} ) {
                    $seen{$try} = undef;
                    return $ranges->[$try];
                }
                else {
                    return;
                }
            };
            push @iterators, $brs_iterator;
            next TARGET;
        }
    }
    # In scalar context return master iterator that iterates over the list of range iterators.
    # In list context returns a list of range iterators.
    return wantarray
        ? @iterators
        : sub {
            while ( @iterators ) {
                if ( my $range = $iterators[0]() ) {
                    return $range;
                }
                shift @iterators;
            }
            return;
        };
}
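The scalar-context branch above boils down to chaining iterators: keep calling the first one until it returns nothing, then drop it and move on to the next. A minimal standalone sketch of that idea follows. The helper names make_list_iterator and chain_iterators are hypothetical and used only for illustration; they are not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a list in an iterator that returns one item
# per call and nothing (undef in scalar context) once it is exhausted.
sub make_list_iterator {
    my @items = @_;
    return sub { return @items ? shift(@items) : () };
}

# Hypothetical helper: combine several iterators into one master iterator.
# Like the master iterator above, it treats any false value as "exhausted",
# so it only suits iterators whose items are always true (e.g. hashrefs).
sub chain_iterators {
    my @iterators = @_;
    return sub {
        while (@iterators) {
            if ( my $item = $iterators[0]->() ) {
                return $item;
            }
            shift @iterators;    # current iterator is drained; move on
        }
        return;
    };
}

my $master = chain_iterators(
    make_list_iterator( { start => 1, end => 100 }, { start => 2, end => 500 } ),
    make_list_iterator(),    # an empty iterator is skipped transparently
    make_list_iterator( { start => 500, end => 1100 } ),
);

my @starts;
while ( my $range = $master->() ) {
    push @starts, $range->{start};
}
print "@starts\n";    # prints "1 2 500"
```

Because the chain discards each sub-iterator as soon as it runs dry, the caller sees one continuous stream and never needs to know how many targets were involved.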
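Other tips
If you want to iterate over all values that overlap the search ranges, you don't need binary search.
First, the customary front matter: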
use warnings;
use strict;
use Carp;
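Next, check that we have target and search parameters and that, for each range, the starting point is no greater than its ending point. Otherwise, we refuse to proceed.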
sub binary_range_search {
    my %arg = @_;
    my @errors;
    # "or" binds more loosely than "=", so $target and $search receive the
    # argument itself (possibly undef) rather than the return value of push.
    my $target = $arg{target} or push @errors => "no target";
    my $search = $arg{search} or push @errors => "no search";
    for (@$target) {
        my ( $start, $end ) = @$_;
        push @errors => "Target start ($start) is greater than end ($end)"
            if $start > $end;
    }
    for (@$search) {
        my ( $start, $end ) = @{$_}{qw/ start end /};
        push @errors => "Search start ($start) is greater than end ($end)"
            if $start > $end;
    }
    croak "Invalid use of binary_range_search:\n",
        map " - $_\n", @errors
        if @errors;
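The iterator itself is a closure that maintains the following state:
    my $i;
    my ( $ta, $tb );
    my ( $sa, $sb );
    my $si = 0;
where
$i, if defined, is the next value from the current overlapping range
$ta and $tb are the starting and ending points of the current target range
$sa and $sb are like the above, but for the current search range
$si is an index into @$search and defines the current search range
We will be assigning and returning the iterator $it. The declaration and initialization are separate so that the iterator can call itself when necessary.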
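    my $it;
    $it = sub {
We are done if no more target ranges remain or if there were no search ranges to begin with.
        return unless @$target && @$search;
When $i is defined, it means we have found an overlap, and we iterate by incrementing $i until it is greater than the ending point of either the current target range or the current search range.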
        if (defined $i) {
            # iterating within a target range
            if ($i > $tb || $i > $sb) {
                ++$si;
                undef $i;
                return $it->();
            }
            else {
                return $i++;
            }
        }
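Otherwise, we need to determine whether the next target range overlaps any search range. However, if $i is undefined and we have already considered all the search ranges, we discard the current target range and start again.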
        else {
            # does the next target range overlap?
            if ($si >= @$search) {
                shift @$target;
                $si = 0;
                return $it->();
            }
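Here we pull out the starting and ending points of both the current target range (always at the front of @$target) and the current search range (indexed by $si).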
            ($ta,$tb) = @{ $target->[0] };
            ($sa,$sb) = @{ $search->[$si] }{qw/ start end /};
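Now testing for overlap is straightforward. A disjoint search range is ignored and we move on to the next one. Otherwise, we find the leftmost point in the overlap and iterate from there.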
            if ($sb < $ta || $sa > $tb) {
                # disjoint
                ++$si;
                undef $i;
                return $it->();
            }
            elsif ($sa >= $ta) {
                $i = $sa;
                return $i++;
            }
            elsif ($ta >= $sa) {
                $i = $ta;
                return $i++;
            }
        }
    };
Finally, we return the iterator:
    $it;
}
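The overlap test used in the walkthrough above can also be isolated in a small helper. This is only a sketch under the same assumption of closed integer ranges; range_overlap is a hypothetical name, not part of the answer's code. Two ranges are disjoint exactly when one ends before the other starts; otherwise the overlap runs from the larger of the two starting points to the smaller of the two ending points.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical helper: return the (start, end) of the overlap between the
# closed ranges [$ta, $tb] and [$sa, $sb], or an empty list if disjoint.
sub range_overlap {
    my ( $ta, $tb, $sa, $sb ) = @_;
    return if $sb < $ta || $sa > $tb;    # disjoint
    return ( max( $ta, $sa ), min( $tb, $sb ) );
}

my @cases = (
    [ 1,  200, 40,  60 ],      # search range lies inside the target
    [ 50, 300, 40,  60 ],      # search range starts before the target
    [ 1,  200, 500, 1500 ],    # disjoint
);

for my $case (@cases) {
    my ( $ta, $tb, $sa, $sb ) = @$case;
    # List assignment in boolean context is true only if values came back.
    if ( my ( $from, $to ) = range_overlap(@$case) ) {
        print "[$ta, $tb] and [$sa, $sb] overlap on $from..$to\n";
    }
    else {
        print "[$ta, $tb] and [$sa, $sb] are disjoint\n";
    }
}
```

Computing both ends of the overlap up front would let an iterator stop at a single bound instead of comparing against $tb and $sb on every call, at the cost of two extra state variables.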
For an example similar to the one in your question:
my $it = binary_range_search(
    target => [ [1, 200], [50, 300] ],
    search => [ { start => 1,   end => 1000 },
                { start => 500, end => 1500 },
                { start => 40,  end => 60 },
                { start => 250, end => 260 } ],
);
while (defined(my $value = $it->())) {
    print "got $value\n";
}
Its output, with interior points elided, is
got 1
[...]
got 200
got 40
[...]
got 60
got 50
[...]
got 300
got 50
[...]
got 60
got 250
[...]
got 260
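Split it into two functions: an outer function that loops over the ranges and calls an inner function that implements a conventional binary chop.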
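One way that split might look, as a rough sketch: an inner function performs the binary chop for a single target and returns the index of some overlapping search range, and an outer function loops over the targets. The names find_overlap_index and first_hits are hypothetical, and the sketch assumes the search ranges are sorted by start and do not overlap one another, which is what makes a binary search valid in the first place.

```perl
use strict;
use warnings;

# Inner function (hypothetical name): binary chop for one target range.
# Returns the index of a search range that overlaps the target, or undef.
sub find_overlap_index {
    my ( $target, $ranges ) = @_;
    my ( $low, $high ) = ( 0, $#{$ranges} );
    while ( $low <= $high ) {
        my $try = int( ( $low + $high ) / 2 );
        if    ( $ranges->[$try]{end} < $target->[0] )   { $low  = $try + 1 }
        elsif ( $ranges->[$try]{start} > $target->[1] ) { $high = $try - 1 }
        else                                            { return $try }
    }
    return;
}

# Outer function (hypothetical name): loop over the targets and record the
# hit found for each one; undef marks a target with no overlapping range.
sub first_hits {
    my ( $targets, $ranges ) = @_;
    my @hits;
    for my $target (@$targets) {
        push @hits, scalar find_overlap_index( $target, $ranges );
    }
    return @hits;
}

my @ranges = (
    { start => 1,   end => 100 },
    { start => 150, end => 300 },
    { start => 500, end => 1100 },
);
my @hits = first_hits( [ [ 120, 200 ], [ 400, 450 ], [ 1000, 2000 ] ], \@ranges );
print join( ', ', map { defined $_ ? $_ : 'none' } @hits ), "\n";    # prints "1, none, 2"
```

From each returned index, the outer function could then walk left and right to collect the neighbouring overlaps, much as the iterator in the question does.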
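Warning: a very C++-biased answer:
What you have to do is define a new type of iterator, which is a pair of a usual iterator and a segment iterator (if you don't have a segment iterator, it is a pair of a const pointer/ref to the segments and an index pointing to the correct segment). You have to define all the concepts of a random access iterator (difference, addition of an integer, etc.). Bear in mind that, at least in C++ lingo, this is not a true random access iterator, since addition of an integer is not really constant time. Such is life.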