Opened 17 months ago

#44 new enhancement

Slightly more efficient perl regex globbing pattern in privacy action file

Reported by: Steven Smith <s.t.smith@…>
Owned by:
Priority: major
Component:
Version: 1.4.2
Keywords:
Cc:
Project: adblock2privoxy

Description

This is a (modest) efficiency request.

adblock2privoxy currently converts EasyList glob patterns to regexes using the translation

* -> .*

There is a better way to convert glob wildcards to regexes: guard each wildcard with the (*PRUNE) backtracking control verb and use a lazy .*? in place of the greedy .*, which avoids needless backtracking. See http://blogs.perl.org/users/mauke/2017/05/converting-glob-patterns-to-efficient-regexes-in-perl-and-javascript.html

Here is the converter for EasyList rules:

#!/usr/bin/env perl
use strict;
use warnings;

sub easylist_glob2re {
    my ($pat) = @_;
    $pat =~ s{(\W)}{
        $1 eq '*' ? '(*PRUNE).*?' :
        $1 eq '^' ? '[^\w%.-]' :
        '\\' . $1
    }eg;
    # return qr/\A$pat\z/s;
    return qr/\A$pat/s;
}

# ('a' x 100) =~ easylist_glob2re(('a*' x 70) . 'b*a');
# print(easylist_glob2re(('a*' x 70) . 'b*a'));

my $rule = '/*?ad=*^banner=';
print(easylist_glob2re($rule), "\n");
# Report whether a sample URL matches the converted rule.
print('/whatevs?ad=buy&banner=look' =~ easylist_glob2re($rule) ? "match\n" : "no match\n");

Example:

/*?ad=*^banner= -> \A\/(*PRUNE).*?\?ad\=(*PRUNE).*?[^\w%.-]banner\=
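A rough way to check that the (*PRUNE) translation accepts and rejects the same URLs as the plain * -> .* translation is sketched below. naive_glob2re is a hypothetical stand-in for the current behaviour, not code taken from adblock2privoxy, and the sample URLs are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Converter from this ticket: '*' becomes a lazy wildcard guarded by (*PRUNE).
sub easylist_glob2re {
    my ($pat) = @_;
    $pat =~ s{(\W)}{
        $1 eq '*' ? '(*PRUNE).*?' :
        $1 eq '^' ? '[^\w%.-]' :
        '\\' . $1
    }eg;
    return qr/\A$pat/s;
}

# Hypothetical stand-in for the current translation: '*' becomes a greedy '.*'.
sub naive_glob2re {
    my ($pat) = @_;
    $pat =~ s{(\W)}{
        $1 eq '*' ? '.*' :
        $1 eq '^' ? '[^\w%.-]' :
        '\\' . $1
    }eg;
    return qr/\A$pat/s;
}

my $rule = '/*?ad=*^banner=';

# Made-up sample URLs; the second and fourth are meant to be rejected.
my @urls = (
    '/whatevs?ad=buy&banner=look',
    '/whatevs?ad=buy-banner=look',
    '/x?ad=1&y=2&banner=3',
    'x/?ad=&banner=',
);

for my $url (@urls) {
    my $prune = $url =~ easylist_glob2re($rule) ? 1 : 0;
    my $naive = $url =~ naive_glob2re($rule)    ? 1 : 0;
    printf "%-32s prune=%d naive=%d\n", $url, $prune, $naive;
}
```

If the two columns ever disagree for a rule, the new translation is changing which URLs are matched rather than only how fast they are matched, and that rule deserves a closer look.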

I haven't timed this for real-world Privoxy examples, but in my experience it is best practice, and often helpful, to use efficient NFA regexes whenever possible.
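For anyone who wants to time it, a minimal sketch follows. It uses a small synthetic worst case adapted from the commented-out test in the converter above, not real EasyList rules, and the helper glob2re (which takes the wildcard replacement as a second argument) is a hypothetical convenience for this comparison. Results will depend on the Perl version and on the input, so no numbers are claimed here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: same escaping as the converter in this ticket,
# but the replacement for '*' is passed in so both variants can be built.
sub glob2re {
    my ($pat, $star) = @_;
    $pat =~ s{(\W)}{
        $1 eq '*' ? $star :
        $1 eq '^' ? '[^\w%.-]' :
        '\\' . $1
    }eg;
    return qr/\A$pat/s;
}

# Synthetic input: many wildcards, and a subject that fails only at the
# very end, so a backtracking engine has a lot of alternatives to try.
my $glob = ('a*' x 6) . 'b*a';
my $text = ('a' x 24) . 'b';

my %re = (
    naive => glob2re($glob, '.*'),
    prune => glob2re($glob, '(*PRUNE).*?'),
);

for my $name (sort keys %re) {
    my $hits = 0;
    my $t0   = time;
    for (1 .. 100) {
        $hits++ if $text =~ $re{$name};
    }
    printf "%-5s matched %d of 100 in %.4f s\n", $name, $hits, time - $t0;
}
```

The sizes are kept deliberately small so the greedy variant finishes quickly; increasing the number of wildcards or the subject length should widen any gap, but the greedy variant can become very slow, so scale up cautiously.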

Change History (0)
