perl diamond

My Perl intuitions

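The diamond operator "<>" is one of the most important operators in Perl. It reads input line by line: a bare "<>" reads from the files named on the command line (or from standard input when none are given), while "<STDIN>" or "<…>" reads from the filehandle named inside the brackets. STDOUT is an output handle, so "<STDOUT>" cannot be used to read input. (from Intermediate Perl)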
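
A minimal sketch of the diamond operator in action. The temporary file and the variable names are only illustrative; in a real script the file names would come from the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the example is self-contained.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first\nsecond\nthird\n";
close $tmp_fh or die "close failed: $!";

# <> reads from the files listed in @ARGV (or from STDIN when @ARGV is empty).
my @lines;
{
    local @ARGV = ($tmp_name);    # pretend the file was given on the command line
    while ( my $line = <> ) {
        chomp $line;
        push @lines, $line;
    }
}
print scalar(@lines), " lines read, last one is '$lines[-1]'\n";
```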

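"$", the caduceus, stands for authority: in Perl every operation can ultimately be reduced to a reference held in a scalar, whether a subroutine reference, a filehandle reference, an object reference, a hash reference, or an array reference (from Intermediate Perl)

"%", the monk's staff, is a hash: a dictionary, a data structure (from Tang Xuanzang)

"[]", the cube, holds the concrete contents of an "@" array object (from The Matrix)

"{}", the kaleidoscope, holds the concrete contents of a "%" monk's-staff (hash) object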
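
The sigils and brackets above can be sketched as follows. All variable names are made up for illustration.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );                 # "@" holds an ordered list
my %hash  = ( one => 1, two => 2 );      # "%" holds key/value pairs

my $array_ref = [ 1, 2, 3 ];             # [] builds an anonymous array and returns a reference
my $hash_ref  = { one => 1, two => 2 };  # {} builds an anonymous hash and returns a reference
my $code_ref  = sub { return $_[0] * 2 };    # reference to an anonymous subroutine

# A single element always carries the "$" sigil; references are followed with ->.
print "array element: $array[1] and $array_ref->[1]\n";
print "hash value:    $hash{two} and $hash_ref->{two}\n";
print "code ref call: ", $code_ref->(21), "\n";
print "ref types:     ",
    join( ", ", ref($array_ref), ref($hash_ref), ref($code_ref) ), "\n";
```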

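In Perl, pattern matching of anything at all starts with the double slash "/ /" (first and foremost); Python usually simplifies this to a plain string or a regex string (from pattern 胡晖)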

Regular expressions in every language

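Regular expressions are implemented in every language. They are an embedded language with a syntax of their own; in Perl a pattern starts with the double slash (from Mr Wang).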
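
A short sketch of how the double slash is used in practice. The sample strings are invented, and the /r flag assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $text = "offshore wind turbine, 2022";

# m// tests for a match; the leading m is optional with slash delimiters.
my $found = $text =~ /wind\s+turbine/ ? "yes" : "no";

# In list context a match returns its captured groups.
my ($year) = $text =~ m/(\d{4})/;

# s/// substitutes; the /r flag returns the changed copy and leaves $text alone.
my $masked = $text =~ s/\d/#/gr;

# Other delimiters avoid having to escape slashes inside the pattern.
my ($dir) = "/usr/bin/env" =~ m{^(/usr/\w+)};

print "$found | $year | $masked | $dir\n";
```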

Inside the double slash "//" there are five kinds of objects:

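  1. Character content, for example "\d" is equivalent to [0-9], "\w" to [a-zA-Z0-9_] (note the underscore), "\s" matches whitespace, "\D" is equivalent to [^0-9], "\W" to [^a-zA-Z0-9_], and "\S" matches non-whitespace (these equivalences hold for ASCII text)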

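  2. Metacharacters, which carry special meaning: "\", ".", "*", "?", "{3,}", "{3,12}" ((?#note: 1 and 2 can be combined))

    To match a metacharacter literally it has to be escaped: "\\", "\.", "\*", "\?", "\{", "\(", "\)"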

  3. Position anchors: "^", "$", "(?<=…)", "(?<!…)", "\b", "(?=…)", "(?!…)"

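  4. Groups and backreferences: "(?<=<(\w+)>).*(?=<\/\1>)"

This pattern starts with a left zero-width assertion (a lookbehind, playing the role of "^") and ends with a right zero-width assertion (a lookahead, playing the role of "$"). This classic application covers everything in items 1, 2, 3 and 4, and every regular expression is implicit in it; in the end it all comes down to 0 (zero width), or to Perl's "$". Note that Perl rejects an unbounded "\w+" inside a lookbehind: Perl 5.30 and later experimentally allow variable-length lookbehind of up to 255 characters, which is why the example script uses "\w{0,20}" instead.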
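
Items 1 to 4 can be sketched together as below. The sample text and variable names are only illustrative, and the lookbehind is kept fixed-length so the code does not depend on the experimental feature.

```perl
use strict;
use warnings;

my $text = "Order 42 costs 3.50 USD (approx.)";

# 1. Character classes: \d+ grabs a run of digits.
my ($first_number) = $text =~ /(\d+)/;

# 2. Quantifiers plus an escaped metacharacter: \. is a literal dot.
my ($price) = $text =~ /(\d+\.\d{2})/;

# quotemeta escapes every metacharacter in a literal string.
my $literal     = quotemeta "(approx.)";
my $has_literal = $text =~ /$literal/ ? 1 : 0;

# 3. Anchors and word boundaries: ^ is the start of the string, \b a word edge.
my $starts_with_order = $text =~ /^Order\b/ ? 1 : 0;

# 3. Zero-width assertions: a fixed-length lookbehind and a lookahead
#    check the surroundings without consuming them.
my ($amount) = $text =~ /(?<=costs\s)(\d+\.\d+)(?=\sUSD)/;

# 4. Group and backreference: \1 must repeat what group 1 captured.
my ( $tag, $body ) = "<b>bold</b>" =~ /<(\w+)>(.*?)<\/\1>/;

print "$first_number $price $has_literal $starts_with_order $amount $tag $body\n";
```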

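  5. Branches: "(?(group)yes|no)" continues with yes if the group called group has matched, and with no otherwise; "\w+|\d+" is a plain alternation

Named groups extend item 4:

  "(?<groupname>\w+)\b\s+\k<groupname>\b": when the group is given the name groupname it is referenced with "\k". The captured content is stored under groupname and pushed onto a stack (which means it can also be popped again); (?<-groupname>) pops the content captured by groupname, and the match fails if groupname holds nothing. The stack and the pop are the balancing-group feature of .NET regular expressions; Perl supports the named group and "\k" but not "(?<-groupname>)".

  "(?'groupname'\w+)\b\s+\k'groupname'\b": the same named group written with single quotes, again referenced with "\k"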
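
The branch and named-group constructs that Perl itself supports can be sketched like this. The sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Alternation: either a run of letters or a run of digits.
my @tokens = "abc 123 de" =~ /([a-z]+|\d+)/g;

# Named group with \k: find a doubled word; %+ holds the named captures.
my $doubled;
if ( "this is is a test" =~ /\b(?<word>\w+)\s+\k<word>\b/ ) {
    $doubled = $+{word};
}

# Conditional: if group 1 (an opening parenthesis) matched,
# the closing parenthesis is required as well.
my $cond    = qr/^(\()?\d+(?(1)\))$/;
my @results = map { $_ =~ $cond ? "ok" : "no" } ( "(12)", "12", "(12", "12)" );

print "@tokens | $doubled | @results\n";
```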

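  <                   # the outermost opening bracket
    [^<>]*            # the non-bracket content that follows it
    (
        (
          (?'Open'<)  # an opening bracket, pushed onto "Open"
          [^<>]*      # the content after the opening bracket
        )+
        (
          (?'-Open'>) # a closing bracket, pops one "Open"
          [^<>]*      # the content after the closing bracket
        )+
    )*
    (?(Open)(?!))     # check before the outermost closing bracket:
                      # if any "Open" is still left unpopped,
                      # the match fails

  >                # the outermost closing bracket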

#One of the most common uses of balancing groups is matching HTML. The following example matches nested <div> tags: <div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>.
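
The balancing-group syntax above belongs to .NET and is not implemented by Perl's regex engine. A rough Perl counterpart, offered as a sketch of a different technique rather than a translation of the pattern above, is a recursive pattern: "(?1)" re-enters group 1, so nested angle brackets are matched to any depth. The sample text is invented for illustration.

```perl
use strict;
use warnings;

my $text = "x <a <b> c> y <d> z <unclosed";

# Group 1 matches "<", then any mix of non-bracket runs and nested copies of
# itself via (?1), then the closing ">". The possessive ++ avoids needless
# backtracking over the non-bracket runs.
my @found = $text =~ /(<(?:[^<>]++|(?1))*>)/g;

print join( " | ", @found ), "\n";
```

The unbalanced tail "<unclosed" is skipped because its closing bracket never arrives.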

Example

#!/usr/bin/env perl
#===============================================================================
#
#         FILE: testRegex.pl
#
#        USAGE: ./testRegex.pl
#
#  DESCRIPTION:
#
#      OPTIONS: ---
# REQUIREMENTS: ---
#         BUGS: ---
#        NOTES: ---
#       AUTHOR: Ye Zhaoliang (Ye Zhaoliang), zl_ye@qny.chng.com.cn
# ORGANIZATION:
#      VERSION: 1.0
#      CREATED: 2022-02-19 0:36:50
#     REVISION: ---
#===============================================================================

use strict;
use warnings;
use utf8;
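# Variable-length lookbehind is experimental and needs Perl 5.30 or later;
# older versions do not know this warnings category.
no warnings qw(experimental::vlb);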

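## PerlSupport is a very good Vim plugin
## \ra sets the command-line arguments
## \rs runs a syntax check
## \ry formats the program
## \rr runs the program
## \rt saves the buffer to a file with a timestamp
## \rk shows the configuration
#
# At file scope there is no @_, so take the first argument from @ARGV.
my $arg = shift(@ARGV) // '';
print "command line argument is $arg \n";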
my $he = "abc df jje";
print "Hello regex\n"           if $he =~ m/\b\d{2}\b/xm;
print "Hello regex \\w works\n" if $he =~ m/\b\w{2}\b/xm;

my $html = "<head>nothing</head>";

#print "label is " if $html=~m/(?<=<(\w{0,20})>).{0,20}(?=<\/\1)/;
print "label is $1    content is $2\n" if $html =~ m/
                                                (?<=<(\w{0,20})>)
                                                (.{0,20})
                                                (?=<\/\1>)
                                                /x;

print "label2 is $1  content is $2\n" if $html =~ m/
                                                (?<=<(?<groupname>\w{0,20})>)
                                                (.{0,20})
                                                (?=<\/\k<groupname>>)
                                                /x;

my $line1 =
    "Score = 161 bits (409), Expect = 1e-43, Method: Compositional matrix adjust.";
my $line2 =
    "Identities = 141/471 (29%), Positives = 227/471 (48%), Gaps =41/471 (8%)";

if ( $line1 =~
     m/Score\s*=\s*([\d\.\,]+)\s*(?:bits|Bits)\s*\(\d+\)\,\s*Expect\s*=\s*(\d+e-\d+)\.*/xm
    )
{
    print "$1 \t $2\n";
}

if ( $line2 =~
     m/Identities\s*=\s*([\d\,\.]+)(?#141)\/([\d\,\.]+)\s*\((\S+)\%\).*/xm )
{
    print "$1 \t $2 \t $3\n";
}
叶昭良
Engineer of offshore wind turbine technique research

My research interests include distributed energy, wind turbine power generation techniques, computational fluid dynamics and programmable matter.