Batch-renaming PDF files and creating wiki links

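Tools like bash, awk, and sed are good at doing exactly the one job you need, but each also has plenty of shortcomings. I used two tasks, batch-renaming files and batch-exporting links into a vimwiki wiki in the form [[local:file path]], to consolidate what I have learned about Linux commands. Treat this as homework (a learning process).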

Extracting paths for use as quick links in vimwiki

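If IFS contains the space character (as the default does), a long file name with spaces is split into several words and shows up on several lines, so IFS is set to IFS=$'\n' here.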
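The effect of IFS on word splitting can be seen in isolation. The following is a minimal sketch, not part of the original script: `count_words` and the two sample paths are hypothetical names made up for illustration.

```shell
#!/usr/bin/env bash
# Count the words the shell produces when it splits $1 using the separators in $2.
count_words() {
    local IFS=$2
    set -f          # disable globbing so only word splitting is observed
    set -- $1       # unquoted on purpose: this is where IFS-based splitting happens
    set +f
    echo "$#"
}

# Two hypothetical paths, one per line, each containing spaces.
names=$'./dir a/file one.pdf\n./dir b/file two.pdf'

count_words "$names" $' \t\n'   # default-style IFS: every space splits as well, prints 6
count_words "$names" $'\n'      # newline only: one word per path, prints 2
```

With the newline-only setting, each path produced by find stays in one piece inside a for loop, as long as no file name itself contains a newline.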

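The pattern [a-z]* replaces the original *, so that the entry consisting of just a dot is left out. echo and a pipe pass each path to sed or awk. In sed, the position anchors ^ and $ can be used to substitute at the start and end of a line. If an entry is a directory, its name is rewritten and the PDF files under it are walked; a nested for loop controls the two levels.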
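The anchor-based substitutions can be tried on a single path first. This is a small sketch under stated assumptions: `to_wiki_link` is a hypothetical helper, and the base directory is passed in as an argument rather than hard-coded.

```shell
#!/usr/bin/env bash
# Turn a find-style relative path ($1) into a vimwiki local link rooted at $2.
# Assumption: $2 contains no "|", "&" or backslash, which are special to sed here.
to_wiki_link() {
    local base=$2
    # ^\. matches only the leading dot; ^ and $ are zero-width anchors, so the
    # last two expressions insert text at the start and at the end of the line.
    printf '%s\n' "$1" | sed -e "s|^\.|$base|" -e 's|^|[[local:|' -e 's|$|]]|'
}

to_wiki_link "./harmonic power system/ch1.pdf" "F:/ScienceBase.Attachments/WindEnergy"
# prints [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch1.pdf]]
```

Using | as the sed delimiter avoids having to escape every / in the base path.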

IFS=$'\n'   # split find output on newlines only, so names with spaces stay whole
count=1
countDir=1
specialCharacter='pages'
generateChapter() # @Description : print a heading per directory and a numbered link per PDF
                  # @usage       : generateChapter
{
    # find prints paths relative to the current directory, so there is no need to cd into each one
    for var2 in $(find . -name "[a-z]*")
    do
        if [[ -d $var2 ]] # true only for directories
        then
            # replace the leading dot with the Windows base path, then wrap the result in [[local:...]]
            var=$(echo "$var2" | sed 's/^\./F:\/ScienceBase.Attachments\/WindEnergy/' | sed 's/^/[[local:/' | sed 's/$/]]/')
            printf '= %s. [ ] %s =\n' "$countDir" "$var"
            countDir=$((countDir + 1))

            for tempVar in $(find "$var2" -name "*.pdf")
            do
                temp1=$(echo "$tempVar" | sed 's/^\./F:\/ScienceBase.Attachments\/WindEnergy/')
                varr=$(echo "$temp1" | sed 's/^/[[local:/' | sed 's/$/]]/')
                printf '\t%s. [ ] %s\n' "$count" "$varr"
                count=$((count + 1))
            done
            count=1   # restart the numbering for the next directory
        fi
    done
}

generateChapter

Removing unneeded special characters from file names and renaming

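  1. Remove unneeded information such as (pages 110–30) from the PDF file names.
  2. Use awk's printf to produce a comma-separated string, then use xargs -d, mv to split it on the comma and rename the file (of the several approaches I tried, only this one worked).
  3. xargs -n 2 takes the arguments two at a time, split on whitespace, and runs the command once per pair.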
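The pairing done by xargs -n 2 can be checked safely by putting echo in front of mv, so the commands are printed instead of executed. The file names below are made up for illustration.

```shell
# Each output line is one invocation, with exactly two arguments after "mv".
printf '%s\n' old1.pdf new1.pdf old2.pdf new2.pdf | xargs -n 2 echo mv
# prints:
# mv old1.pdf new1.pdf
# mv old2.pdf new2.pdf
```

Because the default splitting is on whitespace, a name containing a space would be counted as two arguments, which is why the script relies on an explicit delimiter with -d (a GNU xargs option).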
#!/bin/bash -
#===============================================================================
#
#          FILE: b.sh
#
#         USAGE: ./b.sh
#
#   DESCRIPTION: 
#
#       OPTIONS: ---
#  REQUIREMENTS: ---
#          BUGS: ---
#         NOTES: ---
#        AUTHOR: Ye Zhao Liang (Vimer), zhaoturkkey@163.com
#  ORGANIZATION: BrokenSun
#       CREATED: 2017/7/4 23:01:31
#      REVISION:  ---
#===============================================================================

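IFS=$'\n'   # split find output on newlines only
count=1
countDir=1
specialCharacter='pages'
generateChapter() # @Description : strip the "(pages ...)" suffix from PDF names in each issue directory
                  # @usage       : generateChapter
{
    # find prints paths relative to the current directory
    for var2 in $(find . -name "windEnergy201*")
    do
        if [[ -d $var2 ]] # true only for directories
        then
            cd "$var2"
            for var in $(find . -name "*")
            do
                # for names containing "pages", awk prints "old,new" and pipes it to
                # xargs -d, (GNU xargs), which splits on the comma and runs: mv old new
                # the hard-coded 22 is meant to cover the trailing "(pages NNN–NNN).pdf" part
                echo "$var" | awk '/pages/{printf("%s,%s",$0,substr($0,0,length($0)-22)".pdf")|"xargs -d, mv ";}'
            done
            cd ..
        fi
    done
}

generateChapter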

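Note: the check command from tip 1 under "Tricks learned" can be used to confirm that every rename went through. When a file name itself contains a comma, the pages suffix is usually left in place, because xargs splits on that comma as well. The improvement is to use a semicolon as the separator instead.

Improved code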

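# the semicolon is quoted so that the shell started by awk does not treat it as a command separator
for var in `find . -name "*"`;do echo $var|awk '/pages/{printf("%s;%s",$0,substr($0,0,length($0)-22)".pdf")|"xargs -d \";\" mv ";}';done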
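A delimiter-free variant sidesteps the separator problem altogether. The following is a sketch, not the method used above: it assumes every unwanted suffix has the form " (pages ...)" directly before ".pdf", `strip_pages` is a hypothetical helper, and the loop only prints the mv commands as a dry run.

```shell
#!/usr/bin/env bash
# Remove a trailing " (pages ...)" group that sits right before ".pdf".
strip_pages() {
    local name=$1
    case $name in
        *" (pages "*").pdf") printf '%s.pdf\n' "${name%" (pages "*}" ;;
        *)                   printf '%s\n' "$name" ;;
    esac
}

# Dry run: print the mv commands instead of executing them.
# -print0 together with read -d '' keeps commas, semicolons, and spaces intact.
find . -type f -name '*(pages *).pdf' -print0 |
while IFS= read -r -d '' f; do
    printf 'mv -- %q %q\n' "$f" "$(strip_pages "$f")"
done
```

Since the pattern removes whatever follows " (pages ", the length of the page range no longer matters, and names containing commas are handled like any others.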

Final result

= 1. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system]] =
  1. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch1.pdf]]
  2. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch10.pdf]]
  3. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch11.pdf]]
  4. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch12.pdf]]
  5. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch13.pdf]]
  6. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch14.pdf]]
  7. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch15.pdf]]
  8. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch16.pdf]]
  9. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch2.pdf]]
  10. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch3.pdf]]
  11. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch4.pdf]]
  12. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch5.pdf]]
  13. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch6.pdf]]
  14. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch7.pdf]]
  15. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch8.pdf]]
  16. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/ch9.pdf]]
  17. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/fmatter.pdf]]
  18. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/index.pdf]]
  19. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/harmonic power system/scard.pdf]]
= 2. [ ] [[local:F:/ScienceBase.Attachments/WindEnergy/Offshore Wind Energy Generation Control, Protection, and Integration to Electrical Systems/offshoreWindEnergy]] =

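#Tricks learned

awk has two ways to express a condition. When an if/else chain is written on one line, the statements must be separated by semicolons; without them awk reports a syntax error.

Mind the semicolons!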

awk '{if ($1==1) print "A"; else if ($1==2) print "B"; else print "C"}'

The bash counterpart uses the if/then/else/fi form, and statements on separate lines need no semicolons between them.


 for var in `find . -name "*"`
    do
        if [[ -d  $var ]] # true when $var is a directory
        then
            printf '%s\n' "$var" # directories are printed flush left
        else
            printf '\t%s\n' "$var" # all other entries are indented with a tab
        fi

    done

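In awk, the pattern form '/pages/{...}' is equivalent to '{if($0~/pages/){...}}'. The command below also serves as a check on the rename script above: it lists the PDF files whose names still contain pages.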

YeZhao@DESKTOP-YeZhao /cygdrive/f/ScienceBase.Attachments/WindEnergy
$ find . -name "*"|awk '{if($0~/pages/){print $0}}'
./windEnergy2009-i6/Characterizing future large, rapid changes in aggregated wind power using Numerical Weather Prediction spatial fields (pages 542–555).pdf
./windEnergy2012-i1/Modeling wake effects in large wind farms in complex terrain the problem, the methods and the issues (pages 161–182).pdf
./windEnergy2012-i2/The Betz–Joukowsky limit on the contribution to rotor aerodynamics by the British, German and Russian scientific schools (pages 335–344).pdf
./windEnergy2012-i3/Computational fluid dynamics simulation of the aerodynamics of a high solidity, small-scale vertical axis wind turbine (pages 349–361).pdf
./windEnergy2012-i3/Correction factors for NRG #40 anemometers potentially affected by dry friction whip characterization, analysis, and validation (pages 489–502).pdf
./windEnergy2012-i4/Analysis of wake measurements from the ECN Wind Turbine Test Site Wieringermeer, EWTW (pages 575–591).pdf
./windEnergy2012-i5/Atmospheric stability and turbulence fluxes at Horns Rev—an intercomparison of sonic, bulk and WRF model data (pages 717–731).pdf
./windEnergy2013-11/Modeling, simulation and control of a wind turbine with a hydraulic transmission system (pages 1259–1276).pdf
./windEnergy2013-8/Indicial lift response function an empirical relation for finite-thickness airfoils, and effects on aeroelastic simulations (pages 681–693).pdf
./windEnergy2013-8/Simulating the dynamics of wind turbine blades part I, model development and verification (pages 694–710).pdf
./windEnergy2013-8/Simulating the dynamics of wind turbine blades part II, model validation and uncertainty quantification (pages 741–758).pdf
...

#2. awk的 BEGIN

function name()
{}

BEGIN{
}
{
    
}
END{

}
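To see when each part of this skeleton runs, here is a small sketch with made-up input; `sum_squares` and `square` are hypothetical names used only for illustration.

```shell
# Sum the squares of the numbers read from standard input, one per line.
sum_squares() {
    awk '
    function square(x) { return x * x }       # functions sit outside all blocks
    BEGIN { total = 0 }                       # runs once, before any input is read
          { total += square($1) }             # runs once per input line
    END   { print NR " lines, sum of squares = " total }   # runs once, after the last line
    '
}

printf '%s\n' 3 4 5 | sum_squares
# prints: 3 lines, sum of squares = 50
```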

#3. awk gsub

echo "a b c 2011-11-22 a:d" | awk 'gsub(/-/,"",$4)'

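#4. Defining awk variables

A variable can be defined in two ways: (1) inside the BEGIN block, or (2) on the command line with awk -v, which suits one-liners.

awk's built-in variables include FS, OFS, NR, FNR, NF, $0, $1, $2, ARGC, ARGV, and others.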
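Both ways of setting a variable, plus a few of the built-ins, fit in one short sketch. The function name `show_fields`, the variable `prefix`, and the sample input are assumptions made for illustration.

```shell
# Print a label, the field count, and the first field of each comma-separated line.
show_fields() {
    awk -v prefix="$1" 'BEGIN { FS = ","; OFS = " | " }   # FS and OFS are set inside BEGIN
    { print prefix NR, NF, $1 }'                            # prefix comes from -v
}

printf 'a,b,c\nd,e\n' | show_fields row
# prints:
# row1 | 3 | a
# row2 | 2 | d
```

NR is the running line number, NF the number of fields on the current line, and the commas in print are replaced by OFS on output.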

#5. Defining awk functions

An awk function definition sits outside the BEGIN{}, {}, and END{} blocks, at the same level as them.

#!/usr/bin/awk -f
#===============================================================================
#
#          File:  func.awk
# 
#   Description:  awk -f func.awk file
#           the input file contains the value 400
# 
#   VIM Version:  7.0+
#        Author:  Ye Zhao Liang (Vimer), zhaoturkkey@163.com
#  Organization:  BrokenSun
#       Version:  1.0
#       Created:  2017/7/5 16:06:33
#      Revision:  ---
#       License:  Copyright (c) 2017, Ye Zhao Liang
#===============================================================================
# 
function b()
{
print "b.in.$1="$1;
}
{
v=100; y=200
print "a.in.v="v;
print "a.in.y="y;

a(y);
b();
print "a.out.v="v;
print "a.out.y="y;
}


function a(y)
{
print "(a)v="v;
v=v+$1+y;
y=300;
}

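#6. Four forms of bash variable trimming

${var#.*} scans from the left and removes the shortest match of the pattern after #
${var##.*} scans from the left and removes the longest match of the pattern after ##
${var%.*} scans from the right and removes the shortest match of the pattern after %
${var%%.*} scans from the right and removes the longest match of the pattern after %%

The pattern is a shell glob, not a regular expression, so .* here means a literal dot followed by any characters.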

In awk, substr($1,1,length($1)-..) gives a similar result (awk string positions start at 1).
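The four forms are easiest to compare on one value. The sketch below uses the glob *. for the prefix forms and .* for the suffix forms; the sample path is made up.

```shell
path="archive/paper.v2.tar.gz"

echo "${path#*.}"    # shortest prefix ending in a dot removed: v2.tar.gz
echo "${path##*.}"   # longest prefix ending in a dot removed:  gz
echo "${path%.*}"    # shortest suffix starting at a dot removed: archive/paper.v2.tar
echo "${path%%.*}"   # longest suffix starting at a dot removed:  archive/paper
```

The single-character forms stop at the nearest dot and the doubled forms at the farthest one, which is why ${path##*.} yields the extension and ${path%.*} drops it.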

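#7. Containment tests in bash

Containment: a larger whole contains a smaller part (member). Equivalence: two things are equal. Comparison: usually between two numbers, but strings can be compared as well.

Testing containment in bash: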

strA="helloworld"
strB="low"
if [[ $strA =~ $strB ]]
then
    echo "contains"
else
    echo "does not contain"
fi
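The =~ operator treats its right-hand side as a regular expression, so characters such as . or * in the substring take on special meaning. Two literal alternatives are sketched here; `contains` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return success when $2 occurs literally inside $1 (works in any POSIX shell).
contains() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

strA="helloworld"
strB="low"

# Glob match inside [[ ]]: quoting "$strB" makes it a literal substring.
if [[ $strA == *"$strB"* ]]; then echo "glob: contains"; fi

# The same test through the helper.
if contains "$strA" "$strB"; then echo "case: contains"; fi
```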

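#8. Trimming leading and trailing whitespace in awk

Tip 5 covered how functions are defined; here they are put to use for trimming whitespace in awk. During one run I found that every file name had picked up an extra space after its extension, so I tried to strip it with awk.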

function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
BEGIN{
        FS=","
}

{
        $0 = rtrim($0);
        if($2!="-" && $3=="-")
                a[$4]++;
        {
        if($4!="-")
                b[$4]++;
        else
                b[$5]++;
        }
}

END{
        print "   client    incr_num_day";
        for(i in a) printf("%10s   %d\n",i,a[i])
        print "\n\n   client    all_num";
        for(j in b) printf("%10s   %d\n",j,b[j]);
}
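The trim functions can be exercised on their own before being built into a larger report. In this sketch the wrapper `trim_line` and the sample file name are assumptions; the three awk functions are the ones defined above.

```shell
# Print each input line with surrounding whitespace removed, in brackets
# so that any leftover space would be visible.
trim_line() {
    awk '
    function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
    function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
    function trim(s)  { return rtrim(ltrim(s)) }
    { print "[" trim($0) "]" }'
}

printf '  report.pdf \n' | trim_line
# prints: [report.pdf]
```

Because awk passes strings to functions by value, sub() changes only the local copy, so the result has to be returned and assigned.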

#9. Calling system commands from awk

Method

I. Preparation:

touch c.txt
touch d.txt

II. Contents of a.txt:

c.txt
d.txt

III. Code:

awk '{cmd="rm "$0;system(cmd)}' a.txt   
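system() returns the exit status of the command it ran, which makes it possible to count failures. The sketch below also quotes each name so that spaces survive; `remove_listed` is a hypothetical helper, the demonstration runs in a scratch directory, and names containing a double quote are assumed not to occur.

```shell
#!/usr/bin/env bash
# Delete every file named in the list file $1 (one name per line) and report failures.
remove_listed() {
    awk '{
        cmd = "rm -- \"" $0 "\""           # quote the name so spaces survive the shell
        if (system(cmd) != 0) failed++     # system() returns the exit status of cmd
    }
    END { printf "%d failed\n", failed }' "$1"
}

workdir=$(mktemp -d)                       # scratch directory for the demonstration
cd "$workdir" || exit 1
touch c.txt "d e.txt"
printf '%s\n' c.txt "d e.txt" > a.txt
remove_listed a.txt
# prints: 0 failed
```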

#10. Redirection and pipes in awk

Sometimes a pipe can be used directly inside awk to hand output to a shell command, for example print | "sort":


awk '{print $1, $2 | "sort" }'
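All print statements that name the same command string share one process, and close() waits for it to finish. A small sketch with made-up input follows; `sorted_then_done` is a hypothetical name.

```shell
# Sort the first two fields of every line, then print a marker after the sorted output.
sorted_then_done() {
    awk '
    { print $1, $2 | "sort" }      # every line goes to the same sort process
    END {
        close("sort")              # wait for sort to finish and write its output
        print "done"
    }'
}

printf '%s\n' 'pear 3' 'apple 7' 'fig 1' | sorted_then_done
# prints:
# apple 7
# fig 1
# pear 3
# done
```

Without the close() call, the order of the marker and the sorted lines is not something to rely on.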

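#11. Running scripts under Cygwin on Windows

The script must first be converted with dos2unix:

dos2unix.exe <script name>
dos2unix.exe a.sh
dos2unix.exe func.awk


Only then will the script run correctly, because dos2unix converts the Windows CRLF line endings into the LF endings the shell expects.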
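Whether a script still carries CRLF endings can be checked before running it. This is a sketch under assumptions: `has_crlf` is a hypothetical helper, and tr is shown only as a fallback for machines where dos2unix is not installed.

```shell
#!/usr/bin/env bash
# Succeed when at least one line of file $1 ends with a carriage return.
has_crlf() {
    grep -q $'\r$' "$1"
}

tmp=$(mktemp)                          # scratch file standing in for a script
printf 'echo hi\r\n' > "$tmp"          # written with a Windows-style line ending

has_crlf "$tmp" && echo "CRLF line endings found"

tr -d '\r' < "$tmp" > "$tmp.unix"      # fallback conversion: drop every carriage return
has_crlf "$tmp.unix" || echo "clean after conversion"
```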

#12. awk is faster than the shell

Reference: link


Performance comparison

[chengmo@localhost nginx]# time (awk 'BEGIN{ total=0;for(i=0;i<=10000;i++){total+=i;}print total;}')
50005000

real    0m0.003s
user    0m0.003s
sys     0m0.000s
[chengmo@localhost nginx]# time(total=0;for i in $(seq 10000);do total=$(($total+i));done;echo $total;)
50005000

real    0m0.141s
user    0m0.125s
sys     0m0.008s 

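Conclusion: arithmetic runs faster in awk than in bash; in this test the awk loop took about 0.003 s against about 0.141 s for the bash loop.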

叶昭良
Engineer of offshore wind turbine technique research

My research interests include distributed energy, wind turbine power generation techniques, computational fluid dynamics, and programmable matter.