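Splitting a large file into Excel tables with Perl

Background

In wind resource analysis, we often need to export wind resource information such as wind speed, wind direction and turbulence intensity for each sector at met masts and turbine positions. The export lumps everything into one large file that is awkward to analyze, so the file needs to be split.

The data file follows some regular patterns and contains key markers. See the raw data fragment below (a wind vertical profile exported from a WindSim calculation).

The approach is suitable for any wind farm!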


WECS number        :          1
WECS name          : T12
Manufacturer       : GW121.1
Type               : gw
Nominal effect     :       2500
Hub height         :       90.0
x (local)          :     5452.0
y (local)          :     4103.5
x (global)         : 20380130.0
y (global)         :  4475007.5



sector:     0

  k       z-coord        UCRT         VCRT         WCRT      Speed_2D       Inflow        Shear    Shear_low   Shear_high           KE           TI        alpha           EP
 (-)        (m)          (m/s)        (m/s)        (m/s)        (m/s)        (deg)        (1/s)        (1/s)        (1/s)    (m*2/s*2)          (%)          (-)    (m*2/s*3)
   1        4.546        0.076       -3.410       -0.093        3.411       -1.557        0.316        0.750        0.099        0.325       19.289        0.212       0.0165
   2       13.636        0.095       -4.306       -0.158        4.307       -2.107        0.072        0.099        0.045        0.364       16.171        0.195       0.0069

   ....

WECS number        :         23
WECS name          : T18
Manufacturer       : GW121.1
Type               : gw
Nominal effect     :       2500
Hub height         :       90.0
x (local)          :     4920.0
y (local)          :     5961.5
x (global)         : 20379598.0
y (global)         :  4476865.5



sector:     0

  k       z-coord        UCRT         VCRT         WCRT      Speed_2D       Inflow        Shear    Shear_low   Shear_high           KE           TI        alpha           EP
 (-)        (m)          (m/s)        (m/s)        (m/s)        (m/s)        (deg)        (1/s)        (1/s)        (1/s)    (m*2/s*2)          (%)          (-)    (m*2/s*3)
   1        4.545       -0.043       -3.503       -0.156        3.503       -2.558        0.304        0.771        0.071        0.343       19.305        0.154       0.0180


sector:   337

  k       z-coord        UCRT         VCRT         WCRT      Speed_2D       Inflow        Shear    Shear_low   Shear_high           KE           TI        alpha           EP
 (-)        (m)          (m/s)        (m/s)        (m/s)        (m/s)        (deg)        (1/s)        (1/s)        (1/s)    (m*2/s*2)          (%)          (-)    (m*2/s*3)
   1        4.545        1.902       -3.359       -0.050        3.860       -0.744        0.325        0.849        0.064        0.412       19.205        0.127       0.0235
   2       13.636        1.984       -3.970       -0.124        4.438       -1.595        0.045        0.064        0.025        0.418       16.813        0.113       0.0076
   3       22.727        1.986       -4.226       -0.166        4.670       -2.035        0.022        0.025        0.018        0.450       16.586        0.101       0.0046

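Raw data analysis

The analysis shows that the data has the following characteristics:

  1. A "WECS name" line marks a new turbine position.
  2. Each turbine position has many sectors; each one starts with a line beginning with "sector", with the label and the value separated by a colon.
  3. To extract the coordinates, a "global" tag is added; it picks up the x and y values, which can be used in the output file name.
  4. A "WECS number" line serves as the stop-printing marker.
  5. Everything from the line below "sector" to the end of the data block [before the next sector or the next WECS header] must be saved to one file.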
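
The header and sector lines listed above all share a "label : value" layout, so they can be told apart from data rows by splitting at the first colon. The following is a minimal sketch under that assumption; `parse_key_value` is a hypothetical helper that does not appear in the script below.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "label : value" line at the first colon.
# Returns an empty list when the line has no colon (i.e. a data row).
sub parse_key_value {
    my ($line) = @_;
    my ($key, $value) = split /:/, $line, 2;
    return unless defined $value;            # no colon: not a label/value line
    for ($key, $value) { s/^\s+|\s+$//g }    # trim both ends of each part
    return ($key, $value);
}

my @sample = (
    'WECS name          : T12',
    'x (global)         : 20380130.0',
    'sector:     0',
    '   1        4.546        0.076',
);

for my $line (@sample) {
    my ($key, $value) = parse_key_value($line);
    if (defined $key) {
        print "$key => $value\n";
    } else {
        print "data row\n";
    }
}
```

On the sample lines this reports the three label/value pairs and classifies the last line as a data row.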

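Technical requirements

  1. File naming convention: "turbine name-sector name.csv".
  2. When does the turbine name change? When a WECS name line is encountered. Because the x and y values must also be extracted into headName, a skip flag is introduced:
    1. skip==1: inside a sector's data range, lines are printed to the output file [used to extract the core data]
    2. skip==2: at a new turbine position [used to extract the x, y coordinates]
    3. skip==3: printing has stopped because WECS number was encountered [used to cut off data printing]
  3. Once the sector name is known, open the output file for that sector.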
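
The three skip values above behave like a small state machine. The sketch below isolates just the transitions, assuming the same trigger strings as the script; the constant names and the `next_state` function are hypothetical and only serve to make the states readable.

```perl
use strict;
use warnings;

# Hypothetical names for the skip values described above.
use constant { PRINTING => 1, IN_HEADER => 2, STOPPED => 3 };

# Return the new state given the current state and one input line.
sub next_state {
    my ($state, $line) = @_;
    return STOPPED   if $line =~ /WECS number/;                     # new turbine block: stop printing
    return IN_HEADER if $line =~ /WECS name/ && $state == STOPPED;  # start collecting name and coordinates
    return PRINTING  if $line =~ /z-coord/;                         # column header: start printing data
    return $state;                                                  # any other line keeps the state
}

my $state = 0;    # 0 = nothing seen yet
for my $line ('WECS number : 1', 'WECS name : T12', 'x (global) : 20380130.0',
              'k z-coord UCRT', '1 4.546 0.076') {
    $state = next_state($state, $line);
    print "$state  $line\n";
}
```

Keeping the transitions in one function makes it easier to check that a data row can only be printed after a column header has been seen.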

Implementation

The following tag variables were defined for this purpose:

my  $specialSectorTag="sector";
my  $specialHeadTag="WECS name";
#my  $specialHeadTag="Clim name";
## Sometimes, there are chinese ()
my  $specialHeadTag1="global";
my  $specialPrintTag="z-coord";
my  $specialStopPrintTag="WECS number";
#my  $specialStopPrintTag="Climatology number";
my  $skip=0;

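SectorTag detects a sector line; at that point the file name is generated and the file handle is created. There can be several HeadTags. Currently only two pieces of header information are extracted, the turbine name (WECS name) and the global coordinates, so HeadTag and HeadTag1 are defined. In principle any number of them can be added.

PrintTag starts printing the core data; StopPrintTag stops it.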
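
Since the number of head tags is open-ended, one possible variation is to keep them in a list and build a single pattern from it. This is only a sketch of that idea, not what the script below does; `@head_tags`, `$head_re` and `build_head_name` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical list of head tags; add more entries as needed.
my @head_tags = ('WECS name', 'global');

# Build one alternation, quoting each tag so it is matched literally.
my $head_re = join '|', map { quotemeta($_) } @head_tags;

# Collect the values of all matching header lines and join them with "-".
sub build_head_name {
    my (@lines) = @_;
    my @parts;
    for my $line (@lines) {
        next unless $line =~ /$head_re/;
        my (undef, $value) = split /:/, $line, 2;
        next unless defined $value;
        $value =~ s/^\s+|\s+$//g;    # trim the value
        push @parts, $value;
    }
    return join '-', @parts;
}

print build_head_name(
    'WECS name          : T12',
    'Hub height         :       90.0',
    'x (global)         : 20380130.0',
    'y (global)         :  4475007.5',
), "\n";
```

With the header lines from the first turbine in the sample data, this yields the name part followed by the two global coordinates, which matches the prefix the script uses for its output file names.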

Full source code

#!/usr/bin/env perl
#===============================================================================
#
#         FILE: forSplitWindSim.pl
#
#        USAGE: ./forSplitWindSim.pl
#
#  DESCRIPTION: Split a WindSim vertical profile export into one CSV file
#               per turbine and sector
#
#      OPTIONS: ---
# REQUIREMENTS: ---
#         BUGS: ---
#        NOTES: ---
#       AUTHOR: Ye Zhaoliang (Ye Zhaoliang), yezhaoliang@ncepu.edu.cn
# ORGANIZATION:
#      VERSION: 1.0
#      CREATED: 2019-10-15 14:31:13
#     REVISION: ---
#===============================================================================

use strict;
use warnings;
use utf8;

my  $specialSectorTag="sector";
my  $specialHeadTag="WECS name";
#my  $specialHeadTag="Clim name";
## Sometimes, there are chinese ()
my  $specialHeadTag1="global";
my  $specialPrintTag="z-coord";
my  $specialStopPrintTag="WECS number";
#my  $specialStopPrintTag="Climatology number";
my  $skip=0;

#my	$INPUTDATA_file_name = 'D:/vertical_profile_wecs_cosMount.dat';		# input file name
my  $INPUTDATA_file_name = 'C:\Users\yezhaoliang\Desktop\work\AIWind\processWindSim\HuaiLaiTurbineZhuFengxiang\vertical_profile_wecs.dat';    # input file name

open  my $INPUTDATA, '<', $INPUTDATA_file_name
    or die  "$0 : failed to open  input file '$INPUTDATA_file_name' : $!\n";

my  $filename="";
my  $headName="";
my  $sectorName="";
my $INPUTDATAsector;
my	$INPUTDATAsector_file_name = "";		# input file name

while ( <$INPUTDATA> ) {

    ## Skip empty lines
    next if ($_=~/^\s*$/);
    chomp();
    my  $line=trim($_);
    #my  @columns=split(/\s+|:/,$line);
    # Every file has its own pattern; adjust the split to fit your data.
    my  @columns=split(/:/,$line);
    #print $#columns,"\n";    # debug: index of the last column
    #print "$.: $_\n" if($_=~/kpoint/);

    ##  Test commands: uncomment these when trying out a new tag,
    ##   to find the pattern that suits your file
    #print "$.: Sector $columns[0]  $columns[1] \n" if($line=~/$specialSectorTag/);
    #print "$.: Name     $columns[0]  $columns[1] \n" if($line=~/$specialHeadTag/);
    #print "$.: X     $columns[0]  $columns[1] \n" if($line=~/$specialHeadTag1/);




    ## Control printing line by line according to the skip flag:
    ## $specialPrintTag marks where printing starts,
    ## $specialStopPrintTag marks where printing ends
    if ( $line =~/$specialHeadTag|$specialHeadTag1/ && $skip==3) {

        $headName = trim($columns[1]);  ## update the headName
        $skip = 2;
    }elsif(( $line =~/$specialHeadTag|$specialHeadTag1/ && $skip==2)){

        $headName = join("-",trim($headName),trim($columns[1])); ## Append to the headName
    }

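    if ( $line =~/$specialSectorTag/ ) {

        $sectorName = trim($columns[1]);
        $INPUTDATAsector_file_name = "D://$headName-$sectorName.csv";		# output file name
        ## Re-opening the handle implicitly closes the previous sector file
        open  $INPUTDATAsector, '>', $INPUTDATAsector_file_name
            or die  "$0 : failed to open  output file '$INPUTDATAsector_file_name' : $!\n";

        ## Pause printing so the sector line itself is not written to the new file;
        ## the z-coord line below switches printing back on
        $skip=0;
    }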

    if ( $line =~ /$specialPrintTag/) {
        $skip=1;
    }

    ### A new turbine position starts here: stop printing
    if ( $line =~ /$specialStopPrintTag/) {
        $skip=3;

    }

    if ( $skip ==1 ) {
        $line =~ s/\s+/,/g;  ### replace whitespace runs with commas (CSV delimiter)
        print $INPUTDATAsector $line,"\n";

    }


}
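## Close the last sector file, if any was opened
close  $INPUTDATAsector if defined $INPUTDATAsector;
close  $INPUTDATA
    or warn "$0 : failed to close input file '$INPUTDATA_file_name' : $!\n";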

sub ltrim { my $s = shift; $s =~ s/^\s+//;       return $s };
sub rtrim { my $s = shift; $s =~ s/\s+$//;       return $s };
sub  trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s };
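
The script turns a data row into CSV by trimming the line and then replacing each run of whitespace with a comma. An equivalent sketch uses `split ' '` and `join`, which ignores leading whitespace without needing the trim step; `row_to_csv` is a hypothetical helper name, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one whitespace-separated row to a CSV line.
# split ' ' (a single-space string) skips leading whitespace and splits
# on any run of whitespace.
sub row_to_csv {
    my ($line) = @_;
    return join ',', split ' ', $line;
}

print row_to_csv('   1        4.546        0.076       -3.410'), "\n";
```

Because the values in this export contain no commas or quotes, a plain join is enough here; data with embedded commas would need proper CSV quoting.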
叶昭良
Engineer of offshore wind turbine technique research

My research interests include distributed energy, wind turbine power generation techniques, computational fluid dynamics and programmable matter.