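Splitting a large file into Excel tables with Perl
Background
In wind resource analysis, we often need to export wind resource information such as wind speed, wind direction and turbulence intensity for met masts and turbine positions in each sector. After export, all of this ends up mixed together in one large file that is awkward to analyze, so it has to be split.
The data file follows a regular pattern and contains a few key markers. See the raw data excerpt below (a wind vertical profile exported from a WindSim calculation).
The approach works for any wind farm!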
WECS number : 1
WECS name : T12
Manufacturer : GW121.1
Type : gw
Nominal effect : 2500
Hub height : 90.0
x (local) : 5452.0
y (local) : 4103.5
x (global) : 20380130.0
y (global) : 4475007.5
sector: 0
k z-coord UCRT VCRT WCRT Speed_2D Inflow Shear Shear_low Shear_high KE TI alpha EP
(-) (m) (m/s) (m/s) (m/s) (m/s) (deg) (1/s) (1/s) (1/s) (m*2/s*2) (%) (-) (m*2/s*3)
1 4.546 0.076 -3.410 -0.093 3.411 -1.557 0.316 0.750 0.099 0.325 19.289 0.212 0.0165
2 13.636 0.095 -4.306 -0.158 4.307 -2.107 0.072 0.099 0.045 0.364 16.171 0.195 0.0069
....
WECS number : 23
WECS name : T18
Manufacturer : GW121.1
Type : gw
Nominal effect : 2500
Hub height : 90.0
x (local) : 4920.0
y (local) : 5961.5
x (global) : 20379598.0
y (global) : 4476865.5
sector: 0
k z-coord UCRT VCRT WCRT Speed_2D Inflow Shear Shear_low Shear_high KE TI alpha EP
(-) (m) (m/s) (m/s) (m/s) (m/s) (deg) (1/s) (1/s) (1/s) (m*2/s*2) (%) (-) (m*2/s*3)
1 4.545 -0.043 -3.503 -0.156 3.503 -2.558 0.304 0.771 0.071 0.343 19.305 0.154 0.0180
sector: 337
k z-coord UCRT VCRT WCRT Speed_2D Inflow Shear Shear_low Shear_high KE TI alpha EP
(-) (m) (m/s) (m/s) (m/s) (m/s) (deg) (1/s) (1/s) (1/s) (m*2/s*2) (%) (-) (m*2/s*3)
1 4.545 1.902 -3.359 -0.050 3.860 -0.744 0.325 0.849 0.064 0.412 19.205 0.127 0.0235
2 13.636 1.984 -3.970 -0.124 4.438 -1.595 0.045 0.064 0.025 0.418 16.813 0.113 0.0076
3 22.727 1.986 -4.226 -0.166 4.670 -2.035 0.022 0.025 0.018 0.450 16.586 0.101 0.0046
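Analysis of the raw data
The analysis shows that the data has the following features:
- A WECS name line marks a new turbine position
- Each turbine position has many sectors; each one starts with sector, and the label follows a colon
- To extract the coordinates, the global field is matched as well; the x and y values are extracted separately and can be used in the output file name
- A WECS number line is the marker for stopping output
- Everything from the line below a sector line to the end of its data (before the next sector or the next turbine header) has to be saved into one file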
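Before splitting, it can help to confirm that a file really follows this layout. The snippet below is a minimal sketch, not part of the original script: the helper name index_sectors is hypothetical, and it assumes the header lines look exactly like the excerpt above. It builds an index of which sectors belong to which turbine.

```perl
use strict;
use warnings;

# Hypothetical helper: build an index { turbine name => [sector labels] }
# from a list of lines in the layout shown in the excerpt.
sub index_sectors {
    my @lines = @_;
    my (%index, $turbine);
    for my $line (@lines) {
        if ($line =~ /^\s*WECS name\s*:\s*(\S+)/) {
            $turbine = $1;                    # a new turbine position starts here
            $index{$turbine} = [];
        }
        elsif (defined $turbine && $line =~ /^\s*sector\s*:\s*(\S+)/) {
            push @{ $index{$turbine} }, $1;   # sector label after the colon
        }
    }
    return \%index;
}

my @sample = (
    'WECS number : 23',
    'WECS name : T18',
    'x (global) : 20379598.0',
    'sector: 0',
    '1 4.545 -0.043 -3.503',
    'sector: 337',
    '1 4.545 1.902 -3.359',
);
my $index = index_sectors(@sample);
print "$_: @{ $index->{$_} }\n" for sort keys %$index;   # prints "T18: 0 337"
```

If a turbine shows up with no sectors, or a sector appears before any WECS name, the file probably needs different tag patterns.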
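Requirements
- File naming convention: "turbine name-sector name.csv"
- When does the turbine name change? When a WECS name line is encountered. Because the x and y values also have to be appended to headName, a skip flag is added
- skip==1 means we are inside a sector's data range and lines are printed to the file (used to extract the core data)
- skip==2 means we are at a new turbine position (used to extract the x, y coordinates)
- skip==3 means printing stops, triggered by WECS number (used to cut off data output)
- Once the sector name is known, the output file for that sector is opened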
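The flag logic above can be tried on a small fragment without touching the file system. The following is a minimal sketch under a few assumptions: the function name split_profile is hypothetical, results are collected in a hash instead of being written to files, and the flag is reset to 0 on a sector line (an assumption made here so that the sector line itself is not stored).

```perl
use strict;
use warnings;

# Sketch of the skip-flag logic; returns { "head-sector.csv" => [csv rows] }
# instead of writing files, so it can be run on a few sample lines.
sub split_profile {
    my @lines = @_;
    my ($skip, $head, $file, %out) = (0, '');
    for my $line (@lines) {
        next if $line =~ /^\s*$/;               # ignore empty lines
        $line =~ s/^\s+|\s+$//g;                # trim
        my (undef, $value) = split /\s*:\s*/, $line, 2;

        if ($line =~ /^WECS number/) {
            $skip = 3;                          # skip 3: stop printing
        }
        elsif ($line =~ /^WECS name/ && $skip == 3) {
            ($head, $skip) = ($value, 2);       # skip 2: collecting head fields
        }
        elsif ($line =~ /global/ && $skip == 2) {
            $head .= "-$value";                 # append x, then y
        }
        elsif ($line =~ /^sector/) {
            $file = "$head-$value.csv";
            $out{$file} = [];
            $skip = 0;                          # assumption: wait for the column header
        }
        elsif ($line =~ /z-coord/) {
            $skip = 1;                          # skip 1: inside a data block
        }
        if ($skip == 1) {
            (my $row = $line) =~ s/\s+/,/g;     # whitespace -> comma
            push @{ $out{$file} }, $row;
        }
    }
    return \%out;
}

my $files = split_profile(
    'WECS number : 23',
    'WECS name : T18',
    'x (global) : 20379598.0',
    'y (global) : 4476865.5',
    'sector: 0',
    'k z-coord UCRT',
    '1 4.545 -0.043',
    'sector: 337',
    'k z-coord UCRT',
    '1 4.545 1.902',
);
print "$_ => ", scalar @{ $files->{$_} }, " rows\n" for sort keys %$files;
```

Each key is the file name that would be created, and each value holds the column header line plus the data rows of that sector.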
Implementation
The following tag variables were defined for this purpose:
my $specialSectorTag="sector";
my $specialHeadTag="WECS name";
#my $specialHeadTag="Clim name";
## Sometimes, there are chinese ()
my $specialHeadTag1="global";
my $specialPrintTag="z-coord";
my $specialStopPrintTag="WECS number";
#my $specialStopPrintTag="Climatology number";
my $skip=0;
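SectorTag marks the line that carries the sector information; at that point the file name is built and the file handle is created. Several head tags can be defined. Currently only two header values are needed, the turbine name (WECS name) and the global coordinates, so HeadTag and HeadTag1 are defined; in principle there could be any number of them.
PrintTag starts the printing of the core data; StopPrintTag stops it.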
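If more head fields are needed later, one option is to keep the tags in a list rather than in numbered variables. This is only a sketch of that idea: @head_tags, $head_re and head_value are hypothetical names, not taken from the script.

```perl
use strict;
use warnings;

# Hypothetical generalization: keep the head tags in a list instead of
# $specialHeadTag, $specialHeadTag1, ...
my @head_tags = ('WECS name', 'global');
my $head_re   = join '|', map { quotemeta($_) } @head_tags;

# Return the trimmed value after the colon if the line carries a head tag;
# return nothing otherwise.
sub head_value {
    my ($line) = @_;
    return unless $line =~ /$head_re/;
    my (undef, $value) = split /:/, $line, 2;
    return unless defined $value;
    $value =~ s/^\s+|\s+$//g;
    return $value;
}

my @parts = map { head_value($_) } (
    'WECS name : T12',
    'Hub height : 90.0',
    'x (global) : 20380130.0',
    'y (global) : 4475007.5',
);
print join('-', @parts), "\n";   # prints "T12-20380130.0-4475007.5"
```

Adding a field such as Hub height to the file name then only requires one more entry in the list.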
Full source code
#!/usr/bin/env perl
#===============================================================================
#
# FILE: forSplitWindSim.pl
#
# USAGE: ./forSplitWindSim.pl
#
# DESCRIPTION:
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Ye Zhaoliang (Ye Zhaoliang), yezhaoliang@ncepu.edu.cn
# ORGANIZATION:
# VERSION: 1.0
# CREATED: 2019-10-15 14:31:13
# REVISION: ---
#===============================================================================
use strict;
use warnings;
use utf8;
my $specialSectorTag="sector";
my $specialHeadTag="WECS name";
#my $specialHeadTag="Clim name";
## Sometimes, there are chinese ()
my $specialHeadTag1="global";
my $specialPrintTag="z-coord";
my $specialStopPrintTag="WECS number";
#my $specialStopPrintTag="Climatology number";
my $skip=0;
#my $INPUTDATA_file_name = 'D:/vertical_profile_wecs_cosMount.dat'; # input file name
my $INPUTDATA_file_name = 'C:\Users\yezhaoliang\Desktop\work\AIWind\processWindSim\HuaiLaiTurbineZhuFengxiang\vertical_profile_wecs.dat'; # input file name
open my $INPUTDATA, '<', $INPUTDATA_file_name
or die "$0 : failed to open input file '$INPUTDATA_file_name' : $!\n";
my $filename="";
my $headName="";
my $sectorName="";
my $INPUTDATAsector;
my $INPUTDATAsector_file_name = ""; # output file name (one file per sector)
while ( <$INPUTDATA> ) {
## Jump out empty line
next if ($_=~/^\s*$/);
chomp();
my $line=trim($_);
#my @columns=split(/\s+|:/,$line);
# Every files have different patterns, you need to feel it! Modify it!
my @columns=split(/:/,$line);
#print $#columns,"\n"; # debug: index of the last column after splitting
#print "$.: $_\n" if($_=~/kpoint/);
## Test commands for specialTag1 when you are using newest word
## You need to find out the most comfortable pattern for you
#print "$.: Sector $columns[0] $columns[1] \n" if($line=~/$specialSectorTag/);
#print "$.: Name $columns[0] $columns[1] \n" if($line=~/$specialHeadTag/);
#print "$.: X $columns[0] $columns[1] \n" if($line=~/$specialHeadTag1/);
## Control printing with the $skip flag, evaluated line by line:
## tag the start print position with $specialPrintTag
## tag the end print position with $specialStopPrintTag
if ( $line =~/$specialHeadTag|$specialHeadTag1/ && $skip==3) {
$headName = trim($columns[1]); ## update the headName
$skip = 2;
}elsif(( $line =~/$specialHeadTag|$specialHeadTag1/ && $skip==2)){
$headName = join("-",trim($headName),trim($columns[1])); ## Append to the headName
}
if ( $line =~/$specialSectorTag/ ) {
$sectorName = trim($columns[1]);
$INPUTDATAsector_file_name = "D://$headName-$sectorName.csv"; # output file name
open $INPUTDATAsector, '>', $INPUTDATAsector_file_name
or die "$0 : failed to open output file '$INPUTDATAsector_file_name' : $!\n";
$skip=0; # pause printing until the next $specialPrintTag line, so the sector line itself is not written
}
if ( $line =~ /$specialPrintTag/) {
$skip=1;
}
### New Position for different sectors
if ( $line =~ /$specialStopPrintTag/) {
$skip=3;
}
if ( $skip ==1 ) {
$line =~ s/\s+/,/g; ### replace runs of whitespace with commas (CSV delimiter)
print $INPUTDATAsector $line,"\n";
}
}
close $INPUTDATA
or warn "$0 : failed to close input file '$INPUTDATA_file_name' : $!\n";
sub ltrim { my $s = shift; $s =~ s/^\s+//; return $s };
sub rtrim { my $s = shift; $s =~ s/\s+$//; return $s };
sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s };