Skip to content

Latest commit

 

History

History
340 lines (289 loc) · 11.8 KB

IPCC2021.md

File metadata and controls

340 lines (289 loc) · 11.8 KB

IPCC2021

0630

Baseline

Version Cost/s
g++ -std=c++11 SLIC.cpp -o SLIC + srun -p amd_256 -w fa0814 -n 1 ./SLIC p1 cost : 3351 ms
p2 cost : 283 ms
p3 cost : 21350 ms
p4 cost : 377 ms
Computing time=25364 ms

编译参数+p1的omp

Version Cost/s
g++ -std=c++11 SLIC.cpp -o SLIC -O3 + srun -p amd_256 -w fa0814 -n 1 ./SLIC p1 cost : 3048 ms
p2 cost : 50 ms
p3 cost : 5435 ms
p4 cost : 118 ms
Computing time=8654 ms
P1 omp thread 1 p1 cost : 3028 ms
p2 cost : 31 ms
p3 cost : 5339 ms
p4 cost : 113 ms
Computing time=8512 ms
P1 omp thread 2 p1 cost : 1548 ms
p2 cost : 35 ms
p3 cost : 5370 ms
p4 cost : 115 ms
Computing time=7070 ms
P1 omp thread 4 p1 cost : 808 ms
p2 cost : 32 ms
p3 cost : 5376 ms
p4 cost : 114 ms
Computing time=6332 ms
P1 omp thread 8 p1 cost : 408 ms
p2 cost : 33 ms
p3 cost : 5366 ms
p4 cost : 112 ms
Computing time=5922 ms
P1 omp thread 16 p1 cost : 208 ms
p2 cost : 34 ms
p3 cost : 5359 ms
p4 cost : 115 ms
Computing time=5718 ms
P1 omp thread 32 p1 cost : 107 ms
p2 cost : 33 ms
p3 cost : 5384 ms
p4 cost : 113 ms
Computing time=5639 ms
P1 omp thread 64 p1 cost : 57 ms
p2 cost : 34 ms
p3 cost : 5382 ms
p4 cost : 114 ms
Computing time=5589 ms

![image-20210630233122134](/Users/ylf9811/Library/Application Support/typora-user-images/image-20210630233122134.png)

![image-20210701000128734](/Users/ylf9811/Library/Application Support/typora-user-images/image-20210701000128734.png)

![image-20210701000840184](/Users/ylf9811/Library/Application Support/typora-user-images/image-20210701000840184.png)

![image-20210701002611742](/Users/ylf9811/Library/Application Support/typora-user-images/image-20210701002611742.png)

P3 omp

Version Cost/s
先枚举i,在枚举n,借此消除写冲突,thread 1 p1 cost : 86 ms
p1.5 cost : 33 ms
p2 cost : 0 ms
offset 227
minCost : 40205
p3 cost : 40956 ms
p4 cost : 113 ms
Computing time=41191 ms
先枚举i,在枚举n,借此消除写冲突,thread 64 p1 cost : 86 ms
p1.5 cost : 34 ms
p2 cost : 0 ms
offset 227
minCost : 831.787
p3 cost : 1624 ms
p4 cost : 114 ms
Computing time=1860 ms
还是先枚举n,在y循环并行,thread64 p1 cost : 87 ms
p1.5 cost : 33 ms
p2 cost : 0 ms
offset 227
minCost : 488.989
minCost2 : 0
minCost3 : 148.904
minCost4 : 0.032
minCost5 : 599.678
minCost6 : 0.053
p3 cost : 1263 ms
p4 cost : 136 ms
Computing time=1522 ms

0702

g++ -S -fverbose-asm -g -std=c++11 -O3 -fopenmp -march=native SLIC.cpp -o SLIC.s 
as -alhnd SLIC.s > SLIC.lst

Xmm 还没有做向量化呢

//TODO 把vector换成数组

简单的替换部分vector,去除无用变量 p1 cost : 101 ms
p1.5 cost : 31 ms
p2 cost : 0 ms
offset 227
minCost1 : 436.859
minCost2 : 0
minCost3 : 118.949
minCost4 : 0.024
minCost5 : 585.219
minCost6 : 0.048
p3 cost : 1141 ms
p4 cost : 153 ms
Computing time=1429 ms
完全偷换vector,对minCost3、5进行并行(开副本) p1 cost : 86 ms
p1.5 cost : 33 ms
p2 cost : 0 ms
offset 227
minCost0 : 0.611
minCost1 : 412.394
minCost2 : 0
minCost3 : 12.588
minCost4 : 0
minCost5 : 24.805
minCost6 : 0.04
p3 cost : 450 ms
p4 cost : 159 ms
Computing time=730 ms
p3中所有的地方都并行 p1 cost : 72 ms
p1.5 cost : 34 ms
p2 cost : 0 ms
offset 227
minCost0 : 0.526
minCost1 : 321.75
minCost2 : 0
minCost3 : 12.799
minCost4 : 0
minCost5 : 24.613
minCost6 : 0.172
p3 cost : 359 ms
p4 cost : 160 ms
Computing time=627 ms
minCost1之后记下每个聚类中心关联着那些像素G[numk] p1 cost : 87 ms
p1.5 cost : 33 ms
p2 cost : 0 ms
offset 227
minCost0 : 0.4
minCost1 : 393.509
minCost2 : 56.777
minCost3 : 23.661
minCost4 : 0
minCost5 : 52.05
minCost6 : 0
p3 cost : 527 ms
minCost0 : 4.177
minCost1 : 124.07
p4 cost : 132 ms
Computing time=780 ms
icpc -par-affinity=compactp1 cost : 87 ms
p1.5 cost : 33 ms
p2 cost : 0 ms
offset 227
minCost : 488.989
minCost2 : 0
minCost3 : 148.904
minCost4 : 0.032
minCost5 : 599.678
minCost6 : 0.053
p3 cost : 1263 ms
p4 cost : 136 ms
Computing time=1522 ms
p1 cost : 28 ms
p1.5 cost : 31 ms
p2 cost : 0 ms
offset 227
minCost0 : 0.521
minCost1 : 300.9
minCost2 : 0
minCost3 : 18.944
minCost4 : 0
minCost5 : 26.783
minCost6 : 0.078
p3 cost : 347 ms
minCost0 : 4.071
minCost1 : 133.902
p4 cost : 141 ms
Computing time=550 ms

//TODO minCost1还是xxm,看看自动向量化能不能成,实在不行就开始写手动版本

//TODO p4好像是有点子依赖,代码还没仔细读,还没加并行

0826

对于最后一个热点,试过了并行bfs,但是由于内部数据量以及分支(==4)太小了,效果不好。

今天突然有想法就是预处理出每一块(就是最后nlables)有谁,这样按照原来代码的逻辑走一遍,把bfs换成直接循环预处理好的数组,并且预处理部分可以按照类别并行(即聚类的结果),类之间不存在依赖,类内部迭代所有的点。注意这部分有很多omp并行因为数据量小或者是访存更慢,单独设置了线程数。最终大约是140ms->40ms,总时间能在450左右。

//TODO做一下明哥说的那个去if的操作,可能预处理的时候能向量化。

0827

今天把最大热点的循环顺序改了改,简单优化了一下常数,1case上能和y循环并行差不多,但是线程拓展性好了很多(也就是单线程很慢),这样就比较适合写多节点了,

调换循环顺序后单节点版本:

m_spcount is 200
input image is input_image.ppm
check image is check.ppm
p1 cost : 31 ms
p1.5 cost : 34 ms
p2 cost : 0 ms
offset 227
minCost0 : 0.535
minCost1 : 329.563
minCost2 : 0
minCost3 : 8.379
minCost4 : 0
minCost5 : 17.494
minCost6 : 0.068
p3 cost : 356 ms
minCost0 : 2.764
minCost1 : 11.048
minCost2 : 15.592
minCost3 : 12.864
minCost4 : 0.018
p4 cost : 45 ms
Computing time=468 ms
There are 0 points' labels are different from original file.


m_spcount is 400
input image is input_image2.ppm
check image is check2.ppm
p1 cost : 57 ms
p1.5 cost : 75 ms
p2 cost : 0 ms
offset 246
minCost0 : 0.741
minCost1 : 1298.71
minCost2 : 0
minCost3 : 26.344
minCost4 : 0
minCost5 : 45.359
minCost6 : 0.068
p3 cost : 1372 ms
minCost0 : 4.262
minCost1 : 27.086
minCost2 : 44.895
minCost3 : 32.516
minCost4 : 0.028
p4 cost : 129 ms
Computing time=1636 ms
There are 0 points' labels are different from original file.


m_spcount is 150
input image is input_image3.ppm
check image is check3.ppm
p1 cost : 24 ms
p1.5 cost : 31 ms
p2 cost : 0 ms
offset 222
minCost0 : 0.502
minCost1 : 184.263
minCost2 : 0
minCost3 : 3.988
minCost4 : 0
minCost5 : 11.883
minCost6 : 0.071
p3 cost : 201 ms
minCost0 : 2.701
minCost1 : 7.967
minCost2 : 15.395
minCost3 : 15.243
minCost4 : 0.01
p4 cost : 43 ms
Computing time=301 ms
There are 0 points' labels are different from original file.

双节点版本:

Process 1 of 2 ,processor name is fb0602.para.bscc
Process 0 of 2 ,processor name is fb0506.para.bscc
m_spcount is 200
input image is input_image.ppm
check image is check.ppm
p1 cost : 32 ms
process 1 word 5065451 to 10130902
p1.5 cost : 35 ms
p2 cost : 0 ms
offset 227
process 0 word 0 to 5065451
p3 cost : 244 ms
minCost0 : 1.83
minCost1 : 186.935
minCost2 : 2.856
minCost3 : 6.815
minCost4 : 3.358
minCost5 : 0
minCost6 : 0
p3 cost : 236 ms
numlabels 196
1done
1Computing time=312 ms
mx 195
minCost0 : 4.034
minCost1 : 13.293
minCost2 : 13.104
minCost3 : 12.306
minCost4 : 0.011
p4 cost : 45 ms
0done
0Computing time=350 ms
0There are 0 points' labels are different from original file.



Process 1 of 2 ,processor name is fb0602.para.bscc
Process 0 of 2 ,processor name is fa1013.para.bscc
m_spcount is 400
input image is input_image2.ppm
check image is check2.ppm
p1 cost : 58 ms
process 1 word 12000000 to 24000000
p1.5 cost : 79 ms
p2 cost : 0 ms
offset 246
process 0 word 0 to 12000000
minCost0 : 1.622
p3 cost : 943 ms
1done
1Computing time=1084 ms
minCost1 : 752.359
minCost2 : 9.655
minCost3 : 36.731
minCost4 : 6.81
minCost5 : 0
minCost6 : 0
p3 cost : 925 ms
numlabels 384
mx 383
minCost0 : 10.138
minCost1 : 38.086
minCost2 : 42.72
minCost3 : 31.405
minCost4 : 0.028
p4 cost : 142 ms
0done
0Computing time=1206 ms
0There are 0 points' labels are different from original file.



Process 0 of 2 ,processor name is fa1013.para.bscc
Process 1 of 2 ,processor name is fb0602.para.bscc
m_spcount is 150
input image is input_image3.ppm
check image is check3.ppm
p1 cost : 26 ms
p1.5 cost : 30 ms
p2 cost : 0 ms
offset 222
process 1 word 3657528 to 7315056
process 0 word 0 to 3657528
minCost0 : 1.579
p3 cost : 141 ms
1done
minCost1 : 105.331
minCost2 : 2.069
minCost3 : 5.511
minCost4 : 2.912
minCost5 : 0
minCost6 : 0
p3 cost : 142 ms
numlabels 147
1Computing time=195 ms
mx 146
minCost0 : 4.52
minCost1 : 11.935
minCost2 : 15.542
minCost3 : 17.845
minCost4 : 0.016
p4 cost : 52 ms
0done
0Computing time=252 ms
0There are 0 points' labels are different from original file.

对比上上个版本的时间

wi : 2599  he : 3898
p1 cost : 32 ms
p1.5 cost : 37 ms
p2 cost : 0 ms
offset 227
minCost0 : 0.548
minCost1 : 295.072
minCost2 : 0
minCost3 : 15.939
minCost4 : 0
minCost5 : 25.714
minCost6 : 0.079
p3 cost : 338 ms
minCost0 : 2.867
minCost1 : 10.801
minCost2 : 15.867
minCost3 : 12.675
minCost4 : 0.011
p4 cost : 45 ms
Computing time=453 ms
There are 0 points' labels are different from original file.


wi : 4000  he : 6000
p1 cost : 61 ms
p1.5 cost : 84 ms
p2 cost : 0 ms
offset 246
minCost0 : 0.875
minCost1 : 805.074
minCost2 : 0
minCost3 : 35.548
minCost4 : 0
minCost5 : 46.064
minCost6 : 0.086
p3 cost : 888 ms
minCost0 : 5.028
minCost1 : 27.845
minCost2 : 43.202
minCost3 : 31.55
minCost4 : 0.03
p4 cost : 128 ms
Computing time=1164 ms
There are 0 points' labels are different from original file.


wi : 2419  he : 3024
p1 cost : 25 ms
p1.5 cost : 29 ms
p2 cost : 0 ms
offset 222
minCost0 : 0.579
minCost1 : 233.842
minCost2 : 0
minCost3 : 15.255
minCost4 : 0
minCost5 : 28.462
minCost6 : 0.074
p3 cost : 301 ms
minCost0 : 3.359
minCost1 : 8.267
minCost2 : 18.033
minCost3 : 18.496
minCost4 : 0.017
p4 cost : 52 ms
Computing time=409 ms
There are 0 points' labels are different from original file.