

A question about how the parallel command works

There is a passage in the official documentation for the parallel tool that I do not quite understand:
For better parallelism GNU parallel can distribute the arguments between all the parallel jobs when end of file is met.

Below GNU parallel reads the last argument when generating the second job. When GNU parallel reads the last argument, it spreads all the arguments for the second job over 4 jobs instead, as 4 parallel jobs are requested.

The first job will be the same as the --xargs example above, but the second job will be split into 4 evenly sized jobs, resulting in a total of 5 jobs:

cat num30000 | parallel --jobs 4 -m echo | wc -l
Output (if you run this under Bash on GNU/Linux):

5
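The text above clearly says the input is split across 4 jobs, so why does the output have 5 lines?
Also, does the passage mean that parallel first reads the whole file and only then distributes its contents as arguments to the jobs? If the file is large, wouldn't reading all of it before distributing be slow? For example, when counting the lines of a very large file, reading the whole file first and then handing out the work (which is just counting lines) to parallel jobs should take longer than simply running wc -l, shouldn't it?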

Even stranger, on my machine the result is 6 lines:

[10:01 sxuan@hulab ~]$ cat num30000 | parallel --jobs 4 -m echo | wc -l
6

Thanks!

Answer
澐染

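-m packs multiple input lines into the argument list of a single command, and the length of an argument list is limited, so the input can end up spread over more than 4 processes. Note that --jobs 4 only limits how many jobs run at the same time, not how many jobs are created in total.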

> seq 1 30000 | parallel --jobs 4 -m echo | wc -l
5
> seq 1 100000 | parallel --jobs 4 -m echo | wc -l
8
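
The same batching effect can be reproduced without parallel. The following is only a sketch that uses xargs as a stand-in (it is a different tool and does not redistribute arguments at end of file the way parallel -m does); the helper name count_batches and the two limits 4096 and 65536 are made up for illustration. It shows that the smaller the allowed command length, the more commands are needed for the same input:

```shell
# count_batches LIMIT
# Hypothetical helper: feed the numbers 1..30000 to xargs with a
# command-length cap of LIMIT bytes (-s) and count how many separate
# echo commands xargs had to run (each echo prints exactly one line).
count_batches() {
  seq 1 30000 | xargs -s "$1" echo | wc -l | tr -d ' '
}

small=$(count_batches 4096)    # tight limit: many short commands
large=$(count_batches 65536)   # looser limit: a few long commands

echo "limit 4096:  $small commands"
echo "limit 65536: $large commands"
```

The exact counts depend on the xargs implementation, so none are quoted here; the point is only that the count grows as the limit shrinks.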

You can see the argument length limits with xargs --show-limits:

> xargs --show-limits
Your environment variables take up 892 bytes
POSIX upper limit on argument length (this system): 2094212
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2093320
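
To relate these limits to the input from the question, it can help to measure how many bytes the arguments actually take. This is a rough sketch, assuming num30000 contains the numbers 1 to 30000, one per line (the file is not shown in the question, so seq is used in its place); the variable names are illustrative:

```shell
# Size of the input in bytes (digits plus one newline per number).
input_bytes=$(seq 1 30000 | wc -c | tr -d ' ')

# System-wide upper bound on arguments plus environment, in bytes.
arg_max=$(getconf ARG_MAX)

echo "input:   $input_bytes bytes"
echo "ARG_MAX: $arg_max bytes"
```

If the input turns out to be smaller than the limit xargs reports and parallel still splits it, then parallel is probably applying a smaller limit of its own. GNU parallel can print the limit it uses with parallel --max-line-length-allowed.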
June 23, 2018, 19:02