Problem symptoms:
The target node's log contains "断开的管道" or "Broken pipe" errors, for example:
Caused by: smartbix.datamining.engine.execute.exception.ExecuteException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 186 in stage 718271.0 failed 1 times, most recent failure: Lost task 186.0 in stage 718271.0 (TID 1086093) (iZ2ze4ynpkdlg0b9rkae3qZ executor driver): ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, code: 1002, host: 10.200.1.205, port: 8123; 断开的管道 (Write failed)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.getException(ClickHouseExceptionSpecifier.java:92)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:56)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:25)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:1076)
.......
Caused by: java.net.SocketException: 断开的管道 (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
Cause:
Highly concurrent inserts: the ClickHouse server cannot process and respond to them in time, so the socket gets closed.
Solutions:
Method 1: increase the network-related timeout parameters
Reference: 使用spark将数据写入到clickhouse报Broken pipe错误 (CSDN blog)
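As a minimal sketch of method 1: `socket_timeout` and `connection_timeout` are URL parameters of the legacy ru.yandex.clickhouse JDBC driver seen in the stack trace above; the millisecond values below are illustrative, not tuned recommendations.

```java
// Sketch: append larger timeouts to the ClickHouse JDBC URL so that
// slow responses on the server side do not close the socket mid-write.
public class ClickHouseUrl {
    public static String withTimeouts(String baseUrl, long socketMs, long connectMs) {
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + "socket_timeout=" + socketMs
                       + "&connection_timeout=" + connectMs;
    }

    public static void main(String[] args) {
        // Raise the socket timeout to 5 minutes (illustrative value).
        String url = withTimeouts("jdbc:clickhouse://10.200.1.205:8123/default",
                                  300_000L, 60_000L);
        System.out.println(url);
        // The resulting URL would then be passed to the Spark JDBC writer.
    }
}
```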
Method 2: the core idea is throttling; concretely, reduce the parallelism of the ETL job
In V11, the target node exposes parameters for this purpose (the original document includes a screenshot of the configuration UI here).
The parameter defaults are shown below; remove the leading # for a parameter to take effect:
# Insert batch size
# Must be a positive integer
# WRITE_JDBC_BATCHSIZE=1000
# Maximum number of concurrent insert connections
# Must be a positive integer
# WRITE_MAX_PARALLELISM=10
# Whether to COMMIT after every batch; default is false
# WRITE_JDBC_COMMIT_BATCH=false
For extreme throttling, set WRITE_MAX_PARALLELISM=1.
Setting WRITE_JDBC_COMMIT_BATCH to true gives finer control over memory usage and also has some throttling effect.
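A sketch of what these parameters amount to: at most MAX_PARALLELISM writers insert concurrently, and each one flushes BATCH_SIZE rows per executeBatch()/commit. The constants mirror the config names above, but the writer itself is hypothetical, not the product's actual implementation.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

public class ThrottledWriter {
    static final int BATCH_SIZE = 1000;    // WRITE_JDBC_BATCHSIZE
    static final int MAX_PARALLELISM = 1;  // WRITE_MAX_PARALLELISM (extreme throttling)
    static final Semaphore slots = new Semaphore(MAX_PARALLELISM);

    // Returns the number of batches flushed, for illustration.
    static int writePartition(List<String> rows) throws InterruptedException {
        slots.acquire();                   // block until a writer slot frees up
        int batches = 0;
        try {
            for (int i = 0; i < rows.size(); i += BATCH_SIZE) {
                List<String> batch = rows.subList(i, Math.min(i + BATCH_SIZE, rows.size()));
                // In a real writer: stmt.addBatch(...) per row, then stmt.executeBatch().
                // With WRITE_JDBC_COMMIT_BATCH=true: conn.commit() here, which
                // bounds how many uncommitted rows sit in memory at once.
                batches++;
            }
        } finally {
            slots.release();
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writePartition(Collections.nCopies(2500, "row")));
        // 2500 rows → 3 batches (1000 + 1000 + 500)
    }
}
```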
Versions before V11 can only configure the parallelism of the whole compute engine.
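If the engine is Spark (as the stack trace suggests), lowering the engine-wide parallelism would look roughly like this. The two properties are standard Spark settings; where this particular product exposes them is deployment-specific and an assumption here.

```shell
# Standard Spark knobs that cap parallelism for the whole job;
# the values are illustrative, not tuned recommendations.
spark-submit \
  --conf spark.default.parallelism=4 \
  --conf spark.sql.shuffle.partitions=4 \
  your-etl-job.jar
```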