how to add RDD partition?

how to add RDD partition?

suzzy
Hi,
I ran the query 'select count(1) from sunzy.datatest'.
The job had 16 blocks and 16 tasks, but only 4 partitions (the log below reports parallelism: 4).
How can I increase the number of RDD partitions?
Thanks.

CarbonData ThriftServer Log: 

INFO 16-06 16:14:34,039 -  
 Identified no.of.blocks: 16, 
 no.of.tasks: 16, 
 no.of.nodes: 0, 
 parallelism: 4 
INFO 16-06 16:14:34,059 - Starting job: run at AccessController.java:-2 
INFO 16-06 16:14:34,060 - Registering RDD 12 (run at AccessController.java:-2) 
INFO 16-06 16:14:34,061 - Got job 1 (run at AccessController.java:-2) with 1 output partitions 
INFO 16-06 16:14:34,061 - Final stage: ResultStage 3 (run at AccessController.java:-2) 
INFO 16-06 16:14:34,061 - Parents of final stage: List(ShuffleMapStage 2) 
INFO 16-06 16:14:34,061 - Missing parents: List(ShuffleMapStage 2) 
INFO 16-06 16:14:34,062 - Submitting ShuffleMapStage 2 (MapPartitionsRDD[12] at run at AccessController.java:-2), which has no missing parents 
INFO 16-06 16:14:34,065 - Block broadcast_2 stored as values in memory (estimated size 15.4 KB, free 62.2 KB) 
INFO 16-06 16:14:34,068 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 7.6 KB, free 69.8 KB) 
INFO 16-06 16:14:34,069 - Added broadcast_2_piece0 in memory on 192.168.1.41:57617 (size: 7.6 KB, free: 71.7 GB) 
INFO 16-06 16:14:34,069 - Created broadcast 2 from broadcast at DAGScheduler.scala:1006 
INFO 16-06 16:14:34,070 - Submitting 16 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[12] at run at AccessController.java:-2) 
INFO 16-06 16:14:34,070 - Adding task set 2.0 with 16 tasks 
INFO 16-06 16:14:34,072 - Starting task 2.0 in stage 2.0 (TID 16, H4, partition 2,NODE_LOCAL, 2376 bytes) 
INFO 16-06 16:14:34,073 - Starting task 0.0 in stage 2.0 (TID 17, H3, partition 0,NODE_LOCAL, 2376 bytes) 
INFO 16-06 16:14:34,073 - Starting task 1.0 in stage 2.0 (TID 18, H1, partition 1,NODE_LOCAL, 2376 bytes) 
INFO 16-06 16:14:34,074 - Starting task 4.0 in stage 2.0 (TID 19, H2, partition 4,NODE_LOCAL, 2376 bytes) 
INFO 16-06 16:14:34,089 - Added broadcast_2_piece0 in memory on H1:57002 (size: 7.6 KB, free: 57.3 GB) 
INFO 16-06 16:14:34,096 - Added broadcast_2_piece0 in memory on H4:33086 (size: 7.6 KB, free: 57.3 GB) 
INFO 16-06 16:14:34,116 - Added broadcast_2_piece0 in memory on H2:45618 (size: 7.6 KB, free: 57.3 GB) 
INFO 16-06 16:14:34,117 - Added broadcast_2_piece0 in memory on H3:56719 (size: 7.6 KB, free: 57.3 GB)
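
A note on reading the summary at the top of this log: Spark runs one task per partition in a stage, so the 16 tasks suggest the scan stage already has 16 partitions, and parallelism: 4 is probably the number of tasks that can run at the same time rather than a partition count. The Python sketch below parses the summary lines and estimates how many rounds of tasks that would imply. The function names are hypothetical, and treating parallelism as the number of concurrent task slots is an assumption, not something the log states.

```python
import math
import re

# Summary lines copied from the ThriftServer log above.
LOG_SUMMARY = """
 Identified no.of.blocks: 16,
 no.of.tasks: 16,
 no.of.nodes: 0,
 parallelism: 4
"""


def parse_summary(text):
    """Return the 'name: number' pairs from the summary as a dict of ints."""
    pairs = re.findall(r"([A-Za-z.]+):\s*(\d+)", text)
    return {name: int(value) for name, value in pairs}


def scheduling_waves(tasks, parallelism):
    """Estimate how many rounds are needed to run all tasks.

    Assumption: 'parallelism' is the number of tasks that can run at once.
    """
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    return math.ceil(tasks / parallelism)


summary = parse_summary(LOG_SUMMARY)
waves = scheduling_waves(summary["no.of.tasks"], summary["parallelism"])
print(summary)
print("estimated waves:", waves)
```

Under that assumption, raising the parallelism setting would reduce the number of rounds without changing the 16 partitions themselves.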

Re: how to add RDD partition?

Erlu Chen
Hi

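Please try setting spark.sql.shuffle.partitions or spark.default.parallelism; both control the number of tasks.

spark.sql.shuffle.partitions applies to Spark SQL: it sets the number of partitions used when data is shuffled for joins or aggregations.

spark.default.parallelism applies to Spark RDDs: it sets the default number of partitions for RDD operations such as reduceByKey and parallelize when no count is given.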
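
One common way to pass these two properties is as --conf key=value options on the spark-submit command line. The Python sketch below only assembles such a command as text; it does not start Spark. The helper name spark_conf_args, the value 16 (chosen to match the task count in the log) and the jar name your-app.jar are all placeholders, not recommendations.

```python
import shlex


def spark_conf_args(settings):
    """Turn a dict of Spark properties into spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical values: 16 matches the task count in the log, not a tuned setting.
settings = {
    "spark.sql.shuffle.partitions": 16,
    "spark.default.parallelism": 16,
}

# 'your-app.jar' is a placeholder for the real application jar.
command = ["spark-submit"] + spark_conf_args(settings) + ["your-app.jar"]
print(" ".join(shlex.quote(part) for part in command))
```

spark.sql.shuffle.partitions can also be changed for a running SQL session with a statement such as SET spark.sql.shuffle.partitions=16; whereas spark.default.parallelism generally needs to be in place before the Spark context starts.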

Regards.
Chenerlu.

Re: how to add RDD partition?

Liang Chen
In reply to this post by suzzy
Hi

I don't fully understand your question. Do you want to increase parallelism?
If so, you can set Spark's parallelism parameter (for example, spark.default.parallelism).

Regards
Liang

2017-06-20 11:41 GMT+08:00 suzzy <[hidden email]>:
Hi,
I ran the query 'select count(1) from sunzy.datatest'.
The job had 16 blocks and 16 tasks, but only 4 partitions.
How can I increase the number of RDD partitions?
Thanks.

--
View this message in context: http://apache-carbondata-user-mailing-list.3231.n8.nabble.com/how-to-add-RDD-partition-tp31.html
Sent from the Apache CarbonData User Mailing List mailing list archive at Nabble.com.

Re: how to add RDD partition?

suzzy

Yes, thanks. It works now.


From: Liang Chen <[hidden email]>
Sent: June 26, 2017 14:44:32
To: [hidden email]
Subject: Re: how to add RDD partition?

Hi

I don't fully understand your question. Do you want to increase parallelism?
If so, you can set Spark's parallelism parameter (for example, spark.default.parallelism).

Regards
Liang

Re: how to add RDD partition?

suzzy
In reply to this post by Erlu Chen

Thanks.


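From: Erlu Chen <[hidden email]>
Sent: June 26, 2017 14:42:20
To: [hidden email]
Subject: Re: how to add RDD partition?

Hi

Please try setting spark.sql.shuffle.partitions or spark.default.parallelism; both control the number of tasks.

spark.sql.shuffle.partitions applies to Spark SQL: it sets the number of partitions used when data is shuffled for joins or aggregations.

spark.default.parallelism applies to Spark RDDs: it sets the default number of partitions for RDD operations such as reduceByKey and parallelize when no count is given.

Regards.
Chenerlu.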

--
View this message in context: http://apache-carbondata-user-mailing-list.3231.n8.nabble.com/how-to-add-RDD-partition-tp31p32.html
Sent from the Apache CarbonData User Mailing List mailing list archive at Nabble.com.

Re: Re: how to add RDD partition?

Erlu Chen
You are welcome!
: )