Can I set a larger HDFS block size, like 4 or 8 GB in production environment? What is the problem with large blocks?


bianhaoqiong
Hi All,
I am wondering if I can use a very large block size, such as 4 or 8 gigabytes or even larger, in a production HDFS cluster.

Is there any problem with HDFS if it contains a large number of such large blocks?

And if the large blocks hold data in CarbonData or other columnar formats such as ORC or Parquet, what problems might we run into when executing queries on top of that data?

Thanks!
Haoqiong
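
For reference, the block size is a client-side, per-file setting: it can be raised cluster-wide via dfs.blocksize in hdfs-site.xml, or passed per file through the client API. Below is a minimal Java sketch; the NameNode URI hdfs://namenode:8020 and the output path are placeholders, not values from this thread.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LargeBlockWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 4 GiB; must be a multiple of dfs.bytes-per-checksum (512 by default).
        long fourGiB = 4L * 1024 * 1024 * 1024;

        // Option 1: override the default block size for all files this client creates.
        conf.setLong("dfs.blocksize", fourGiB);

        FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf);

        // Option 2: pass the block size explicitly when creating a single file.
        Path p = new Path("/tmp/large-block-file");
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);
        short replication = 3;
        try (FSDataOutputStream out = fs.create(p, true, bufferSize, replication, fourGiB)) {
            out.writeBytes("hello");
        }
        fs.close();
    }
}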

Re: Can I set a larger HDFS block size, like 4 or 8 GB in production environment? What is the problem with large blocks?

Liang Chen

Hi,

In theory, it should be supported.

But in practice:

1. Re-replication may take a long time if a replica is lost or has to be moved by the balancer/mover/replication.

2. During pipeline recovery on write/append, if a new node replaces the failed node, the existing data must be copied to the new datanode. This can take a long time depending on how much has already been written (in this case, gigabytes). If the transfer does not complete within the timeout (default 60s), the client may time out and the write may fail (see the configuration sketch after this list).

3. The balancer may time out while moving blocks of this size from one datanode to another.
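
A minimal sketch of the client-side timeouts one might raise when such transfers run long. The property names are standard HDFS client keys, but whether the 60s timeout referenced in point 2 maps exactly to dfs.client.socket-timeout is an assumption; verify against your Hadoop version.

import org.apache.hadoop.conf.Configuration;

public class TimeoutTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // DFS client socket read timeout, default 60000 ms; a likely match
        // for the 60s timeout mentioned in point 2 above.
        conf.setInt("dfs.client.socket-timeout", 10 * 60 * 1000);
        // Datanode pipeline write timeout, default 480000 ms.
        conf.setInt("dfs.datanode.socket.write.timeout", 30 * 60 * 1000);
        // Hand this Configuration to FileSystem.get(...) as in the earlier sketch.
    }
}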


(Note: this reply is from HDFS PMC member Vinayakumar.)


