The Apache CarbonData community is pleased to announce the release of version 1.5.1 at The Apache Software Foundation (ASF).
CarbonData is a high-performance data solution that supports various data analytics scenarios, including BI analysis, ad-hoc SQL queries, fast filter lookups on detail records, streaming analytics, and so on. CarbonData has been deployed in many enterprise production environments; in one of the largest, it supports queries on a single table holding 3 PB of data (more than 5 trillion records) with response times of less than 3 seconds.
We encourage you to use the release https://dist.apache.org/repos/dist/release/carbondata/1.5.1/, and [hidden email]!
This release note provides information on the new features, improvements, and bug fixes of this release.
What’s New in CarbonData Version 1.5.1?
The intention of CarbonData 1.5.1 was to move closer to unified analytics. We want to enable CarbonData files to be read from more engines and libraries to support various use cases. In this regard, we have added support for writing CarbonData files from C++ libraries.
This release also adds multiple optimizations to improve query and compaction performance.
In this version of CarbonData, more than 78 JIRA tickets related to new features, improvements, and bugs have been resolved. The following is a summary.
Support Custom Column Compressor
CarbonData supports custom column compressors, so users can plug in their own compressor implementation. To use a custom compressor, specify its full class name either when creating the table or in the corresponding carbon property.
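The idea of selecting a compressor by its full class name can be sketched as follows. This is only an illustration under stated assumptions: `SketchCompressor`, `DeflateSketchCompressor`, and `SketchCompressorLoader` are hypothetical names invented for this example and are not CarbonData's actual compressor interface or loading code. The sketch uses the JDK's Deflater as a stand-in algorithm.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical contract for a pluggable compressor (NOT CarbonData's real interface).
interface SketchCompressor {
    String name();
    byte[] compress(byte[] input);
    byte[] decompress(byte[] input);
}

// Example user-supplied implementation backed by the JDK's Deflater/Inflater.
final class DeflateSketchCompressor implements SketchCompressor {
    public DeflateSketchCompressor() {}

    @Override public String name() { return "deflate-sketch"; }

    @Override public byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    @Override public byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}

// Resolves a compressor from its fully qualified class name via reflection.
final class SketchCompressorLoader {
    static SketchCompressor load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (SketchCompressor) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load compressor " + className, e);
        }
    }
}
```

In this sketch, a caller would pass `DeflateSketchCompressor.class.getName()` to `SketchCompressorLoader.load` and then round-trip data through `compress` and `decompress`. A no-argument constructor is required because the loader instantiates the class reflectively.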
Optimized CarbonData Scan Performance
CarbonData scan performance is improved by avoiding multiple data copies in the vector flow. This is achieved by short-circuiting the read and the vector filling: data is filled directly into the vector after it is read from the file, without any intermediate copies.
Row-level filter processing is now handled in the execution engine; for the vector flow, CarbonData handles only blocklet and page pruning. This behavior is controlled by the property carbon.push.rowfilters.for.vector, which defaults to false.
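As a small illustration of the default described above, the following sketch resolves the flag from properties text and falls back to false when it is not set. The class `RowFilterFlagSketch` and its methods are hypothetical helpers for this example, not part of CarbonData; only the property name comes from this release note.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper (not a CarbonData class) that resolves the flag with its default.
final class RowFilterFlagSketch {
    // Property name taken from the release note.
    static final String KEY = "carbon.push.rowfilters.for.vector";

    // Parses properties-file style text, e.g. the contents of a configuration file.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Returns false when the property is absent, matching the documented default.
    // Per the release note, false means row-level filtering is left to the execution engine.
    static boolean pushRowFilters(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }
}
```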
Optimized Compaction Performance
Compaction performance is optimized through pre-fetching the data while reading carbon files.
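The general pre-fetching pattern can be sketched as a background thread that reads ahead into a bounded queue while the consumer processes earlier items. This is a generic illustration, not CarbonData's compaction code; `PrefetchingIterator` and its `depth` parameter are names assumed for this example, and the sketch assumes the source never yields null elements.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical read-ahead wrapper: a producer thread fills a bounded queue
// so the consumer rarely waits on the underlying (slow) source.
final class PrefetchingIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // end-of-stream marker
    private final BlockingQueue<Object> queue;
    private volatile RuntimeException failure;      // error raised by the source, if any
    private Object next;                            // buffered element or END

    PrefetchingIterator(Iterator<T> source, int depth) {
        this.queue = new ArrayBlockingQueue<>(depth);
        Thread producer = new Thread(() -> {
            try {
                try {
                    while (source.hasNext()) {
                        queue.put(source.next()); // blocks when 'depth' items are waiting
                    }
                } catch (RuntimeException e) {
                    failure = e;                  // surfaced to the consumer at END
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-sketch");
        producer.setDaemon(true);
        producer.start();
    }

    @Override public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                next = END;
            }
        }
        if (next == END && failure != null) {
            throw failure;
        }
        return next != END;
    }

    @SuppressWarnings("unchecked")
    @Override public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) next;
        next = null;
        return value;
    }
}
```

The bounded queue caps memory use: the producer stalls once `depth` items are buffered and resumes as the consumer drains them.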
Improved Blocklet DataMap Pruning in Driver
Blocklet DataMap pruning is improved by using multi-threaded processing in the driver.
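To illustrate the idea of multi-threaded pruning, the sketch below splits a list of per-blocklet min/max ranges into slices and checks each slice on its own thread, keeping only blocklets whose range can contain the filter value. The types and names (`ParallelPruneSketch`, `BlockletRange`) are hypothetical and much simpler than CarbonData's actual DataMap structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPruneSketch {
    // Hypothetical index entry: min/max of one column within one blocklet.
    static final class BlockletRange {
        final String id;
        final long min;
        final long max;
        BlockletRange(String id, long min, long max) {
            this.id = id;
            this.min = min;
            this.max = max;
        }
    }

    // Returns ids of blocklets whose [min, max] range may contain 'value', in index order.
    static List<String> prune(List<BlockletRange> index, long value, int threads)
            throws InterruptedException, ExecutionException {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Ceiling division so every entry lands in exactly one slice.
            int chunk = Math.max(1, (index.size() + threads - 1) / threads);
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int start = 0; start < index.size(); start += chunk) {
                final List<BlockletRange> slice =
                        index.subList(start, Math.min(index.size(), start + chunk));
                Callable<List<String>> task = () -> {
                    List<String> hits = new ArrayList<>();
                    for (BlockletRange b : slice) {
                        if (b.min <= value && value <= b.max) {
                            hits.add(b.id);
                        }
                    }
                    return hits;
                };
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the result is deterministic.
            List<String> result = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                result.addAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```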
SDK Supports C++ Interfaces for Writing CarbonData files
To enable integration with non-Java execution engines, CarbonData provides a C++ JNI wrapper for writing CarbonData files. It can be integrated with any execution engine and can write data to CarbonData files without depending on Spark or Hadoop.
Multi-Thread Read API in SDK
To improve read performance when using the SDK, CarbonData supports multi-thread read APIs. These enable applications to read data from multiple CarbonData files in parallel, which significantly improves SDK read performance.
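The pattern of reading several files in parallel can be sketched with plain JDK classes. This example does not use the CarbonData SDK: `ParallelFileReadSketch` is a hypothetical name, and ordinary text files stand in for CarbonData files, with each line treated as one row.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFileReadSketch {
    // Reads every file on its own task and returns all rows in file order.
    static List<String> readAll(List<Path> files, int threads)
            throws InterruptedException, ExecutionException {
        if (files.isEmpty()) {
            return new ArrayList<>();
        }
        int poolSize = Math.max(1, Math.min(threads, files.size()));
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Path file : files) {
                // One independent reader per file; text lines stand in for rows.
                tasks.add(() -> Files.readAllLines(file, StandardCharsets.UTF_8));
            }
            List<String> rows = new ArrayList<>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                rows.addAll(f.get());
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

Giving each file its own reader avoids shared state between threads, so no locking is needed until the results are merged.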
New Configuration Parameters
Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344320
Thanks & Regards,