The Apache CarbonData community is pleased to announce the release of version 1.4.1 in The Apache Software Foundation (ASF).
CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookups on detail records, and streaming analytics. CarbonData has been deployed in many enterprise production environments. In one of the largest scenarios, it supports queries on a single table with 3 PB of data (more than 5 trillion records) with response times of less than 3 seconds!
We encourage you to use the release https://dist.apache.org/repos/dist/release/carbondata/1.4.1/, and [hidden email]!
This release note provides information on the new features, improvements, and bug fixes of this release.
What’s New in Version 1.4.1?
In this version of CarbonData, more than 230 JIRA tickets covering new features, improvements, and bug fixes have been resolved. The following is a summary.
Support Cloud Storage (S3)
This can be used to store or retrieve data on Amazon cloud, Huawei Cloud (OBS), or any other object store conforming to the Amazon S3 API. Storing data in the cloud is advantageous as there are no restrictions on the size of the data and the data can be accessed from anywhere at any time. For more detail, please refer to the S3 Guide.
Support Flat Folder
This feature allows all carbondata and index files to be kept directly under the table path. This is useful for interoperability between execution engines and for plugging in other execution engines such as Hive or Presto.
Support 32K Characters (Alpha Feature)
In common scenarios, the length of the string is less than 32000. In some cases, if the length of the string is more than 32000 characters, CarbonData introduces a table property called
Local dictionary helps in getting more compression. Filter queries and full scan queries will be faster as filtering is done on encoded data. Store size and memory footprint are reduced as only unique values are stored as part of the local dictionary and the corresponding data is stored as encoded data. It also gives higher IO throughput.
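The idea behind these local dictionary benefits can be illustrated with a minimal Python sketch. It is not CarbonData's implementation; the function names `dictionary_encode` and `filter_equals` and the sample column are hypothetical and exist only for illustration. Each distinct value is stored once, the column data is stored as integer codes, and an equality filter compares codes instead of strings.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes); each distinct value is stored only once."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)  # next free code
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Return matching row positions by comparing integer codes only."""
    if literal not in dictionary:
        return []  # the value does not occur, so no rows need scanning
    target = dictionary.index(literal)
    return [row for row, code in enumerate(codes) if code == target]


# Hypothetical column data, used only to demonstrate the encoding.
cities = ["Pune", "Oslo", "Pune", "Lima", "Oslo", "Pune"]
dictionary, codes = dictionary_encode(cities)
matches = filter_equals(dictionary, codes, "Oslo")
```

In this sketch six strings shrink to three dictionary entries plus six small integers, and the filter touches only the integers.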
CarbonData supports merging of all the index files inside a segment to a single CarbonData index merge file. This enhances the first query performance.
Shows History Segments
CarbonData introduces a 'SHOW HISTORY SEGMENTS' command to show all segment information, including visible and invisible segments.
Custom compaction is a new compaction type in addition to MAJOR and MINOR compaction. In custom compaction, you can directly specify the segment ids to be merged.
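As a rough illustration of specifying segment ids, the following Python sketch builds a custom compaction statement. The statement shape (`ALTER TABLE ... COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (...)`) is an assumption based on the CarbonData data management documentation and should be checked against the guide for your version. The helper name `custom_compaction_sql` and the table name are hypothetical.

```python
import re
from typing import Iterable

# Assumption: segment ids look like "2" or "0.1" (digits, optionally dotted).
_SEGMENT_ID = re.compile(r"^\d+(\.\d+)?$")


def custom_compaction_sql(table: str, segment_ids: Iterable[str]) -> str:
    """Build a custom compaction statement for the given segment ids.

    The SQL shape is assumed from the CarbonData documentation, not verified here.
    """
    ids = [str(s) for s in segment_ids]
    if not ids:
        raise ValueError("at least one segment id is required")
    for segment_id in ids:
        if not _SEGMENT_ID.match(segment_id):
            raise ValueError(f"unexpected segment id: {segment_id!r}")
    return f"ALTER TABLE {table} COMPACT 'CUSTOM' WHERE SEGMENT.ID IN ({', '.join(ids)})"


# Hypothetical table name and segment ids.
statement = custom_compaction_sql("sales", ["2", "3", "4"])
```

The resulting string could then be passed to whatever SQL entry point your deployment uses.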
Enhancement for Detail Record Analysis
Supports Bloom Filter DataMap
CarbonData introduces BloomFilter as an index datamap to enhance the performance of queries that match a precise value. It is well suited to queries that do a precise match on high-cardinality columns (such as Name/ID). In the concurrent filter query scenario (on a high-cardinality column), we observe a 3~5 times improvement in concurrent queries per second compared to the previous version. For more detail, please refer to the BloomFilter DataMap Guide.
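To show why a Bloom filter helps with precise-match queries, here is a minimal Python sketch of the general technique. It is not CarbonData's datamap implementation; the class `BloomFilter`, the helper `candidate_blocks`, the block names, and the sizing parameters are all hypothetical. One filter is kept per block of data, and a lookup scans only the blocks whose filter says the value might be present.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, value: str) -> Iterator[int]:
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical blocks of a high-cardinality ID column.
blocks: Dict[str, List[str]] = {
    "block-0": ["id-001", "id-002", "id-003"],
    "block-1": ["id-104", "id-105", "id-106"],
}

filters: Dict[str, BloomFilter] = {}
for name, ids in blocks.items():
    bloom = BloomFilter()
    for value in ids:
        bloom.add(value)
    filters[name] = bloom


def candidate_blocks(value: str) -> List[str]:
    """Return the blocks that may contain `value`; all others can be skipped."""
    return [name for name, bloom in filters.items() if bloom.might_contain(value)]


hits = candidate_blocks("id-105")
```

A block that really holds the value is always returned. Other blocks are usually pruned, though a false positive can occasionally cause an unnecessary scan.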
Improved Complex Datatypes
Improved complex datatypes compression and performance through adaptive encoding.
Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12343148
Thanks & Regards,